Converting large collection (newbie)
#1
Posted 26 August 2012 - 04:42 AM
I'm a newbie to Bible software, but I have 20+ years' experience writing software. This is a bit long, sorry, but I do have a question by the time you get to the end!
What I want to achieve:
Firstly, convert the books into a more suitable format: one which enables the essential non-linearity of much of the information to be expressed naturally. For example, a sidebar on "judgement" might be useful to display in a number of books, or even a few times in a single book, but the sidebar information itself should only need to be written once. Any update should edit the canonical copy of that information, and everywhere that sidebar appears can simply 'point' to the one source. This sort of non-linearity is probably best represented with a website, as the web is inherently non-linear.
Once the books have been dissected into their component parts, a "book" can then be specified as a sequential listing of the individual components that make it up.
A tool can then be written which takes a "book" specification and generates the desired output format, whether that is PDF, MS Word, an e-Sword module, a SWORD Project module, or something else. I have yet to find really good low-level information on how these modules are formatted, which is essential if I'm to write such a tool.
The advantage of this system is that any change to the source material can be propagated to all the different works that include that material simply by pushing a virtual button.
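A minimal sketch of the idea in Python, assuming a hypothetical in-memory store in which every component has an ID and a book is just an ordered list of IDs. The names COMPONENTS, BOOKS, render_text, and rebuild_all are illustrative only and do not come from any existing tool.

```python
# Hypothetical single-source store: each component is written exactly once.
COMPONENTS = {
    "intro": "Introduction to the study.",
    "sidebar-judgement": "Sidebar: notes on judgement.",
    "chapter-1": "Chapter 1 text.",
    "chapter-2": "Chapter 2 text.",
}

# A "book" is only an ordered list of component IDs; the same ID may
# appear in several books (or several times in one book).
BOOKS = {
    "book-a": ["intro", "chapter-1", "sidebar-judgement"],
    "book-b": ["chapter-2", "sidebar-judgement"],
}


def render_text(book_id, components=COMPONENTS, books=BOOKS):
    """Assemble a plain-text rendering of one book from the shared components."""
    return "\n\n".join(components[cid] for cid in books[book_id])


def rebuild_all(components=COMPONENTS, books=BOOKS):
    """Regenerate every book; an edit to one component reaches all books using it."""
    return {book_id: render_text(book_id, components, books) for book_id in books}
```

Editing COMPONENTS["sidebar-judgement"] and calling rebuild_all() again would update both books. A real tool would swap render_text for one writer per output format.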
Questions:
1. Is there a set of tools for doing this? (I'm guessing "no" here. At least, I haven't found anything obvious.)
2. If I'm going to do this, are there any tools/advice/information that can help me? And what pitfalls are out there?
Thanks,
Carl.
#2
Posted 26 August 2012 - 05:45 AM
Many of the members are here to help you. You can search this site's e-Sword downloads to find out what is already available. If you let us know what is not available here, many of us will pitch in and help you convert the modules. Thanks for posting the information.
david
The first goal in life is to make ourselves acceptable to the LORD
#3
Posted 26 August 2012 - 06:44 PM
Actually, this is the sort of thing I'm trying to avoid. I don't want to take a static book and then have someone make a module from it. I want to automate the creation of modules for e-Sword and other Bible software. That way, when there are any changes in the source material, the modules can be automatically recreated. I don't see any reason why, given suitably organised source material, an automated conversion would not work. I guess what I want to know is: is there a suitable super-format for theological literature from which automatic module creation can be performed for whatever Bible software is required?
To me, this is the "right way" to do things for material which is still in a state of flux (rather than, say, a commentary from the 19th Century, which is, for all intents and purposes, a static document). If it's not the best way to do things, please, by all means, stop me from doing something that could be a large waste of time....
Cheers,
Carl.
#4
Posted 26 August 2012 - 06:55 PM
There's no way to dynamically update content. I could come close with a set of regular expressions to transform documents into a format readable by ToolTip NT, the software used to create e-Sword modules. But it would not be perfect and would require manual intervention on occasion.
#5
Posted 26 August 2012 - 07:33 PM
I was thinking of directly generating the appropriate SQLite database from the source format without going through another program. I haven't had a look at the database files yet, so I'm not sure how feasible this is.
Don't want to reinvent the wheel though.
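As a rough sketch of what "directly generating the SQLite database" could look like with Python's standard sqlite3 module. The table and column names (module_info, entries, topic, body) are placeholders invented for illustration; the real e-Sword layout would have to be read from existing modules first.

```python
import sqlite3


def write_module(path, title, entries):
    """Write (topic, body) pairs to a SQLite file using a placeholder schema."""
    conn = sqlite3.connect(path)
    try:
        with conn:  # commits on success, rolls back on error
            # Placeholder tables: NOT the actual e-Sword schema.
            conn.execute("CREATE TABLE IF NOT EXISTS module_info (title TEXT)")
            conn.execute("CREATE TABLE IF NOT EXISTS entries (topic TEXT, body TEXT)")
            # Clear old rows so that re-running the build regenerates the module.
            conn.execute("DELETE FROM module_info")
            conn.execute("DELETE FROM entries")
            conn.execute("INSERT INTO module_info (title) VALUES (?)", (title,))
            conn.executemany(
                "INSERT INTO entries (topic, body) VALUES (?, ?)", entries
            )
    finally:
        conn.close()
```

Because the old rows are cleared on each run, calling write_module again after the source changes recreates the module rather than appending to it.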
Cheers,
Carl.
Edited by Carl Cerecke, 26 August 2012 - 08:01 PM.
#6
Posted 27 August 2012 - 06:28 AM
I don't see any reason why, given suitably organised source material, an automated conversion would not work.
So this means the source material has to be in some particular format for this automation to work? I doubt that is possible. We are not in control of the source materials. They are gathered mostly from OCR text of works written long ago; some sources were in PDF or plain-text format. As they come from various authors and sources, we cannot control their format.
To learn about e-Sword databases, have a look at these two posts:
http://www.biblesupport.com/topic/2240-computer-science-e-sword-and-databases-part-1/
http://www.biblesupport.com/topic/2270-computer-science-e-sword-and-databases-part-2/
I think your idea is still at a 10,000-foot view. Can you give low-level detail?
- What is your technical plan? What would be the particular format of the source? XML/text/RTF/...
- Have you already worked in this direction, i.e. on similar projects? Was any POC (proof of concept) done?
- What technologies would you like to use? Based on this we can see if any of our members already know them and can pitch in to help.
- Are you willing to make the tool freely available after development?
The first goal in life is to make ourselves acceptable to the LORD
#7
Posted 27 August 2012 - 09:53 AM
Firstly, convert the books into a more suitable format.
Do you want to preserve presentation markup, or semantic markup?
Maybe the first question should be "Is there any semantic markup that needs to be preserved?", followed by "Is there any presentation markup that needs to be preserved?"
You don't say what file format the data currently is in. That makes a major difference in how easy it will be to convert the content to USFM, OSIS, Z-XML, ThML, or other markup language.
One which enables the essential non-linearity of much of the information to be expressed naturally
TeX.
A tool can then be written which takes a "book" specification and generates the desired output format, whether that is PDF, MS Word, an e-Sword module, a SWORD Project module, or something else.
Keep things simple.
Write one tool for each target file format.
If semantic content is irrelevant, then HTML5 with CSS3 is the simplest file format that preserves content and also enables easy transformations to document file formats.
I'm yet to find really good low-level information on how these modules are formatted, which is essential if I'm to write such a tool.
For Biblical software file formats, the only reliable way of finding out how the modules are formatted is to analyze half a dozen or more resources of each module type.
Formal specifications for ISO/IEC 29500:2008 can be obtained from http://www.iso.org/i...csnumber=51463.
Formal specifications for ISO 32000-1:2008 can be obtained from
http://www.iso.org/i...?csnumber=51502
Formal specifications for ISO 26300 can be obtained from
https://lists.oasis-...1/msg00001.html
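For module types that are SQLite files (which this thread suggests is the case for e-Sword), analyzing several resources can be partly automated. The sketch below uses only Python's standard sqlite3 module; describe_schema and compare_schemas are hypothetical helper names, and nothing here assumes any particular table layout.

```python
import sqlite3


def describe_schema(path):
    """Return {table_name: [column names]} for one SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        schema = {}
        for table in tables:
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            cols = conn.execute("PRAGMA table_info(" + quoted + ")").fetchall()
            schema[table] = [col[1] for col in cols]
        return schema
    finally:
        conn.close()


def compare_schemas(paths):
    """Group files by identical schema, to see how consistent a module type is."""
    groups = {}
    for path in paths:
        schema = describe_schema(path)
        key = tuple(sorted((name, tuple(cols)) for name, cols in schema.items()))
        groups.setdefault(key, []).append(path)
    return groups
```

If compare_schemas returns a single group for half a dozen modules of one type, that is reasonable evidence the layout is stable; several groups would point to version differences worth checking by hand.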
1. Is there a set of tools for doing this?
Only if one wants to describe Perl or Python as a pre-existing set of tools.
2. If I'm going to do this, is there any tools/advice/information that can help me? And what pitfalls are out there?
Representatives from Olive Tree, Libronix, OakTree Software, and Laridian have told me on several different occasions that their tool chain for creating resources has to be fine-tuned for each specific resource. Sometimes the changes are minor. Sometimes the changes are major. Either way, automatic conversion results in errors in the target resource.
jonathon
#8
Posted 28 August 2012 - 01:13 AM
So this means the source material has to be in some particular format for this automation to work? I doubt that is possible. We are not in control of the source materials. They are gathered mostly from OCR text of works written long ago; some sources were in PDF or plain-text format. As they come from various authors and sources, we cannot control their format.
I know. But I have access to a collection of books, and they are all in MS Word (that's what the authors knew, so that's what they used).
I think your idea is still at a 10,000-foot view. Can you give low-level detail?
- What is your technical plan? What would be the particular format of the source? XML/text/RTF/...
- Have you already worked in this direction, i.e. on similar projects? Was any POC (proof of concept) done?
- What technologies would you like to use? Based on this we can see if any of our members already know them and can pitch in to help.
- Are you willing to make the tool freely available after development?
Yes, it is a high-level view; I have to start somewhere. But I am not ignorant of the low-level details.
- Technical plan? Convert the collection to something; I haven't yet decided what. It needs to be easily editable online by the authors (some sort of wiki-like thing) and easily processed by a computer program. I'm still a bit vague about this. The current documents have some repetition which I would also like to eliminate. Follow the DRY principle (Don't Repeat Yourself): "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."
- Have I worked in this direction already? Not theological markup specifically, but information manipulation in other contexts. My PhD in Computer Science was in the area of parsing computer languages - markup languages are pretty easy in comparison.
- Technologies? Whatever is handy for the job. Probably Python to stick it all together.
- Make the tool freely available after development? I'll go one step further: I'll make it freely available *before* development is finished. It would be free software (the FSF definition; beer and speech).
Carl.
#9
Posted 28 August 2012 - 01:38 AM
Do you want to preserve presentation markup, or semantic markup?
Semantic. Presentation without semantics would be a waste of time for my project. See previous comment in thread for some ideas.
You don't say what file format the data currently is in. That makes a major difference in how easy it will be to convert the content to USFM, OSIS, Z-XML, ThML, or other markup language.
MS Word files.
LaTeX could be an output type, on the way to generating a nice PDF.
Keep things simple.
Write one tool for each target file format.
Yes, I would.
The critical hinge is to get the source information in a form that is both easily editable by the non-techy authors (some sort of wiki maybe), yet rich enough in semantic information that it is possible to write tools for automatically generating modules in different formats.
For Biblical software file formats, the only reliable way of finding out how the modules are formatted is to analyze half a dozen or more resources of each module type.
Formal specifications for ISO/IEC 29500:2008 can be obtained from http://www.iso.org/i...csnumber=51463.
Formal specifications for ISO 32000-1:2008 can be obtained from
http://www.iso.org/i...?csnumber=51502
Formal specifications for ISO 26300 can be obtained from
https://lists.oasis-...1/msg00001.html
Thanks for the links.
1. Is there a set of tools for doing this? (I'm guessing "no" here. At least, I haven't found anything obvious)
Only if one wants to describe Perl or Python as a pre-existing set of tools.
I have over 10 years of Python experience.
Representatives from Olive Tree, Libronix, OakTree Software, and Laridian have told me on several different occasions that their tool chain for creating resources has to be fine-tuned for each specific resource. Sometimes the changes are minor. Sometimes the changes are major. Either way, automatic conversion results in errors in the target resource.
Thanks. I'm hoping that by putting the effort into the first step (converting the MS Word files to a semantically marked-up format), my 'tool chain' will only have to deal with one input resource type. I would only target non-proprietary output formats.
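One possible starting point for that first step, sketched with Python's standard library only. It assumes the files are .docx (Office Open XML) rather than legacy .doc, and that the authors applied paragraph styles consistently enough for style IDs to hint at semantics; both are assumptions, and paragraphs_with_styles is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_with_styles(docx_path):
    """Yield (style_id, text) for each paragraph in a .docx file.

    style_id is None when the paragraph has no explicit paragraph style.
    """
    with zipfile.ZipFile(docx_path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        style_id = style.get(W + "val") if style is not None else None
        # A paragraph's text is split across runs; join all w:t elements.
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        yield style_id, text
```

A mapping from style IDs (for example, a style the authors used for sidebars) to semantic elements would then be the part that needs human judgement for each collection.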
Thanks for your comments Jonathon.
Cheers,
Carl.
#10
Posted 28 August 2012 - 07:58 AM
Follow the DRY principle (Don't Repeat Yourself): "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system."
OSIS comes close, but I shudder at using it for content other than Bibles and commentaries.
Technologies? Whatever is handy for the job. Probably python to stick it all together
jonathon