Help needed with making modules of rare books


11 replies to this topic

#1 gad4

gad4

    e-Sword Supporter

  • Veterans
  • 34 posts

Posted 17 September 2019 - 01:23 PM

I have several digital books that would be very helpful as software modules: Bibles, dictionaries, and other books that are not available here.

Some may be appropriate for the site. Others would be for our private use.

If anyone can help make modules, please let me know.

#2 billhuff2002

billhuff2002

    Module Moderator

  • Moderators
  • 1,574 posts
  • LocationFrederick, MD, USA

Posted 17 September 2019 - 02:20 PM

Depending on the quality of your digital books, there are many of us here who can convert them to e-Sword. If you would like me to take a look at what you have, e-mail a couple of samples to billhuff2002@yahoo.com. I will let you know whether the quality is good enough for an easy conversion.



#3 djmarko53

djmarko53

    e-Sword Fanatic

  • Members (T)
  • 193 posts
  • Location64093

Posted 17 September 2019 - 03:27 PM

For certain PDF files, when you convert them to Word format, each page will often be an image and not actual text. I have had a lot of success converting these images to actual text. The only problem is that the formatting of the text is lost, so I end up having to reformat it. Many of the rare books I have worked on are this way, and it takes a long time to convert each page's image to a text file. Sometimes this is the only way to convert rare books that are in PDF format.

#4 gad4

gad4

    e-Sword Supporter

  • Veterans
  • 34 posts

Posted 17 September 2019 - 03:48 PM

The thoughts shared are correct and appreciated.

I have been able to convert some things to RTF easily. Other things have been challenging.

Books could be the easiest to convert to modules if the PDF becomes text easily. Otherwise it may require other options like OCR.

Bibles, dictionaries, and commentaries are likely more time-consuming because of the linking.

In the past I have been able to automate putting some formatting in with a Perl script. However, there needs to be a pattern for that to work, and one might lose other formatting in the text process.
I'm not sure how most people do it or who has an interest in that.
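
To illustrate the kind of pattern-based automation described above, here is a minimal sketch in Python rather than Perl. It is not the script mentioned in the post. The heading rule (a line written entirely in capitals becomes bold) and the names `HEADING`, `rtf_escape`, and `lines_to_rtf` are assumptions made up for the example, and it only handles plain ASCII text.

```python
import re

# Hypothetical pattern: a line made only of capitals, digits, spaces and
# basic punctuation is treated as a heading. Adjust to the book at hand.
HEADING = re.compile(r"^[A-Z][A-Z0-9 ,.'-]+$")


def rtf_escape(text):
    """Escape the three characters that are special in RTF."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def lines_to_rtf(lines):
    """Wrap plain-text lines in a minimal RTF document, bolding headings."""
    body = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        escaped = rtf_escape(line)
        if HEADING.match(line):
            body.append("{\\b " + escaped + "}\\par")
        else:
            body.append(escaped + "\\par")
    return "{\\rtf1\\ansi " + "\n".join(body) + "}"


sample = ["CHAPTER 1", "In the beginning was the text.", "", "CHAPTER 2"]
print(lines_to_rtf(sample))
```

As the post says, this only works when the text follows a consistent pattern. Anything the pattern does not catch, such as italics inside a paragraph, would still have to be restored by hand.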

Someone I shared some works with told me that they were able to OCR some things, but I have not seen the results. If those results would be helpful and we're able to accomplish something, then I'll try giving them a loving reminder.

Edited by gad4, 17 September 2019 - 04:07 PM.


#5 djmarko53

djmarko53

    e-Sword Fanatic

  • Members (T)
  • 193 posts
  • Location64093

Posted 18 September 2019 - 08:33 AM

There are some software programs that will perform OCR on PDF files/books. It would be nice if the Tooltip Editor for e-Sword had this feature.

#6 Tj Higgins

Tj Higgins

    e-Sword Fanatic

  • Members
  • 628 posts

Posted 18 September 2019 - 09:45 AM

With the PDF files, you can try a PDF to MS Word format conversion. If the conversion is successful, the MS Word file can then be saved in RTF format.



#7 djmarko53

djmarko53

    e-Sword Fanatic

  • Members (T)
  • 193 posts
  • Location64093

Posted 18 September 2019 - 11:59 AM

That won't work if, when the PDF file is converted to a Word file, each resulting page is an image and not text.

#8 djmarko53

djmarko53

    e-Sword Fanatic

  • Members (T)
  • 193 posts
  • Location64093

Posted 18 September 2019 - 12:05 PM

You would then have to convert each page individually into a separate text file. Unfortunately, when you do this, you will probably lose all the formatting, such as bold and/or italics, that is in your original PDF file...

#9 gad4

gad4

    e-Sword Supporter

  • Veterans
  • 34 posts

Posted 19 September 2019 - 08:25 AM

If the PDF has highlightable text and the RTF has only images, then maybe a different conversion could keep the text.

If it is a PDF of images, then that always requires OCR.

One of my documents needing help is an image PDF, which would require OCR. Another is text-based, and the English translates but the Hebrew doesn't.
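
A rough way to tell the two cases apart, sketched here under some assumptions: extract the text of each page with whatever tool is at hand, then check which pages came back essentially empty (likely image-only, so OCR is needed) and whether any Hebrew characters survived. The function names and the 20-character threshold are made up for illustration, and the extraction step itself is not shown.

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Return 1-based page numbers whose extracted text is nearly empty.

    page_texts is a list with one string per page, as produced by some
    PDF text extractor (not shown here). min_chars is an arbitrary cutoff.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len("".join(text.split())) < min_chars
    ]


def hebrew_ratio(text):
    """Fraction of non-space characters in the Hebrew Unicode block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hebrew = sum(1 for c in chars if "\u0590" <= c <= "\u05ff")
    return hebrew / len(chars)


pages = ["In the beginning God created the heaven and the earth.", "", " \n "]
print(pages_needing_ocr(pages))
print(hebrew_ratio("ab \u05d0\u05d1"))
```

If a page that should contain Hebrew reports a ratio of zero, the Hebrew was probably dropped or garbled during conversion.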

#10 djmarko53

djmarko53

    e-Sword Fanatic

  • Members (T)
  • 193 posts
  • Location64093

Posted 19 September 2019 - 01:54 PM

It doesn't always require OCR. With the right conversion utility/program, one can convert the image to text; you just need a converter that will convert to .docx format and preserve the text formatting at the same time. I have had some luck with that, but it takes a lot longer. Each image/page in Word usually has to be saved as a .jpg file and then run through a utility that will convert it to text. There are some free online apps that will do that, but still, if it's a 500-page book, converting 500 image files individually to text files will take a while. It can be done. I've done it, but I don't have the patience to do it for a very large book.
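
The page-by-page step could be batched instead of done by hand. The sketch below is one possible approach, not the utility used above: it assumes the pages are already saved as .jpg files in one folder and that the Tesseract command-line tool is installed and on the PATH. The names `tesseract_ocr` and `ocr_folder` are hypothetical.

```python
import subprocess
from pathlib import Path


def tesseract_ocr(image_path):
    """Run the Tesseract command-line tool on one image and return its text.

    Assumes the `tesseract` executable is installed and on PATH.
    """
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def ocr_folder(image_dir, out_dir, ocr=tesseract_ocr):
    """Convert every .jpg in image_dir to a .txt file of the same name."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(image_dir).glob("*.jpg")):
        target = out_dir / (image.stem + ".txt")
        if target.exists():
            continue  # already converted on an earlier run
        target.write_text(ocr(image), encoding="utf-8")
        written.append(target.name)
    return written
```

Calling `ocr_folder("pages", "text")` would then work through the whole book in one run, and rerunning it skips pages that are already done. Zero-padded file names such as page_001.jpg keep the pages in order. Bold and italics are still lost, so the reformatting work remains.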





