Pdf to doc. Losing Greek and Hebrew


6 replies to this topic

#1 gad4

Posted 24 October 2019 - 08:56 AM

I tried Microsoft Word's PDF-to-DOC conversion, Adobe's export to DOC, Adobe output to JPG followed by OCR to DOC, and OmniPage OCR from JPG to DOC. Each time the Greek and Hebrew text is lost. Is there something I need to add or do differently?

Thanks

Edited by gad4, 24 October 2019 - 08:58 AM.


#2 djmarko53

Posted 24 October 2019 - 09:03 AM

I bought the PDF Conversion Utility from Microsoft. It is available as an app from the Microsoft Store online. It works well on clean, legible PDF files, where the pages do not come out as images in the resulting .DOC or .DOCX file.

I have come across a good online converter that will take the images in a Word document and convert them to text, often preserving the Greek text.

 

djmarko53



#3 Katoog

Posted 25 October 2019 - 12:59 AM

It is better to first convert the PDF to HTM and save it in UTF-8 encoding (the Unicode encoding you need for Hebrew and Greek script).

Open the HTM file with Notepad and type:

<!DOCTYPE html>
<html class="client-nojs">
<head>
<meta charset="UTF-8"/>
</head>
<body>

Enter here the text under the body and "save as" in UTF-8 encoding.

βιβλος is Book in Greek.
</body>
</html>
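
For anyone who wants to confirm that a converted file really was saved as UTF-8, here is a minimal Python sketch. It is an illustration added to the thread, not part of Katoog's method, and the function name `is_utf8` is hypothetical. It only checks that the bytes decode cleanly as UTF-8; it cannot tell whether the Greek or Hebrew survived the conversion.

```python
from pathlib import Path


def is_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8.

    Hypothetical helper: a file saved as "ANSI" (a legacy code page)
    that contains Greek or Hebrew letters will normally fail this check.
    """
    data = Path(path).read_bytes()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A file containing only plain ASCII passes the check too, since ASCII is a subset of UTF-8, so a True result is necessary but not sufficient.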


Edited by Katoog, 25 October 2019 - 01:18 AM.

Restored Holy Bible 17 and the Restored Textus Receptus

https://rhb.altervis...rg/homepage.htm


#4 gad4

Posted 25 October 2019 - 12:11 PM

Thanks for the ideas.

Marco: I have tried different software and none of it worked. Before purchasing the software, did you run into the problem I am experiencing, and did this software solve it?

Katoog: I tried using Adobe Acrobat to output to HTML, but I did not see an option for UTF-8. I also did not see an HTML output option in OmniPage. I'm curious how the Notepad data should be included in the HTML output. Could you give a more step-by-step procedure and tell me which software you are using?


Edited by gad4, 25 October 2019 - 05:47 PM.


#5 Katoog

Posted 26 October 2019 - 02:46 AM

Normally, if you convert a PDF with Hebrew and Greek script to HTML with Adobe Acrobat, it uses UTF-8 by default (you can tell it worked if the Hebrew and Greek script displays correctly in a browser).

 

Take any .htm file and change the .htm in the file name to .txt, then open it with Notepad.

Alternatively, open or create a .txt file with Notepad.

Click the File menu and choose "Save As", or press Ctrl+Shift+S.

To the left of the Save button there is an "Encoding" menu.

If the menu shows ANSI or another UTF format, click it and change it to UTF-8.

Then click the Save button.

Your document is now saved in UTF-8.

Delete everything in it, or press Ctrl+A and then the Delete key on your keyboard.

Then copy this:

 

<!DOCTYPE html>
<html class="client-nojs">
<head>
<meta charset="UTF-8"/>
</head>
<body>

Enter here the text under the body and "save as" in UTF-8 encoding.

βιβλος is Book in Greek.
</body>
</html>

 

Save it again and change the .txt extension to .htm or .html.
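
The Notepad steps above can also be sketched as a short script, for readers who find that easier to follow. This is only an illustration under stated assumptions: the function name `write_utf8_html` and the paragraph handling are hypothetical, and the skeleton is the same one shown in the post. The important part is `encoding="utf-8"` when writing, which plays the role of the "Encoding" menu in Notepad's Save As dialog.

```python
import html
from pathlib import Path

# Same skeleton as in the post; {body} marks where the text goes.
TEMPLATE = """<!DOCTYPE html>
<html class="client-nojs">
<head>
<meta charset="UTF-8"/>
</head>
<body>
{body}
</body>
</html>
"""


def write_utf8_html(text, out_path):
    """Wrap plain text in the HTML skeleton and save it as UTF-8.

    Hypothetical helper: paragraphs are assumed to be separated by
    blank lines, and each one is placed in its own <p> element.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # html.escape keeps characters such as < and & from breaking the markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    Path(out_path).write_text(TEMPLATE.format(body=body), encoding="utf-8")
```

For example, `write_utf8_html("βιβλος is Book in Greek.", "test.htm")` should produce a file that a browser displays with the Greek intact. Like the Notepad method, this only helps if the text you paste in still contains the real Greek or Hebrew characters.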


Edited by Katoog, 26 October 2019 - 02:47 AM.



#6 gad4

Posted 26 October 2019 - 11:30 AM

I'm sorry, but I'm still not understanding.

#7 DSaw

Posted 28 October 2019 - 09:26 PM

gad4 said: "I tried MS Word PDF to DOC, Adobe export to DOC, Adobe output to JPG and then OCR to DOC, and OmniPage OCR JPG to DOC. Is there something I need to add or do differently? Thanks"

Is the PDF you are starting with image or text? If you are using OmniPage Pro, I assume it is image. OCR is not even close to perfect, and less so with Greek. The Hebrew never did work right, so I replaced it with Strong's numbers. What I had to do was proof the resulting text against the un-OCR'ed PDF, manually fix the English, and then manually enter the Greek.
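
As a rough aid to that kind of proofing, the sketch below counts how many Greek and Hebrew characters are present in a piece of converted text. It is an added illustration, not DSaw's procedure, and the function name `count_scripts` is hypothetical. A count of zero on a passage that should contain Greek suggests the script was lost; it does not replace checking the text against the original PDF.

```python
import unicodedata
from collections import Counter


def count_scripts(text):
    """Count Greek letters, Hebrew letters, and U+FFFD replacement marks.

    Hypothetical helper: characters are classified by their Unicode
    names, which begin with "GREEK" or "HEBREW" for those scripts.
    """
    counts = Counter()
    for ch in text:
        if ch == "\ufffd":
            # U+FFFD usually marks a character that failed to decode.
            counts["replacement"] += 1
            continue
        name = unicodedata.name(ch, "")
        if name.startswith("GREEK"):
            counts["greek"] += 1
        elif name.startswith("HEBREW"):
            counts["hebrew"] += 1
    return counts
```

Running it on the converted text page by page gives a quick list of pages where the Greek or Hebrew count is zero and manual entry is likely needed.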

 

DSaw


May God change our hearts to what the truth is

2Ti_2:15 Study to shew thyself approved unto God, a workman that needeth not to be ashamed, rightly dividing the word of truth.

Rom_9:16 So then it is not of him that willeth, nor of him that runneth, but of God that sheweth mercy.

2Ti 2:24-25  And the servant of the Lord must not strive; but be gentle unto all men, apt to teach, patient, In meekness instructing those that oppose themselves; if God peradventure will give them repentance to the acknowledging of the truth; 
 

 

 




