Best way to OCR a PDF
#1
Posted 18 November 2021 - 10:23 AM
Thanks
#2
Posted 18 November 2021 - 03:39 PM
Some documents have Greek or Hebrew characters, so I'm unsure what the best method is when either is included.
Thanks
So am I.
I guess that after scanning, you edit out what you don't want and keep the rest. It works fine that way for me.
Blessings,
#3
Posted 19 November 2021 - 03:59 PM
Thanks
#4
Posted 19 November 2021 - 05:19 PM
I appreciate the response. I was curious whether there is a known OCR system that keeps the Greek and/or Hebrew rather than editing it out.
Thanks
There are a number of OCR software packages available, such as PDF Wondershare, that should do what you want.
#5
Posted 20 November 2021 - 12:53 AM
Do you have a document you can share for testing?
I have ABBYY FineReader 15.
Plain Greek or polytonic Greek can be trained to work reasonably well.
Hebrew is more challenging if it has vowels. I have not got that working so well. Plain Hebrew works reasonably well with a good clear scan and some training and fixing.
#6
Posted 02 October 2023 - 05:40 PM
On Linux we use a CLI tool called tesseract-ocr, which supports several languages. For example, on a Debian system you will need one language package for old Hebrew and another for ancient Greek.
Tesseract is usually applied to an image file, but you can run it directly on a PDF file using the ocrmypdf tool, which uses the Tesseract engine and supports its features.
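To make the ocrmypdf suggestion a bit more concrete, here is a rough sketch of how the call could be scripted from Python. It is only an illustration under a few assumptions: the helper names build_ocrmypdf_command and run_ocr are made up for this example, the file names are placeholders, and it assumes ocrmypdf plus the Tesseract language data for "grc" (Ancient Greek) and "heb" (Hebrew) are already installed. On Debian those are probably the tesseract-ocr-grc and tesseract-ocr-heb packages, but check your own distribution.

```python
import shutil
import subprocess


def build_ocrmypdf_command(input_pdf, output_pdf, languages=("eng", "grc", "heb")):
    """Return the ocrmypdf argument list for the given files and languages.

    Hypothetical helper for illustration. The language codes are Tesseract's:
    "grc" = Ancient Greek, "heb" = Hebrew, "eng" = English.
    """
    if not languages:
        raise ValueError("at least one Tesseract language code is required")
    # ocrmypdf accepts several languages joined with "+", e.g. "eng+grc+heb".
    return ["ocrmypdf", "-l", "+".join(languages), input_pdf, output_pdf]


def run_ocr(input_pdf, output_pdf, languages=("eng", "grc", "heb")):
    """Run ocrmypdf on a PDF; raises if the tool is missing or exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf, languages), check=True)
```

Calling run_ocr("scan.pdf", "scan_ocr.pdf") would then be roughly the same as typing ocrmypdf -l eng+grc+heb scan.pdf scan_ocr.pdf in a terminal. How well the Greek and Hebrew come out will still depend on the scan quality, as mentioned earlier in the thread.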