Talking about scanning

Started by emilzalez, Jul 13 2014 07:39 PM

10 replies to this topic

#1 emilzalez

e-Sword Addict

Veterans
64 posts

Posted 13 July 2014 - 07:39 PM

I've downloaded some material from Archive and Google. I am interested in converting it to modules, but I've read that it is better to scan it before making modules from it. My questions: why is it necessary to scan something that is already a PDF? And how do I do it? Do I have to print it and scan it again?

Blessings.

#2 Josh Bond

Administrator

Administrators
2,891 posts

Location: Gallatin, TN

Posted 13 July 2014 - 09:13 PM

When someone scans a book with a scanner, this results in a PDF. Each scanned page of the PDF is a digital picture.

To the computer, each scanned page is a picture, similar to a picture you would take with your digital camera or phone. The computer doesn't "know" that the digital photos in the PDF are really snapshots of text.

The OCR (optical character recognition) process recognizes the text on each page, which results in digitized text. Digitized text can then be used by e-Sword: it can be tooltipped, reflowed to fit different line widths, searched, and so on.
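To make the "picture versus text" point concrete, here is a toy sketch in Python. It is not how Tesseract or FineReader work internally; it only assumes tiny hypothetical 3x3 glyph bitmaps ('#' = ink, '.' = paper), already separated by one blank column, and matches each glyph against stored templates by counting differing pixels. It also hints at why a single smudged pixel can turn an "n" into an "o".

```python
# Hypothetical 3x3 glyph templates, for illustration only.
TEMPLATES = {
    "n": ("###", "#.#", "#.#"),
    "u": ("#.#", "#.#", "###"),
    "o": ("###", "#.#", "###"),
    "l": (".#.", ".#.", ".#."),
}

def hamming(a, b):
    """Count pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def split_glyphs(page_rows, width=3, gap=1):
    """Cut a 'scanned' line into glyphs, assuming fixed width and a 1-column gap."""
    n_cols = len(page_rows[0])
    return [tuple(row[start:start + width] for row in page_rows)
            for start in range(0, n_cols, width + gap)]

def recognize(page_rows):
    """Turn a picture of text into a string by picking the closest template."""
    text = []
    for glyph in split_glyphs(page_rows):
        best = min(TEMPLATES, key=lambda ch: hamming(glyph, TEMPLATES[ch]))
        text.append(best)
    return "".join(text)

# A tiny "scan" of the word "noun": to the computer, just rows of pixels.
PAGE = [
    "###.###.#.#.###",
    "#.#.#.#.#.#.#.#",
    "#.#.###.###.#.#",
]

if __name__ == "__main__":
    print(recognize(PAGE))
```

Running it prints `noun`. If the bottom-middle pixel of an "n" is smudged with ink, the glyph becomes identical to the "o" template and is misread, which is the same kind of error a low-quality scan produces at a much larger scale.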

If you Google "OCR," you'll find various OCR software packages, like ABBYY FineReader. OCRing text can be very time-consuming, and that is the reason certain modules have never been made...

Josh

#3 emilzalez

e-Sword Addict

Veterans
64 posts

Posted 13 July 2014 - 10:17 PM

Josh, thank you for helping me see this. I have an idea of what happens in the process, but if I decide to give my time to it, what should I do with these PDF books from Archive? Because I think every scanner comes with OCR software, right?

#4 bjohns

e-Sword Supporter

Members (T)
30 posts

Posted 13 July 2014 - 10:23 PM

I would recommend that you try a freeware program called FreeOCR. I have version 5.02 on my system, and it works great with a scanner; you can also load a large PDF file into the program, and it will "scan" each page of the PDF via OCR. Naturally, it isn't perfect, but it sure is a helpful program. I use it regularly, and it might help you do what you want to do...

bjohns

coc.myfreewebhosting.net

#5 Josh Bond

Administrator

Administrators
2,891 posts

Location: Gallatin, TN

Posted 13 July 2014 - 10:23 PM

Yes, most scanners probably have some type of OCR software.

Let me give you 2 pointers:

1. Not all scans are equal. You can scan a book at different resolutions. Higher resolutions = higher quality. Higher quality means the OCR process recognizes the text with fewer errors. Archive.org scans are lower quality, because they are trying to save disk space and bandwidth.

2. Not all OCR software is equal, especially when trying to recognize older books. A couple of titles are great. A few are mediocre. The rest are junk.

Scanned text that you just printed from Microsoft Word? Sure, many titles can handle this. But older text, with all the inconsistencies of the typesetting world back then? I would only try ABBYY FineReader or OmniPage Pro if the project were at all large, especially on Archive.org scans.
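Point 1 about resolution can be put in numbers. A typographic point is 1/72 inch, so the number of pixels available to draw each character grows in direct proportion to the scan resolution (dpi). The Python sketch below assumes 10 pt type and uses the nominal type size; actual lowercase letters are smaller than that, so they get even fewer pixels.

```python
POINTS_PER_INCH = 72  # one typographic (desktop publishing) point is 1/72 inch

def type_height_px(point_size, dpi):
    """Pixels spanned by the nominal type size at a given scan resolution."""
    return point_size / POINTS_PER_INCH * dpi

if __name__ == "__main__":
    for dpi in (150, 300, 600):
        print(f"10 pt type at {dpi} dpi: about {type_height_px(10, dpi):.1f} px tall")
```

At 150 dpi a 10 pt line of type is only about 21 pixels tall, versus roughly 42 at 300 dpi. Doubling the resolution doubles the height and quadruples the pixel count of every glyph, which is why the OCR engine has so much more to work with on a high-resolution scan.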

#6 bjohns

e-Sword Supporter

Members (T)
30 posts

Posted 13 July 2014 - 10:33 PM

You may find the program FreeOCR at the following website:

http://www.freeocr.net/

It will load digital (.PDF) documents from Google, OCR each page, and insert the text into its built-in word processor, and I believe it will also paste the text directly into Microsoft Word.

bjohns

#7 BaptizedBeliever

Christian

Members (T)
924 posts

Posted 13 July 2014 - 10:46 PM

Having worked on several scan/PDF projects, I can say that, if possible, it's best to work with your own scan of something. That way, you control the quality of the PDF. For example, Google and Archive have some books that are 600+ pages and only 4.5 megabytes. That's a really low-quality scan, and when you run an OCR program on it, you will find tons of errors in the conversion. You'll likely find the word "arid" several times where it should be "and," and "n" and "u" are often mistaken for each other. And if the book has footnotes, you can plan on re-typing every one of them, because the OCR program won't have a clue what they are supposed to be.
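A quick back-of-the-envelope check of those numbers, as a Python sketch. It assumes exactly 600 pages, 1 MB = 1024 KB, and a US letter page (8.5 x 11 in) scanned in black and white (1 bit per pixel) at 300 dpi as a point of comparison. Real scanners compress black-and-white pages heavily (for example with CCITT Group 4), so the raw figure is only an upper bound, not what a good scan actually weighs.

```python
def kb_per_page(total_mb, pages):
    """Average compressed size per page, assuming 1 MB = 1024 KB."""
    return total_mb * 1024 / pages

def raw_bilevel_kb(width_in, height_in, dpi):
    """Uncompressed size of one black-and-white (1 bit per pixel) page image."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels / 8 / 1024  # 8 pixels per byte, 1024 bytes per KB

if __name__ == "__main__":
    avg = kb_per_page(4.5, 600)
    raw = raw_bilevel_kb(8.5, 11, 300)
    print(f"average per page in the download: {avg:.2f} KB")
    print(f"raw 1-bit letter-size page at 300 dpi: {raw:.0f} KB")
```

That works out to under 8 KB per page against roughly 1,000 KB of raw pixels for one modest 300 dpi page. Compression closes part of that gap, but a budget that small strongly suggests low resolution, aggressive lossy compression, or both.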

But if you can get a high-quality scan (which is easy to do if you're the one scanning), then most of those problems disappear, and you will have much less work to do in the proofreading process (which, honestly, is the part that takes the most time).
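Since proofreading is where the time goes, a small script can at least point the proofreader at likely misreads. This is a hypothetical Python sketch, not a feature of FineReader or FreeOCR: it flags the "arid"/"and" confusion mentioned above and any unknown word that becomes a known word when a single "n" and "u" are swapped. The tiny vocabulary is a stand-in; in practice you would load a real word list.

```python
import re

# Known OCR confusions taken from the examples above; a real project would grow this list.
SUSPECT_WORDS = {"arid": "and"}
CONFUSABLE = {"n": "u", "u": "n"}

def flag_suspects(text, vocabulary):
    """Return (offset, word, suggestion) tuples for a human proofreader to review.

    Nothing is replaced automatically: "arid" is a real word, so a person decides.
    """
    flags = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group()
        lower = word.lower()
        if lower in SUSPECT_WORDS:
            flags.append((match.start(), word, SUSPECT_WORDS[lower]))
            continue
        if lower in vocabulary:
            continue
        # Try swapping a single n/u to see whether that yields a known word.
        for i, ch in enumerate(lower):
            if ch in CONFUSABLE:
                candidate = lower[:i] + CONFUSABLE[ch] + lower[i + 1:]
                if candidate in vocabulary:
                    flags.append((match.start(), word, candidate))
                    break
    return flags

if __name__ == "__main__":
    vocab = {"jesus", "and", "his", "disciples", "the", "lord"}
    for offset, word, guess in flag_suspects("Jesus aud his disciples arid the Lord", vocab):
        print(f"offset {offset}: {word!r} -> {guess!r}?")
```

The script only narrows down where to look; the human still has to compare each flag against the page image, which is why a clean scan that produces fewer flags saves so much time.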

Have I used Archive and Google Books before for OCR projects? Yep. Some books they have are ones that I couldn't get hold of in print without a substantial financial investment. Others are just plain unavailable anywhere else. But many of them required entire pages to be re-typed by hand, because the scan was so bad.

I've never tried the OCR program that Bennie Johns mentions above. I'm quite thrilled with ABBYY FineReader. It's better than any other program I've used. But it isn't cheap.

-Brad

#8 bjohns

e-Sword Supporter

Members (T)
30 posts

Posted 13 July 2014 - 10:47 PM

About FreeOCR

FreeOCR is free optical character recognition software for Windows. It supports scanning from most TWAIN scanners and can also open most scanned PDFs and multi-page TIFF images, as well as popular image file formats. FreeOCR outputs plain text and can export directly to Microsoft Word format.

FreeOCR uses the latest Tesseract (v3.01) OCR engine. It includes a Windows installer, is very simple to use, and supports opening multi-page TIFF documents, Adobe PDF and fax documents, as well as most image types, including compressed TIFFs, which the Tesseract engine on its own cannot read. It can now scan using TWAIN and WIA scanning drivers.

FreeOCR V4 includes Tesseract V3 which increases accuracy and has page layout analysis so more accurate results can be achieved without using the zone selection tool.

Scanning Software

As well as OCR, FreeOCR can scan and save images as JPGs, and we are currently working on "Scan to PDF" capability with the option to save as searchable PDF.

OCR Engine

The included Tesseract OCR engine is an open-source product released by Google. It was developed at Hewlett-Packard Laboratories between 1985 and 1995. In 1995 it was one of the top three performers at the OCR accuracy contest organized by the University of Nevada, Las Vegas. The Tesseract engine source code is now maintained by Google, and the project can be found here: http://code.google.c.../tesseract-ocr/

License

FreeOCR is freeware OCR and scanning software, and you can do what you like with it, including commercial use. The included Tesseract OCR engine is distributed under the Apache 2.0 license.
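For anyone comfortable with a little scripting, the Tesseract engine that FreeOCR wraps can also be run directly from the command line as `tesseract <image> <outputbase> -l <lang>`, which writes `<outputbase>.txt`. The Python sketch below batches that over a folder of page images. It assumes the `tesseract` executable is installed and on the PATH, and that the PDF pages have already been exported as PNG or TIFF images, since Tesseract itself reads image files rather than PDFs. The function names are made up for this example.

```python
import shutil
import subprocess
import sys
from pathlib import Path

def build_tesseract_cmd(image_path, output_base, lang="eng"):
    """Command line for one page: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]

def ocr_folder(image_dir, out_dir, lang="eng"):
    """OCR every PNG/TIFF in image_dir and return the page texts in name order.

    Assumes zero-padded file names (page001.png, page002.png, ...) so that
    sorting by name matches page order.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    images = sorted(p for p in Path(image_dir).iterdir()
                    if p.suffix.lower() in {".png", ".tif", ".tiff"})
    texts = []
    for image in images:
        base = out_dir / image.stem
        subprocess.run(build_tesseract_cmd(image, base, lang), check=True)
        # Tesseract appends ".txt" to the output base name.
        texts.append(Path(f"{base}.txt").read_text(encoding="utf-8"))
    return texts

if __name__ == "__main__":
    if len(sys.argv) == 3:
        pages = ocr_folder(sys.argv[1], sys.argv[2])
        print(f"OCRed {len(pages)} pages")
    else:
        print("usage: python ocr_pages.py <image_dir> <out_dir>")
```

The output is still raw OCR text, so everything said above about proofreading applies; the script only saves the clicking.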

#9 bjohns

e-Sword Supporter

Members (T)
30 posts

Posted 13 July 2014 - 10:52 PM

Bradley,

I am not Bennie Johns, but believe it or not, my dad, who passed away in January, was related to Bennie. His name was Clennie Johns, some kind of play on words. My dad, like Bennie, was a preacher in the Church of Christ for about 50 years...

bjohns (Mark Johns)

Edited by bjohns, 13 July 2014 - 10:58 PM.

#10 BaptizedBeliever

Christian

Members (T)
924 posts

Posted 13 July 2014 - 11:46 PM

Sorry, I assumed that BJohns and the talk about scanners could only have been Bennie. Nice to meet you!
