Having worked on several scan/PDF projects, I can say that if at all possible, it's best to work with your own scan. That way, you control the quality of the PDF. For example, Google and Archive have some books that are 600+ pages and only 4.5 megabytes, which works out to under 8 KB per page. That's a really low-quality scan, and when you run an OCR program on it, you will find tons of errors in the conversion. You'll likely find the word "arid" several times where it should be "and." The letters "n" and "u" are often mistaken for each other. And if the book has footnotes, you can plan ahead to re-type every one of them, because the OCR program won't have a clue what those are supposed to be.
But if you can get a high-quality scan (which is easy to do if you're the one scanning), then most of those problems disappear, and you will have much less work to do in the proofreading process (which, honestly, is the part that takes the most time).
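Since proofreading is where the time goes, a small script can at least point you at the likeliest trouble spots first. Here is a rough Python sketch of that idea, not a finished tool. The watch list (`SUSPECTS`), the function names, and the `known_words` set are all my own made-up names for illustration; you would supply your own word list, and real "arid" (as in dry land) will get flagged too, so treat every hit as something to check by eye.

```python
import re

# Hypothetical watch list: a word OCR tends to produce by mistake -> the likely intended word.
SUSPECTS = {"arid": "and"}


def flag_suspects(text, suspects=SUSPECTS):
    """Return (line_number, found_word, likely_word) for each watch-list hit."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            likely = suspects.get(word.lower())
            if likely is not None:
                hits.append((line_no, word, likely))
    return hits


def swap_n_u(word):
    """Return every variant of word with exactly one 'n'/'u' swapped."""
    swap = {"n": "u", "u": "n"}
    return [word[:i] + swap[c] + word[i + 1:] for i, c in enumerate(word) if c in swap]


def flag_n_u(text, known_words):
    """Flag unknown words that become known after a single n/u swap.

    known_words is a caller-supplied set of lowercase words (an assumption:
    you bring your own dictionary; none is bundled here).
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in re.findall(r"[A-Za-z]+", line):
            lower = word.lower()
            if lower in known_words:
                continue
            for variant in swap_n_u(lower):
                if variant in known_words:
                    hits.append((line_no, word, variant))
                    break
    return hits


if __name__ == "__main__":
    sample = "The land was arid arid dry.\nHe weut home aud slept."
    words = {"the", "land", "was", "arid", "dry", "he", "went", "home", "and", "slept"}
    print(flag_suspects(sample))
    print(flag_n_u(sample, words))
```

This only lists candidates with their line numbers; it deliberately doesn't auto-correct anything, because a blind search-and-replace would mangle the places where the "wrong" word is actually right.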
Have I used Archive and Google Books before for OCR projects? Yep. Some of the books they have are ones that I couldn't get hold of in print without a substantial financial investment. Others are just plain unavailable anywhere else. But many of them have needed entire pages re-typed by hand, because the scan was so bad.
I've never tried the OCR program that Bennie Johns mentions above. I'm quite thrilled with ABBYY FineReader. It's better than any other program I've used. But it isn't cheap.
-Brad