The most time-consuming part of module making is cleaning the text so that the format is consistent, professional, and as typo-free as possible.
People always seem to be surprised when I say that it takes at least 40 to 60 hours to create a Bible resource (Protestant Canon) properly, and 60 to 80 hours for a Bible resource that includes the Anagignoskomena.
My rule of thumb is at least thirty minutes per page to clean up the text. This includes, but is not limited to, proofreading, presentation markup, and semantic markup. In some instances, it can take as much as six hours per page.
This rule of thumb applies to everything from Project Gutenberg texts to Google Books PDFs, Archive.org EPUBs, and photographs of the manuscripts of the Bible.
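As a rough illustration of that rule of thumb, here is a minimal Python sketch that turns a page count into a cleanup estimate. The function name, the parameter names, and the 100-page example are all hypothetical; only the thirty-minutes and six-hours per-page figures come from the text above.

```python
def estimate_cleanup_hours(pages, minutes_per_page=30):
    """Estimate cleanup time in hours from a page count.

    The default of 30 minutes per page is the rule-of-thumb minimum;
    pass 360 for the six-hours-per-page worst case.
    """
    if pages < 0 or minutes_per_page < 0:
        raise ValueError("pages and minutes_per_page must be non-negative")
    return pages * minutes_per_page / 60


# Hypothetical 100-page source text.
typical = estimate_cleanup_hours(100)                      # 50.0 hours
worst = estimate_cleanup_hours(100, minutes_per_page=360)  # 600.0 hours
print(f"typical: {typical} h, worst case: {worst} h")
```

Since thirty minutes is the minimum, the first number is best read as a floor rather than a forecast.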
We have a guy working hard now to clean up "pump and dump" modules of the past.
My big objection to releasing "pump and dump" resources is that they are not marked as such and, except on very rare occasions, nobody cleans up the resource.
I can see releasing a quick "pump and dump" for the early birds who want the content, and then releasing a cleaned-up version. However, that requires the resource creator to be committed to the project, and willing to have "bad content" with their name on it still being distributed after they release the cleaned-up version.
I'd hate to make more messes for our guys to clean up.
I've talked to several people whose profession is to scan, OCR, and create clean texts. Almost all of them have said that it is faster, easier, and cheaper for the client to throw away the new digital content and restart from the original edata, regardless of how bad that edata is. One of the consultants went as far as saying that, for some types of data, retyping everything using a modern word processor that outputs to an ISO standard file format is faster, simpler, and produces more consistent results than OCRing the text and fixing it "later". (Key phrase: ISO standard file format.)