Best OCR for Technical Texts?
An anonymous reader asks: "I'm scanning in user manuals for older lab equipment. I've never used OCR before today, so I installed the Caere Omnipage 9.0 that came with the scanner. I was pretty happy except for a few things. It doesn't seem to want to recognize engineering symbols like the single-character +/- sign, square root, or omega, or simple equations; it has trouble with super- and subscripts; and it outputs funky Word files. For example, from an 8.5 x 11 inch original page scanned in at 1 bit at 300 dpi, the output Word file was 10 inches wide, used tons of Omnipage text styles, and didn't match the original text's flow. It did do a good job of italicizing headers and recognizing the various sections in a two-column page. Googling the news and net just backs up my claims but provides no real solution. Here is a Google search for the best OCR for engineering that provides nothing useful."
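For a sense of scale, the scan settings in the question can be turned into pixel dimensions and raw image sizes with a little arithmetic. This is only a rough sketch: the function names are made up for illustration, and the sizes ignore row padding, file headers, and compression.

```python
def scan_dimensions(width_in, height_in, dpi):
    """Pixel dimensions of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def raw_size_bytes(width_px, height_px, bits_per_pixel):
    """Uncompressed size in bytes, ignoring row padding and headers."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to a whole byte


if __name__ == "__main__":
    # US Letter page at 300 dpi, as described in the question.
    width_px, height_px = scan_dimensions(8.5, 11, 300)
    print("pixels:", width_px, "x", height_px)
    # 1-bit (black and white) versus 8-bit greyscale.
    for bits in (1, 8):
        print(bits, "bit:", raw_size_bytes(width_px, height_px, bits), "bytes")
```

An 8-bit greyscale scan of the same page carries eight times the raw data of a 1-bit scan, which is extra information an OCR engine can use on thin strokes such as subscripts.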
Comment removed (Score:5, Informative)
Re:Clara OCR (Score:3, Informative)
Good Luck! (Score:4, Interesting)
Good luck!
I've used a few different versions of Omnipage PRO, and it works OK if the layout is not complicated, it uses standard fonts, the text is clean and clear, and it doesn't have too many weird logos or symbols. You still have to proofread everything and correct it by hand, though, so I'm not convinced it's a time saver as much as it is a typing saver.
OmniPage Pro does do a MUCH better job of identifying words than the free version they throw in with scanners because it uses spelling and grammar checkers to help ID words from context. The free version is as close to useless as you can get in the software world - it's really just an ad for Pro.
Engineering and math symbols are right out.
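To illustrate the general idea of using a spelling dictionary to clean up OCR output, here is a minimal sketch built on Python's standard `difflib` module. It is not how OmniPage Pro works internally (that is not known from this thread); the tiny `VOCABULARY` list, the `correct_word` helper, and the 0.8 cutoff are all assumptions chosen for the example, and it does plain word lookup with no grammar or context.

```python
import difflib

# Hypothetical toy vocabulary; a real system would load a full dictionary.
VOCABULARY = ["calibration", "voltage", "resistor", "oscilloscope", "manual"]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the input if nothing is close."""
    lowered = word.lower()
    if lowered in vocabulary:
        return lowered
    # get_close_matches ranks candidates by similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    # Typical OCR confusions: digit 1 for letter l, letter l for letter i.
    for raw in ["vo1tage", "calibratlon", "xyzzy"]:
        print(raw, "->", correct_word(raw))
```

A lookup like this helps with ordinary words but does nothing for symbols or equations, which matches the experience described above.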
Try Different (tm) (Score:2, Informative)
Finereader (Score:3, Informative)
Use Greyscale (Score:5, Informative)
Since Omnipage is up to version 12, perhaps there's been an improvement since your version.
Your Google skills are sorely lacking; the "Hacking Google" book would be a good investment for you. Eliminating the quotes and the word "best" from your search string would help.
Two different free web-based OCR services; just upload a 300 dpi b/w (8-bit greyscale) file:
http://www.expervision.com/webtr6.htm
http
Here are some OCR programs:
http://www.scansoft.com/omnipage/
http://www.abbyy.com/
http://www.newsoftinc.com/redir/digitaloffice_a
More OCR links than you really want:
http://web3.humboldt1.com/~jiva/ocr/_ocr_re
Re:Use Greyscale: With links (Score:2, Informative)
www.expervision.com/webtr6.htm [expervision.com]
http://docmorph.nlm.nih.gov/docmorph/ [nih.gov]
Here are some OCR programs:
http://www.scansoft.com/omnipage/ [scansoft.com]
http://www.abbyy.com/ [abbyy.com]
More OCR links than you really want: http://web3.humboldt1.com/~jiva/ocr/_ocr_resource.htm [humboldt1.com]
Abby (Score:1)
Hacking Google on the Cheap (Score:2, Offtopic)
I don't think you need to read a book to understand that too many keywords eliminate all useful results. Also, the Yahoo engine is not quite the same as the Google engine, even though it's licensed from Google. Which is why it didn't catch the fact that "superscipts" is not the correct spelling!
I got a lot of interesting results Goog
Re:Use Greyscale (Score:3, Insightful)
No joke! The link in the post doesn't even connect to Google - it's a Yahoo link.
ICR, Google, etc (Score:3, Informative)
Better [google.com] Google [google.com] searching [google.com] makes [google.com] the difference [neurascript.com].
Re:ICR, Google, etc (Score:2)
Good all'round scanner? (Score:1)
Can it also take a stack of 4x5 photos?
The Best! (Score:4, Funny)
better OCR... (Score:1)
Re:better OCR... (Score:1)
I HIGHLY recommend TypeReader!
Paul
OCR Software (Score:1)
Keep as image (Score:1)
Easy to access and read. The only loss is you can't do cut and paste or text searching.
Re:Keep as image (Score:2)
do not store in OCR'ed format (Score:2)
Then, use something like Adobe Acrobat to put them on-line: Acrobat uses OCR internally to make the text searchable, but it still displays the original page image. That means that formulas and appearance will be preserved even if the OCR screws up.
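The "image on top, OCR text underneath" idea can be sketched as a small data structure. This is only an illustration of the concept, not Acrobat's actual file format or API; `ScannedPage`, `find_pages`, and the sample file names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScannedPage:
    number: int
    image_path: str  # the original scan, which is what the reader sees
    ocr_text: str    # hidden OCR output, used only for searching


def find_pages(pages, query):
    """Return the pages whose hidden OCR text contains the query."""
    needle = query.lower()
    return [page for page in pages if needle in page.ocr_text.lower()]


if __name__ == "__main__":
    pages = [
        ScannedPage(1, "manual_p1.tif", "Safety precautions and warranty"),
        ScannedPage(2, "manual_p2.tif", "Calibration: adjust R12 for +/- 5 V"),
    ]
    for hit in find_pages(pages, "calibration"):
        # Display the untouched image, so a garbled formula in the OCR
        # text never changes what the reader actually sees.
        print("page", hit.number, "->", hit.image_path)
```

Because the OCR text is only used as an index, a misread symbol costs you a search hit at worst, not a wrong value on the displayed page.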
Primal Instincts (Score:1)
http://www.totalsol.com/products/doc_process/primerecognition/product_prime.html
Re:Primal Instincts (Score:1)
Gamera? (Score:1)