Simple Document Imaging for Unix?
andylievertz asks: "I have developed a logical system of directories for storing my digital documents (e.g. *.doc, *.mp3, *.gif), and I can usually find even an obscure document quickly. My 'must-keep' hardcopies include everything from bills and shipping invoices to brochures and Chinese-food menus. I've tried applying my electronic filing techniques to an actual, real-world filing cabinet, complete with folders and labels, but such a system requires a great deal of effort to maintain relative to the electronic one, especially given the frequent influx of new hardcopy material, and it doesn't address the greater issue of reducing the sheer bulk of paper, organized or not. What solutions have you, the Slashdot reader, employed to solve this problem for yourself? Are there viable Unix-based document imaging packages, similar in function to the Microsoft Document Imaging utility packaged with Office? Do you use a Unix-based document imaging solution personally or professionally? If so, what package, and why does it work for you?"
"So, step one is to find ways to reduce the influx of hardcopy (e.g. electronic billing), but for me, the second step is to find and use a [Unix-based!] system that will let me scan and file hardcopies electronically so they can be indexed, searched, reorganized, shared, and retrieved as easily as their electronic counterparts. Naturally, any such system would need to handle multi-page documents and would need to store its output in a non-proprietary file format."
Why bother? (Score:2, Informative)
I don't really need a "system" for that... just make your "root" folders explicit enough, then file everything where it should go.
I even have a "temp" dir for every category.
I don't really see the need for such a tool, IF you can spare a few seconds to browse and dump...
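The browse-and-dump habit described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the commenter's actual setup: the archive layout `<root>/<category>/temp/` and the function name `file_document` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def file_document(archive_root: Path, category: str, doc: Path,
                  subfolder: Optional[str] = None) -> Path:
    """Move `doc` into <archive_root>/<category>/<subfolder>/.

    When no subfolder is given, the document goes into the category's
    "temp" dir, to be sorted later (this layout is hypothetical).
    """
    target_dir = archive_root / category / (subfolder or "temp")
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / doc.name
    if target.exists():
        # Never silently overwrite an already-filed document.
        raise FileExistsError(str(target))
    shutil.move(str(doc), str(target))
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan = root / "inbox" / "takeout-menu.pdf"
        scan.parent.mkdir()
        scan.write_bytes(b"%PDF-1.4\n")
        # Category known, exact folder not decided yet: lands in menus/temp/
        print(file_document(root / "archive", "menus", scan))
```

Dumping into `temp` first keeps the filing step to a few seconds; a document can be moved to its final place later by calling `file_document` again on the `temp` copy with a `subfolder`.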
I use... (Score:5, Informative)
It's insanely good. I use it to scan in all my important documents. It has useful multipage modes for... well, multipage documents.
Try it. It has been considerably revamped since I installed it, so I will have to try a more recent version.
Oh, and it comes as a nice Debian package via apt-get.
HP Digital Sender and htDig (Score:5, Informative)
I use an HP Digital Sender and htDig to solve all my document storage problems.
The Digital Sender is a wonderful toy. Stick a stack of paper in the bin, enter an email address, press the big green button, and a PDF shows up in my mailbox a few minutes later. It even does double-sided scanning. It's a very simple device, and it does most of what I need.
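Once the scans arrive by email, the PDFs still have to get from the mailbox into the archive. A minimal Python sketch using only the standard library `email` package, assuming the raw message is available as bytes (for example from a saved `.eml` file) and that the PDF arrives as an ordinary MIME attachment of type `application/pdf`; the comment does not say exactly how the device packages its mail, and the function name `save_pdf_attachments` is hypothetical.

```python
from email import policy
from email.parser import BytesParser
from pathlib import Path


def save_pdf_attachments(raw_message: bytes, out_dir: Path) -> list:
    """Write every application/pdf attachment in an email to out_dir.

    Returns the list of paths written. Assumes the scanner sends the PDF
    as an ordinary MIME attachment (an assumption, not documented here).
    """
    # policy.default yields an EmailMessage, which provides iter_attachments().
    msg = BytesParser(policy=policy.default).parsebytes(raw_message)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for part in msg.iter_attachments():
        if part.get_content_type() != "application/pdf":
            continue
        # Keep only the base name, dropping any directory components
        # from the sender-supplied filename.
        name = Path(part.get_filename() or f"scan-{len(saved) + 1}.pdf").name
        target = out_dir / name
        target.write_bytes(part.get_content())  # decoded bytes of the PDF
        saved.append(target)
    return saved
```

A mail filter or a periodic job could feed each new message to this function and point `out_dir` at the category's `temp` folder.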
It doesn't do OCR: the Digital Sender outputs a bitmapped PDF that looks very good. I usually use the full version of Adobe Acrobat to do optical character recognition and store the recognized text as a hidden layer behind the page image. That way I still see the good scan on screen and when I print, but I can copy and search the text as I normally would.
I use htDig (http://www.htdig.org/) to index my archive. I store content in file folders that make sense (2002 taxes, pitch perception papers, etc.), but I still find htDig useful. It indexes both HTML (my lab notebook) and PDF files. All is good.
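htDig itself is a full search engine; the core idea behind any such indexer is an inverted index that maps each word to the files containing it. The following toy Python sketch shows that idea for the HTML lab notebook only. It is not htDig's implementation or configuration, it does not read PDFs (that would need an external text extractor), and the names `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser
from pathlib import Path

WORD = re.compile(r"[a-z0-9]+")


class _TextExtractor(HTMLParser):
    """Collects the visible text of an HTML page, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def build_index(root: Path) -> dict:
    """Map each lowercase word to the set of .html files under root using it."""
    index = defaultdict(set)
    for page in root.rglob("*.html"):
        parser = _TextExtractor()
        parser.feed(page.read_text(encoding="utf-8", errors="replace"))
        parser.close()
        text = " ".join(parser.chunks).lower()
        for word in set(WORD.findall(text)):
            index[word].add(page)
    return index


def search(index: dict, query: str) -> set:
    """Return the files containing every word of the query (AND semantics)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

Building the index once and answering queries from it is what makes search fast even when the folders themselves are already well organized.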
PDF is a well-documented file format. I wish there were a good free OCR package, but sometimes you have to pay for good performance. htDig and PDF work great on both Windows and Linux.
In three years I have accumulated just over 1 GB of content: all my lab notes (in HTML) and all the papers I've read (in PDF). It's wonderful having my entire paper life with me on my laptop. (I also back it up to three different machines.)
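Keeping several backup copies only helps if they actually match the laptop. One way to check, sketched here in Python under the assumption that each backup is reachable as a local directory (for example over a network mount), is to compare SHA-256 manifests of the two trees. `manifest` and `diff` are hypothetical helper names, not part of any tool mentioned in the thread.

```python
import hashlib
from pathlib import Path


def manifest(root: Path) -> dict:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as fh:
                # Hash in 1 MiB blocks so large PDFs don't load fully into memory.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    h.update(block)
            result[path.relative_to(root).as_posix()] = h.hexdigest()
    return result


def diff(primary: dict, backup: dict) -> dict:
    """List files missing from, changed in, or extra in the backup."""
    return {
        "missing": sorted(primary.keys() - backup.keys()),
        "changed": sorted(k for k in primary.keys() & backup.keys()
                          if primary[k] != backup[k]),
        "extra": sorted(backup.keys() - primary.keys()),
    }
```

Running `diff(manifest(laptop_dir), manifest(backup_dir))` against each of the backup locations would show at a glance which copy has fallen behind.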