Ask Slashdot: Open Source For Bill and Document Management?

Posted by timothy on Sunday April 07, 2013 @03:40PM from the seasonally-appropriate dept.

Rinisari writes "Since striking out on my own nearly a decade ago, I've been collecting bills and important documents in a briefcase and a small filing box. Since buying a house more than a year ago, the amount of paper I receive and need to keep has grown to a deluge and is overflowing the space I want to dedicate to it. I would like to scan everything and keep the paper originals only for things that actually require them. I'd archive the scans on my heavily backed-up NAS. What free and/or open source software is out there that can handle this kind of document management? Being able to scan to PDF and associate a date and a set of labels with a document would be great, as would other metadata such as the bill amount. My target OS is OS X, but Linux and Windows would be OK."
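To make the requested metadata (a date, a set of labels, a bill amount) concrete, here is a minimal Python sketch of one possible per-document record, stored as a JSON "sidecar" file next to each PDF on the NAS. The `DocumentRecord` schema, the sidecar naming and all helper functions are hypothetical illustrations, not features of any package mentioned in this thread.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from decimal import Decimal
from pathlib import Path
from typing import List, Optional


@dataclass
class DocumentRecord:
    """Hypothetical metadata schema for one scanned document."""
    pdf_name: str                     # file name of the scan on the NAS
    doc_date: date                    # date printed on the bill or document
    labels: List[str] = field(default_factory=list)  # free-form tags
    amount: Optional[Decimal] = None  # bill amount; Decimal avoids float rounding

    def to_json(self) -> str:
        data = asdict(self)
        data["doc_date"] = self.doc_date.isoformat()  # JSON has no date type
        data["amount"] = None if self.amount is None else str(self.amount)
        return json.dumps(data, sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "DocumentRecord":
        data = json.loads(text)
        return cls(
            pdf_name=data["pdf_name"],
            doc_date=date.fromisoformat(data["doc_date"]),
            labels=list(data["labels"]),
            amount=None if data["amount"] is None else Decimal(data["amount"]),
        )


def sidecar_path(pdf: Path) -> Path:
    """Keep metadata next to the scan: bill.pdf -> bill.json."""
    return pdf.with_suffix(".json")


def save_record(pdf: Path, record: DocumentRecord) -> None:
    """Write the record's JSON into the sidecar file for the given PDF."""
    sidecar_path(pdf).write_text(record.to_json(), encoding="utf-8")


def find_by_label(records: List[DocumentRecord], label: str) -> List[DocumentRecord]:
    """Return every record tagged with the given label."""
    return [r for r in records if label in r.labels]


def total_amount(records: List[DocumentRecord], label: str) -> Decimal:
    """Sum the bill amounts of all records carrying the given label."""
    return sum(
        (r.amount for r in find_by_label(records, label) if r.amount is not None),
        Decimal("0"),
    )
```

Keeping the metadata in plain JSON beside each PDF means it travels with the NAS backups and could later be imported into a dedicated document management system.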

doxbox.ca, formerly known as Owl (Score:0, Informative)

by Anonymous Coward writes: on Sunday April 07, 2013 @03:49PM (#43385573)

Subject says it all

OpenKM (Score:3, Informative)

by Anonymous Coward writes: on Sunday April 07, 2013 @03:50PM (#43385585)

OpenKM (http://www.openkm.com/en/) is what I use to manage my documents; its tagging and document preview features are what I appreciate most. It runs as a web service, FYI.

Alfresco (Score:3, Informative)

by Balr0g ( 960255 ) writes: on Sunday April 07, 2013 @04:39PM (#43385829)

I use the community edition of Alfresco [alfresco.com] for that task. You can tag all documents, add custom fields, and get full-text search and versioning out of the box. Documents can be accessed via a web interface, SMB, FTP and even IMAP.

Re:I was in the same boat (Score:5, Informative)

by tomtomtom ( 580791 ) writes: on Sunday April 07, 2013 @08:11PM (#43387145)

I ended up with gscan2pdf and a rigid directory and filename structure. It works, but yeah, tags would be nice.
gscan2pdf is OK, but if you want to do this seriously then you'll probably want a reasonably fast sheet-fed scanner (I got a Fujitsu ScanSnap S1500, which is supported by SANE and can scan 18-20 pages, or 36-40 sides, per minute) with a button, so that you can work through a whole stack of paper quickly with minimal keyboard/mouse interaction to slow you down. This led me to set up scanbuttond (which only recently gained official support for the ScanSnap, though a patch had been floating around for a while before that) with a custom script.
Make sure you OCR your documents so they're searchable, then run an indexer (I like recoll [recoll.org], but KDE and GNOME both have their own desktop search solutions as well). The best OCR engine I've found on Linux is tesseract [google.com], but there are a couple of others you can try.

The process took me a while to get right and is a bit painful. The script that scanbuttond runs calls scanadf, which scans each side to its own image file and puts the files in a processing directory. Once I'm done with a pile of papers, I run a second batch-processing script while I go and get a cup of tea: it runs unpaper and then tesseract on each image, uses hocr2pdf to convert each page individually into a searchable PDF file, and finally uses pdftk to concatenate all the pages into a single scanned document. I split the process into two parts because the OCR step can take some time, and this way I get maximum throughput on the scanner itself without waiting for the rest to catch up. If I could be bothered, I could have the scanning script start the batch-processing script once and let it pick up new files as they are dropped into the directory, but running it by hand isn't much effort really.
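The two-stage pipeline described above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the poster's actual script: it assumes scanned pages land as `*.pnm` files in an inbox directory, that unpaper, tesseract, hocr2pdf (from ExactImage) and pdftk are on the `PATH`, and that tesseract writes `<base>.hocr` when given the `hocr` config (some older releases wrote `<base>.html` instead). Function names such as `page_steps` and `pending_pages` are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# One pipeline step: (argv, optional file to feed on stdin).
Step = Tuple[List[str], Optional[Path]]


def page_steps(image: Path) -> List[Step]:
    """Commands that turn one scanned page image into a searchable PDF page."""
    cleaned = image.with_name(image.stem + "-clean.pnm")
    hocr_base = image.with_name(image.stem + "-ocr")
    # Assumption: this tesseract version writes <base>.hocr for the 'hocr' config.
    hocr_file = hocr_base.with_name(hocr_base.name + ".hocr")
    page_pdf = image.with_suffix(".pdf")
    return [
        (["unpaper", str(image), str(cleaned)], None),                # clean up the scan
        (["tesseract", str(cleaned), str(hocr_base), "hocr"], None),  # OCR to hOCR
        # hocr2pdf reads the hOCR text layer from stdin.
        (["hocr2pdf", "-i", str(cleaned), "-o", str(page_pdf)], hocr_file),
    ]


def merge_step(page_pdfs: List[Path], output: Path) -> Step:
    """pdftk concatenates the per-page PDFs in the order given."""
    return (["pdftk", *map(str, page_pdfs), "cat", "output", str(output)], None)


def pending_pages(inbox: Path) -> List[Path]:
    """Raw scans that have no per-page PDF yet, in filename (scan) order."""
    return sorted(
        p for p in inbox.glob("*.pnm")
        if not p.stem.endswith("-clean") and not p.with_suffix(".pdf").exists()
    )


def run(step: Step) -> None:
    """Execute one step, stopping the batch if the tool reports an error."""
    argv, stdin_path = step
    if stdin_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdin_path, "rb") as fh:
            subprocess.run(argv, stdin=fh, check=True)


def process_inbox(inbox: Path, output: Path) -> None:
    """Second stage: OCR every pending page, then merge them into one PDF."""
    pages = pending_pages(inbox)
    if not pages:
        return
    for image in pages:
        for step in page_steps(image):
            run(step)
    run(merge_step([p.with_suffix(".pdf") for p in pages], output))
```

Building the argument lists separately from running them makes the commands easy to inspect before anything touches the scans; the scanner stage only has to drop images into the inbox, so it never waits on OCR.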
Once they've been OCRed, I sort the PDFs into a hierarchical directory structure (at which point they also get indexed for searching).
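A rigid hierarchy like the ones described in this thread can be expressed as a small filing helper. The layout below (`<root>/<year>/<category>/<YYYY-MM-DD>_<title>.pdf`) and the function names are hypothetical, chosen only to show how a fixed naming scheme lets you recover a document's date from its filename without a separate database.

```python
import shutil
from datetime import date
from pathlib import Path
from typing import Tuple


def filed_path(root: Path, doc_date: date, category: str, title: str) -> Path:
    """Hypothetical layout: <root>/<year>/<category>/<YYYY-MM-DD>_<title>.pdf"""
    slug = "-".join(title.lower().split())  # "Electric Bill" -> "electric-bill"
    return root / f"{doc_date.year:04d}" / category / f"{doc_date.isoformat()}_{slug}.pdf"


def parse_filed_name(path: Path) -> Tuple[date, str]:
    """Recover (date, title slug) from a name produced by filed_path."""
    day, _, slug = path.stem.partition("_")
    return date.fromisoformat(day), slug


def file_document(pdf: Path, root: Path, doc_date: date, category: str, title: str) -> Path:
    """Move an OCRed PDF into the archive tree and return its new location."""
    dest = filed_path(root, doc_date, category, title)
    dest.parent.mkdir(parents=True, exist_ok=True)
    if dest.exists():
        raise FileExistsError(dest)  # never silently overwrite an archived scan
    shutil.move(str(pdf), str(dest))
    return dest
```

Because the date leads the filename, a plain directory listing within each category is already in chronological order.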
If you're on Windows or Mac, the software that comes with the ScanSnap will do pretty much all of this for you, although it's still better to scan with OCR disabled and then use Acrobat to batch-OCR the PDFs later, for the same throughput reason. Add a decent desktop search tool such as an old version of Copernic (or possibly Windows Search) and all is good.
