Data Storage

Ask Slashdot: Open Source For Bill and Document Management?

Rinisari writes "Since striking out on my own nearly a decade ago, I've been collecting bills and important documents in a briefcase and a small filing box. Since buying a house more than a year ago, the amount of paper that I receive and need to keep has grown into a deluge and is overflowing the space I want to dedicate to it. I would like to scan everything and retain paper only for documents where the original is required. I'd archive the scans on my heavily backed-up NAS. What free and/or open source software is out there that can handle this kind of document management? Being able to scan to PDF and associate a date and a series of labels with a document would be great, as well as some other metadata such as bill amount. My target OS is OS X, but Linux and Windows would be OK."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward on Sunday April 07, 2013 @03:49PM (#43385573)

    Subject says it all

  • OpenKM (Score:3, Informative)

    by Anonymous Coward on Sunday April 07, 2013 @03:50PM (#43385585)

    OpenKM (http://www.openkm.com/en/) is what I use to manage my documents; its tagging and document preview features are what I appreciate most. It runs as a web service, FYI.

  • Alfresco (Score:3, Informative)

    by Balr0g ( 960255 ) on Sunday April 07, 2013 @04:39PM (#43385829)
    I use the community edition of Alfresco [alfresco.com] for that task. You can tag all documents, add custom fields, and get full-text search and versioning out of the box. Documents can be accessed via the web interface, SMB, FTP and even IMAP.
  • by tomtomtom ( 580791 ) on Sunday April 07, 2013 @08:11PM (#43387145)

    I ended up with gscan2pdf and a rigid directory and filename structure. It works, but yeah, tags would be nice.

    gscan2pdf is OK, but if you want to do this seriously then you're probably going to want a reasonably fast sheet-fed scanner (I got a Fujitsu ScanSnap S1500, which is supported by SANE and can scan at 18-20 pages/36-40 sides per minute) with a button so that you can go through a whole stack of paper quickly with minimal keyboard/mouse interaction to slow you down. This led me to setting up scanbuttond (which just gained official support for the ScanSnap but there was a patch floating around somewhere for a while before that) with a custom script.

    Make sure you OCR your documents to make them searchable, then run an indexer (I like recoll [recoll.org], but KDE and GNOME both have their own desktop search solutions as well). I've found the best OCR engine on Linux seems to be tesseract [google.com], but there are a couple of others you can try. The process took me a while to get right and is a bit painful. The script that scanbuttond triggers calls scanadf to scan to a series of image files, one per side, and puts them in a processing directory. Once I'm done with a pile of papers, I run a separate batch-processing script while I go and get a cup of tea: it runs unpaper and then tesseract on each image, then hocr2pdf to convert each page individually into a searchable PDF file, and finally pdftk to concatenate all the pages into a single scanned document. I split the process into two parts because the OCR step can take some time, and this way I get maximum throughput on the scanner itself without waiting for the rest to catch up. If I could be bothered, I could have the scanning script start the batch-processing script just once and have it pick up new files as they are dropped into the directory, but running it by hand is not that much of an effort really.
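
The batch stage described above (unpaper, then tesseract, then hocr2pdf per page, then pdftk to join the pages) could be sketched roughly as follows. This is not the poster's actual script, just a minimal illustration under some assumptions: the scanned sides are image files that both unpaper and tesseract can read (PNM here), tesseract writes its hOCR output with a `.hocr` extension (older releases used `.html`), and all four tools are on the PATH. The names `Step`, `page_steps`, `merge_step` and `process_batch` are made up for this example.

```python
import subprocess
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    """One external command, plus an optional file to feed to its stdin."""
    argv: List[str]
    stdin: Optional[Path] = None


def page_steps(image: Path, workdir: Path) -> List[Step]:
    """Commands for one scanned side: clean up, OCR, build a searchable PDF."""
    cleaned = workdir / f"{image.stem}.clean{image.suffix}"
    ocr_base = workdir / image.stem           # tesseract appends the extension
    hocr = workdir / f"{image.stem}.hocr"     # assumed extension, see lead-in
    pdf = workdir / f"{image.stem}.pdf"
    return [
        Step(["unpaper", str(image), str(cleaned)]),
        Step(["tesseract", str(cleaned), str(ocr_base), "hocr"]),
        # hocr2pdf reads the hOCR text on stdin and the page image via -i.
        Step(["hocr2pdf", "-i", str(cleaned), "-o", str(pdf)], stdin=hocr),
    ]


def merge_step(pdfs: List[Path], output: Path) -> Step:
    """Command that concatenates the per-page PDFs into one document."""
    return Step(["pdftk", *[str(p) for p in pdfs], "cat", "output", str(output)])


def run(step: Step) -> None:
    """Run one step, stopping the batch if the tool reports an error."""
    if step.stdin is None:
        subprocess.run(step.argv, check=True)
    else:
        with open(step.stdin, "rb") as handle:
            subprocess.run(step.argv, stdin=handle, check=True)


def process_batch(images: List[Path], workdir: Path, output: Path) -> None:
    """Process every side in filename order, then join the pages."""
    pdfs = []
    for image in sorted(images):
        for step in page_steps(image, workdir):
            run(step)
        pdfs.append(workdir / f"{image.stem}.pdf")
    run(merge_step(pdfs, output))
```

Building the command lists separately from running them makes it easy to print the commands first and check them against your installed versions of the tools before letting the script loose on a whole pile of scans.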

    I then sort my PDFs into a hierarchical directory structure once they've been OCR'd (at which point they also get indexed for searching).
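
For the filing step, one possible layout (purely hypothetical, the post does not say what structure is actually used) is category, then year, then a filename that starts with the ISO date so that files sort chronologically. A small sketch, with `archive_path` being a made-up helper name:

```python
from datetime import date
from pathlib import Path


def archive_path(root: Path, category: str, doc_date: date, title: str) -> Path:
    """Build root/category/YYYY/YYYY-MM-DD_title-words.pdf (hypothetical layout)."""
    slug = "-".join(title.lower().split())
    filename = f"{doc_date.isoformat()}_{slug}.pdf"
    return root / category / f"{doc_date.year:04d}" / filename
```

For example, `archive_path(Path("archive"), "utilities", date(2013, 4, 7), "Water Bill")` gives `archive/utilities/2013/2013-04-07_water-bill.pdf`. Putting the date in the filename also gives you something to fall back on if the search index is ever lost.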

    If you're on Windows/Mac then the software that comes with the ScanSnap will pretty much do all this for you, although it's better to scan with OCR disabled and then use Acrobat to batch-OCR the PDFs later, for the same reason (so the scanner isn't kept waiting on OCR). Add a decent desktop search solution like an old version of Copernic (or possibly Windows Search) and all is good.
