Ask Slashdot: Open Source For Bill and Document Management?
Rinisari writes "Since striking out on my own nearly a decade ago, I've been collecting bills and important documents in a briefcase and small filing box. Since buying a house more than a year ago, the amount of paper that I receive and need to keep has grown into a deluge and is overflowing the space I want to dedicate to it. I would like to scan everything and retain paper only for things that require the original copies. I'd archive the scans on my heavily backed-up NAS. What free and/or open source software is out there that can handle this kind of document management? Being able to scan to PDF and associate a date and a series of labels with a document would be great, as well as some other metadata such as bill amount. My target OS is OS X, but Linux and Windows would be OK."
muddle headed post (Score:2, Interesting)
By definition, "important" = keep the original. (I mean seriously, are you that short of basement space?)
Electronics are ephemeral. You can, today, read stuff on papyrus, as long as you know the language. Do you really want to trust stuff that is important to ephemeral electronics?
(I mean, how many times has /. gone over this? Is this the editors' idea of a yearly question?)
Tagging is an inherently stupid idea. It may be the best that you can do with current technology, but a Google-like full-text search is much, much better. (Tell me: if you want to pull out a piece of information you know is on your hard drive in a PDF, do you look for the PDF, or just google it?)
It is possible that, after 5 or 10 years, you might know what tags you want...
Tagging is hard work that you have to do manually and consistently; better to have 3 or 4 folders organized by client/project than to tag.
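For what the full-text approach argued for above might look like in its simplest form, here is a rough sketch. It assumes the text of each scan is already available as a plain-text file next to it (for example exported from OCR); that layout and the `search_text` name are made up for illustration, and a real desktop indexer does far more than this.

```python
from pathlib import Path


def search_text(root: Path, query: str) -> list[Path]:
    """Return .txt files under root whose contents contain every word of the query."""
    words = query.lower().split()
    hits = []
    for path in sorted(root.rglob("*.txt")):
        # Read leniently: OCR output is not always clean UTF-8.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if all(word in text for word in words):
            hits.append(path)
    return hits
```

Something like `search_text(Path("Documents"), "water bill november")` would list every text file mentioning all three words, with no tags involved.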
Try Alfresco (Score:2, Interesting)
You can try Alfresco DMS.
It requires a web server, so it might be too much for a single user.
My Workflow (Score:5, Interesting)
1) Receive document.
2) Scan with Fujitsu ScanSnap S1500 in about 10 seconds. $380 on sale, but so far it has been worth it; it beats cheap all-in-one scanners so badly it's not even funny. Seriously, don't even bother going paperless unless you get a real document scanner.
3) Save PDF to a simple software RAID-1 mirror of two 2TB drives. (Takes about 5 seconds to set up from Disk Management in Windows.) This should protect against a sudden drive failure taking everything.
4) Back up nightly to an external drive that is swapped off-site every other month. This should protect against accidental deletions, fires, etc. Bonus points if the backup drive is the ioSafe fireproof variety.
5) Throw away original. Only exception is official documents like titles, marriage certificate, etc. Yes, I even throw away W2s and the like. My taxes are 100 percent digital nowadays.
6) Check and test restore from those backups on a semi-regular basis, and you're done!
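One way the "check and test" in step 6 could be partly automated is to compare checksums between the live folder and a restored (or mounted) copy of the backup. This is only a sketch under that assumption, not what the poster actually runs; `verify_backup` and the example paths are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large scans do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = backup / rel
        if not dst_file.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256_of(src_file) != sha256_of(dst_file):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

Calling `verify_backup(Path("D:/Scans"), Path("E:/Restore/Scans"))` (made-up paths) returns an empty list when every file in the source has an identical copy in the backup. It only checks in one direction, so files deleted from the source but still present in the backup are not reported.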
You don't need a CMS (Score:5, Interesting)
So, I've been doing this pretty consistently for the past few years and sent this advice to some relatives asking basically the same question. (That's also why it's a little dumbed down.)
I haven't found a case where any sort of CMS makes more sense than the file system. This is after doing this for about 10 years, and I've got records going back to '01.
I'm using a Fujitsu ScanSnap and a Fellowes Powershred, and running Mac OS X. OS X has decent indexing, a good file system manager (really can't beat column view) and the Preview app will let you reassemble PDFs, which is occasionally very handy.
1. The enemy is copies. I strongly recommend "scan and shred", or you'll wind up scanning the same thing over and over.
1.1. Don't bother with any scanner that doesn't do double-sided scans.
1.2. Use a shredder. You can take things out of a trash can.
1.3. The scanner should come with OCR software. Choose "Searchable PDFs".
2. Do scanning in small batches.
2.1. Create two folders, "Scanned" and "Unfiled".
2.2. The scanned files go immediately into Scanned, and the paper immediately goes into the shredder.
2.3. After you've got a batch of stuff scanned, you move it into Unfiled and correct the names, or split the documents up as you need to.
3. If it takes any work to scan it, just shove it in a filing cabinet or, better yet, just shred it.
3.1. If you're having to use a flatbed, it's too complicated to scan and you should file or shred it.
3.2. You can often get manuals and pamphlets and stuff online by googling part of the text or the product name.
4. Don't scan anything you can get electronically.
4.1. Most companies would much rather let you download bills and statements and such.
4.2. Most of them will also delete those statements after a few months, so get in the habit of immediately downloading the statement.
5. It's *very* helpful to put a date on everything. I generally do YYMMDD, trying to guess from dates I find in the document.
5.1. If it's a document covering a period of time, like a bill for the month of November, I use the ending date.
5.2. For tax documents I'll put TT-YYMMDD, where TT is the tax year, since the actual transactions occur that year, but filing and IRS stuff happens the year after.
6. I've found that even with full text search, you still need folders.
6.1. They just don't need to be extremely complicated; usually two levels seem to be fine. I'll put prior years into separate folders, too.
6.2. Your system will evolve as you work; just get it in there, and then be mindful of what you are commonly looking for.
6.3. Keep books and reference manuals in a folder that doesn't get indexed. (Spotlight has an option for this.) They tend to create a lot of spurious hits.
7. Keep your inbox clean. If an email wants you to download a statement, get it right away and put it in Unfiled.
7.1. Likewise, keep your desktop clean: scan and shred stuff as soon as it comes in.
7.2. Have a periodic to-do item to tidy your files, but don't spend more than half an hour (tops!) at any given time.
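The date convention from item 5 (YYMMDD, or TT-YYMMDD for tax documents) is easy to apply consistently with a small helper. This is just a sketch: putting the date stamp at the front of the name, the `.pdf` suffix, the filename cleanup rule, and the `scan_filename` name itself are all assumptions, since the post only describes the date format.

```python
import re
from datetime import date
from typing import Optional


def scan_filename(description: str, doc_date: date, tax_year: Optional[int] = None) -> str:
    """Build a name like '131130 Electric bill.pdf' or '13-140415 Federal return.pdf'."""
    stamp = doc_date.strftime("%y%m%d")  # YYMMDD, as in item 5
    if tax_year is not None:
        stamp = f"{tax_year % 100:02d}-{stamp}"  # TT-YYMMDD, as in item 5.2
    # Drop characters that are awkward in file names (an assumed cleanup rule).
    clean = re.sub(r"[^\w\- ]+", "", description).strip()
    return f"{stamp} {clean}.pdf"
```

For a November bill, item 5.1 says to pass the ending date, so `scan_filename("Electric bill", date(2013, 11, 30))` gives `131130 Electric bill.pdf`. With the stamp in front, a plain alphabetical sort in the file manager is also chronological within a folder.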
Tossing hat into the ring for DjVu format. (Score:4, Interesting)
PDF is big and bulky. DjVu format makes for tiny document scans. And there are open source libraries for creating it, available even in Debian. Wavelet compression did finally make it into the wild. It's just that nobody has ever heard of it, for some reason.
Doesn't help for organization, but it should be a reasonable option for storage.
It even embeds the OCR text in the document along with the image version, so it doesn't proliferate multiple copies of the same data.
Re:I just thought of something (Score:3, Interesting)
Absolutely, no question about it. Some documents are not that important, but the important ones shouldn't go there.