Preserving Old Research Notes and Documents?
twistedcubic asks: "I have several thousand 8.5 x 11 inch dead tree pages of notes and research that takes up too much storage space. I would like to have all these notes scanned into PDF files (for example) so I can recycle the pages and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"
In a few months time... (Score:5, Funny)
"I have several thousand PDF files taking up too much disk storage space. I would like to have all these files printed on to 8.5 x 11 inch dead tree pages of notes so I can delete the files, empty the recycle bin and reclaim storage space. Does anyone know of a store that provides this service, or an inexpensive machine that will do the job in a reasonable amount of time?"
For future reference, I suggest a printer.
--BladeMelbourne
Re:In a few months time... (Score:3, Informative)
Re:In a few months time... (Score:2)
Just for future reference.
Easy (Score:4, Insightful)
Re:Easy (Score:1)
Re:Easy (Score:1, Interesting)
I'd imagine it'd cost a similar amount to get them to do the scanning, maybe with a discount for bulk (maybe not, because scanning is more labour-intensive than photocopying)... you'd probably have to teach them to make PDFs, though... not hard.
If you provided a computer/scanner, it'd probably be cheaper too...then you could just pay somebody $100/month to scan books all day (you could probably pay less...but $100 isn't much i
Lots of Work (Score:1, Insightful)
untitled, untitled-1, untitled-2... untitled-3000
or are you going to rename all of them and organize them in some way? You probably won't find a solution that won't take a lot of time and work.
Re:Lots of Work (Score:2)
Re:Lots of Work (Score:2)
Re:Lots of Work (Score:1)
Re:Lots of Work (Score:3)
Re:Lots of Work (Score:1)
Re:Lots of Work (Score:2)
Re:Lots of Work (Score:2)
The advantage of transferring all of these pages onto disk storage is that you can store around 1,000 pages on a single CD-ROM, 4,000 pages on a DVD, and 60K pages or more on a USB drive, and take them with you wherever you go, without having to lug around a bo
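Those per-medium figures all depend on how big each scanned page ends up. Here is a quick back-of-the-envelope sketch, assuming about 700 KB per page (which is what "1,000 pages per CD" implies) and nominal disc capacities:

```python
# Hypothetical capacity check: whole scanned pages per medium for an
# assumed average file size per page. Ignores filesystem overhead.
CAPACITY_BYTES = {
    "CD-ROM": 700_000_000,    # nominal 700 MB disc
    "DVD": 4_700_000_000,     # nominal single-layer 4.7 GB disc
}

def pages_per_medium(capacity_bytes: int, bytes_per_page: int) -> int:
    """Number of whole pages of the given size that fit in capacity_bytes."""
    return capacity_bytes // bytes_per_page

BYTES_PER_PAGE = 700_000  # assumption: ~700 KB per scanned page

for medium, cap in CAPACITY_BYTES.items():
    print(f"{medium}: {pages_per_medium(cap, BYTES_PER_PAGE):,} pages")
# CD-ROM: 1,000 pages
# DVD: 6,714 pages
```

At that page size the DVD figure is conservative, and 60,000 pages would need roughly 42 GB, so the USB number only works with a large drive or much smaller (e.g. black-and-white) scans.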
Re:Lots of Work (Score:2)
Re:Lots of Work (Score:2)
1. Look at how you've organized your paper documents. If they're in boxes, or file cabinet drawers, they must be named or coded, right? And within a coded box or drawer, you must have folders that are themselves named or coded (or notebooks?).
2. Name your top-level directories after your boxes or file cabinet drawers (or whole file cabinets if we're talking about a LOT of papers). Then, move down through the hierarchy to more specif
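If you'd rather not click out hundreds of folders by hand, the box/folder scheme above can be created in one go. A minimal sketch; the `INVENTORY` labels are made-up placeholders for your own box and folder codes:

```python
import tempfile
from pathlib import Path

# Hypothetical inventory: physical box/drawer labels -> folder labels inside them.
INVENTORY = {
    "Box-01_Thesis": ["Ch1_Notes", "Ch2_Derivations"],
    "Box-02_Lab": ["2004_Experiments", "2005_Experiments"],
}

def build_tree(root: Path, inventory: dict[str, list[str]]) -> list[Path]:
    """Create root/<box>/<folder> directories mirroring the paper filing system."""
    created = []
    for box, folders in inventory.items():
        for folder in folders:
            path = root / box / folder
            path.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(path)
    return created

# Demo in a throwaway directory; point root at your real archive instead.
with tempfile.TemporaryDirectory() as tmp:
    for p in build_tree(Path(tmp), INVENTORY):
        print(p.relative_to(tmp))
```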
Dee, dee, dee... (Score:2, Insightful)
Re:Dee, dee, dee... (Score:1, Troll)
He wins a Dee, Dee, Dee award.
Seriously...This is a good idea (Score:2)
Re:Seriously...This is a good idea (Score:2)
Re:Dee, dee, dee... (Score:1)
I just cleaned out several boxes of stuff that I've been lugging around for more than a decade. Once I actually sat down and went through it, I realized that I'd never look at 95% of it again. I kept the stuff with sentimental value, and the rest is confetti. What a relief!
Not the ideal solution, but a start.. (Score:1)
Unfortunately, I'm having a terrible time remembering the brand of it. I also don't know if they're even made these days. It's not a great solution to
Re:Not the ideal solution, but a start.. (Score:4, Informative)
Good luck!
searchable db? (Score:2, Interesting)
Buy a scanner with an ADF (Score:4, Insightful)
My Brother MFC-2340C scanner comes with the PaperPort application, which generates PDFs and supports double-sided scanning even though the scanner doesn't support it. (You just flip over the whole stack once you've scanned one side, and start scanning the other side. PaperPort knows how to automatically reconcile the pages.)
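The "reconcile the pages" step is just an interleave. This is a sketch of the idea, not PaperPort's actual code; it assumes that after flipping the stack the backs come out last-sheet-first, so they get reversed (pass `backs_reversed=False` if your feeder keeps them in order):

```python
def interleave_duplex(fronts: list[str], backs: list[str],
                      backs_reversed: bool = True) -> list[str]:
    """Merge two single-sided passes into reading order: f1, b1, f2, b2, ...

    fronts: first pass, sheet 1 front, sheet 2 front, ...
    backs:  second pass after flipping the stack; by default assumed to
            arrive last sheet first, so it is reversed before merging.
    """
    if len(fronts) != len(backs):
        raise ValueError("each sheet needs exactly one front and one back scan")
    ordered_backs = list(reversed(backs)) if backs_reversed else list(backs)
    merged = []
    for front, back in zip(fronts, ordered_backs):
        merged.extend([front, back])
    return merged

print(interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```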
If you have Acrobat Professional, you can do a Paper Capture(TM) which is basically doing an OCR on the PDF and then storing the recognized words as "keywords" so that the PDF is searchable via Spotlight or other indexing mechanisms.
A document scanner is indeed a very useful piece of equipment -- I use it to scan notes and scrap paper containing rough ideas, often with lots of mathematics. Sometimes writing stuff on paper is just easier than typing in LaTeX...
The eminent computer scientist Edsger Dijkstra also liked to write stuff using pen and paper. His digitized works, called EWDs (after his initials, Edsger Wybe Dijkstra) are available here:
http://www.cs.utexas.edu/users/EWD/ [utexas.edu]
Re:Buy a scanner with an ADF (Score:3, Insightful)
The merger of Adobe and Macromedia, the constant toying with DRM schemes, the allowing of unsafe code in current Adobe formats, etc. make format choice as vital as scanner choice.
A good exam
Re:Buy a scanner with an ADF (Score:4, Insightful)
Technically, yes, PDF is a proprietary format. But it's a well-documented [amazon.com], widely licensed format. Really, it's just PostScript with a few organizational elements. Both PostScript and PDF have many third-party implementations, including one that's available under the GPL.
I don't see what the merger with Macromedia has to do with anything. DRM would be an issue if Adobe was the only source for PDF software -- but it's not. Hindsight is all very well -- but what format would you have chosen? Floppy disks would have been too expensive, CDs didn't exist yet. If it had been up to me, I would have chosen 9-track mag tape -- and I would have been wrong [slashdot.org]. (I still have a 9-track tape containing a backup of my student files, and no way to read it!) In any case, that mistake had to do with a choice of hardware. It's a lot easier to recreate old software than old hardware. I'll skip past all your other hardware examples (papyrus???) and skip to...
What, you think this is some kind of whim? If these documents are at all important, he has to bring them online. As long as they exist only in dead tree form, they are awkward to access, expensive to store, and run the risk of being lost in day-to-day use, to say nothing of the odd natural disaster.
Re:Buy a scanner with an ADF (Score:2)
It's no problem to buy a "desktop" 9-track tape drive that understands EBCDIC. I'm sure you could then write a program to convert the data to something modern.
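The conversion program really is small. A minimal sketch, assuming the tape has already been dumped to a file as raw bytes, holds plain-text 80-byte card-image records with no blocking, and uses EBCDIC code page 037 (Python's built-in `cp037` codec). Real tapes may use other code pages, variable-length records, or binary/packed-decimal fields that need more care:

```python
def decode_ebcdic_records(data: bytes, record_length: int = 80,
                          codepage: str = "cp037") -> list[str]:
    """Split a raw tape image into fixed-length records and decode each from EBCDIC.

    Assumes plain text only, fixed-length records, and no block descriptors.
    Trailing EBCDIC blanks (0x40) decode to spaces and are stripped.
    """
    if len(data) % record_length:
        raise ValueError("data is not a whole number of fixed-length records")
    return [
        data[i:i + record_length].decode(codepage).rstrip()
        for i in range(0, len(data), record_length)
    ]

# Round-trip demo: build two EBCDIC "cards", then decode them back.
sample = ("HELLO WORLD".ljust(80) + "SECOND CARD".ljust(80)).encode("cp037")
print(decode_ebcdic_records(sample))
# ['HELLO WORLD', 'SECOND CARD']
```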
Re:Buy a scanner with an ADF (Score:2)
Buying a desktop 9-track would cost me something like $300 -- more than I care to spend to read one tape with data of purely nostalgic value. If I ever care enough, I'll send the tape to
Re:Buy a scanner with an ADF (Score:2)
I didn't assume. I was being prudent, since, as you say, Big Blue did have a huge chunk of the market.
PDF is NOT proprietary (Score:2)
Yes, PDF is controlled by Adobe. No, most wouldn't consider it proprietary. It is completely documented & has implementations for both authoring tools and viewers not written by Adobe. It is considered by most to be an open format.
Each version has had incremental changes. Readers which support version 1.5 of the PDF spec are backwards-compatible
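If you want to know which version a given file claims, the header is easy to check. A sketch, assuming the usual `%PDF-1.x` header near the start of the file (some readers tolerate a few leading bytes, hence the search over the first 1024). PDF 1.4 and later can also override the version in the document catalog, which this deliberately ignores:

```python
import re
from typing import Optional

def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in a PDF header, e.g. '1.4', or None if absent."""
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    return match.group(1).decode("ascii") if match else None

# In practice you'd pass the first 1024 bytes read from a file opened in "rb" mode.
print(pdf_header_version(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"))  # 1.4
```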
Re:Buy a scanner with an ADF (Score:5, Insightful)
That would be the best method, but I would seriously question the wisdom of PDF files. Although they represent documents fairly well, the format is too proprietary and too variable to be safe. You want the baseline documents to be in a format you can read at ANY time in the future, not just three weeks down the road.
Bull. PDF is completely open and is not going away. To get the specs, you merely have to download them for free from Adobe's web site. There are multiple open-source implementations of PDF readers. Although Adobe is adding features all the time, the basic format that would be used for storing scanned images has been stable and forward-compatible for years and years. There are multiple court systems which have designated PDF as the format for filing, storing, and archiving court records. There is work on an official national standard for long-term archiving of records in PDF format. (PDF/A specifies things like requiring the PDF to embed the fonts it uses, to ensure that files stay portable across OSes and decades.)
A flaming example of a red herring. Your scanner software is not going to create a PDF with any DRM unless you tell it to. And some future version of your PDF reader is not going to suddenly refuse to read non-DRM'd files.
The "silver" alumin(i)um CDs are much less durable than the "gold" disks, but both will fail in the space of decades even if kept well.
Most "gold" CDs are merely "silver" CDs with a gold-colored label on the top. It's not even clear that the gold vs aluminum reflective layer is a real issue. But the dye type does matter, hugely.
Re:Buy a scanner with an ADF (Score:2)
I've given up on trying to find a storage medium that will last "forever"; I'm going with multiple copies on multiple newer media. I have a few docs that first lived on 5.25" floppies. I just copied them to the newer option whenever one became popular: 3.5" floppy, then a larger HDD, then CD, then DVD and flash memory card. Storage formats don't go obsolete that fast, the CD has been
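Each migration is only as good as the copy, so it's worth verifying it. A minimal sketch of a checksum-verified copy using SHA-256; the directory layout is whatever you already have. Note that re-reading a file right after writing it may come from the OS cache, so for burned discs you'd ideally re-run the comparison after remounting the medium:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large scans don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def migrate(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy every file under src_dir to dst_dir, verifying each copy by checksum.

    Returns the source files whose copy did not match (empty list on success).
    """
    mismatches = []
    for src in sorted(p for p in src_dir.rglob("*") if p.is_file()):
        dst = dst_dir / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        if sha256_of(src) != sha256_of(dst):
            mismatches.append(src)
    return mismatches

# Demo with throwaway directories standing in for the old and new media.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    (Path(old) / "box1").mkdir()
    (Path(old) / "box1" / "page001.pdf").write_bytes(b"%PDF-1.4 fake scan")
    print(migrate(Path(old), Path(new)))  # [] means every copy verified
```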
Re:Mod Overrated Parent Down (Score:2)
Re:Buy a scanner with an ADF (Score:3, Interesting)
If you have Acrobat Professional, you can do a Paper Capture(TM) which is basically doing an OCR on the PDF and then storing the recognized words as "keywords" so that the PDF is searchable via Spotlight or other indexing mechanisms.
Maybe I'm mistaken, but doesn't Google index PDFs? If that's the case, you can just upload it to a website and wait for it to be crawled for later searching.
That doesn't really help with the scanning problem, though. Parent's solution of slave usage might be best.
Re:Buy a scanner with an ADF (Score:1)
Scanned documents would simply be images until OCR was applied (or manual transcribing).
Get a Document Scanner, not a Flatbed + ADF (Score:2)
I am extremely happy with my Canon DR-2080C. [canon.com] Note: It is the only piece of hardware I've bought knowing that it won't work with Linux. I ran Windows SPECIFICALLY to use this document scanner. It looks like it has been discontinued & the DR-2050C [canon.com] is the model to get now. Looks like it does larger documents, which is nice. These do duplex scans in one pass, so yo
This is the best advice (Score:2)
Legal Services Firm (Score:1, Informative)
Re:Legal Services Firm (Score:1, Insightful)
His one bargaining point is that he likely can wait much longer for his papers to be scanned. So he could negotiate on having his papers on a very low priority queue.
OCR probably not the way to go (Score:5, Insightful)
Are the notes graphics-heavy (i.e., scientific/engineering)?
If not, give it to a typing service. Once you show them how much "stuff" you have, I'm sure they'll give you a discount. They might even agree to use OpenOffice 2 (because it handles huge documents well, the files are small, and it has an excellent PDF exporter).
You'd still have to scan in the pictures/drawing/graphs, and place them appropriately, which will take time.
Also, there are firms that specialize in digitizing paper documents (mostly forms and regularized documents for businesses). Depending on the amount of handwriting & graphics, it might not be appropriate, though.
All in all, no matter how you do it, the project will
Re:OCR probably not the way to go (Score:2)
Hey, but at least having picked those, it's guaranteed to be good [kottke.org]
Scan to PDF with OCR behind the image (Score:2, Informative)
Re:Scan to PDF with OCR behind the image (Score:1)
In general, I scan at high resolution (600 dpi) black and white using an Epson Perfection 3170 scanner with ADF. I don't bother with the OCR. You can either scan straight into Adobe Acrobat or into Adobe Photoshop if you need to touch up any o
The unorthodox method (Score:3, Funny)
Re:The unorthodox method (Score:1)
7. ??? 8. Profit!
Re:The unorthodox method (Score:2)
imDex (Score:3, Informative)
I would contact PRG Schultz [prgx.com], as they have done this for large clients in the past. They have a program called imDex [imdexsolutions.com] which is pretty slick. Basically, it's a searchable, cross-indexable database, so you'll have OCR'd text along with TIFFs or PDFs of the documents. If you would like more information, let me know.
What are you going to store them on? (Score:3, Informative)
Clue:
There isn't one.
The best thing to do is NOT convert the paper to digitized format. Find some space instead, and store the paper. Your data will be much safer.
Re:What are you going to store them on? (Score:2, Insightful)
Re:What are you going to store them on? (Score:4, Informative)
Re:What are you going to store them on? (Score:1)
Apart from stone engraving, paper is probably the most reliable long-term archive solution, as long as it is stored properly. Of course it is difficult to search and index, but this is a different issue.
Re:What are you going to store them on? (Score:1)
Re:What are you going to store them on? (Score:1)
Re:What are you going to store them on? (Score:3, Interesting)
Re:What are you going to store them on? (Score:3, Insightful)
You have to copy EVERYTHING to new media eventually. You need to have a plan, and you need to execute it. Simple as that. Paper will disintegrate, and yes, hardware will become obsolete. You just need to progress to the stone in the river before the current one is submerged.
But which is easier/cheaper to propagate to new media and make backup copies of? Digital data in open, documented, implemented formats, or paper? Which is che
Go low tech? (Score:3, Informative)
Many libraries have reader-printers from which, for a small fee (e.g., $0.20/page?), you can print a copy.
Most of the expense with fiche is the production of the silver halide original; diazo copies are relatively cheap. If it's really important to you, have a copy made and lock the original film in a safe deposit box (or at least store it offsite).
Some tips (Score:2)
I've helped set up something like this. The best small-scale solution would be to get a good flatbed scanner with an automatic document feeder (ADF). You can get decent HP scanners for about $400-700.
Once you have the scanner, you can set up a few scanning profiles that automatically set resolution, color depth, black & white threshold, etc. Then scan the notes into Adobe as images. If you scan them in as monochrome images at 100-150 dpi, you can get fairly small files that are very readable on screen
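Why monochrome at modest resolution keeps files small is plain arithmetic on the raw bitmap. A sketch for a letter-size page before any compression (CCITT Group 4, commonly used for black-and-white scans, typically shrinks text pages much further):

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size in bytes of one scanned page, before TIFF/PDF compression."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

for dpi, bpp, label in [(150, 1, "150 dpi B&W"), (600, 1, "600 dpi B&W"), (300, 24, "300 dpi color")]:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {size:,} bytes uncompressed")
```

That works out to roughly 0.26 MB per page at 150 dpi black and white, about 4.2 MB at 600 dpi, and around 25 MB for a 24-bit color scan at 300 dpi.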
Re:Some tips-ABBYY Finereader. (Score:2)
My experience was that OCR was not up to the task 2 years ago. However this was with notes and papers where up to 50% of the page was mathematics. Once you start seeing some of the more esoteric or specialized mathematical symbols, I think the OCR just breaks down.
However, even with 95% accuracy on math symbols that would leave a lot of pages to be reviewed for
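To put a number on "a lot of pages": if every symbol were recognized independently with 95% accuracy (a simplification; real errors cluster on the hard symbols), a page is only error-free with probability 0.95 to the power of its symbol count. A sketch with made-up page and symbol counts:

```python
def expected_pages_needing_review(pages: int, symbols_per_page: int, accuracy: float) -> float:
    """Expected number of pages with at least one OCR error.

    Assumes each symbol is recognized correctly and independently
    with probability `accuracy`.
    """
    p_page_clean = accuracy ** symbols_per_page
    return pages * (1 - p_page_clean)

# Hypothetical: 3,000 pages, 50 math symbols per page, 95% per-symbol accuracy.
print(round(expected_pages_needing_review(3000, 50, 0.95)))
```

With those assumptions only about 7.7% of pages come out clean, so roughly 2,770 of the 3,000 pages would still need proofreading.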
ask the institution (Score:2)
I assume if you've collected that much research, you work for a university or some sort of research institution. My undergrad college of 1,100 students had like three of these, including one that was part of some ginormous Xerox do-everything-and-then-collate-and-bind-it-(if-you're-printing-of-course) machine that was sitting out so that
ADF (Score:2)
HP Digital Sender (Score:2)
The problem is that they're about $2500 each (MSRP $3200), because they're a niche item. Shame really, because if they'd dropped i
simple (Score:2, Funny)
wiki (Score:2)
Digital Copier with ADF (Score:2)
hylafax (Score:5, Interesting)
Re:hylafax (Score:2)
It's really no improvement over the obvious: get a scanner with an autofeeder. However, I'm assuming that he's looking for something more industrial/quick, since that answer is so obvious. I'm guessing he must have already checked out Kinko's (which
A store (Score:2)
Microfilm! (Score:2)
Unless you need the capability to grep the documents, there's little point in digitizing
Re:Microfilm! (Score:2, Insightful)
In a drawer or filing cabinet.
and what are guarantees that it'll actually stay preserved for that long?
Wet-film microfilm has an estimated survivability of 500 years in ideal conditions and a minimum of 100 years in any reasonable conditions. To my knowledge this exceeds the lifetime of any digital medium.
It's fairly trivial to store redundant copies of your digital files, even in multiple locations worldwide. The costs are minimal too.
It's f
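Redundant copies only help if you notice when one of them rots. A sketch of a periodic check, assuming each location keeps a manifest of `{relative_path: sha256}` (the site names and shortened checksums below are placeholders). A majority vote needs at least three copies; with two, a disagreement tells you something is wrong but not which copy:

```python
from collections import Counter

def find_divergent_replicas(manifests: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """For each file, list the replicas whose checksum differs from the majority.

    manifests maps a replica name to {relative_path: checksum}; a file
    missing from a replica counts as a disagreement for that replica.
    """
    all_paths = set().union(*(m.keys() for m in manifests.values()))
    problems: dict[str, list[str]] = {}
    for path in sorted(all_paths):
        votes = Counter(m.get(path) for m in manifests.values())
        majority, _ = votes.most_common(1)[0]
        bad = [name for name, m in manifests.items() if m.get(path) != majority]
        if bad:
            problems[path] = bad
    return problems

manifests = {
    "home_dvd":   {"box1/page001.pdf": "aaa", "box1/page002.pdf": "bbb"},
    "office_usb": {"box1/page001.pdf": "aaa", "box1/page002.pdf": "bbb"},
    "offsite_hd": {"box1/page001.pdf": "aaa", "box1/page002.pdf": "XXX"},
}
print(find_divergent_replicas(manifests))
# {'box1/page002.pdf': ['offsite_hd']}
```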
Re:Microfilm! (Score:2, Insightful)
Ah, fuck it. I'm tired of doing your research for you. You log in as an AC, then expect a legitimate user to Google "lifetime of microfilm" and "cost of microfilm transfer" because y
Re:Maybe you should try djvulibre (Score:3, Informative)
It's possible... (Score:3)
First, you'll need a low-volume scanner. (Check the duty cycle to make sure it can handle your bookshelf of papers.) Then, you'll need something to convert the images to PDF. If you have any programming experience, write a quick app that uses ImageMagick (http://www.imagemagick.org/ [imagemagick.org]) to convert from TIFF to PDF. Put each binding in its own folder, and pretend the "untitled1.pdf" says "page1.pdf".
If you want to get fancier, have the front-end app rename untitled1.tiff to whatever you'd like. You can also embed extra information in the PDF by using metadata and the Adobe XMP SDK (free download from Adobe). Make the metadata like:
TITLE="My Book"
AUTHOR="Bart Simpson"
etc.
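One gotcha with the renaming step: a plain alphabetical sort puts untitled10 before untitled2. A sketch of a rename plan using a natural sort and zero-padded page numbers; the `MyBook_page` prefix is a hypothetical label, and nothing is renamed until you apply the plan yourself (e.g. with `os.rename`):

```python
import os
import re

def natural_key(name: str) -> list:
    """Sort key that orders 'untitled2' before 'untitled10'."""
    # re.split with a capturing group alternates text and digit runs,
    # so strings and ints always line up position by position.
    return [int(tok) if tok.isdigit() else tok.lower() for tok in re.split(r"(\d+)", name)]

def plan_renames(filenames: list[str], prefix: str, width: int = 4) -> dict[str, str]:
    """Map scanner output names to zero-padded page names, in scan order."""
    ordered = sorted(filenames, key=natural_key)
    return {
        old: f"{prefix}{i:0{width}d}{os.path.splitext(old)[1]}"
        for i, old in enumerate(ordered, start=1)
    }

print(plan_renames(["untitled10.tiff", "untitled2.tiff", "untitled1.tiff"], "MyBook_page"))
# {'untitled1.tiff': 'MyBook_page0001.tiff', 'untitled2.tiff': 'MyBook_page0002.tiff', 'untitled10.tiff': 'MyBook_page0003.tiff'}
```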
What are your retrieval needs? (Score:1)
Do you actually REFER to the notes every now and then?
Do you need text or just scanned-images?
Do the advantages of having them outweigh the advantages of destruction? Remember, if you destroy it then it can't come back and haunt you in a lawsuit. But then again, it can't help you either. Caution - before you destroy anything make sure you have an official data-retention policy, and stick with that policy. Otherwise, destroying data CAN be seen as
Document Management System (Score:2)
Legal businesses and accounting departments use this stuff
Retaining Legality (Score:2)
Re:Retaining Legality (Score:1)
Einstein Papers Project (Score:2)
DjVu, not PDF (Score:3, Informative)
(Of course, you will still need to spend lots of time scanning, naming and classifying those pages. The ADF and 10yo nephew suggested in another post might be useful for that.)
DjVu offers very compact representation without the need to OCR the document (I've converted a 13 megs scanned PDF into a 600K DjVu which was much faster and easier to read), and optionally a "hidden text layer" if you want to OCR it to make it searchable.
Junk them (Score:1)
If you really need to keep them, throw them in boxes and put them in document storage somewhere. Then, on the off chance you might need them for patent disputes, etc., you can hire someone for $8/hr to go thru them.
My Suggestion (Score:2)
Unless you plan on using OCR, these documents could also be saved in tiff, png, or jpeg formats. Personally, I would consider a format that allows for the embedding of keywords into the file, so that searching will be easier later on.
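Whatever the format, keyword search boils down to an inverted index. A minimal sketch, assuming you already have each page's keywords or OCR text as a string (e.g. from a sidecar .txt file); the file names and contents below are made up:

```python
import re
from collections import defaultdict

def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: lowercase word -> set of document names containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Documents containing every word of the query (AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result

docs = {
    "box1/page001.tiff": "Lattice derivation, Monte Carlo sampling",
    "box2/page014.tiff": "Monte Carlo error estimates",
}
index = build_index(docs)
print(sorted(search(index, "monte carlo")))
print(sorted(search(index, "lattice carlo")))
```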
Good luck.
Possible Solution: (Score:2)
2. Get software to scan to PDF format.
3. Get Google Desktop Search, which will index the contents of PDFs, or get an Apple Mac with Mac OS X 10.4 (Tiger) and Spotlight will index your PDFs. If you have a Mac, you may be able to scan to PDF without needing Adobe Acrobat.
Don't know about scanner services, but check around and you might find someone who can scan the documents to PDF and give you a DVD-R or CD-R's with the files. Kinko's? Print Shop?
We have highend Kodak
Big photocopiers (Score:1)
It is also a network device to scan / print. I took in my computer (a Mac mini), plugged it into the Ethernet port and (adding 20-30 minutes of fiddling) was away.
So.. make friends with your local big company (a hospital would be good - you can make a small donation).
Bear in mind though that it took me pretty much a
I can do it. (Score:1)
If you really need to be able to access it, though, something like that should cost between 10 and 20 cents a page (in that quantity), depending on the standards for accuracy and the feedability. (If it is 100-page documents in 3-ring binders with no staples, clean edges, and no Post-its, expect it to cost a lot less than 2-page documents covered wit
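For "several thousand" pages that price range is easy to turn into dollars, and to compare with buying a scanner. A sketch using a hypothetical 5,000 pages and the low end of the $400-700 scanner price mentioned elsewhere in this thread; it ignores your own time, which is usually the real cost of doing it yourself:

```python
import math

def service_cost(pages: int, cents_per_page: int) -> float:
    """Dollar cost of a scanning service at a flat per-page rate."""
    return pages * cents_per_page / 100

def break_even_pages(scanner_price_dollars: int, cents_per_page: int) -> int:
    """Page count at which buying a scanner costs the same as paying the service."""
    return math.ceil(scanner_price_dollars * 100 / cents_per_page)

pages = 5_000  # hypothetical stand-in for "several thousand"
print(service_cost(pages, 10), service_cost(pages, 20))  # 500.0 1000.0
print(break_even_pages(400, 10))  # 4000
```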
Hardware is not the limitation (Score:1)