High Speed Text To Digital Acquisition?
K asks: "As a longtime subscriber to many magazines (Linux Journal, Dr. Dobbs, Byte, etc.) and as a person who can't throw away anything written for fear it might one day be useful, I was wondering if anyone knew of any high speed scanners that would allow me to archive these magazines for (possible) future use. My house is just getting too cluttered and I would hate to lose these data. In addition, are there any legal questions about this? I have purchased the magazines so I believe that I should be allowed to archive them in any fashion I choose. Is this wrong?"
Re:Don't throw them away! (Score:2)
Books and magazines, including the ads, are an experience in and of themselves, and as convenient as a CD with all the text in editable, copyable form might be, it just ain't the same.
Re:Magazines should provide digital versions (Score:1)
Magazines should provide digital versions (Score:2)
Unfortunately, this is not the norm. Other magazines (like Spiegel [spiegel.de]) provide a digital version, but it costs DM 260 (USD 130) for everyone, including subscribers.
I think it would be best to convince publishers to offer digital versions (or access to their online archives) for a small fee, at least for subscribers. They shouldn't have to pay again for the same content. The scanning is just too much work.
High Speed Scanner = Expensive (Score:2)
There are a couple of problems.
(1) Very high speed scanners that are used in document imaging are often only b/w. So, you lose all of the nice pretty advertisements.
(2) You obviously have to cut apart all of your magazines on the spine to get them to go through an auto feeder.
(3) The paper is likely to get jammed because it's glossy and so thin.
(4) You'll spend better than $5,000 on any scanner that will do better than 10 ppm. And then you probably can't get duplex scanning.
But, if you want a solution that works great, look at a software package called Ascent Capture from Kofax (http://www.kofax.com), and throw in a Kodak 3500 document scanner or better.
Or if you want a solution that is affordable, invite all of your friends to bring their scanners over and give them beer and pizza while you all have a scanning party. It should only take about two hours to get through a year's worth of Linux Journal.
What I do (Score:1)
Quick specs:
27 ppm (duplex, so 54 images/minute)
50 sheet document feeder
4 seconds for a flatbed pass (@ 400dpi b/w). This is great for stuff you don't want to chop (works best with one person changing the page while another hits the scan button).
up to 400dpi b/w or grayscale (which is hard to find), 600dpi b/w with an 8MB SIMM.
It's SCSI, so you don't need the optical interface cards that some high-end duplex scanners require. But I think they only have TWAIN drivers for Windows.
You can find them on eBay for $1000-$2500. Retail, they go for about $3500-$4000.
I've had mine for a year and a half ($2200 on eBay, brand new), and have scanned about 30,000 pages without any problems. Some paper does give it feed problems, but then I just throw the next page in as the current one is fed in. It means I have to stand there, but it doesn't slow the process down.
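For anyone sizing up a job like this, here is a rough back-of-the-envelope sketch of feeder time in Python. It reads "27 ppm" as 27 sheets per minute with both sides captured in one pass, and uses the 50 sheet feeder from the specs above. The `scan_minutes` name and the 15 second reload time are assumptions made up for illustration, not measurements.

```python
import math

def scan_minutes(page_sides, sheets_per_minute=27, feeder_capacity=50,
                 reload_seconds=15.0):
    """Estimate minutes needed to duplex-scan `page_sides` printed sides.

    reload_seconds is an assumed pause each time the feeder is refilled.
    """
    sheets = math.ceil(page_sides / 2)            # duplex: two sides per sheet
    feed_minutes = sheets / sheets_per_minute     # time the feeder is running
    reloads = max(0, math.ceil(sheets / feeder_capacity) - 1)
    return feed_minutes + reloads * reload_seconds / 60.0

if __name__ == "__main__":
    # Example: a 100-page issue (100 printed sides, 50 sheets once the spine is cut)
    print(f"{scan_minutes(100):.1f} minutes")
```

This ignores jams, spine cutting and OCR time, so treat the result as a lower bound.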
I use TypeReader for OCR (by ExperVision), but it works fine with Omnipage or any other TWAIN compatible program. I've found the 400dpi (over the standard 300dpi) does help reduce OCR errors, enough to offset the extra file size increase.
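To put a number on the file size trade-off: pixel count grows with the square of the resolution, so 400dpi carries about 1.78 times as many pixels as 300dpi. The sketch below works that out for an uncompressed 1-bit page. The US Letter page size and the `raw_page_bytes` name are assumptions for illustration, and real files will be smaller once compressed.

```python
def raw_page_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned page (row padding ignored)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

if __name__ == "__main__":
    for dpi in (300, 400):
        print(dpi, "dpi:", raw_page_bytes(dpi), "bytes")
    print("ratio:", raw_page_bytes(400) / raw_page_bytes(300))
```

Passing `bits_per_pixel=8` gives the same estimate for grayscale.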
The bigger problem that you will find is what to do with all the info once it's scanned. You can just keep the
I personally use Folio Views (a flat file-based, free-form database) to organize my data, with several perl scripts to add the hierarchy structure, fix common OCR errors, etc. AskSam is another option. Of course, a real database is also possible, but requires more know-how and time than I have.
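As a rough idea of what an OCR cleanup pass can look like, here is a minimal sketch in Python (not the perl scripts mentioned above, which aren't shown here). The `CORRECTIONS` table and the `fix_ocr_text` name are hypothetical; a real table would be built from the errors your own OCR package tends to make.

```python
import re

# Hypothetical table of misread words -> intended words.
CORRECTIONS = {
    "tbe": "the",
    "Iinux": "Linux",
}

def fix_ocr_text(text, corrections=CORRECTIONS):
    """Rejoin words split across line breaks, then fix known misreads."""
    # "scan-\nner" -> "scanner"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    if not corrections:
        return text
    # Match whole words only, so "tbe" is fixed but "tbeory" is left alone.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in corrections) + r")\b"
    )
    return pattern.sub(lambda m: corrections[m.group(1)], text)

if __name__ == "__main__":
    print(fix_ocr_text("tbe Iinux scan-\nner"))
```

One caveat: the rejoin step also merges genuine compounds that happen to break at a line end ("high-\nspeed" becomes "highspeed"), so it is worth spot-checking the output.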
As for your question about legal concern for doing this, technically it _is_ violating the copyright. By taking it to electronic format, you are--for all intents and purposes--making a new "copy" of the material, which you didn't pay for. Also, the magazine publisher just might not want that stuff to be in electronic format. They have that right. By scanning it in without asking and receiving permission, you are violating that.
Ryan
you could use a digital camera (Score:1)
damn, I hit "submit" when I meant "preview". (Score:1)