Education Technology

Large-Scale Paper-To-Digital Conversion? 459

Posted by timothy
from the that's-asking-a-lot-professor dept.
An anonymous reader writes "I've just been asked to digitize several dozen sets of lecture outlines at the university where I work. Basically, professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page. However, every desktop scanner I've ever used takes 1-2 minutes of user-attention per page and the resulting files end up Huge, impossible-to-read, or both. All I have at my disposal is my PowerBook, Acrobat, a couple hundred dollars of department funds for a new scanner (this maybe?), and, if I ask nicely, overnight use of the secretary's Win2k box. Any ideas? Sheet-fed scanner recommendations? Better file formats than PDF (or better PDF settings)? Do any of you students have usability advice?"

  • Get stuffed (Score:4, Insightful)

    by October_30th (531777) on Sunday May 23, 2004 @01:28PM (#9231472) Homepage Journal
    Uh. How about telling your prof. to get stuffed and get a real secretary.
  • by Space cowboy (13680) * on Sunday May 23, 2004 @01:31PM (#9231517) Journal
    Just say 'No'. (If you're being told, it's a different matter, of course).

    It sounds to me like a damned hard job to automate (which is the only way it's not going to be a constant drain on your time), and you're being given next-to-no resources to even come up with a creative solution. Sometimes the best answer is in fact 'No' - it forces people to re-evaluate what they're asking. It comes with the danger of being sacked if it's you that's being unreasonable, of course....

  • by Exocet (3998) on Sunday May 23, 2004 @01:36PM (#9231560) Homepage Journal
    "Ummm yeahhhh... if you could just do that..."

    Faust7 is right about this one. Frankly, OCR is ok, but not great - on nice text on book-or-better paper. Handwritten notes? With equations? No. Not unless your profs have some damn fine handwriting and we all know that that is absolutely not the case.

    My advice is the same as Faust7's with these additions: spend some of that money on a really nice keyboard, wrist-rest and/or maybe a nice monitor. You are going to be needing all three. If there are any leftover funds, get some really nice tea. I suggest Twinings English Breakfast or Prince of Wales, if you're going to go bagged.
  • Re:HP Copiers (Score:3, Insightful)

    by XaXXon (202882) on Sunday May 23, 2004 @01:36PM (#9231566) Homepage
    Will you please tell both of us where we can get one for a few hundred dollars, as specified in the question?

    I think the real answer is that this guy is S.O.L. .. he's just going to have to spend some good quality time getting to know a consumer-level scanner, and let the professor know to do his notes in software initially.
  • where to look (Score:3, Insightful)

    by bcrowell (177657) on Sunday May 23, 2004 @01:37PM (#9231575) Homepage
    Have a look at the archives of this mailing list, which is mainly populated by Project Gutenberg folks.

    But the broader question is whether this is really a good idea. The result is going to be huge files, which will be messy, hard to read, and will lack an index or table of contents. Seems like a case of profs with too much ego and not enough willingness to put their own work into more useful form.

  • Re:Outsource it (Score:5, Insightful)

    by cloudmaster (10662) on Sunday May 23, 2004 @01:41PM (#9231621) Homepage Journal
    Maybe he *is* the cheap manual labor / unpaid intern...
  • Re:Get stuffed (Score:3, Insightful)

    by October_30th (531777) on Sunday May 23, 2004 @01:41PM (#9231625) Homepage Journal
    WOW! That's *so* helpful! Just refuse to do the job your employer is paying you to do... DAMN... why didn't I think of that?

    How do you know he's getting paid to do it? Some professors have a nasty habit of getting all their nasty, menial and boring stuff done by their students who are already working on their degree projects 12 hours a day, six days a week.

    Ok, so for some reason I assumed that the poster is a student so my initial reaction was probably off. I would never assign such a menial, dead-end task to my postgrad students, nor would I have accepted such a task without objections when I was still a student.

  • Re:Get stuffed (Score:3, Insightful)

    by Walt Dismal (534799) on Sunday May 23, 2004 @01:42PM (#9231632)
    No, seriously, this request shows utter lack of concern by someone who may be a professor, but is also a bad manager and possibly an idiot. Your response perhaps should be to scope out the project and toss the estimate and the funding issue back into his lap. But do not let yourself be used as slave labor.
  • by malia8888 (646496) on Sunday May 23, 2004 @01:48PM (#9231681)
    I really agree with Space cowboy. My former husband was a college professor. He was very brilliant in his field, but anything outside of his narrow realm daunted him. He wanted to put pennies in our fusebox when the lights went out. He stared at a breaker box in the condo like it was the control panel of an alien spacecraft.

    Explain the enormity of this scratched-note-to-finished-PDF job to this educator. Use crayons, mirrors, yarn and tape if necessary to get your point across. Just be diplomatic :P

  • Re:Simple. (Score:5, Insightful)

    by GothChip (123005) on Sunday May 23, 2004 @01:48PM (#9231685) Homepage
    I know the parent post was funny but he's thinking along the right lines.

    Take the few hundred you have to spend on equipment and spend it hiring a few temps.

    A good typist should be able to type up hand written notes faster than scanning them all in and manually fixing all the mistakes.
  • Re:Get stuffed (Score:5, Insightful)

    by djplurvert (737910) on Sunday May 23, 2004 @01:49PM (#9231694)
    In addition to the points already made it is not unreasonable to simply tell the prof that his/her expectations are unreasonable. Perhaps "get stuffed" is a bit over the top but I've found that employers (even professors) will listen to reasonable explanations.

    I used to have a boss that would say things like "this should only take you about five minutes". I finally told him, "nothing takes just five minutes, if I have to stop what I'm doing there is a startup/teardown cost for every task." I convinced him that there was a granularity of 1/2 hour for every random task he wanted done. The discussion was fruitful for both of us, he was more reasonable about his expectations and put a bit more thought into what he wanted to distract me from my primary task to do.

    Now, the original idea is a reasonable proposition; however, it isn't really the sort of thing that should be done for just one prof. Perhaps several departments can combine their resources to set up something that will allow this type of thing to be done in a reasonable time frame.

  • Re:Simple. (Score:2, Insightful)

    by pendragn (107545) on Sunday May 23, 2004 @01:50PM (#9231704)

    Outsource the job to India.

    Not as bad an idea as it sounds. My advice is to not waste the department's money, and your time, buying, installing, and using a sheet feed scanner. Somebody in your local area assuredly has one already that they either rent out to people in your situation, or that they use to do the work you need done.

    Use the funds that the department gave you to have your local copy shop do the work. They will almost certainly do it faster than you could, and the end product will most certainly be better than what you could provide. This is the kind of thing that the people who work at copy shops do for a living.

    Also PDF is a great format for this, highly portable, and so far fairly version proof. You don't have to worry about the PDF being obsolete before the professor decides to change the structure of his class.

  • All you can do... (Score:5, Insightful)

    by cliffiecee (136220) on Sunday May 23, 2004 @01:51PM (#9231712) Homepage Journal
    Is say "Sure. I'll get this done- when I can. Don't expect it to be done for at least a few weeks, maybe longer."

    DON'T CLEAN UP THE SCANS. Don't even look at the scans. DO NOT RETYPE ANYTHING.

    With the kind of volume you say you're receiving, the only way you're going to survive is to:

    1. close your eyes,
    2. load the documents into the feeder,
    3. press 'scan'.
    4. Make sure everyone knows this policy.
  • by Anonymous Coward on Sunday May 23, 2004 @01:52PM (#9231720)
    Saying no is a good option; then follow it up by telling the teachers that if they want copies of their in-class notes, THEY are going to have to change their habits as well. So a better answer than no would be: NO, but if you can use a laptop to take notes (and join the 21st century with the rest of us), I can easily make copies of those for you.
  • by Bob Bitchen (147646) on Sunday May 23, 2004 @01:53PM (#9231729) Homepage
    Poorly set expectations. How did the professors get the idea that it was possible? It's not possible under the constraints that you are faced with. If money were not a limiting factor you could do this. But I'll assume money is a factor, and time as well. So go back and tell them that it's possible, but it's going to cost this much to automate the process, this much if you type it in by hand, this much if someone else does it but with poorer accuracy, and so on and so forth. Put the burden on them to decide how they want to deal with this. Only then will the appropriate solution be found and chosen.
  • by Anonymous Coward on Sunday May 23, 2004 @02:00PM (#9231792)
    Why do it all one way? It sounds like a very great deal of stuff that may never be used by students. Why not try to find a prof who will cooperate with letting you see his/her webpage usage patterns?

    In my experience, it is very hard to predict what students will use for any given class based on the moronic ramblings of /.ers claiming to represent all students. On the other hand, by trying different things in my classes, I've been able to find out what my students will use eagerly. Hint: It ain't the same type of thing for every class!!!!!!

    I'd like to say that you're at a really shitty university that would take this kind of student-hostile course of action, but then, I checked out MIT's Open Courseware only to find that the first course I looked at, Gilbert Strang's linear algebra, was a botch job. There was a postage-stamp-sized video of Strang telling anecdotes on the first day of class that could only be appreciated by someone who'd already taken the class. So much for leveraging the web's inherent strong points!
  • by davidoff404 (764733) on Sunday May 23, 2004 @02:06PM (#9231844)
    This is something I've come across a lot in the past. Unfortunately, unless you've got a lot of time on your hands, you're not going to be able to do anything beyond basic scanning of the notes. It's always nice to get lecture notes properly typeset with LaTeX (and Xfig if there are diagrams), but this isn't feasible for the amount of data you've got.

    Unfortunately, OCR software probably won't be much use to you either since academics' handwriting (especially those involved in the mathematical sciences) is almost universally poor. My best advice would be to (calmly) discuss the matter with those professors who gave you the notes, pointing out to them the futility of producing small, readable PDFs from handwritten notes. Maybe you'll convince them to start TeXing their notes in future. Good luck!
  • Re:Get stuffed (Score:5, Insightful)

    by Adian (104160) on Sunday May 23, 2004 @02:11PM (#9231876) Homepage
    On the contrary, it's your job as a professional and as an employee to keep your employers in tune with what is possible, and what is most efficient for the manhours/money involved. As an employee you are also responsible for keeping your employers informed of ways to save money, if there is a place this can be done. This particular job might require hundreds of manhours to do in-house, versus paying a place that actually specializes in these services to do it; I'd guess the university either has this equipment on campus, or already has a contract with some company for something similar.
    Besides, it sounds like they are not aware of the time involved in scanning tens, let alone hundreds, of pages. It doesn't sound like they are too anxious to make it easy for him to get the job done either (not buying him new equipment, using the secretary's Win2k box after hours??).
    I've volunteered my efforts before on a simple scanning job that required hundreds of regular photos to be scanned in at relatively good quality (why else do it otherwise), and it ended up taking forever. Upon being informed of the amount of time required, the client adjusted the way the job was being handled.
    I think being straight with your employers and clients is the best approach to any situation where too much is being expected. The times I've had these instances come up, and recommended different approaches that resulted in money being saved, or manhours on a task being reduced, I saw benefit in my paycheck through raises or promotions.
  • I agree totally. Some people tend to look at an admin as someone who does magic. They don't understand that some things either cost money or take time. Perhaps it would be better to give the people writing these things a laptop in the first place. It sounds like a great waste of time to duplicate the work when it should have been given to the admin in digital format in the first place.
  • professionally (Score:3, Insightful)

    by curator_thew (778098) on Sunday May 23, 2004 @02:27PM (#9231988)

    The professional approach is to go back to them and clarify the outcome:

    (a) you can scan the documents in, and they'll take X amount of space, and Y time; and this doesn't include OCR;
    (b) you did a few tests (using the supplied document) and these are the results for TIFF, JPG, PDF, etc;
    (c) OCR is probably infeasible (or not, do some tests) because of the nature of the documents;

    Include in (a) the option of purchasing an automated document scanner, and the corresponding reduction in time.

    Based upon all the above, get a clear go-ahead, and make the purchase if new equipment is authorised.

    You said "where I work": this is your job: it's a bit poor to do as the other posters suggest and refuse to do the work: you need to make sure that the customer (professors) understand exactly what they are getting, and give them a choice to buy into it or not - i.e. "clarify the expectations".

    If you assess that it's 2 weeks' worth of work, and the professors don't disagree, then your supervisor just has to put up with it.
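
The "X amount of space" in option (a) can be bounded before anything is scanned. A minimal sketch in Python, assuming letter-size pages and the 200 dpi figure suggested elsewhere in this discussion; the 100-page stack and the function name are illustrative assumptions, and real files will be smaller once JPEG or fax-style compression is applied.

```python
def raw_page_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=8):
    """Uncompressed size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


PAGES = 100  # assumption: one professor's "100+ page" stack

# 8 bits per pixel for grayscale, 1 bit per pixel for pure black and white.
gray_total = PAGES * raw_page_bytes(200)
bilevel_total = PAGES * raw_page_bytes(200, bits_per_pixel=1)

print(f"grayscale: {gray_total / 1e6:.1f} MB uncompressed")
print(f"bilevel:   {bilevel_total / 1e6:.1f} MB uncompressed")
```

The eightfold gap between the two totals is why the choice of bit depth belongs in the test results reported under (b), alongside the choice of file format.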

  • by madstork2000 (143169) * on Sunday May 23, 2004 @02:45PM (#9232100) Homepage
    It makes no sense at all to me to have a PDF created of handwritten notes, since most students will probably just download and print out the PDF anyway. The only advantage is that it may save a few trees, since not everyone will print them out.

    It sounds like the school wants to shift the production costs (i.e. printing) to the students. This seems inefficient compared with the old way, where the instructor could go to the copy center and have the notes copied at the school's expense (I know these expenses are often passed along to the students anyway), rather than at the students' DIRECT expense: their time for downloading, then printing on their own equipment or with their own printing accounts at the computer center.

    If the notes were being OCR'd and then made available on-line, or post-processed so that they were searchable, indexed, etc., it would be useful. Otherwise this seems like a waste of time and money.

  • by spizm (626209) on Sunday May 23, 2004 @02:47PM (#9232115)

    The company I work at scans large amounts of documents to PDF format on a daily basis. Depending on the volume some people do, we use either a Canon DR-3060 or DR-5020 document scanner. These will scan both sides of a page simultaneously, clean up the image (despeckle and deskew) and convert them into TIF or PDF all on the fly. They're fast too. Between 20 and 50 pages per minute. Only problem is that they're expensive.

    For your budget, you may be able to afford the Canon DR-2080C, which goes for around $600. It has all the features of the more expensive ones, but it's meant for smaller volumes like what you're dealing with. With that, you'd be able to scan 100 pages into a PDF document in around 5 minutes.
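
To put those speeds next to the original poster's complaint, here is a rough Python sketch. It assumes the 20 to 50 pages per minute quoted above for sheet-fed units and the 1 to 2 minutes of attention per page reported in the question for desktop scanners; the 100-page stack is an illustrative assumption, and jams or re-feeds are ignored.

```python
def scan_minutes(pages, pages_per_minute):
    """Minutes needed to feed `pages` through a scanner at a given rate."""
    return pages / pages_per_minute


PAGES = 100  # assumption: one stack of lecture notes

sheetfed_slow = scan_minutes(PAGES, 20)  # low end of the quoted range
sheetfed_fast = scan_minutes(PAGES, 50)  # high end of the quoted range
flatbed_best = PAGES * 1                 # 1 minute of attention per page
flatbed_worst = PAGES * 2                # 2 minutes of attention per page

print(f"sheet-fed: {sheetfed_fast:.0f} to {sheetfed_slow:.0f} minutes")
print(f"flatbed:   {flatbed_best} to {flatbed_worst} minutes")
```

The "100 pages in around 5 minutes" figure corresponds to the low end of that range, 20 pages per minute.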

  • by timeOday (582209) on Sunday May 23, 2004 @03:00PM (#9232193)
    Yes, it makes a PDF of all the pages, but each page is just a picture. There's no way to search for text in the result.
    There is no way you're going to solve that problem with one person and a couple hundred dollars.

    I know there are Adobe archival systems that store the scanned image, along with whatever text they manage to recognize. You don't expect near 100% OCR accuracy from an old, largely handwritten sheaf of lecture notes and transparencies. But hopefully enough is recognized to be of some use.

  • by pikine (771084) on Sunday May 23, 2004 @03:06PM (#9232224) Journal

    I think what your professor wants is not a bitmapped copy of his handwritten notes or some vector curves that resemble such, but actually a typeset version of the lecture notes. If that is the case, assuming that his handwritten notes are sparse (and hopefully without diagrams, since it takes more time to mess around with them), you can definitely do a stack of 100 sheets in a week, or, as someone already suggested, hire some typists to help you out.

  • by dougmc (70836) on Sunday May 23, 2004 @03:20PM (#9232288) Homepage
    Xerox bundles OCR as a software add-on. It works well when you get it all set up at your company. By the time you get back to your desk, the document is open and ready to be OCR'd with a drag and drop.
    The original question said that the notes were handwritten. Has anybody had any sort of success whatsoever in reading handwriting with OCR? (Not that I'm aware of.)
  • by Anonymous Coward on Sunday May 23, 2004 @03:28PM (#9232331)
    1. Get Dragon NaturallySpeaking.
    2. Dictate the essay, albeit a bit lengthy, into it.
    3. Import to Word or your favorite word processor.
    4. Add any cool equations and such that you cannot dictate.
    5. Publish to PDF.

    Nice small file size I'm sure.

    Scanning is nice, but it only works with fonts it can recognize. Not Professorese.

    It could take you a day or so to dictate, but after you're finished, more than likely you will have a lot fewer spelling and random letter and symbol problems.

    But again, this might be more work than you want to do. Why? Well, if you do it this way, and make a nice clean portable document that everyone can read, you might find yourself getting more "extra work" than you wanted.
  • by misanthrope101 (253915) on Sunday May 23, 2004 @03:48PM (#9232474)
    I've used a flatbed for this type of thing, and it works, but it takes forever and it's frustrating. It isn't hard, mind you, but time-consuming and mind-numbing. The first 30 pages are easy and then you get really really sick of it. If you do scan it yourself, you don't need more than 200dpi or so, and you can save as high-quality JPEG. This isn't artwork, and there is no need for perfection. Acrobat will accept any image file. I'd scan with a standalone image program (I use ACDSee and it works well) and then feed the images into Acrobat. But as far as a recommendation...

    Have it professionally done, like other people here have recommended. High-end sheetfed scanners are great, but you probably can't afford one, and it wouldn't make sense as a one-time expense for this small of a job. I'm a big fan of just handing someone some money and it's magically accomplished.

    Alternatively, use a digital camera and well-lit copy stand. You can improvise a copy stand with a tripod or whatever, but make sure you have a lot of light. It's a lot faster than using a scanner, and the results are acceptable if you have a good camera. The more megapixels the better - don't use the old 1.3mp one you have lying around. 3mp will technically work, but more is better. Ideally a digital SLR pointed straight down at the page, a very well-lit area (a clamp light on either side of the page works nicely), and you sitting there sipping Starbucks while you hit a cable shutter release after you flip every page. You could get a few hundred pages an hour done this way--your only limitation is how fast you can turn the pages. You'd only have to stop to transfer images to your computer, and you only have to do that often if you don't have enough memory cards. After you get all the pages into the computer, feed them into Acrobat and you're done.
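
The megapixel advice can be checked against the "200dpi or so" rule of thumb with a little arithmetic. A hedged Python sketch follows: the sensor dimensions are typical 4:3 and 3:2 sizes picked for illustration, not measurements from any particular camera, and the page is assumed to fill the frame exactly, which is optimistic.

```python
def effective_dpi(px_long, px_short, page_long_in=11.0, page_short_in=8.5):
    """Best-case dots per inch when a letter page just fits inside the frame."""
    # The tighter of the two axes limits the usable resolution.
    return min(px_long / page_long_in, px_short / page_short_in)


# Assumed sensor sizes in pixels (long side, short side).
SENSORS = {
    "1.3 MP": (1280, 960),
    "3 MP": (2048, 1536),
    "6 MP": (3008, 2000),
}

for name, (px_long, px_short) in SENSORS.items():
    print(f"{name}: about {effective_dpi(px_long, px_short):.0f} dpi on a letter page")
```

Under these assumptions a 3 MP camera lands a little short of 200 dpi, which fits the "technically work, but more is better" verdict, while 1.3 MP falls well below it.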

    If you don't want to use Acrobat you could make a web page with thumbnails linked to the hi-res images. Then your end-users wouldn't need to download the Acrobat reader. I love Acrobat's ubiquity but hate the file sizes and the slow start-up time.

  • Mass Scanning (Score:2, Insightful)

    by carldot67 (678632) on Sunday May 23, 2004 @04:10PM (#9232613)
    I looked into this once for a client. Agencies charge around 5c a page, but that is only to scan. Add more for OCR, manual verification and/or transfer to M$ Word or what-have-you. I think I recall seeing 50c a page for such value-adds. Agencies are good because you don't need to buy the kit (30K and up) or watch it run (they need feeding and jam quite a lot, especially if the paper is lower quality). Agencies also make sense for shops with nil/low expectation of producing more paper in the future. Get some quotes, references and examples of their work and start with a short trial run.
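
Those per-page rates can be set against the budget in the original question. A rough Python sketch, with loud assumptions: "several dozen sets" is read as 36, each set as 100 pages, and "a couple hundred dollars" as $200. Only the 5c and 50c rates come from the post above, and amounts are kept in whole cents to avoid floating-point rounding.

```python
SETS = 36                 # assumption: "several dozen" read as three dozen
PAGES_PER_SET = 100       # assumption: "often 100+ page" stacks
BUDGET_CENTS = 200 * 100  # assumption: "a couple hundred dollars"


def job_cost_cents(pages, cents_per_page):
    """Total agency charge in cents."""
    return pages * cents_per_page


def pages_within_budget(budget_cents, cents_per_page):
    """How many whole pages the budget pays for at a given rate."""
    return budget_cents // cents_per_page


total_pages = SETS * PAGES_PER_SET
for label, rate in (("scan only", 5), ("scan plus OCR and checks", 50)):
    cost = job_cost_cents(total_pages, rate)
    covered = pages_within_budget(BUDGET_CENTS, rate)
    print(f"{label}: ${cost / 100:.2f} for {total_pages} pages; "
          f"budget covers {covered} pages")
```

On these numbers, scan-only work for the whole pile roughly fits the budget, while the value-added rate covers only a few stacks.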
  • Don't bother (Score:3, Insightful)

    by An Onerous Coward (222037) on Sunday May 23, 2004 @07:16PM (#9233719) Homepage
    Frankly, I've seen professors' handwritten lecture notes, and 90% of them add nothing to the educational process. Certainly not more than a quick note saying, "Read sections 2.1, 2.2, and 2.4, paying special attention to least-squares curve fitting and finding orthonormal bases." They're generally disorganized and difficult to follow because they usually take a lot of material for granted when they write.

    The mere fact that it's handwritten means that it's basically a rough draft that was hastily flung together. Send them back to him, and have him type them in and rework them until he figures they're worth recycling for next semester. The prof will save time in the long run, and the students will have something nice, clean, and organized to peruse.
  • by im a fucking coward (695509) on Monday May 24, 2004 @01:01AM (#9235250)
    Basically, professors want to hand me a big (often 100+ page) stack of their handwritten lecture notes (with messy text, equations, and diagrams; sometimes double-sided) and expect me to post a PDF-or-something-similar to their course's web page.

    After I stopped laughing, I realized this may be a serious inquiry rather than a joke. I've assisted local government agencies in converting clear, printed, 8.5x11" text documents into searchable text / pdf documents, and the cost for these is over 10 cents a page. (Tax and mill levy records have to be verified 100% correct, as I'm sure your prof's notes need to be.) That's with volume discounting (> 500,000 pages), using nearly perfect ascii text documents, not scribbled notes.

    So my advice is to get a few bids from outside contractors, then submit a realistic estimate based on the average. Hint: Given those specs, it's clear you/your management have no idea what's involved in this process. (Shows at least a modicum of IQ that you had the good sense to ask, however.) If you simply need to scan/save as pics (jpg/tiff -> pdf), you can do this yourself at reasonable cost/effort expenditure. It seems to be implied that you need OCR capabilities for handwritten text, as complicated as equations at that, so you're really pretty screwed. Even simply creating 100-200 KB JPGs and emailing them in an automated process is going to run into problems when the campus mail servers refuse to accept attachments larger than a meg.

    Good luck, BWAhahahahaha!
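
The attachment problem is easy to quantify. A small Python sketch, assuming the 100-200 KB per page and the 1 MB cap mentioned above, a 100-page stack, and no allowance for mail encoding overhead (base64 adds roughly a third, so real counts would be higher); the function name is made up for illustration.

```python
import math


def messages_needed(pages, kb_per_page, cap_kb=1024):
    """Number of emails required if each one must stay under the size cap."""
    per_message = cap_kb // kb_per_page
    if per_message == 0:
        raise ValueError("a single page already exceeds the attachment cap")
    return math.ceil(pages / per_message)


PAGES = 100  # assumption: one stack of notes

for kb in (100, 200):
    total_mb = PAGES * kb / 1024
    print(f"{kb} KB/page: {total_mb:.1f} MB total, "
          f"{messages_needed(PAGES, kb)} messages under a 1 MB cap")
```

Either way, one stack cannot travel as a single attachment, so posting the file to the course web page directly is the simpler route.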
