Recorded Speech to Text Software?

Posted by Cliff on Fri Jan 23, 2004 08:23 PM
from the seeking-lil-help-from-the-processor dept.
shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed text, that is, listening for 10 seconds, writing what was said, and repeating. At around 4 hours per hour-long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need some software that will do the speech recognition accurately rather than quickly -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best; non-free and Windows solutions are okay, but a working solution is the highest priority. Can anyone point me in the right direction(s)?"
This discussion has been archived. No new comments can be posted.
  • by Txiasaeia (581598) <[kungpowfriesens] [at] [gmail.com]> on Friday January 23 2004, @08:27PM (#8072094)
    Several hours per tape is acceptable? Well, if you can do one tape in four hours, then two people can do one tape in two hours. In other words, hire a college student at minimum wage for a contract position (i.e., until the tapes are transcribed) and go to it.

    It's cost-effective, as fast as you need it to be, and, best of all, more accurate than any software solution to date. Most software packages are still at only about 90% accuracy, so that's still 24 minutes per four-hour tape that you'll need to correct, and you'll probably still have to listen to the whole thing again to verify the accuracy of any software program.
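
    The arithmetic behind that figure can be sketched as follows. This is only a rough illustration: it assumes recognition errors are spread evenly through the audio, and `minutes_needing_correction` is a hypothetical helper name, not part of any speech package.

```python
def minutes_needing_correction(audio_minutes, accuracy_percent):
    """Minutes of audio whose transcript is expected to be wrong.

    Assumes errors are uniformly distributed through the recording,
    which is an illustrative simplification.
    """
    return audio_minutes * (100 - accuracy_percent) / 100


four_hours = 4 * 60  # minutes of audio
print(minutes_needing_correction(four_hours, 90))  # 24.0
```

    Real correction time would likely be higher, since finding the errors means listening to the whole recording again.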

  • Simple Suggestion (Score:2, Insightful)

    by Anonymous Coward on Friday January 23 2004, @08:28PM (#8072096)
    Given that half-decent speech recognition is still struggling, might I suggest:

    1) Give your neighbour's kid $10 to transcribe the tape one afternoon
    2) ...
    3) Text!
  • by bluGill (862) on Friday January 23 2004, @08:31PM (#8072112)

    The technology to do this isn't really there. If the machine can learn how you speak, it can do it. If you limit yourself to just a few words (1000 perhaps?) it is easier. To do it in general for random speakers though?

    The problem is people are too varied. I have trouble understanding people from the "deep south". The accent is too thick for my ears. I'm sure they have the same problem with my accent.

    That isn't to say don't try it, but don't get your hopes up. Voice recognition is hard, and isn't done well. Just be glad you only have a few to do; my sister's full-time job is typing things like that. (Most of it is less interesting, as she describes it.)

  • Existing software (Score:3, Interesting)

    by skinfitz (564041) on Friday January 23 2004, @08:37PM (#8072153)
    (Last Journal: Monday December 22 2003, @01:52PM)
    What about simply plugging the tape into a system running Dragon NaturallySpeaking [dragontalk.com] or IBM ViaVoice [ibm.com]?

    From the Dragon page:
    True Continuous Speech - Speak to your computer naturally and at a normal pace--without pausing between words. Your spoken words swiftly appear on your computer screen.
  • You must be joking (Score:4, Insightful)

    by Radical Rad (138892) on Friday January 23 2004, @08:38PM (#8072158)
    (http://www.factcheck.org/)
    Just do the tapes. It will take longer to screw with software setup and cleanup than to just do it. But if you either buy or rig up a foot switch to play/rewind the tape I think it would help. Also I am assuming you are a touch typist. If not then get someone who is to do this job for you.
    • Re:You must be joking (Score:5, Insightful)

      by splattertrousers (35245) on Friday January 23 2004, @08:46PM (#8072203)
      (http://slashdot.org/)
      If not then get someone who is to do this job for you.

      Court reporters do this kind of thing for a living and some (all?) are contract workers. They can do it in real time and would probably be quite happy to be able to do it all at home rather than in a deposition room or court room. Oh, and their accuracy would be a lot higher than if you did it yourself without checking or if you hired a student to do it.

      Though a tech solution would be cool...

  • Slow the playback down (Score:5, Informative)

    by billh (85947) on Friday January 23 2004, @08:48PM (#8072220)
    Slow the playback down and type them as you listen. If you can't do this, hire someone who can. I know many people that can keep up with spoken conversations in real-time.

    Years ago, I improved my own typing speed and accuracy by transcribing phone conversations with friends. It just takes some practice.

    Of course, if you are listening to this guy [demon.co.uk], you can disregard my advice.
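
    If the tape has already been recorded to an uncompressed WAV file, one crude way to slow it down is to copy the samples unchanged and declare a lower frame rate in the header. The sketch below uses Python's standard `wave` module; the function name `slow_down_wav` and the file names in the usage note are assumptions for illustration. Note that this simple approach lowers the pitch along with the speed.

```python
import wave


def slow_down_wav(src_path, dst_path, speed=0.75):
    """Write a copy of a PCM WAV file that plays back at `speed` times
    the original rate (0.75 means 25% slower).

    The audio samples are copied unchanged; only the declared frame rate
    is lowered, so the pitch drops together with the speed.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")

    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    new_rate = int(params.framerate * speed)
    if new_rate < 1:
        raise ValueError("speed is too low for this file's frame rate")

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)        # same channels and sample width
        dst.setframerate(new_rate)   # lower rate means slower playback
        dst.writeframes(frames)
```

    For example, `slow_down_wav("interview.wav", "interview_slow.wav", speed=0.75)` would produce a file that takes a third longer to play. Many audio players offer a speed control that avoids the pitch change, which may be easier on the ears.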

  • Sphinx (Score:5, Informative)

    by jcausey (253286) on Friday January 23 2004, @09:03PM (#8072296)
    (http://kurobox.com/)
    Give Sphinx [sourceforge.net] a try. It's pretty accurate, especially Sphinx-3. I've used v2 before for a live test, and it works great -- even with different voices.
  • Hire a professional (Score:5, Insightful)

    by rueger (210566) on Friday January 23 2004, @09:57PM (#8072557)
    (http://www.threesquirrels.com/)
    If your hours of tape are something that has to be transcribed accurately, don't waste your time trying to do it with a computer.

    A person who does transcription for a living will do it faster, probably cheaper, and will be able to handle all of the quirks of human speech that will gum up the works of a voice to text program.

    There are still places where a machine cannot match the quality of a real live person.
  • Here's what you do:

    1. Convert the tapes to *.mp3 files.
    2. In the xterm type:
    cat speech.mp3 | transcribe > speech.txt
    3. Write a program called transcribe that converts audio data on stdin to text on stdout.
    4. Redo step 2 now that you have a program.

    Karma: desrever
  • Can't be done in this instance. The best software solution is to dictate the tape into one of the commercial SR products. This presupposes you have already trained the software with your voice or have the time to do so. Hiring stenographers is an option, but you will still take quite a long time checking their work for accuracy and making corrections, and the less you pay them, the more work there will be to fix the transcripts up.
  • SuSE 7.3 (Score:3, Informative)

    by Anthony Boyd (242971) on Saturday January 24 2004, @03:03AM (#8073734)
    (http://www.outshine.com/)

    If you can get a copy of SuSE 7.3 Professional, it comes with IBM's ViaVoice for Linux. It can take audio and turn it into text. The trick is that 7.3 came out about 2 years ago, I think. Most stores would have the newer 9.0 version, which doesn't have ViaVoice.

    I guess it is possible that IBM still sells ViaVoice for newer distros. I've never looked.

  • by the_womble (580291) on Saturday January 24 2004, @04:41AM (#8073948)
    (http://pietersz.co.uk/ | Last Journal: Wednesday May 04 2005, @05:22AM)
    What people with large amounts of this sort of stuff do is outsource it offshore.

    There are companies that do this sort of stuff but I suspect your volumes are way below what they handle.

  • Some (slightly) OT Advice (Score:3, Informative)

    by travail_jgd (80602) on Saturday January 24 2004, @10:21AM (#8074834)
    A friend was in a similar situation -- she had recorded a phone interview [1], and needed to transcribe it. To make certain there were no technical glitches, the interview was recorded to cassette and as a WAV file on her PC.

    When the time came to transcribe the interview, she found the version on her PC more helpful -- her hands never had to leave the keyboard in order to pause or "rewind" the audio.

    If you go this route, remember that you'll need about 600 MB per hour of uncompressed audio. If space is an issue and you need to compress, don't max out the compression; saving a few megabytes here and there could result in hours of extra work due to artifacts.
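
    The 600 MB figure is consistent with CD-quality audio, which can be checked with a quick calculation. This is a sketch assuming plain uncompressed PCM; the helper name `pcm_bytes_per_hour` and the lower-quality "speech" settings are illustrative assumptions, not something from the original recording setup.

```python
def pcm_bytes_per_hour(sample_rate_hz, bits_per_sample, channels):
    """Size in bytes of one hour of uncompressed PCM audio (header ignored)."""
    bytes_per_second = sample_rate_hz * (bits_per_sample // 8) * channels
    return bytes_per_second * 3600


# CD quality: 44.1 kHz, 16-bit, stereo
cd_quality = pcm_bytes_per_hour(44_100, 16, 2)
print(cd_quality / 1_000_000)  # 635.04 (MB per hour)

# Assumed lower setting that is often adequate for speech: 16 kHz, 16-bit, mono
speech_quality = pcm_bytes_per_hour(16_000, 16, 1)
print(speech_quality / 1_000_000)  # 115.2 (MB per hour)
```

    Recording in mono at a lower sample rate is one way to save space without introducing compression artifacts.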

    [1] With explicit permission given.
  • Try this (Score:1, Informative)

    by Anonymous Coward on Saturday January 24 2004, @12:19PM (#8075452)
    http://download.com.com/3000-7239-10251419.html?tag=lst-0-12
  • by ralphclark (11346) on Saturday January 24 2004, @01:49PM (#8075969)
    (Last Journal: Tuesday June 24 2003, @10:34AM)
    Interviews are not formal speech. People mumble and slur their words in interviews. They cough, rustle and make non-verbal vocalisations. There is no software that can deal with this. With the best voice recognition software you would be lucky to capture 30% of what was said, and that's *after* all the fricking about with settings etc. Quicker just to type it in.
  • by EastCoastLA (129478) on Saturday January 24 2004, @04:59PM (#8077227)
    (Last Journal: Friday March 05 2004, @06:48PM)
    Interesting Idea!
    I see this as an ideal area for genetic programming. Imagine that the person transcribes, by hand, a few random places on the tape. Maybe a total of 4 random places from the recorded material, about a sentence (or a paragraph) at each spot. The program then generates an initial population of genetic programs, which breed based on their fitness as measured against your initial transcription. After a few rounds of this you take the best-fit individuals or something like that. If the speaker is the same throughout the audio then this may work. It may have problems with multiple speakers. I would imagine that if you scaled this up and used all of recorded English speech (court transcripts) with audio of the same transcripts and some Lisp genetic programming, the programs would localize to the English dialect in general. Granted, this would take a lot of training and weeding out of bad populations. Interesting idea. See the Wired article: http://hotwired.wired.com/collections/genetics/5.05_symbolic_ai_pr.html for some interesting ideas of how this could work.
    L.A. on the East Coast
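
    One way the fitness step might look is sketched below. It scores a candidate program's output against the hand-made transcription using word-level edit distance (the basis of the usual "word error rate" measure). This is an assumption about how fitness could be defined, not something taken from the Wired article, and the names `word_edit_distance` and `fitness` are hypothetical.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists
    (insertions, deletions and substitutions each cost 1)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp_words, 1):
            cur.append(min(
                prev[j] + 1,                           # delete ref_word
                cur[j - 1] + 1,                        # insert hyp_word
                prev[j - 1] + (ref_word != hyp_word),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def fitness(reference_text, candidate_text):
    """Score in [0, 1]; 1.0 means the candidate matches the hand transcription."""
    ref = reference_text.lower().split()
    hyp = candidate_text.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    error_rate = word_edit_distance(ref, hyp) / len(ref)
    return max(0.0, 1.0 - error_rate)


print(fitness("the quick brown fox", "the quick brown box"))  # 0.75
```

    Each individual in the population would be run on the hand-transcribed audio segments and ranked by this score; how the individuals themselves turn audio into text is the hard part and is not addressed here.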
  • Re:Outsource (Score:1, Informative)

    by Anonymous Coward on Friday January 23 2004, @08:32PM (#8072118)
    The people who do lots of these (such as transcription services for doctors) use India. You need educated people, by the way; they have to know the words being used.