Slashdot
Recorded Speech to Text Software?
Posted by
Cliff
on Fri Jan 23, 2004 08:23 PM
from the seeking-lil-help-from-the-processor dept.
shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed word, that is: listen for 10 seconds, write what was said, repeat. At around 4 hours per hour-long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need some software that favors accurate speech recognition over speed -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best; non-free and Windows solutions are okay, but a working solution is the highest priority. Can anyone point me in the right direction(s)?"
This discussion has been archived.
No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
Lo tek is the way to go in this instance (Score:5, Interesting)
It's cost effective, as fast as you need it to be and best of all more accurate than any software solution to date. Most software packages are still at only about 90% accuracy, so that's still 24 minutes per four hour tape that you'll need to correct, and you'll still probably have to listen to the whole thing over again in order to verify the accuracy of any software program.
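The 24-minute figure follows from simple arithmetic, sketched below. It assumes (as the comment implicitly does) that correction time is proportional to the share of audio the software gets wrong; the function name and parameters are hypothetical, chosen only for illustration.

```python
def minutes_needing_correction(tape_minutes, accuracy_percent):
    """Estimate how many minutes of a tape would be misrecognized.

    Assumption: errors are spread evenly, so the misrecognized share of
    the tape equals (100 - accuracy_percent) percent of its length.
    """
    return tape_minutes * (100 - accuracy_percent) / 100


# A four-hour tape at roughly 90% recognition accuracy.
print(minutes_needing_correction(4 * 60, 90))
```

In practice the errors are scattered through the whole recording, which is why the full tape still has to be listened to again.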
Re:Lo tek is the way to go in this instance (Score:4, Interesting)
(Last Journal: Thursday March 11 2004, @12:40AM)
Simple Suggestion (Score:2, Insightful)
1) Give your neighbour's kid $10 to transcribe the tape one afternoon
2)
3) Text!
Not really, the technology isn't there (Score:4, Insightful)
The technology to do this isn't really there. If the machine can learn how you speak, it can do it. If you limit yourself to just a few words (1000 perhaps?) it is easier. To do it in general for random speakers though?
The problem is people are too varied. I have trouble understanding people from the "deep south". The accent is too thick for my ears. I'm sure they have the same problem with my accent.
That isn't to say don't try it, but don't get your hopes up. Voice recognition is hard, and isn't done well. Just be glad you only have a few to do; my sister's full-time job is typing things like that (most of less interest, as she describes it).
Existing software (Score:3, Interesting)
(Last Journal: Monday December 22 2003, @01:52PM)
From the Dragon page:
True Continuous Speech - Speak to your computer naturally and at a normal pace--without pausing between words. Your spoken words swiftly appear on your computer screen.
You must be joking (Score:4, Insightful)
(http://www.factcheck.org/)
Re:You must be joking (Score:5, Insightful)
(http://slashdot.org/)
Court reporters do this kind of thing for a living and some (all?) are contract workers. They can do it in real time and would probably be quite happy to be able to do it all at home rather than in a deposition room or court room. Oh, and their accuracy would be a lot higher than if you did it yourself without checking or if you hired a student to do it.
Though a tech solution would be cool...
Slow the playback down (Score:5, Informative)
Years ago, I improved my own typing speed and accuracy by transcribing phone conversations with friends. It just takes some practice.
Of course, if you are listening to this guy [demon.co.uk], you can disregard my advice.
Sphinx (Score:5, Informative)
(http://kurobox.com/)
Hire a professional (Score:5, Insightful)
(http://www.threesquirrels.com/)
A person who does transcription for a living will do it faster, probably cheaper, and will be able to handle all of the quirks of human speech that will gum up the works of a voice to text program.
There are still places where a machine cannot match the quality of a real live person.
automatic transcription (Score:1)
(http://www.livejournal.com/users/k4_pacific | Last Journal: Tuesday May 25 2004, @10:16PM)
1. Convert the tapes to *.mp3 files.
2. In the xterm type:
cat speech.mp3 | transcribe > speech.txt
3. Write a program called transcribe that converts audio data on stdin to text on stdout.
4. Redo step 2 now that you have a program.
Karma: desrever
I'm facing the same issue. (Score:1)
(http://www.superbad.com/)
SuSE 7.3 (Score:3, Informative)
(http://www.outshine.com/)
If you can get a copy of SuSE 7.3 Professional, it comes with IBM's ViaVoice for Linux. It can take audio and turn it into text. The trick is that 7.3 came out about 2 years ago, I think. Most stores would have the newer 9.0 version, which doesn't have ViaVoice.
I guess it is possible that IBM still sells ViaVoice for newer distros. I've never looked.
If you had large volumes (Score:1)
(http://pietersz.co.uk/ | Last Journal: Wednesday May 04 2005, @05:22AM)
There are companies that do this sort of stuff but I suspect your volumes are way below what they handle.
Some (slightly) OT Advice (Score:3, Informative)
When the time came to transcribe the interview, she found the version on her PC more helpful -- her hands never had to leave the keyboard in order to pause or "rewind" the audio.
If you go this route, remember that you'll need about 600 MB per hour of uncompressed audio. If space is an issue and you need to compress, don't max out the compression; saving a few megabytes here and there could result in hours of extra work due to artifacts.
[1] With explicit permission given.
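The figure of about 600 MB per hour is consistent with CD-quality PCM audio. The comment does not state a sample format, so the defaults below (44.1 kHz, 16-bit, stereo) are an assumption, and the function name is hypothetical. A rough sketch:

```python
def uncompressed_bytes(seconds, sample_rate_hz=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of raw PCM audio of the given duration.

    Defaults assume CD-quality audio (44.1 kHz, 16-bit, stereo); the
    original comment does not say which format was used.
    """
    return seconds * sample_rate_hz * bytes_per_sample * channels


one_hour = uncompressed_bytes(3600)
print(f"CD quality: {one_hour / 2**20:.0f} MiB per hour")

# Interview speech rarely needs stereo or 44.1 kHz; 16 kHz mono is a
# common choice for speech and is several times smaller.
speech = uncompressed_bytes(3600, sample_rate_hz=16_000, channels=1)
print(f"16 kHz mono: {speech / 2**20:.0f} MiB per hour")
```

Lowering the sample rate and recording in mono reduces size without introducing the kind of codec artifacts the comment warns about.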
Try this (Score:1, Informative)
forget it, it isn't feasible (Score:2)
(Last Journal: Tuesday June 24 2003, @10:34AM)
Genetic Programming may be the best way (Score:1)
(Last Journal: Friday March 05 2004, @06:48PM)
I see this as an ideal area for genetic programming. Imagine that the person transcribes, by hand, a few random places on the tape: maybe a total of 4 random places from the recorded material, at about a sentence (or a paragraph) per spot. The program then generates an initial population of genetic programs, which breed based on how well their output fits your initial transcription. After a few rounds of this you take the best-fit individuals, or something like that. If the speaker is the same throughout the audio then this may work. It may have problems with multiple speakers. I would imagine that if you scaled this up and used a large body of recorded English speech (court transcripts) with audio of the same transcripts and some Lisp genetic programming, the programs would localize to the English dialect in general. Granted, this would take a lot of training and weeding out of bad populations. Interesting idea. See the Wired article: http://hotwired.wired.com/collections/genetics/5. for some interesting ideas of how this could work.
L.A. on the East Coast
Re:Outsource (Score:1, Informative)