Recorded Speech to Text Software?
shfted! asks: "Recently, I've been given the task of transcribing several dozen audio tapes of interviews to typed word, that is: listen for 10 seconds, write what was said, repeat. At around 4 hours per hour-long tape, I would like to automate the process somehow. Recording the tape into the computer is no problem, but I need some software that will do the speech recognition accurately rather than quickly -- several hours per tape is not an issue (I have access to several machines running 24/7). I will still have to go over the computer's work to correct any mistakes. A free solution for Linux would be best; non-free and Windows solutions are okay, but a working solution is the highest priority. Can anyone point me in the right direction(s)?"
Re:Outsource (Score:1, Informative)
Re:Outsource (Score:1)
These professionals have all the equipment (the foot pedal, etc.) and have been "trained" to understand the accent, etc. They also have references and journals to look up a term if they are not familiar with it. On top of it all, there are several quality chec
Lo tek is the way to go in this instance (Score:5, Interesting)
It's cost effective, as fast as you need it to be and best of all more accurate than any software solution to date. Most software packages are still at only about 90% accuracy, so that's still 24 minutes per four hour tape that you'll need to correct, and you'll still probably have to listen to the whole thing over again in order to verify the accuracy of any software program.
Re:Lo tek is the way to go in this instance (Score:1, Funny)
Well, if you can do one tape in four hours, then two people can do one tape in two hours.
More accurately, two people can do two tapes in four hours. While this is equivalent to one tape in two hours, it doesn't mean that two people can listen to two portions of the same tape at once.
When I used to install DSL equipment, we often found that our suppliers substituted items on us, like 20 2" screws instead of 40 1" screws. I mean, it's all still 40", right?
Re:Lo tek is the way to go in this instance (Score:2, Insightful)
Tapes can be copied on off time. If they are standard audio cassette tapes, then they are not more than 45 minutes per side anyway, so you are looking at several tapes anyway.
Even assuming the worst case, 1 tape that is 4 hours long, you can feed the output of the player into the input of a computer, do an Ogg (or MP3) rip of the stream, and then fast forward to different places. There will be issues merging the copies, but still much less time per person than one person doing the entire thing. (but more work
Re:Lo tek is the way to go in this instance (Score:1, Insightful)
Tapes can be copied on off time.
They can, but then you're just adding to the total time -- now the students still have to listen to the tapes, and someone has to copy them. Better to just give the students two separate tapes in the first place.
It was something of a joke, of course.
Re:Lo tek is the way to go in this instance (Score:4, Interesting)
Re:Lo tek is the way to go in this instance (Score:2)
Re:Lo tek is the way to go in this instance (Score:2)
Automatic silence removal would also speed things up.
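For anyone who wants to try the silence-removal idea, here is a minimal sketch in Python using only the standard library. It assumes the tape has already been recorded to a mono, 16-bit PCM WAV file on a little-endian machine. The function names, the amplitude threshold of 500, and the timing defaults are illustrative guesses, not values from this thread; they would need tuning against real tape hiss.

```python
import array
import wave


def drop_silence(samples, rate, threshold=500, min_silence_s=0.5, keep_s=0.1):
    """Shorten every quiet run of at least min_silence_s seconds to keep_s seconds.

    samples is a sequence of signed 16-bit sample values, rate is in Hz.
    Quiet runs shorter than min_silence_s are left untouched.
    """
    min_run = int(min_silence_s * rate)
    keep = int(keep_s * rate)
    out = array.array("h")
    run_start = None  # index where the current quiet run began, if any

    def flush(run):
        # Keep short pauses whole; cut long ones down to `keep` samples.
        out.extend(run if len(run) < min_run else run[:keep])

    for i, s in enumerate(samples):
        if abs(s) < threshold:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None:
                flush(samples[run_start:i])
                run_start = None
            out.append(s)
    if run_start is not None:
        flush(samples[run_start:])
    return out


def strip_wav(src_path, dst_path, **kwargs):
    """Read a mono 16-bit WAV, shorten its long silences, write the result."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit WAV files")
        rate = src.getframerate()
        # Assumes native byte order matches WAV's little-endian samples.
        samples = array.array("h", src.readframes(src.getnframes()))
    trimmed = drop_silence(samples, rate, **kwargs)
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(trimmed.tobytes())
```

A short stub of each pause is kept on purpose, so the transcriber can still hear where one speaker stops and the next begins.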
Re:Lo tek is the way to go in this instance (Score:1)
Re:Lo tek is the way to go in this instance (Score:2)
Still, I agree a software solution is unlikely to be better. What he should do is rapidly rip the tape to memory and then play it back at whatever speed he wants.
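One crude way to get variable-speed playback once the tape is on disk is to rewrite the WAV header with a lower sample rate, so any ordinary player plays the same samples more slowly. The sketch below assumes an uncompressed PCM WAV file; the function name and the 0.75 factor are made up for illustration. Note that this also lowers the pitch, and real transcription tools typically use time-stretching to avoid that.

```python
import wave


def slow_down(src_path, dst_path, factor=0.75):
    """Copy a WAV file, scaling its declared sample rate by `factor`.

    factor < 1 slows playback (and lowers pitch); factor > 1 speeds it up.
    The audio samples themselves are copied unchanged.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # Override only the sample rate; everything else stays as recorded.
        dst.setframerate(int(params.framerate * factor))
        dst.writeframes(frames)
```

For example, `slow_down("interview.wav", "interview_slow.wav", factor=0.5)` would make a one-hour file play for two hours (the file names here are hypothetical).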
Maybe there's some software that could guess the words for him, so he just has to decide in real time whether the program is right or wrong. Could cut it down to 1.5 hours to do 1 hour of audio. Heck, it might even be faster if the speaker i
Re:Lo tek is the way to go in this instance (Score:1)
Re:Lo tek is the way to go in this instance (Score:3, Insightful)
Simple Suggestion (Score:2, Insightful)
1) Give your neighbour's kid $10 to transcribe the tape one afternoon
2)
3) Text!
Not really, the technology isn't there (Score:4, Insightful)
The technology to do this isn't really there. If the machine can learn how you speak, it can do it. If you limit yourself to just a few words (1000 perhaps?) it is easier. To do it in general for random speakers though?
The problem is that people are too varied. I have trouble understanding people from the "deep south". The accent is too thick for my ears. I'm sure they have the same problem with my accent.
That isn't to say don't try it, but don't get your hopes up. Voice recognition is hard, and isn't done well. Just be glad you only have a few to do; my sister's full-time job is typing things like that. (Most of it is of less interest, as she describes it.)
Re:Not really, the technology isn't there (Score:2, Interesting)
The state of the art for arbitrary news broadcasts is about a 20% word error rate. While this isn't good enough for the poster's needs, it turns out to be almost good enough for indexing.
Wonder when we'll start seeing Google return audio and video along with text documents? There's a research project demo of this happening here [limsi.fr].
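For readers unfamiliar with the metric, word error rate is usually computed as the word-level edit distance (substitutions + deletions + insertions) between the recognizer's output and a reference transcript, divided by the number of words in the reference. The following is a minimal sketch under that usual definition; the function name is made up, and real evaluations also normalize case and punctuation first, which this does not do.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)
```

With this definition, `word_error_rate("the quick brown fox jumps", "the quick brown box jumps")` gives 0.2: one wrong word in five, which is the 20% level mentioned above.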
Re:Not really, the technology isn't there (Score:2)
Existing software (Score:3, Interesting)
From the Dragon page:
True Continuous Speech - Speak to your computer naturally and at a normal pace--without pausing between words. Your spoken words swiftly appear on your computer screen.
Re:Existing software (Score:1, Funny)
Speak to your computer naturally and at a normal pace
Yeah, drag in soft wear works wwww under fully four me. I'm you sing it write now.
Re:Existing software (Score:2, Interesting)
Hell, the new voice-activated voice-mail menus that have been popping up when I dial customer service sometimes force me to say my phone number out loud. And even to do that accurately, I have to speak very slowly and quite loudly. More often, I just press random buttons until I get dumped to a live operator. (Try it, it works!)
Re:Existing software (Score:1)
Some computers that ask your address only have pre-programmed values, so if your street doesn't match one of their values (a common non-matching value is "THE"), you'll have to repeat it 3 times (while in between listening to a computer voice say that it didn't understand and ask you to repeat) before it will give up and connect you to someone live.
Re:Existing software (Score:2)
I do that too. Last week, while calling Oracle tech support, I was pushing random keys as I was forced to listen AGAIN to the long-winded and patronising "did I know about metalink?" message, desperately hoping to MAKE IT STOP; however, it dumped me out to an operator in Switzerland.
Re:Existing software (Score:2)
You must be joking (Score:4, Insightful)
Re:You must be joking (Score:2)
Re:You must be joking (Score:5, Insightful)
Court reporters do this kind of thing for a living and some (all?) are contract workers. They can do it in real time and would probably be quite happy to be able to do it all at home rather than in a deposition room or court room. Oh, and their accuracy would be a lot higher than if you did it yourself without checking or if you hired a student to do it.
Though a tech solution would be cool...
Re:You must be joking (Score:3, Informative)
Basically, rather than typing in characters to form words, they are typing in syllables to form words. Sometime later they transcribe the shorthand into full text. So while they record speech in real time, they are not transcribing it into full text.
And somewhere back in my brain ISTR that pretty much all US court proceedings have been recorded on audio tape for decades. I know for a fact that the loca
there's also closed caption people (Score:1)
Re:You must be joking (Score:2)
Re:You must be joking (Score:1)
I didn't know it at the time, but you can also use it to slow down the speech, like slow motion for video; I think that makes it much easier to type.
I gave up on it, though I don't know if that was because I also needed to translate/synchronize and wasn't a native speaker. I advise
Slow the playback down (Score:5, Informative)
Years ago, I improved my own typing speed and accuracy by transcribing phone conversations with friends. It just takes some practice.
Of course, if you are listening to this guy [demon.co.uk], you can disregard my advice.
Re:Slow the playback down (Score:2)
Re:Slow the playback down (Score:2)
Dvorak keyboards (Score:1)
Re:Dvorak keyboards (Score:2)
1. Get hired to transcribe tapes at $40CAN/tape.
2. Buy a Dvorak keyboard (cost: probably $200CAN minimum)
3. Spend a year getting really proficient with a new layout
4. Whip through those transcriptions like it ain't nobody's business.
5. ???
6. Profit.
Re:Dvorak keyboards (Score:2)
Re:Dvorak keyboards (Score:2)
Sphinx (Score:5, Informative)
Re:Sphinx (Score:2)
Hire a professional (Score:5, Insightful)
A person who does transcription for a living will do it faster, probably cheaper, and will be able to handle all of the quirks of human speech that will gum up the works of a voice to text program.
There are still places where a machine cannot match the quality of a real live person.
Re:Hire a professional (Score:2)
Being able to get a computer to do it would be cool, but there's a reason why companies hire people with this skill - the technology just isn't there yet. Even if it were possible to buy/build a sy
automatic transcription (Score:1)
1. Convert the tapes to *.mp3 files.
2. In the xterm type:
cat speech.mp3 | transcribe > speech.txt
3. Write a program called transcribe that converts audio data on stdin to text on stdout.
4. Redo step 2 now that you have a program.
Karma: desrever
Re:automatic transcription (Score:1)
Re:automatic transcription (Score:2, Funny)
Way to make a dick of yourself on Slashdot #445:
NOT USE THE BLOODY PREVIEW BUTTON.
Re:automatic transcription (Score:1)
I'm facing the same issue. (Score:1)
SuSE 7.3 (Score:3, Informative)
If you can get a copy of SuSE 7.3 Professional, it comes with IBM's ViaVoice for Linux. It can take audio and turn it into text. The trick is that 7.3 came out about 2 years ago, I think. Most stores would have the newer 9.0 version, which doesn't have ViaVoice.
I guess it is possible that IBM still sells ViaVoice for newer distros. I've never looked.
If you had large volumes (Score:1)
There are companies that do this sort of stuff but I suspect your volumes are way below what they handle.
Some (slightly) OT Advice (Score:3, Informative)
When the time came to transcribe the interview, she found the version on her PC more helpful -- her hands never had to leave the keyboard in order to pause or "rewind" the audio.
If you go this route, remember that you'll need about 600 MB per hour of uncompressed audio. If space is an issue and you need to compress, don't max out the compression; saving a few megabytes here and there could result in hours of extra work due to artifacts.
[1] With explicit permission given.
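The 600 MB per hour figure is roughly what CD-quality audio works out to. As a quick check, here is the arithmetic as a small sketch; the 44.1 kHz, 16-bit, stereo defaults are an assumption about what "uncompressed" means here, and the function name is made up.

```python
def uncompressed_bytes(seconds, sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Size in bytes of raw PCM audio, ignoring the small file header."""
    return seconds * sample_rate * bytes_per_sample * channels


cd_quality_hour = uncompressed_bytes(3600)
# 3600 s * 44,100 samples/s * 2 bytes * 2 channels = 635,040,000 bytes,
# i.e. about 635 MB (or about 606 MiB) per hour.

speech_hour = uncompressed_bytes(3600, sample_rate=22_050, channels=1)
# Mono at half the sample rate: 158,760,000 bytes, about 159 MB per hour.
```

Since interview tapes are speech from a mono source, recording in mono at a lower sample rate is one way to save space without introducing compression artifacts, though whether 22.05 kHz is enough for a given tape is a judgment call.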
Re:Some (slightly) OT Advice (Score:2)
Re:Some (slightly) OT Advice (Score:2)
In most US states, as long as at least one of the parties in a phone conversation knows about the taping, the taping doesn't break the law. This means you can tape any calls you make or receive. Basically, the relevant laws only affect an uninvolved 3rd party recording a conversation they listen in on without the main parties of the call knowing about it.
However, I say "most states", not all, so you might want to verify this for where you live.
Re:Some (slightly) OT Advice (Score:2)
Nope, federal law requires both parties to know *UNLESS* there's a warrant authorizing such a thing (ie. wiretap).
This is why Linda Tripp was prosecuted for recording her phone conversations with Monica Lewinsky.
Re:Some (slightly) OT Advice (Score:1)
Re:Some (slightly) OT Advice (Score:2)
My bad, after further research, it is a state by state issue unless it crosses state lines (as in the Tripp/Lewinsky case).
Try this (Score:1, Informative)
Re:Try this (Score:2)
forget it, it isn't feasible (Score:2)
Genetic Programming may be the best way (Score:1)
I see this as an ideal area for genetic programming. Imagine that the person transcribes, by hand, a few random places on the tape. Maybe a total of 4 random places from the recorded material, at about a sentence (or a paragraph) per spot. The program then generates an initial population of genetic programs that breed based on how well their output fits your initial transcription. After a few rounds of this you take the best-fit individuals or something like that. If the speaker is the same o