Open Source Software

Open Source Transcription Software? 221

sshirley writes "I am beginning to do some interviews with family members and will do some audio journals for genealogy purposes. I would really love to be able to run the resulting MP3 or WAV files through some software and get a text file out. I know that software like this exists commercially. But does this exist in the open source world?"
This discussion has been archived. No new comments can be posted.

  • Unfortunately... (Score:3, Interesting)

    by dmneoblade ( 848781 ) on Tuesday July 20, 2010 @06:55PM (#32971966)
    I spent several months searching for something like this. Open-source voice recognition is in really infant stages, and there does not seem to be much interest in improving the few things we have.
  • by itamblyn ( 867415 ) on Tuesday July 20, 2010 @06:55PM (#32971968) Homepage
    It seems like there should be some way to "hack" the audio transcription that google offers through google voice or youtube. Unfortunately I haven't found a way to upload a file. With youtube, if you make a fake movie, it gives an error that it can't be transcribed. Getting google voice to work would require some sort of phone interface I suppose...
  • XTrans (Score:2, Interesting)

    by ceraphis ( 1611217 ) on Tuesday July 20, 2010 @07:01PM (#32972042)
    Why don't you give XTrans a shot: XTrans [upenn.edu]
  • by afabbro ( 33948 ) on Tuesday July 20, 2010 @07:31PM (#32972380) Homepage
    ...you could always use RentACoder (er, Vworker.com now) and hire someone for pennies to do it.
  • by vrmlguy ( 120854 ) <samwyse&gmail,com> on Tuesday July 20, 2010 @08:02PM (#32972636) Homepage Journal

    I just slice everything up into segments of 60 seconds and let Google Voice transcribe it for me. Sure, some nay-sayers might point out that it's slower than transcribing it all manually, but they don't get that I'm getting Google to do the work for me!
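For anyone who wants to try the slicing step, here is a minimal sketch using only Python's standard `wave` module. It assumes the recording is an uncompressed PCM WAV file (an MP3 would have to be converted first), and the function name `split_wav` and the output naming scheme are made up for illustration. It only cuts the file; getting the pieces into any transcription service is a separate problem.

```python
import wave


def split_wav(path, segment_seconds=60, prefix="segment"):
    """Split a PCM WAV file into consecutive chunks of at most segment_seconds.

    Returns the list of output file paths. The name split_wav and the
    "<prefix>_NNN.wav" naming are hypothetical, chosen for this example.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        # Number of audio frames that make up one segment.
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of the recording
            out_path = f"{prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and frame rate; the frame
                # count in the header is corrected when the file closes.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```

Calling `split_wav("interview.wav")` would write `segment_000.wav`, `segment_001.wav` and so on into the current directory; the last piece is simply whatever is left over.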

  • Re:Got kids? (Score:5, Interesting)

    by Luckyo ( 1726890 ) on Tuesday July 20, 2010 @08:27PM (#32972860)

    This is one of the cases where the journey matters as much as, if not more than, the destination :)

  • by Anonymous Coward on Tuesday July 20, 2010 @08:46PM (#32973012)

    I looked into automatic transcription software too. I think the consensus is that none of it works well unless it is trained, and trying to "train" software with regular recordings of conversations is not likely to work.

    I wrote my own little application so that I could type the text in myself. It works with WinAmp so it's tied to Windows (Sorry! Time constraints...) From my web page:

    http://csclub.uwaterloo.ca/~jg3macka/GabbleFarb/index.html

    What it is:

    GabbleFarb is basically a glorified notepad application that works with WinAmp (a free audio and video player). A number of hotkey combinations exist to control WinAmp from inside GabbleFarb. As a transcriber, this allows you to easily pause, rewind, fast-forward and control volume levels without leaving the editor. Additionally, as a video or audio file is playing in WinAmp whenever the ENTER key is pressed GabbleFarb will begin the next line with a timestamp of the current playing time. Within the editor, you can then double-click on a line of text in your transcript and GabbleFarb will automatically tell WinAmp to start playback at that point in the file.
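The timestamp-per-line idea described above is easy to sketch. The snippet below is not GabbleFarb's code, and the `[HH:MM:SS]` layout is an assumption (the description does not say what the real format looks like). The helper names `format_timestamp` and `line_to_seconds` are hypothetical. One function builds the stamp that would start a new line, the other recovers the playback position from a line, which is what a double-click-to-seek feature would need.

```python
import re


def format_timestamp(seconds):
    """Turn a playback position in seconds into a "[HH:MM:SS]" prefix."""
    seconds = int(seconds)
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}]"


# Matches a stamp at the very start of a transcript line.
_STAMP = re.compile(r"^\[(\d+):(\d{2}):(\d{2})\]")


def line_to_seconds(line):
    """Return the playback position encoded at the start of a line.

    Returns None when the line does not begin with a timestamp.
    """
    match = _STAMP.match(line)
    if match is None:
        return None
    hours, minutes, secs = (int(part) for part in match.groups())
    return hours * 3600 + minutes * 60 + secs
```

A transcript line would then look like `[00:12:07] And that was the year we moved.`, and passing it to `line_to_seconds` gives the offset to hand to whatever player is in use.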

  • Re:Dear aunt, (Score:5, Interesting)

    by BitZtream ( 692029 ) on Tuesday July 20, 2010 @10:16PM (#32973550)

    Ironically, I have a family member who runs a business doing transcription for doctors ... because every time they try voice recognition software they get pissed off and go back to real people.

    Being a fan of Dragon Dictate myself, I know it's not that great and I know it has a fit when you start throwing accents at it, training or not.

    I call bullshit on your claims of using Dragon for everything.

  • Re:Dear aunt, (Score:3, Interesting)

    by binarybum ( 468664 ) on Tuesday July 20, 2010 @10:28PM (#32973608) Homepage

    wow, shame on the anonymous troll that posted this and the moderators that must have been teleported from the early 90s. The high-end transcription packages are truly incredible. Yes, you need to spend some time training them to your speech patterns and accent, and yes it makes a big difference if you use a quality microphone (not the one that's built into your laptop or iphone) at a fixed distance. With a decent setup transcription software can be really impressive at high speeds and with complicated vocabulary - talk to a doctor in a large modern hospital - many are trusting these systems with their patients' medical record information, and these guys have high expectations when it comes to transcriptions because they are used to having very skilled ears listen to them mumbling jargon quickly for their transcriptions.

    Having anything but a really good setup can be really frustrating though - maybe slashdot tinkerers have dabbled and written these kinds of apps off. I do imagine that it wouldn't be worthwhile using anything but the top dictation apps if you want to avoid any serious post-editing.

  • Doing it yourself... (Score:3, Interesting)

    by Cruciform ( 42896 ) on Tuesday July 20, 2010 @10:34PM (#32973642) Homepage

    When I did some medical transcription a couple of years ago it was up to me to do it myself, and I didn't find anything open source at the time.
    So I loaded up Amarok, configured global hotkeys to pause and jump forward and backward in the audio file in five-second steps, and then loaded up a word processor.
    Sure, it's not automatic, but it helped me get the job done.

    It took me 3 to 4 hours to transcribe each spoken hour of a group of strangers. When the subjects have familiar speech patterns, or when it's a single individual, I found progress was much faster.

  • Re:CMU Sphinx (Score:3, Interesting)

    by inkyblue2 ( 1117473 ) on Tuesday July 20, 2010 @11:17PM (#32973856)

    Sphinx by itself is a terrible answer to this problem, unfortunately. The code is free, but good luck finding an appropriate model. Worse, you'll need to train a speaker-dependent model to get any usable results, and this is a VERY non-trivial task with Sphinx tools in the state that they are. I spent several years getting paid to adapt Sphinx for commercial purposes and while it's great for some things, I can say with confidence that it is not the tool you're looking for.

    You know what works? Dragon. Hate to say it, but the commercial products here have a gigantic edge on the competition.

    That said, I'd love to see someone come up with an open source speaker-dependent model training system that's friendly enough for app developers (not speech researchers) to roll into projects. I think this is a big open door for contribution to the community. Sphinx isn't the best thing going, but it's certainly usable, and if a real product came into being I'm sure all the speech wonks would start coming out of the woodwork to improve the algorithms.

  • Re:Dear aunt, (Score:2, Interesting)

    by Mr. Pibb ( 26775 ) on Tuesday July 20, 2010 @11:43PM (#32973986)

    I call bullshit on your bullshit.

    I do occasional work for a Worker's Comp doc who has been working with Dragon for over 10 years. He swears by it.
    The work is an hour-long interview, and hours of paperwork. He dictates the report into a MiniDisc recorder while reviewing his notes and then plays the recording back into the computer, watching for errors (few) and reviewing. I've also set up several other docs in the same field with Dragon, and they're quite pleased with it as well.

    At first, he had to buy the latest HW and audio cards to get the best accuracy, but now runs Dragon virtualized on a 1st-gen MacBook without a problem. Dragon FTW!

  • Re:Dear aunt, (Score:3, Interesting)

    by micheas ( 231635 ) on Wednesday July 21, 2010 @03:24AM (#32974760) Homepage Journal

    I can see medical transcriptions being the best point of transcription software.

    The vocabulary is largely devoid of slang.

    You have long specialized lexicons that are similar to very few other words.

    The vocabulary is probably fairly small as most doctors have a fairly specialized practice, so internists don't deal with the same areas as podiatrists, reducing the words that are used.

    The repetition is probably fairly high, allowing for training to be more effective than speech on random topics.

    In conclusion, for what the original poster wants, voice recognition software is probably not viable, but if you have a medical practice, and are not a general practitioner, you may well find that voice recognition software is usable.

  • Re:Dear aunt, (Score:3, Interesting)

    by msclrhd ( 1211086 ) on Wednesday July 21, 2010 @04:02AM (#32974878)

    Your post highlights a key difference between written and spoken words -- we tend to contract words ("have a" to "hav.uh") and will flow one word into another ("said John" the d at the end of said and the d in the dZ sound merge, so the d at the end of said is dropped -- "sE dZ0n").

    Some people drop certain letters at the beginning and end of words -- "'e said 'what 'ave you been doin' today?'". This also makes it more complicated to transcribe. Not to mention regional dialect variations and strong accents.

    Then you have words like "four candles" "fork 'andles", "night train" "night rain" (http://en.wikipedia.org/wiki/Homophones) -- a lot of The Two Ronnies humour stemmed from word play that takes advantage of the difference between written and spoken speech and how the audience interprets them (see the Hieroglyph sketch for another classic example). 'Ello 'Ello did a similar thing as well.

  • Transana (Score:3, Interesting)

    by paugq ( 443696 ) <pgquiles@@@elpauer...org> on Wednesday July 21, 2010 @04:47AM (#32975046) Homepage
    It's not what you are asking for, but it sure will help you: Transana [transana.org]
  • by oergiR ( 992541 ) on Wednesday July 21, 2010 @08:57AM (#32976510)
    I'm doing my PhD on speech recognition. I think (and hope!) it's neither dead nor fully developed. Currently, changes of environment screw speech recognisers up. Different speakers, background noise... A trick that I heard has been used for subtitling television broadcasts is to have someone re-speak the words (which is not that hard). You could play the audio recordings on your headphones while repeating them into a microphone. If you're in a quiet room and the recogniser is trained on your voice, that may get you most of the way. You'll still want to correct transcriptions manually.

    I don't know of any good trained open-source speech recognisers. There are open-source back-ends like Sphinx or HTK (which I sort of work on) but you need massive transcribed training corpora to train a speech recogniser. This is expensive which I guess is why open-source speech recognition hasn't taken off. In the speech recognition group at my university, most people use Linux, and I don't think anyone actually uses a speech recogniser in their daily work.
