
Ask Slashdot: Effective, Reasonably Priced Conferencing Speech-to-Text?

Posted by samzenpus
from the keep-talking dept.
First time accepted submitter DeafScribe writes "Every year during the holidays, many people in the deaf community lament the annual family gathering ritual because it means they sit around bored while watching relatives jabber. This morning, I had the best one-on-one discussion with my mother in years courtesy of her iPhone and Siri; voice recognition is definitely improving. It would've been nice if conference-level speech-to-text had been available this evening for the family dinner. So how about it? Is group speech to text good enough now, and available at reasonable cost for a family dinner scenario?"
This discussion has been archived. No new comments can be posted.

  • captions (Score:5, Insightful)

    by phantomfive (622387) on Monday December 30, 2013 @03:13PM (#45820779) Journal
    Go find some YouTube videos with auto-captioning. That is the upper limit on the quality you will get with today's technology.

    Good luck.
  • by TWX (665546) on Monday December 30, 2013 @03:19PM (#45820863)

    video transcribers also quite expensive

    Based on what I get on my TV when I press the Mute button, they really shouldn't be...

  • by jettoblack (683831) on Monday December 30, 2013 @03:40PM (#45821103)

    There's no perfect solution, but something that works for 60% might already be better than nothing.

    I work in the closed captioning industry, and I'd say anything less than 95% accuracy is actually WORSE than nothing. Automatic Speech Recognition (ASR) has no concept of context or situational awareness. The mistakes it makes tend to be not in the simple common words and phrases, but concentrated in the nouns, especially proper nouns: names of people, places, companies, products, etc. Even at 80% accuracy, which is quite good for the current best speaker-independent ASR systems, you're looking at 2 words out of every 10 being replaced with the wrong word, which can completely change the meaning of a phrase. Imagine the chaos if (major news network)'s closed captioning reported some celebrity or politician as saying "I'm not a fan of Jews." when they actually said "I'm not a fan of juice." (Which would be 83% accurate: five of the six words are right!) Wars have been started over one misheard word out of a thousand; imagine how bad 200 out of 1000 would be.
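
    For anyone who wants to check the arithmetic, here is a minimal sketch of how a word-level accuracy figure like that 83% can be computed. It assumes accuracy means 1 minus the word error rate (word-level edit distance divided by the number of words actually spoken), which is a common convention but not necessarily the exact metric any particular captioning vendor uses. The function names are made up for illustration.

```python
def word_edit_distance(ref, hyp):
    """Minimum number of word substitutions, insertions, and deletions
    needed to turn the reference word list into the hypothesis word list."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the recognizer
                            curr[j - 1] + 1,      # extra word inserted
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(reference, hypothesis):
    """Word accuracy = 1 - (edit distance / number of reference words).
    Assumption: simple lowercase whitespace tokenization, punctuation kept."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


spoken = "I'm not a fan of juice."
captioned = "I'm not a fan of Jews."
print(f"{word_accuracy(spoken, captioned):.0%}")  # prints 83%
```

    Note that the metric treats every word as equally important, so a single wrong noun costs the same as a wrong "the", even though only one of them changes the meaning.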

    Here's an article about a HUMAN transcription error that caused a pretty major ruckus. Now imagine this kind of problem being an order of magnitude worse:,,20693447,00.html []

    People who lost their hearing later in life tend to do better with high-error-rate ASR because they know what words sound like and can figure out easy substitutions, e.g. Juice vs. Jews, Election vs. Erection, etc. People who were born deaf or lost their hearing before language acquisition cannot easily make these substitutions in their head, because they don't "hear" the word sounds when they read them.
