Speech Recognition Isn't Dead

Robert Fortner's penned a fascinating post-mortem on speech recognition software. That's right, post-mortem. Because it's apparently very, very dead! Except that it's not.

Fortner's analysis is thorough, and to be fair, he's speaking mostly about the apparent stagnation of speech recognition research and accuracy improvement, not declaring speech recognition dead outright. The thesis:

The accuracy of computer speech recognition flat-lined in 2001, before reaching human levels. The funding plug was pulled, but no funeral, no text-to-speech eulogy followed. Words never meant very much to computers - which made them ten times more error-prone than humans. Humans expected that computer understanding of language would lead to artificially intelligent machines, inevitably and quickly. But the mispredicted words of speech recognition have rewritten that narrative. We just haven't recognized it yet.

The evidence, to irresponsibly summarize, runs something like this:

• The basic techniques for machine speech recognition haven't truly changed, well, ever
• Speech recognition word error rate fell precipitously from the early nineties to around 2001, but has plateaued at somewhere around 20%
• It's proven difficult to formalize grammar rules in a way that computers can understand and use, leaving speech recognition dependent nearly entirely on interpreting sounds, not context

Anyway, just give it a read. When you do, though, see if you notice what's missing. (Hint: It's in your pocket, probably.)

Fortner makes fleeting references to mobile and phone-centric speech recognition, and acknowledges that current technology is actually powerful enough to deal with the kinds of input needed for call center phone tree navigation, voice dialing and whatnot. What's conspicuously missing is the speech recognition we're seeing more and more every day: Mobile! Phones. Mobile phones.

Android has it, and Google has taken it to other platforms; Apple appears to be very interested in expanding voice search on its phones, and not just simple, one- or two-word queries. Apps from the very companies Fortner implies are waning (Dragon Dictation, for one) have proven extremely popular (and impressive) on the iPhone. The implicit assumption in the piece is that if desktop speech-to-text is on the wane (and it's pretty clear that it is), then so follows the entire dream of talking to a computer, period. That's where I disagree. In failing to find a place on the desktop, speech recognition has been forced to a place, mobile, where long-form dictation isn't as vital, and where its uses are much, much wider. I may never be able to tell my PC exactly what to do, but I won't really care: I'll be too busy talking with my phone. [Robert Fortner via Techmeme]