John Seabrook wrote a recent feature in The New Yorker about the interactive-voice-response (I.V.R.) systems commonly used on customer service and tech support telephone hotlines. Seabrook spent time at B.B.N. Technologies watching these systems transcribe callers' words and analyze their tone of voice for emotion. While breaking down the history of automated telephone services and voice recognition innovations, he attempts to tackle the larger question of whether we can create a fully conversational, quasi-conscious robot, akin to 2001: A Space Odyssey's HAL 9000. Judging from the experts interviewed for the piece, the answer is a resounding no.
- While machines that could accurately reproduce the sound of human speech, such as Wolfgang von Kempelen's talking head, have been around since the late 1700s, no device has been able to learn the syntactical rules necessary for generating conversation.
- Secondly, the act of hearing and interpreting is more difficult to instill in a machine because of the on-the-fly signal processing that would be required. The complexity of the ear allows it to pick up on the most subtle nuances in sound (according to the article, people can distinguish between hot and cold coffee just by hearing it poured into a glass).
- Roger Schank is a philosopher-programmer who has spent his professional life trying to create a conscious computer that not only has a memory, but can also learn. After years in the field, Schank is skeptical it will ever happen. He says replicating idle chatter, and the sheer complexity of speech in general, is beyond the abilities of current scientists.
- Steven Pinker, a Harvard cognitive scientist, says that natural speech could rely on the breadth of one's knowledge, which is "extraordinarily difficult" to impart to a computer.
- R&D efforts in speech recognition began in the 1950s and '60s, but researchers are still hung up on the number of ways to communicate the word "yes." Speech engineers for Nuance found that Southerners in the U.S. tend to add "sir" or "ma'am" to responses, whereas Northerners do not. And "Valley Girl" speak tends to make computers interpret declarative statements as questions.
- Finding it difficult to make a computer able to "learn," scientists turned to brute-force computing and algorithms that relied upon massive amounts of data. But in 1969, high-ranking Bell Labs staffer John Pierce wrote that a speech machine that could recognize, but not understand, was utterly pointless.
- The big emphasis in speech recognition has now moved to emotional analysis, which still uses algorithms to estimate a caller's state of mind. Stanford researcher Elizabeth Shriberg says it's impossible to compare emotions in acted speech to emotions in real speech. The escalation of anger, for example, happens in smaller, more subtle increments in authentic speech.
- The most promising breakthrough in emotional recognition is an aggression detector that has been deployed throughout parts of Europe. Sound Intelligence was able to recreate the processes of the inner ear on a computer, which spawned a device that could learn the sounds of different objects in action and identify them. The Dutch city of Groningen has placed this technology in its pubs, where it alerts the nearest police station if it detects excessively aggressive speech. But as Seabrook comments, "This is no HAL."
- Other research labs, like the Speech Analysis and Interpretation Laboratory, have turned to facial recognition to glean emotional insight, but have come up dry. "Emotions aren't discrete," lab chief Shrikanth Narayanan told The New Yorker. "They are a continuum, and it isn't clear to any one perceiver where one emotion ends and the other begins." To add insult to injury, there hasn't been any real demand for emotional recognition outside the call center arena.
So while we might not ever see a robot become a Nobel Laureate, there is one lesson to be learned from this New Yorker piece: never talk freely while on hold with customer service. Seabrook learned while at B.B.N. Technologies that calls are still recorded while you're on hold so that your emotional state can be assessed. After a profanity-laced tirade, one annoyed caller took a couple of hits from his bong, waited a little longer, and hung up. [The New Yorker]