Communicating with machines: What the next generation of speech recognizers will do

February 16, 2004 | Source: KurzweilAI

“If we want to communicate with a machine as we would with a human, the basic assumptions underlying today’s automated speech recognition systems are wrong,” said former AT&T Bell Labs scientist B.H. “Fred” Juang, now professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology, speaking at the annual meeting of the American Association for the Advancement of Science.

“To have real human-machine communication, the machine must be able to detect the intention of the speaker by compiling all the linguistic cues in the acoustic wave. That’s much more difficult than what the existing technology was designed to do: convert speech to text.”

In the real world, human speech mixes with noise — which may include the speech of another person. Speaking pace varies, and people group words in unpredictable ways while peppering their conversations with “ums” and “ahs.”

Speech researchers chose statistical models known as Hidden Markov Models to match sounds to words and fit them into grammatical structures. That approach has performed well for simple tasks, but it often produces errors that make the result of speech-to-text conversion difficult for humans to understand — and even worse for natural human-machine communication.
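The article does not show the underlying math, but the core computation in an HMM-based recognizer is Viterbi decoding: finding the most probable sequence of hidden states (roughly, phones) given a sequence of acoustic observations. The sketch below is a minimal illustration in Python; the states, observation symbols, and all probabilities are toy values invented for this example, whereas real recognizers use thousands of context-dependent phone states and continuous acoustic features.

```python
# Minimal Viterbi decoding over a toy Hidden Markov Model.
# All states, symbols, and probabilities here are hypothetical,
# chosen only to illustrate how acoustics map to a phone sequence.

# Hypothetical phone-like hidden states and discrete acoustic symbols
states = ["sil", "k", "ae", "t"]  # silence plus the phones of "cat"

# P(first state)
start_p = {"sil": 0.7, "k": 0.3, "ae": 0.0, "t": 0.0}

# P(next state | current state): a simple left-to-right topology
trans_p = {
    "sil": {"sil": 0.5, "k": 0.5, "ae": 0.0, "t": 0.0},
    "k":   {"sil": 0.0, "k": 0.3, "ae": 0.7, "t": 0.0},
    "ae":  {"sil": 0.0, "k": 0.0, "ae": 0.4, "t": 0.6},
    "t":   {"sil": 0.2, "k": 0.0, "ae": 0.0, "t": 0.8},
}

# P(acoustic symbol | state)
emit_p = {
    "sil": {"quiet": 0.8,  "burst": 0.1,  "vowel": 0.05, "stop": 0.05},
    "k":   {"quiet": 0.1,  "burst": 0.7,  "vowel": 0.1,  "stop": 0.1},
    "ae":  {"quiet": 0.05, "burst": 0.05, "vowel": 0.8,  "stop": 0.1},
    "t":   {"quiet": 0.1,  "burst": 0.2,  "vowel": 0.1,  "stop": 0.6},
}

def viterbi(obs):
    """Return the most probable hidden-state path for the observations."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the best predecessor state for s at time t
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the most probable final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

print(viterbi(["quiet", "burst", "vowel", "stop"]))
# -> ['sil', 'k', 'ae', 't']  (the toy acoustics decode as "cat")
```

The brittleness Juang describes is visible even in this sketch: the model commits to one best path through fixed transition and emission tables, so noise, variable pacing, or disfluencies that fall outside those tables simply get forced into the nearest-fitting word sequence.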

The next generation of speech communication technology will require new mathematical approaches that go beyond Hidden Markov Models, he believes. Researchers at university and corporate research labs worldwide have already begun working on the problem.

Georgia Institute of Technology Research News