Who did you hear, me or your lying eyes?

(With apologies to Richard Pryor)
September 8, 2013

Our understanding of language may depend more heavily on vision than previously thought, University of Utah bioengineers have discovered.

What did you hear? (video credit: Three Gun Rose Productions)

“For the first time, we were able to link the auditory signal in the brain to what a person said they heard when what they actually heard was something different. We found vision is influencing the hearing part of the brain to change your perception of reality — and you can’t turn off the illusion,” says the new study’s first author, Elliot Smith, a bioengineering and neuroscience graduate student at the University of Utah.

“People think there is this tight coupling between physical phenomena in the world around us and what we experience subjectively, and that is not the case.”

The McGurk effect

The brain considers both sight and sound when processing speech, but when the two differ slightly, visual cues dominate. The phenomenon is named for Scottish cognitive psychologist Harry McGurk, who pioneered studies on the link between hearing and vision in speech perception in the 1970s. The McGurk effect has been observed for decades, but its origin in the brain has remained elusive.

Temporal lobe recordings to test the McGurk effect (credit: Department of Neurosurgery, University of Utah)

In the new study in the open-access journal PLOS ONE, the University of Utah team pinpointed the source of the McGurk effect by recording and analyzing brain signals in the temporal cortex, the region of the brain that typically processes sound.

The researchers recorded electrical signals from the brain surfaces of four epileptic adult volunteers who were undergoing surgery to treat their epilepsy.

These four test subjects were then asked to watch and listen to videos focused on a person’s mouth as he or she said the syllables “ba,” “va,” “ga” and “tha.” Depending on which of three different videos was being watched, the patients had one of three possible experiences as they watched the syllables being mouthed:

— The motion of the mouth matched the sound. For example, the video showed “ba” and the audio sound also was “ba,” so the patients saw and heard “ba.”

— The motion of the mouth obviously did not match the corresponding sound, like a badly dubbed movie. For example, the video showed “ga” but the audio was “tha,” so the patients perceived this disconnect and correctly heard “tha.”

— The motion of the mouth only was mismatched slightly with the corresponding sound. For example, the video showed “ba” but the audio was “va,” and patients heard “ba” even though the sound really was “va.” This demonstrates the McGurk effect — vision overriding hearing.

By measuring the electrical signals in the brain while each video was being watched, the researchers could determine whether auditory or visual brain signals were being used to identify the syllable in each video. When the syllable being mouthed matched the sound, or did not match it at all, brain activity tracked the sound that was heard.

However, when the McGurk effect video was viewed, the activity pattern changed to resemble what the person saw, not what they heard. Statistical analyses confirmed the effect in all test subjects.
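To make the logic of that comparison concrete, here is a minimal sketch in Python. It is not the authors’ analysis pipeline; the electrode counts, trial counts, random placeholder data, and the simple correlation-based template comparison are all assumptions for illustration. The idea it shows is the one described above: ask whether a McGurk-trial response pattern looks more like the pattern evoked by the syllable that was heard or the one that was seen.

```python
# Minimal sketch (hypothetical data, not the study's actual method):
# compare a McGurk-trial response to "heard" vs. "seen" templates built
# from the congruent conditions, using simple pattern correlation.
import numpy as np

rng = np.random.default_rng(0)

def template(trials: np.ndarray) -> np.ndarray:
    """Average response across trials -> (electrodes x time) template."""
    return trials.mean(axis=0)

def correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two flattened response patterns."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])

# Hypothetical congruent-condition recordings: trials x electrodes x time
heard_va_trials = rng.normal(size=(40, 16, 200))   # audio "va" with video "va"
seen_ba_trials  = rng.normal(size=(40, 16, 200))   # audio "ba" with video "ba"

heard_template = template(heard_va_trials)
seen_template  = template(seen_ba_trials)

# One hypothetical McGurk trial: audio "va" paired with video "ba"
mcgurk_trial = rng.normal(size=(16, 200))

r_heard = correlation(mcgurk_trial, heard_template)
r_seen  = correlation(mcgurk_trial, seen_template)

# If the response resembles the SEEN template more, vision has "won" --
# the pattern the study reports for McGurk trials.
label = "seen (visual) syllable" if r_seen > r_heard else "heard (auditory) syllable"
print(f"r_heard={r_heard:.3f}, r_seen={r_seen:.3f} -> pattern resembles the {label}")
```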

“This innovation gives us a more complete picture about how everyday language perception is carried out in the brain,” Smith told KurzweilAI. “This tells us that for cortical brain machine interfaces, information from many senses could be useful, and processed in areas of the brain we didn’t know could process information from those senses.”

The researchers suggest that artificial hearing devices and speech-recognition software could benefit from a camera, not just a microphone. The findings could also help researchers sort out how language processing goes wrong when visual and auditory inputs are not integrated correctly, such as in dyslexia.
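As a rough illustration of the camera-plus-microphone idea, the sketch below fuses per-frame audio and lip-region features before classification. It is not a real speech-recognition system, and the feature shapes (13 MFCC-like audio values, 8 lip-shape values per frame) are hypothetical; it only shows the early-fusion step that lets a single model weigh both senses, roughly as the McGurk result implies the brain does.

```python
# Minimal sketch of audio-visual feature fusion (hypothetical features,
# not a production speech-recognition system).
import numpy as np

def fuse_features(audio_feats: np.ndarray, visual_feats: np.ndarray) -> np.ndarray:
    """Early fusion: trim to a common frame count, then concatenate per frame."""
    n = min(len(audio_feats), len(visual_feats))
    return np.concatenate([audio_feats[:n], visual_feats[:n]], axis=1)

rng = np.random.default_rng(1)
audio_feats  = rng.normal(size=(120, 13))   # e.g., 13 MFCC-like values per frame
visual_feats = rng.normal(size=(118, 8))    # e.g., 8 lip-shape values per frame

fused = fuse_features(audio_feats, visual_feats)
print(fused.shape)   # (118, 21): each frame now carries sound + sight
```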

This study also raises an interesting question: do clever manipulators (politicians, preachers, salespeople, etc.) take advantage of the McGurk effect (consciously or not)? One way to test that might be to listen to them without video, then repeat with video and see if your perception of some words changes.

The video shows a fricative consonant (“f” or “v” in this case) overriding a plosive (“b” or “p” in this case). What about other types of consonants, vowels (which also have distinct lip configurations) and semivowels? What about languages other than English?

Equipment used for these experiments was purchased as part of a Defense Advanced Research Projects Agency (DARPA) project to control and communicate with prostheses, in particular for recording signals to control a prosthetic limb. Additional funding for the study was provided by the University of Utah Research Foundation.