Scientists decode brain waves to eavesdrop on what we hear

January 31, 2012

An X-ray CT scan of the head of one of the volunteers, showing electrodes distributed over the brain’s temporal lobe, where sounds are processed. (Credit: Adeen Flinker, UC Berkeley)

Neuroscientists may one day be able to hear the imagined speech of a patient unable to speak due to stroke or paralysis, according to University of California, Berkeley researchers.

These scientists have succeeded in decoding electrical activity in the brain’s temporal lobe — the seat of the auditory system — as a person listens to normal conversation. Based on this correlation between sound and brain activity, they then were able to predict the words the person had heard solely from the temporal lobe activity.

Audio file for original and reconstructed words

“This is huge for patients who have damage to their speech mechanisms because of a stroke or Lou Gehrig’s disease and can’t speak,” said co-author Robert Knight, a UC Berkeley professor of psychology and neuroscience. “If you could eventually reconstruct imagined conversations from brain activity, thousands of people could benefit.”

“This research is based on sounds a person actually hears, but to use it for reconstructing imagined conversations, these principles would have to apply to someone’s internal verbalizations,” cautioned first author Brian N. Pasley, a post-doctoral researcher in the center.

“There is some evidence that hearing the sound and imagining the sound activate similar areas of the brain. If you can understand the relationship well enough between the brain recordings and sound, you could either synthesize the actual sound a person is thinking, or just write out the words with a type of interface device.”


Subjects listened to words (acoustic waveform, top left) while neural signals were recorded from cortical surface electrode arrays (top right, red circles) implanted over superior and middle temporal gyrus (STG, MTG). Speech-induced cortical field potentials (bottom right, gray curves) recorded at multiple electrode sites were used to fit multi-input, multi-output models for offline decoding. The models take as input time-varying neural signals at multiple electrodes and output a spectrogram consisting of time-varying spectral power across a range of acoustic frequencies (180-7000 Hz, bottom left). To assess decoding accuracy, the reconstructed spectrogram is compared to the spectrogram of the original acoustic waveform. (Credit: Brian N. Pasley)
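The accuracy check described in this caption, comparing the reconstructed spectrogram with the original, can be sketched in a few lines. This is only an illustration: the use of a Pearson correlation over all time-frequency bins, the function names, and the toy numbers are assumptions made here, not details taken from the study.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def spectrogram_accuracy(original, reconstructed):
    """Correlate two spectrograms (rows = frequency bands, columns = time bins)."""
    flat_orig = [v for row in original for v in row]
    flat_rec = [v for row in reconstructed for v in row]
    return pearson(flat_orig, flat_rec)

# Toy data invented for illustration: 3 frequency bands x 4 time bins.
original = [
    [0.1, 0.8, 0.9, 0.2],
    [0.0, 0.3, 0.7, 0.4],
    [0.5, 0.1, 0.0, 0.6],
]
# A hypothetical noisy "reconstruction" of the same pattern.
reconstructed = [
    [0.2, 0.7, 0.8, 0.3],
    [0.1, 0.4, 0.5, 0.3],
    [0.4, 0.2, 0.1, 0.5],
]

score = spectrogram_accuracy(original, reconstructed)
print(round(score, 3))  # closer to 1.0 means a closer match
```

A score of 1.0 would mean the reconstruction tracks the original perfectly across every frequency band and time bin; lower scores mean a looser match.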

In addition to the potential for expanding the communication ability of the severely disabled, he noted, the research also “is telling us a lot about how the brain in normal people represents and processes speech sounds.”

They enlisted the help of people undergoing brain surgery to determine the location of intractable seizures so that the area can be removed in a second surgery. Neurosurgeons typically cut a hole in the skull and safely place electrodes on the brain surface or cortex — in this case, up to 256 electrodes covering the temporal lobe — to record activity over a period of a week to pinpoint the seizures. For this study, 15 neurosurgical patients volunteered to participate.

Pasley visited each person in the hospital to record the brain activity detected by the electrodes as they heard 5 to 10 minutes of conversation. Pasley used this data to reconstruct and play back the sounds the patients heard. He was able to do this because there is evidence that the brain breaks down sound into its component acoustic frequencies, for example from a low of about 1 Hz (cycles per second) to a high of about 8,000 Hz, that are important for speech sounds.


Spectrograms of the original and reconstructed words. For audio playback, the spectrogram or modulation representations must be converted to an acoustic waveform, a transformation that requires both magnitude and phase information. Because the reconstructed representations are magnitude-only, the phase must be estimated. (Credit: Brian N. Pasley)
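To see why the phase has to be estimated, it helps to note that very different waveforms can share exactly the same spectral magnitudes. The sketch below builds a toy signal, keeps only its magnitudes, and rebuilds a waveform with every phase set to zero. It is not the phase-estimation procedure used in the study, and the signal, its length, and the helper names are all invented for this illustration.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

# Toy signal (an assumption for illustration): two sinusoids, 16 samples.
n = 16
signal = [math.sin(2 * math.pi * 2 * t / n) + 0.5 * math.cos(2 * math.pi * 5 * t / n)
          for t in range(n)]

# Keep only the magnitudes, as in a magnitude-only reconstruction.
magnitudes = [abs(c) for c in dft(signal)]

# Rebuild a waveform while pretending every phase is zero.
zero_phase = idft([complex(m, 0.0) for m in magnitudes])

# Both waveforms have the same magnitudes, yet they differ sample by sample.
magnitudes_again = [abs(c) for c in dft(zero_phase)]
magnitude_gap = max(abs(a - b) for a, b in zip(magnitudes, magnitudes_again))
waveform_gap = max(abs(a - b) for a, b in zip(signal, zero_phase))
print(magnitude_gap, waveform_gap)
```

The magnitude gap is zero up to rounding error while the waveform gap is large, so magnitudes alone do not determine the sound to be played back.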


Pasley tested two different computational models to match spoken sounds to the pattern of activity in the electrodes. The patients then heard a single word, and Pasley used the models to predict the word based on electrode recordings.

“We are looking at which cortical sites are increasing activity at particular acoustic frequencies, and from that, we map back to the sound,” Pasley said. He compared the technique to a pianist who knows the sounds of the keys so well that she can look at the keys another pianist is playing in a sound-proof room and “hear” the music, much as Ludwig van Beethoven was able to “hear” his compositions despite being deaf.

The better of the two methods was able to reproduce a sound close enough to the original word for Pasley and his fellow researchers to correctly guess the word.
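An automated analogue of that guessing step might compare a reconstructed spectrogram against the spectrograms of a set of candidate words and pick the closest one. The sketch below is a loose illustration under stated assumptions: the mean squared difference as the distance measure, the function names, the placeholder word labels, and all numbers are hypothetical and are not drawn from the study.

```python
def mean_squared_difference(a, b):
    """Average squared difference between two same-shaped spectrograms."""
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)

def identify_word(reconstructed, candidates):
    """Return the candidate label whose spectrogram is closest to the reconstruction."""
    return min(candidates,
               key=lambda word: mean_squared_difference(reconstructed, candidates[word]))

# Hypothetical candidates: 2 frequency bands x 3 time bins each.
candidates = {
    "hypothetical_word_a": [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]],
    "hypothetical_word_b": [[0.0, 0.2, 0.9], [0.7, 0.1, 0.3]],
    "hypothetical_word_c": [[0.4, 0.4, 0.4], [0.4, 0.4, 0.4]],
}

# Hypothetical reconstruction from a single trial.
reconstructed = [[0.7, 0.2, 0.1], [0.3, 0.6, 0.2]]

best = identify_word(reconstructed, candidates)
print(best)
```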

“We think we would be more accurate with an hour of listening and recording and then repeating the word many times,” Pasley said. But because any realistic device would need to accurately identify words heard the first time, he decided to test the models using only a single trial.

“This research is a major step toward understanding what features of speech are represented in the human brain,” Knight said. “Brian’s analysis can reproduce the sound the patient heard, and you can actually recognize the word, although not at a perfect level.”

Knight predicts that this success can be extended to imagined, internal verbalizations, because scientific studies have shown that when people are asked to imagine speaking a word, the brain regions activated are similar to those activated when the person actually utters the word.

“With neuroprosthetics, people have shown that it’s possible to control movement with brain activity,” Knight said. “But that work, while not easy, is relatively simple compared to reconstructing language. This experiment takes that earlier work to a whole new level.”

The current research builds on work by other researchers on how animals encode sounds in the brain’s auditory cortex. In fact, some researchers, including the study’s coauthors at the University of Maryland, have been able to guess the words that scientists read to ferrets based on recordings from the ferrets’ brains, even though the ferrets were unable to understand the words.

The ultimate goal of the UC Berkeley study was to explore how the human brain encodes speech and determine which aspects of speech are most important for understanding.

“At some point, the brain has to extract away all that auditory information and just map it onto a word, since we can understand speech and words regardless of how they sound,” Pasley said. “The big question is, What is the most meaningful unit of speech? A syllable, a phone, a phoneme? We can test these hypotheses using the data we get from these recordings.”

Chang and Knight are members of the Center for Neural Engineering and Prostheses, a joint UC Berkeley/UCSF group focused on using brain activity to develop neural prostheses for motor and speech impairments caused by disabling neurological disorders.

Ref.: Brian N. Pasley et al., Reconstructing speech from human auditory cortex, PLoS Biology, Jan. 31, 2012