High-speed MRI technique captures complex vocal movements at 100 frames per second

Demonstrated with Wizard of Oz song "If I Only Had a Brain"
April 22, 2015


Beckman Institute | New Super-Fast MRI Technique: Singing ‘If I Only Had a Brain’

Scientists at the Beckman Institute’s Biomedical Imaging Center (BIC) at the University of Illinois have developed a real-time magnetic resonance imaging (MRI) technique capable of showing dynamic images of vocal movement at 100 frames per second, the fastest MRI speed in the world, according to the scientists.

“Typically, MRI is able to acquire maybe 10 frames per second or so, but we are able to scan 100 frames per second, without sacrificing the quality of the images,” said Brad Sutton, technical director of the BIC and associate professor in bioengineering at Illinois.

The researchers published their technique in the journal Magnetic Resonance in Medicine.

“The technique excels at high spatial and temporal resolution of speech—it’s both very detailed and very fast. Often you can have only one of these in MR imaging,” said Sutton. “We have designed a specialized acquisition method that gathers the necessary data for both space and time in two parts and then combines them to achieve high-quality, high-spatial resolution, and high-speed imaging.”

To combine the dynamic imaging with the audio, the researchers use a noise-cancelling fiber-optic microphone to pull out the voice, and then align the audio track with the imaging.
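As a rough illustration of what aligning audio with imaging involves, the sketch below maps each image frame to the range of audio samples recorded during it. The 44.1 kHz sample rate, the function name `audio_window`, and the premise that both recordings start at the same instant and do not drift are assumptions made for illustration; the article does not describe the researchers' actual alignment procedure.

```python
def audio_window(frame_index, fps=100.0, sample_rate=44100):
    """Return the [start, stop) audio sample range covering one image frame.

    Assumes (hypothetically) that audio and imaging share a common start
    time and that neither clock drifts.
    """
    start = round(frame_index * sample_rate / fps)
    stop = round((frame_index + 1) * sample_rate / fps)
    return start, stop


print(audio_window(0))    # (0, 441)
print(audio_window(250))  # (110250, 110691)
```

Under these assumptions, each frame at 100 frames per second spans 441 audio samples, or 10 milliseconds of sound.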

Powerful new tool for voice studies

The dynamic imaging is especially useful for studying how rapidly the tongue moves, along with the roughly 100 muscles in the head and neck used during speech and singing. Capturing these articulation movements requires 100 frames per second, according to Aaron Johnson, affiliate faculty member in the Bioimaging Science and Technology Group at the Beckman Institute and assistant professor in speech and hearing science at Illinois.

He is also the singer in the video above, which demonstrates the technique. After 10 years of working as a professional singer in Chicago choruses, Johnson’s passion for vocal performance progressed into research to understand the voice and its neuromuscular system, with a particular interest in the aging voice.

“The neuromuscular system and larynx change and atrophy as we age, and this contributes to a lot of the deficits that we associate with the older voice, such as a weak, strained, or breathy voice,” Johnson said. “I’m interested in understanding how these changes occur, and if interventions, like vocal training, can reverse these effects. In order to do this, I need to look at how the muscles of the larynx move in real time.”

With a recent K23 Career Development Award from the National Institutes of Health (NIH), Johnson is investigating whether group singing training with older adults in residential retirement communities will improve the structure of the larynx, giving the adults stronger, more powerful voices. This research relies on data on laryngeal movement collected with the MRI technique before and after training.


Abstract of High-resolution dynamic speech imaging with joint low-rank and sparsity constraints

Purpose: To enable dynamic speech imaging with high spatiotemporal resolution and full-vocal-tract spatial coverage, leveraging recent advances in sparse sampling.

Methods: An imaging method is developed to enable high-speed dynamic speech imaging exploiting low-rank and sparsity of the dynamic images of articulatory motion during speech. The proposed method includes: (a) a novel data acquisition strategy that collects spiral navigators with high temporal frame rate and (b) an image reconstruction method that derives temporal subspaces from navigators and reconstructs high-resolution images from sparsely sampled data with joint low-rank and sparsity constraints.
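The idea behind steps (a) and (b) can be sketched with a toy example: a few rows of data sampled at every time frame (standing in for the navigators) reveal a small temporal subspace, and the remaining rows, sampled at only a few frames each, are then fit to that subspace. This is a simplified illustration and not the authors' reconstruction: it uses synthetic real-valued data with exact low rank, plain least squares, and no sparsity constraint, and every name and size in it (`temporal_basis`, `fit_rows`, the matrix dimensions) is hypothetical.

```python
import numpy as np


def temporal_basis(navigators, rank):
    """Top `rank` right singular vectors of the (n_nav, n_frames) navigator matrix."""
    _, _, vh = np.linalg.svd(navigators, full_matrices=False)
    return vh[:rank]  # shape (rank, n_frames)


def fit_rows(samples, mask, basis):
    """Least-squares fit of each sparsely sampled row to the temporal basis.

    samples: (n_rows, n_frames) array, meaningful only where mask is True
    mask:    boolean (n_rows, n_frames), True where a sample was acquired
    """
    recon = np.empty_like(samples)
    for i in range(samples.shape[0]):
        cols = mask[i]
        # Solve basis[:, cols].T @ coeffs ~= samples[i, cols]
        coeffs, *_ = np.linalg.lstsq(basis[:, cols].T, samples[i, cols], rcond=None)
        recon[i] = coeffs @ basis
    return recon


rng = np.random.default_rng(0)
n_rows, n_nav, n_frames, rank = 64, 8, 200, 3

# Synthetic "movie" with exactly `rank` temporal components (an assumption).
truth = rng.standard_normal((n_rows, rank)) @ rng.standard_normal((rank, n_frames))

# (a) Navigator rows are observed at every frame.
basis = temporal_basis(truth[:n_nav], rank)

# (b) Every row is observed at only 12 of the 200 frames.
mask = np.zeros((n_rows, n_frames), dtype=bool)
for i in range(n_rows):
    mask[i, rng.choice(n_frames, size=12, replace=False)] = True

recon = fit_rows(np.where(mask, truth, 0.0), mask, basis)
rel_err = np.linalg.norm(recon - truth) / np.linalg.norm(truth)
print(f"relative error: {rel_err:.2e}")
```

Because the toy data are exactly low-rank and noise-free, 12 samples per row are enough to recover all 200 frames. Real speech data are only approximately low-rank and are noisy, which is one reason a method like the one in the abstract would add further constraints such as sparsity.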

Results: The proposed method has been systematically evaluated and validated through several dynamic speech experiments. A nominal imaging speed of 102 frames per second (fps) was achieved for a single-slice imaging protocol with a spatial resolution of 2.2 × 2.2 × 6.5 mm³. An eight-slice imaging protocol covering the entire vocal tract achieved a nominal imaging speed of 12.8 fps with the identical spatial resolution. The effectiveness of the proposed method and its practical utility was also demonstrated in a phonetic investigation.
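The two reported frame rates are consistent with simple arithmetic, assuming the eight slices share the same acquisition time equally (an assumption; the abstract does not state how the slices are scheduled). A quick check:

```python
single_slice_fps = 102.0  # reported single-slice rate
n_slices = 8              # reported number of slices

# Assumption: acquisition time is divided evenly among the slices.
per_slice_fps = single_slice_fps / n_slices
frame_ms = 1000.0 / single_slice_fps

print(f"{per_slice_fps:.2f} fps per slice")          # 12.75 fps per slice
print(f"{frame_ms:.2f} ms per single-slice frame")   # 9.80 ms per single-slice frame
```

The result, 12.75 fps, is close to the reported 12.8 fps for the eight-slice protocol.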

Conclusion: High spatiotemporal resolution with full-vocal-tract spatial coverage can be achieved for dynamic speech imaging experiments with low-rank and sparsity constraints. Magn Reson Med 73:1820–1832, 2015. © 2014 Wiley Periodicals, Inc.