Vocal Tract Simulator Translates a Person's Brain Activity Into Clear Sentences

By capturing brain signals associated with the mechanical aspects of speaking, such as movements of the jaw, lips, and tongue, researchers have created a virtual, computer-based vocal tract capable of intelligible speech. The system could eventually be used by people who have lost the capacity to speak.

Conventional speech-generating devices, like the one used by the late Stephen Hawking, typically rely on nonverbal movements, such as twitches of the eyes or head, to produce words. Users have to spell each word letter by letter, which takes time and effort. At best, these assistive devices produce six to 10 words per minute, a far cry from natural speech, which runs at around 100 to 150 words per minute.

For people who have lost the capacity to speak, whether from Parkinson’s, ALS, stroke, or other brain injury, conventional speech-generating devices are good but not great. In an effort to create something more efficient, a research team led by Gopala Anumanchipalli from the University of California, San Francisco developed a system that simulates the mechanical aspects of verbal speech by tapping directly into the brain.

The system collects and maps brain signals that trigger movements of the jaw, larynx, lips, and tongue. A computer then decodes these signals to produce clear sentences with a speech synthesizer. At a press conference held yesterday, the researchers described the new device as a “virtual vocal tract.” The details of this work were published today in Nature.

https://gizmodo.com/neuroscientists-translate-brain-waves-into-recognizable-1832155006

This latest speech-generating device is the second to appear this year that uses brain signals to produce speech. Back in January, a team led by neuroscientist Nima Mesgarani from Columbia University created a system that captures a person’s neural responses to heard speech, which machine learning then decodes to produce synthesized speech. The approach taken by the UC San Francisco researchers is a bit different. It also taps into brain signals, but instead of decoding the brain’s responses to heard speech, it decodes the signals responsible for producing speech.

Importantly, neither system collects a person’s covert, or imagined, speech, meaning the words we say to ourselves inside our heads. Current science and technology are nowhere near that level of sophistication. Instead, the new approaches rely on brain signals tied to neural activity in the sensory cortex (speech perception, as in the Mesgarani system) or in the motor cortex (speech production, as in the new device).

Lead author Gopala Anumanchipalli holding an intracranial electrode of the type used to record brain activity in the new study. Image: UCSF

To create the virtual vocal tract, Anumanchipalli and his colleagues recruited five patients who were scheduled to undergo brain surgery to treat their epilepsy. None of the participants had issues with producing verbal speech, and all were native English speakers. Brain surgeons implanted electrode arrays directly onto their brains, specifically over the areas associated with language production. The patients then spoke several hundred sentences aloud while the researchers recorded the associated cortical activity.

Over the months that followed, this data was decoded and linked to specific movements of the vocal tract. In a way, the researchers reverse-engineered the mechanics of verbal speech, mapping the various ways sounds are produced, for example by the tongue pressing against the roof of the mouth or by the tightening of the vocal cords. A machine-learning algorithm decoded the brain signals into these movements, and a speech synthesizer then converted the movements into audible speech. The result was a computer-based, virtual vocal tract that, in theory, could be controlled by brain activity.
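
The two-step idea described above, with brain signals first linked to vocal tract movements and the movements then turned into sound, can be sketched in a few lines of Python. This is only a toy illustration, not the researchers' method: the article says only that "a machine-learning algorithm" was used, so a simple nearest-match lookup stands in for it here, and every number, table, and function name below is invented for the example.

```python
from math import dist

# Hypothetical training pairs: (neural feature vector, vocal tract state).
# The vocal tract state is (lip_closure, tongue_height, voicing), each 0..1.
# All values are invented for illustration and are not from the study.
NEURAL_TO_MOVEMENT = [
    ((0.9, 0.1, 0.2), (1.0, 0.2, 1.0)),  # lips closed, voiced
    ((0.8, 0.2, 0.9), (1.0, 0.2, 0.0)),  # lips closed, unvoiced
    ((0.1, 0.9, 0.3), (0.0, 0.9, 0.0)),  # tongue raised, unvoiced
    ((0.2, 0.8, 0.8), (0.0, 0.9, 1.0)),  # tongue raised, voiced
]

# Hypothetical second table: vocal tract state to a rough sound label.
MOVEMENT_TO_SOUND = [
    ((1.0, 0.2, 1.0), "b"),
    ((1.0, 0.2, 0.0), "p"),
    ((0.0, 0.9, 0.0), "sh"),
    ((0.0, 0.9, 1.0), "z"),
]


def nearest(query, table):
    """Return the value paired with the table key closest to the query."""
    return min(table, key=lambda pair: dist(query, pair[0]))[1]


def decode_movement(neural):
    """Stage 1: map a neural feature vector to a vocal tract state."""
    return nearest(neural, NEURAL_TO_MOVEMENT)


def synthesize(movement):
    """Stage 2: map a vocal tract state to a sound label."""
    return nearest(movement, MOVEMENT_TO_SOUND)


def decode_sound(neural):
    """Run both stages: neural signal -> movement -> sound."""
    return synthesize(decode_movement(neural))


print(decode_sound((0.85, 0.15, 0.25)))  # prints b
print(decode_sound((0.15, 0.85, 0.75)))  # prints z
```

The point of the intermediate step is that the decoder never has to guess a sound straight from brain activity. It first estimates what the lips, tongue, and vocal cords are doing, and only then asks what that configuration would sound like.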

To turn theory into action, the researchers then tested the system on a volunteer fitted with the intracranial electrodes. The person was instructed both to talk out loud and to mime, or mouth, speech without uttering a sound. The latter method, known as subvocal speech, was meant to simulate a person who has lost the capacity for speech but is still familiar with the mechanical aspects of talking. Fed with this data, the virtual vocal tract was able to produce speech with surprising clarity. Both methods resulted in intelligible speech, though speech decoded from talking aloud was a bit clearer than speech decoded from miming.

In follow-up tests, a panel of several hundred native English speakers was recruited to decipher the synthesized speech. The panelists were given a pool of words to choose from and told to select the best match. Around 70 percent of the words were correctly transcribed. Encouragingly, many of the missed words were close approximations, such as mistaking “rodent” for “rabbit.”
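
To make the 70 percent figure concrete, here is a minimal sketch of how a word-level score like that could be computed. The article does not describe the study's actual scoring procedure, so the position-by-position comparison, the function names, and the example sentence are all assumptions made for illustration.

```python
def word_accuracy(reference, transcript):
    """Fraction of reference words a listener got right, position by position.

    Assumes the listener supplies exactly one word per reference word,
    as in a test where each word is picked from a fixed pool.
    """
    if not reference:
        raise ValueError("reference sentence is empty")
    if len(reference) != len(transcript):
        raise ValueError("transcript must have one word per reference word")
    correct = sum(r.lower() == t.lower() for r, t in zip(reference, transcript))
    return correct / len(reference)


def panel_accuracy(reference, transcripts):
    """Average word accuracy across every listener on the panel."""
    scores = [word_accuracy(reference, t) for t in transcripts]
    return sum(scores) / len(scores)


# Invented example sentence and listener response, not taken from the study.
spoken = "the rodent ran under the old wooden fence last night".split()
heard = "the rabbit ran under the cold wooden fence past night".split()

print(word_accuracy(spoken, heard))  # prints 0.7
```

A score like this treats “rabbit” for “rodent” as a plain miss, which is why the near-misses the researchers observed are encouraging: the raw percentage understates how close the synthesized speech came.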

“We still have a ways to go to perfectly mimic spoken language,” Josh Chartier, a co-author of the new study, said in a statement. “We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.”

As noted, the system is designed for patients who have lost the capacity for speech. At the press conference held yesterday, study co-author Edward Chang said it remains “an open question” as to whether the system could be used by people who have never been able to talk, such as people with cerebral palsy. It’s “something that will have to be studied in the future,” he said, “but we’re hopeful,” adding that, “speech would have to be learned from the bottom up.”

A major limitation of this virtual vocal tract is the need for brain surgery and cranial implants to customize the system for each person. For the foreseeable future, this will have to remain invasive, as no devices currently exist that can record brain activity at the required resolution from outside the skull.

“This study represents an important step toward the actualization of speech neuroprosthesis technologies,” Mesgarani, who wasn’t involved with the new research, told Gizmodo in an email. “One of the main barriers for such devices has been the low intelligibility of the synthesized sound. Using the latest advances in machine learning methods and speech synthesis technologies, this study and ours show a significant improvement in the intelligibility of the decoded speech. What approach will ultimately prove better for decoding the imagined speech condition remains to be seen, but it is likely that a hybrid of the two may be the best.”

Indeed, an exciting aspect of this field is the rapid rate of development and the application of different techniques. As Mesgarani correctly pointed out, it’s possible that multiple approaches could be combined into a single system, potentially leading to more accurate speech results.

As a final, speculative aside, these brain-computer interfaces could conceivably be used one day to produce a form of technologically enabled telepathy, or mind-to-mind communication. Imagine, for example, a device like the one developed by the UC San Francisco researchers, but with the speech synthesizer hooked directly to a receiving person’s auditory cortex, similar to a cochlear implant (the auditory cortex is associated with hearing). With the two elements linked wirelessly, two interconnected people could theoretically communicate just by silently miming (or imagining the movements of miming) speech. They would hear each other’s words, but no one else would.

But I’m getting ahead of this latest research. Most importantly, the new system could eventually be used to help patients with ALS, multiple sclerosis, stroke, and traumatic brain injury regain clearer speech. And, as the researchers suggested, it could potentially even give a voice to individuals who never had the capacity for speech.
