In follow-up tests, a panel of several hundred native English speakers was recruited to decipher the synthesized speech. The panelists were given a pool of words to choose from and told to select the best match. Around 70 percent of the words were correctly transcribed, and encouragingly, many of the misses were close approximations, such as mistaking “rodent” for “rabbit.”
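As a rough illustration of how a figure like that is computed, here is a minimal Python sketch of a closed-set (forced-choice) scoring scheme. The word lists and function name are hypothetical examples, not data or code from the study.

```python
# Hypothetical sketch: scoring a closed-set listening test.
# Each listener hears a synthesized word and picks the best match
# from a fixed pool of candidates; accuracy is the fraction of
# picks that match the word the speaker actually produced.

def closed_set_accuracy(true_words, chosen_words):
    """Fraction of trials where the listener's choice matches the target."""
    correct = sum(t == c for t, c in zip(true_words, chosen_words))
    return correct / len(true_words)

# Example trial data (invented purely for illustration)
targets = ["rabbit", "table", "music", "window"]
choices = ["rodent", "table", "music", "window"]  # "rodent" is a near miss

print(f"Accuracy: {closed_set_accuracy(targets, choices):.0%}")  # Accuracy: 75%
```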

“We still have a ways to go to perfectly mimic spoken language,” Josh Chartier, a co-author of the new study, said in a statement. “We’re quite good at synthesizing slower speech sounds like ‘sh’ and ‘z’ as well as maintaining the rhythms and intonations of speech and the speaker’s gender and identity, but some of the more abrupt sounds like ‘b’s and ‘p’s get a bit fuzzy. Still, the levels of accuracy we produced here would be an amazing improvement in real-time communication compared to what’s currently available.”

As noted, the system is designed for patients who have lost the capacity for speech. At the press conference held yesterday, study co-author Edward Chang said it remains “an open question” whether the system could be used by people who have never been able to talk, such as those with cerebral palsy. It’s “something that will have to be studied in the future,” he said, “but we’re hopeful,” adding that “speech would have to be learned from the bottom up.”

A major limitation of this virtual vocal tract is the need for brain surgery and cranial implants to customize the system for each person. For the foreseeable future, the approach will remain invasive, as no existing device can record neural activity at the required resolution from outside the brain.

“This study represents an important step toward the actualization of speech neuroprosthesis technologies,” Mesgarani, who wasn’t involved with the new research, told Gizmodo in an email. “One of the main barriers for such devices has been the low intelligibility of the synthesized sound. Using the latest advances in machine learning methods and speech synthesis technologies, this study and ours show a significant improvement in the intelligibility of the decoded speech. What approach will ultimately prove better for decoding the imagined speech condition remains to be seen, but it is likely that a hybrid of the two may be the best.”

Indeed, an exciting aspect of this field is its rapid rate of development and the variety of techniques being applied. As Mesgarani correctly pointed out, it’s possible that multiple approaches could be combined into a single system, potentially producing more accurate and intelligible synthesized speech.

As a final, speculative aside, these brain-computer interfaces could conceivably be used one day to produce a form of technologically enabled telepathy, or mind-to-mind communication. Imagine, for example, a device like the one developed by the UC San Francisco researchers, but with the speech synthesizer hooked directly to a receiving person’s auditory cortex, the brain region associated with hearing, much like a cochlear implant. With the two elements linked wirelessly, two interconnected people could theoretically communicate just by silently miming (or imagining the movements of miming) speech; they would hear each other’s words, but no one else would.

But I’m getting ahead of this latest research. Most importantly, the new system could eventually be used to help patients with ALS, multiple sclerosis, stroke, and traumatic brain injury regain clearer speech. And, as the researchers suggested, it could potentially even give a voice to individuals who never had the capacity for speech.