This Artificially Intelligent Speech Generator Can Fake Anyone’s Voice

The human voice, with all its subtlety and nuance, is proving to be an exceptionally difficult thing for computers to emulate. Using a powerful new algorithm, a Montreal-based AI startup has developed a voice generator that can mimic virtually any person’s voice, and even add an emotional punch when necessary. The system isn’t perfect, but it heralds a future when voices, like photos, can be easily faked.

When Siri, Alexa, or our GPS talk to us, it’s fairly obvious that we’re being spoken to by a machine. That’s because virtually every text-to-speech system on the market relies on a pre-recorded set of words, phrases, and utterances (recorded from voice actors), which are then strung together in Frankenstein-like fashion to produce complete words and sentences. The end result is a vocal delivery that sounds distinctly uninspiring, robotic, and at times laughable. This approach to voice synthesis also means that we’re stuck listening to the same pre-recorded, monotonous voice over and over again.

In an effort to inject some life into the automated voices that come out of our apps, AI startup Lyrebird has developed a voice-imitation algorithm that can mimic any person’s voice, and read any text with a predefined emotion or intonation. Incredibly, it can do this after analyzing just a few dozen seconds of pre-recorded audio. To promote its new tool, Lyrebird produced several audio samples using the voices of Barack Obama, Donald Trump, and Hillary Clinton.

Lyrebird’s demos also showcase the virtually unlimited catalog of voices, and the system’s ability to articulate the same sentence with different intonations.

https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/playlists/317413750

This is all made possible through the use of artificial neural networks, which function in a manner loosely similar to the biological neural networks in the human brain. Essentially, the algorithm learns to recognize patterns in a particular person’s speech, and then reproduces those patterns during simulated speech.

“We train our models on a huge dataset with thousands of speakers,” Jose Sotelo, a team member at Lyrebird and a speech synthesis expert, told Gizmodo. “Then, for a new speaker we compress their information in a small key that contains their voice DNA. We use this key to say new sentences.”
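To make the idea of a compact voice “key” more concrete, here is a toy sketch in Python. It is not Lyrebird’s method, which the company has not described in detail. It assumes that each audio clip has already been turned into a list of small numeric feature frames (the numbers below are made up), averages those frames into one fixed-length vector, and compares vectors with cosine similarity. The names `voice_key` and `cosine_similarity` are hypothetical.

```python
import math

def voice_key(frames):
    """Average a clip's feature frames into one fixed-length vector (a toy 'key')."""
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional feature frames; real systems learn far richer features.
speaker_a_clip1 = [[1.0, 0.2, 0.1], [0.9, 0.3, 0.0]]
speaker_a_clip2 = [[1.1, 0.1, 0.1], [1.0, 0.2, 0.2]]
speaker_b_clip = [[0.1, 0.9, 1.0], [0.0, 1.1, 0.8]]

key_a1 = voice_key(speaker_a_clip1)
key_a2 = voice_key(speaker_a_clip2)
key_b = voice_key(speaker_b_clip)

# Keys from the same speaker should be closer to each other than to another speaker's key.
same_speaker = cosine_similarity(key_a1, key_a2)
different_speaker = cosine_similarity(key_a1, key_b)
print(same_speaker > different_speaker)
```

In a real system the key would be produced by a trained neural network rather than a simple average, and it would be fed to a speech generator rather than just compared against other keys.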

The end result is far from perfect—the samples still exhibit digital artifacts, clarity problems, and other weirdness—but there’s little doubt who is being imitated by the speech generator. Changes in intonation are also discernible. Unlike other systems, Lyrebird’s solution requires less data per speaker to produce a new voice, and it works in real time. The company plans to offer its tool to companies in need of speech synthesis solutions.

“We are currently raising funds and growing our engineering team,” said Sotelo. “We are working on improving the quality of the audio to make it less robotic, and we hope to start beta testing soon.”

Needless to say, this form of speech synthesis introduces a host of ethical problems and security concerns. Eventually, a refined version of this system could replicate a person’s voice with incredible accuracy, making it virtually impossible for a human listener to discern the original from the emulation. The day is coming when vocal speech, like an image processed in Photoshop, can be manipulated without our knowing. Unscrupulous individuals could fake a speech by a prominent politician, adding yet another layer to the emerging post-truth environment. Hackers could use speech synthesis for social engineering, fooling even the most careful security experts. The possibilities are almost endless.

These potentially adverse impacts are not lost on Lyrebird, which argues that the era in which we can trust audio recordings is on the verge of coming to an end.

“We take seriously the potential malicious applications of our technology,” Sotelo told Gizmodo. “We want this technology to be used for good purposes: giving back the voice to people who lost it to sickness, being able to record yourself at different stages in your life and hearing your voice later on, etc. Since this technology could be developed by other groups with malicious purposes, we believe that the right thing to do is to make it public and well-known so we stop relying on audio recordings [as evidence].”

No doubt, we’ll have to start second-guessing audio recordings of speech soon, but solutions could also be developed to ascertain the authenticity of vocal recordings. Humans may be fooled by such systems, but computers will not be—at least, not for a while. When analyzing the waveform, or frequencies, of human speech, a high resolution recording can yield a tremendous amount of data for a computer to analyze. It will be a long, long time before a speech synthesis program can replicate every single aspect of a person’s distinctive speech, like the finer details of vocal timbre (i.e. the quality of speech), and mouth noises such as breathing, tongue sounds, and lip smacking, to the point where even a machine can’t detect the difference. There are other aspects of a recording to consider as well. For instance, the absence of background noises, the presence of a faked acoustic space, or artificially introduced ambient sounds should be easily detectable by a machine designed for the task.
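As a rough illustration of what “analyzing the frequencies” of a recording means, the Python sketch below runs a naive discrete Fourier transform on a short synthetic tone and reports its strongest frequency. It is only a starting point under simple assumptions: the 440 Hz tone, the 8,000 Hz sample rate, and the function name `dominant_frequency` are all made up for the example, and a real detector would examine far more than one spectral peak.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (in Hz) of the strongest DFT bin, ignoring the DC bin."""
    n = len(samples)
    best_bin, best_magnitude = 1, 0.0
    # Only bins up to n // 2 are needed for a real-valued signal.
    for k in range(1, n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        if abs(total) > best_magnitude:
            best_bin, best_magnitude = k, abs(total)
    return best_bin * sample_rate / n

# Synthetic stand-in for a recording: a pure 440 Hz tone, 400 samples at 8,000 Hz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(400)]

peak = dominant_frequency(tone, sample_rate)
print(peak)
```

Production tools would use a fast Fourier transform and look at how the whole spectrum changes over time, which is where subtle cues such as breaths or room acoustics would show up.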

Eventually, however, a speech synthesis program may be able to fake all of these things, at which point, our ability to discern truth from fabrication will be put to the test.

[Lyrebird via Scientific American]
