
Microsoft’s New AI Tool Just Needs to Hear Three Seconds of Your Voice to Mimic You

VALL-E can preserve the original speaker's emotional tone and even simulate their acoustic environment.

By Andrew Liszewski Published January 10, 2023, 10:55 am ET


AI video generation has come a long way, but creating a convincing deepfake of your likeness still requires quite a bit of source material, like headshots from various angles or video footage. Faking your voice is a different story: Microsoft researchers recently revealed a new AI tool that can simulate someone’s voice using just a three-second sample of them talking.

The new tool, a “neural codec language model” called VALL-E, is built on Meta’s EnCodec audio compression technology, revealed late last year, which uses AI to compress better-than-CD-quality audio to data rates 10 times smaller than even MP3 files, without a noticeable loss in quality. Meta envisioned EnCodec as a way to improve the quality of phone calls in areas with spotty cellular coverage, or to reduce bandwidth demands for music streaming services, but Microsoft is leveraging the technology to make text-to-speech synthesis sound more realistic based on a very limited source sample.

Current text-to-speech systems can produce very realistic-sounding voices, which is why smart assistants sound so authentic despite their verbal responses being generated on the fly. But they require high-quality, very clean training data, which is usually captured in a recording studio with professional equipment. Microsoft’s approach makes VALL-E capable of simulating almost anyone’s voice without them spending weeks in a studio. Instead, the tool was trained using Meta’s Libri-light dataset, which contains 60,000 hours of recorded English-language speech from over 7,000 unique speakers, “extracted and processed from LibriVox audiobooks,” which are all public domain.

Microsoft has shared an extensive collection of VALL-E generated samples so you can hear for yourself how capable its voice simulation is, but the results are currently a mixed bag. The tool occasionally has trouble recreating accents, including even subtle ones from source samples where the speaker sounds Irish, and its ability to change up the emotion of a given phrase is sometimes laughable. But more often than not, the VALL-E generated samples sound natural and warm, and are almost impossible to distinguish from the original speakers in the three-second source clips.

In its current form, trained on Libri-light, VALL-E is limited to simulating speech in English, and while its performance is not yet flawless, it will undoubtedly improve as its training dataset is further expanded. However, it will be up to Microsoft’s researchers to improve VALL-E, as the team isn’t releasing the tool’s source code. In a recently released research paper detailing the development of VALL-E, its creators acknowledge the risks it poses:

“Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models.”
