The worst part about visiting somewhere you don’t speak the local tongue is that sinking feeling as you squint at a phrasebook, open your mouth, and prepare yourself to totally butcher the language. If that’s your nightmare, good news. Google is working on an AI that can actually speak in another language using your own voice.
Aptly named Translatotron, the AI is described in a Google blog as an end-to-end, speech-to-speech translation model. What makes it novel is that it eschews the usual cascade of speech-to-text, text translation, and then text-to-speech, which is what Google Translate does. Instead, it employs a single neural network that maps speech in one language directly to speech in another, skipping the intermediate text entirely. It also includes a “speaker encoder” that can preserve the original speaker’s voice.
The new AI has a few advantages over the traditional method. Namely, Google points to faster inference speeds, fewer compounding errors between recognition and translation, and better handling of words that don’t need translation, like names and other proper nouns. Right now, Google says translation quality using Translatotron lags behind conventional methods. That said, if you check out the example audio clips, the end results are not only decently accurate, but also sound more natural. There’s still that mechanical intonation, but much less so compared to, say, Amazon Alexa.
There are already a few speech translation apps out there, including Google Translate, SayHi, Microsoft Translator, iTranslate, and TripLingo. That said, none of them use your actual voice in the final product, and that can be somewhat jarring in real life.
Translatotron might still be in the works, but fingers crossed that sometime in the near future I can take that Paris trip I’ve always wanted without having to embarrass myself mispronouncing “confit de canard” at a fancy restaurant.