How Google Has Turned Language Translation into a Math Problem


Language translation is a notoriously difficult task for humans, let alone computers. But in trying to solve that problem, Google has stumbled across a clever trick that involves treating languages like maps—and it really, really works.


Technology Review has shone a light on a Google research paper published on the arXiv server. It spells out how the search giant's most recent translation work focuses on plotting out words on language maps—think of a space full of words, arranged in some kind of logical order—and simply uses linear operations to switch between languages. Technology Review explains:

The new approach is relatively straightforward. It relies on the notion that every language must describe a similar set of ideas, so the words that do this must also be similar. For example, most languages will have words for common animals such as cat, dog, cow and so on. And these words are probably used in the same way in sentences such as “a cat is an animal that is smaller than a dog.”

The same is true of numbers. The image above shows the vector representations of the numbers one to five in English and Spanish and demonstrates how similar they are.

This is an important clue. The new trick is to represent an entire language using the relationship between its words. The set of all the relationships, the so-called “language space”, can be thought of as a set of vectors that each point from one word to another. And in recent years, linguists have discovered that it is possible to handle these vectors mathematically. For example, the operation ‘king’ – ‘man’ + ‘woman’ results in a vector that is similar to ‘queen’.
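The word arithmetic described above can be sketched with a toy example. The vectors below are hypothetical, hand-picked values (real embeddings have hundreds of dimensions and are learned from text), but they show the mechanics: subtract and add the vectors component-wise, then look for the nearest word by cosine similarity.

```python
import math

# Toy 2-dimensional "word vectors" (hypothetical values chosen so the
# analogy works; real embeddings are learned from large text corpora).
vectors = {
    "king":  [0.9, 0.9],   # [royalty, maleness]
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Compute 'king' - 'man' + 'woman', component by component.
result = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]

# The nearest word vector to the result is 'queen'.
nearest = max(vectors, key=lambda word: cosine(vectors[word], result))
print(nearest)  # → queen
```

In real systems the same lookup is done over a vocabulary of hundreds of thousands of words, which is why the result is "similar to" rather than exactly equal to 'queen'.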

Turns out that lots of languages share a huge number of similarities when they're mapped out in this way, which means the problem isn't about finding the right words, but about finding the right way to map one vector space to another. In other words, translation's no longer a problem of linguistics, but of mathematics.
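That mapping step can be sketched too. In the sketch below the "Spanish" space is artificially generated by rotating the "English" space, so the true relationship is known; in reality the relationship is unknown and only approximately linear, and the mapping is fitted from a small dictionary of known word pairs by least squares. All the vectors and word pairs here are made up for illustration.

```python
import numpy as np

# Hypothetical 2-D English embeddings (real ones are learned from text).
english = {
    "one":   np.array([1.0, 0.0]),
    "two":   np.array([0.8, 0.6]),
    "three": np.array([0.0, 1.0]),
    "four":  np.array([-0.6, 0.8]),
}

# Pretend the Spanish space is the English space rotated by 30 degrees.
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
spanish = {"uno":  rotation @ english["one"],
           "dos":  rotation @ english["two"],
           "tres": rotation @ english["three"]}

# Fit a linear map from three known word pairs by least squares:
# find B minimising ||X B - Z||^2, where rows of X are English vectors
# and rows of Z are the corresponding Spanish vectors.
X = np.stack([english["one"], english["two"], english["three"]])
Z = np.stack([spanish["uno"], spanish["dos"], spanish["tres"]])
B = np.linalg.lstsq(X, Z, rcond=None)[0]
W = B.T  # translation matrix: Spanish vector ≈ W @ English vector

# "Translate" a held-out word by mapping its vector into Spanish space;
# a real system would then pick the nearest Spanish word vector.
predicted = W @ english["four"]
target = rotation @ english["four"]
print(np.allclose(predicted, target))  # → True
```

With real embeddings the fit is only approximate, so the predicted point lands near, not on, the right word—close enough that a nearest-neighbour lookup usually recovers the correct translation.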

The neat thing is that the technique doesn't really make any assumptions about the languages involved; it just interrogates the way the vector spaces relate to each other. That makes it incredibly versatile. But the algorithms are still in their early days; there's a way to go yet. [arXiv via Technology Review]

Image by dylancantwell under Creative Commons license



One obstacle is that these algorithms are not very flexible. They can approximate flexibility in language by becoming more complex, but that creates problems of its own.

For example, there's an idiom in English: "pie in the sky." Google's translation program is able to "map" a common phrase like this to its equivalent in other languages, and it correctly translates it as something like an impossibility rather than a hovering pastry.

And maybe 99% of the time this will be a correct interpretation. But the problem arises when I want to describe how the air shipment of Sara Lee fell off the cargo plane and there was all this pie in the sky, and Google mangles it. This will probably always happen, because translation and interpretation are different things. An algorithm might become very good at translation, but true interpretation might never be fully perfected. That's not surprising, since humans aren't perfect at interpreting meaning even in their native language.