The Voynich manuscript (Image: Beinecke Rare Book & Manuscript Library, Yale University)

Since its discovery over a hundred years ago, the 240-page Voynich manuscript, filled with seemingly coded language and inscrutable illustrations, has confounded linguists and cryptographers. Using artificial intelligence, Canadian researchers have taken a huge step forward in unraveling the document’s hidden meaning.

Named after Wilfrid Voynich, the Polish book dealer who procured the manuscript in 1912, the document is written in an unknown script that encodes an unknown language, a double whammy of unknowns that has, until this point, been impossible to interpret. The Voynich manuscript contains hundreds of fragile pages, some missing, with handwritten text running from left to right. Most pages are adorned with illustrations and diagrams, including plants, nude figures, and astronomical symbols. But as for the meaning of the text: nothing. No clue.

But not for want of trying. The manuscript is considered the world’s most important cipher, one scrutinized by cryptographers, both professional and amateur, for decades. It was even analyzed by codebreakers during the Second World War, but they had no luck either. Various theories about the code have been tossed around over the years, including that it was created using semi-random encryption schemes, anagrams, or writing systems in which vowels have been removed. Some have even suggested the document is an elaborate hoax.


For Greg Kondrak, an expert in natural language processing at the University of Alberta, this seemed a perfect task for artificial intelligence. Working with his grad student Bradley Hauer, Kondrak has taken a big step toward cracking the code, finding that the text appears to be written in Hebrew, with its letters arranged in a fixed pattern. To be fair, the researchers still don’t know the meaning of the Voynich manuscript, but the stage is now set for other experts to join the investigation.


The first step was to figure out the language of the ciphered text. To that end, an AI studied the text of the “Universal Declaration of Human Rights” as it was written in 380 different languages, looking for patterns. Following this training, the AI analyzed the Voynich gibberish, concluding with a high degree of certainty that the text was written in encoded Hebrew. Kondrak and Hauer were taken aback, as they went into the project expecting the underlying language to be Arabic.
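To give a rough feel for how a program might guess the language behind a cipher, here is a minimal Python sketch. It is not the researchers' method, whose details aren't described here. It assumes a simple letter-for-letter substitution cipher and compares how unevenly letters are used, a property that survives that kind of substitution. The function names and the toy samples are made up for illustration.

```python
from collections import Counter


def frequency_profile(text, size=10):
    """Return the `size` largest letter frequencies, highest first.

    Which letter has which frequency is deliberately ignored, so the
    profile is unchanged by a letter-for-letter substitution cipher.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    freqs = sorted((n / total for n in counts.values()), reverse=True)
    freqs += [0.0] * (size - len(freqs))  # pad short profiles with zeros
    return freqs[:size]


def profile_distance(a, b):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(x - y) for x, y in zip(a, b))


def closest_language(ciphertext, samples):
    """Pick the sample language whose profile is nearest the ciphertext's.

    `samples` maps a language name to a plain-text sample in that language.
    """
    target = frequency_profile(ciphertext)
    return min(
        samples,
        key=lambda lang: profile_distance(target, frequency_profile(samples[lang])),
    )


if __name__ == "__main__":
    # Toy stand-ins for real language samples (hypothetical data).
    samples = {"lang_a": "aaaab", "lang_b": "abcde"}
    print(closest_language("xxxxy", samples))
```

A real system would need far longer samples and richer statistics than a single frequency profile, but the idea of comparing an encoded text against known languages is the same.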

“That was surprising,” said Kondrak in a statement. “And just saying ‘this is Hebrew’ is the first step. The next step is how do we decipher it.”

A clip of the manuscript. (Image: Beinecke Rare Book & Manuscript Library, Yale University)


For the second step, the researchers entertained a hypothesis proposed by previous researchers: that the script was created with alphagrams, that is, words whose letters have been rearranged into alphabetical order (for example, the alphagram of GIZMODO is DGIMOOZ). Armed with the knowledge that the text was originally encoded from Hebrew, the researchers devised an algorithm that could take these alphagrams and unscramble them into real Hebrew words.
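The alphagram idea is easy to sketch in a few lines of Python. The snippet below is only an illustration, not the researchers' algorithm: it sorts a word's letters to form its alphagram, then looks up which dictionary words share that alphagram. The function names and the tiny English word list are assumptions made for the example.

```python
def alphagram(word):
    """Return the word's letters rearranged into alphabetical order."""
    return "".join(sorted(word))


def build_index(dictionary):
    """Map each alphagram to the dictionary words that produce it."""
    index = {}
    for word in dictionary:
        index.setdefault(alphagram(word), []).append(word)
    return index


def candidates(scrambled, index):
    """Return every dictionary word that could lie behind a scrambled word."""
    return index.get(alphagram(scrambled), [])


if __name__ == "__main__":
    print(alphagram("GIZMODO"))  # DGIMOOZ

    # A toy English word list stands in for a Hebrew dictionary.
    index = build_index(["listen", "silent", "enlist", "stone"])
    print(candidates("eilnst", index))
```

Note that a single alphagram can match several words, which is one reason a decoded text can be full of dictionary words and still not read as a coherent sentence.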

“It turned out that over 80 percent of the words were in a Hebrew dictionary, but we didn’t know if they made sense together,” said Kondrak.

For the final step, the researchers deciphered the opening phrase of the manuscript and presented it to colleague Moshe Koppel, a computer scientist and native Hebrew speaker. Koppel said it didn’t form a coherent sentence in Hebrew.


“However, after making a couple of spelling corrections, Google Translate [was] able to convert it into passable English: ‘She made recommendations to the priest, man of the house and me and people,’” wrote the researchers in the study, which now appears in Transactions of the Association for Computational Linguistics.

It’s a really weird way to open up a 240-page manuscript, but the phrase actually makes some sense. Importantly, the researchers aren’t saying they’ve deciphered the entire Voynich manuscript. Rather, they’ve identified the language of origin (Hebrew), and a coding scheme in which letters have been arranged in a particular order (alphagram). Kondrak says the full meaning of the text won’t be known until historians of ancient Hebrew have a chance to study the deciphered text.


Excitingly, the team is planning to apply the new algorithm to other ancient scripts, highlighting the potential for AI to solve problems that have vexed humans for centuries.

[Transactions of the Association for Computational Linguistics]