Why the Knotted Language of DNA Sounds Like Music

Knot theory hasn't been the only unexpected math to pop up during DNA research. Scientists have used Venn diagrams, and the Heisenberg uncertainty principle, to study DNA. The architecture of DNA shows traces of the "golden ratio" of length to width found in classical edifices like the Parthenon. Geometry enthusiasts have twisted DNA into Möbius strips and constructed the five Platonic solids. Cell biologists now realize that, to even fit inside the nucleus, long, stringy DNA must fold and refold itself into a fractal pattern of loops within loops within loops, a pattern where it becomes nearly impossible to tell what scale (nano-, micro-, or millimeter) you're looking at.

DNA has especially intimate ties to an oddball piece of math called Zipf's law, a phenomenon first discovered by a linguist. George Kingsley Zipf came from solid German stock (his family had run breweries in Germany), and he eventually became a professor of German at Harvard University.

A colleague once described Zipf as someone "who would take roses apart to count their petals," and Zipf treated literature no differently. As a young scholar Zipf tackled James Joyce's Ulysses, and the main thing he got out of it was that it contained 29,899 different words, and 260,430 words total. From there Zipf dissected Beowulf, Homer, Chinese texts, and the oeuvre of the Roman playwright Plautus. By counting the words in each work, he discovered Zipf's law. It says that the most common word in a language appears roughly twice as often as the second most common word, roughly three times as often as the third most, a hundred times as often as the hundredth most, etc. In English, "the" accounts for seven percent of words, "of" about half that, "and" a third of that, all the way down to obscurities like grawlix or boustrophedon. These distributions hold just as true for Sanskrit, Etruscan, hieroglyphics, Spanish, or Russian. Even when people make up languages, something like Zipf's law emerges.
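For readers who want to see the arithmetic, here is a minimal Python sketch of the rank-and-frequency idea. It is an illustration only: the function names, the crude word-splitting rule, and the toy sentence are assumptions made for this example, not Zipf's own procedure.

```python
from collections import Counter
import re

def rank_frequencies(text):
    """Return (word, count) pairs sorted from most to least common."""
    # Assumption: a "word" is any run of letters or apostrophes, lowercased.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()

def zipf_prediction(top_count, rank):
    """Count that an idealized Zipf's law predicts for a given rank (1-based)."""
    # The word at rank r is expected to appear about 1/r as often as the top word.
    return top_count / rank

# Toy sentence, invented for illustration; real tests need whole books.
sample = "the cat and the dog and the bird saw the fox"
ranked = rank_frequencies(sample)
top_count = ranked[0][1]

# Compare observed counts with the idealized prediction for the top ranks.
for rank, (word, count) in enumerate(ranked[:3], start=1):
    print(rank, word, count, zipf_prediction(top_count, rank))
```

Feeding in the figure from the text, `zipf_prediction(7.0, 2)` gives 3.5, which matches "of" appearing about half as often as "the." A sentence this short cannot confirm the law; the pattern only shows up in long texts.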

After Zipf died in 1950, scholars found evidence of his law in an astonishing variety of other places—in music, city population ranks, income distributions, mass extinctions, earthquake magnitudes, the ratios of different colors in paintings and cartoons, and more. Probably inevitably, the theory's sudden popularity led to a backlash, especially among linguists, who questioned what Zipf's law even meant, if anything. Still, many scientists defend Zipf's law because it feels correct—the frequency of words doesn't seem random—and, empirically, it does describe languages in uncannily accurate ways. Even the "language" of DNA.

Of course, it's not apparent at first that DNA is Zipfian, especially to speakers of Western languages. Unlike most languages, DNA doesn't have obvious spaces to distinguish each word. It's more like those ancient texts with no breaks or pauses or punctuation of any kind, just relentless strings of letters. You might think that the A-C-G-T triplets that code for amino acids could function as "words," but their individual frequencies don't look Zipfian. To find Zipf, scientists had to look at groups of triplets instead, and a few turned to an unlikely source for help: Chinese search engines. The Chinese language creates compound words by linking adjacent symbols. So if a Chinese text reads ABCD, search engines might examine a sliding "window" to find meaningful chunks, first AB, BC, and CD, then ABC and BCD. Using a sliding window proved a good strategy for finding meaningful chunks in DNA, too.
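The sliding window can be sketched in a few lines of Python. This is a hedged illustration, not the researchers' actual method: the function name `sliding_windows`, its default window sizes, and the choice to step one letter at a time are all assumptions made for the example.

```python
from collections import Counter

def sliding_windows(sequence, min_size=2, max_size=3):
    """List every chunk of min_size..max_size adjacent symbols, shortest first."""
    chunks = []
    for size in range(min_size, max_size + 1):
        # Slide the window one symbol at a time across the sequence.
        for start in range(len(sequence) - size + 1):
            chunks.append(sequence[start:start + size])
    return chunks

# The ABCD example from the text: AB, BC, CD, then ABC, BCD.
print(sliding_windows("ABCD"))

# The same idea applied to a made-up DNA string, counting three-letter chunks.
dna_counts = Counter(sliding_windows("ACGTACGT", min_size=3, max_size=3))
print(dna_counts.most_common())
```

Once the chunks are counted, they can be ranked by frequency and checked against Zipf's law like words in a book.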

The expression of DNA, the translation into proteins, also obeys Zipf's law. Like common words, a few genes in every cell get expressed time and time again, while most genes hardly ever come up in conversation. Over the ages cells have learned to rely on these common proteins more and more, and the most common one generally appears twice and thrice and quatrice as often as the next-most-common proteins.

So if DNA shows Zipfian tendencies, too, is DNA arranged into a musical score of sorts? Musicians have in fact translated the A-C-G-T sequence of serotonin, a brain chemical, into little ditties by assigning the four DNA letters to the notes A, C, G, and E. Other musicians have composed DNA melodies by assigning harmonious notes to the amino acids that popped up most often, and found that this produced more complex and euphonious sounds.

Something even more interesting happened when two scientists, instead of turning DNA into music, inverted the process and translated the notes from a Chopin nocturne into DNA. They discovered a sequence "strikingly similar" to part of the gene for RNA polymerase. This polymerase, a protein universal throughout life, is what builds RNA from DNA. Which means, if you look closer, that the nocturne actually encodes an entire life cycle. Consider: Polymerase uses DNA to build RNA. RNA in turn builds complicated proteins. These proteins in turn build cells, which in turn build people, like Chopin. He in turn composed harmonious music, which completed the cycle by encoding the DNA to build polymerase. Musicology recapitulates ontology.

Humans have long wanted to link music to deeper, grander themes in nature. Most notably, astronomers from ancient Greece right through to Kepler believed that, as the planets ran their course through the heavens, they created an achingly beautiful musica universalis, a hymn in praise of Creation. It turns out that universal music does exist, only it's closer than we ever imagined, in our DNA.

Reprinted with permission from Little, Brown and Company. Copyright Sam Kean, 2012. All Rights Reserved. Image: sgame / Shutterstock

The Violinist's Thumb is available from SamKean.com and Amazon.