Knot theory hasn't been the only unexpected math to pop up during DNA research. Scientists have used Venn diagrams to study DNA, and the Heisenberg uncertainty principle. The architecture of DNA shows traces of the "golden ratio" of length to width found in classical edifices like the Parthenon. Geometry enthusiasts have twisted DNA into Möbius strips and constructed the five Platonic solids. Cell biologists now realize that, to even fit inside the nucleus, long, stringy DNA must fold and refold itself into a fractal pattern of loops within loops within loops, a pattern where it becomes nearly impossible to tell what scale (nano-, micro-, or millimeter) you're looking at.

DNA has especially intimate ties to an oddball piece of math called Zipf's law, a phenomenon first discovered by a linguist. George Kingsley Zipf came from solid German stock (his family had run breweries in Germany), and he eventually became a professor of German at Harvard University.

A colleague once described Zipf as someone "who would take roses apart to count their petals," and Zipf treated literature no differently. As a young scholar Zipf tackled James Joyce's Ulysses, and the main thing he got out of it was that it contained 29,899 different words, and 260,430 words total. From there Zipf dissected Beowulf, Homer, Chinese texts, and the oeuvre of the Roman playwright Plautus. By counting the words in each work, he discovered Zipf's law. It says that the most common word in a language appears roughly twice as often as the second most common word, roughly three times as often as the third most common, a hundred times as often as the hundredth most common, and so on. In English, "the" accounts for about seven percent of words, "of" about half that, "and" a third of that, all the way down to obscurities like grawlix or boustrophedon. These distributions hold just as true for Sanskrit, Etruscan, hieroglyphics, Spanish, or Russian. Even when people make up languages, something like Zipf's law emerges.
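The rank-frequency pattern is simple enough to check on any text. The sketch below (a toy Python illustration, not anything Zipf himself ran, and with a corpus far too small to show the law cleanly) counts words, ranks them, and compares each observed count to the Zipf prediction that the word at rank r appears about 1/r as often as the top word:

```python
from collections import Counter

def zipf_table(text, top=5):
    """Rank words by frequency and compare each count to the
    Zipf prediction: count at rank r ~ count at rank 1 / r."""
    words = text.lower().split()
    ranked = Counter(words).most_common(top)
    top_count = ranked[0][1]
    return [(rank, word, count, top_count / rank)
            for rank, (word, count) in enumerate(ranked, start=1)]

# A deliberately tiny, made-up corpus for illustration only.
sample = ("the cat and the dog and the bird saw the cat "
          "and the dog near the tree")
for rank, word, count, predicted in zipf_table(sample):
    print(rank, word, count, round(predicted, 1))
```

Even on this contrived snippet, "the" at rank 1 appears six times and "and" at rank 2 appears three times, exactly the halving the law predicts; on a real corpus like Ulysses the fit runs far deeper down the ranks.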

After Zipf died in 1950, scholars found evidence of his law in an astonishing variety of other places: in music, city population ranks, income distributions, mass extinctions, earthquake magnitudes, the ratios of different colors in paintings and cartoons, and more. Probably inevitably, the theory's sudden popularity led to a backlash, especially among linguists, who questioned what Zipf's law even meant, if anything. Still, many scientists defend Zipf's law because it feels correct (the frequency of words doesn't seem random) and, empirically, it does describe languages in uncannily accurate ways. Even the "language" of DNA.

Of course, it's not apparent at first that DNA is Zipfian, especially to speakers of Western languages. Unlike most languages, DNA doesn't have obvious spaces to distinguish each word. It's more like those ancient texts with no breaks or pauses or punctuation of any kind, just relentless strings of letters. You might think that the A-C-G-T triplets that code for amino acids could function as "words," but their individual frequencies don't look Zipfian. To find Zipf, scientists had to look at groups of triplets instead, and a few turned to an unlikely source for help: Chinese search engines. The Chinese language creates compound words by linking adjacent symbols. So if a Chinese text reads ABCD, search engines might examine a sliding "window" to find meaningful chunks, first AB, BC, and CD, then ABC and BCD. Using a sliding window proved a good strategy for finding meaningful chunks in DNA, too.
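The sliding-window scan described above can be sketched in a few lines of Python; this is an illustrative reconstruction of the idea, not the researchers' actual code. Given an unsegmented string, whether Chinese text or a DNA sequence, it tallies every overlapping chunk of each window size, the way a search engine might probe ABCD for AB, BC, and CD, then ABC and BCD:

```python
from collections import Counter

def window_counts(seq, sizes=(2, 3)):
    """Tally every overlapping substring of each window size,
    since unsegmented text offers no word boundaries to split on."""
    counts = Counter()
    for size in sizes:
        for i in range(len(seq) - size + 1):
            counts[seq[i:i + size]] += 1
    return counts

# Probing "ABCD": windows of 2 yield AB, BC, CD;
# windows of 3 yield ABC, BCD.
print(window_counts("ABCD"))
```

Run over a long genome with window sizes spanning several triplets, the most frequent chunks can then be ranked by count, and it is that rank-frequency curve that turns out to look Zipfian.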