What happens if you let a genome hacker (a kind of computer scientist-turned-biologist) loose on the world's online genealogy sites? You get the world's biggest family tree, which shows how more than 13 million people are related.
The work of computational biologist Yaniv Erlich, which he recently presented at the American Society of Human Genetics annual meeting in Boston, rolls together masses of data stripped straight from online genealogy sites.
Historically, researchers have had to sift through dusty old records for this kind of data; assembling a tree of just a few thousand individuals could take years. But Erlich scraped over 43 million public profiles from the genealogy website geni.com, then had his team assemble them into family trees. The profiles always include birth and death dates, and sometimes locations and images too.
Some of the trees were as small as a thousand individuals; one was as large as 13 million. That dwarfs the trees available to researchers in the past, which contained hundreds of thousands of people at best, reports Nature. Before you scream and shout about data rights, it's all been anonymized to protect privacy, but that doesn't make it any less useful. The data stretches way back to the 15th century, and the idea is to probe and analyze it for the good of science.
The challenge, though, is how to interrogate those trees and wring out the secrets they hide. There's a lot of promise: their mere structure could tell us a lot about demographics and population expansions, and if they can be linked to medical information or to DNA sequence data then they could offer huge insight into the way we understand heredity.
As ever, though, there are caveats. The problem with most genealogy data is that it's self-reported, and so not always reliable, especially the farther back you go. Still, that's the benefit of such a large data set: among the noise there may still be enough signal. For now, it's unclear just how useful the massive family trees will be, but it'll be exciting to find out. In the meantime, why not give it a bash yourself? [Nature]