Genetic testing has helped plenty of people gain insight into their ancestry, and some services even help users find their long-lost relatives. But a new study published this week in Science suggests that the information uploaded to these services can be used to figure out your identity, regardless of whether you volunteered your DNA in the first place.
The researchers behind the study were inspired by the recent case of the alleged Golden State Killer.
Earlier this year, Sacramento police arrested 72-year-old Joseph James DeAngelo for a wave of rapes and murders he allegedly committed in the 1970s and 1980s. They claimed to have identified him with the help of genealogy databases.
Traditional forensic investigation relies on matching certain snippets of DNA, called short tandem repeats, to a potential suspect. But these snippets only allow police to identify a person or their close relatives in a heavily regulated database. Thanks to new technology, the investigators in the Golden State Killer case were able to take the suspected killer’s DNA, left behind at a crime scene, and isolate from it the kind of genetic material now collected by consumer genetic testing companies. Then they searched for DNA matches within public genealogy databases.
This information, coupled with other historical records, such as newspaper obituaries, helped investigators create a family tree of the suspect’s ancestors and other relatives. After zeroing in on potential suspects, including DeAngelo, the investigators collected a fresh DNA sample from DeAngelo—one that matched the crime scene DNA perfectly.
But while the detective work used to uncover DeAngelo’s alleged crimes was certainly clever, some experts in genetic privacy have been worried about the grander implications of this method. That includes Yaniv Erlich, a computer engineer at Columbia University and chief science officer at MyHeritage, an Israel-based ancestry and consumer genetic testing service.
Erlich and his team wanted to see how easy it would be in general to use the method to find someone’s identity by relying on the DNA of distant and possibly unknown family members. So they looked at more than 1.2 million anonymous people who had gotten testing from MyHeritage, and specifically excluded anyone who had immediate family members also in the database. The idea was to figure out whether a stranger’s DNA could indeed be used to crack your identity.
They found that more than half of these people had distant relatives—meaning third cousins or further—who could be spotted in their searches. For people of European descent, who made up 75 percent of the sample, the hit rate was closer to 60 percent. And for about 15 percent of the total sample, the authors were also able to find a second cousin.
Much like the Golden State investigators, the team found they could trace back someone’s identity in the database with relative ease by combining these distant relatives with demographic information that was not overly specific, such as the target’s age or possible state of residence.
In one specific case, they were able to cross-reference a woman’s anonymous genetic profile from another research project with the same service used by Golden State Killer investigators (a website called GEDmatch) and find her identity. The woman had been identified in an earlier study conducted by Erlich, using a different method that relied on figuring out the genetic profile of her husband, but the new search was even easier and required less upfront information than the previous method.
For Erlich, the findings are both reassuring and frightening.
“Of course, there’s some good news. If someone did something wrong out there, then [law enforcement] is going to be able to catch them,” he told Gizmodo. “But down the road, as things continue to evolve, there could be people who use this for illegitimate reasons.”
That could include scientists who try to identify research subjects from other projects, as well as companies and individuals that might try to leverage and sell your information elsewhere. Another concern is genetic discrimination.
Erlich said there are ways to stop the potential misuse of these databases. Agencies such as the U.S. Department of Health and Human Services have regulations, known as the Common Rule, for federally funded research that involves human subjects. A revision of these guidelines was set to be implemented in 2017, but won’t come into full effect until 2019. The revised Common Rule doesn’t currently consider our genomes to be identifiable information, but Erlich noted that HHS is allowed to change that status as technology advances. That might stop unscrupulous scientists, who would stand to lose federal funding if they were caught trying to pilfer people’s identities.
Genetic testing services could also take their own steps to protect their consumers. They could attach cryptographic signatures to the raw genetic data they send out, a technique touted by other scientists concerned about genetic privacy. Genealogy services would then only run searches through their database if a query was confirmed to be coming from a customer (as a supplement to the paper, the researchers have uploaded their demo source code for such a signature on GitHub).
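To make the idea concrete, here is a minimal sketch of how a genealogy service might refuse files that did not come from a testing company. It is not the researchers' demo code. As a simplifying assumption, it uses an HMAC (a shared-secret tag available in Python's standard library) as a stand-in for a true public-key signature, and every name and value in it (`sign_raw_data`, `accept_query`, the key, the sample file contents) is hypothetical.

```python
import hashlib
import hmac


def sign_raw_data(raw_data: bytes, provider_key: bytes) -> str:
    """Return a hex tag that binds the raw data file to the provider's key."""
    return hmac.new(provider_key, raw_data, hashlib.sha256).hexdigest()


def accept_query(raw_data: bytes, tag: str, provider_key: bytes) -> bool:
    """Accept an uploaded file only if its tag verifies under the key."""
    expected = sign_raw_data(raw_data, provider_key)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


# Hypothetical values for illustration only; not real keys or genetic data.
provider_key = b"demo-key-for-illustration-only"
genuine_file = b"rs123\t1\t1000\tAG\n"
tampered_file = b"rs123\t1\t1000\tGG\n"

tag = sign_raw_data(genuine_file, provider_key)
print(accept_query(genuine_file, tag, provider_key))
print(accept_query(tampered_file, tag, provider_key))
```

A real deployment would more likely use a public-key scheme, so that a genealogy site could verify files without holding the testing company's secret key.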
In an ideal world, law enforcement agencies could also still access these services, but only after obtaining explicit permission, such as through a warrant. As of right now, MyHeritage does not allow researchers or law enforcement officials to use its genealogy service without permission, and according to the company, no one has been granted permission as of yet.
“We need to think about oversight, about checks and balances, now, before these concrete concerns show up,” said Erlich.
Though the details are still being worked out, it’s almost certain that all of us will need our genetic information to be safeguarded, even those of us who decide to turn down a well-meaning gift of a free DNA test. According to the researchers, it will take only about 2 percent of an adult population having their DNA profiled in a database before it becomes theoretically possible to trace any person’s distant relatives from a sample of unknown DNA and, therefore, to uncover their identity. And we’re getting ever closer to that tipping point.
“Once we reach 2 percent, nearly everyone will have a third cousin match, and a substantial amount will have a second cousin match,” Erlich explained. “My prediction is that for people of European descent, we’ll reach that threshold within two or three years.”
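A back-of-the-envelope calculation shows why such a small share of the population can go so far. The sketch below is not the researchers' model, which is not reproduced here. It assumes, purely for illustration, that a person has a fixed number of third cousins and that each one ends up in the database independently with the same probability. The cousin count (`ASSUMED_THIRD_COUSINS`) is a placeholder, not a figure from the study.

```python
def chance_of_match(coverage: float, relatives: int) -> float:
    """Probability that at least one of `relatives` is in the database,
    assuming each is included independently with probability `coverage`."""
    return 1.0 - (1.0 - coverage) ** relatives


# Placeholder assumption for illustration; not a number from the study.
ASSUMED_THIRD_COUSINS = 190

for coverage in (0.005, 0.01, 0.02):
    p = chance_of_match(coverage, ASSUMED_THIRD_COUSINS)
    print(f"{coverage:.1%} of the population tested -> {p:.0%} chance of a match")
```

Because the chance of having no cousin in the database shrinks with every additional relative, even a low coverage rate leaves very few people without a match under these assumptions.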
For those concerned about their criminal misdeeds coming back to bite them, there’s already plenty to be worried about. The authors note that more law enforcement officials in the U.S. are starting to adopt this technique. Since April, at least 13 criminal cases have seemingly been solved with the help of genealogy searches. And while most of these were cold cases, the technique has also been used to find the suspect in a crime committed just this April. Private forensic testing companies have also recently announced their own sweeps of cold cases using a similar technique.
What this means for you: If you want to protect your genetic privacy, the best thing you can do is lobby for stronger legal protections and regulations. Because whether or not you’ve ever submitted your DNA for testing, someone, somewhere, is likely to be able to pick up your genetic trail.