Despite what procedural dramas might lead you to believe, the DNA that police collect at crime scenes can’t tell you anything about the characteristics of the person who left it behind. If police have a database of suspects’ DNA, then they can match a sample they’ve swabbed up on-scene with the profiles in the database. But the sample alone can’t tell them whether its owner had brown eyes or a propensity for diabetes. In fact, the inability to tell physical traits from law enforcement DNA databases is crucial to their legality: The Supreme Court ruled in 2013 that police are allowed to take DNA samples from people they’ve arrested for serious crimes—and doing so is not considered an unreasonable search—in part because the DNA markers that police look at “do not reveal an arrestee’s genetic traits and are unlikely to reveal any private medical information.”
A new study, however, begins to undermine that argument. A team of geneticists from universities in the United States and Canada found a way to match forensic DNA profiles with health-related DNA profiles. In other words, somebody with access to both U.S. law enforcement’s DNA database and a genetic research database could, theoretically, run an analysis to find if there’s anybody who pops up in both. From there, the analyst could look up traits such as that person’s ancestry and risk for diseases such as cancer and dementia.
“What we’re saying here is there’s this connection between databases that has not been part of the conversation about where policy should consider privacy,” says Noah Rosenberg, a genetics researcher at Stanford University and the study’s lead author.
Some outside experts contend that such matching might be much more difficult than it sounds. There are questions about how well Rosenberg and his team’s methods will work in the real world. Plus, a quirk in human DNA means it’s possible Rosenberg’s methods won’t hold up over time. “I think it’s unlikely to ever be useful,” says Sara Katsanis, a genetics policy researcher at Duke University whose own research was cited by the Supreme Court to support its argument that law enforcement’s DNA database doesn’t reveal important personal information.
The difference between law enforcement and scientists—in terms of genetic data, at least—is that they look at different parts of the human genome. Everyone has about a three-billion-letter-long sequence of DNA inside every cell in their bodies. Genetic tests never try to read every single letter. Instead, they choose parts. Police analysts look at so-called short tandem repeats, which are distinctive patterns scattered inside the genome. Until this year, law enforcement only looked at short tandem repeats in 13 locations in the genome; they’ve now expanded to 20. These short tandem repeats don’t affect people’s health or appearance, but they’re unique enough between people that they can help with identifying crime suspects and missing persons.
Genetics researchers, on the other hand, often look at single nucleotide polymorphisms, or SNPs (pronounced “snips”) for short. So do commercial genome companies, such as 23andMe and Ancestry.com. Ever since the human genome was sequenced in 2003, researchers have plumbed the SNPs in humanity’s DNA, analyzing hundreds of thousands of SNPs at once to see which ones are associated with physical traits. SNPs have been associated, with varying reliability, with everything from people’s weight and height to how they’ll react to certain medicines.
SNPs and short tandem repeats do not overlap and, until Rosenberg’s study, there wasn’t a way to convert between the two kinds of data. Rosenberg offered an accurate way to translate one to the other. Within his study group of 872 people, he was able to match SNP profiles to profiles containing 13 short tandem repeats between 90 and 98 percent of the time.
What would happen if he tried in a larger data set, however? 23andMe, for example, has two million customers, as Forbes reported in April. “We have a calculation in the study that suggests that, for some non-trivial fraction of a population of millions, these links will be possible to make,” Rosenberg says. “The technique doesn’t have to work in every case in order to be a technique whose implications need to be explored.”
Katsanis is a bit more skeptical. She says that, if police were to go to 23andMe and demand that they be allowed to try to match a forensic sample with 23andMe’s customers, “there would be too many possible people that could align with that STR [short tandem repeat] profile. That is just not useful information for law enforcement.”
In addition, short tandem repeats change quickly between generations. This means that when parents pass their short tandem repeats on to their children, the STRs aren’t always copied perfectly. But because STRs don’t affect people’s health, nobody would ever have reason to know. SNPs, on the other hand, are almost always inherited perfectly. Imagine SNPs and STRs are two runners in a race. Rosenberg and his colleagues found a way to calculate the distance between SNPs and STRs as they’re running. But because STRs run faster than SNPs, that calculation will, over time, become less accurate. “I would argue that the STR mutation rates will mean that they won’t have a reliable connection over time,” says John Butler, a fellow for forensic science at the National Institute of Standards and Technology. But nobody knows for sure. “I would be happy to be proven wrong, over time,” Butler says.
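To get a rough feel for the drift Butler describes, here is a minimal back-of-the-envelope sketch in Python. It is not the study's method. It assumes that each STR location mutates independently with the same chance per generation, and the rate used (`ASSUMED_RATE`) is a hypothetical placeholder chosen for illustration, not a figure from Rosenberg's paper or from Butler. The function name is likewise made up for this example.

```python
def prob_profile_unchanged(loci: int, mutation_rate: float, generations: int) -> float:
    """Chance that none of `loci` STR markers mutates over `generations` hand-offs.

    Assumes every marker mutates independently with the same per-generation rate.
    """
    # Probability that all markers are copied faithfully in one generation.
    per_generation = (1.0 - mutation_rate) ** loci
    # Faithful copying has to happen in every generation in a row.
    return per_generation ** generations


# Hypothetical per-marker, per-generation mutation rate (an assumption, not a measured value).
ASSUMED_RATE = 1e-3

if __name__ == "__main__":
    for loci in (13, 20):  # the old and the expanded law enforcement marker sets
        for generations in (1, 10, 100):
            p = prob_profile_unchanged(loci, ASSUMED_RATE, generations)
            print(f"{loci} markers, {generations} generations: {p:.3f}")
```

Under these assumptions the chance of a perfectly preserved STR profile can only shrink as generations pass, and it shrinks faster when more markers are tracked. Whether that erosion is fast enough to break the SNP-to-STR link in practice is exactly the open question.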
Although the science is nascent now, Katsanis thinks that matching forensic DNA profiles to health- and appearance-related ones may well be possible one day. And such a development would bring benefits alongside privacy dangers. “A lot of my research is in missing persons and the tools and technology that are considered invasive for law enforcement purposes, like phenotyping who might be Muslim, are very useful when you have a bone in a desert and you’re trying to figure out who that person is,” she says. When that time comes, it’ll be up to society to decide how to draw boundaries and protect people’s genetic data.