Most of us occasionally worry that some online mishap, like a misreading of Facebook’s latest privacy policy, will disclose to the world things we’d rather keep private: pictures unlikely to amuse your employer, rants about your neighbor’s hygiene habits, or harmless-but-personal stuff that you only want to share with close friends. If the threat of losing your online privacy has you worried, future threats to the privacy of your genome should have you absolutely freaked out. By means of your DNA, nature provides you with a security flaw that makes Microsoft Windows look like Fort Knox.
Personal genetic information is currently a subject of intense scientific interest, because one of the major goals of today’s biomedical science is to figure out how we can use information in your DNA to customize your medical care. Your individual genome has a strong influence on how your body responds to efforts to prevent and treat disease, and in the coming years, doctors will begin to routinely use genetic information in patient care. As the practice of genetic medicine becomes increasingly common, more and more of us will use genetic testing services, which means that more and more of us will have our genetic data stored in some sort of database. This will put some very personal information at risk, because, as Visa will tell you, databases can be hacked.
What kind of private information could your genome data reveal? At this point, not too much, but this will change quickly. In the near future, it may not be difficult for someone with access to your genome data to make a good guess about your ethnicity, your skin color, and your propensity to obesity, addiction, bipolar disorder, attention-deficit disorder, early-onset cancer, or Parkinson’s, Huntington’s, or Alzheimer’s disease, not to mention the identity of your real father. A hack of your Facebook or credit card account seems minor by comparison.
To make matters worse, unlike your credit card number, genetic data is, with a little help from social media, potentially self-identifying—even anonymous genetic data, as a group of computational biologists from MIT proved in a recent study. Yaniv Erlich and his students were able to use easily available, public information to identify anonymous volunteers who had contributed their genetic data to a database for scientific research. Disturbingly, the clues that led scientists to the identities of the anonymous donors came from information uploaded to the Web not by the donors themselves, but by relatives as distant as a second cousin, once removed.
These distant relatives, all males in this case, were users of recreational genealogy websites. They had uploaded low-cost genetic data about their Y chromosomes to a publicly viewable database called Ysearch, which contains surnames and Y chromosome data. Because last names and Y chromosomes tend to be inherited from fathers, the Ysearch database can act like a genetic dictionary that translates Y chromosome data into surnames, even for individuals who are not in the Ysearch database. Submit someone’s Y chromosome data, and Ysearch will make what is often a very good guess of that person’s surname. This is exactly what the MIT scientists did with data from the Y chromosomes of the anonymous volunteers. With likely surnames in hand, the scientists used other publicly available information from Web sources like PeopleFinder.com to completely identify those volunteers. How reliable is this procedure? Out of 10 anonymous individuals, the scientists completely identified five of them.
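To see why a surname database can work like a dictionary, here is a toy sketch of the lookup idea in Python. It is only an illustration under stated assumptions: the records, surnames, and repeat counts are invented, the function names and thresholds are hypothetical, and this is neither Ysearch's actual interface nor the procedure the MIT group used. It simply counts how many Y chromosome markers differ between a query and each stored profile, then returns the surname that appears most often among the close matches.

```python
from collections import Counter

# Invented toy records: (surname, {marker name: repeat count}).
# Marker names follow common Y-STR naming; all values are made up.
RECORDS = [
    ("Archer", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    ("Archer", {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 13}),
    ("Baker",  {"DYS19": 15, "DYS390": 22, "DYS391": 10, "DYS393": 12}),
    ("Cole",   {"DYS19": 13, "DYS390": 25, "DYS391": 11, "DYS393": 14}),
]


def mismatches(query, profile, min_shared=3):
    """Count differing markers, or return None if too few markers overlap."""
    shared = query.keys() & profile.keys()
    if len(shared) < min_shared:
        return None
    return sum(query[m] != profile[m] for m in shared)


def guess_surname(query, records=RECORDS, max_mismatches=1):
    """Return the most common surname among close matches, or None."""
    votes = Counter()
    for surname, profile in records:
        distance = mismatches(query, profile)
        if distance is not None and distance <= max_mismatches:
            votes[surname] += 1
    return votes.most_common(1)[0][0] if votes else None


anonymous_donor = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}
print(guess_surname(anonymous_donor))
```

The key point is that the donor never has to appear in the database. A close match from a paternal relative is enough to suggest the surname, which is why the guess is only a probable one.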
If you don’t have a Y chromosome or you haven’t donated your genome to science, you may be wondering why any of this should worry you. The link between Y chromosomes and surnames is just one of many potential privacy vulnerabilities in genetic data. It will only get easier to exploit these vulnerabilities as the popularity of social media, recreational genealogy, and genetic medicine goes up while the cost of genetic scans goes down.
Dramatic genome privacy violations are, for the most part, only a theoretical threat right now, but we should avoid mistakes that will put us at risk in the future. The lesson from the MIT study is that all of us, whether patients, research institutions, or companies that offer genetic scans, need to handle our online genetic data with the same caution we apply to our financial or medical data. By themselves, popular, low-cost, low-information genetic scans for genealogy won’t ever be much of a threat to your privacy, but the in-depth medical genetic scans could reveal more about you than that compromising picture from a long-ago college party ever will.