DNA hack could make medical privacy impossible

It may now be possible for anyone, even if they follow rigorous privacy and anonymity practices, to be identified by DNA data from people they do not even know.

A paper published in January in the journal Science describes a process by which it's possible to identify by name the donors of DNA samples, even without any demographic or personal information. The technique was developed by a team of geneticists at MIT's Whitehead Institute for Biomedical Research and is intended to demonstrate that science and technology have surpassed the techniques and laws currently in place for safeguarding private medical data, according to Yaniv Erlich, a fellow at Whitehead and member of the research team.

The point was not to reveal private information, but to demonstrate a systemic weakness that will require research, debate and new laws and technology to overcome, Erlich says. The technique relies on the custom of passing family names down through the father's family. By statistically modeling the distribution of family names, the researchers were able to narrow the list of possible contributors of DNA samples. They then pinpointed individuals using a range of other publicly available sources, none of which were directly connected to the original donors and none of which included protected personal data.


This isn't a specific exploit against an effective wall of security, Erlich says. Instead, it demonstrates that genomic research may have grown beyond our ability to conceal the identities of the sources of DNA samples. The team started with a list of genomes that had already been sequenced, mapped and published for the use of genetic researchers. They analyzed the material to find identifying markers on the Y chromosome -- which is present only in men -- because surnames are generally passed down through fathers. They compared those Y markers to databases that list such markers along with the surnames of those from whom the samples were taken, but were not able to match all the samples with surnames using confirmed data. They determined which surnames were most likely to belong to which samples using scientifically accepted statistical models that were designed, among other things, to track the movement of regional populations by following the spread of family names.
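The marker-matching step can be illustrated with a minimal sketch. It is not the researchers' method: the marker names (M1 to M4), the repeat counts, the surnames and the simple "fraction of matching markers" score are all hypothetical stand-ins, and the statistical surname models the team actually used are not reproduced here.

```python
from collections import defaultdict

# Hypothetical reference records: (surname, {marker name: repeat count}).
# Real databases and marker names differ; these values are invented.
REFERENCE = [
    ("Archer", {"M1": 13, "M2": 24, "M3": 14, "M4": 11}),
    ("Archer", {"M1": 13, "M2": 24, "M3": 15, "M4": 11}),
    ("Bell",   {"M1": 14, "M2": 22, "M3": 14, "M4": 12}),
    ("Cole",   {"M1": 13, "M2": 23, "M3": 16, "M4": 10}),
]

def shared_fraction(a, b):
    """Fraction of markers present in both profiles that have equal values."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    return sum(a[m] == b[m] for m in common) / len(common)

def rank_surnames(sample, reference):
    """Return (surname, best score) pairs, highest score first."""
    best = defaultdict(float)
    for surname, profile in reference:
        best[surname] = max(best[surname], shared_fraction(sample, profile))
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

# An anonymous sample's Y-chromosome markers (also invented).
sample = {"M1": 13, "M2": 24, "M3": 14, "M4": 11}
ranking = rank_surnames(sample, REFERENCE)
print(ranking)
```

A ranking like this only suggests which surnames are plausible for a sample. As the article notes, not every sample could be matched with confirmed data, which is why the team relied on statistical models for the rest.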

The next step was more hack than science: The team used record-search engines on the Internet, obituaries, genealogical websites and demographic data from the National Institutes of Health's Human Genetic Cell Repository. Researchers then linked 50 of the samples to the names of the people who contributed them.
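To see why combining sources is so effective, consider a rough sketch of the narrowing step. The article does not say which demographic fields were used, so the birth year and state fields below are assumptions for illustration, and every record is invented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PublicRecord:
    """One entry from a hypothetical public record search."""
    name: str
    surname: str
    birth_year: int  # assumed field, not confirmed by the article
    state: str       # assumed field, not confirmed by the article

def narrow(records, surname, birth_year, state):
    """Keep only records consistent with every known attribute."""
    return [
        r for r in records
        if r.surname == surname
        and r.birth_year == birth_year
        and r.state == state
    ]

# Invented records sharing one candidate surname.
records = [
    PublicRecord("A. Archer", "Archer", 1950, "UT"),
    PublicRecord("B. Archer", "Archer", 1950, "OH"),
    PublicRecord("C. Archer", "Archer", 1972, "UT"),
    PublicRecord("D. Bell", "Bell", 1950, "UT"),
]

candidates = narrow(records, surname="Archer", birth_year=1950, state="UT")
print(candidates)
```

Each attribute alone matches several people, but the intersection of all three leaves a single candidate. None of the inputs needs to be protected personal data for the combination to identify someone.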

Until now, the risk that private genetic data could be made public was considered fairly limited. Data about samples was kept separate from data about donors, and demographic data about the donors could only be supplied after identifiers were removed.

There is a risk to more than just donors, however. Even people who have never contributed a DNA sample could be identified and genetically typed if a relative has ever donated DNA. That scenario is becoming more likely as recreational genetic genealogy sites gain popularity. These sites trace family trees in part through a genetic component, and they make contributed genetic information available to members of the public, often without the same level of controls used by research or medical institutions. Until now, the identity of donors was considered protected if demographic and genetic data were kept in different databases and certain information was masked in the demographic record.

Legislation to keep research institutes from releasing any demographic information about donors would protect patient privacy, but would eliminate the ability of researchers who have identified markers for a particular disease to also identify the ethnic or cultural background of those who might have it, Erlich says. The whole point of scientific research is to publish the results so other researchers can build on them and develop more effective treatments. On the other hand, genetic information can be misused to identify members of ethnic or racial groups targeted for discrimination or other repressive or exploitative purposes, Erlich says.


Stories by Kevin Fogarty
