New Snowden revelation: NSA collects millions of email and chat address books

What happens in email doesn't stay there. But you knew that already.

The first page of the PowerPoint used in the Post’s story with some information redacted.

Your online address books are probably being accessed by the government. But you probably kinda knew that anyway.

The NSA collects hundreds of millions of address books and contact lists from email and instant messaging accounts, the Washington Post reported late Monday, drawing once again on documents leaked by Edward Snowden.

According to the report, the NSA actively collects and stores "buddy lists" and online address books from "most major webmail" systems and has been doing so since at least January 2012. The agency uses these virtual reams of "metadata-rich" info to create searchable recreations of an individual's life based on their online connections.

The newly unveiled program extends the NSA's reach even beyond the already expansive PRISM and Xkeyscore programs, which gave the government the ability to access nearly all digital communications.

The Post's report is largely drawn from another matter-of-fact NSA PowerPoint presentation, which describes how the NSA's Special Source Operations (SSO) was able to collect nearly 450k address books per day (or roughly 250 million per year).

According to the NSA documentation, address book data accounts for nearly a fifth of the SSO's "major accesses," though only a small amount (13.8% of the data) is considered "attributable," meaning the information is verifiably traceable to an actual contact. In fact, the PowerPoint goes into detail about how one of the program's biggest technical setbacks was the large amount of spam adding noise to the system.

According to the PowerPoint, the program includes data culled from numerous services including Yahoo, Hotmail, Gmail, and Facebook. Web services often transmit information such as address book data whenever a user logs in to their services. For example, when you access your Gmail account on a new computer, the site has the ability to autofill past contacts in a message as you type. It does this by accessing your address book stored on Google's remote servers. According to the report, the NSA is able to snag this data during its transit over international access points.

It would be illegal for the NSA to collect this information from facilities in the United States as per the Foreign Intelligence Surveillance Act. However, according to unnamed sources quoted in the Post's report, the agency gets around this by collecting the data from access points all over the world, rather than directly on U.S. soil.

While Google's email services are encrypted by default, this may make little difference, as a previous Snowden release detailed the NSA's ability to defeat many encryption schemes. It is perhaps also why Yahoo announced today that it would be moving to SSL encryption by default.

One big graph search

According to the NSA's analysis of a single day's collection, Yahoo was the most collected source, followed by Hotmail, Gmail, and Facebook. Facebook's data was by far the most accurate, however, coming in at 95.87% attributable (that is, it gave verifiable information on a real person). As a point of comparison, the next highest was Gmail, which came in at a measly 6.97% attributable. Facebook's attribution "success rate" is probably due to the social network's insistence on non-anonymity and the small amount of spam within the service. In fact, the NSA program could be described as, in effect, one giant Facebook graph search.

As of this writing, there is no official statement on the government's data collection transparency Tumblr, ICOnTheRecord. The Tumblr was the result of a pledge from the administration to foster more transparency in its collection activities. During a press conference in early August, Obama criticized the methodology of leaks being "released drip by drip, one a week, to kind of maximize attention."

If one of Snowden's chief media contacts, Glenn Greenwald, is to be believed, Snowden supplied him with 15-20k secret documents before finally seeking asylum in Russia. Unlike WikiLeaks with its unredacted info dumps, Snowden's media contacts seem to be taking more care to vet and redact parts of these secret documents before going public, hence the relatively slower pace of distribution. This may be yet another train car in a very long line of surveillance revelations.

Stories by Evan Dashevsky
