Your online address books are probably being accessed by the government. But you probably kinda knew that anyway.
The NSA collects hundreds of millions of address books and contact lists from email and instant messaging accounts, the Washington Post reported late Monday, drawing once again on documents leaked by Edward Snowden.
According to the report, the NSA actively collects and stores "buddy lists" and online address books from "most major webmail" systems and has been doing so since at least January 2012. The agency uses these virtual reams of "metadata-rich" info to create searchable recreations of an individual's life based on their online connections.
The newly unveiled program expands on the NSA's reach even beyond the already expansive PRISM and Xkeyscore programs, which gave the government the ability to access nearly all digital communications.
The Post's report is largely drawn from another matter-of-fact NSA PowerPoint presentation (linked here) that describes how the NSA's Special Source Operations (SSO) was able to collect nearly 450k address books per day (or roughly 250 million per year).
According to the NSA documentation, address book data accounts for nearly a fifth of the SSO's "major accesses," though only a small amount (13.8% of the data) is considered "attributable," meaning the information is verifiably traceable to an actual contact. In fact, the PowerPoint goes into detail about how one of the program's biggest technical setbacks was the large amount of spam adding noise to the system.
According to the PowerPoint, the program includes data culled from numerous services including Yahoo, Hotmail, Gmail, and Facebook. Web services often transmit information such as address book data whenever a user logs in to their services. For example, when you access your Gmail account on a new computer, the site has the ability to autofill past contacts in a message as you type. It does this by accessing your address book stored on Google's remote servers. According to the report, the NSA is able to snag this data during its transit over international access points.
It would be illegal under the Foreign Intelligence Surveillance Act for the NSA to collect this information from facilities in the United States. However, according to unnamed sources quoted in the Post's report, the agency gets around this by collecting the data from access points all over the world, rather than directly on U.S. soil.
While Google's email services are encrypted by default, this may make little difference, as a previous Snowden release detailed the NSA's ability to defeat many encryption schemes. The collection is perhaps also why Yahoo announced today that it would be moving to SSL encryption by default.
One big graph search
According to the NSA's analysis of a single day's collection, Yahoo was the most collected source, followed by Hotmail, Gmail, and Facebook. Facebook's data was by far the most accurate, however, coming in at 95.87% attributable (that is, it gave verifiable information on a real person). As a point of comparison, the next highest was Gmail, which came in at a measly 6.97% attributable. Facebook's attribution "success rate" is probably due to the social network's insistence on non-anonymity and the small amount of spam within the service. In fact, the NSA program could be described as, in effect, one giant Facebook graph search.
As of this writing, there is no official statement on the government's data collection transparency Tumblr, ICOnTheRecord. The Tumblr was the result of a pledge from the administration to foster more transparency in its collection activities. During a press conference in early August, Obama criticized the methodology of leaks being "released drip by drip, one a week, to kind of maximize attention."
If one of Snowden's chief media contacts, Glenn Greenwald, is to be believed, Snowden supplied him with 15-20k secret documents before finally seeking asylum in Russia. Unlike with WikiLeaks' unredacted info dumps, Snowden's media contacts seem to be taking more care to vet and redact parts of these secret documents before going public, hence the relatively slow pace of distribution. This may be yet another train car in a very long line of surveillance revelations.