Study shows phone metadata is much more sensitive than top spies admit

Contrary to the claims of America’s top spies, the details of your phone calls and text messages—including when they took place and whom they involved—are no less revealing than the actual contents of those communications.

In a study published online Monday in the journal Proceedings of the National Academy of Sciences, Stanford University researchers demonstrated how they used publicly available sources—like Google searches and the paid background-check service Intelius—to identify “the overwhelming majority” of their 823 volunteers based only on their anonymized call and SMS metadata.

Using data collected through a special Android app, the Stanford researchers determined that they could easily identify people based on their call and message logs.

The results cast doubt on claims by senior intelligence officials that telephone and Internet “metadata” (information about communications, but not the content of those communications) should be subject to a lower privacy threshold because it is less sensitive.

Contrary to those claims, the researchers wrote, “telephone metadata is densely interconnected, susceptible to reidentification, and enables highly sensitive inferences.”

Patrick Mutchler, a Ph.D. candidate at Stanford and one of the study’s co-authors, said in an email that this conclusion “muddies the distinction between metadata and content and calls into question the legal distinctions between the two kinds of data.”

Alan Butler, a senior counsel at the Electronic Privacy Information Center, said in an email that the study offered “tremendous insight into the sensitivity of call records and other metadata” and bolstered civil-liberties advocates’ case for applying stronger warrant protections to this type of data.

“EPIC has consistently argued that these records should be protected under the Fourth Amendment and that certain types of [metadata] records (including location data and Web URLs) should only be obtained pursuant to a warrant,” Butler said.

Metadata is a major component of several U.S. mass-surveillance programs. The National Security Agency uses it to search for people who communicate with terrorism suspects in the hope of discovering extremist cells. Under current rules, the agency can collect metadata on people “two hops” out from a terrorism suspect, meaning anyone who talked to anyone who talked to the suspect.
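
To make the “two hops” rule concrete, here is a minimal Python sketch of how such a reach could be computed from call records. It is an illustration only, not the NSA’s or the Stanford researchers’ actual method. The record format (simple caller/callee pairs) and the function names `build_contact_graph` and `within_hops` are assumptions made for this example.

```python
from collections import defaultdict, deque


def build_contact_graph(call_records):
    """Build an undirected contact graph from (caller, callee) pairs.

    The pair format is a hypothetical simplification; real call
    records would also carry timestamps, durations and so on.
    """
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return graph


def within_hops(graph, seed, max_hops=2):
    """Return {person: hop_count} for everyone within max_hops of seed.

    Uses a breadth-first search, so each person is recorded at the
    smallest number of hops separating them from the seed.
    """
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if distance[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for contact in graph.get(person, ()):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    distance.pop(seed)  # the seed itself is not part of the result
    return distance


if __name__ == "__main__":
    # Made-up records: C is three hops from the suspect and is excluded.
    records = [("suspect", "A"), ("A", "B"), ("B", "C"), ("suspect", "D")]
    reach = within_hops(build_contact_graph(records), "suspect")
    print(sorted(reach.items()))  # [('A', 1), ('B', 2), ('D', 1)]
```

Even in this toy example, one suspect pulls in three other people. In a real network, where a single phone may contact dozens of numbers, the second hop grows far faster than the first.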

One of the most controversial operations that former NSA contractor Edward Snowden exposed was a bulk phone records collection program, conducted under Section 215 of the USA Patriot Act, that required phone companies to periodically send the NSA vast swaths of customer metadata.

Responding to widespread public outrage, Congress ended that program in mid-2015 by passing the USA Freedom Act, the most substantial surveillance-reform law since the sweeping post-9/11 expansion of the intelligence bureaucracy. Now, when the NSA wants to obtain someone’s call metadata, the agency must request it from the phone companies.

But other bulk metadata programs remain active, and senior officials have defended them by arguing that metadata is not as sensitive as the contents of the communications themselves. Publicly, they argue that they collect two substantively distinct categories of data: “personally identifiable information” (PII), which includes communications linked to people’s identities, and non-PII, which is anonymized.

The Stanford researchers sharply rejected that framing in their study, writing that “the policy distinction between PII and non-PII is not based on sound science.”

“We’ve had several years of national conversation about metadata and even seen new legislation passed without public science on the topic to inform citizens,” Mutchler said. “More information can only help the conversation.”

Some officials have hinted at metadata’s true value. “We kill people based on metadata,” former NSA and CIA Director Michael Hayden said in May 2014.

In their report, the Stanford researchers argued that the ease with which they were able to identify people based on their metadata pointed to the need to rethink its sensitivity.

“Our results lend strong support to the view that telephone metadata is extraordinarily sensitive, especially when paired with a broad array of readily available information,” they wrote. “Over a large sample of telephone subscribers, over a lengthy period, it is inevitable that some individuals will expose deeply sensitive information. It follows that large-scale metadata surveillance programs, like the NSA’s, will necessarily expose highly confidential information about ordinary citizens.”

Amie Stepanovich, U.S. policy manager at the digital-rights group Access Now, said in an email that the new research highlighted the danger of downplaying metadata’s sensitivity.

The study, she wrote, “irrefutably demonstrates the need for our legal system to adapt in order to meaningfully protect users’ expectations of privacy.”

It’s trivially easy to identify you based on records of your calls and texts

Metadata isn’t as anonymous as government officials want you to believe.

Eric Geller