The NSA may review 10 times as much data as it claims
Yesterday, several media outlets reported on what appears to be an error in a statistic about how much information the U.S. National Security Agency is reviewing each day. According to the Atlantic Wire, the agency underestimated how much data its analysts look at by a factor of ten. The NSA, however, has since denied that any miscalculation was made.
So who is right?
The number in question comes from a document released earlier this month by the NSA, which claimed that the agency’s analysts only look at 0.00004 percent of the world’s Internet traffic each day. That white paper explained the number this way:
According to figures published by a major tech provider, the Internet carries 1,826 Petabytes of information per day. In its foreign intelligence mission, NSA touches about 1.6% of that. However of the 1.6% of the data, only 0.025% is actually selected for review. The net effect is that NSA analysts look at 0.00004% of the world’s traffic in conducting their mission--that’s less than one part in a million. Put another way, if a standard basketball court represented the global communication environment, NSA’s total collection would be represented by an area smaller than a dime on the basketball court.
In essence, the point being made by media outlets like the Atlantic Wire is that the paper seems to imply the 0.00004 percent figure is derived by selecting 0.025 percent from the 1.6 percent of Internet traffic “for review.”
And, of course, 0.025 percent of 1.6 percent is 0.0004 percent--10 times larger than the percentage given in the NSA’s white paper.
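The arithmetic is easy to check. Below is a minimal Python sketch that uses only the figures quoted in the white paper. The helper name `percent_of` is illustrative, and the conversion to terabytes assumes decimal units (1 petabyte = 1,000 terabytes), which the white paper does not specify.

```python
from fractions import Fraction

def percent_of(inner_pct, outer_pct):
    """Return inner_pct percent of outer_pct percent, expressed as a percent."""
    return inner_pct * outer_pct / 100

# Figures quoted in the NSA white paper, kept as exact fractions to avoid float noise.
daily_traffic_pb = Fraction(1826)   # petabytes carried by the Internet per day
touched_pct = Fraction("1.6")       # share of traffic the NSA "touches"
selected_pct = Fraction("0.025")    # share of touched data "selected for review"
claimed_pct = Fraction("0.00004")   # share the paper says analysts look at

derived_pct = percent_of(selected_pct, touched_pct)
ratio = derived_pct / claimed_pct

# Assumption: 1 PB = 1,000 TB (decimal units).
derived_tb = daily_traffic_pb * derived_pct / 100 * 1000

print(float(derived_pct))  # 0.0004
print(float(ratio))        # 10.0
print(float(derived_tb))   # 7.304
```

On these inputs, the two published percentages multiply out to 0.0004 percent, or roughly 7.3 terabytes a day, rather than the 0.00004 percent stated in the paper.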
Yesterday evening, however, an NSA spokesperson responded to the Atlantic Wire’s story on the apparent math error: “Our figure is valid; the classified information that goes into the number is more complicated than what’s in your calculation,” they said. “I’m not sure why you’re calling this a ‘discrepancy’ when the number in the white paper is valid.”
Assuming the NSA spokesperson is telling the truth, it would seem that there is an additional, omitted step in the calculation between the 0.0004 percent of data selected for review and the 0.00004 percent of information reviewed by “analysts.”
What this step looks like is hard to say. Possibly, the first figure, 0.0004 percent, refers to the amount of information flagged by NSA software (perhaps based on keywords or metadata about where it came from). From there, the number may be reduced through further computer analysis by a factor of 10 before it actually reaches the eyes of “analysts.”
Even if this is the case, the 0.0004 percent figure calculated by the Atlantic Wire and other publications is far from meaningless.
One important aside here before getting into what to make of the Atlantic Wire’s calculation: The language in the NSA white paper is curiously--if not surprisingly--misleading. In particular, it implies that the NSA only looks at 0.00004 percent of the “global communication environment” (italics mine).
The NSA is claiming it only touches 1.6 percent of all information moving through the Internet each day. But most Internet traffic is accounted for by activities like real-time media streaming (62 percent) and file sharing (10.5 percent)--two things the NSA no doubt has little interest in. In fact, only 2.9 percent of Internet traffic is for communication. So, if the data the NSA touches is mostly communication, the agency is potentially touching more than half of all communication on the Internet. In that context, regardless of whether analysts look at 0.00004 percent or 0.0004 percent of all traffic, they are viewing a much larger percentage of relevant information on the Internet than the language in the white paper would seem to imply.
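The “more than half” figure is an upper bound, and it follows from dividing one percentage by the other. A minimal sketch, assuming (as a worst case, not something the NSA has stated) that every byte the agency touches is communication traffic; the variable names are illustrative:

```python
# Share of all Internet traffic the NSA says it touches (white paper figure).
internet_touched_pct = 1.6
# Share of Internet traffic that is communication (figure cited in this article).
communication_pct = 2.9

# Worst-case assumption: everything the NSA touches is communication traffic.
# The share of communication touched can never exceed 100 percent.
max_share_pct = min(100.0, internet_touched_pct / communication_pct * 100)

print(round(max_share_pct, 1))  # 55.2
```

If only part of what the NSA touches is communication, the true share is proportionally smaller.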
Taking that fact into consideration, here’s what the 0.0004 percent figure could represent: If the NSA is focused on communication like emails and chats when it touches 1.6 percent of Internet traffic, then, because only 2.9 percent of data on the net is communications, the agency could be seeing about 55 percent of the world’s communications each day. Now take 0.025 percent of that 55 percent, and the NSA would be flagging roughly 0.014 percent of the world’s communications for review.
In other words, 0.0004 percent of Internet traffic could still constitute about 0.014 percent of global communications, a share about 34 times larger.
Accept, then, as the NSA seems to be claiming, that the number flagged for review is reduced by a factor of 10, and analysts could still be actively assessing about 0.0014 percent of communications on the Internet each day.
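The chain of percentages can be worked through directly. This is a rough sketch under the same worst-case assumption that everything the NSA touches is communication traffic; the function name `share_of_communications` is illustrative. Computed this way, 0.0004 percent of all traffic comes to about 0.014 percent of communications, and 0.00004 percent to about 0.0014 percent.

```python
def share_of_communications(traffic_pct, communication_pct=2.9):
    """Re-express a share of all Internet traffic as a share of communication traffic.

    Assumes the traffic in question is entirely communication, and that
    communication makes up communication_pct percent of all traffic.
    """
    return traffic_pct / communication_pct * 100

flagged_pct = share_of_communications(0.0004)    # figure derived by the Atlantic Wire
reviewed_pct = share_of_communications(0.00004)  # figure stated in the white paper

print(f"{flagged_pct:.4f}")   # 0.0138
print(f"{reviewed_pct:.5f}")  # 0.00138
```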
Photo by Tim Lucas/Flickr