Earlier this month, the ECTR Coalition released a public letter to the Senate addressing an amendment to the Electronic Communications Privacy Act that would expand the FBI’s ability to collect electronic communication transactional records (ECTR). It’s often assumed that non-content information like ECTR is not as personal as the content it describes, and this assumption is even codified in law. However, the ECTR Coalition, whose members include the American Civil Liberties Union, Facebook, Google, and more, warned senators not to underestimate the powers of ECTR to reveal “incredibly intimate” details about an individual—all “without any oversight from a judge.”

ECTR is not well-defined, but it belongs to a family of data known as metadata: data that describes other data, like the time and length of a call rather than a transcript of what was said. It may not seem like much, but this is surprisingly substantive information. Metadata can reveal the private life of the person or people it describes in astonishing detail. Even the way you scroll reveals information about you and the machine you are using. In their MetaPhone project, Jonathan Mayer and Patrick Mutchler were able to predict with startling accuracy whether a participant was in a romantic relationship, based on just two weeks of metadata from volunteers’ phones.

Mayer and Mutchler were also able to determine the religious affiliation and even health status of participants. Based purely on metadata, the researchers found that one subject had cardiac arrhythmia, another had multiple sclerosis, and another chose to abort an unplanned pregnancy (after receiving advice and encouragement from a person who was identified, again based on metadata, as her sister). They also noticed when one of their participants started to cultivate marijuana plants, and when another bought an automatic rifle.

Source: Pew Research Center.

Most Americans classify this data as personal and private. Pew Research found that a majority of those surveyed consider their health, their relationships, and whom they have contacted to be “very sensitive” or “somewhat sensitive” information.

Collecting Telephony Metadata: A Tightly Connected Network

The NSA’s ability to collect metadata in bulk under § 215 of the Patriot Act expired at the end of November 2015, but the techniques used can provide insight into how metadata is collected and analyzed in general. When the provision was in effect, the NSA collected all metadata involving U.S. callers on a daily basis from major telecommunications companies like Verizon, AT&T, and Sprint.

With data on so many people, how did intelligence officers evaluate potential threats, or find them in the first place? Rather than casting an indiscriminate net into the unruly waters of global telephone communications, the NSA began with a set of “seed” phone numbers that had been linked back to national security threats. There were 288 seeds in 2012, 423 seeds in 2013, 161 seeds in 2014, and 56 seeds in 2015.

According to a 2013 report from the President’s Review Group on Intelligence and Communications Technologies, the search moved out in a spanning tree: the NSA compiled a list of “every telephone number that either called or was called by the seed phone number in the past five years,” and did the same for each number on that list. The report goes on to explain how an investigation under § 215 of the Patriot Act would navigate the database:

“If we assume that the average telephone number called or was called by 100 phone numbers over the course of the five-year period, the query will produce a list of 10,000 phone numbers (100 x 100) that are two ‘hops’ away from the person reasonably believed to be associated with a foreign terrorist organization.”

Above graphic: The reach of two “hops” on a network where each caller produces three new numbers. The first hop adds three new callers for a total of four (1 + 3), and the second adds another nine for a total of thirteen (1 + 3 + 3 * 3).
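
The hop-by-hop expansion the report describes can be sketched as a breadth-first search over a call graph. This is only an illustration under stated assumptions, not the NSA's actual query: the function name `numbers_within_hops`, the toy graph, and the number labels are all hypothetical. The toy network reproduces the three-contacts-per-caller case from the graphic.

```python
from collections import deque

def numbers_within_hops(call_graph, seed, max_hops):
    """Return {number: hop distance} for every number within max_hops of seed.

    call_graph maps a number to the set of numbers it called or was called by.
    """
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for contact in call_graph.get(current, ()):
            if contact not in distance:
                distance[contact] = distance[current] + 1
                queue.append(contact)
    return distance

# Hypothetical toy network: the seed has 3 contacts, each with 3 new contacts.
call_graph = {"seed": {"a", "b", "c"}}
for first in ("a", "b", "c"):
    second_hop = {f"{first}{i}" for i in range(1, 4)}
    call_graph[first] = {"seed"} | second_hop
    for number in second_hop:
        call_graph[number] = {first}

reach = numbers_within_hops(call_graph, "seed", max_hops=2)
print(len(reach))  # 13 = 1 + 3 + 3 * 3
```

With 100 contacts per number and no overlap between contact lists, the same search would reach 100 numbers on the first hop and 100 x 100 = 10,000 on the second, which is the figure the report quotes for numbers two hops away.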

The presidential review’s example represents a loosely connected network of callers, where each person is only in contact with a small percentage of other members of the network. In reality, U.S. callers are a tightly connected network linked by a small collection of high-frequency phone numbers. Numbers belonging to restaurants, pharmacies, or tech support lines might have thousands or even millions of contacts within the last five years. As an anonymous senior NSA official put it when speaking to James Bamford of WIRED in 2012, “everybody’s a target; everybody with communication is a target.”
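
To see why a single high-frequency number changes the picture, consider a rough sketch in which a sparse network is compared with the same network after every caller has contacted one shared number, such as a pharmacy. Everything here is a hypothetical illustration: the ring-shaped network, the population of 1,000 callers, and the helper names `two_hop_reach`, `ring_graph`, and `add_hub` are assumptions chosen to make the effect visible, not a model of real call records.

```python
def two_hop_reach(call_graph, seed):
    """Return (first_hop, second_hop) sets of numbers reachable from seed."""
    first_hop = set(call_graph.get(seed, ())) - {seed}
    second_hop = set()
    for contact in first_hop:
        second_hop |= call_graph.get(contact, set())
    second_hop -= first_hop | {seed}
    return first_hop, second_hop

def ring_graph(n):
    """Hypothetical sparse network: each caller talks only to two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def add_hub(call_graph, hub):
    """Copy the graph and connect every existing caller to one shared number."""
    graph = {number: contacts | {hub} for number, contacts in call_graph.items()}
    graph[hub] = set(call_graph)
    return graph

sparse = ring_graph(1000)
with_hub = add_hub(sparse, "pharmacy")

sparse_reach = sum(len(hop) for hop in two_hop_reach(sparse, 0))
hub_reach = sum(len(hop) for hop in two_hop_reach(with_hub, 0))
print(sparse_reach)  # 4
print(hub_reach)     # 1000
```

In the sparse case, two hops from caller 0 reach only four other numbers. Once the shared number is added, two hops reach the hub itself plus every other caller in the population, because the hub sits one hop from everyone.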

U.S. callers are actually a tightly linked network. Blue dots represent participants in the MetaPhone study. Red dots represent the most common outside numbers that called or were called by participants. Source: Jonathan Mayer and Patrick Mutchler, Web Policy.

The NSA’s bulk collection program has been limited by some important reforms, such as the exclusion of bulk data collected under Patriot Act § 215 from internal searches starting December 2015. By the end of 2013, the scope of database queries of telephony metadata was reduced from three degrees of separation from a seed number to two. The USA Freedom Act, signed into law in June 2015, ended the Patriot Act’s bulk collection programs. The NSA can still conduct searches, but telecommunications companies are no longer required to hand over their subscribers’ metadata, and they only have to retain the records for a minimum of eighteen months, down from five years.

Metadata analysis is a relatively new part of intelligence gathering. Over the past few years, government agencies and privacy advocates have engaged with the question of whether non-content information such as metadata is truly “content-less.”