NSA collects metadata for a large fraction of the cell phone calls placed in the United States. Upon inspection, analysts are allowed to make three hops from a targeted calling account. In other words, they're allowed to inspect the metadata for every contact that the targeted person has made, for every contact that those contacts have made, for every contact that those contacts have made, and finally, for all the contacts that those contacts have called, i.e. three "hops". One person in a community of 10,000 is targeted for having called a grandparent in Yemen.
In addition, there are two toll free hotlines, one for emergency assistance that is called by 214 unique community members, and another for information about various community events that is called by 673 unique community members. The full calling record for the community is listed here .
You are a journalist who has been given the calling record by a whistleblower inside NSA. You know that the targeting has been carried out but you don't know the phone number of the targeted individual. To get a sense of how many people fall under the dragnet when this one person is targeted, find the number of 2-hop connections for the following phone numbers, and report the average:
1 , 17, 793, 1200, 3402
Assumptions
1 2 3 4 5 6 7 8 |
|
This section requires Javascript.
You are seeing this because something didn't load right. We suggest you, (a) try
refreshing the page, (b) enabling javascript if it is disabled on your browser and,
finally, (c)
loading the
non-javascript version of this page
. We're sorry about the hassle.
We can conquer this problem with a modified depth first search. Whereas conventional DFS iteratively branches out from a root node until it reaches a terminal node, here we want to terminate after we're two hops away from the root node, i.e. we visit our neighbors, our neighbors, and our neighbor's neighbor's neighbor's.
Before we code our search, we need to import the phone calls and store all the direct calling partners (their 0-hop connections) of each user in a dictionary.
In short, we trim the return character from each line of the data, and split the lines on whitespace. Then, we append the first phone number to the 0-hop entry of the second phone number and vice versa. We use the
setdefault
method to avoid initializing the dictionary separate from updating it.Now, we wish to visit the caller's neighbors, accumulate them into a list of "reached" people, then visit the neighbors of every caller in our "reached" list, so long as they're not already in our list, and add them to our list. Finally, we want to cut this process off once we've made added all the 2-hop connections of the caller being targeted.
Consider the following implementation:
Calling this routine on the sample phone numbers,
[1, 17, 793, 1200, 3402]
, i.e.we find
[4644, 5928, 5541, 5365, 4171]
2-hop neighbors for those phone numbers, which puts the sample mean around 5,000.The reason the expanse of the hopping gets so large so quickly is due to the structure of calling networks. Most nodes are very closely connected to dense sub-networks of other callers.
Note that with just 2-hops, subpoenaing the records of a single caller intrudes on the privacy of half this community. As NSA is allowed to inspect 3-hops, if in the population of the United States, the average caller has 40 direct contacts, this potentially provides access to the calling records of 4 0 4 ≈ 2 . 5 × 1 0 6 3-hop neighbors per targeted individual.