||Enhancing Anonymity: Cryptographic and Statistical Approaches for Shredding Our Digital Dossiers
Never one to shy away from mixing humor with politics, fashion designer Kenneth Cole years ago ran an iconic ad that glibly declared, "You are on a video camera an average of ten times a day. Are you dressed for it?" While the statistic has been rendered a vast understatement, the sentiment still rings true: we all leave traces of ourselves when we interact in the world, and are generally oblivious to how often and how they will be spliced, recombined, processed, and shared.
Anonymity does not strive to eliminate such traces, however it does arguably the next best thing: it ensures that the originator of the trace cannot be personally identified. Practical instantiations of anonymity, such as the ones discussed within this essay, are characterized by what "identifiable information" is taken to mean, and to what degree uncertainty is instilled about the identities involved.
Identities are a complex mix of attributes we choose and accrue, combined with attributes that are assigned to us . It is common in the literature, as well as in public policy, to distinguish personally identifiable information (PII) like names, addresses, and DNA from other non-identifiable data like commercial transactions, viewing habits, or medical history. However, the reality is that all information, however menial, that distinguishes one person from another has the potential to personally identify an individual upon combination with additional information . Therefore, anonymity tools are best when designed to make careful and narrow provisions about what identities they protect.
Another property of an anonymity tool is how much information is leaked about the identity responsible for an observed action. By definition, it is not enough for unique identification, but that is not to say that it is zero either. A common formulation is an anonymity set: the set of all individuals who could have potentially initiated an action. As the size of the anonymity set grows, the actual initiator is afforded more anonymity. Of course, not all members of an anonymity set may be equally likely to have initiated a certain observed action. Thus, one direction in the literature considers how to quantify the information gained by a particular observation, or conversely the amount of uncertainty remaining after an observation, with standard tools from information theory like entropy [3, 4].
Domains Requiring Enhanced Anonymity
It is commonplace in our society for whistleblowing, the reporting of crime, the reporting of information to the press, and many other types of communication to be done anonymously. In these cases, the anonymity of the sender is not often guaranteed through technical measures, but is rather maintained through a trusted intermediary. David Chaum, in his seminal 1981 work, demonstrated how anonymous communication could be achieved, without trusting single entities, through the use of cryptography .
The key concept is that messages are forwarded through a series of servers. The message is encrypted by the user through a novel layered approach, where each layer can only be decrypted by one specific server in the path. Upon decryption by the server, the message directs it to send the payload to the next server (the last server will be directed to send the message to the recipient). The server waits until it has received and decrypted a batch of messages from different servers or users. It then reorders the messages and sends each one out.
From a security perspective, no server learns the entire path of the message. Each server only knows which server it got the message from and where it sent the message next; the message is rendered unrecognizable each time a server removes a layer of encryption. Since many messages are sent simultaneously, tracing a particular message from its sender to its recipient would require the collusion of every server along the path. Thus, the true sender of a particular message can only be determined to be someone within the set of entities sending messages at or shortly before the time of receipt.
Chaum's design, called a mix network, also provides a mechanism for the recipient to return a message to the sender, despite not knowing who the sender is. Subsequent research on mix networks is plentiful. Interested readers should consult an older survey of the security consequences of certain user/server policies (how long servers should hold messages, whether users should choose their own paths, and so on)  and recent work on a compact format for messages with the full suite of (arguably subtle) security properties . A somewhat direct implementation of Chaum's idea, Mixminion, can be used for sending anonymous email messages where the sender's email and Internet protocol (IP) addresses are anonymized.
WATCH: David Chaum is introduced by Dr. Joss Wright in this clip from Horizon, BBC News, 9/2/2014
While mix networks provide strong anonymity, they also introduce high latency. Not only are messages being routed through multiple servers, but each server is also waiting to receive a batch of messages before processing them. Latency of multiple hours may be acceptable for email, but it is certainly not for interactive communication like web browsing. Onion routing  is a low-latency variant of a mix network where traffic is still encrypted in layers (resembling an onion) and sent through multiple servers (called a circuit); however, the servers process and forward the messages immediately.
An adversary capable of observing the entire path (or even just the first and last server on the path) can infer, using time and message length patterns, if a traffic flow entering the network is the same as one leaving it . Thus, onion routing strives for a practical level of security against realistic adversaries.
Tor  is a fairly direct implementation of onion routing that can be used to anonymize users' IP addresses for transmission control protocol (TCP) streams. At the time of writing, the Tor project reports millions of daily users . Tor has spawned an immense amount of research. The scale of its deployment leads to exploration of efficiency [12, 13], usability , and incentives , while numerous attacks have been formulated on various aspects of the system including tracing traffic flows over long periods of time, disrupting servers, and cheaply congesting the network [16, 17, 9, 18].
WATCH: "Tor and Circumvention: Lessons Learned," Roger Dingledine, Invited talk at Crypto 2011, UCSB
Anonymizing TCP streams is necessary, but not sufficient, for anonymous web browsing. Users must be careful to ensure domain name system (DNS) lookups are routed through Tor. Web servers typically set cookies that can track users across different sessions, which is particularly problematic if users enable and disable Tor without flushing their cookies. For users who disable cookies, web servers can fingerprint a user's browser by examining a long set of configuration options that are often unique. Tor also addresses fingerprint attacks [19, 20].
Financial transactions with cash are another example of an application that requires anonymity. Even though bank notes have unique serial numbers, banks and businesses do not generally track these numbers. For online and electronic transactions, cash cannot be used. The alternative is to digitally authorize transactions through the banking or credit card system using a real identity.
Building a digital form of cash that is untraceable is particularly challenging, as it must also be impossible to counterfeit, duplicate, or double-spend it. Chaum proposed a solution to this problem using cryptographic techniques . The key concept of Chaum's scheme is a new cryptographic primitive that allows digital bank notes (or any message) to be blinded, digitally signed by the bank, and unblinded in such a way that the bank cannot determine what it signed and yet the signature remains intact after the unblinding process. This approach allowed banks to issue digital notes with unique serial numbers, but with no ability to track who was given what note. A robust solution to the double-spend problem was proposed later .
Chaum's proposal for anonymous digital cash created a new research field with many improvements [23, 24, 25]. Chaum's design was refined through his company DigiCash, but the company eventually failed without significant deployment.
In 2008, a novel design and implementation of a digital currency called Bitcoin was released online by a currently unknown developer . The key concept of Bitcoin is that every transaction is recorded in a public ledger maintained by a decentralized network of computers. Transactions move the Bitcoin currency between accounts, where accounts are identified only by a random cryptographic key (specifically, the fingerprint of a public key for digital signatures). Strictly speaking, Bitcoin does not provide anonymity, as all transactions are trivially traceable between public keys using the ledger. However, since the association between public keys and a user's identity need not be revealed, the user's public keys can serve as a set of psuedonyms. Users may choose to disclose that they own a particular key (for example, in order receive payment), may fail to hide their IP address when broadcasting a transaction to the network, or may enable clustering of multiple accounts they own by sharing transactions across them. Strengthening the anonymity of Bitcoin  or Bitcoin-like currencies [28, 29] is a current research trend.
WATCH: "Tutorial on Bitcoin Technologies," Joseph Bonneau, CITP Princeton, 3/27/14
Governmental elections are another example of anonymity in action. Voters must identify themselves to prove eligibility for an election, but specific ballots should not be traceable to the voters who cast them. While most of the examples of anonymity discussed thus far originate from a desire for privacy, ballot secrecy is also required in order to prevent coercion and vote selling.
Anonymity in elections presents a challenge: if votes cannot be traced to specific voters, the potential exists for the results of an election to be completely fabricated by a corrupt election authority, which could go undetected. However, using cryptographic techniques, election systems can be designed that allow voters to verify that their ballots were included unmodified in the final tally, without being able to prove to someone how they voted. Such elections were one application provided by Chaum in his original mix network .
The key concept is that voters could encrypt their votes and run them through a mix network along with all of the other ballots. While Chaum's original mix network did not allow servers to prove they output all messages correctly, such protocols were later proposed [30, 31]. The literature on secret ballot election systems with a provably correct tally is vast. A recent trend is building and deploying such systems. Significantly, the first governmental election to use such a system was held in 2011 by the municipality of Takoma Park, MD with the system Scantegrity .
WATCH: "Securing Digital Democracy" Coursera course, Fall 2014
Another governmental service that requires anonymity is the census. While citizens report their identities with their census information, statistics and metrics released from the data must ensure that the records of individual citizens cannot be identified. This is particularly challenging as very little data is needed to uniquely identify individuals--for example, date of birth, gender, and county of residence could identify 14 to 18 percent of US citizens [33, 34].
A growing body literature considers techniques for ensuring that datasets (from the census to consumer data to hospital records) resist the de-anonymization of the individuals included in the dataset. This may be accomplished by generalizing the data into broader categories and/or ranges to ensure single records are not unique , or by adding enough noise to the data to obfuscate the inclusion of any one individual in the dataset while preserving enough accuracy for the data to be statistically useful .
WATCH: "Differential Privacy Tutorial," Katrina Ligett, Simons Institute (Berkeley), 12/11/13
Various technologies for enhancing anonymity are available. The benefits of anonymity vary from situation to situation--in many cases, anonymity provides a level of privacy for the individual (for example, to browse the web without being tracked or to remain unidentified within a sensitive dataset); in other cases, anonymity could prevent coercion or retribution for certain actions (including whistleblowing or casting a ballot for a particular candidate). Anonymity can, of course, also be detrimental (when used, for example, to commit crimes or spread hate).
The balance between enabling privacy and reducing accountability is central to the debate over anonymity's role in society, and I do not attempt to resolve that here. However, the variety of methods for collecting data and tracing our actions is only increasing, while the usage of collected data is becoming less visible, both due to the vast technological improvements in harvesting, storing, transmitting, and analyzing large datasets. At the very least, anonymity enables us to roll back the clock on pervasive surveillance and tracking while we determine the most optimal outcome for society.
Created: Dec 15 2014
Untraceable electronic mail, return addresses, and digital pseudonyms
Chaum D. CACM
Blind signatures for untraceable payments
Chaum D. CRYPTO 82
Security without identification: transaction systems to make big brother obsolete
Chaum D. CACM
Tor: the second-generation onion router
Dingledine R., Mathewson N., Syverson P. USENIX Security 04
Differential privacy: a survey of results
Dwork C. TAMC 08
Bitcoin: a peer-to-peer electronic cash system
Nakamoto S. Bitcoin.org
Rethinking public key infrastructures and digital certificates: building in privacy Brands S., 2000
Privacy Enhancing Technologies Symposium (PETS): an annual conference where privacy and anonymity experts discuss new research and developments. According to the PETS site, "PETS addresses the design and realization of privacy services for the Internet and other data systems and communication networks."
Workshop on Privacy in the Electronic Society (WPES): an annual workshop focused on worldwide electronic privacy problems and their possible solutions.
Traveling the Silk Road: a measurement analysis of a large anonymous online marketplace
Christin N. WWW 2013
Internet voting in the US
Simons B., Jones D. CACM 55 (10), 2012, pp. 68-77
Wherefore art thou R3579X? Anonymized social networks, hidden patterns, and structural steganography
Backstrom L., Dwork C,. Kleinberg J. CACM 54 (12), 2011, pp. 133-141
Approximate algorithms with generalizing attribute values for k-anonymity
Park H., Shim K. Information Systems 35 (8), 2010, pp. 933-955
Security without identification: transaction systems to make big brother obsolete
Chaum D. CACM 29 (10), 1985, pp. 1030-1044
Kerr, I. The strange return of Gyges' Ring: an introduction. In Lessons from the identity trail: anonymity, privacy and identity in a networked society. Kerr, I., Steeves V. and Lucock, C. (Eds). Oxford University Press, New York, NY (2009), xxiii-xxxi.
Narayaran, A., Shmatikov, V. Myths and fallacies of "personally identifiable information". CACM 53, 6 (2010), 24-26.
Díaz, C., Seys, S., Claessens, J., Preneel, B. Towards measuring anonymity. In Proc. of PET 2002 (2003), 54-68.
Serjantov, A., Danezis, G. Towards an information theoretic metric for anonymity. In Proc. of PET 2002 (2003), 41-53.
Chaum, D. L. Untraceable electronic mail, return addresses, and digital pseudonyms. CACM 24, 2 (1981), 84-90.
Raymond, J.-F. Traffic analysis: protocols, attacks, design issues, and open problems. In Designing Privacy Enhancing Technologies (LNCS 2009) (2001), 10-29.
Danezis, G., Goldberg, I. Sphinx: a compact and provably secure mix format. In Proc. of the IEEE Symposium on Security and Privacy (2009), 269-282.
Syverson, P. F., Goldschlag, D. M., Reed, M. G. Anonymous connections and onion routing. In Proc. of the IEEE Symposium on Security and Privacy (1997), 44-54.
Shmatikov, V., Wang, M.-H. Timing analysis in low-latency mix networks: attacks and defenses.. In Computer Security -- ESORICS 2006 (LNCS 4189) (2006), 18-33.
Dingledine, R., Mathewson, N., Syverson, P. Tor: the second-generation onion router. In Proc. of the 13th USENIX Security Symposium (2004).
The Tor Project, Inc. Tor Metrics: Users.
Snader, R., Borisov, N. A tune-up for Tor: improving security and performance in the Tor network. In Proc. of the Network and Distributed System Security Symposium (NDSS) (2008).
Reardon, J., Goldberg, I. Improving Tor using a TCP-over-DTLS tunnel. In Proc. of the of the 18th USENIX Security Symposium (2009), 119-134.
Clark, J., van Oorschot, P. C., Adams, C. Usability of anonymous web browsing: an examination of Tor interfaces and deployability. In Proc. of the 3rd Symposium on Usable Privacy and Security (SOUPS) (2007), 41-51.
Ngan, T.-W., Dingeldine, R., Wallach, D. S. Building incentives into Tor. In Proc. of the of the 14th International Conference on Financial Cryptography and Data Security (LNCS 6052) (2010), 238-256.
Murdoch, S. J., Danezis, G. Low-cost traffic analysis of Tor. In Proc. of the 2005 IEEE Symposium on Security and Privacy (2005), 183-195.
Murdoch, S. J. Hot or not: revealing hidden services by their clock skew. In Proc. of CCS 2006 (2006), 27-36.
Jansen, R., Tschorsch, F., Johnson, A., Scheuermann, B. The sniper attack: anonymously deanonymizing and disabling the Tor network. In Proc. of NDSS 2014 (2014).
Perry, M., Clark, E., Murdoch, S. The design and implementation for the Tor browser [DRAFT]. March 15, 2013.
Eckersley, P. How unique is your web browser?. In Privacy Enhancing Technologies (LNCS 6205) (2010), 1-18.
Chaum, D. Blind signatures for untraceable payments. In Advances in Cryptology: Proc. of CRYPTO '82 (1983), 199-203.
Chaum, D., Fiat, A., Naor, M. Untraceable electronic cash. In Advances in Cryptology: Proc. of CRYPTO '88 (1990), 319-327.
Brands, S. Untraceable off-line cash in wallet with observers. In Advances in Cryptology: Proc. of CRYPTO '93 (1994), 302-318.
Camenisch, J., Hohenberger, S., Lysyanskaya, A. Compact e-cash. In Proc. of EUROCRYPT 2005 (LNCS 3494) (2005), 302-321.
Okamoto, T., Ohta, K. Universal electronic cash. In Advances in Cryptology: Proc. of CRYPTO '91 (1992), 324-337.
Nakamoto, S. Bitcoin: a peer-to-peer electronic cash system. Bitcoin.org (Oct. 31, 2008), 9 pages.
Bonneau, J., Narayanan, A., Miller, A., Clark, J., Kroll, J. A., Felten, E. W. Mixcoin: anonymity for Bitcoin with accountable mixes. Cryptology ePrint Archive: Report 2014/077 (Apr. 21, 2014), 25 pages.
Miers, I., Garman, C., Green, M., Rubin, A. D. Zerocoin: anonymous distributed e-cash from Bitcoin. In Proc. of the 2013 IEEE Symposium on Security and Privacy (SP '13) (2013), 397-411.
Ben-Sasson, E., Chiesa, A., Garman, C., Green, M., Miers, I., Tromer, E., Virza, M. Zerocash: decentralized anonymous payments from Bitcoin. In Proc. of the 2014 IEEE Symposium on Security and Privacy (SP '14) (2014), 459-474.
Sako, K., Kilian, J. Receipt-free mix-type voting scheme. In Advances in Cryptology: EUROCRYPT '95 (LNCS 921) (1995), 393-403.
Jakobsson, M., Juels, A., Rivest, R. L. Making mix nets robust for electronic voting by randomized partial checking. In Proc. of the 11th USENIX Security Symposium (2002), 339-353.
Carback, R., Chaum, D., Clark, J., Conway, J., Essex, A., Herrnson, P. S., Mayberry, T., Popoveniuc, S., Rivest, R. L., Shen, E., Sherman, A. T., Vora, P. L. Scantegrity II municipal election at Takoma Park: the first E2E binding governmental election with ballot privacy. In Proc. of the 19th USENIX Security Symposium (2010), 16 pages.
Sweeney, L. Simple demographics often identify people uniquely. Carnegie Mellon University, Data Privacy Working Paper 3 (2000), 34 pages.
Golle, P. Revisiting the uniqueness of simple demographics in the US population. In Proc. of the 5th ACM Workshop on Privacy in Electronic Society (WPES '06) (2006), 77-80.
Sweeney, L. k-Anonymity: a model for protecting privacy. International Journal of Uncertainty, Fuzziness, and Knowledge-Based Systems 10, 5 (2002), 557-570.
Dwork, C. Differential privacy: a survey of results. In Theory and Applications of Models of Computation (LNCS 4978) (2008), 1-19.