"Science, Faculty of"@en . "Computer Science, Department of"@en . "DSpace"@en . "UBCV"@en . "Alavi, Marjan Sadat"@en . "2016-01-27T02:03:48"@en . "2015"@en . "Master of Science - MSc"@en . "University of British Columbia"@en . "Anonymous communications is a long sought goal of dissidents, privacy advocates, and a host of other user communities. While a large number of systems have been proposed, those systems generally require large-scale communication infrastructure to be built in order to achieve a non-trivial amount of anonymity. However, the very nature of anonymity has meant that a business rationale for building such infrastructure is lacking.\r\nMotivated to design a practical receiver anonymity network with large anonymity set sizes, we present Twistor: a new system for receiver-anonymous communications which leverages the Twitter social graph as the underlying anonymizing layer. To send a twist, a Twistor client first checks for reachability of its intended recipient, using local graph information maintained by interactions with the Twistor server. It then encrypts the message under the recipient's public key and posts the ciphertext to the corresponding user's timeline. Larger ciphertexts are encoded into an image, so as to conform with Twitter's 140 UTF-8 character limit. Twistor clients are listening for Twistor posts and decrypt and repost when those they follow publish a new twist, except for when a TTL indicator is 0. Given self-reducibility properties of ElGamal, even if the adversary, e.g., Twitter, monitors all Twistor posts and re-posts that cascade from an initial post, the anonymity of the receiver and the confidentiality of the plaintext are reducible to the hardness of Decisional Diffie-Hellman problem. Twistor derives its increase in size of the receiver anonymity set from asymmetric social connections combined with the publish-subscribe communication model in Twitter. Our aim is to achieve receiver anonymity set sizes on the order of hundreds of thousands. In this thesis we describe our built system, the cost to the underlying infrastructure and the tradeoffs between those costs and the size of the anonymity set."@en . "https://circle.library.ubc.ca/rest/handle/2429/56719?expand=metadata"@en . "Twistor: Anonymous MessageTransfer through TwitterbyMarjan Sadat AlaviA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)January 2016c\u00C2\u00A9 Marjan Sadat Alavi 2015AbstractAnonymous communications is a long sought goal of dissidents, privacyadvocates, and a host of other user communities. While a large numberof systems have been proposed, those systems generally require large-scalecommunication infrastructure to be built in order to achieve a non-trivialamount of anonymity. However, the very nature of anonymity has meantthat a business rationale for building such infrastructure is lacking.Motivated to design a practical receiver anonymity network with largeanonymity set sizes, we present Twistor: a new system for receiver-anonymouscommunications which leverages the Twitter social graph as the underlyinganonymizing layer. To send a twist, a Twistor client first checks for reacha-bility of its intended recipient, using local graph information maintained byinteractions with the Twistor server. It then encrypts the message under therecipient\u00E2\u0080\u0099s public key and posts the ciphertext to the corresponding user\u00E2\u0080\u0099stimeline. Larger ciphertexts are encoded into an image, so as to conformwith Twitter\u00E2\u0080\u0099s 140 UTF-8 character limit.Twistor clients are listening for Twistor posts and decrypt and repostwhen those they follow publish a new twist, except for when a TTL indicatoris 0. Given self-reducibility properties of ElGamal, even if the adversary, e.g.,Twitter, monitors all Twistor posts and re-posts that cascade from an initialpost, the anonymity of the receiver and the confidentiality of the plaintextare reducible to the hardness of Decisional Diffie-Hellman problem.Twistor derives its increase in size of the receiver anonymity set fromasymmetric social connections combined with the publish-subscribe com-munication model in Twitter. Our aim is to achieve receiver anonymity setsizes on the order of hundreds of thousands.In this thesis we describe our built system, the cost to the underlyinginfrastructure and the tradeoffs between those costs and the size of theanonymity set.iiPreface\u00E2\u0080\u00A2 Chapter 3 of this work is a product of weekly meetings with my su-pervisor Prof. Bill Aiello.Jean-Se\u00C2\u00B4bastien Le\u00C2\u00B4gare\u00C2\u00B4 has also made some insightful suggestions dur-ing our system design; He also suggested an architecture for the in-terconnection between the user and the web server on Twistor clientside.\u00E2\u0080\u00A2 The implementation in Chapter 4 is done by myself, with some codesin Section 4.2 being written by Mira Leung. Some parts of those codesdoes not appear the the final version of our work; due to the changein the design.\u00E2\u0080\u00A2 The evaluation in Chapter 5 is done by myself. In the preliminaryversion of this work, the cumulative distribution graphs for the numberof per level Twitter followers were made by Maria Fung.Robert Reiss has also provided useful insights into Chapter 5.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . viiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42.1 Twitter Terminology . . . . . . . . . . . . . . . . . . . . . . 42.2 Threat Model . . . . . . . . . . . . . . . . . . . . . . . . . . 43 System Architecture . . . . . . . . . . . . . . . . . . . . . . . . 63.1 \u00E2\u0080\u009CStraw Man\u00E2\u0080\u009D Design . . . . . . . . . . . . . . . . . . . . . . 63.2 Twistor Design . . . . . . . . . . . . . . . . . . . . . . . . . . 63.2.1 Twistor Client . . . . . . . . . . . . . . . . . . . . . . 73.2.2 Design Challenges . . . . . . . . . . . . . . . . . . . . 73.2.3 Twistor Graph Service . . . . . . . . . . . . . . . . . 84 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 104.1 Twistor Client . . . . . . . . . . . . . . . . . . . . . . . . . . 104.1.1 Cryptographic Setting . . . . . . . . . . . . . . . . . . 104.1.2 Twist Generation . . . . . . . . . . . . . . . . . . . . 104.1.3 Reposting a Twist . . . . . . . . . . . . . . . . . . . . 114.2 Graph Server . . . . . . . . . . . . . . . . . . . . . . . . . . . 11ivTable of Contents5 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135.0.1 Size of G\u00E2\u0088\u0097 Subgraph . . . . . . . . . . . . . . . . . . . 145.0.2 Number of Reposts of a Twist . . . . . . . . . . . . . 155.0.3 Build Time of G\u00E2\u0088\u0097 Subgraph . . . . . . . . . . . . . . 155.0.4 Memory Consumption and Initialization . . . . . . . 175.0.5 Twitter API versus Cryptographic Costs . . . . . . . 176 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186.1 Anonymity through Social Graphs . . . . . . . . . . . . . . . 186.2 Increasing Anonymity Set Size . . . . . . . . . . . . . . . . . 196.3 Anonymity through Multicast . . . . . . . . . . . . . . . . . 207 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22vList of Figures3.1 An overview of Twistor\u00E2\u0080\u0099s system architecture . . . . . . . . . 95.1 CDF for size of G\u00E2\u0088\u0097 for varying number of levels . . . . . . . . 145.2 CDF for number of reposts of a twist for varying number oflevels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155.3 CDF for G\u00E2\u0088\u0097 build time for varying number of levels . . . . . . 16viAcknowledgementsI would like to express my deepest gratitude to my supervisor, Professor BillAiello, for his patience and continuous generosity and support throughoutmy studies and research. Bill\u00E2\u0080\u0099s expertise and understanding added consider-ably to my graduate experience at UBC. With his words of wisdom, advice,encouragement and positive perspective, he showed me how to achieve atrue balance between professionalism and reality.I am indebted to my fellows and friends Jean-Se\u00C2\u00B4bastien Le\u00C2\u00B4gare\u00C2\u00B4 andRobert Reiss whom have offered inspiring and helpful comments on ourweekly meetings and on dozens of my practice presentations.I want to thank the undergraduate students I worked with: Maria Fungand Mira Leung for studying Twitter social graph and working on Twistorgraph server code in the preliminary version of our research.I am thankful to all faculties, fellows and friends at NSS lab: Prof.Ivan Beschastnikh, Nodir Kodirov, David Williams-King and all other NSSmembers, for making my graduate experience the one that I will cherishforever.I thank Prof. Norm Hutchinson for spending his valuable time on readingmy thesis and providing valuable feedback on my thesis. Also, I appreci-ate Prof. Andrew Warfield\u00E2\u0080\u0099s wise and useful comments during my thesispresentation.I truly appreciate my family\u00E2\u0080\u0099s confidence in me. Through their genuinelove, unconditional care and patience they helped me overcome setbacks andstay focused on my studies.I am grateful to many other people at the Computer Science department,for their various forms of support during my graduate studies. Finally, I wantto express my appreciation to the Faculty of Graduate and PostdoctoralStudies and University of British Columbia for their international tuitionaward and for providing me with all sort of amenities throughout my studies.viiDedication\u00E2\u0080\u009C... Keep walking, though there is no place to get to.Don\u00E2\u0080\u0099t try to see through the distances. That\u00E2\u0080\u0099s not for human beings.Move within, but don\u00E2\u0080\u0099t move the way fear makes you move.Today, like every other day, we wake up empty and frightened.Don\u00E2\u0080\u0099t open the door to the study and begin reading.Take down a musical instrument.Let the beauty we love be what we do...\u00E2\u0080\u009DMawlana Rumi (1207-1273)To my precious parents and siblings.To my kind supervisor, Professor Bill Aiello.viiiChapter 1IntroductionAlthough data encryption is necessary for hiding the content of a message,just by observing the communication, the adversary can learn the identity ofthe communication endpoints. As a result, there has been plenty of researchon designing systems which provide anonymity: [20], [21], [22], [25], [16],[26], [24], [23], [11], [12], [13], [14], [16] to name a few.With the help of anonymity systems, activists, dissidents and investiga-tive journalists living under oppressive regimes can communicate informa-tion with particular sources without being concerned about any social orgovernmental consequences.Anonymity of a subject means the subject is not identifiable within aset of subjects, the anonymity set [1]. Anonymity systems can protect theidentity of either one or both of the communication endpoints. Using senderanonymous systems, one can hide the identity of the original sender of themessage, whereas receiver anonymity focuses on protecting the identity ofthe data recipient.Depending on the strength of the design and using various traffic analysistechniques, the adversary might still be able to make some conclusions aboutthe potential communication endpoints. In this respect, the group of peoplewho are potential senders or receivers of a message, create the sender orreceiver anonymity set respectively.Various measures have been proposed in order to quantify the level ofanonymity offered by anonymous communication systems. One of the mostpopular anonymity metrics is Entropy1 [2], [3]. The entropy metric considersan adversary whom have assigned probabilistic information to different usersas being the originators or recipients of a message.Using this anonymity metric, if there be N users in the sender or receiveranonymity set and user i be the true sender or recipient of a message withprobability pi, then the entropy of the system H(X) is calculated as follows:H(X) = \u00E2\u0088\u0092\u00E2\u0088\u0091Ni=1 pi.log2(pi).As such, the amount of information the adversary has learned through1Based on Shannon\u00E2\u0080\u0099s definition of Entropy.1Chapter 1. Introductionher attack can be expressed as log2(N) \u00E2\u0088\u0092 H(X). Clearly, the bigger theanonymity set, the smaller the decrease in entropy.While a large number of anonymity systems have been proposed, the vastmajority of those systems require large-scale communication infrastructureto be built in order to achieve a non-trivial amount of anonymity. How-ever, most such systems have not reached the critical mass, because thevery nature of anonymity has meant that a business rationale for buildingsuch infrastructure was lacking. TOR [20] pointed the way by showing that apractical, but not necessarily perfect, anonymity system could be built as anoverlay on top of a general purpose, economically viable routing infrastruc-ture that was already built\u00E2\u0080\u0093the global IP routing infrastructure. Nonethe-less, some hesitate to use TOR as it is impossible to ascertain whether asufficient number of TOR routers have been penetrated by an untrustedagent so as to de-anonymize some or all of a user\u00E2\u0080\u0099s communication.Motivated to build a feasible and practical anonymity system, in thisthesis we present Twistor, a new system for achieving receiver anonymouscommunication. In this regard, we selected Twitter as a likely candidateto host our anonymous messaging system because of its reliability and itsability to forward messages to millions of people. Twitter is by far one ofthe most popular social networking service with over 300 million monthlyactive users as of August 2015 [19]. While most social networks are designedaround the notion of mutual friendship with symmetric social connections,Twitter follows an asymmetric connection design. The asymmetry of so-cial connections combined with Twitter\u00E2\u0080\u0099s publish-subscribe communicationmodel makes it very suitable for sending information to a very large audi-ence through iterative reposts.In summary, our contributions in this thesis are as follows:\u00E2\u0080\u00A2 Leveraging the publish-subscribe communication model in asymmetricsocial networks to build a receiver anonymity system for social networkusers.\u00E2\u0080\u00A2 Providing unlinkability of the communication between sender and re-ceiver from the viewpoint of any online global adversary including thesocial network provider, even though sender and receiver know theircorresponding communicating partner.\u00E2\u0080\u00A2 Providing receiver anonymity set sizes on the order of hundreds of2Chapter 1. Introductionthousands for low latency applications (with around 7 seconds end-to-end latency).\u00E2\u0080\u00A2 Implementing the system on top of an existing asymmetric social net-working infrastructure, i.e,Twitter.Chapter 2 describes Twistor\u00E2\u0080\u0099s threat model. Chapter 3 presents Twistor\u00E2\u0080\u0099ssystem architecture. In Chapter 4, we describe the implementation detailsof our system. Chapter 5 presents the results of our experiments. Chapter 6describes how Twistor relates to previous work. Chapter 7 concludes thisthesis.3Chapter 2Threat Model2.1 Twitter TerminologyTwitter is one of the most popular asymmetric social networks where usersexchange short messages, called Tweets, limited to 140 UTF-8 characters.Users who subscribe to a Twitter account, are called followers of that accountwhich is their respective followee2. The real-time stream of Tweets sharedby all followees of a user, is called the user\u00E2\u0080\u0099s home timeline. When a Twitteruser posts a tweet to her timeline, the tweet is displayed in the respectivetimelines of all her followers.2.2 Threat ModelIn anonymous communications, the adversary can either launch passive oractive attacks. In Passive attacks, the adversary observes traffic passingthrough the relays in order to correlate inputs of a relay with its corre-sponding output. In active attacks, however, the adversary modifies someinput messages while trying to trace the communication. In addition, theadversary can either be global or local. A global adversary has access to thewhole communication system, while a local adversary can only control partof the resources.We consider a global network adversary being able to monitor all tweetspassed through the network, but whom can not subvert the cryptographicprimitives. Also, we assume the Twitter graph is completely known to theadversary. The security property we want to maintain is that even if theadversary observes all the network communications, he should not be able todetermine the identity of the intended recipient of a given message, withinthe receiver anonymity set known to him.If some clients in the path misbehave by not processing and retweetingTwistor-related posts, this would cause DoS3 only, but would not disrupt re-2We use terms followee and friend interchangeably throughout this thesis.3Denial of Service.42.2. Threat Modelcipient anonymity. The same reasoning holds for adversary-controlled clientswhom try to actively modify the messages passing through them.For our anonymity purpose, we do not assume any trust on the providerof the social network other than it conforms to its own service level agree-ment. In other words, we assume Twitter provides correct information whenqueried for its graph information and that it does not misbehave when userspublish some content to their subscribers.5Chapter 3System Architecture3.1 \u00E2\u0080\u009CStraw Man\u00E2\u0080\u009D DesignConsider a Twitter user whom is willing to send a message to her intendedrecipient, while protecting receiver\u00E2\u0080\u0099s anonymity against any entity who ismonitoring the communication. In a simple \u00E2\u0080\u009Cstraw man\u00E2\u0080\u009D design, the senderencrypts the message under the recipient\u00E2\u0080\u0099s public key and posts the cipher-text into her timeline. All followers as well as any other entity whom mightexpect receiving a message from the user, could then try decrypting themessage with their associated private keys. With a careful choice of thecryptosystem, the cipher texts would not reveal any information about thepublic key being used for encrypting the message4. This motivates potentialrecipients to try decrypting the ciphertext and at the same time doesn\u00E2\u0080\u0099t leakany information about the intended recipient to the adversary.Using this construction, the recipient anonymity set would consist ofsender\u00E2\u0080\u0099s intermediate followers as well as any Internet user in general whomvisits sender\u00E2\u0080\u0099s timeline. Note that Twitter access logs can easily revealthe IP addresses of non-followers or Internet users not connected to Twitterwhom try to read sender\u00E2\u0080\u0099s timeline, unless those users are connected throughan anonymizer tool. Therefore, the effective recipient anonymity set for atypical Twitter user5 would be rather small.3.2 Twistor DesignIf upon posting the ciphertext to her timeline, the user\u00E2\u0080\u0099s followers act asrelays by posting the ciphertext to their respective timelines, this wouldincrease the receiver anonymity set size in a desirable way. This makes thebasis for Twistor design. In this chapter, we describe the components of ourimproved design.4This property is called \u00E2\u0080\u009Cindistinguishability of the key\u00E2\u0080\u009D and not all cryptosystemsmaintain this property. ElGamal cryptosystem, which we have used in this work, satisfiesthis property [17].5A typical Twitter user has about 200 twitter followers.63.2. Twistor Design3.2.1 Twistor ClientMotivated by our goal that is achieving large recipient anonymity set sizes,we construct a protocol where the sender\u00E2\u0080\u0099s original message, encrypted un-der intended recipient\u00E2\u0080\u0099s public key, is relayed through the sender\u00E2\u0080\u0099s followersubgraph. For this purpose, we utilize the Twistor client, a piece of softwarebeing run by follower nodes, which maintains a long-lived HTTPS connec-tion to Twitter\u00E2\u0080\u0099s streaming API. Twistor client decrypts the ciphertext itreceives to check if the corresponding message is destined for the associateduser and reposts the ciphertext to the user\u00E2\u0080\u0099s timeline.3.2.2 Design ChallengesEven though the mentioned design results in a large receiver anonymityset which is desirable for our goal, if left unfettered, this can result in anexponential increase in the number of tweets. However we are not interestedin flooding Twitter with messages.Another challenge is how to ensure the intended recipient falls withinthe sender\u00E2\u0080\u0099s subgraph.We also need a way to beat Twitter\u00E2\u0080\u0099s 140 character limit for the lengthof tweets.In what follows, we explain how we overcome these challenges.Overcoming Twitter\u00E2\u0080\u0099s 140 character limitAs already stated in previous chapters, currently Twitter only allows ex-change of short messages limited to 140 UTF-8 characters. However, if animage is attached to the Tweet, the maximum limit for the image size wouldbe 3 MB.ElGamal cryptosystem results in a 2:1 expansion in size from the plain-text to the ciphertext. Normally, a Twistor plaintext consists of the sender\u00E2\u0080\u0099soriginal message appended to the message\u00E2\u0080\u0099s checksum6. The plaintext isthen encrypted using the ElGamal encryption scheme. The resulting cipher-text is then attached to a Twistor-specific hashtag and a TTL indicator. Assuch, if sender\u00E2\u0080\u0099s input message be larger than a certain length, the size ofthe Twist can easily exceed the 140 character limit.In order to solve this problem, we stegano-embed larger messages, whichare symmetrically encrypted, into an image. We will then only need toencrypt the symmetric key and its associated checksum with the ElGamal6The details are elaborated in the next chapter.73.2. Twistor Designencryption scheme. As the size of the symmetric key is fixed to 16 bytes inour current scheme, this will never exceed the 140 character limit.Truncating sender\u00E2\u0080\u0099s subgraphIn order to avoid flooding Twitter with senders\u00E2\u0080\u0099 messages, we attach acounter as a TTL flag to the generated ciphertext. When a client gets aciphertext, it first decrement the TTL flag by one and reposts the result-ing message to associated user\u00E2\u0080\u0099s timeline. Additionally, a history of alreadyretweeted messages is kept for some limited time. If Twistor client gets amessage that has been posted before or if TTL flag reaches zero, the messageis discarded.Checking for recipient\u00E2\u0080\u0099s reachabilityWe need a way to check if a given node falls in the sender-originated sub-graph, which is now truncated as described above. The easiest way to dothis is to directly ask twitter or some third party about this information.However, doing so leaks the identity of the intended recipient, which con-tradicts our purpose of hiding this information. Another option is that theTwistor client crawls the entire follower subgraph down to certain numberof levels and then locally checks if this set includes a given node. However,Twitter rate limits its API calls, which makes this approach impractical.Our proposed solution is to provide this functionality as a separate ser-vice, which we will explain in the following section.3.2.3 Twistor Graph ServiceIn our proposed design, Twistor clients perform an initial registration withthe graph service and declare the list of immediate followers/followees ofthe corresponding user collectively. The graph server aggregates this infor-mation into one large graph G. Upon request or even on certain intervals,the graph server provides clients with the computed subgraph G\u00E2\u0088\u0097 for theassociated user. For each user, the nodes in its corresponding subgraph G\u00E2\u0088\u0097make the set of all reachable nodes down to a certain level.Definition 1: The Twistor follower graph G = (V,E) is a directedgraph of vertices V and edges E, where V and E represent Twistor usersand followee/follower relationship among them respectively.We define the Twistor\u00E2\u0080\u0099s follower subgraph for a Twistor user as follows:Definition 2: The Twistor\u00E2\u0080\u0099s follower subgraph for a Twistor user withusername U is a subgraph G\u00E2\u0088\u0097U = (V\u00E2\u0088\u0097, E\u00E2\u0088\u0097) rooted at U induced by fol-83.2. Twistor DesignRegisters, sends public key, followers/friendsReceives updated subgraph G*OAuth authorization for making twists/retwists on behalf of user Access to REST and Streaming APITwistor Client with Subgraph G* Twistor Graph ServerFigure 3.1: An overview of Twistor\u00E2\u0080\u0099s system architecturelowee/follower relationships down to a certain depth l, where V \u00E2\u0088\u0097 \u00E2\u008A\u0086 V andE\u00E2\u0088\u0097 \u00E2\u008A\u0086 E.When an update occurs in the user\u00E2\u0080\u0099s list of followees, this event is re-ceived via Twistor client\u00E2\u0080\u0099s streaming API, which in turn, sends the updateto Twistor\u00E2\u0080\u0099s graph server. Thus, the graph server maintains an updatedversion of G over time.In the meanwhile, each Twistor client declares its (signed) public key tothe graph server upon registration. Although Twistor clients could make useof private information retrieval to query for the public keys of some subsetof nodes [18], in our current implemented scheme, associated user\u00E2\u0080\u0099s publickeys are directly incorporated into G\u00E2\u0088\u0097 vertices.Figure 3.1 illustrates the interaction between our system components.9Chapter 4ImplementationTwistor client and graph server are written in around 4500 lines of Python.In this chapter, we explain some implementation details for Twistor.4.1 Twistor Client4.1.1 Cryptographic SettingWe use NIST elliptic curve based on prime-order groups for primes of length256 bits as the underlying algebraic group. We chose ElGamal over ellipticcurve P-256 as the asymmetric cryptosystem. We deploy AES7 with 128bit keys for symmetric encryption and use SHA1 hash digest for messageintegrity check.4.1.2 Twist GenerationWhen a user U intends to send a recipient-anonymous message through thesystem, she provides the input message and the final recipient\u00E2\u0080\u0099s usernameinto the Twistor client\u00E2\u0080\u0099s web interface. Using subgraph G\u00E2\u0088\u0097U , the Twistorclient checks for reachability of the provided recipient. If the check fails,user is notified. The user can then notify the intended recipient out ofband to randomly follow one of the nodes in G\u00E2\u0088\u0097U in order to make subse-quent anonymous communications possible. Otherwise, if the recipient wasproved reachable, input message is encrypted under recipient\u00E2\u0080\u0099s public key8and posted to the Twitter timeline of user U . As previously stated, largermessages are stegano-embedded into an image and then posted to the user\u00E2\u0080\u0099sTwitter timeline.More precisely, for larger messages, a hybrid encryption scheme is used:the input message is encrypted with AES-128, appended with HMAC mes-sage authentication code and finally embedded into an image. The AES-128 symmetric key (or the message in case of smaller input messages), is7In cipher block chaining (CBC) mode.8Please note that G\u00E2\u0088\u0097 nodes also incorporate the public key of the associated node.104.2. Graph Serverappended with the message checksum9. The resulting message is then en-crypted with the recipient\u00E2\u0080\u0099s public key. Encrypted message together withthe accompanying image (if existing) are posted as an status update to theuser\u00E2\u0080\u0099s timeline.A status post generated by Twistor, which we name a twist, is accom-panied by a hashtag marking the status as being generated by a Twistorclient. Additionally, as explained in the previous chapter, a TTL indicatoris attached as a clear-text to the twist hashtag. Each client receiving a twistwould decrement the TTL indicator by one so as to stop propagation of thetwist after certain level.4.1.3 Reposting a TwistTwistor clients are constantly listening for new Twists from the correspond-ing user\u00E2\u0080\u0099s followees. When a followee posts a new status, the Twistor clientchecks for the presence of Twistor hashtag and verifies if the status is a validTwistor ciphertext. Upon detection of a twist, the Twistor client decryptsthe ciphertext and verifies its checksum in order to ascertain whether thetwist is intended for the corresponding user.If the checksum is correct and the twist does not include an image, ac-companying plaintext is the sender-originated message which was originallyintended for the user. If the twist is accompanied by an image, the Twistorclient keeps the plaintext as an AES-128 key and continues with decodingthe ciphertext embedded into the image. After decoding the ciphertext andverifying its integrity using HMAC, the client decrypts the ciphertext usingthe AES-128 symmetric key in order to obtain the original sender\u00E2\u0080\u0099s plaintextmessage.If hash of the twist was found in a locally maintained dictionary or if theTTL flag was zero, the ciphertext is not reposted any further. Otherwise,Twistor client decrements the TTL indicator by one. The resulting twist,in which we call retwist, together with the accompanying image (if any) isposted as an status update to the corresponding user\u00E2\u0080\u0099s timeline.4.2 Graph ServerThe multithreaded graph server maintains the most up-to-date graph G.When a Twistor client initially registers with the graph server and declares9We use the first few bytes of the message\u00E2\u0080\u0099s SHA1 hash digest as the checksum.114.2. Graph Serverthe list of immediate neighbours for the corresponding user U , it receivesthe subgraph G\u00E2\u0088\u0097U .Later on, as users follow and unfollow other nodes, the vertices andedges in G and G\u00E2\u0088\u0097U get updated in the graph server. As a result, subsequentrequests of the subgraph, only involves sending the updates as opposed to theentire subgraph to Twistor client. In our implemention, sending incrementalupdates from the graph server is also possible via pub/sub rather than ondemand.12Chapter 5EvaluationThere exists a body of quantitative research that have analyzed the Twit-ter social graph in terms of its follower-following topology, its connectedcomponents and the distribution of information propagation within eachcomponent [28], [29], [30].In this section, we provide results for evaluating our system. We havestudied the following parameters in our evaluation:- Number of levels that the ciphertext needs to be relayed, so that thereceiver anonymity set size is in the order of hundred of thousands.- Number of reposts performed by intermediate relays in order to com-municate a message between sender and receiver.- The overall delay in sending the message to the recipient.- Amount of memory consumption for storing follower subgraph G\u00E2\u0088\u0097.It is noteworthy to mention that in practice, where some followers maynot use Twistor, fixing the same number of levels that a ciphertext is re-layed for all users, results in an unsatisfactorily low number of participatingrelays and hence a low anonymity set size. Thus, in order to maintain thesenumbers large enough, the Twistor graph server can tune an appropriatelevel for a given user\u00E2\u0080\u0099s G\u00E2\u0088\u0097 subgraph, so as to obtain the desired anonymityset size. Using this approach, different users might end up having a differentnumber of levels in their corresponding G\u00E2\u0088\u0097 subgraph.We performed our evaluation on a Twitter graph dataset of 2011 10 [27].The dataset was composed of over 11 million users and over 85 million socialrelations. We conducted our analysis on a random population of 5000 users.All tests are performed on OS X 10.10 with 3GHz Intel i7 and 16GB ofRAM.10Mined in 2011 by Arizona State University Data and Machine learning laboratory asa representative sample of the twitter group.13Chapter 5. Evaluation100 101 102 103 104 105 106Number of G* Nodes0.00.20.40.60.81.0ProbabilityCDF for G* Size(a) 2-level G\u00E2\u0088\u0097100 101 102 103 104 105 106Number of G* Nodes0.00.20.40.60.81.0ProbabilityCDF for G* Size(b) 3-level G\u00E2\u0088\u0097100 101 102 103 104 105 106Number of G* Nodes0.00.20.40.60.81.0ProbabilityCDF for G* Size(c) 4-level G\u00E2\u0088\u0097Figure 5.1: CDF for size of G\u00E2\u0088\u0097 for varying number of levels5.0.1 Size of G\u00E2\u0088\u0097 SubgraphAs already stated in previous chapters, we are aiming for receiver anonymitysets in the order of hundreds of thousands. For this purpose, the G\u00E2\u0088\u0097 sub-graph for a typical user needs to include at least that many nodes. Wecomputed G\u00E2\u0088\u0097 rooted at all 5,000 nodes in our sample population for varyinglevels down the graph.We found out that for most users, a three-level G\u00E2\u0088\u0097 was sufficient toachieve our desired anonymity set size.Figure 5.1 shows the cumulative distribution function (CDF) for thenumber of nodes for varying G\u00E2\u0088\u0097 levels. As shown in the figure, for the three-level G\u00E2\u0088\u0097 only 15% of users in the sample population had less than 10,000nodes and 60% of users had more than 100,000 nodes in their G\u00E2\u0088\u0097. The max-imum and average G\u00E2\u0088\u0097 size was around 5,000,000 and 700,000 respectively.For a two-level G\u00E2\u0088\u0097 however, only 15% of user had more than 100,000 nodesin their respective G\u00E2\u0088\u0097.14Chapter 5. Evaluation100 101 102 103Number of Intermediate Nodes0.00.20.40.60.81.0ProbabilityCDF for Number of Intermediate Nodes(a) 2-level G\u00E2\u0088\u0097100 101 102 103 104 105 106Number of Intermediate Nodes0.00.20.40.60.81.0ProbabilityCDF for Number of Intermediate Nodes(b) 3-level G\u00E2\u0088\u0097100 101 102 103 104 105 106Number of Intermediate Nodes0.00.20.40.60.81.0ProbabilityCDF for Number of Intermediate Nodes(c) 4-level G\u00E2\u0088\u0097Figure 5.2: CDF for number of reposts of a twist for varying number oflevels5.0.2 Number of Reposts of a TwistWe found out that number of reposts made by intermediate nodes is in factmuch less than the size of receiver anonymity set in which we obtain. Asillustrated in Figure 5.2, in the three-level G\u00E2\u0088\u0097, 70% of users in the samplepopulation have fewer number of reposts than 10,000 and for only 10% ofusers is the number of reposts more than 100,000.5.0.3 Build Time of G\u00E2\u0088\u0097 SubgraphWe computed the time taken to build the G\u00E2\u0088\u0097 subgraph; depicted in Figure5.3. According to the results, for 80% of users, three-level G\u00E2\u0088\u0097 build timewas at most 5 seconds and the maximum build time was only 32 seconds.15Chapter 5. Evaluation0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0G* Build Time (seconds)0.00.20.40.60.81.0ProbabilityCDF for G* Build Time(a) 2-level G\u00E2\u0088\u00970 5 10 15 20 25 30G* Build Time (seconds)0.00.20.40.60.81.0ProbabilityCDF for G* Build Time(b) 3-level G\u00E2\u0088\u00970 5 10 15 20 25G* Build Time (seconds)0.00.20.40.60.81.0ProbabilityCDF for G* Build Time(c) 4-level G\u00E2\u0088\u0097Figure 5.3: CDF for G\u00E2\u0088\u0097 build time for varying number of levels16Chapter 5. Evaluation5.0.4 Memory Consumption and InitializationWe measured the amount of memory consumption for storing G\u00E2\u0088\u0097 nodes inthe client. We found out that it took less than 7MB to store an average G\u00E2\u0088\u0097size of 700,000 nodes. When sending a G\u00E2\u0088\u0097 subgraph to a Twistor client, theserver also incorporates the public keys of individual nodes in the response.As it takes 80B to store each public key used in our system, the overallmemory consumption for storing G\u00E2\u0088\u0097 subgraph in the client is around 61MB.Assuming 10 Mb/s broadband connection, it takes less than a minute tosend the entire G\u00E2\u0088\u0097 subgraph to a Twistor client (61 MB/1.25 MB/s = 48.8).Note that all subsequent interaction with the graph server only involvessending incremental updates (For the nodes being added to or deleted fromthe G\u00E2\u0088\u0097 subgraph). As a result, the mentioned delay is only a one-time cost.5.0.5 Twitter API versus Cryptographic CostsAs already stated in previous chapters, we used ElGamal over Elliptic curveof prime-order P256. We calculated the time taken for performing cryp-tographic operations and determined that encryption/decryption costs areonly around 5 milliseconds.We measured the time taken for posting and reading a twist via TwitterAPI calls. We found out that Twitter API calls are the dominating costsand in the experiments performed on our fake Twitter accounts, the numbersvaried from 400 milliseconds to 1.5 seconds for each repost. This resulted inan end-to-end latency of between 2 to 7 seconds.17Chapter 6Related Work6.1 Anonymity through Social GraphsTwistor employs the Twitter follower graph for routing receiver-anonymouscommunications. There exist some other works that use social graphs asthe backbone of anonymity networks. The social graph is then used forforwarding users\u00E2\u0080\u0099 traffic to remote communicating parties. However, as faras we are aware of, these proposals have not resulted in practical adoptionand have all targeted sender anonymity.Drac [4] is the first system in this respect. Drac leverages peers in thesocial network to relay traffic in a system providing anonymity and un-observability. In order to hide connection and disconnection events, buildingcircuits for anonymous communications are performed in separate epochs,using random walk in the social graph. During each epoch, users randomlyselect a friend as the first hop and then iteratively extend their circuit viarandom friends of friends as chosen by the predecessor hop.In Pisces [5], a system for anonymous web browsing and instant messag-ing, a random walk in the users\u00E2\u0080\u0099 social graph is augmented by a reciprocalneighbour policy. The reciprocal neighbour policy is for punishment of ma-licious nodes whom try to exclude benign nodes in their declared routingtables. For enforcing the policy and identification of cheaters, each node\u00E2\u0080\u0099scurrent signed list of contacts gets distributed using Whanau distributedhash table. The nodes periodically download and check for presence of sev-eral conflicting neighbour lists.In Dynamix [6], the authors extend theoretical findings for buildinganonymous communications over static social graphs to that of dynamicgraphs; which captures churn normally imposed by joins and leaves in socialnetworks.VirtualFriendship [7], provides profile-request anonymity and communi-cation unlinkability in a centralized online social network by routing com-munications outside the social network. For this purpose, trusted routingfriends are used that redirect received requests using an external anonymitynetwork to other routing friends. On the other side, the routing friend ver-186.2. Increasing Anonymity Set Sizeifies authorization token for the request and sends back social network\u00E2\u0080\u0099sresponse.6.2 Increasing Anonymity Set SizeAnother property of Twistor is increasing receiver anonymity set size. Usingnodes in the follower graph as relays results in an exponential increase inthe size of the receiver anonymity set. Some other works have pursued asimilar approach in order to increase sender anonymity set size.Some designs [8], [9], [10] leverage casual web surfers to increase anonymityset size.[8], presents a protocol that exploits normal web browsing activities ofordinary users for creating an anonymous overlay network. An anonymoususer\u00E2\u0080\u0099s browsing activity gets hidden among all other web users contactingthe same set of servers, which in turn, participate in the sender anonymityset. The authors propose a Chaumian Mixnet variant, where each Mixnode in the system serve a CGI script. When an onion-encrypted messageis supplied as an HTTP POST request to a node, the script gets calledand emulates the behaviour of a regular Mix, i.e., decrypts the messageand looks at the headers to determine further processing. When ordinaryusers visit webpages served by websites containing a specially crafted banneradvert (which includes a frame containing JavaScript created by a node),their browsers execute the JavaScript code returned by the node. This way,ordinary users get involved in transferring the messages to another node.In [9], the authors introduce a JavaScript framework, called ConScript,which is made up of websites cooperating for enabling anonymous commu-nications. It works by encouraging casual web users visiting cooperativewebsites to participate in anonymous communications. The cooperativewebsites serve JavaScript code, instructing the browser to act as a clientby creating and submitting encrypted dummy messages into the underlyinganonymity system. This provides cover traffic for other users of the systemwho wish to send real messages by using a browser plugin. In that case,when the user visits a cooperative website, the browser plugin simply in-tercepts outgoing messages from the browser and replaces dummy messagewith user\u00E2\u0080\u0099s encrypted real message.196.3. Anonymity through Multicast6.3 Anonymity through MulticastTwistor uses Twitter\u00E2\u0080\u0099s followers graph as an application-level multicast treefor relaying anonymous messages. Multicast schemes have been used by alarge body of previous works in order to achieve sender/receiver anonymity.[11] is the first protocol that employs (IP-level) multicast routing inorder to provide receiver anonymity.In [12], the authors propose a method for providing receiver anonymitythat uses multicast together with a cryptographic primitive called incom-parable public keys. Using the suggested primitive, the receiver generatesmany anonymous identities and many unique public keys, all being associ-ated with the same private key. Upon receiving a ciphertext, all membersof the multicast group try decrypting the message and ignore it in case ofdecryption failure.[13], presents a protocol for mutual-anonymous communications by us-ing an application-level multicast tree overlay network. Sender anonymityis achieved by making every node in the overlay network participate in re-laying messages. Receiver anonymity is achieved by peers setting up theirfilters to get messages they are interested in. The tree is restructured fromtime to time so the nodes\u00E2\u0080\u0099 parent, siblings and friends change dynamically.In [14], the authors employ a subscription tree for enabling anonymouscommunication. In order to publish content anonymously within a group,the publisher sends an anonymous message to the group\u00E2\u0080\u0099s root by usingCrowds-like [15] probabilistic routing. The group\u00E2\u0080\u0099s root then forwards thecontent along the tree. Nodes willing to receive some content anonymouslyshould subscribe through a random set of proxy nodes; where the last nodeeventually joins the multicast tree. The published content is then routedanonymously to the receiver. As other nodes should forward and receivecontent on behalf of interested subscribers, this provides plausible deniabilityfor all members.P5 [16], introduces an scalable P2P anonymity protocol for satisfyingsender-, receiver-, and sender-receiver anonymity. Scalability of the systemis achieved through creating a hierarchy of progressively smaller broadcastgroups; each level of hierarchy providing different level of anonymity andcommunication efficiency. Users in the hierarchy are grouped together intobroadcast groups. When a message is sent to a broadcast group, it is alsoforwarded to all groups with an ancestor-descendent relationship with thatgroup.20Chapter 7ConclusionAnonymous communications aim to protect communications\u00E2\u0080\u0099 privacy withinthe Internet, where privacy of online users is easily threatened by adversarialbreaches. Achieving high degree of anonymity, i.e., large anonymity set sizes,requires a large number of active participants and hence a great amount ofcover traffic.In this thesis, we presented Twistor, a practical anonymity system beingbuilt as an overlay on top of Twitter\u00E2\u0080\u0099s existing publish-subscribe infrastruc-ture. Twistor leverages the reachability provided by a cascade of tweets andretweets on Twitter to achieve large receiver-anonymity sets. Twistor relieson a special property of ElGamal encryption scheme whereby a ciphertextdoes not reveal the public key used to produce it.Twistor\u00E2\u0080\u0099s communication cost consists of one tweet operation per Twistorrelay. Processing cost includes one elliptic curve decryption operation andone AES decryption, which is linear with the number of twistor relays inthe sender\u00E2\u0080\u0099s subgraph. Except for being compliant with its normal func-tionality, we don\u00E2\u0080\u0099t put any extra trust assumption on Twitter. As long asrelays falling on the sender\u00E2\u0080\u0099s chosen path do not misbehave, the message isguaranteed to reach its final anonymous destination.21Bibliography[1] Andreas Pfitzmann, and Marit Hansen. A terminology for talk-ing about privacy by data minimization: Anonymity, unlink-ability, undetectability, unobservability, pseudonymity, and iden-tity management, v0.34. Accessible through: http://dud.inf.tu-dresden.de/literatur/Anon Terminology v0.34.pdf, 2010.[2] Andrei Serjantov, and George Danezis. Towards an information theoreticmetric for anonymity, In: Proc. Privacy Enhancing Technologies(PETS),2003.[3] Claudia Diaz, Stefaan Seys, Joris Claessens, and Bart Preneel. To-wards measuring anonymity, In: Proc. Privacy Enhancing Technolo-gies(PETS), 2003.[4] George Danezis, Claudia Diaz, Carmela Troncoso, and Ben Laurie. Drac:an architecture for anonymous low-volume communications, In: Proc.Privacy Enhancing Technologies(PETS), 2010.[5] Prateek Mittal, Matthew Wright, and Nikita Borisov. Pisces: Anony-mous communication using social networks, In: Proc. Network and Dis-tributed System Security Symposium(NDSS), 2013.[6] Abedelaziz Mohaisen, and Yongdae Kim. Dynamix: anonymity on dy-namic social structures, In: Proc. ACM symposium on Information, com-puter and communications security(AsiaCCS), 2013.[7] Filipe Beato, Marco Conti, Bart Preneel, and Dario Vettore. Virtu-alFriendship: hiding interactions on online social networks, In: Proc.Communications and Network Security(CNS), 2014.[8] Matthias Bauer. New covert channels in HTTP: adding unwitting Webbrowsers to anonymity sets. In Proc. Workshop on Privacy in the Elec-tronic Society(WPES), 2003.22Bibliography[9] Henry Corrigan-Gibbs, and Bryan Ford. Conscript your friends intolarger anonymity sets with JavaScript, In Proc. Workshop on Privacyin the Electronic Society(WPES), 2013.[10] Volker Roth, Benjamin Gldenring, Eleanor Rieffel, Sven Dietrich, andLars Ries. A secure submission system for online whistleblowing plat-forms. In Proc. Financial Cryptography and Data Security(FC), 2013.[11] Brian Neil Levine, and Clay Shields. Hordes: a multicast based protocolfor anonymity. Journal of Computer Security 10, no. 3 : 213-240, 2002.[12] Brent R. Waters, Edward W. Felten, and Amit Sahai. Receiveranonymity via incomparable public keys. In Proc. Computer and Com-munications Security(CCS), 2003.[13] Ying Wang, and Partha Dasgupta. Anonymous communications on theinternet. In Proc. Computer and Communication Security(CCS), 2005.[14] Alan Mislove, Gaurav Oberoi, Ansley Post, Charles Reis, Peter Dr-uschel, and Dan S. Wallach. AP3: Cooperative, decentralized anonymouscommunication. In Proc. ACM SIGOPS European workshop, 2004.[15] Michael K. Reiter, and Aviel D. Rubin. Anonymous web transactionswith crowds. In Communications of the ACM, 1999.[16] Rob Sherwood, Bobby Bhattacharjee, and Aravind Srinivasan. P5: AProtocol for scalable anonymous communication. In Proc. IEEE Securityand Privacy(S&P) Symposium, 2002.[17] Mihir Bellare, Alexandra Boldyreva, Anand Desai, and DavidPointcheval. Key-privacy in public-key encryption. In Advances in Cryp-tology(ASIACRYPT), 2001.[18] Prateek Mittal, Femi G. Olumofin, Carmela Troncoso, Nikita Borisov,and Ian Goldberg. PIR-Tor: Scalable anonymous communication usingprivate information retrieval. In: Proc. USENIX Security Symposium,2011.[19] http://www.statista.com/statistics/282087/number-of-monthly-active-twitter-users/, August 2015.[20] Roger Dingledine, Nick Mathewson, and Paul Syverson. Tor: Thesecond-generation onion router. In: Proc. USENIX Security Symposium,2004.23Bibliography[21] George Danezis, Roger Dingledine, and Nick Mathewson. Mixminion:Design of a type III anonymous remailer protocol. In Proc. IEEE Securityand Privacy(S&P) Symposium, 2003.[22] Ulf Mller, Lance Cottrell, Peter Palfrader, andLen Sassaman. Mixmaster protocol\u00E2\u0080\u0093version 2. Draft,http://freehaven.net/anonbib/cache/mixmaster-spec.txt, 2003.[23] Stevens Le Blond, David Choffnes, Wenxuan Zhou, Peter Druschel,Hitesh Ballani, and Paul Francis. Towards efficient traffic-analysis resis-tant anonymity networks. In Proc. ACM Special Interest Group on DataCommunication(SIGCOMM), 2013.[24] Hsu-Chun Hsiao, TH-J. Kim, Adrian Perrig, Akimasa Yamada, SamuelC. Nelson, Marco Gruteser, and Wei Meng. LAP: Lightweight anonymityand privacy. In Proc. IEEE Security and Privacy(S&P) Symposium,2012.[25] Sharad Goel, Mark Robson, Milo Polte, and Emin Sirer. Herbivore: Ascalable and efficient protocol for anonymous communication. Technicalreport, Cornell University, 2003.[26] David Isaac Wolinsky, Henry Corrigan-Gibbs, Bryan Ford, and AaronJohnson. Dissent in numbers: making strong anonymity scale. In Proc.USENIX Symposium on Operating Systems Design and Implementa-tion(OSDI), 2012.[27] Reza Zafarani and Huan Liu. Social Computing Data Repository atASU [http://socialcomputing.asu.edu]. Tempe, AZ: Arizona State Uni-versity, School of Computing, Informatics and Decision Systems Engi-neering, 2009 (updated in 2011).[28] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What isTwitter, a social network or a news media? In Proc. ACM internationalconference on World Wide Web(WWW), 2010.[29] Kristina Lerman, and Rumi Ghosh. Information Contagion: An Empir-ical Study of the Spread of News on Digg and Twitter Social Networks. InProc. International Conference on Weblogs and Social Media(ICWSM),2010.[30] Maksym Gabielkov, Ashwin Rao, and Arnaud Legout. Studying socialnetworks at scale: Macroscopic anatomy of the twitter social graph. In24BibliographyACM international conference on Measurement and Modeling of Com-puter Systems(Sigmetrics), 2014.25"@en . "Thesis/Dissertation"@en . "2016-02"@en . "10.14288/1.0223805"@en . "eng"@en . "Computer Science"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "Attribution-NonCommercial-NoDerivs 2.5 Canada"@* . "http://creativecommons.org/licenses/by-nc-nd/2.5/ca/"@* . "Graduate"@en . "Twistor : anonymous message transfer through Twitter"@en . "Text"@en . "http://hdl.handle.net/2429/56719"@en .