Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

by

Clement Fung

BASc., University of Waterloo, 2016

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

The University of British Columbia (Vancouver)

October 2018

© Clement Fung, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:

Dancing in the Dark: Private Multi-Party Machine Learning in an Untrusted Setting

submitted by Clement Fung in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Examining Committee:
Ivan Beschastnikh, Computer Science (Supervisor)
Margo Seltzer, Computer Science (Supervisory Committee Member)

Abstract

The problem of machine learning (ML) over distributed data sources arises in a variety of domains. Unfortunately, today's distributed ML systems use an unsophisticated threat model: data sources must trust a central ML process.

We propose a brokered learning abstraction that provides data sources with provable privacy guarantees while allowing them to contribute data towards a globally-learned model in an untrusted setting. We realize this abstraction by building on the state of the art in multi-party distributed ML and differential privacy methods to construct TorMentor, a system that is deployed as a hidden service over an anonymous communication protocol.

We define a new threat model by characterizing, developing and evaluating new attacks in the brokered learning setting, along with effective defenses for these attacks. We show that TorMentor effectively protects data sources against known ML attacks while providing them with a tunable trade-off between model accuracy and privacy.

We evaluate TorMentor with local and geo-distributed deployments on Azure. In an experiment with 200 clients and 14 megabytes of data per client, our prototype trained a logistic regression model using stochastic gradient descent in 65 seconds.

Lay Summary

Machine learning is a form of analysis that draws insights from large volumes of training data. This mandates the collection of data into a single system for analysis, but the nature of data today is highly distributed and private. Furthermore, most of this analysis is performed by companies, who lack incentive to provide user privacy.

We design TorMentor, a system that enables anonymous, distributed machine learning between parties. Our contribution also includes a novel learning paradigm called brokered learning, which removes the role of the central institution in the machine learning process.

Through brokered learning, TorMentor protects the privacy and security of data providers while ensuring that analysis is performed correctly. We design and evaluate such a system and show that it outperforms the state of the art in defending against known privacy and security threats on distributed machine learning systems.

Preface

All of the work presented henceforth was conducted in the NSS (Networks, Systems and Security) lab in the Department of Computer Science at the University of British Columbia, Point Grey campus.

The work presented in this thesis is original, unpublished work by the author, Clement Fung. This work was performed in collaboration with Syed Iqbal, Jamie Koerner and Stewart Grant.
The entirety of this work was designed and written with the assistance of Dr. Ivan Beschastnikh, who supervised this project and all the students involved.

The design of the TorMentor algorithms, system and API in Chapter 5 was performed by the author. Jamie implemented and designed the differentially-private learning component described in Section 5.4 and Algorithm 1. Syed assisted me with the implementation of the go-python component of the system described in Chapter 6. Jamie implemented the noise function in Python described in Chapter 6.

Regarding the evaluation of the system, Jamie executed the experiments for and generated Figure 7.1. Stewart assisted with the deployment of the system on Azure described in Chapter 6, which was used to generate Figures 7.2, 7.3, 7.6 and 7.7. All other figures and experiments were designed and conducted by the author.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
2 Background
3 Brokered Learning
  3.1 Decoupling federated learning
  3.2 Defining brokered learning
  3.3 Example use cases
4 Threat model, guarantees, assumptions
  4.1 Security guarantees
  4.2 Assumptions
5 TorMentor Design
  5.1 Design overview
  5.2 Curator API
  5.3 Client API
  5.4 Training process
  5.5 Defending against inversion attacks
  5.6 Defending against poisoning attacks
  5.7 Modular design
6 TorMentor Implementation
7 Evaluation
  7.1 Methodology
  7.2 Model convergence
  7.3 Scalability and overhead
  7.4 Inversion defenses evaluation
  7.5 Poisoning defenses evaluation
8 Discussion
9 Conclusion
Bibliography

List of Tables

Table 5.1: TorMentor Curator API.
Table 5.2: TorMentor Client API.
Table 7.1: Time to train the model with TorMentor, with and without Tor, over a varying number of clients.

List of Figures

Figure 2.1: Brokered learning and TorMentor design. Brokers (implemented as hidden Tor services) mediate between curators (top) and sets of clients (bottom).
Figure 4.1: (a) Attacks/defenses in TorMentor. (b) Threat model: an edge is an attack by source against target(s).
Figure 5.1: Overview of the TorMentor protocol between curator/broker/client.
Figure 5.2: One iteration in an inversion attack in which an attacker observes the difference between Mt and Mt+2, and infers this difference to be ∆V,t+1. After many iterations, the attacker can discover M*V, the optimal model trained on the victim's data.
Figure 7.1: Effects of differential privacy and batch size on training loss over time. As ε and b decrease, model convergence is slower and may not be guaranteed.
Figure 7.2: TorMentor model convergence in deployments with 10, 50, 100, and 200 clients.
Figure 7.3: TorMentor without Tor model convergence in deployments with 10, 50, 100, and 200 clients.
Figure 7.4: Model agreement between victim model and inverted estimate in the ideal setting (Fig. 5.2), with varying ε.
Figure 7.5: Reconstruction error between victim model and inverted model estimate, with varying privacy parameters: the number of bystanders and the privacy parameter ε.
Figure 7.6: Model training loss over time, when attacked by a varying proportion of poisoners. RONI threshold is maintained at 2%.
Figure 7.7: Model training loss over time, when attacked by 50% poisoners. RONI threshold is varied from 0.5% to 5%.

Glossary

API: application programming interface
CPU: central processing unit
LOC: lines of code
MCMC: Markov Chain Monte Carlo
MIT: Massachusetts Institute of Technology
ML: machine learning
P2P: peer-to-peer
RAM: random access memory
RONI: reject on negative influence
SD: standard deviation
SGD: stochastic gradient descent
SGX: Intel Software Guard Extensions
UCI: University of California Irvine
VM: virtual machine
WAN: wide area network

Acknowledgments

This research was funded by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant, Huawei Technologies Co., and the Institute for Computing, Information and Cognitive Systems (ICICS) at the University of British Columbia, Point Grey Campus.

Firstly, I would like to thank Ivan Beschastnikh for taking a chance on me as his student and providing careful guidance throughout my Masters degree, while subsequently introducing me to the joys of research. Another thank you goes to Margo Seltzer for being my second reader and taking the time to provide feedback on my thesis.

I would also like to thank the members of the NSS lab for joining me through the perils of graduate school: most notably Amanda Carbonari, Fabian Ruffy, Stewart Grant, Nodir Kodirov and Puneet Mehrotra.
All of you made my numerous late nights in the lab bearable, and to some extent, even a little enjoyable.

In addition to my friends listed above, I thank the many new friends I have made here at UBC: Giovanni Viviani, Yasha Pushak, Neil Newman, Kuba Karpierz, Siddhesh Khandelwal, Alistair Wick, Nico Ritschel, and many more who deserve to be listed. You kept me sane at times when my research was struggling, and kept my work-life balance in check.

Lastly, I must thank my friends and family in Toronto for supporting me from afar. Despite our physical distance, I have always felt extremely grateful to have such a loving network of support available to me.

Chapter 1: Introduction

Machine learning (ML) models rely on large volumes of diverse, representative data for training and validation. However, to build multi-party models from user-generated data, users must provide and share their potentially sensitive information. Existing solutions [29, 31], even when incorporating privacy-preserving mechanisms [2], assume the presence of a trusted central entity that all users share their data and/or model updates with. For example, the Gboard service [32] uses sensitive data from Android keyboard inputs to generate better text suggestions; users who wish to train an accurate Gboard suggestion model must send their mobile keyboard data to Google.

Federated learning [33] keeps data on the client device and trains ML models by only transferring model parameters to a trusted central coordinator. In the quest for training the optimal ML model as quickly as possible, the coordinator is not incentivized to provide privacy for data providers: data is collected and is only kept on the device as a performance optimization [33].

Furthermore, federated learning is not private or secure against adversaries, who can pose as honest data providers or an honest coordinator and who use auxiliary information from the learning process to infer information about the training data of other users [22], despite data never being explicitly shared. One may consider obfuscating data labels before learning, but this is also insufficient to guarantee privacy [9]. General privacy-preserving computation models exist, but they rely on substantial amounts of additional infrastructure. These include homomorphically encrypted secure aggregation [7], secure multi-party computation [34], or trusted Intel Software Guard Extensions (SGX) [39], all of which are infeasible for individuals and small organizations to deploy.

Today there is no accessible solution to collaborative multi-party machine learning that maintains privacy and anonymity in an untrusted setting. In developing this solution, we propose a novel setting called brokered learning that decouples the role of coordination from the role of defining the learning process. We introduce a short-lived, honest-but-curious broker that mediates interactions between a curator, who defines the shared learning task, and clients, who provide training data. In decoupling these roles, curators are no longer able to directly link clients to their model updates, and cannot manipulate the learning process.

Clients and curators never directly communicate: they are protected from each other by a broker that is only used to coordinate the learning task. The broker is administered by an honest-but-curious neutral party, detects and rejects anomalous behavior, and terminates when the learning objective is met.
Our system design supports privacy and anonymity by building on accessible public infrastructure to minimize the cost of setting up and maintaining a broker.

We realize the brokered learning setting by designing, implementing, and evaluating TorMentor, a system which creates brokers to interface with the curator and the clients in a brokered learning process. TorMentor is implemented as a hidden Tor service, but can use any general-purpose anonymous communication service to safeguard the identities of curators and clients.

Although the model curator is removed from the learning process, a myriad of other attacks are still possible. We adapt known ML attacks to brokered learning and build on several state of the art techniques to thwart a variety of these attacks when they are mounted by clients, brokers and curators. Client-side differential privacy [13, 20] protects users from inversion attacks [17, 18]; reject on negative influence (RONI) [4] and monitored client statistics [35] prevent model poisoning attacks [6, 25]; and proof of work [3] mitigates sybil attacks [12].

Our evaluation of TorMentor demonstrates that these defenses protect clients and curators from each other. For example, in one experiment with 50% malicious poisoning clients, a TorMentor broker was able to converge to an optimum after mitigating and recovering from malicious behavior through our novel adaptive proof of work mechanism. We also evaluated the performance of our prototype in a geo-distributed setting: across 200 geo-distributed clients with 14 MB of data per client, the training process in TorMentor takes 67s. By comparison, the training on a similar federated learning system without Tor would take 13s. The observed overhead of TorMentor ranges from 5-10x, and depending on the level of privacy and security required, TorMentor's modular design allows users to further tune the system to meet their expected needs on the security-performance trade-off.

In summary, we make four contributions:

- We develop a brokered learning setting for privacy-preserving anonymous multi-party machine learning in an untrusted setting. We define the responsibilities, interactions, and threat models for the three actors in brokered learning: curators, clients, and the broker.

- We realize the brokered learning model in the design and implementation of TorMentor and evaluate TorMentor's training and classification performance on a public dataset.

- We translate known attacks on centralized ML (poisoning [25, 38] and inversion [17, 18]) and known defenses in centralized ML (RONI [4], differential privacy [13]) to the brokered learning setting. We evaluate the privacy and utility implications of these attacks and defenses.

- We design, implement, and evaluate three new defense mechanisms for the brokered learning setting: distributed RONI, adaptive proof of work, and thresholding the number of clients.

Chapter 2: Background

Machine Learning (ML). Many ML problems can be represented as the minimization of a loss function in a large Euclidean space. For example, a binary classification task in ML involves using features from data examples to predict discrete binary outputs; a higher loss results in more prediction errors. Given a set of training data and a proposed model, ML algorithms train, or iteratively find an optimal set of parameters, for the given training set.
One approach is to use stochastic gradient descent (SGD) [8], an iterative algorithm which samples a batch of training examples of size b, uses them to compute gradients on the parameters of the current model, and takes gradient steps in the corresponding gradient direction. The algorithm then updates the model parameters and another iteration is performed (Appendix A contains all background formalisms).

Our work uses SGD as the training method. SGD is a general learning algorithm that is used to train a variety of models, including deep learning [45].

Distributed multi-party ML. To support larger models and datasets, ML training has been parallelized using a parameter server architecture [29]: the global model parameters are partitioned and stored on a parameter server. At each iteration, client machines pull the parameters, compute and apply one or more iterations, and push their updates back to the server. This can be done with a synchronous or asynchronous protocol [24, 41], both of which are supported in our work.

Federated Learning [33]. The partitioning of training data enables multi-party machine learning: data is distributed across multiple data owners and cannot be shared. Federated learning supports this setting through a protocol in which clients send gradient updates in a distributed SGD algorithm. These updates are collected and averaged by a central server, enabling training over partitioned data sources from different owners.

Attacks on ML. Our work operates under a unique and novel set of assumptions when performing ML and requires a new threat model. Despite this novel setting, the attacks are functionally analogous to state of the art ML attacks known today.

Poisoning attack. In a poisoning attack [6, 35], an adversary meticulously creates adversarial training examples and inserts them into the training data set of a target model. This may be done to degrade the accuracy of the final model (a random attack), or to increase/decrease the probability of a targeted example being predicted as a target class (a targeted attack) [25]. For example, such an attack could be mounted to avoid anomaly detectors [42] or to evade email spam filtering [38].

In federated learning, clients possess a disjoint set of the total training data; they have full control over this set, and can therefore perform poisoning attacks with minimal difficulty.

Information leakage. In an information leakage attack, such as model inversion, an adversary attempts to recover training examples from a trained model f(x). Given a targeted class y, one inversion technique involves testing all possible values in a feasible feature region, given that some information about the targeted example is known. The attack then finds the most probable values of the targeted example features [17].

Another model inversion attack uses prediction confidence scores from f(x) when predicting y to perform gradient descent to train a model f̂(x) that closely resembles f(x). The victim's features are probabilistically determined by using the model in prediction. The resulting vector x that comes from this process is one that most closely resembles an original victim training example [18].

Information leakage attacks have been extended to federated learning: instead of querying information from a fully trained model, an adversary observes changes in the shared model during the training process itself [22]. In doing so, the adversary can reconstruct training examples that belong to other clients.
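Before turning to defenses, the federated/parameter-server protocol described above can be grounded with a minimal sketch of one synchronous aggregation round: each client computes a gradient on its private shard and the server averages the updates into the global model. This is an illustration only; the function names, the logistic loss, and the toy data are assumptions, not the thesis's implementation.

    import numpy as np

    def local_gradient(model, X, y, batch_size=10):
        # One SGD-style gradient on a random batch of this client's private data
        # (a logistic regression loss, chosen here only for illustration).
        idx = np.random.choice(len(X), batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        preds = 1.0 / (1.0 + np.exp(-Xb.dot(model)))
        return Xb.T.dot(preds - yb) / batch_size

    def federated_round(model, client_shards, lr=0.1):
        # Server-side aggregation: collect one update per client and average
        # them, as in the synchronous federated learning protocol above.
        grads = [local_gradient(model, X, y) for (X, y) in client_shards]
        return model - lr * np.mean(grads, axis=0)

    # Toy usage with three simulated clients holding disjoint data.
    rng = np.random.RandomState(0)
    shards = [(rng.randn(100, 5), rng.randint(0, 2, 100)) for _ in range(3)]
    model = np.zeros(5)
    for _ in range(50):
        model = federated_round(model, shards)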
Defenses for ML. A RONI (Reject on Negative Influence) defense [4] counters ML poisoning attacks. Given a set of untrusted training examples Dun, this defense trains two models, one model using all of the trusted training data D, and another model using the union dataset D' = D ∪ Dun which includes the untrusted data. If the performance of D' is significantly worse than the performance of D on a validation dataset, the data Dun is flagged as malicious and rejected. However, this defense relies on access to the centralized dataset, which is infeasible in the federated learning setting.

Differential privacy [13] is a privacy guarantee which ensures that, when a given dataset is used to answer questions about a population, no adversary can identify individuals that are members of the dataset. Differential privacy is user-centric: the violation of a single user's privacy is considered a privacy breach. Differential privacy is parameterized by ε, which controls the privacy-utility trade-off. In the ML context, utility equates to model performance [45]. When ε approaches 0, full privacy is ensured, but an inaccurate model is produced. When ε approaches infinity, the optimal model is produced without any privacy guarantees.

Differential privacy has been applied to several classes of ML algorithms [5, 43] in decentralized settings to theoretically guarantee that a client's privacy is not compromised when their data is used to train a model. This guarantee extends to both the training of the model and to usage of the model for predictions.

Differentially private SGD [20, 45] is a method that applies differential privacy in performing SGD. Before sending gradient updates at each iteration, clients perturb their gradient values with additive noise, which protects the privacy of the input dataset. The choice of batch size impacts the effect of the privacy parameter on the model performance. Our work uses differentially private SGD to provide flexible privacy levels against attacks in our setting.
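As a minimal sketch of the client-side perturbation just described, assuming Laplace-distributed additive noise; the noise scale and its dependence on ε and b below are illustrative choices, not the exact calibration used in the thesis:

    import numpy as np

    def private_gradient(model, X, y, epsilon, batch_size=10):
        # Differentially private SGD step: compute the batch gradient, then add
        # Laplace noise whose scale grows as epsilon shrinks (more privacy).
        idx = np.random.choice(len(X), batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        preds = 1.0 / (1.0 + np.exp(-Xb.dot(model)))
        grad = Xb.T.dot(preds - yb) / batch_size
        noise = np.random.laplace(loc=0.0,
                                  scale=1.0 / (epsilon * batch_size),
                                  size=grad.shape)
        return grad + noise

    # A client with a stronger privacy preference simply uses a smaller epsilon.
    X, y = np.random.randn(200, 5), np.random.randint(0, 2, 200)
    w = np.zeros(5)
    for _ in range(100):
        w = w - 0.1 * private_gradient(w, X, y, epsilon=0.5)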
Anonymous communication systems. Clients are still at privacy risk when sending model updates directly to an untrusted broker, so we add an additional layer of indirection in our work. Onion routing protocols are a modern method for providing anonymity in a distributed peer-to-peer (P2P) setting. When communicating with a target node, clients route traffic through a randomly selected set of relay nodes in the network, which obfuscates the source and destination nodes for each message.

Our work supports the use of any modern implementation for providing anonymous communication. We select the Tor network [11] as the system in our implementation.

Sybil attacks and proof of work. An anonymous system that allows users to join and leave may be attacked by sybils [12], in which an adversary joins a system under multiple colluding aliases. One approach to mitigate sybil attacks is to use proof of work [3], in which a user must solve a computationally expensive problem (that is easy to verify) to contribute to the system. This mechanism provides the guarantee that if at least 50% of the total computation power in the system is controlled by honest nodes, the system is resilient to sybils.

Figure 2.1: Brokered learning and TorMentor design. Brokers (implemented as hidden Tor services) mediate between curators (top) and sets of clients (bottom).

Chapter 3: Brokered Learning

In this work, we define brokered learning, which builds on the federated learning setting [33], but assumes no trust between clients and curators.

3.1 Decoupling federated learning

In federated learning, a central organization (such as Google) acts as the parameter server and performs two logically separate tasks: (1) define the data schema and the learning objective, and (2) coordinate distributed ML. The federated learning server performs both tasks at a central service; however, there are good reasons to separate them.

Fundamentally, the goals of data privacy and model accuracy are in tension. Coordinating the ML training process in a private and secure manner compromises the model's ability to learn as much as possible from the training data. In current learning settings, the coordinator is put in a position to provide privacy, yet they are not incentivized to do so.

To take things even further, a malicious curator can observe the contributions of any individual client, creating an opportunity to perform information leakage [22] attacks on clients, such as model inversion [17, 18] or membership inference [44]. These attacks can be mitigated in federated learning with a secure aggregation protocol [7], but this solution does not handle poisoning attacks and requires several coordinated communication rounds between clients for each iteration.

Client anonymity may also be desirable when privacy preferences are shared. For example, if attempting to train a model that uses past criminal activity as a feature, one user with strong privacy preferences in a large group of users with weak privacy preferences will appear suspicious, even if their data is not revealed.

A key observation in our work is that because data providers and model curators agree on a learning objective before performing federated learning, there is no need for the curator to also coordinate the learning.

Brokered learning decouples the two tasks into two distinct roles and includes mechanisms for anonymity to protect data providers from the curator while orchestrating federated learning. In this setting the users in the system (model curators and data providers) do not communicate with one another, which facilitates a minimal trust model, strengthening user-level privacy and anonymity. As well, users do not reveal their personal privacy parameters to the broker, since all privacy-preserving computation is performed on the client.

At the same time brokered learning maintains the existing privacy features of federated learning: data providers do not need to coordinate with each other (or even be aware of each other). This is important to counter malicious clients who attack other clients [22].

3.2 Defining brokered learning

Brokered learning builds on federated learning [33] but provides additional privacy guarantees. This model introduces a broker to mediate the learning process.

Curators define the machine learning model. A curator has a learning task in mind, but lacks the sufficient volume or variety of data to train a quality model. Curators would like to collaborate with clients to perform the learning task and may want to remain anonymous.
We provide curators with anonymity in TorMentor bydeploying a broker as a Tor hidden service, and by using the broker as a point ofindirection (Figure 2.1).Curators may know the identities of clients that wish to contribute to the modelor may be unaware of the clients that match their learning objectives. Brokeredlearning supports these and other use cases, For example, curators may know somesubset of the clients, or set a restriction on the maximum number of anonymous10clients who can contribute1.Clients contribute their data to the machine learning task and specify the cri-teria for their participation. Instead of fully trusting the curator as they would infederated learning, clients communicate with an honest-but-curious broker. Thebroker is trusted only with coordinating the learning process, and does not learnthe identity nor data of any client. This threat model is similar to what was used inthe secure aggregation protocol for federated learning [7].Brokered learning allows these clients to jointly contribute to a shared globalmodel, without being aware of nor trusting each other. Each client only needs to beconcerned about its personal privacy parameters. Some clients may be more con-cerned with privacy than others; brokered learning supports differentially privatemachine learning with heterogeneous privacy levels, which has been shown to befeasible [20].A broker is a short-lived process that coordinates the training of a multi-partyML model. For each model defined by a curator in TorMentor, a single brokeris created and deployed as a hidden service in an anonymous network 2. Clientsperform actions such as requesting access to the system, defining client-specificprivacy parameters and sending model updates for distributed SGD. We define aprecise client application programming inferface (API) in Section 5. When modeltraining is complete, the broker publishes the model and terminates. In our vision,brokers are not intended to be long lasting, and their sole function should be to bro-ker the agreement between users to facilitate anonymous multi-party ML. Brokersmay even explicity be managed by governments or as part of a privacy enhancingbusiness model, both of whom are incentivized to provide privacy, anonymity andfairness in distributed ML.3.3 Example use casesMedical Sharing Network. Hospitals store substantial patient medical data. How-ever, due to strict regulations, they cannot share this data with each other. No in-1We assume that the broker identifies or advertises the learning process to clients out of band,externally to TorMentor.2In this paper we do not consider who is running the broker; but we do assume that it is anhonest-but-curious third party that is distinct from the curator and the participating clients.11dividual hospital wishes to host the infrastructure for model coordination, and noindividual hospital is trusted to securely coordinate the analysis. An alternative so-lution is for the network of hospitals to collaborate in a brokered learning system.For this the hospitals would define the learning task, one hospital would agree todeploy the broker as a hidden service, and all other willing hospitals would joinand contribute model updates, training a shared model.Internet of Things. With the growth of the Internet of Things (IoT), and a largelyheterogeneous set of device providers, there is currently no solution for privacy-preserving multi-device ML, hosted by a neutral provider. 
Without anonymous multi-party ML, each device provider would need to host their own ML coordinators and would have no mechanism for sharing models across other providers.

Brokered learning allows these devices to collaborate on model training without explicitly trusting each other. Devices reap the benefits of shared trained models, without risking data privacy loss. The broker can be run by any single company, or a neutral trusted third party, neither of which have the power to compromise device-level privacy.

Chapter 4: Threat model, guarantees, assumptions

We realized brokered learning in a system called TorMentor, which uses differentially private distributed SGD [45] to train a model in an anonymous multi-party setting. We select Tor [11] as the anonymous communication network for TorMentor. TorMentor is designed to counter malicious curators and malicious clients who may attempt to gain additional information (information leakage) about others or negatively influence the learning process (poisoning). The honest-but-curious broker coordinates the learning process, but is not trusted with the identity nor the data of users.

Clients and curators do not attack the broker itself; rather, they aim to attack other curators, other clients, or the outcome of the learning process. Brokers are also untrusted in the system: the client and curator API protects users from potential broker attacks. Figure 4.1 overviews TorMentor's threat model with attacks/defenses and who can attack whom and how.

Figure 4.1: (a) Attacks/defenses in TorMentor. (b) Threat model: an edge is an attack by source against target(s).

Deanonymization. For anonymous communication to and from the broker we assume a threat model similar to Tor [11]: an adversary has the ability to observe and control some, but not all, of the network. Adversaries may attempt to observe Tor traffic as a client or as a broker in the network [14, 28, 36]. Adversaries can also influence traffic within Tor through their own onion router nodes, which do not necessarily need to be active TorMentor clients.

Poisoning attacks. In our threat model, poisoning attacks are performed by clients against shared models. After a curator defines a model, adversarial clients can use the defined learning objective to craft adversarial samples that oppose the objective. We assume that adversaries generate these adversarial samples using a common strategy such as label flipping [6], and join the training process and poison the model by influencing its prediction probabilities.

Inversion attacks. We assume that adversaries can target a specific client victim who they know is using the system. Inversion attacks can be executed from a variety of points: the adversary may be administering the broker, the adversary may curate a model that the victim joins, or the adversary joins model training as a client, knowing that the victim has also joined.
Although the broker does not expose model confidence scores in the model prediction API, which are a key piece of information for performing inversion attacks in a black-box setting [18], our threat model grants adversaries white-box access to the model internals, which is more powerful than a traditional model inversion attack.

In TorMentor, adversarial clients have access to the global model and can infer confidence scores or gradient step values by carefully observing changes to the global model and reconstructing a copy of the victim's local model. This attack is similar to a model stealing attack [48]: after stealing an accurate approximation of a victim's local model, an adversary can mount an inversion attack on the approximation to reconstruct the victim's training examples.

Sybil attacks. Since clients and curators join the system anonymously, they can generate sybils, or multiple colluding virtual clients, to attack the system [12]. In federated learning, all users are given an equal stake in the system, and thus sybils make poisoning and inversion attacks linearly easier to perform.

4.1 Security guarantees

TorMentor guarantees that curator and client identities remain anonymous to all parties in the system by using an anonymous communication network. Our prototype uses Tor and provides the same guarantees as Tor; however, it can use other anonymous messaging systems [10, 49]. For example, since Tor is susceptible to timing attacks, an adversary could target a client by observing its network traffic to de-anonymize its participation in the system.

TorMentor exposes a small, restrictive API to limit a user's influence on the system. TorMentor alleviates the risk of poisoning attacks and inversion attacks by allowing clients to specify k, the minimum number of clients required for training. When a client joins the system, they are able to specify the number of other clients required in the system, which TorMentor guarantees will be honored. Clients also locally use differential privacy to further protect their privacy. Both parameters are defined by the client, guaranteeing that clients (and not the curator) control this accuracy-privacy tradeoff.

TorMentor prevents sybils through proof of work, similar to the Bitcoin protocol [37]. This mechanism makes it expensive, though not impossible, to mount a sybil attack. Proof of work is implemented at two points: proof of work as a prerequisite for system admission, and proof of work as a prerequisite for contributing to the model.

4.2 Assumptions

We assume that the only means to access information within the system is through the APIs defined in Section 5. A TorMentor instance and its corresponding brokers are exposed as a hidden service with a public .onion domain.
We assume that this .onion becomes a widely known and trusted domain [1].

[1] To build further trust, the TorMentor service can use an authoritatively signed certificate.

We use proof of work to defend against sybils and therefore assume that the adversary does not have access to more than half of the computational power relative to the total computational power across all users in an instance of the TorMentor training process [3].

We make the same assumptions as Tor [11]; for example, we assume that an adversary does not control a large fraction of nodes within the Tor network.

Chapter 5: TorMentor Design

The TorMentor design has three goals: (1) meet the defined learning objective in a reasonable amount of time, (2) provide both anonymity and data privacy guarantees to clients and curators, and (3) flexibly support client-specific privacy requirements.

5.1 Design overview

The broker handles all communication between clients and the curator, and acts as the coordinator in an untrusted collaborative learning setting. Each TorMentor broker is deployed as a Tor hidden service with a unique and known .onion domain. Several clients may join a model once a curator defines it, with requirements for joining specified by both the curator and the clients. Each broker, and therefore each model, is associated with a pool of clients among whom the learning procedure takes place (see Figure 2.1).

Each broker runs a separate aggregator process and validator process. The aggregator serves the same purpose as a parameter server [29]: storing and distributing the parameters of the global model. The validator is a novel addition in our work that observes and validates the values of gradient updates sent by clients to the aggregator.

Next, we review the TorMentor curator and client API in Tables 5.1 and 5.2. We also review the training process illustrated in Figure 5.1, and finally detail how TorMentor defends against adversarial clients and curators.

Table 5.1: TorMentor Curator API.

    address ← curate(mID, maxCli, minCli, validSet)
        Curate a new model. The curator provides a modelID, a client count range, and a validation set. TorMentor returns a hidden service address for a newly specified broker.

Table 5.2: TorMentor Client API.

    Padmit ← join(mID)
        Client joins a curated model. The client provides a modelID; TorMentor returns a SHA-256 admission hash puzzle Padmit.

    conn, Mt ← solve(mID, Sadmit, minCli, schema)
        Client finds the solution Sadmit to Padmit and joins. The client provides the modelID, the solution to the puzzle, a minimum number of clients, and its dataset schema; TorMentor returns a connection and the global model state.

    Mg,t+1, Pi,t+1 ← gradientUpdate(mID, Si,t, ∆i,t)
        Client pushes a local model update to the global model state. Client i provides the modelID, the solution to the previous SHA-256 puzzle Si,t, and the gradient update ∆i,t at iteration t; TorMentor returns the new global model state Mg,t+1 and the next SHA-256 puzzle Pi,t+1.

5.2 Curator API

Table 5.1 shows the curator API in TorMentor. The curator uses the curate call to bootstrap a new model by defining a common learning objective: the model type, the desired training data schema and a validation dataset. These are critical to perform ML successfully (even in a local setting). We therefore expect that a curator can provide these.

Once the learning objective is defined, a Tor .onion address is established for the specified model, and the system waits for clients to contact the hidden service with a message to join. The validation dataset is used by the validator to reject adversaries, and to ensure that the ML training is making progress towards model convergence.

Too few clients may lead to a weak model with biased data, while a large number of clients will increase communication overhead. The curator can use the API to adjust an acceptable range for the number of clients contributing to the model.
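As an illustration of how a curator might invoke the curate call in Table 5.1, here is a hypothetical curator-side sketch. The wrapper function, field names, and transport shown below are assumptions; the prototype's actual wire format is not described in the text.

    # Hypothetical curator-side sketch: define the learning objective and
    # obtain the .onion address of a freshly deployed broker.
    import json

    def curate(model_id, max_clients, min_clients, validation_set):
        # In the real system this request would travel over Tor to TorMentor;
        # here we only show the shape of the call described in Table 5.1.
        request = {
            "call": "curate",
            "mID": model_id,
            "maxCli": max_clients,
            "minCli": min_clients,
            "validSet": validation_set,
        }
        # Placeholder return value standing in for the broker's hidden service
        # address that TorMentor would hand back.
        return json.dumps(request)

    onion_address = curate("credit-default-lr", max_clients=200,
                           min_clients=10, validation_set="validation.csv")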
5.3 Client API

Table 5.2 shows the client API in TorMentor. A client uses the join call to join a curated model. A client's data is validated against the objective when joining. Our prototype only checks that the specified number of features matches those of the client, but more advanced automatic schema validation techniques [40] can be used.

The client uses the solve call to perform a proof-of-work validation, similar to that of the blockchain protocol [37], in which a cryptographic SHA-256 admission hash is inverted, the solution is verified to contain a required number of trailing '0' digits, and a new puzzle is published. Once the proof-of-work is completed, the client is accepted as a contributor to the model. Once the desired number of clients have been accepted to the model, collaborative model training is performed through the TorMentor protocol: each client computes their SGD update on the global model and pushes it to the parameter server through the gradientUpdate call.

Clients compute gradient updates locally. Clients also maintain a personal privacy level ε and a personal batch size b to tune their differentially-private updates during model training. With the privacy-utility tradeoff in mind, it is natural for clients and curators to have different preferences regarding client privacy. Some clients may value privacy more than others and thus will tune their own privacy risk, while curators want to maximize their model utility. TorMentor is the first system to support anonymous machine learning in a setting with heterogeneous user-controlled privacy goals.
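The proof-of-work step behind the solve call can be sketched as follows. The puzzle encoding and difficulty value are assumptions; the only property relied on is the one described above, that a valid solution hashes to a digest with a required number of trailing '0' digits and is cheap to verify.

    import hashlib
    import itertools

    def solve_puzzle(puzzle, difficulty):
        # Find a nonce such that SHA-256(puzzle || nonce) ends in `difficulty`
        # '0' hex digits; verification on the broker side is a single hash.
        for nonce in itertools.count():
            digest = hashlib.sha256((puzzle + str(nonce)).encode()).hexdigest()
            if digest.endswith("0" * difficulty):
                return nonce

    def verify(puzzle, nonce, difficulty):
        digest = hashlib.sha256((puzzle + str(nonce)).encode()).hexdigest()
        return digest.endswith("0" * difficulty)

    # Illustrative usage with a made-up puzzle string.
    nonce = solve_puzzle("admission-puzzle-abc123", difficulty=4)
    assert verify("admission-puzzle-abc123", nonce, difficulty=4)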
5.4 Training process

Training in TorMentor (Figure 5.1 and Algorithm 1) is performed in a fashion similar to that of the parameter server [29]: each client pulls the global model, locally computes a gradient step, and applies the update to the global model. TorMentor uses the differentially private SGD [45] method, which allows clients to select their own privacy parameter ε and batch size b. We assume that clients understand how to properly define these parameters and are aware of their implications on the privacy-utility tradeoff and their privacy budgets [13].

Figure 5.1: Overview of the TorMentor protocol between curator/broker/client.

Algorithm 1: TorMentor differentially private SGD training algorithm.

    Data: Training data x, y; batch size b; privacy parameter ε
    Result: Returns a single gradient update on the model parameters
    while IsTraining do
        Pull gradients wt from TorMentor;
        Subsample b points (xi, yi) ∈ Bt from training data;
        Draw noise Zt from a Laplacian distribution;
        Compute gradient step through differentially private SGD;
        Push gradient to TorMentor
    end

Since clients may fail or be removed from the system by the broker, bulk synchronous computation in TorMentor may be infeasible. Instead, as an alternative to the synchronous update model in federated learning [33], TorMentor also supports a total asynchronous model [24, 29], which enables parallelization but allows clients to compute gradients on stale versions of the global model, potentially compromising model convergence. A lock-free approach to parallel SGD is feasible if the step size is tuned properly and the corresponding global loss function meets certain strong convexity guarantees [41], which we assume is true when using the total asynchronous model in our brokered learning setting. This approach also negates the effect of stragglers in a high latency environment (see Section 7).

Clients are free to leave the training process at any time. TorMentor keeps a registry of the active clients, and checks that the minimum number of clients condition is met at each gradient update. In the case of clients leaving the system, TorMentor uses timeouts to detect the clients who drop out of the system. Such clients do not negatively impact the curator or other clients. As long as the required minimum number of clients k exists, the learning process will not halt and no work will be wasted.

5.5 Defending against inversion attacks

Although a direct inversion attack in federated learning has not been realized yet, we envision a novel potential attack in this scenario. Figure 5.2 shows the proposed ideal situation for an attacker performing an inversion attack: a two-client TorMentor system, one of whom is the adversary.

Figure 5.2: One iteration in an inversion attack in which an attacker observes the difference between Mt and Mt+2, and infers this difference to be ∆V,t+1. After many iterations, the attacker can discover M*V, the optimal model trained on the victim's data.

In this scenario the victim V and attacker A alternate in sending gradient updates to the broker. Since the global model parameters are sent to the adversary at each iteration, it can ideally observe the difference in the global model between iterations. As the attacker knows their contribution to the global model at the previous iteration, they are able to exactly compute the victim's update by calculating:

    Mt+2 = Mt + ∆v,t+1 + ∆a,t
    ∆v,t+1 = Mt+2 - Mt - ∆a,t

By saving and applying ∆v,t at each iteration to a hidden, shadow model, the adversary can compute an approximation to MV, the optimal model trained with only the victim's data, similar to a model stealing attack [48]. The adversary can then perform a model inversion attack [17, 18] and reconstruct the victim's training data elements in XV. In the case that the broker carries out the inversion attack, the attack is even stronger: the broker can isolate all updates sent to it through a single connection.

Differential privacy offers a natural defense against attacks from a broker or another client by perturbing the victim's updates ∆V,t that are sent to the broker. An adversary will find it difficult to recover MV and XV when the privacy parameter ε is closer to 0. An adversary could choose to send any vector as their ∆a,t update, which allows them to curate specific gradients that elicit revealing responses from the victim [22].

In the case of attacks from other clients, the effectiveness of the differentially private SGD protocol is also contingent on a setting in which multiple clients are simultaneously performing updates.
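The update-difference calculation above can be made concrete with a short attacker-side sketch for the ideal two-client alternation of Figure 5.2. This is illustrative only; the variable names and the model dimension are assumptions.

    import numpy as np

    def victim_update(m_t, m_t2, delta_a):
        # Recover the victim's contribution from two observations of the global
        # model (Mt and Mt+2) and the attacker's own update ∆a,t:
        #   ∆v,t+1 = Mt+2 - Mt - ∆a,t
        return m_t2 - m_t - delta_a

    # The attacker accumulates recovered updates into a hidden shadow model that
    # approximates MV, the model trained only on the victim's data.
    dim = 24  # model dimension; the value is arbitrary for this sketch
    shadow = np.zeros(dim)
    # for every observed pair (m_t, m_t2) and the attacker's own update delta_a:
    #     shadow += victim_update(m_t, m_t2, delta_a)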
When an adversarial client receives a new copy of the global model in TorMentor, it has no mechanism to discover which clients contributed to the model since the last update, making it difficult to derive knowledge about specific clients in the system.

Thus, TorMentor exposes a privacy parameter k to clients, which clients use to express the minimum number of clients that must be in the system before training can begin. Our differentially private SGD only begins when n clients, each with a parameter k ≤ n, exist in the system. Effectively, this means that for an adversarial client to perform the ideal model inversion against a victim with parameter k, the adversary needs to create k-1 sybil clients.

5.6 Defending against poisoning attacks

In adding the validator process, we propose an active parameter server alternative to the assumed passive parameter server in current attacks [22]. The parameter server validates each client's contribution to the model health and penalizes updates from suspicious connections.

We develop a distributed RONI defense that uses sets of gradient updates, instead of datasets, and independently evaluates the influence that each gradient update has on the performance of a trusted global model. Validation (Algorithm 2) executes within the parameter server in TorMentor and validates that iterations performed by clients have a positive impact. Validation is parameterized by two variables: the validation rate at which validations are performed, and the RONI threshold [4] required before a client is flagged.

To ensure that validations are performed in a fair manner, we benchmark all clients against the same candidate model. The validator intersperses validation iterations within the gradient update requests in distributed SGD. A validation round is triggered through a periodic Bernoulli test, with the probability parameterized by a validation rate. During a validation round, the current model state is snapshotted, and a copy of this model is sent to all active clients. The clients' gradient responses are collected and the RONI value is calculated and accumulated.

Algorithm 2: RONI validation algorithm.

    Data: Stream of gradient updates from each client i, over t iterations ∆i,t
    Result: Reject a client if their updates oppose the defined learning objective
    while IsTraining do
        Draw Bernoulli value v;
        if v > VALIDATION_RATE then
            Set current model Mt to be snapshot model Ms;
            Wait for client responses;
        end
        if Client c contacts TorMentor then
            Send Ms instead of Mt;
            Save response ∆c,s
        end
        if All clients responded then
            Find RONI rc: rc = err(Ms, Xval) - err(Ms + ∆c,s, Xval);
            totalc = totalc + rc;
            if totalc > THRESHOLD then
                penalize c;
            end
        end
    end

In addition to the proof of work required to join the system, we implement an adaptive proof of work mechanism to mitigate sybils in poisoning attacks. A SHA-256 proof of work puzzle must be solved on each iteration before an update to the global model is accepted by the broker. When a client's RONI score exceeds a defined negative threshold, the broker increases the required trailing number of 0's by one, increasing the difficulty of the puzzle for that client. The difficulty of this puzzle is client- and iteration-specific, making it more expensive for malicious clients to poison the model and decreasing their influence relative to honest clients.
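A condensed sketch of the validator bookkeeping described above follows. Algorithm 2 is the authoritative description; here the score is written as the increase in validation error caused by an update, so that crossing a positive threshold (for example the 2% used in Section 7.5) flags the client, and the penalty is one additional trailing zero of proof-of-work difficulty. The error function and the reset behaviour after a penalty are assumptions.

    import numpy as np

    def validation_error(model, X_val, y_val):
        # Stand-in for err(M, Xval): error of a linear classifier on the
        # curator-provided validation set.
        preds = (X_val.dot(model) > 0).astype(int)
        return np.mean(preds != y_val)

    def validate_update(snapshot, delta, X_val, y_val):
        # RONI-style score, written here as the increase in validation error
        # caused by applying the client's update to the snapshot model, so a
        # larger score means a more harmful update.
        return (validation_error(snapshot + delta, X_val, y_val)
                - validation_error(snapshot, X_val, y_val))

    def penalize_if_needed(totals, difficulty, client, score, threshold=0.02):
        # Accumulate per-client scores; once a client crosses the RONI
        # threshold, require one more trailing zero in its proof-of-work puzzle.
        totals[client] = totals.get(client, 0.0) + score
        if totals[client] > threshold:
            difficulty[client] = difficulty.get(client, 1) + 1
            totals[client] = 0.0  # reset after penalizing (an assumption)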
When the RONI threshold is in-24creased, the broker is more likely to detect adversaries, but the false positive rateof flagging honest nodes increases as well.An important design detail is that a validation request looks just like a gradi-ent update request. Therefore, adversaries cannot trick the validator by appearingbenign during validation rounds while poisoning the true model. If the snapshotmodel Ms is taken close enough to the current model state Mt , it becomes moredifficult for adversaries to distinguish whether the updates they are receiving aregenuine gradient requests or not.5.7 Modular designTo summarize, TorMentor includes several mechanisms to provide various levelsof client and curator privacy and security. These include:? Using Tor to provide anonymous communication for all users (clients andthe curator).? Using proof of work as a form of admission control.? A validator process that uses RONI and adaptive proof of work to mitigatepoisoning attacks and sybils.? Differentially private SGD and minimum client enforcement to provide clientprivacy and to defend against inversion attacks.Each of these components operates independently, and if brokered learning isdeployed in a setting where some of the described attacks are out of scope, thesecomponents can be independently disabled.25Chapter 6TorMentor ImplementationWe implemented a TorMentor prototype in 600 lines of code (LOC) in Python 2.7and 1,500 LOC in Go 1.8. All the communication primitives are developed in Go,while the vector computation and ML are in Python. To facilitate communicationbetween Go and Python, we use go-python [1], a library that provides communi-cation bindings between the two languages. We implement differentially-privateSGD [45] in Numpy 1.12. For our noise function, we use a multivariate isotropicLaplace distribution. As a performance operation, we draw random samples fromthis distribution prior to training by using emcee, a Massachusetts Institute of Tech-nology (MIT) licensed Markov Chain Monte Carlo (MCMC) ensemble sampler [16].In our evaluation we deploy the TorMentor curator and clients on Azure byusing bash scripts consisting of 371 LOC. These bootstrap VMs with a TorMentorinstallation, launch clients, and orchestrate experiments.26Chapter 7EvaluationWe evaluated our TorMentor design by carrying out several local and wide-areaexperiments with the TorMentor prototype. Specifically, we answer the followingfour research questions:1. What are the effects of the privacy parameter ε and batch size b on modelconvergence? (Section 7.2)2. What is TorMentor’s overhead as compared to the baseline alternative? (Sec-tion 7.3)3. How effective are the privacy parameters ε and minimum number of clientsk in defending against inversion attacks? (Section 7.4)4. How effective is validation in defending against poisoning attacks, and whatare the effects of its parameters? (Section 7.5)Next we describe the methodology behind our experiments and then answereach of the questions above.7.1 MethodologyCredit card dataset. In our experiments we envision multiple credit card compa-nies collaborating to train a better model that predicts defaults of credit card pay-ments. However, the information in the dataset is private, both to the credit card27companies and to their customers. 
In this context, any individual company can act as the curator, the broker is a commercial trusted service provider, and clients are the credit card companies with private datasets.

To evaluate this use case we used a credit card dataset [50] from the University of California Irvine (UCI) machine learning repository [30]. The dataset has 30,000 examples and 24 features. The features represent information about customers, including their age, gender and education level, along with information about the customer's payments over the last 6 months. The dataset also contains information about whether or not the given customer managed to pay their next credit card bill, which is used as the prediction for the model.

Prior to performing the training, we normalized, permuted, and partitioned the dataset into a 70% training and 30% testing shard. For each experiment, the training set is further sub-sampled to create a client dataset, and the testing shard is used as the curator-provided validation set. Training error, the primary metric used in evaluation, is calculated as the error when classifying the entire 70% training shard. In brokered learning, no single client would have access to the entire training dataset, so this serves as a hypothetical metric.

Wide-area deployment on Azure. We evaluated TorMentor at scale by deploying a geo-distributed set of 25 Azure virtual machines, each running in a separate data center, spanning 6 continents. Each virtual machine (VM) was deployed using Azure's default Ubuntu 16.06 resource allocation. Each VM was provisioned with a single core Intel Xeon E5-2673 v3 2.40GHz central processing unit (CPU) and 4 gigabytes of random access memory (RAM). Tor's default stretch distribution was installed on each client. We deployed the broker at our home institution as a hidden service on Tor. The median ping latency (without using Tor) from the client VMs to the broker was 133.9ms with a standard deviation (SD) of 61.9ms. With Tor, the median ping latency increased to 715.9ms with a SD of 181.8ms.

In our wide-area experiments we evenly distribute a varying number of clients across the 25 VMs and measure the training error over time. Each client joins the system with a bootstrapped sample of the original training set (n = 21,000, sampled with replacement), and proceeds to participate in asynchronous model training.

Figure 7.1: Effects of differential privacy and batch size on training loss over time. As ε and b decrease, model convergence is slower and may not be guaranteed.
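A simplified sketch of the preprocessing and partitioning described in Section 7.1: normalize and permute the data, hold out a 30% test shard, and give each client a bootstrapped sample of the training shard. The helper below is illustrative and is not the thesis's evaluation script.

    import numpy as np

    def prepare_shards(X, y, num_clients, train_frac=0.7, seed=0):
        # Normalize, permute, and split into a 70% training / 30% testing shard,
        # then give each client a bootstrapped sample of the training shard.
        rng = np.random.RandomState(seed)
        X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)
        order = rng.permutation(len(X))
        X, y = X[order], y[order]
        split = int(train_frac * len(X))
        X_train, y_train = X[:split], y[:split]
        X_test, y_test = X[split:], y[split:]
        shards = []
        for _ in range(num_clients):
            idx = rng.choice(split, size=split, replace=True)  # bootstrap sample
            shards.append((X_train[idx], y_train[idx]))
        return shards, (X_test, y_test)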
Wide-area deployment on Azure. We evaluated TorMentor at scale by deploying a geo-distributed set of 25 Azure virtual machines, each running in a separate data center, spanning 6 continents. Each virtual machine (VM) was deployed using Azure's default Ubuntu 16.06 resource allocation. Each VM was provisioned with a single-core Intel Xeon E5-2673 v3 2.40GHz central processing unit (CPU) and 4 gigabytes of random access memory (RAM). Tor's default stretch distribution was installed on each client. We deployed the broker at our home institution as a hidden service on Tor. The median ping latency (without using Tor) from the client VMs to the broker was 133.9 ms with a standard deviation (SD) of 61.9 ms. With Tor, the median ping latency increased to 715.9 ms with a SD of 181.8 ms.

In our wide-area experiments we evenly distribute a varying number of clients across the 25 VMs and measure the training error over time. Each client joins the system with a bootstrapped sample of the original training set (n = 21,000, sampled with replacement) and proceeds to participate in asynchronous model training.

Figure 7.1: Effects of differential privacy and batch size on training loss over time. As ε and b decrease, model convergence is slower and may not be guaranteed.

7.2 Model convergence

We evaluate the effect of the privacy parameter ε and the batch size b when performing learning over TorMentor. Figure 7.1 shows training error over time with a single client performing differentially private SGD [45] to train a logistic regression model using the entire training shard.

We found that models converge faster and more reliably when the batch size is higher and when ε is higher (less privacy). These results are expected, as they confirm the utility-privacy tradeoff. In settings with a low ε (more privacy) and a low batch size, we observed that the effect of differential privacy is so strong and the magnitude of the additive noise is so large that the model itself does not converge, rendering the output of the model useless. Based on these results, the experiments in the remainder of the paper use a batch size of 10.

7.3 Scalability and overhead

We also evaluated TorMentor's scalability by varying the number of participating clients. We evaluate the overhead of Tor and of the wide-area deployment in TorMentor by running TorMentor experiments with and without Tor. All nodes were honest, held a subsample of the original dataset, and performed asynchronous SGD.

Figure 7.2 shows that, when updating asynchronously, the model converges at a faster rate as we increase the number of clients.

We also compared the convergence time on TorMentor with a baseline wide area network (WAN) parameter server. For the WAN parameter server we used the same client deployment, but bypassed Tor, thereby sacrificing anonymity.

Figure 7.2: TorMentor model convergence in deployments with 10, 50, 100, and 200 clients.

    # of Clients    TorMentor    w/o Tor
    10              819 s        210 s
    50              210 s        34 s
    100             135 s        18 s
    200             67 s         13 s

Table 7.1: Time to train the model with TorMentor, with and without Tor, over a varying number of clients.

The results in Table 7.1 show that on average, the overhead incurred from using Tor ranges from 5-10x. However, as the number of clients increases, the training time in both deployments drops, while the central deployment slows down.

Figure 7.3: TorMentor without Tor model convergence in deployments with 10, 50, 100, and 200 clients.

7.4 Inversion defenses evaluation

We first set up the inversion attack by carefully partitioning the dataset. Of the 30,000 examples in the credit dataset, 9,000 (30%) examples were partitioned into Xtest, a test dataset. The remaining 21,000 examples were evenly partitioned across clients. The victim's dataset Xv was one of these partitions, with all of the yi prediction values flipped. This was done to sufficiently separate the victim model from the globally trained model¹. With this victim dataset, a globally trained model achieved an error of 95.4% when used to reconstruct the victim model and predict on the test set.

¹ We originally attempted the inversion with one of the training data shards as the victim dataset, but we found that even naively comparing the final global model M∗g to the optimal victim model M∗v resulted in a low reconstruction error of 4.4%. Thus, separating the victim model in a way that makes it distinguishable is necessary.

With this setup we carried out the attack described in Figure 5.2. Each attack was executed for 4,000 gradient iterations, which was long enough for the global model to reach convergence in the baseline case. We then calculated the reconstruction error by comparing the resulting inversion model to the true victim model (a model trained with only the victim's data), using their predictions on the test set. That is, if the inversion model and the true victim model classify all test examples identically, the reconstruction error is 0. The reconstruction error would also serve as the error when an attacker uses outputs from M̂v to infer the training examples in Xv [18, 48].
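To make this agreement metric concrete, the following is a minimal sketch of how the reconstruction error could be computed. The weight-vector model representation and sigmoid link are assumptions made for illustration; they are not the thesis's evaluation code.

    import numpy as np

    def predict(w, X):
        # Logistic regression prediction with weights w (bias folded into X).
        return (1.0 / (1.0 + np.exp(-X.dot(w))) > 0.5).astype(int)

    def reconstruction_error(w_inverted, w_victim, X_test):
        """Fraction of test examples on which the inverted model estimate and
        the true victim model disagree; 0 means the attacker reproduced the
        victim model's behavior exactly."""
        return np.mean(predict(w_inverted, X_test) != predict(w_victim, X_test))

Under this definition the reconstruction error is a disagreement rate on Xtest, so anything that perturbs the attacker's estimate, such as differential privacy or bystander updates, drives the metric up.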
Since the inversion attack is performed passively, it is defended against by a client carefully tuning the privacy parameters: ε and the minimum number of clients k. We evaluate the effects of these parameters in Figures 7.4 and 7.5.

Figure 7.4: Model agreement between the victim model and the inverted estimate in the ideal setting (Fig. 5.2), with varying ε.

Figure 7.4 shows the effect of the privacy parameter ε on the reconstruction error, as ε is varied from 0.5 to 5, plotting the median and standard deviation over 5 executions. In the baseline case the client and curator alternate gradient updates as in Figure 5.2, and there is no differential privacy. As ε decreases (increasing privacy), the reconstruction error of the inversion attack increases. When ε = 1, the reconstruction error is consistently above 10%.

When the attacker sends a vector of zeros as their gradient update, the inversion attack is most effective, as this completely isolates the updates on the global model to those performed by the victim. Figure 7.4 also shows the same experiment performed when the attacker contributes nothing to the global model. As ε increases beyond 2 (decreasing privacy), the attack performed without sending any gradients consistently outperforms the attack performed with gradient updates. This behavior, however, is suspicious, and a well-designed validator would detect and blacklist such an attacker. Therefore, this case is a worst-case scenario, since in practice the attacker must attempt to participate in the model training process.

Inversion attacks are made more difficult when the randomness in the ordering of gradient updates is increased. Two methods for increasing this randomness are (1) adding random latencies at the broker, and (2) introducing bystanders: clients other than the attacker and victim. In Figure 7.5, we evaluate both of these methods by asynchronously training a model on TorMentor with one victim, one attacker (using the same datasets as in Figure 7.4), and a varying number of bystanders. When replying to a client request, we sample a random sleep duration uniformly from 0-500 ms at the broker before returning a message. All clients choose the same value for the parameter k, and the actual number of clients in the system is equal to k. Thus, in a configuration consisting of one victim and one attacker, the number of bystanders equals k − 2.
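The broker's reply path is implemented in Go in the prototype; purely as an illustration of the latency defense just described, the following Python sketch delays each reply by a uniformly random amount before sending it (the send_fn helper is hypothetical).

    import random
    import time

    MAX_JITTER_S = 0.5  # 0-500 ms, as in the Figure 7.5 experiments

    def reply_with_jitter(send_fn, message):
        """Delay each broker reply by a uniformly random amount so that clients
        cannot infer the ordering of other clients' updates from timing."""
        time.sleep(random.uniform(0.0, MAX_JITTER_S))
        send_fn(message)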
Introducing even just one bystander (k = 3) into the system increases the reconstruction error during an inversion attack from about 20% to 40%. As k grows, a model inversion attack becomes more difficult to mount.

Figure 7.5 also illustrates that differential privacy defends client privacy when the number of bystanders is low. When there are no bystanders in the system, decreasing the privacy parameter ε (more private) increases the reconstruction error. The effects of a low ε value in a model inversion setting have a higher variance than executions with higher ε values. Another mechanism that helps to mitigate inversion attacks is the adaptive proof of work mechanism that counters sybils (an attacker could spawn k − 1 sybils as an alternative way to isolate the victim).

Figure 7.5: Reconstruction error between the victim model and the inverted model estimate, with varying privacy parameters: the number of bystanders and the privacy parameter ε.

7.5 Poisoning defenses evaluation

We evaluate the effect of our proof of work on poisoning attacks. To do this, we deployed TorMentor without differential privacy or Tor, in a fully asynchronous setting with 8 clients. We then varied the proportion of poisoners and the RONI threshold. Figure 7.6 shows the training error for the first 250 seconds for a RONI threshold of 2%, while varying the proportion of poisoning attackers from 25% to 75%, with a validation rate of 0.1.

As the number of poisoners increases, different effects can be observed. When the number of poisoners is low (below 25%), the model still converges, albeit at a slower rate than normal. With 50% poisoning, the model begins to move away from the optimum, but is successfully defended by the validator, which increases the proof of work required for all of the poisoners within 30 seconds. From this point, the poisoners struggle to outpace the honest nodes, and the model continues on a path to convergence. Lastly, when the proportion of poisoners is 75%, the increase in proof of work is too slow to react; the model accuracy is greatly compromised within 20 seconds and struggles to recover.

Figure 7.6: Model training loss over time, when attacked by a varying proportion of poisoners (0% to 75%). The RONI threshold is maintained at 2%.

Figure 7.7: Model training loss over time, when attacked by 50% poisoners. The RONI threshold is varied from 0.5% to 5%.

From this evaluation, we note that if a poisoner were able to detect this defense and attempt to leave and rejoin the model, an optimal proof of work admission puzzle should require enough time that this strategy becomes infeasible.

Figure 7.7 shows the execution of model training with 50% poisoning clients for different RONI validation thresholds. As the threshold decreases, adversaries are removed from the system more quickly, allowing the model to recover from the poisoning damage.

Setting the RONI threshold too low is also dangerous, as it increases the effect of false positives. In Figure 7.7, we observe that the model initially performs poorly; this is due to incorrectly penalizing honest clients. The effect of a low RONI threshold is especially noticeable in combination with differential privacy. To confirm this, we performed two additional experiments in which the validator had a RONI threshold of 0.5% (the lowest threshold from Figure 7.7), and a full set of honest clients with differential privacy parameter ε joined the model. When ε was set to 5, the model converged to an optimal point in 480 seconds. When ε was set to 1, the validator flagged all of the honest clients, and the model did not reach convergence.

Whether a deployment converges, diverges, or suffers a privacy violation relies on a careful interplay between ε, the minimum number of clients k, the RONI threshold, the proof of work difficulty, and the anticipated attacks that TorMentor expects to deter. Determining the optimal parameters for a deployment depends on the anticipated workloads, data distribution, and attack severity. Given the large scope of potential attacks and attack scenarios in this space [25], we leave the exploration of such parameter selection to future work.
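As a concrete illustration of the validator's decision rule, the sketch below applies a RONI-style threshold test to a candidate update against a model snapshot. The helper names, the logistic-loss choice, and the additive-update assumption are illustrative; they are not TorMentor's implementation.

    import numpy as np

    def validation_error(w, X_val, y_val):
        # 0/1 error of a logistic regression model on the curator's validation set.
        preds = (1.0 / (1.0 + np.exp(-X_val.dot(w))) > 0.5).astype(int)
        return np.mean(preds != y_val)

    def roni_check(w_snapshot, update, X_val, y_val, threshold=0.02):
        """Reject-on-negative-influence check: apply the candidate update to a
        snapshot of the model and flag it if the validation error rises by more
        than the RONI threshold (e.g., 2% as in Figure 7.6)."""
        err_before = validation_error(w_snapshot, X_val, y_val)
        err_after = validation_error(w_snapshot + update, X_val, y_val)
        return (err_after - err_before) > threshold   # True => flag as poisoning

A flagged client's proof of work difficulty can then be increased, or the client blacklisted, per the adaptive proof of work policy; lowering the threshold makes this trigger fire sooner but, as shown above, also penalizes honest but noisy updates.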
Chapter 8

Discussion

User incentives. Although TorMentor demonstrates that privacy-preserving, untrusted collaborative ML is technically feasible, social feasibility remains an open question [46]. That is, are there application domains in which data providers are incentivized to contribute to models from anonymous curators? And do these incentives depend on the type of data, the protections offered by a brokered learning setting, or something else? Regarding curators, are there cases in which curators would be comfortable using a model trained on data from anonymous users? We believe that such application domains exist, partly because of the widespread usage of federated learning [19] and a growing concern over data privacy [15].

Usability of privacy parameters. Allowing clients and curators to define their own privacy parameters ε and k allows for more expressive privacy policies, but these parameters, and their implications, are difficult for users to understand. Furthermore, privacy budgets, a fundamental element in safely implementing and using differential privacy, are complex and difficult to understand [15], as evidenced by Apple's recent struggles in implementing such a system [47].

Machine learning and Tor latency. Table 7.1 shows that Tor adds significant latency to the machine learning process. On the one hand, this (unpredictable) latency can make it more difficult to mount an attack; for example, the success of the inversion attack partly depends on predictable timing. On the other hand, it would be desirable to shorten training time.

At the moment Tor's latency is paid at each iteration, indicating that methods with a lower iteration complexity would perform better. One solution to this problem is to locally aggregate gradients [7, 24, 33] over many iterations before sending them to the broker, trading off potential model staleness for reduced communication costs.
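The following is a minimal sketch of this local-aggregation idea, using the notation of Appendix A; the compute_update and push_update callables are hypothetical stand-ins for the client's gradient computation and the broker API, not TorMentor's code.

    import numpy as np

    def train_with_local_aggregation(w_global, compute_update, push_update,
                                     local_steps=10):
        """Apply several SGD updates locally and send only their sum to the
        broker, paying Tor's round-trip latency once per `local_steps`
        iterations instead of once per iteration (at the cost of a staler
        global model)."""
        w_local = np.copy(w_global)
        aggregate = np.zeros_like(w_global)
        for _ in range(local_steps):
            delta = compute_update(w_local)   # one (differentially private) SGD step
            w_local += delta
            aggregate += delta
        push_update(aggregate)                # a single message to the broker over Tor
        return w_local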
Outside of aggregating gradients, several iterative alternatives to SGD exist, such as the Newton-Raphson method [26] or other quasi-Newton methods [21], which involve computing the second-order Hessian. This provides convergence with a lower iteration complexity, but with a higher computational cost per iteration. A differentially private version of the Newton-Raphson method has also been developed [27].

TorMentor can be extended to generally support iterative ML update methods. For models and learning objectives where Newton-Raphson is applicable, we expect that Newton-Raphson will complete faster than SGD when accounting for the overhead of Tor.

Data-free gradient validation. While we demonstrated that our active validation approach defends against attacks (e.g., Figure 7.6), it relies on the curator, who defines the learning objective, to provide the ground truth needed to determine whether an update is beneficial. This approach is not only prone to bias but also opens up a new avenue for attack from the curator; an adversarial curator could submit a junk validation set to attack clients.

It may be possible to mitigate these risks with a data-free solution; an open question is whether the same effect can be achieved without an explicit ground truth. We believe that the use of a statistical outlier detection method [23] to detect and remove anomalous gradient updates may bring the best of both worlds. This would alleviate the need for, and the risk of, a curator-provided validation dataset, and it would also eliminate the computational overhead of performing explicit validation rounds. Another alternative would require the use of robust ML methods that are known to handle poisoning attacks [4, 35], but these methods are only applicable with stronger assumptions about the curator, who must now specify a ground truth model.

Chapter 9

Conclusion

We introduced a novel multi-party machine learning setting called brokered learning, in which data providers and model curators do not trust one another and interoperate through a third-party brokering service. All parties define their privacy requirements, and the broker orchestrates the distributed machine learning process while satisfying these requirements. To demonstrate that this proposal is practical, we developed TorMentor, a system for anonymous, privacy-preserving ML in the brokered learning setting. TorMentor uses differentially private model training methods to provide the strongest known defenses against attacks in this setting [25] and to support heterogeneous privacy levels for data owners. We also developed and evaluated novel ML attacks and defenses for the brokered learning setting.

Using a Tor hidden service as the broker to aggregate and validate client gradient updates, TorMentor collaboratively trains a model across 200 geo-distributed clients without ever directly accessing the raw data or de-anonymizing any of the users. We define a realistic threat model for brokered learning and show that, in contrast to existing solutions for distributed ML, such as Gaia [24] and federated learning [33], TorMentor's defenses successfully counter recently developed poisoning and inversion attacks on ML.

Bibliography

[1] go-python. https://github.com/sbinet/go-python, 2018. → page 26

[2] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: A system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016. → page 1

[3] A. Back. Hashcash - A Denial of Service Counter-Measure. Technical report, 2002. → pages 2, 7, 16

[4] M. Barreno, B. Nelson, A. D. Joseph, and J. D. Tygar. The Security of Machine Learning. Machine Learning, 81(2), 2010. → pages 2, 3, 6, 23, 38

[5] A. Bellet, R. Guerraoui, M. Taziki, and M. Tommasi. Fast and Differentially Private Algorithms for Decentralized Collaborative Machine Learning. 2017. → page 6

[6] B. Biggio, B. Nelson, and P. Laskov. Poisoning Attacks Against Support Vector Machines. In Proceedings of the 29th International Conference on Machine Learning, 2012. → pages 2, 5, 14

[7] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth. Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS, 2017. → pages 2, 9, 11, 38

[8] L. Bottou. Large-Scale Machine Learning with Stochastic Gradient Descent. 2010. → page 4

[9] J. A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten, and V. Shmatikov. "You Might Also Like:" Privacy Risks of Collaborative Filtering. In Proceedings of the 2011 IEEE Symposium on Security and Privacy, S&P, 2011. → page 1

[10] H. Corrigan-Gibbs, D. Boneh, and D. Mazières. Riposte: An Anonymous Messaging System Handling Millions of Users. In Proceedings of the 2015 IEEE Symposium on Security and Privacy, S&P, 2015. → page 15

[11] R. Dingledine, N. Mathewson, and P. Syverson. Tor: The Second-generation Onion Router. In Proceedings of the 13th Conference on USENIX Security Symposium, 2004. → pages 7, 13, 16

[12] J. J. Douceur. The sybil attack. In IPTPS, 2002. → pages 2, 7, 15

[13] C. Dwork and A. Roth. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4), 2014. → pages 2, 3, 6, 20

[14] N. S. Evans, R. Dingledine, and C. Grothoff. A Practical Congestion Attack on Tor Using Long Paths. In Proceedings of the 18th Conference on USENIX Security Symposium, 2009. → page 13

[15] L. Fan and H. Jin. A Practical Framework for Privacy-Preserving Data Analytics. In Proceedings of the 24th International Conference on World Wide Web, WWW '15, 2015. → page 37

[16] D. Foreman-Mackey, D. W. Hogg, D. Lang, and J. Goodman. emcee: The MCMC Hammer. 2013. → page 26

[17] M. Fredrikson, E. Lantz, S. Jha, S. Lin, D. Page, and T. Ristenpart. Privacy in pharmacogenetics: An end-to-end case study of personalized warfarin dosing. In Proceedings of the 23rd USENIX Security Symposium, USENIX SEC, 2014. → pages 2, 3, 5, 9, 22
[18] M. Fredrikson, S. Jha, and T. Ristenpart. Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security, CCS, 2015. → pages 2, 3, 5, 9, 14, 22, 32

[19] R. M. Frey, T. Hardjono, C. Smith, K. Erhardt, and A. S. Pentland. Secure Sharing of Geospatial Wildlife Data. In Proceedings of the Fourth International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, GeoRich '17, 2017. → page 37

[20] R. C. Geyer, T. Klein, and M. Nabi. Differentially private federated learning: A client level perspective. NIPS Workshop: Machine Learning on the Phone and other Consumer Devices, 2017. → pages 2, 6, 11

[21] R. Haelterman, J. Degroote, D. Van Heule, and J. Vierendeels. The Quasi-Newton Least Squares Method: A New and Fast Secant Method Analyzed for Linear Systems. SIAM J. Numer. Anal., 2009. → page 38

[22] B. Hitaj, G. Ateniese, and F. Pérez-Cruz. Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017. → pages 1, 5, 9, 10, 23

[23] V. Hodge and J. Austin. A Survey of Outlier Detection Methodologies. Artificial Intelligence Review, 22(2), 2004. → page 38

[24] K. Hsieh, A. Harlap, N. Vijaykumar, D. Konomis, G. R. Ganger, P. B. Gibbons, and O. Mutlu. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds. In NSDI, 2017. → pages 4, 20, 38, 39

[25] L. Huang, A. D. Joseph, B. Nelson, B. I. Rubinstein, and J. D. Tygar. Adversarial Machine Learning. In Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, AISec, 2011. → pages 2, 3, 5, 36, 39

[26] R. I. Jennrich and S. M. Robinson. A Newton-Raphson algorithm for maximum likelihood factor analysis. Psychometrika, 1969. → page 38

[27] Z. Ji, X. Jiang, S. Wang, L. Xiong, and L. Ohno-Machado. Differentially private distributed logistic regression using private and public data. BMC Medical Genomics, 2014. → page 38

[28] A. Johnson, C. Wacek, R. Jansen, M. Sherr, and P. Syverson. Users Get Routed: Traffic Correlation on Tor by Realistic Adversaries. In Proceedings of the 2013 ACM SIGSAC Conference on Computer and Communications Security, CCS, 2013. → page 13

[29] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling Distributed Machine Learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014. → pages 1, 4, 17, 19, 20, 45

[30] M. Lichman. UCI machine learning repository, 2013. URL http://archive.ics.uci.edu/ml. → page 28

[31] Y. Low, D. Bickson, J. Gonzalez, C. Guestrin, A. Kyrola, and J. M. Hellerstein. Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud. Proceedings of the VLDB Endowment, 2012. → page 1

[32] H. B. McMahan and D. Ramage. Federated Learning: Collaborative Machine Learning without Centralized Training Data. https://research.googleblog.com/2017/04/federated-learning-collaborative.html, 2017. Accessed: 2017-10-12. → page 1

[33] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas. Communication-Efficient Learning of Deep Networks from Decentralized Data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 2017. → pages 1, 4, 9, 10, 20, 38, 39
[34] P. Mohassel and Y. Zhang. SecureML: A System for Scalable Privacy-Preserving Machine Learning. In Proceedings of the 2017 IEEE Symposium on Security and Privacy, S&P, 2017. → page 2

[35] M. Mozaffari-Kermani, S. Sur-Kolay, A. Raghunathan, and N. K. Jha. Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. In IEEE Journal of Biomedical and Health Informatics, 2015. → pages 2, 5, 38

[36] S. J. Murdoch and G. Danezis. Low-Cost Traffic Analysis of Tor. In Proceedings of the 2005 IEEE Symposium on Security and Privacy, S&P, 2005. → page 13

[37] S. Nakamoto. Bitcoin: A peer-to-peer electronic cash system. 2009. → pages 15, 19

[38] B. Nelson, M. Barreno, F. J. Chi, A. D. Joseph, B. I. P. Rubinstein, U. Saini, C. Sutton, J. D. Tygar, and K. Xia. Exploiting Machine Learning to Subvert Your Spam Filter. In Proceedings of the 1st Usenix Workshop on Large-Scale Exploits and Emergent Threats, 2008. → pages 3, 5

[39] O. Ohrimenko, F. Schuster, C. Fournet, A. Mehta, S. Nowozin, K. Vaswani, and M. Costa. Oblivious Multi-Party Machine Learning on Trusted Processors. In USENIX SEC, 2016. → page 2

[40] E. Rahm and P. A. Bernstein. A Survey of Approaches to Automatic Schema Matching. The VLDB Journal, 2001. → page 19

[41] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent. In Advances in Neural Information Processing Systems 24, 2011. → pages 4, 20

[42] B. I. Rubinstein, B. Nelson, L. Huang, A. D. Joseph, S.-h. Lau, S. Rao, N. Taft, and J. D. Tygar. ANTIDOTE: Understanding and Defending Against Poisoning of Anomaly Detectors. In Proceedings of the 9th ACM SIGCOMM Conference on Internet Measurement, 2009. → page 5

[43] R. Shokri and V. Shmatikov. Privacy-preserving deep learning. In Proceedings of the 2015 ACM SIGSAC Conference on Computer and Communications Security, CCS, 2015. → page 6

[44] R. Shokri, M. Stronati, C. Song, and V. Shmatikov. Membership inference attacks against machine learning models. 2017. → page 9

[45] S. Song, K. Chaudhuri, and A. D. Sarwate. Stochastic gradient descent with differentially private updates. In IEEE Global Conference on Signal and Information Processing, GlobalSIP 2013, Austin, TX, USA, December 3-5, 2013, 2013. → pages 4, 6, 13, 19, 26, 29, 46

[46] I. Stoica, D. Song, R. A. Popa, D. A. Patterson, M. W. Mahoney, R. H. Katz, A. D. Joseph, M. Jordan, J. M. Hellerstein, J. Gonzalez, K. Goldberg, A. Ghodsi, D. E. Culler, and P. Abbeel. A Berkeley View of Systems Challenges for AI. Technical report, EECS Department, University of California, Berkeley, 2017. URL http://www2.eecs.berkeley.edu/Pubs/TechRpts/2017/EECS-2017-159.html. → page 37

[47] J. Tang, A. Korolova, X. Bai, X. Wang, and X. Wang. Privacy loss in apple's implementation of differential privacy on macos 10.12. 2017. → page 37

[48] F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart. Stealing Machine Learning Models via Prediction APIs. In USENIX SEC, 2016. → pages 15, 22, 32

[49] J. van den Hooff, D. Lazar, M. Zaharia, and N. Zeldovich. Vuvuzela: Scalable private messaging resistant to traffic analysis. In Proceedings of the 25th Symposium on Operating Systems Principles, SOSP, 2015. → page 15

[50] I.-C. Yeh and C.-h. Lien. The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications: An International Journal, 36(2), 2009. → page 28

Appendix A: SGD and differential privacy
Stochastic gradient descent (SGD). In SGD, at each iteration t the model parameters w are updated as follows:

    w_{t+1} = w_t − η_t ( λ w_t + (1/b) ∑_{(x_i, y_i) ∈ B_t} ∇l(w_t, x_i, y_i) )        (1)

where η_t represents a degrading learning rate, λ is a regularization parameter that prevents over-fitting, B_t represents a gradient batch of training data examples (x_i, y_i) of size b, and ∇l represents the gradient of the loss function.

As the number of iterations increases, the effect of each gradient step becomes smaller, indicating convergence to a global minimum of the loss function. A typical heuristic involves running SGD for a fixed number of iterations or halting when the magnitude of the gradient falls below a threshold. When this occurs, model training is complete and the parameters w_t are returned as the optimal model w∗.

Distributed SGD. In parallelized ML training with a parameter server [29], the global model parameters w_g are partitioned and stored on a parameter server. At each iteration, client machines, which house horizontal partitions of the data, pull the global model parameters w_{g,t}, compute and apply one or more iterations, and push their update ∆_{i,t} back to the parameter server:

    ∆_{i,t} = −η_t ( λ w_{g,t} + (1/b) ∑_{(x_i, y_i) ∈ B_t} ∇l(w_{g,t}, x_i, y_i) )      (2)

    w_{g,t+1} = w_{g,t} + ∑_i ∆_{i,t}

Differential privacy and SGD. ε-differential privacy states that, given a function f and two neighboring datasets D and D′ which differ in only one example, the probability of any output changes by at most a multiplicative factor of e^ε. Formally, a mechanism f : D → R is ε-differentially private if, for any subset of outputs S ⊆ R,

    Pr[ f(D) ∈ S ] ≤ e^ε · Pr[ f(D′) ∈ S ].

In differentially private SGD [45] the update is redefined to be the same as in Equation (2), except with the addition of noise:

    ∆_{i,t} = −η_t ( λ w_{g,t} + (1/b) ( ∑_{(x_i, y_i) ∈ B_t} ∇l(w_{g,t}, x_i, y_i) + Z_t ) )      (3)

where Z_t is a noise vector drawn independently from the distribution:

    p(z) ∝ e^{−(α/2) ||z||}
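To make Equation (3) concrete, the following is a minimal NumPy sketch of one differentially private client update for logistic regression. It is a sketch only: the loss, learning rate, and regularization values are illustrative, and the noise vector is assumed to come from a pre-sampled pool such as the emcee example in Chapter 6.

    import numpy as np

    def dp_sgd_update(w_global, X_batch, y_batch, noise, eta, lam):
        """One differentially private SGD update (Equation 3): compute a
        regularized logistic-regression gradient on the batch, add the noise
        vector Z_t inside the batch average, and return Delta_{i,t}."""
        b = len(X_batch)
        preds = 1.0 / (1.0 + np.exp(-X_batch.dot(w_global)))      # sigmoid outputs
        grad_sum = X_batch.T.dot(preds - y_batch)                 # sum of per-example gradients
        return -eta * (lam * w_global + (grad_sum + noise) / b)   # noisy update Delta_{i,t}

    # The broker then applies w_{g,t+1} = w_{g,t} + sum_i Delta_{i,t}, as in Equation (2).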
