UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A study on a privacy measure for social networks : computational complexity and properties on random… Zhang, Congsong 2017

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2018_february_zhang_congsong.pdf [ 594.54kB ]
JSON: 24-1.0362920.json
JSON-LD: 24-1.0362920-ld.json
RDF/XML (Pretty): 24-1.0362920-rdf.xml
RDF/JSON: 24-1.0362920-rdf.json
Turtle: 24-1.0362920-turtle.txt
N-Triples: 24-1.0362920-rdf-ntriples.txt
Original Record: 24-1.0362920-source.json
Full Text

Full Text

A Study on a Privacy Measure forSocial Networks: ComputationalComplexity and Properties onRandom GraphsbyCongsong ZhangB.Sc. in Communication Engineering, Nanjing Tech University, China, 2006M.Sc. in Information and Communication Engineering, Southeast University,China, 2009A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF SCIENCEinThe College of Graduate Studies(Computer Science)THE UNIVERSITY OF BRITISH COLUMBIA(Okanagan)December 2017c© Congsong Zhang, 2017The following individuals certify that they have read, and recommendto the College of Graduate Studies for acceptance, a thesis/dissertation en-titled:A Study on a Privacy Measure for Social Networks: Computational Com-plexity and Properties on Random Graphssubmitted by Congsong Zhang in partial fulfilment of the requirements ofthe degree of Master of ScienceDr. Yong Gao, Irving K. Barber School of Arts and Sciences, Unit 5 - Computer ScienceSupervisorDr. Heinz Bauschke, Irving K. Barber School of Arts and Sciences, Unit 5 - MathematicsSupervisory Committee MemberDr. Paramjit Gill, Irving K. Barber School of Arts and Sciences, Unit 5 - StatisticsSupervisory Committee MemberDr. Zheng Liu, School of EngineeringUniversity ExamineriiAbstractSince the booming of social networks, network analysis has benefitedgreatly from the released data. Meanwhile, the leakage of users’ informa-tion is getting more and more serious. The users’ personal informationmay be compromised even if the data are released after being anonymized.Adversaries can uniquely re-identify a user in an anonymized social net-work by the quasi-identifiers as their background knowledge. To measurethe resistance against privacy attacks in anonymized social networks wherethe background knowledge of adversaries is the metric representation, R.Trujillo-Rasua et al. introduced a new measure: (k, l)-anonymity based onthe notions of k-antiresolving set and k-metric antidimension in [TRY16b].In this thesis, we prove that the problem of computing k-metric antidi-mension is NP-hard by a polynomial-time reduction from a well-known NP-complete problem, the exact cover by 3-sets problem (X3C problem), to adecision version of the problem of computing k-metric antidimension. Withthis conclusion, we prove that the (k, l)-anonymity problem is NP-complete.Also, in the hope to get a general relation between k and l in the(k, l)-anonymity problem, we study the behaviors of k-antiresolving setsin Erdo˝s-Re´nyi random graphs. We establish three bounds on the size ofk-antiresolving sets in Erdo˝s-Re´nyi random graphs leading to a range ofk-metric antidimension where k is constant.iiiPrefaceThe results contained in this thesis have appeared in [ZG17].ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . viiiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixChapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . 1Chapter 2: Information Privacy and Information Secrecy . . 42.1 Information Privacy . . . . . . . . . . . . . . . . . . . . . . . 42.2 Information Secrecy . . . . . . . . . . . . . . . . . . . . . . . 6Chapter 3: Background Knowledge and Results . . . . . . . . 103.1 The Complexity Classes of Problems . . . . . . . . . . . . . . 103.2 Erdo˝s-Re´nyi Random Graphs and the Probabilistic Method . 163.3 k-Metric Antidimension and (k, l)-Anonymity . . . . . . . . . 203.4 Results and Related Works . . . . . . . . . . . . . . . . . . . 21Chapter 4: The Complexity of Computing adimk(G) and (k, l)-Anonymity . . . . . . . . . . . . . . . . . . . . . . . . 234.1 The Complexity of Computing k-Metric Antidimension . . . . 234.1.1 A Reduction from the X3C Problem . . . . . . . . . . 244.1.2 The NP-completeness Proof: Sufficient Condition . . . 264.1.3 The NP-completeness Proof: Necessary Condition . . 26vTABLE OF CONTENTS4.1.4 The Problem of Computing k-Metric Antidimensionis NP-hard . . . . . . . . . . . . . . . . . . . . . . . . 284.2 The Computational Complexity of the (k, l)-Anonymity Prob-lem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28Chapter 5: The k-Metric Antidimension in Random Graphs . 305.1 The Definition of Relaxed Metric Representation . . . . . . . 305.2 The First Bound on the Size of k-Antiresolving Sets . . . . . 315.3 The Second Bound on the Size of k-Antiresolving Sets . . . . 335.4 The Third Bound on the Size of k-Antiresolving Sets . . . . . 375.5 A Range of k-Metric Antidimension Where k ∈ O(1) . . . . . 41Chapter 6: Conclusion . . . . . . . . . . . . . . . . . . . . . . . . 43Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44Appendix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50Appendix A: The Asymptotic Notations . . . . . . . . . . . . . . . 51A.1 Θ-Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51A.2 O-Notation and Ω-Notation . . . . . . . . . . . . . . . . . . . 52A.3 o-Notation and ω-Notation . . . . . . . . . . . . . . . . . . . 53Appendix B: The Probabilistic Method . . . . . . . . . . . . . . . . 55B.1 Chernoff Bound . . . . . . . . . . . . . . . . . . . . . . . . . . 55B.2 Azuma-Hoeffding Inequality . . . . . . . . . . . . . . . . . . . 60viList of FiguresFigure 4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25viiAcknowledgementsFirst of all, I would like to thank Unit 5 of UBC Okanagan for thetwo years’ support. Evolving the master program of computer science fortwo years is quite a fantastic experience. Also, I greatly appreciate theopportunity of teaching undergraduate students as a lab instructor.I want to give my thanks to everyone in the department I have dealedwith as well. You are so friendly and have contributed an excellent experi-ence for me.I also would like to thank Dr. Y. Gao for his mentorship and encourage-ment over the duration in the last two years. His mathematical influence isinvaluable for me, and his guideline of how to contribute to a solid researchdrives force behind the work of this thesis.viiiDedicationI would like to dedicate this thesis to my wife Lingling Ji, my lovelydaughter Jiyan Zhang, and my parents Shuyi Zhang and Xiuyun Cong.My parents give me every academic opportunity I pursuit, as well as mywife. Without their support, I can not imagine that I can finish my masterdegree in UBC Okanagan.Lastly, let me thank again for their endless support and love.ixChapter 1IntroductionThe Internet and big data analytics bring a significant change to ourlives. We can use the online banking to make our payment conveniently andefficiently. E-commence applications, such as Taobao and Amazon, give usa new experience of shopping. Moreover, by social networks, e.g., Facebook,Twitter, and Linkedin, we can acquaint new friends around the world andexchange messages instantly. The result extracted from the analysis of bigdata brings us the knowledge unknown before.But these benefits are not free. With the booming of Internet, infor-mation privacy and secrecy become more and more important. A malicioushacker can attack web servers and cause damage to the public. For example,CBC reported on April 14th, 2014, that 900 social insurance numbers werestolen [SIN]. The attacker accessed the data by exploiting a bug of OpenSSL- Heartbleed [Hea].Even though a malicious hacker does not attack servers actively, theinappropriate leakage of user information can lead a breach of privacy. Forexample, A. Tockar shows an example on how to track individuals in NewYork using data from the 2013 NYC Taxi data release [Toc].The concepts of information secrecy and information privacy in com-puter and information sciences are related, but still with some significantdifferences. Information secrecy is a practice of sharing information with agroup of people or a person while hiding it from others, and the challengeof information privacy is to utilize information while protecting individual’sinformation.To achieve information security, we use many cryptographic techniques.One of them is cryptosystems. On the other hand, information privacy isan interdisciplinary topic of studying how to prevent private informationbeing recovered from released data sets. For example, investers can use thetransaction data of stock market to predict the future of stock market inorder to make a profit. But, they can not extract the personal transactioninformation.In the practice of protecting information privacy, there are two mainframeworks: interactive and non-interactive. In the interactive framework,1Chapter 1. Introductiondata analysts inquiry data from a trusted data collector; in the non-interactiveframework, the data collector uses anonymization techniques, e.g., the k-anonymity [Swe02], to get rid of identifies of data and then publishes theanonymized data.Although anonymization techniques can get rid of identifiers of data insocial networks, adversaries may compromise the privacy by using quasi-identifiers as their background knowledge.In an anonymized network graph, let u be a user vertex and S be a subsetof attacker vertices. Supposing the background knowledge of adversaries isthe metric representation of u with respect to S, R. Trujillo-Rasua et al.in [TRY16b] introduced a measure of resistance against privacy attacks inanonymized social networks: (k, l)-anonymity. This notion creats a newproblem in graph theory.Let G = (V,E) be a simple connected graph, S be a proper subset ofV , and v be a vertex in V \ S. The metric representation of v with respectto S is a tuple formed by the shortest-path distances from v to vertices inS. The set S is a k-antiresolving set if k is the greatest integer that for anyvertex v ∈ V \S there are at least k−1 different vertices in V \S having thesame metric representation with respect to S as v. A k-antiresolving basisis defined to be a k-antiresolving set of minimum cardinality. The k-metricantidimension is the cardinality of a k-antiresolving basis. We denote thek-metric antidimension by adimk(G). The graph G meets (k, l)-anonymityif k is the smallest positive integer that adimk(G) is less than or equal to l.From the definition of k-antiresolving set, it is clear that, if the set ofcontrolling nodes by an adversary is a k-antiresolving set, the adversary cannot uniquely re-identify other nodes with the probability that is greater than1/k. The adimk(G) is the lower bound on the number of vertices controlledby an adversary to approach this probability.The main results of this thesis include a proof of the NP-hardness ofthe problem of computing adimk(G) and the bounds on the size of k-antiresolving sets in Erdo˝s-Re´nyi random graphs. The NP-hardness proofis based on a reduction from a well-known NP-complete problem: the ex-act cover by 3-sets problem (X3C problem). The bounds on the size ofk-antiresolving sets are established by making use of an observation underthe case that an Erdo˝s-Re´nyi random graph has the diameter less than orequal to 2. This observation helps us to overcome the difficulty brought bythe dependence of the shortest-path distances between different vertices inErdo˝s-Re´nyi random graphs.In Chapter 2, we give an overview of information privacy and informationsecrecy including the origins of these two topics and the development in both2Chapter 1. Introductionareas. In Chapter 3, we give the background knowledge of this thesis, as wellas our results and the related works. In Chapter 4, we give a polynomial-time reduction from X3C problem to a decision version of the problem ofcomputing adimk(G). With the reduction, we can say the decision version ofthe problem of computing adimk(G) is also NP-complete. Thus, the problemof computing adimk(G) is NP-hard. With this conclusion, we give the proofof (k, l)-anonymity is NP-complete. In Chapter 5, we give the proofs of thebounds on the size of k-antiresolving sets in Erdo˝s-Re´nyi random graphsG(n, p). With the bounds, we give a range of k-metric antidimension wherek ∈ O(1) when n tends to infinite. In Chapter 6, we summarize our resultsand mention the future direction and some open problems.3Chapter 2Information Privacy andInformation SecrecyThe study of information privacy has benefited greatly from the devel-opment of computer science. For example, graph theory and the study ofsocial networks have supported the study of information privacy under thepopularity and accessibility of online social networks in recent years.In Section 2.1, we give an overview of information privacy. Then, we in-troduce the techniques of how to protect information privacy in the practiceof social networks.Information secrecy that benefits a lot from Number Theory is oftenconfused with information privacy. Then, in Section 2.2, we also give a briefintroduction on this topic.2.1 Information PrivacyInformation privacy is an interdisciplinary concept, and its meaningvaries with cultures and historical stages [NHP11, Hol09, BJKL04]. S. War-ren et al. defined the concept of privacy from the law perspective [WB90].Because of the advancement of computer technology, such as big data storageand analytics, the Internet, and social networks, information privacy has be-come an increasing concern [JS05, CP02]. A potential danger is that mostactions in our daily lives are recorded on computers somewhere, e.g., thetrack of what we bought in supermarkets or our medical history. Improperdisclosure of such information can lead harmful effects [And96, WW96].Another hidden danger arises from the practice of data mining and socialnetworks analysis [Ale11]. The purposes of these two processes are to fig-ure out some patterns in large data sets and investigate social structuresin social networks. It requires us to reserve some statistical informationwhen we do the sanitization on the original information. However, it is atrade-off between preserving statistical information and hiding personal in-formation. If we sanitize the original information thoroughly, we will lose42.1. Information Privacystatistical information. On the other hand, if we reserve statistical infor-mation too much, adversaries may compromise individual privacy just likethe way of breaking deterministic cryptosystems mentioned in Subsection2.2. Furthermore, the results extracted from the data mining or social net-works analysis, such as classification rules, may make a breach on individualprivacy. The results may even lead to unfair treatments, e.g., the bias or dis-crimination on specific groups or races, when being used on decision tasks,see [PRT08, DL17, CCSZ13].Interactive and non-interactive frameworks are two models for privacymechanisms in the practice of data mining and social networks analytics[Dwo06]. In the setting of the interactive model, a trusted data collectorprovides some interfaces of queries through which users can get data. Inthe setting of the non-interfaces model, a data collector publishes the col-lected data after getting rid of identifiers of data, such as names and socialinsurance numbers. The corresponding techniques are called anonymizationtechniques or de-identification techniques in the literature.The results of the interactive case are powerful, see [AS00a, DMNS06].But the non-interactive case seems to be more difficult [EGS03]. A possiblereason is that it is difficult to supply the utility that has not been specifiedwhen the sanitization is carried out [Dwo06].T. Dalenius articulated a desideratum for the statistical database in1977 [Dal77]: nothing about an individual should be learnable from thedatabase that can not be learned without access to the database. Thisdesideratum means that anything that an adversary could learn from thedatabase can also be learned without the database so that accessing to thedatabase would not compromise individual privacy. This notion was definedby S. Goldwasser et al. as the semantic security which we will introducein Subsection 2.2. C. Dwork proved that this type of privacy could not beachieved [Dwo06], and then C. Dwork designed a relative guarantee calleddifferential privacy mechanism: any given disclosure will be, within a smallfactor, no matter whether the individual participates in the database. Belowis the formal definition of differential privacy mechanism in [Dwo06].Definition 2.1. [Dwo06](Differential privacy). A randomized function Kgives -differential ( > 0) if for all data sets D1 and D2 differing on at mostone element, and all S ⊆ Range(K),Pr[K(D1) ∈ S] ≤ exp()× Pr[K(D2) ∈ S].The -differential privacy mechanism matches the relative guarantee. Forexample, an insurance provider consults the database to decide whether to52.2. Information Secrecyinsure Tom, and the presence or absence of Tom in the database wouldnot affect the result significantly. F. McSherry proved that if we query a-differential privacy mechanism t times where each query is independent,then the result is t-differential privacy [McS09].Social networking services are used widely in our lives, such as Facebookand Linkedin. The popularity of social networks enables governments orthird-party institutions to collect the data of social networks which can bereleased for research purposes. The analysis of social networks can help usuncover previously unknown knowledge, e.g., community-based problems.In other fields, such as recommended systems, the analysis of social networksalso supports substantially [SANT14].However, these benefits are not free. An adversary can compromise theindividual privacy from the published data of social networks and makethe sensitive information of individuals disclosure. An approach to solvethis issue is to use anonymization techniques to remove identify attributesbefore releasing the data, see a brief survey on anonymization techniques andanonymized data of some social networks in [ZPL08, Les]. But, due to thecomplex structure of social networks, an adversary still can compromise theindividual privacy by the background knowledge of quasi-identifier attributesof victims, e.g., link relationship, neighborhoods, and embedded subgraphs.In [BDK07], L. Backstrom et al. described a family of privacy attacks inanonymized social networks. W. Peng et al. gave another example called thetwo-stage deanonymization attack in [PLZW14]. The example shows that anadversary can first register new users with connections to the targeted usersin a social network. Then the adversary creates edges between the newlyregistered users to construct a unique subgraph. After that, the adversaryidentifies the subgraph in the anonymized social network that is released sothat the adversary can re-identifies the targeted users.K. Liu et al. proposed a framework for the identity anonymization ongraphs under the background knowledge of adversaries is vertex degree[LT08]. Contemporaneously, B. Zhou et al. studied how to protect privacyin social networks against neighborhood attacks [ZP08].We refer to [NHP11, WYLC10] for further reading on the privacy-preservingpublication of social graphs.2.2 Information SecrecyIn the first century, Pliny the Elder described how the milk from a thi-tymallus plant could be used as invisible ink [Mat]. Unlike hiding the actual62.2. Information Secrecyinformation, croptography is a class of techniques to hide the meaning of in-formation. Cryptography makes sure the real meaning of information couldnot be revealed to the wrong receiver. Indeed, cryptography is the practiceand study of techniques for guaranteeing information secrecy by secure com-munications in the presence of the third parties called adversaries [Riv90].In cryptography, a cryptosystem is a term referring to a set of cryptographicalgorithms used for information security services [MOV01]. A cryptosystemcan hide the real meaning of data from adversaries. People may firstly usecryptosystems in the military field. Julius Caesar invented the Caesar cipherto protect messages of military significance around 50 B.C. [LP87]. Anotherfamous story about cryptosystems is that Alan Turing tackled the problemof Enigma [Cop04].A cryptosystem has three types of algorithms: (1) the key generation al-gorithm; (2) the encryption algorithm; (3) the decryption algorithm. Belowis the formal definition of a cryptosystem.Definition 2.2. [Buc04](Cryptosystem). A cryptosystem is a tuple(P, C,K, E ,D)with the following properties:1. P is a set. It is called the plaintext space. Its elements are calledplaintexts.2. C is a set. It is called the ciphertext space. Its elements are calledciphertexts.3. K is a set. It is called the key space. Its elements are called keys.4. E = {Ek|k ∈ K} is a family of functions Ek : P → C. Its elements arecalled encryption functions.5. D = {Dk|k ∈ K} is a family of functions Dk : C → P. Its elements arecalled decryption functions.6. For each e ∈ K, there is a d ∈ K such that Dd(Ee(p)) = p for all p ∈ P.We assume that an adversary can intercept ciphertexts and compute pos-terior probabilities for different plaintexts. In 1949, C. Shannon introducedthe definition of perfect secrecy against such adversaries for a cryptosystem[Sha49]. Informally, we say a cryptography system has perfect secrecy if theadversary could not get any information by intercepting the ciphertext. To72.2. Information Secrecyformalize this property mathematically, let Pr(X) be the prior probabilityfor a message X and Pr(X|Y ) be the posterior probability where Y is aciphertext. A cryptography system has perfect secrecy if Pr(X|Y ) = Pr(X)for all plaintexts X and all ciphertexts Y .In a cryptosystem, if encryption functions and decryption functions usethe same keys, the cryptosystem is symmetric; otherwise, the cryptosys-tem is asymmetric. Advanced Encryption Standard (AES) and Triple DataEncryption Standard (3DES) are two well-known symmetric cryptosystems[Sta05]. In asymmetric cryptosystems, the key e of encryption functions andthe key d of decryption functions are different. If a person wants to receivethe encrypted messages, he could publish an encryption key e and keep thecorresponding decryption key d. Anyone can use e to encrypt messages andsend the ciphertexts to him, and only he could decrypt the ciphertexts by d.Therefore, asymmetric cryptosystems are also called public key cryptosys-tems. R. Rivest et al. firstly described a practical public key cryptosystemRSA [RSA78].Symmetric cryptosystems are faster than asymmetric cryptosystems.Moreover, their implementations are not complicated due to their lowercomplexity compared to asymmetric cryptosystems. However, a significantdisadvantage of symmetric cryptosystems is that symmetric cryptosystemsrequire all participants have already been configured well with same keysthrough some external security channels.Both symmetric and asymmetric cryptosystems described above are de-terministic cryptosystems. We say a cryptosystem is deterministic whichmeans this cryptosystem always produces the same ciphertext for a givenplaintext and key in separate executions. Deterministic cryptosystem mayleak information to an adversary. An adversary can do a statistical analysisof ciphertexts by listening to the encrypted channel and then construct adictionary of pairs of plaintexts and ciphertexts if they have a close enoughappearance frequency. For example, if plaintexts are English sentences andthe appearance frequency of a ciphertext is close to the appearance fre-quency of the word the in English sentences, then the adversary could saythis ciphertext corresponding to the plaintext the. This problem is seriousin public key cryptosystems. As any part in public key cryptosystems canencrypt plaintexts by a public key, an adversary could build the dictionaryof pairs of plaintexts and ciphertexts actively.To overcome this problem, S. Goldwasser et al. proposed a probabilisticpublic key cryptosystem in 1984 [GM84]. The probabilistic public cryp-tosystem publishes a pair of integers as a public key. The sender can use thepublic key to encrypt a plaintext into a random ciphertext, and the receiver82.2. Information Secrecycan decrypt the random ciphertext by the corresponding decryption key. S.Goldwasser et al. proved that their cryptosystem is polynomial security andsemantic security [GM84]. Informally, we say that a public key cryptosystemis polynomial security if, for all plaintexts with any probability distribution,an adversary could not find two plaintexts m1 and m2 whose ciphertexts aredistinguishable in polynomial time of the length of plaintexts. Informally,we say that a public key cryptosystem is semantic security if the informationthat an adversary can compute from a given ciphertext could also be com-puted without the ciphertext. We use two games1 to illustrate the conceptof semantic security.Game 1:Let f be a function defined on all plaintexts. Both us and the adversaryknow this function. We randomly choose a plaintext m. Then we ask anadversary to guess the value of f(m) without being told the value of m.Game 2:The adversary choose a function fE defined on all plaintexts, and thenthe adversary tells us the function. After that, we randomly choose a plain-text m and do not tell the adversary this plaintext. We compute the cipher-text for this plaintext and give this ciphertext to the adversary. Then weask the adversary to guess the value of fE(m).We say a cryptosystem is semantic security if the adversary can not winGame 2 with higher probability than Game 1.1These two games also come from [GM84]9Chapter 3Background Knowledge andResultsIn Section 3.1, we introduce the complexity classes of P, NP, NP-complete,and NP-hard. In Section 3.2, we show the random graph model and theprobabilistic method used in Chapter 5. Moreover, in Section 3.3, we in-troduce the formal definitions of k-antiresolving set, k-antiresolving basis,k-metric antidimension, and (k, l)-anonymity. In Section 3.4, we show ourresults and the related works.3.1 The Complexity Classes of ProblemsIf the answer to a problem is yes or no, we say this problems is a deci-sion problem. On the other hand, if a problem asks us to get a maximum orminimum solution, we say this problem is an optimization problem. Prob-lems can also be categorized by the level of their difficulty. To illustratethe difficulty of problems, we first introduce the concepts of tractable andintractable problems.Given an algorithm whose input size is n, if the running time of thealgorithm is bounded by O(nk), where k is a constant, we say the algorithmis a polynomial-time algorithm. Moreover, if a problem has a polynomial-time algorithm, we say the problem is tractable, and if a problem has nopolynomial-time algorithm, we say the problem is intractable.Informally, we define the class of decision problems that can be solved bya polynomial-time algorithm as the class P [CJW+06]. Therefore, the classP is tractable. For example, given two integer arrays A and B, the problemof whether A is the sorted array of B belongs to the class P. We can sort Bin polynomial time of the length of B and then compare the sorted array toA.We define the class NP as the class of decision problems that given asolution whose size is a polynomial of the problem input, we can verifywhether the solution is correct in polynomial time [CLRS09]. If a problem103.1. The Complexity Classes of Problemsbelongs to the class NP and its hardness is as hard as any problem in theclass NP, we say the problem is NP-complete. In 1971, A. Cook foundthe first NP-complete problem: the boolean satisfiability problem [Coo71].Under the study of NP-complete problems, researchers found many NP-complete problems, e.g., the clique problem, 3-CNF-SAT problem, and theexact cover by 3-sets problem (X3C).Given two problems Q and Q′, if any instance of Q can be solved byan algorithm of a polynomial number of primitive steps and a polynomialnumber of calls to an algorithm for Q′, we say that Q can be reduced to Q′in polynomial-time and denoted by Q ≤P Q′ .Below is an example of a polynomial-time reduction in [CLRS09]. Givenan instance of ax+ b = 0, we transform it to 0x2 + ax+ b = 0. The solutionof 0x2 + ax+ b = 0 is also the solution of ax+ b = 0.By the definition of the polynomial-time reduction, we can explain themeaning of a problem is as hard as another problem and give the formaldefinition of NP-complete and NP-hard. For two NP problems A and B, ifA ≤P B, we say A is no more than a polynomial factor harder than B .Definition 3.1. [CLRS09](NP-complete). A problem Q is NP-complete if1. Q ∈ NP, and2. Q′ ≤P Q for any Q′ ∈ NP.If a problem does not satisfy Property 1, e.g., the problem is an opti-mization problem, but satisfies Property 2, we say the problem is NP-hard.If a decision problem can be solved by a polynomial-time algorithm,obviously a solution to this problem can be verified in polynomial time.Therefore, the class P is a subset of the class NP. But whether every decisionproblem whose solution can be verified in polynomial time can also be solvedin polynomial time is an open question. If the answer is true, it means P =NP, and the class NP is also tractable.A possible way to prove P = NP is to find a polynomial-time algorithmfor an NP-complete problem. However, to our best knowledge, there is nopolynomial-time algorithm for any NP-complete problem, and no one hasgiven the proof of the class NP-complete is intractable.The problem of whether P = NP is one of seven Millennium Prize Prob-lems. The prize to the first correct solution to this problem is one milliondollars [Dev02].Besides the complexity classes we described above, there are other com-plexity classes, e.g., PSPACE - the class of decision problems can be solved in113.1. The Complexity Classes of Problemspolynomial space and PSPACE-complete - the hardest problems in PSPACE[AB09].Furthermore, some problems can not be solved by any computer, e.g.,the halting problem. The halting problem is the problem of determining,from a description of an arbitrary computer program and an input, whetherthe program will stop or run forever. A. Turing proved that there is nogeneral algorithm to solve the halting problem [Tur36].In the rest of this section, to illustrate how to reduce an NP-completeproblem to another NP problem, we give a revised NP-completeness proofof the X3C problem by a reduction from 3-CNF-SAT problem.Definition 3.2. [CLRS09](3-CNF-SAT problem).A literal in a boolean formula is an occurrence of a variable or its negation.A boolean formula is in conjunctive normal form, or CNF, if it is expressedas an AND of clauses, each of which is the OR of one or more literals. Aboolean formula is in 3-conjunctive normal form, or 3-CNF, if each clausehas exactly three distinct literals.Below is the definition of X3C problem.Definition 3.3. [GJ79](X3C problem).Given a set B = {e1, ..., e3q} and a family S = {S1, ...,Sp} of 3-elementsubsets of B, does S contain a subfamily such that every element in Boccurs in exactly one member of the subfamily?Theorem 3.4. [GJ79] X3C problem is NP-complete.Proof. Let us suppose that an instance of 3-CNF-SAT problem has n booleanvariables: x1, ..., xn and k clauses: C1, ...Ck. We give a O(n2k2)-time reduc-tion from this instance to an instance of X3C problem.Reduction:1. The elements:(a) For each variable xi, we create 4k elements:ai,1, ..., ai,2k and bi,1, ..., bi,2k.(b) For each clause Cj , we create 2 elements: cj and c′j .(c) Futhermore, we create 2(n− 1)k elements:q1, ..., q(n−1)k and q′1, ..., q′(n−1)k.123.1. The Complexity Classes of Problems2. The 3-element subsets:(a) We suppose that a clause Cj contains the variables xn1 , xn2 , andxn3 . If xn1 appears as a positive literal in Cj , we create a subset{cj , c′j , bn1,2j−1}; otherwise, we create a subset {cj , c′j , bn1,2j}. xn2and xn3 apply the same reduction.(b) We create (n− 1)k subsets: {q1, q′1, bi,j}, ..., {q(n−1)k, q′(n−1)k, bi,j}for each bi,j .(c) For ai,1, ..., ai,2k, we create 2k subsets: {ai,j , ai,(j+1), bi,j} wherej ∈ [1, 2k − 1] and {ai,2k, ai,1, bi,2k}.As the number of elements created is4kn+ 2k + 2(n− 1)kand the number of 3-element subsets is3k + 2n(n− 1)k2 + 2nk,the reduction can finish in O(n2k2)-time. For example, let us consider a3-CNF-SAT problem instance of 4 variables and 2 clauses where C1 =(x1∨x2 ∨ x3)and C2 =(x2 ∨ x3 ∨ x4). After the reduction, we have a set B ofelements:B = {c1, c′1, c2, c′2,a1,1, a1,2, a1,3, a1,4,b1,1, b1,2, b1,3, b1,4,a2,1, a2,2, a2,3, a2,4,b2,1, b2,2, b2,3, b2,4,a3,1, a3,2, a3,3, a3,4,b3,1, b3,2, b3,3, b3,4,a4,1, a4,2, a4,3, a4,4,b4,1, b4,2, b4,3, b4,4,q1, q2, q3, q4, q5, q6,q′1, q′2, q′3, q′4, q′5, q′6}133.1. The Complexity Classes of Problemsand a family S of 3-element subsets of B:S ={{c1, c′1, b1,1}, {c1, c′1, b2,2}, {c1, c′1, b3,1},{c2, c′2, b2,3}, {c2, c′2, b3,4}, {c1, c′1, b4,3},{a1,1, a1,2, b1,1}, {a1,2, a1,3, b1,2}, {a1,3, a1,4, b1,3}, {a1,4, a1,1, b1,4},...,{a4,1, a4,2, b4,1}, {a4,2, a4,3, b4,2}, {a4,3, a4,4, b4,3}, {a4,4, a4,1, b4,4},{q1, q′1, b1,1}, {q1, q′1, b1,2}, {q1, q′1, b1,3}, {q1, q′1, b1,4},{q1, q′1, b2,1}, {q1, q′1, b2,2}, {q1, q′1, b2,3}, {q1, q′1, b2,4},{q1, q′1, b3,1}, {q1, q′1, b3,2}, {q1, q′1, b3,3}, {q1, q′1, b3,4},{q1, q′1, b4,1}, {q1, q′1, b4,2}, {q1, q′1, b4,3}, {q1, q′1, b4,4},...,{q6, q′6, b1,1}, {q6, q′6, b1,2}, {q6, q′6, b1,3}, {q6, q′6, b1,4},{q6, q′6, b2,1}, {q6, q′6, b2,2}, {q6, q′6, b2,3}, {q6, q′6, b2,4},{q6, q′6, b3,1}, {q6, q′6, b3,2}, {q6, q′6, b3,3}, {q6, q′6, b3,4},{q6, q′6, b4,1}, {q6, q′6, b4,2}, {q6, q′6, b4,3}, {q6, q′6, b4,4}}.Now we show that a 3-CNF-SAT problem is satisfiable if and only if thereduced X3C problem has an exact cover.Lemma 3.5. Given a 3-CNF-SAT problem instance, if the problem is sat-isfiable, the reduced X3C problem has an exact cover.Proof. As the 3-CNF-SAT problem is satisfiable, there is such an assignmentof variables that makes each clause is true. If the assignment of a variablexi is true and xi appears as a positive literal in a clause Cj , the elementscj , c′j can be exactly covered by the subsets {cj , c′j , bi,2j−1}.Similarly, if the assignment of xi is false and xi appears as a negativeliteral in a clause Cj , the elements cj , c′j can be exactly covered by the subsets{cj , c′j , bi,2j}.For the same i, the values of j of the covered elements bi,j above are alleven or odd, depending on the assignment of xi.After applying the above procedure for all variables, we have that allelements cj , cj′ are exactly covered. If not, there exists such a clause thatthe assignments of its variables appearing as positive literals are false, andthe assignments of its variables appearing as negative literals are true. Thus,the clause is not satisfiable which is a contradiction.143.1. The Complexity Classes of ProblemsNow, let us consider how to exactly cover the elements ai,1, ..., ai,2k. If bi,jof an even number j have been covered, ai,1, ..., ai,2k can be exactly coveredby the subsets{ai,1, ai,2, bi,1}, {ai,3, ai,4, bi,3}, ..., {ai,2k−1, ai,2k, bi,2k−1}.Otherwise, ai,1, ..., ai,2k can be exactly covered by the subsets{ai,2, ai,3, bi,2}, {ai,4, ai,5, bi,4}, ..., {ai,2k, ai,1, bi,2k}.After covering ai,1, ..., ai,2k, we know that k elements bi,j are also coveredwhere j are all even or odd. Then, there are (n−1)k elements bi,j and (n−1)kpairs of qi and q′i uncovered.These elements can be exactly covered by the subsets {qi, q′i, bi′,j′}. Asshown in the reduction, there are exactly (n−1)k pairs of qi and q′i; for eachpair of qi and q′i, there are 2nk subsets {qi, q′i, bi′,j′}. Therefore, there is anexact cover for the reduced X3C problem.Lemma 3.6. Given a 3-CNF-SAT problem instance, if the reduced X3Cproblem has an exact cover, the 3-CNF-SAT problem is satisfiable.Proof. For a variable xi, we suppose that two clauses Cj and C′j both containxi with different literals. Without loss of generality, we suppose that xiappears as a positive literal in Cj and a negative literal in Cj′ . Then, wecan prove that the case of {cj , c′j , bi,2j−1} and {cj′ , c′j′ , bi,2j′} are in the exactcover does not exist.By the reduction, the elements ai,1, ..., ai,2k can only be exactly coveredby{ai,1, ai,2, bi,1}, {ai,3, ai,4, bi,3}, ..., {ai,2k−1, ai,2k, bi,2k−1}or{ai,2, ai,3, bi,2}, {ai,4, ai,5, bi,4}, ..., {ai,2k, ai,1, bi,2k},which means that we need k elements bi,j where j are all even or odd toexactly cover ai,1, ..., ai,2k.If {cj , c′j , bi,2j−1} and {cj′ , c′j′ , bi,2j′} are in the exact cover, there are atmost k − 1 uncovered elements bi,j where j are all even or odd. Then, thisobservation leads to a contradiction that the elements ai,1, ..., ai,2k can notbe exactly covered.Therefore, if the pair of cj , c′j is covered by {cj , c′j , bi,2j−1}, the pair ofcj′ , c′j′ would not be covered by {cj′ , c′j′ , bi,2j′}, and vice versa.Now we show that there exists an assignment of variables such that makeseach clause is true. Let xi be true, if a pair of cj , c′j is covered by the subset153.2. Erdo˝s-Re´nyi Random Graphs and the Probabilistic Method{cj , c′j , bi,2j−1}; otherwise, let xi be false if the pair is covered by the subset{cj , c′j , bi,2j}. By the reduction, the existing of subset {cj , c′j , bi,2j−1} meansthat xi appears as a positive literal in Cj . Therefore, the true assignment ofxi makes the clause Cj is true. Similarly, the existing of subset {cj , c′j , bi,2j}means that xi appears as a negative literal in Cj , and the assignment of xibe false makes the clause Cj is true.As 3-CNF-SAT problem is NP-complete [CLRS09], Lemma 3.5 and 3.6lead to Theorem Erdo˝s-Re´nyi Random Graphs and theProbabilistic MethodIn this section, we introduce the Erdo˝s-Re´nyi random graphs. Moreover,we list the main inequalities used in the probabilistic method.In 1959, P. Erdo˝s and A. Re´nyi introduced the random graph modelnow named after them [ER59]. Contemporaneously, E. Gilbert introduceda closely related random graph model independently [Gil59].Definition 3.7. [ER59, Gil59] (G(n,M) model and G(n, p) model). In theG(n,M) model, a graph with n vertices and M edges is chosen uniformlyat random from the collection of all graphs with n vertices and M edges. Inthe G(n, p) model, a graph with n vertices is built by connecting any twovertices with the same probability independently.In this thesis, we analyze the properties of k-antiresolving sets in thegraph built by the G(n, p) model.In the following section, we introduce some inequalities in the probabilis-tic method. The first one is Markov’s inequality [Als11].Theorem 3.8. [Als11] (Markov’s inequality). If X(integrable) is a nonneg-ative random variable, then for any real number a > 0,Pr{X ≥ a} ≤ E(X)a.163.2. Erdo˝s-Re´nyi Random Graphs and the Probabilistic MethodProof.E(X) =∫ ∞0x · dF (X)≥∫ ∞ax · dF (X)≥∫ ∞aa · dF (X)= a ·∫ ∞adF (X)= a · Pr{X ≥ a}If X is a nonnegative integral valued random variable, then we knowPr{X ≥ 1} = Pr{X > 0}.By plugging a = 1 into Markov’s inequality, we get a special case of Markov’sinequality.Proposition 3.9. [AS00b] If X is a nonnegative integral valued randomvariable, thenE(X) ≥ Pr{X > 0}.By using Markov’s inequality, we can get Chebyshev’s inequality.Theorem 3.10. [AS00b] (Chebyshev’s inequality). If X(integrable) is arandom variable with finite expected value µ and finite non-zero varianceσ2, then for any real number k > 0Pr{|X − µ| ≥ kσ} ≤ 1k2.Proof.Pr{|X − µ| ≥ kσ} = Pr{|X − µ|2 ≥ k2σ2}≤ E(|X − µ|2)k2σ2(by Markov’s inequality)=σ2k2σ2=1k2173.2. Erdo˝s-Re´nyi Random Graphs and the Probabilistic MethodBy setting kσ = µ, we can prove Theorem 4.3.1 in [AS00b].Theorem 3.11. [AS00b]Pr{X = 0} ≤ σ2µ2Proof.Pr{X = 0} ≤ Pr{|X − µ| ≥ kσ} ≤ 1k2=σ2µ2.We suppose that X is a random variable can be decomposed byn∑i=1Xiwhere Xi is the indicator random variable for an event Ai. We say X1, ..., Xnare symmetric if for every i 6= j there is an automorphism of the underlyingprobability space that sends the event Ai to the event Aj . For indices i, j,i ∼ j means the events Ai and Aj are not independent. Then we have thefollowing theorem.Theorem 3.12. [AS00b] Let X is a random variable can be decomposedbyn∑i=1Xi where Xi is the indicator random variable for an event Ai, andX1, ..., Xn are symmetric. Let∆∗ =∑i∼jPr{Aj |Ai}.Then,Var(X) ≤ E(X) · (1 + ∆∗).Proof. Note thatVar(X) =∑iVar(Xi) +∑i 6=jCov(Xi, Xj)whereCov(Xi, Xj) = E(XiXj)− E(Xi) · E(Xj).As Xi is the indicator random variable for the event Ai, thenVar(Xi) = Pr{Ai} · (1− Pr{Ai}).183.2. Erdo˝s-Re´nyi Random Graphs and the Probabilistic MethodClearly,Var(Xi) ≤ Pr{Ai} = E(Xi).Thus, ∑iVar(Xi) ≤ E(X).AsCov(Xi, Xj) = Pr{AiAj} − Pr{Ai}Pr{Aj}= Pr{Ai} ·(Pr{Aj |Ai} − Pr{Aj}),if Ai and Aj are independent, thenCov(Xi, Xj) = 0.Otherwise,Cov(Xi, Xj) ≤ Pr{Ai} · Pr{Aj |Ai}.Therefore, ∑i 6=jCov(Xi, Xj) ≤ ∆∗ ·(∑iPr{Ai})= ∆∗ · E(X).Moreover, in Chapter 5, we will apply Boole’s inequality (also called asthe union bound in discrete mathematics) and Fre´chet inequalities (or calledas Boole-Fre´chet inequalities).Theorem 3.13. [LB11] (Boole’s inequality). For a finite set of eventsA1, A2, ..., An,Pr{ n⋃i=1Ai}≤n∑i=1Pr{Ai}.Theorem 3.14. [LB11] (Fre´chet inequalities). For a finite set of eventsA1, A2, ..., An,Pr{ n⋂i=1Ai}≤ mini∈[1,n](Pr{Ai}) ≤ maxi∈[1,n](Pr{Ai}).Besides the above introduced inequalities, we also use Chernoff Boundand Azuma-Hoeffding inequality. We will show their descriptions and proofsin Appendix B.193.3. k-Metric Antidimension and (k, l)-Anonymity3.3 k-Metric Antidimension and (k, l)-AnonymityThe definition of k-metric antidimension comes from the concept of met-ric dimension in graph theory. To illustrate the metric dimension, we firstintroduce the concept of metric representation. Below is the formal defini-tion of metric representation in [TRY16b].Definition 3.15. [TRY16b] (Metric representation). Let G = (V,E) bea simple connected graph and dG(u, v) be the length of a shortest pathbetween the vertices u and v in G. For a set S = {u1, ..., ut} of vertices inV and a vertex v, we call the t-tuple r(v|S) := (dG(v, u1), ..., dG(v, ut)) themetric representation of v with respect to S.Given a simple connected graph G, let S be a set of vertices in G. Forany two vertices u, v /∈ S, if they have different metric representations withrespect to S, then we call S as a resolving set. The metric dimension of Gis the minimum cardinality of a resolving set.In [KRR96], S. Khuller et al. showed a proof of that the problem of find-ing the metric dimension of an arbitrary graph is NP-hard. In [EMRVY13],A. Estrada-Moreno et al. generalized the metric dimension by k-metric di-mension. A vertex set S is a k-metric generator if for any two verticesu, v /∈ S, they are distinguished by at least k vertices in S. The minimumcardinality of a k-metric generator is the k-metric dimension. They provedthat the problem of finding k-metric dimension is NP-hard.A resolving set is what an adversary wants to plant in a social net-work graph. To measure the resistance against adversaries’ privacy attackson anonymized social network graphs whose background knowledge is themetric representation, R. Trujillo-Rasua et al. introduced the concept ofk-antiresolving set. Below is the formal definition of k-antiresolving set in[TRY16b].Definition 3.16. [TRY16b] (k-antiresolving set). Let G = (V,E) be asimple connected graph and let S = {u1, ..., ut} be a subset of vertices of G.The set S is called a k-antiresolving set if k is the greatest positive integersuch that for every vertex v ∈ V \ S, there exist at least k − 1 differentvertices v1, ..., vk−1 ∈ V \ S with r(v|S) = r(v1|S) = ... = r(vk−1|S), i.e., vand v1 , ..., vk−1 have the same metric representation with respect to S.With the k-antiresolving set, R. Trujillo-Rasua et al. also defined thek-metric antidimension and k-antiresolving basis in [TRY16b].Definition 3.17. [TRY16b] (k-metric antidimension and k-antiresolvingbasis). The k-metric antidimension of a simple connected graph G = (V,E)203.4. Results and Related Worksis the minimum cardinality amongst the k-antiresolving sets in G and isdenoted by adimk(G). A k-antiresolving set of cardinality adimk(G) is calleda k-antiresolving basis for G.If an adversary controls some vertices in an anonymized network graph,the maximum probability that the adversary can distinguish others users bytheir metric representations with respect to the adversary’s controlled ver-tices is an important measure of privacy in the graph. The formal definitionof this measure is called (k, l)-anonymity [TRY16b].Definition 3.18. [TRY16b] ((k, l)-anonymity). A graph G meets (k, l)-anonymity with respect to active attacks if k is the smallest positive integersuch that the k-metric antidimension of G is lower than or equal to l.3.4 Results and Related WorksWe first studied the computational complexity of adimk(G) and (k, l)-anonymity. Also, we studied the behavior of k-antiresolving sets in G(n, p)where k and p are constants. Our main results are: (1) a reduction fromX3C problem to a decision version of the problem of computing adimk(G);(2) three bounds on the size of k-antiresolving sets in G(n, p) with constantp.The reduction proves that the decision version of the problem of comput-ing adimk(G) is NP-complete. Thus, the problem of computing adimk(G) isNP-hard. From this conclusion, we show that the (k, l)-anonymity problemis NP-complete.For G(n, p) random graphs, we use w.h.p. as the abbreviation of withhigh probability to mean that the occurrence probability of an event tendsto 1 when n tends to infinity. Then, we establish the following three boundsof the size of k-antiresolving sets in G(n, p).The first bound on the size of k-antiresolving sets is such an upper boundthat w.h.p. there is no k-antiresolving set where k ∈ O(1). The secondbound is such a lower bound that w.h.p. there is no k-antiresolving setwhere k ∈ ω(1). The last bound is such a lower bound that w.h.p. there isat least one k-antiresolving set where k ∈ O(1).By the first and the last bound, we establish a range of the k-metricantidimension where k ∈ O(1) in random graphs G(n, p).In [CDM+16], T. Chatterjee et al. also proved that the problem of com-puting adimk(G) is NP-hard independently. Moreover, they gave a O(n3)-time (1 + ln(n− 1))-approximation algorithm for 1-antiresolving basis.213.4. Results and Related WorksIn [MTRX16], S. Mauw et al. provided an efficient method to transforma graph G into another graph G′ such that G′ is not (1,1)-anonymity. Themethod is only based on the edge addition operation.In [TRY16a], R. Trujillo-Rasua et al. studied how to decide whether agraph is 1-metric antidimensional. They provided characterizations for 1-metric antidimensional trees and unicyclic graphs. Futhermore, they gaveefficient algorithms to decide whether these two types of graphs are 1-metricantidimensional.22Chapter 4The Complexity ofComputing adimk(G) and(k, l)-AnonymityIn this chapter, we prove that the problem of computing adimk(G) inan arbitrary simple connected graph is NP-hard. We prove this result by apolynomial-time reduction from X3C problem to a decision version of theproblem of computing adimk(G). Then, we show that the (k, l)-anonymityis also NP-complete.4.1 The Complexity of Computing k-MetricAntidimensionWe first introduce some notations used in the proof. Let G = (V,E) bea simple connected graph, S be a subset of V , and v be a vertex in V \ S.We defineNS (v) = {u : u ∈ S, u and v are neighbors}.For two vertices u, v in V \S, we use u =S v to mean r(u|S) = r(v|S) wherer(u|S) and r(v|S) are the metric representations of u and v with respect toS (see Definition 3.15). Moreover, we denote{v : v ∈ V \ S, r(v|S) = r′}by VS [r′] for a metric representation r′ with respect to S. Then, we showtwo properties of k-antiresolving sets in simple connected graphs.Proposition 4.1. Let S′ be a subset of a k-antiresolving set S in a simpleconnected graph G. If there is such a metric representation r with respect toS′ that 0 < |VS′ [r]| < k, then VS′ [r] ⊂ S.By the definition of k-antiresolving set, we know this proposition is true.234.1. The Complexity of Computing k-Metric AntidimensionProposition 4.2. Let G be a simple connected graph, S be a k-antiresolvingset in G with k ≥ 3, and Pn = {vp1 , ..., vpn} be a path in G satisfying : (1)the degree of the vertex vpn is equal to 1; (2) the degree of the vertex vpi isequal to 2 where i ∈ [2, n−1]. Then, Pn ⊆ S, if any vpi ∈ S where i ∈ [2, n].Proof. By the definition of k-antiresolving set, for a vertex v in S,|NV \S (v)| = 0or|NV \S (v)| ≥ k.Therefore, if a vertex vpi ∈ S where i ∈ [2, n], its neighborhood{vpi−1 , vpi+1} ⊂ S.Similarly, the neighborhood of vpi−1 is a subset of S if i − 1 > 1, and theneighborhood of vpi+1 is a subset of S if i + 1 < n. By repeating thisobservation, we get that Pn ⊆ S.We prove that the following decision version of the problem of com-puting adimk(G) is NP-complete: given two integers k and m, is there ak-antiresolving set of the cardinality is less than or equal to m in a simpleconnected graph G = (V,E)?Given a subset of vertices in V , in polynomial time of |V |, we can verifythat whether the cardinality of the subset is less than or equal to m, andwhether the subset is a k-antiresolving set. Therefore, this decision problembelongs to the class NP.We prove the NP-completeness by a reduction from the X3C problem.4.1.1 A Reduction from the X3C ProblemGiven an instance of the X3C problem: a set B = {e1, ..., e3q} and afamily S = {S1, ...,Sp} of 3-element subsets of B, we suppose p − q ≥ 12.If not, we create such a set B′ = {e3q+1, ..., e3q+36} and a family S ′ ={Sp+1, ...,Sp+24} of 3-element subsets of B′ that S ′ contains an exact coverfor B′. Then, we get a new instance of X3C with B ∪ B′ and S ∪ S ′ thatsatisfies our assumption. Clearly, S ∪ S ′ contains an exact cover for B ∪B′if and only if S contains an exact cover for B.Let n be such an integer that b(p − q)/3c < n < b(p − q)/2c. Weconstruct a simple connected graph G = (V,E) with |V | = 3qn(q + 2) + pfrom an instance of X3C with 3q elements and p subsets as follows. Notethat n < b(p − q)/2c. Thus, we can get G in polynomial time of p and q.Fig. 4.1 shows the gadget corresponding to a subset Si = {ea, eb, ec}.244.1. The Complexity of Computing k-Metric Antidimension1. Vertices(a) For each Si, we create a vertex vSi .(b) For each ei, we create n vertices vei,1 , ..., vei,n .(c) For each vei,j , we create q + 1 vertices vpi,j,1 , ..., vpi,j,q+1 .2. Edges(a) We create edges {vSi , vSj} where i 6= j.(b) We create edges {vei,j , vei′,j′} where i 6= i′ or j 6= j′.(c) For each vpi,j,1 , ..., vpi,j,q+1 , we create a path = {vpi,j,1 , ..., vpi,j,q+1}.(d) For each vei,j , we create edges {vei,j , vpi′,j′,1} where i 6= i′ or j 6= j′.(e) If a subset Si = {ea, eb, ec}, we create edges {vSi , vea,j}, {vSi , veb,j},and {vSi , vec,j} for all j ∈ [1, n].vea,1 .... vea,n veb,1 .... veb,n vec,1 .... vec,nvSivpa,1,1........vpa,1,q+1.............. vpa,n,1........vpa,n,q+1vpb,1,1........vpb,1,q+1.............. vpb,n,1........vpb,n,q+1vpc,1,1........vpc,1,q+1.............. vpc,n,1........vpc,n,q+1Figure 4.1: A gadget for a subset Si = {ea, eb, ec}254.1. The Complexity of Computing k-Metric Antidimension4.1.2 The NP-completeness Proof: Sufficient ConditionLemma 4.3. If the X3C problem has an exact cover Sc, there exists such a(p− q)-antiresolving set S in G that |S| ≤ q.Proof. Let VSc and VS be the sets of vertices corresponding to the subsets inSc and S. Then, there are three types of metric representations with respectto VSc :1. the q-tuple (1, ..., 1) for the vertex in VS \ VSc ;2. a q-tuple where only one element is 1 and the remaining elements are2 for the vertex vei,j ;3. the q-tuple (`+ 1, ..., `+ 1) for the vertex vpi,j,` .As Sc is an exact cover, then|VSc | = q and |VS \ VSc | = p− q.Thus, the number of vertices having the first type of metric representationswith respect to VSc is p− q.Clearly, for a vertex vei,j , there are 3n− 1 different vertices vei′,j′ wherevei′,j′ =VScvei,j . Moreover, for a vertex vpi,j,l , there are 3qn − 1 differentvertices vpi′,j′,l where vpi′,j′,l =VScvpi,j,l .As 3qn > 3n > p− q, then VSc is a (p− q)-antiresolving set.4.1.3 The NP-completeness Proof: Necessary ConditionLemma 4.4. If there is such a (p−q)-antiresolving set S in G that |S| ≤ q,the X3C problem has an exact cover.Proof. We claim three facts:1. the vertex vei,j /∈ S and the vertex vpi,j,l /∈ S;2. |S| = q;3. the metric representation of vei,j with respect to S has one elementvalued 1 and the remaining |S| − 1 elements valued 2.The vertex vpi,j,l where l ∈ [2, q + 1] is not in S. If not, by Proposition 4.2,the path {vpi,j,1 , ..., vpi,j,q+1} ⊆ S due to that p − q ≥ 12. But, the lengthof the path {vpi,j,1 , ..., vpi,j,q+1} is q + 1 which leads to a contradiction that|S| ≥ q + 1.264.1. The Complexity of Computing k-Metric AntidimensionSimilarly, the vertex vei,j is not in S. If not, we suppose that a vertexvei,j is in S. The metric representation of the corresponding vpi,j,q+1 withrespect to {vei,j} is (q+2), and only this vertex has the metric representation(q + 2) with respect to {vei,j}. Then, by Proposition 4.1, the vertex vpi,j,q+1should be in S which is a contradiction.Now, we consider two cases for the vertex vpi,j,1 :1. more than two vertices vpi,j,1 and vpi′,j′,1 are in S;2. only one vertex vpi,j,1 is in S.In Case 1, the metric representation of the corresponding vpi,j,2 withrespect to {vpi,j,1 , vpi′,j′,1} is (1, 3). Only this vertex has the metric represen-tation (1, 3) with respect to {vpi,j,1 , vpi′,j′,1}. Thus, by Proposition 4.1 again,the vertex vpi,j,2 should be in S which is a contradiction.In Case 2, as {vpi,j,1} is not a (p − q)-antiresolving set, there is at leasta vertex vSi′ in S. Similarly, only the corresponding vertex vpi,j,2 has themetric representation (1, 3) with respect to {vpi,j,1 , vSi′} which leads to thesame contradiction as Case 1.To prove |S| = q, we first suppose |S| < q. Then more than p−q verticesvSi /∈ S haver(vSi |S) = (1, ..., 1).Moreover, 3qn vertices vpi,j,l where l ∈ [1, q + 1] haver(vpi,j,l |S) = (l + 1, ..., l + 1).Only vertices vei,j may have the metric representations with respect to S dif-ferent from the above two metric representations with respect to S. However,for any vei,j , the number of vertices having the same metric representationof vei,j with respect to S is an integer multiple of n. Note thatbp− q3c < n < bp− q2cwhich means an integer multiple of n is not equal to p− q. Hence, S is not a(p− q)-antiresolving set that is a contradiction. Therefore, we have |S| ≥ q.As |S| ≤ q, then |S| = q.By the discussion in the above proof, the metric representation of vei,jwith respect to S should not be (1, ..., 1). Futhermore, ifr(vei,j |S) 6= (2, ..., 2),we can prove that there is only one element valued 1 in r(vei,j |S).274.2. The Computational Complexity of the (k, l)-Anonymity ProblemIf not, there are such three vertices va, vb, and vc in S thatr(vei,j |{va, vb, vc}) = (1, 1, 2).As S is a (p − q)-antiresolving set, there should be such 3n − 1 differentvertices vei′,j′ thatr(vei′,j′ |{va, vb, vc}) = (1, 1, 2).Because by the assumption of n, only 3n is greater than p − q. Let Saand Sb be the 3-element subsets in S corresponding to va and vb. By theconstruction of G, we get Sa = Sb which contradicts that no subsets in Sare equal.Also, we prove that r(vei,j |S) 6= (2, ..., 2). If not, let Ve be the set ofvertices vei,j that are adjacent to at least one vertex in S. For a vertexv ∈ Ve, r(v|S) has only one element valued 1, and there should be such3n − 1 different vertices v′ ∈ Ve that v′ =S v. As |Ve| < 3qn, at least onevertex in S is not adjacent to any vertex vei,j . Let S ′ be the 3-element subsetin S corresponding to this vertex in S that has no connection to any vei,j .Then, we have that S ′ = ∅ which contradicts that S ′ is a 3-element subsets.Let Sc be the subfamily of 3-element subsets in S corresponding to thevertices in S. By the three claims, we know that Sc is an exact cover forB.4.1.4 The Problem of Computing k-Metric Antidimensionis NP-hardTheorem 4.5. The problem of computing adimk(G) is NP-hard.Proof. Lemma 4.3 and 4.4 complete a polynomial-time reduction from theX3C problem to the decision version of the problem of computing adimk(G).As the X3C problem is NP-complete, the decision version of the problem ofcomputing adimk(G) is also NP-complete.Clearly, the answer of the decision version problem is true if and onlyif adimk(G) ≤ m. Therefore, the problem of computing adimk(G) is NP-hard.4.2 The Computational Complexity of the(k, l)-Anonymity ProblemCorollary 4.6. The (k, l)-anonymity problem is NP-complete.284.2. The Computational Complexity of the (k, l)-Anonymity ProblemProof. [CDM+16] shows that computing adim1(G) is NP-hard. Then, wegive an algorithm that can compute adim1(G) in O(|V |)-time of calling(k, l)-anonymity.Let L initialize by 1. Then, the algorithm does the following loop untilL = |V | − 1. In the loop, the algorithm checks whether G meets (1,L)-anonymity. If the answer is true, the algorithm stops the loop. Otherwise,the algorithm increases L by 1 and repeats the loop. After the loop, thealgorithm returns L.By the definition of the (k, l)-anonymity problem, the return value of thisalgorithm is adim1(G). Thus, the (k, l)-anonymity problem is NP-complete.29Chapter 5The k-Metric Antidimensionin Random GraphsAs shown in Corollary 4.6, it is hard to get an exact relationship betweenk and l. To estimate the relationship between these two parameters, westudy the k-metric antidimension in random graphs. We wish this studycould help us characterize the trade-off between the level of anonymity andthe minimum cost of achieving such level of anonymity. For k ∈ O(1) andk ∈ ω(1), we establish three bounds on the size of k-antiresolving sets inG(n, p) where p is constant.5.1 The Definition of Relaxed MetricRepresentationClearly, the shortest-path distance between different pairs of vertices arenot mutually independent in G(n, p). Thus, the analysis of k-metric antidi-mension is much more difficult than the analysis of the size of cliques andindependent sets in [JLR00]. To overcome the difficulty from the correlationamong distances, we introduce the concept of a relaxed metric representationand establish bounds on the size of k-antiresolving sets under the relaxedmetric representation. Then, we convert these bounds to the bounds on thesize of k-antiresolving sets under the standard metric representation, by tak-ing into the consideration of an observation on the diameter of the randomgraph G(n, p).In the relaxed metric representation, we use the relaxed shortest-pathdistance instead of the shortest-path distance.Definition 5.1. Given two vertices vi, vj in G(n, p), we define the relaxedshortest-path distance dij asdij ={1 : vi and vj are adjacent,∗ : otherwise.305.2. The First Bound on the Size of k-Antiresolving Sets5.2 The First Bound on the Size ofk-Antiresolving SetsFor a set S of vertices and a vertex v /∈ S, we denote the relaxed metricrepresentation of v with respect to S by r∗(v|S). Also, we denote the familyof metric representations and relaxed metric representations with respect toS by RS and R∗S .Given a G(n, p) where p is constant, letpm = min(p, 1− p),Cα = min[α22, (1 + α) ln(1 + α)− α], = ln[ (2 + β)Cα ln(1pm)ln2(n)] · [ln(n)]−1where α, β are arbitrary positive constants. We have the following theorem.Theorem 5.2. Given a random graph G(n, p) where p is constant, w.h.p.there is no k-antiresolving set S where k ∈ O(1) satisfying|S| ≤ (1− ) log 1pm(n)where pm = min(p, 1− p).Proof. Let S be such a subset of vetices in G(n, p) that|S| ≤ (1− ) log 1pm(n).Given a vertex v /∈ S and a relaxed metric representation r′∗ ∈ R∗S , we defineI∗v (r′∗, S) ={1 : r∗(v|S) = r′∗,0 : otherwiseandX∗r′∗(S) =∑v/∈SI∗v (r′∗, S).By the definition of G(n, p), the relaxed shortest-path distances between vand vertices in S are mutually independent. Therefore,E(I∗v (r′∗, S)) = pβ1(r′∗) · (1− p)|S|−β1(r′∗),E(X∗r′∗(S)) ≥ (n− |S|) · p|S|m315.2. The First Bound on the Size of k-Antiresolving Setswhere β1(r′∗) is the number of 1 in r′∗. By the assumptions of α and ,(n− |S|) · p|S|m ≥ n − |S| · n−1,n =(2 + β)Cα ln(1pm)ln2(n).Thus,min(E(X∗r′∗(S))) ∈ Ω(ln2(n)).We define E∗(r′∗, S) as the event|X∗r′∗(S)− E(X∗r′∗(S))| ≤ αE(X∗r′∗(S)).By Chernoff Bound, see Corollary A.1.14 in [JLR00],Pr{∼E∗(r′∗, S)} = Pr{|X∗r′∗(S)− E(X∗r′∗(S))| ≥ αE(X∗r′∗(S))}≤ 2e−CαE(X∗r′∗(S)).Let r′ be a metric representation with respect to S where the correspondingrelaxed representation of r′ is r′∗. We defineIv(r′, S) ={1 : r(v|S) = r′,0 : otherwiseandXr′(S) =∑v/∈SIv(r′, S).Let E(r′, S) be the event|Xr′(S)− E(X∗r′∗(S))| ≤ αE(X∗r′∗(S)).Let D≤2 be such the event that the diameter of G(n, p) is less than or equalto 2. We can rewrite the event∼E(r′, S) as(∼E(r′, S) ∩D≤2) ∪ (∼E(r′, S) ∩∼ D≤2).Note that the event∼E(r′, S) ∩D≤2 is exactly the event ∼E∗(r′∗, S) ∩D≤2.Therefore,Pr{∼E(r′, S)} = Pr{∼E∗(r′∗, S) ∩D≤2}+ Pr{∼E(r′, S) ∩∼ D≤2}≤ Pr{∼E∗(r′∗, S)}+ Pr{∼D≤2}.325.3. The Second Bound on the Size of k-Antiresolving SetsNote that two vertices have the shortest-path distance greater than 2 ifand only if they have no common neighbor. Let X be the number of pairsof vertices in G(n, p) that they have no common neighbor. By Markov’sinequality,Pr{∼D≤2} = Pr{X > 0} ≤ E(X) =(n2)(1− p2)n−2.Let E(S) be the event ⋂r′∈RSE(r′, S).By the union bound,Pr{E(S)} ≥ 1− n|S| ·[2e−Cα(n−|S|)·p|S|m +(n2)(1− p2)n−2].Note that (nk)≤ nkk!≤ nk√2pik(k/e)k≤ nk(k/e)k= (enk)k.LetA = |{S : S ⊂ V (G), |S| ≤ (1− ) log 1pm(n),∼E(S)}|.Then,limn→∞E(A) = limn→∞(n|S|)Pr{∼E(S)}≤ limn→∞(en|S|)|S| · n|S| ·[2e−Cα(n−|S|)·p|S|m +(n2)(1− p2)n−2]≤ limn→∞ 2 ·{en2|S| ·[eCαn−1n2+β+((n2)(1− p2)n−2) 1|S|]}|S|= 0.By Markov’s inequality, we get that limn→∞Pr{A = 0} = 1.5.3 The Second Bound on the Size ofk-Antiresolving SetsTheorem 5.3. Given a random graph G(n, p) with constant p, w.h.p. thereis no k-antiresolving set S where k ∈ ω(1) satisfying|S| ≥ log 12p2−2p+1(n).335.3. The Second Bound on the Size of k-Antiresolving SetsProof. We use Iv(r′, S) as the definition in the above theorem, as well aspm. Let S be such a subset of vetices in G(n, p) that|S| ≥ log 12p2−2p+1(n).We consider two cases of |S|:1. |S| ∈ Θ(n);2. |S| ∈ o(n).In Case 1,Pr{Iv(r′, S) = 1} ≤ (1− pm)|S|.Let E(r′, S) be the event ∑v/∈SIv(r′, S) ∈ ω(1).Thus,Pr{E(r′, S)} ≤(nω(1))(1− pm)|S|ω(1).Let E(S) be the event that S is a k-antiresolving set where k ∈ ω(1). ByFre´het inequalities,Pr{E(S)} = Pr{⋂r′∈RSE(r′, S)} ≤(nω(1))(1− pm)|S|ω(1).LetA = |{S : S ⊂ V (G), |S| ∈ Θ(n), E(S)}|.By the notation of Θ, there are such a n0 and a positive constant c1 thatwhen n > n0, then c1n ≤ |S|. Thus,limn→∞E(A) = limn→∞(n|S|)Pr{E(S)}≤ limn→∞[neω(1)( 11−pm )|S|2]ω(1)·[ne|S|( 11−pm )ω(1)2]|S|≤ limn→∞[ 1c1e( 11−pm )ω(1)2]|S|= 0which leads to limn→∞Pr{A = 0} = 1.345.3. The Second Bound on the Size of k-Antiresolving SetsIn Case 2, we claim the following fact: if |RS | ∈ Θ(n− |S|), then S is ak-antiresolving set of constant k. If not, for any r′ ∈ RS , we have∑v/∈SIv(r′, S) ∈ ω(1).Therefore, by Proposition A.7, the number of vertices /∈ S belongs toω(1) ·Θ(n− |S|) ∈ ω(n− |S|).This conclusion contradicts that the number of vertices /∈ S is exactly n−|S|.In Case 2, we can prove that E(|R∗S |) ∈ Θ(n− |S|)). Let r′∗ be a relaxedmetric representation in R∗S . We define IS(r′∗) asIS(r′∗) ={1 : ∃v /∈ S where r∗(v|S) = r′∗,0 : otherwise.Then,Pr{IS(r′∗) = 1} = 1−[1− pβ1(r′∗)(1− p)|S|−β1(r′∗)]n−|S|. (5.1)We denotepβ1(r′∗)(1− p)|S|−β1(r′∗)by Φ. By ex ≥ 1 + x, we have(1− x)n ≤ e−nxfor n > 0. By e−x ≤ 1− x+ x2/2 for x > 0, we have1− (1− x)n ≥ nx− (nx)22for x, n > 0. Applying this inequality to (5.1), we getPr{IS(r′∗) = 1} ≥ [Φ · (n− |S|)]−[Φ · (n− |S|)]22which leads toE(|R∗S |) ≥|S|∑β1(r′∗)=0( |S|β1(r′∗)){[Φ · (n− |S|)]− [Φ · (n− |S|)]22}≥ (n− |S|) · ( |S|∑β1(r′∗)=0Φ)− (n− |S|)22· ( |S|∑β1(r′∗)=0Φ2)= (n− |S|)− (n− |S|)22(2p2 − 2p+ 1)|S|≥ (n− |S|)− 12n2(2p2 − 2p+ 1)|S| − 12|S|2(2p2 − 2p+ 1)|S|.355.3. The Second Bound on the Size of k-Antiresolving SetsAs the assumption|S| ≥ log 12p2−2p+1(n),we getE(|R∗S |) ∈ Ω(n− |S|).Obviously,|R∗S | ≤ n− |S|.Then,E(|R∗S |) ∈ Θ(n− |S|)).To follow Azuma-Hoeffding inequality, we define mutually independentrandom variablesZv1 , ..., Zvn−|S|where v1, ..., vn−|S| /∈ S and Zvi = r∗(vi|S). Also, we define a functionf(Zv1 , ..., Zvn−|S|) = |R∗S |.If two vectors z, z′ ∈∏n−|S|i=1 r∗(vi|S) are different with only one coordinate,|f(z)− f(z′)| ≤ 1.Let α be an arbitary positive constant. By Azuma-Hoeffding inequality, seeCorollary 2.27 in [JLR00],Pr{∣∣|R∗S | − E(|R∗S |)∣∣ ≥ αE(|R∗S |)} ≤ 2e− (αE(|R∗S |))22(n−|S|) .AsE(|R∗S |) ∈ Θ(n− |S|),there are such a n0 and a positive constant c1 that when n > n0, thenc1(n− |S|) ≤ E(|R∗S |).Let E(S) be the event|RS | ∈ Θ(n− |S|)and E∗(S) be the event|R∗S | ∈ Θ(n− |S|).Clearly,|RS | ≥ |R∗S |.365.4. The Third Bound on the Size of k-Antiresolving SetsThus,Pr{∼E(S)} ≤ Pr{∼E∗(S)}which meanslimn→∞Pr{∼E(S)} ≤ 2e−Cα(n−|S|)where Cα = (αc1)2/2. LetA = |{S : S ⊂ V (G), |S| ≥ log 12p2−2p+1(n), |S| ∈ o(n),∼ E(S)}|.Then,limn→∞E(A) = limn→∞(n|S|)Pr{∼E(S)}≤ limn→∞ 2 ·( ne|S|)|S|e−Cα(n−|S|)= limn→∞ 2 ·[ e · n|S|eCα(n|S|−1)]|S|= 0.Thus, we have that limn→∞Pr{A = 0} = 1.5.4 The Third Bound on the Size ofk-Antiresolving SetsTheorem 5.4. Given a random graph G(n, p) with constant p, w.h.p. thereis at least one k-antiresolving set S where k ∈ O(1) satisfying|S| ≥ log 1pm(n)where pm = min(p, 1− p).Proof. As shown in the above theorem, we only need to consider the case|S| ∈ Θ(log 1pm(n)).We define rS∗ as such the relaxed metric representation with respect to SthatPr{I∗v (rS∗ , S) = 1} = p|S|m .Let E(S) be the event that S is a k-antiresolving set where k ∈ O(1) andE′(S) be the event that ∑v/∈SI∗v (rS∗ , S) = c375.4. The Third Bound on the Size of k-Antiresolving Setswhere c is constant. Clearly,E′(S) ⊆ E(S).LetA = |{S : S ⊂ V (G), |S| ≥ log 1pm(n), |S| ∈ Θ(log 1pm(n)), E′(S)}|.Then,E(A) =(n|S|)Pr{E′(S)} =(n|S|+ c)p|S|cm (1− p|S|m )n−|S|−c.Note thatnk≤ n− 1k − 1for n ≥ k that leads to (nk)≥(nk)k.As the assumption that p|S|m ≤ 1/n, then(1− p|S|m )n−|S|−c ≥ (1−1n)n−|S|−c ≥ (1− 1n)n ≥ 14for n ≥ 2 due to (1− 1n)n increases as n increases. Thus,E(A) ≥ 14·( npc|S|+ c)|S| · ( n|S|+ c)c.Then,limn→∞E(A) =∞.To provelimn→∞Pr{A = 0} = 0,we first proveVar(A) ∈ o((E(A))2).Letdm ={1 : p = pm,∗ : otherwise.To apply Theorem 3.16, for two subsets of vertices S and S′, we considertwo cases:385.4. The Third Bound on the Size of k-Antiresolving Sets1. |S| = |S′|, S 6= S′, S ∩ S′ 6= ∅;2. |S| = |S′|, S 6= S′, S ∩ S′ = ∅.In Case 1, let y = |S∩S′|, vu be a vertex /∈ S′\S, and vt be a vertex ∈ S′\S,thenPr{dut = dm|E′(S)} = Pr{dut = dm, vu ∈ S|E′(S)}+ Pr{dut = dm, vu /∈ S|E′(S)}≤ |S|n− |S|+ y + pm.Thus,Pr{r∗(v|S′) = rS′∗ |E′(S)} ≤ Pr{r∗(v|(S′ \ S)) = r(S′\S)∗ |E′(S)}≤ ( |S|n− |S|+ y + pm)|S′|−y.Letfm(y) =|S|n− |S|+ y + pmand z be the number of such vertices v /∈ S ∪ S′ thatr∗(v|S) = rS∗ and r∗(v|S′) = rS′∗ .Then,Pr{E′(S′)|E′(S)} ≤(nc− z)[fm(y)]c(|S′|−y).Thus,∑S,S′Pr{E′(S′)|E′(S)} ≤c∑z=0|S|−1∑y=1(nc− z)(n|S| − y)[fm(y)]c(|S|−y).By changing variables z to c− z, y to |S| − y, we havefm(y) =|S|n− y + pm,∑S,S′Pr{E′(S′)|E′(S)} ≤c∑z=0|S|−1∑y=1(nz)(ny)[fm(y)]cy.395.4. The Third Bound on the Size of k-Antiresolving SetsAs y ∈ O(ln(n)), there exists such a n0 that when n > n0, then(ny)[fm(y)]cy increases as y increases.Thus,limn→∞maxy[(ny)[fm(y)]cy]= limn→∞(n|S| − 1)[fm(|S| − 1)]c(|S|−1).Then, as the argument preceding in Theorem 3.16, we know∑S,S′Cov(S, S′) ≤ E(A) ·{c · nc · |S| ·(n|S| − 1)· [fm(|S| − 1)]c(|S|−1)}.Note that (nk)=nkk!· (1− 1n) · · · (1− k − 1n)≥ nkk!(1− k − 1n)k−1.As (1− (k − 1)/n)k−1 decreases as k increases, if k ≤ √n, then(1− k − 1n)k−1 ≥ (1− 1√n)√n ≥ 14.Thus, for k ≤ √n, we have (nk)≥ nk4k!.Therefore,E(A) ≥ n|S|+c16(|S|+ c)! · pc|S|m .Then,limn→∞∑S,S′ Cov(S, S′)[E(A)]2≤ limn→∞16 · c · |S| · (|S|+ c)1+c · (1 + |S|n−|S|+1 · 1pm )c(|S|−1)pcm · n= 0.In Case 2, let V ′ be the set of vertices v /∈ S′ where r∗(v|S′) = rS′∗ . Underthe case V ′ ∩ S = ∅, E′(S) and E′(S′) are independent events. Thus, thecorresponding covariance of E′(S) and E′(S′) is 0. If V ′ ∩ S 6= ∅, we definez = |V ′ ∩ S|.405.5. A Range of k-Metric Antidimension Where k ∈ O(1)Then,Pr{E′(S′)|E′(S)} ≤c∑z=1(|S|z)(n− 2|S|c− z)[fm(0)]c|S|.Therefore, in Case 2,∑S,S′Cov(S, S′) ≤ E(A) ·{( n|S|)· c · |S|c · nc−1 · [fm(0)]c|S|}.Hence,limn→∞∑S,S′ Cov(S, S′)[E(A)]2≤ limn→∞n|S||S|! · c · |S|c · nc−1 · [fm(0)]c|S|n|S|+c16(|S|+c)! · pc|S|m≤ limn→∞16c · |S|c · (|S|+ c)c · [1 + |S|n−|S| · 1pm ]c|S|n= 0.The observations from Case 1 and 2 lead to the following conclusion:Var(A) ∈ o([E(A)]2).Thus, by Theorem 3.15,limn→∞Pr{A = 0} = 0.As E′(S) ⊆ E(S), we knowlimn→∞Pr{⋃SE(S)}≥ limn→∞Pr{⋃SE′(S)}= limn→∞Pr{A > 0}= 1.5.5 A Range of k-Metric Antidimension Wherek ∈ O(1)By Theorem 5.2 and 5.4, we get the following corollary.415.5. A Range of k-Metric Antidimension Where k ∈ O(1)Corollary 5.5. Given a random graph G(n, p) with constant p, w.h.p.adimk(G) where k ∈ O(1) is between(1− ) log 1pm(n) and log 1pm(n)wherepm = min(p, 1− p), = ln[ (2 + β)Cα ln(1pm)ln2(n)] · [ln(n)]−1,Cα = min[α22, (1 + α) ln(1 + α)− α],α, β are arbitary positive constants.42Chapter 6ConclusionIn this thesis, we prove that the problem of computing adimk(G) is NP-hard and the (k, l)-anonymity problem is NP-complete. Also, to study therelationship between k and l, we establish three bounds on the size of k-antiresolving sets in G(n, p) with constant p. With the bounds, we establisha range of k-metric antidimension where k ∈ O(1) in random graphs G(n, p)with constant p when n tends to infinity. To some extent, Corollary 5.5shows that, when the size of an anonymized social network increases, anadversary can keep a probability which is independent on the network size tore-identify other anonymized users by adding only a few controlled vertices.Futhermore, for applications of checking whether an anonymized socialnetwork has the potential danger to leak user’s privacy, the three boundscan be seen as indicators if an adversary controls such many vertices.For the future study, we are looking for an exact algorithm better thanthe brute force algorithm to find a k-antiresolving basis in a simple connectedgraph. Besides that, a gap of Θ(ln(ln(n))) exists between the first boundand the third bound. We conjecture that log 1pm(n) is the sharp thresholdfor the appearance of k-antiresolving sets with constant k in G(n, p). Forthe second bound, we also conjecture that it has a lower value.Moreover, how to efficiently add noise (e.g., fake vertices and edges) intoanonymized social networks to against privacy attacks is another interestingtopic.43Bibliography[AB09] S. Arora and B. Barak. Computational Complexity: A ModernApproach. Cambridge University Press, New York, NY, USA,1st edition, 2009. → pages 12[Ale11] K. Aleksandra. Privacy violations using microtargeted ads: Acase study. Journal of Privacy and Confidentiality, 3(1), 2011.→ pages 4[Als11] G. Alsmeyer. Chebyshev’s Inequality, pages 239–240. SpringerBerlin Heidelberg, Berlin, Heidelberg, 2011. → pages 16[And96] R. Anderson. A security policy model for clinical informationsystems. In Proceedings of the 1996 IEEE Conference on Secu-rity and Privacy, SP’96, pages 30–43, Washington, DC, USA,1996. IEEE Computer Society. → pages 4[AS00a] R. Agrawal and R. Srikant. Privacy-preserving data mining.SIGMOD Rec., 29(2):439–450, 2000. → pages 5[AS00b] N. Alon and J. Spencer. The Probabilistic Method. WileyPublishing, 2th edition, 2000. → pages 17, 18, 55, 56, 57, 58,59[BDK07] L. Backstrom, C. Dwork, and J. Kleinberg. Wherefore artthou r3579x?: Anonymized social networks, hidden patterns,and structural steganography. In Proceedings of the 16th Inter-national Conference on World Wide Web, WWW ’07, pages181–190, New York, NY, USA, 2007. ACM. → pages 6[BJKL04] S. Bellman, E. Johnson, S. Kobrin, and G. Lohse. Internationaldifferences in information privacy concerns: A global surveyof consumer. The Information Society, 20:313–324, 2004. →pages 4[Buc04] J. Buchmann. Introduction to Cryptography (2nd ed.).Springer, 2004. → pages 744Bibliography[CCSZ13] B. Custers, T. Calders, B. Schemer, and T. Zarsky. Dis-crimination and Privacy in the Information Society, volume 3.Springer Berlin Heidelberg, 2013. → pages 5[CDM+16] T. Chatterjee, B. DasGupta, N. Mobasheri, V. Srinivasan,and I.G. Yero. On the computational complexities of threeprivacy measures for large networks under active attack.arXiv:1607.01438, 2016. → pages 21, 29[CJW+06] J. Carlson, A. Jaffe, A. Wiles, Clay Mathematics Institute,and American Mathematical Society. The Millennium PrizeProblems. American Mathematical Society, 2006. → pages 10[CLRS09] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introductionto Algorithms, Third Edition. The MIT Press, 3rd edition,2009. → pages 10, 11, 12, 16, 51, 52, 53[Coo71] A. Cook. The complexity of theorem-proving procedures. InProceedings of the Third Annual ACM Symposium on Theoryof Computing, STOC ’71, pages 151–158, New York, NY, USA,1971. ACM. → pages 11[Cop04] J. Copeland. The Essential Turing: Seminal Writings in Com-puting, Logic, Philosophy, Artificial Intelligence, and ArtificialLife: Plus the Secrets of Enigma. Oxford University Press,2004. → pages 7[CP02] W. Chung and J. Paynter. Privacy issues on the internet. InProceedings of the 35th Hawaii International Conference onSystem Sciences. IEEE Computer Society Press, 2002.→ pages4[Dal77] T. Dalenius. Towards a methodology for statistical disclosurecontrol. Statistik Tidskrift, 15(429-444):2–1, 1977. → pages 5[Dev02] K. Devlin. The Millennium Problems: The Seven GreatestUnsolved Mathematical Puzzles of Our Time. 2002. → pages11[DL17] D. Danks and A. London. Algorithmic bias in autonomous sys-tems. In Proceedings of the 26th International Joint Conferenceon Artificial Intelligence. 2017. → pages 545Bibliography[DMNS06] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibratingnoise to sensitivity in private data analysis. In Proceedingsof the Third Conference on Theory of Cryptography, TCC’06,pages 265–284, Berlin, Heidelberg, 2006. Springer-Verlag. →pages 5[Dwo06] C. Dwork. Differential privacy. In Proceedings of the 33rdInternational Conference on Automata, Languages and Pro-gramming - Volume Part II, ICALP’06, pages 1–12, Berlin,Heidelberg, 2006. Springer-Verlag. → pages 5[EGS03] A. Evfimievski, J. Gehrke, and R. Srikant. Limiting privacybreaches in privacy preserving data mining. In Proceedingsof the Twenty-second ACM SIGMOD-SIGACT-SIGART Sym-posium on Principles of Database Systems, PODS ’03, pages211–222, New York, NY, USA, 2003. ACM. → pages 5[EMRVY13] A. Estrada-Moreno, J. Rodr´ıguez-Vela´zquez, and I.G. Yero.The k-metric dimension of a graph. ArXiv e-prints, 2013. →pages 20[ER59] P. Erdo˝s and A. Re´nyi. On random graphs i. PublicationesMathematicae (Debrecen), 6:290–297, 1959. → pages 16[Gil59] E. Gilbert. Random graphs. Ann. Math. Statist., 30:1141–1144, 12 1959. → pages 16[GJ79] M. Garey and D. Johnson. Computers and Intractability; AGuide to the Theory of NP-Completeness. W. H. Freeman &Co., New York, NY, USA, 1979. → pages 12[GM84] S. Goldwasser and S. Micali. Probabilistic encryption. Jour-nal of Computer and System Science, 28(2):270–299, 1984. →pages 8, 9[Hea] Heartbleed [online, cited June 26, 2017]. → pages 1[Hol09] J. Holvast. History of privacy. In IFIP Advances in Informationand Communication Technology, volume 298. Springer, Berlin,Heidelberg, 2009. → pages 4[JLR00] S. Janson, T.  Luczak, and A. Rucin´ski. Random Graphs. Wiley,2000. → pages 30, 32, 36, 64, 6646Bibliography[JS05] H. Jones and J. Soltren. Facebook: Threats to privacy, 2005.→ pages 4[KRR96] S. Khuller, B. Raghavachari, and A. Rosenfeld. Landmarks ingraphs. Discrete Applied Mathematics, 70(3):217–229, 1996.→ pages 20[LB11] Z. Lin and Z. Bai. Probability Inequalities. Springer, Berlin,Heidelberg, 2011. → pages 19[Les] J. Leskovec. Stanford large network dataset collection [online,cited June 2, 2017]. → pages 6[LP87] D. Luciano and G. Prichett. Cryptology: From caesar ciphersto public-key cryptosystems. The College Mathematics Jour-nal, 18:2–17, 1987. → pages 7[LT08] K. Liu and E. Terzi. Towards identity anonymization ongraphs. In Proceedings of the 2008 ACM SIGMOD Inter-national Conference on Management of Data, SIGMOD ’08,pages 93–106, New York, NY, USA, 2008. ACM. → pages 6[Mat] J. Mathai. History of computer cryptography and secrecy sys-tems [online, cited May 30, 2017]. → pages 6[McL05] D.L. McLeish. Monte Carlo Simulation and Finance. WileyFinance. Wiley, 2005. → pages 60, 61, 62, 63[McS09] F. McSherry. Privacy integrated queries. In Proceedings ofthe 2009 ACM SIGMOD International Conference on Man-agement of Data (SIGMOD). Association for Computing Ma-chinery, Inc., June 2009. → pages 6[MOV01] A. Menezes, P. Oorschot, and S. Vanstone. Handbook of Ap-plied Cryptography (5th ed.). CRC Press, 2001. → pages 7[MTRX16] S. Mauw, R. Trujillo-Rasua, and B. Xuan. Counteracting Ac-tive Attacks in Social Network Graphs, pages 233–248. SpringerInternational Publishing, Cham, 2016. → pages 22[NHP11] M. Netter, S. Herbst, and G. Pernul. Analyzing privacy insocial networks - an interdisciplinary approach. In Proc. ofthe Third IEEE International Conference on Social ComputingWorkshop on Security and Privacy in Social Networks (SPSN47Bibliographyat SocialCom). IEEE Computer Society Press, Boston, USA,October 2011. → pages 4, 6[PLZW14] W. Peng, F. Li, X. Zou, and J. Wu. A two-stage deanonymiza-tion attack against anonymized social networks. IEEE Trans.Comput., 63(2):290–303, 2014. → pages 6[PRT08] D. Pedreshi, S. Ruggieri, and F. Turini. Discrimination-awaredata mining. In Proceedings of the 14th ACM SIGKDD Inter-national Conference on Knowledge Discovery and Data Min-ing, KDD ’08, pages 560–568. ACM, New York, NY, USA,2008. → pages 5[Riv90] R. Rivest. Handbook of theoretical computer science (vol. a).chapter Cryptography, pages 617–755. MIT Press, Cambridge,MA, USA, 1990. → pages 7[RSA78] R. Rivest, A. Shamir, and L. Adleman. A method for obtainingdigital signatures and public-key cryptosystems. Communica-tions of the ACM, 21(2):120–126, 1978. → pages 8[SANT14] K. Sellami, M. Ahmed-Nacer, and P. Tiako. From socialnetwork to semantic social network in recommender system.ArXiv e-prints, 2014. → pages 6[Sha49] C. Shannon. Communication theory of secrecy systems. BellSystem Technical Journal, 28(4):656–715, 1949. → pages 7[SIN] Heartbleed bug: RCMP asked revenue canada to delay newsof sin thefts [online, cited June 26, 2017]. → pages 1[SRN] A. Sokol and A. Rønn-Nielsen. Advanced Probability. Depart-ment of Mathematical Sciences, University of Copenhagen. →pages 60[Sta05] W. Stallings. Cryptography and Network Security Principlesand Practices, Fourth Edition. Prentice Hall, New Jersey, 2005.→ pages 8[Swe02] L. Sweeney. K-anonymity: A model for protecting privacy.Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 10(5):557–570, 2002. → pages 248Bibliography[Toc] A. Tockar. Riding with the stars: Passenger privacy in theNYC taxicab dataset [online, cited June 26, 2017]. → pages 1[TRY16a] R. Trujillo-Rasua and I.G. Yero. Characterizing 1-metric an-tidimensional trees and unicyclic graphs. The Computer Jour-nal, 59(8):1264–1273, 2016. → pages 22[TRY16b] R. Trujillo-Rasua and I.G. Yero. k-metric antidimension:A privacy measure for social graphs. Information Sciences,328:403–417, 2016. → pages iii, 2, 20, 21[Tur36] Alan M. Turing. On computable numbers, with an applica-tion to the Entscheidungsproblem. Proceedings of the LondonMathematical Society, 2(42):230–265, 1936. → pages 12[WB90] S. Warren and L. Brandeis. The right to privacy. Harvard LawReview, 4(5):193–220, December 1890. → pages 4[WW96] L. Willenborg and T. Waal. Statistical Disclosure Control inPractice, volume 111. Springer-Verlag, 1996. → pages 4[WYLC10] X. Wu, X. Ying, K. Liu, and L. Chen. A Survey of Privacy-Preservation of Graphs and Social Networks, pages 421–453.Springer US, Boston, MA, 2010. → pages 6[ZG17] C. Zhang and Y. Gao. On the complexity of k-metric antidi-mension problem and the size of k-antiresolving sets in randomgraphs. In Proceedings of Computing and Combinatorics 23rdInternational Conference, pages 555–567, Hong Kong, China,2017. Springer International Publishing AG 2017. → pages iv[ZP08] B. Zhou and J. Pei. Preserving privacy in social networksagainst neighborhood attacks. In Proceedings of the 2008 IEEE24th International Conference on Data Engineering, ICDE ’08,pages 506–515, Washington, DC, USA, 2008. IEEE ComputerSociety. → pages 6[ZPL08] B. Zhou, J. Pei, and W. Luk. A brief survey on anonymizationtechniques for privacy preserving publishing of social networkdata. SIGKDD Explor. Newsl., 10(2):12–22, 2008. → pages 649Appendix50Appendix AThe Asymptotic NotationsThe asymptotic notations describe different sets of functions which helpabstract away some details of functions. We use the asymptotic notationsto analyze the worst-case running time of algorithms or the behavior offunctions. For example, we say an2+bn and cn2 belong to the same set wherea, b, and c are constants. Because when n is big enough, n2 becomes thedominated factor of these two polynomials. Below is the formal definitionsof asymptotic notations in Chapter 3 of [CLRS09].A.1 Θ-NotationDefinition A.1. [CLRS09](Θ-notation). For a given function g(n), wedenote the set of funcions by Θ(g(n)) whereΘ(g(n)) = {f(n) : there exist such positive constants c1, c2, and n0 that0 ≤ c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0}.Because Θ(g(n)) is a set, we could use f(n) ∈ Θ(g(n)) to indicate afunction f(n) belongs to the set Θ(g(n)). If f(n) ∈ Θ(g(n)), we know thereexist such c1, c2, and n0 that f(n) is sandwiched between c1g(n) and c2g(n).Then we can say g(n) is an asymptotically tight bound for f(n). Moreover,Θ-notation has the symmetry property:f(n) ∈ Θ(g(n)) if and only if g(n) ∈ Θ(f(n)).This property can be proved directly from the definition of Θ-notation.Proof. If f(n) ∈ Θ(g(n)), there exist such positive constants c1, c2, and n0that0 ≤ c1g(n) ≤ f(n) ≤ c2g(n)when n ≥ n0. Then when n ≥ n0, we have0 ≤ 1c2f(n) ≤ g(n) ≤ 1c1f(n).As c1 and c2 are constants, 1/c1 and 1/c2 are also constants. Based on thedefinition of Θ-notation, we know g(n) ∈ Θ(f(n)).51A.2. O-Notation and Ω-NotationA.2 O-Notation and Ω-NotationWhen describing an asymptotic upper bound, we use O-notation.Definition A.2. [CLRS09](O-notation). For a given function g(n), wedenote the set of funcions by O(g(n)) whereO(g(n)) = {f(n) : there exist such positive constants c and n0 that0 ≤ f(n) ≤ cg(n) for all n ≥ n0}.Just like an asymptotic upper bound, we denote an asymptotic lowerbound by Ω-notation.Definition A.3. [CLRS09](Ω-notation). For a given function g(n), we de-note the set of funcions by Ω(g(n)) whereΩ(g(n)) = {f(n) : there exist such positive constants c and n0 that0 ≤ cg(n) ≤ f(n) for all n ≥ n0}.From the definitions of asymptotic notations Θ, O, and Ω, we can provethe following theorem in [CLRS09].Theorem A.4. [CLRS09] For two functions f(n) and g(n), we have f(n) ∈Θ(g(n)) if and only if f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)).Proof. If f(n) ∈ Θ(g(n)), we know there exist such positive constants c1, c2,and n0 that0 ≤ c1g(n) ≤ f(n) ≤ c2g(n)when n ≥ n0. Then we have f(n) ∈ O(g(n)) because there exist suchpositive constants c2 and n0 that0 ≤ f(n) ≤ c2g(n)when n ≥ n0. Similarly, as there exist such positive constants c1 and n0that0 ≤ c1g(n) ≤ f(n)when n ≥ n0, we know f(n) ∈ Ω(g(n)).If f(n) ∈ O(g(n)) and f(n) ∈ Ω(g(n)), there exist such positive constantsc2 and n2 that0 ≤ f(n) ≤ c2g(n)52A.3. o-Notation and ω-Notationwhen n ≥ n2, and there exist such positive constants c1 and n1 that0 ≤ c1g(n) ≤ f(n)when n ≥ n1. Let n0 = max(n1, n2), then0 ≤ c1g(n) ≤ f(n) ≤ c2g(n)when n ≥ n0.A.3 o-Notation and ω-NotationIn some cases, O-notation may be not asymptotically tight. As theexample in [CLRS09], O(n2) is asymptotically tight for 2n2, but it is notasymptotically tight for 2n. We denote an upper bound but not an asymp-totically upper bound by o-notation.Definition A.5. [CLRS09](o-notation). For a given function g(n), we de-note the set of funcions by o(g(n)) whereo(g(n)) = {f(n) : for any positive constant c, there exists such a positiveconstant n0 that 0 ≤ f(n) < cg(n) for all n ≥ n0}.If f(n) ∈ o(g(n)), the function f(n) becomes insignificant compared withg(n) when n tends to infinity. By the definition of a limit in preliminarycalculus, we getlimn→∞f(n)g(n)= 0.Similarly, we use ω-notation to denote a lower bound but not an asymptot-ically lower bound.Definition A.6. [CLRS09](ω-notation). For a given function g(n), we de-note the set of funcions by ω(g(n)) whereω(g(n)) = {f(n) : for any positive constant c, there exists such a positiveconstant n0 such that 0 ≤ cg(n) < f(n) for all n ≥ n0}.If f(n) ∈ ω(g(n)), we knowlimn→∞f(n)g(n)=∞.The asymptotical notations have the following property.53A.3. o-Notation and ω-NotationProposition A.7. If a function f(n) = f1(n) · f2(n) where f1(n) ∈ ω(1)and f2(n) ∈ Θ(g(n)), then f(n) ∈ ω(g(n)).Proof. By f2(n) ∈ Θ(g(n)), we have that there exist such positive constantsc1 and n1 that when n ≥ n1, f2(n) ≥ c1 · g(n). For any positive constantc, we can get a new constant c2 = c/c1. Then by f1(n) ∈ ω(1), there existssuch a positive constant n2 that when n ≥ n2, f1(n) > c2.Let n0 = max(n1, n2). Then, for any positive constant c there existssuch a positive constant n0 that when n ≥ n0,f(n) = f1(n) · f2(n) > c · g(n).By the definition of ω-notation, we have f(n) ∈ ω(g(n)).54Appendix BThe Probabilistic MethodIn this chapter, we introduce Chernoff Bound and Azuma-Hoeffding In-equality and give the proofs.B.1 Chernoff BoundWe suppose that the distribution X follows the following assumption:X1, ..., Xn are mutually independent indicator random variable withPr[Xi = 1− pi] = piPr[Xi = −pi] = 1− piwherep1, ..., pn ∈ [0, 1].We setp =p1 + ...+ pnnX = X1 + ...+Xn.Lemma B.1. [AS00b] For real numbers α, β where |α| ≤ 1,cosh(β) + α sinh(β) ≤ eβ2/2+αβ.Proof. Clearly, if α = 1 or α = −1, the above inequality is true. Note thateβ2/2−|β| ≤ eβ2/2+αβandcosh(β) + α sinh(β) ≤ 2e|β|.Thus, if|β| ≥ 100,the inequality is true.55B.1. Chernoff BoundAlso, we know thatf(0, β) ≤ 0,f(α, 0) = 0wheref(α, β) = cosh(β) + α sinh(β)− eβ2/2+αβ.We suppose that when |α| ≤ 1, α 6= 0 and |β| ≤ 100, β 6= 0, there exist suchα, β that f(α, β) > 0. Therefore, f(α, β) has a positive global maximum inthe rangeR = {(α, β)||α| ≤ 1, α 6= 0, |β| ≤ 100, β 6= 0}.Setting the partial derivatives equal to zero, we get∂f(α, β)∂α= sinh(β)− βeβ2/2+αβ = 0∂f(α, β)∂β= sinh(β) + α cosh(β)− (α+ β)eβ2/2+αβ = 0.Thus,tanh(β) = βthat meansβ = 0which is a contradiction.Corollary B.2. [AS00b] For real numbers θ, λ where θ ∈ [0, 1],θeλ(1−θ) + (1− θ)e−λθ ≤ eλ2/8.Proof. Settingθ =1 + α2andλ = 2β,we get Corollary B.2 from Lemma B.1.Theorem B.3. [AS00b] For positive real numbers a,Pr{|X| ≥ a} ≤ 2e−2a2/n.56B.1. Chernoff BoundProof. From Corollary B.2, we knowE(eλXi) = pieλ(1−pi) + (1− pi)e−λpi ≤ eλ2/8.Thus,E(eλX) =n∏i=1E(eλXi) ≤ eλ2n/8.For λ > 0, we knowPr{X ≥ a} = Pr{eλX ≥ eλa}.Applying Markov’s inequality, we getPr{eλX ≥ eλa} ≤ E(eλX)eλa≤ eλ2n/8−λa.Setting λ = 4a/n to optimize the inequality, we getPr{X ≥ a} ≤ e−2a2/n.By symmetry, we getPr{X ≤ −a} ≤ e−2a2/n.Lemma B.4. [AS00b]E(eλX) ≤ e−λpn[peλ + (1− p)]n.Proof.E(eλX) =n∏i=1E(eλXi) =n∏i=1[pieλ(1−pi) + (1− pi)e−λpi ]= e−λpnn∏i=1[pieλ + (1− pi)].With λ fixed, the functionf(x) = ln(xeλ + 1− x) = ln[x(eλ − 1) + 1]is concave. Thus, by Jensen’s Inequality,n∑i=1f(pi) ≤ nf(p).57B.1. Chernoff BoundExponentiating both sides, we haven∏i=1[pieλ + (1− pi)] ≤ [peλ + (1− p)]n.Corollary B.5. [AS00b] For positive real numbers a, λ,Pr{X ≥ a} ≤ e−λpn[peλ + (1− p)]ne−λa.Proof.Pr{X ≥ a} = Pr{eλX ≥ eλa}.By Markov’s inequality,Pr{eλX ≥ eλa} ≤ E(eλX)eλa.Now applying Lemma B.4, we get Corollary B.5.Settingλ = ln(1 + a/pn),we get Corollary B.6 by using the fact thatea = e(a/n)n ≥ (1 + a/n)n,Corollary B.6.Pr{X ≥ a} ≤ ea−pn ln(1+a/pn)−a ln(1+a/pn).Plugging a = (β − 1)pn into Corollary B.6, we getCorollary B.7.Pr{X ≥ (β − 1)pn} ≤ (eβ−1β−β)pn.Lemma B.8. For positive real numbers a,Pr{X ≤ −a} ≤ e−a2/2pn.58B.1. Chernoff BoundProof. Let λ > 0. By the argument preceding in Lemma B.4, we getE(e−λX) ≤ eλpn[pe−λ + (1− p)]n.Thus,Pr{X ≤ −a} ≤ eλpn[pe−λ + (1− p)]ne−λa.We apply the inequality1 + µ ≤ eµvalid for all µ, and then we getpe−λ + (1− p) = 1 + p(e−λ − 1) ≤ ep(e−λ−1).Thus,Pr{X ≤ −a} ≤ eλpn+np(e−λ−1)−λa = enp(e−λ−1+λ)−λa.We employ the inequalitye−λ ≤ 1− λ+ λ2/2for λ > 0. Therefore,Pr{X ≤ −a} ≤ enpλ22−λa.We set λ = a/np to optimize the above inequality and getPr{X ≤ −a} ≤ e−a2/2pnas claimed.Note that Y = X + pn can be interpreted as the number of successes inn independent trials when the probability of success in the i-th trial is pi.Clearly,E(Xi) = E(X) = 0.Thus,E(Y ) = E(X) + np = np.Then, we can get the following result.Theorem B.9. [AS00b] Let Y be the sum of mutually independent indicatorrandom variables, µ = E(Y ). For all constants  > 0,Pr{|Y − µ| > µ} < 2e−cµ,where c is a constant only depending on .Proof. Applying Corollary B.7 and Lemma B.8 withc = min(− ln(e(1 + )−(1+)), 2/2),we get Theorem B.9.59B.2. Azuma-Hoeffding InequalityB.2 Azuma-Hoeffding InequalityGiven a probability space (Ω,F , P ), we define G as a subset of F andsuppose X as a random variable on F . There are two equivalent versionsof the definition of the conditional expectation of X with respect to G anddenoted by E(X|G).Definition B.10. [SRN] If1. Y is measurable to G and2. E(Y IA) = E(XIA) for all A ∈ G,thenY = E(X|G).Definition B.11. [McL05] Assume E(X2) < ∞. Then, a G-measurablerandom variable Y is the conditional expectation of X with respect to G ifE[(X − Y )2] = infZE(X − Z)2where the infimum (infimum=greatest lower bound) is over all G-measurablerandom variables.Theorem B.12. [McL05] There exists an almost surely unique E(X|G).We do not show the proof of the above theorem in here; in the followingAppendix B.2, we show some properties of E(X|G).Proposition B.13. [McL05] If X is G-measurable, then E(X|G) = X.Proof. Note thatE(X − Z)2 ≥ E(X −X)2 = 0.Then, the minimizing Z is X.Proposition B.14. [McL05] If G = {∅,Ω}, then E(X|G) = E(X).Proof. As G = {∅,Ω}, we know that only a constant random variable ismeasurable with respect to G. Let c be a constant, minimizingE[(X − c)2] = Var(X) + [E(X)− c]2leads to c = E(X).60B.2. Azuma-Hoeffding InequalityProposition B.15. [McL05] For any square integrable G-measurable ran-dom variable Z,E(ZX) = E(ZE(X|G)).Proof. We define a function of λ byg(λ) = E[(X − E(X|G)− λZ)2].By Definition B.11, at λ = 0, g(λ) is minimized at all real values of λ. Thus,g′(0) = 0. Plugging λ = 0 into the equation g′(λ) = 0, we getE[Z(X − E(X|G)] = 0which leads toE(ZX) = E[ZE(X|G)].Setting Z = 1, we getProposition B.16. [McL05]E(X) = E(E(X|G)).Proposition B.17. [McL05] If a G-measurable random variable Z satisfiesE[(X − Z)Y ] = 0 for all other G-measurable random variables Y , thenZ = E(X|G).Proof. As Z satisfies E[(X − Z)Y ] = 0 for all other G-measurable randomvariables Y , we consider Y = E(X|G)− Z.We define a function of λg(λ) = E[(X − Z − λY )2]= E[(X − Z)2 − 2λE[(X − Z)Y ] + λ2E(Y 2)]= E((X − Z)2) + λ2E(Y 2)≥ E((X − Z)2) = g(0).We knowg(1) = E[(X − E(X|G))2]should be the minimum of g(λ) which means g(0) = g(1). The uniquenessshown in Theorem B.12 leads to Z = E(X|G).61B.2. Azuma-Hoeffding InequalityProposition B.18. [McL05] If Z is G-measurable, thenE(ZX|G) = ZE(X|G).Proof. Let Y be an arbitary G-measurable random variable. As Y and Zare G-measurable random variables, ZY is G-measurable. By PropositionB.15,E(ZY X) = E[ZY E(X|G))]which meansE[(ZX − ZE(X|G))Y ] = 0.Thus, by Proposition B.17,E(ZX|G) = ZE(X|G).Proposition B.19. [McL05]E(X + Y |G) = E(X|G) + E(Y |G),E(cX + d|G) = cE(X|G) + d where c and d are constants .Proof. Let Z be an arbitary G-measurable random variable. By PropositionB.15,E[Z(X + Y − E(X|G)− E(Y |G))] = E[Z(X − E(X|G))]− E[Z(Y − E(Y |G))]= 0− 0 = 0Then, by Proposition B.17,E(X + Y |G) = E(X|G) + E(Y |G).With the similar argument, we can proveE(cX + d|G) = cE(X|G) + d.Proposition B.20. [McL05] If H ⊂ G, thenE[E(X|G)|H] = E(X|H).62B.2. Azuma-Hoeffding InequalityProof. Let Z be an arbitary H-measurable random variable. As H ⊂ G,then Z, E(X|H), and E[E(X|G)|H] are G-measurable.E[Z(E(X|H)− E[E(X|G)|H])] = E[ZE(X|H)− ZE[E(X|G)|H]]= E[E(ZX|H)− E[ZE(X|G)|H]]= E[E(ZX − ZE(X|G)|H)]= E(ZX − ZE(X|G))= 0.Then, by Proposition B.13 and B.17,E[E(X|H)|G] = E[E(X|G)|H],E[E(X|H)|G] = E(X|H).Proposition B.21. [McL05] If X ≤ Y , then E(X|G) ≤ E(Y |G).Proof. We suppose that there exist such ω ∈ Ω that{ω : E(Y (ω)−X(ω)|G) < 0} ⊂ G.Let > maxω(E(Y (ω)−X(ω)|G))whereω ∈ {ω : E(Y (ω)−X(ω)|G) < 0}be a negative constant. Then, for those ω, we give a new assignment ofE(Y (ω)−X(ω)|G) whereE(Y (ω)−X(ω)|G) = .Note that E(Y (ω)−X(ω)|G) is still G-measurable. Then, we get a lowerE[(Y −X)− E(Y −X|G)]2which is a contradiction.In the rest of Appendix B.2, we introduce Azuma-Hoeffding inequality.We begin by the definition of a martingale.63B.2. Azuma-Hoeffding InequalityDefinition B.22. [JLR00] Given a probability space (Ω,F ,P) and an in-creasing sequence of sub-ω-fieldsF0 = {∅, ω} ⊆ F1 ⊆ ... ⊆ Fn,a sequence of random variables X0, X1, ..., Xn (with finite expections) iscalled a martingale if for each k = 0, 1, ..., n− 1,E(Xk+1|Fk) = Xk.Proposition B.23. [JLR00] E(Xk+1) = E(Xk).Proof. Note that Xk is Fk-measurable,E(Xk) = E[E(Xk|Fk)].As E(Xk+1|Fk) = Xk,E(Xk) = E[E(Xk+1|Fk)].Thus,E[E(Xk+1 −Xk|Fk)] = 0.By Proposition B.16, we getE(Xk+1 −Xk) = 0.Theorem B.24. [JLR00] If (Xk)n0 is a martingale with Xn = X and X0 =E(X), and there exist such constants ck > 0 that|Xk −Xk−1| ≤ ckfor each k ≤ n, then, for every t > 0,Pr{X ≥ E(X) + t} ≤ exp(− t22∑nk=1 c2k),Pr{X ≤ E(X)− t} ≤ exp(− t22∑nk=1 c2k).Proof. We setYk = Xk −Xk−1,Sk =k∑i=1Yi = Xk −X0.64B.2. Azuma-Hoeffding InequalityFor any u > 0, by Markov’s inequality,Pr{X ≥ E(X) + t} = Pr{Sn ≥ t} ≤ e−utE(euSn).Because Sn−1 is a Fn−1-measurable random variable,E(euSn) = E[E(euSn |Fn−1)] = E[euSn−1E(euYn |Fn−1)]. (B.1)Now we use the fact: if a random variable Y satisfies |Y | ≤ a for somepositive a, then, for any u, by the convexity of euY ,euY ≤ a+ Y2aeua +a− Y2ae−ua = cosh(ua) +Yasinh(ua).Hence, by cosh(β) ≤ eβ2/2,euY ≤ eu2a2/2 + Yasinh(ua).Thus,E(euYn |Fn−1) ≤ eu2cn2/2 + E(Yn|Fn−1)cnsinh(ucn).By the definition of a martingale,E(Yn|Fn−1) = E(Xn −Xn−1|Fn−1)= E(Xn|Fn−1)− E(Xn−1|Fn−1)= E(Xn|Fn−1)−Xn−1= 0.Thus,E(euYn |Fn−1) ≤ eu2cn2/2.Plugging it back to (B.1), we getE(euSn) ≤ eu2cn2/2E(euSn−1).Iterating this inequality n times, we getE(euSn) ≤ eu2∑ni=1 ci2/2.Thus,Pr{X ≥ E(X) + t} ≤ e−uteu2∑ni=1 ci2/2.65B.2. Azuma-Hoeffding InequalitySetting u = t/∑ni=1 ci2, we getPr{X ≥ E(X) + t} ≤ exp(− t22∑nk=1 c2k).By the symmetry, we getPr{X ≤ E(X)− t} ≤ exp(− t22∑nk=1 c2k).Corollary B.25. [JLR00] Let Z1, ..., ZN be independent random variables,with Zk taking values in a set Λk. Assume a function f :Λ1 × Λ2 × ...× ΛN → Rsatisfies the following condition for some constants ck :if two vectors z, z′ ∈N∏i=1Λi differ only in the kth coordinate, then|f(z)− f(z′)| ≤ ck.Then, the random variable X = f(Z1, ...ZN ) satisfies, for any t ≥ 0,Pr{X ≥ E(X) + t} ≤ exp(− t22∑nk=1 c2k),Pr{X ≤ E(X)− t} ≤ exp(− t22∑nk=1 c2k).Proof. Let Fk be the σ-fields generated by Z1, ...ZK . We define Xk =E(f(Z1, ...ZN )|Fk), k = 0, 1, ..., N . Then,E(Xk+1|Fk) = E(E(f(Z1, ...ZN )|Fk+1)|Fk)= E(E(f(Z1, ...ZN )|Fk)|Fk+1)= E(Xk|Fk+1)= Xk.Thus, (Xk)N0 is a martingale, and X0 = E(X), XN = X. To apply TheoremB.24, we should prove that |Xk+1−Xk| ≤ ck+1. Let Z ′k+1 be an independent66B.2. Azuma-Hoeffding Inequalitycopy of Zk+1 and X′ = f(Z1, ..., Z ′k+1, ..., ZN ). Then,|Xk+1 −Xk| = |E(X|Fk+1)− E(X|Fk)|= |E(X|Fk+1)− E(X ′|Fk)| (by E(X|Fk) = E(X ′|Fk))= |E(X|Fk+1)− E(X ′|Fk+1)| (by E(X ′|Fk) = E(X ′|Fk+1))= |E(X −X ′|Fk+1)|≤ ck+1 (by |X −X ′| ≤ ck+1)The corollary now follows Theorem B.24.67


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items