UBC Theses and Dissertations

Statistical signal processing on dynamic graphs with applications in social networks. Hamdi, Maziyar (2015)

Statistical Signal Processing on Dynamic Graphs with Applications in Social Networks

by

Maziyar Hamdi

M.A.Sc., The University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

December 2015

© Maziyar Hamdi 2015

Abstract

Due to the proliferation of social networks and their significant effects on our day-to-day activities, there has been growing interest over the past decade in modeling and analyzing the behavior of agents in social networks. The unifying theme of this thesis is to develop a set of mathematical theories and algorithmic tools for estimation and sensing problems over graphs, with applications to social networks.

The first part of this dissertation is devoted to the multi-agent Bayesian estimation and learning problem in social networks. We consider a set of agents that interact over a network to estimate an unknown parameter called the state of nature. As a result of the recursive nature of Bayesian models and the correlation introduced by the structure of the underlying communication graph, information collected by one agent can be mistakenly treated as independent; this mis-information propagation is also known as data incest. This part presents data incest removal algorithms that ensure complete mitigation of the mis-information associated with the agents' estimates under two different information exchange patterns: first, a scenario where beliefs (posterior distributions of the state of nature) are transmitted over the network; second, a social learning context where agents map their local beliefs into a finite set of actions and broadcast these actions to other agents.
We also present a necessary and sufficient condition on the structure of the information flow graph that mitigates mis-information propagation.

The second part of the thesis considers a Markov-modulated duplication-deletion random graph where, at each time instant, one node can either join or leave the network; the probabilities of joining or leaving evolve according to the realization of a finite-state Markov chain. This part presents two results. First, motivated by social network applications, the asymptotic behavior of the degree distribution is analyzed. Second, a stochastic approximation algorithm is presented to track the empirical degree distribution as it evolves over time. The tracking performance of the algorithm is analyzed in terms of mean square error, and a functional central limit theorem is presented for the asymptotic tracking error.

Preface

The work presented in this thesis is based on research and development conducted in the Statistical Signal Processing Laboratory at the University of British Columbia (Vancouver). The research presented in the chapters of this dissertation was performed by the author, with feedback and assistance provided by Prof. Vikram Krishnamurthy. The author is responsible for the write-up, problem formulation, research development, data analyses, and numerical studies presented in this dissertation, with frequent suggestions and technical and editorial feedback from Prof. Vikram Krishnamurthy. The Hilbert space analysis of Chapter 4 is due in part to Prof. George Yin. The dataset for the psychology experiment presented in Chapter 3 was obtained from Prof. Alan Kingstone and Dr. Grayden Solman. For the psychology experiment, informed consent was obtained from all participants, and all experimental procedures and protocols were reviewed and approved by the University of British Columbia Behavioral Research Ethics Board (UBC BREB). The UBC BREB approval number is H10-00527.
The work presented in different chapters of this thesis has appeared in the publications listed below. In these publications, all co-authors contributed to the editing of the manuscript.

• The work of Chapter 2 has been presented in the following publications:

– [Journal Paper] V. Krishnamurthy and M. Hamdi, "Mis-information Removal Algorithms on Social Networks: Constrained Estimation on Random Graphs," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 333–346.

– [Conference Paper] V. Krishnamurthy and M. Hamdi, "Data Fusion and Mis-information Removal in Social Networks," IEEE Conference on Information Fusion, Singapore, July 2012.

• Materials in Chapter 3 have appeared in the following publications and preprints:

– [Book] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, Interactive Sensing and Decision Making in Social Networks. Now Publishers Inc., Hanover, MA, 2014.

– [Journal Paper] M. Hamdi and V. Krishnamurthy, "Removal of Data Incest in Multi-agent Social Learning in Social Networks," preprint, arXiv:1309.6687.

– [Report] M. Hamdi, G. Solman, A. Kingstone, and V. Krishnamurthy, "Social Learning in a Human Society: An Experimental Study," preprint, arXiv:1408.5378.

• The work of Chapter 4 has been presented in the following publications:

– [Book] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, Interactive Sensing and Decision Making in Social Networks. Now Publishers Inc., Hanover, MA, 2014.

– [Journal Paper] M. Hamdi, V. Krishnamurthy, and G. Yin, "Tracking a Markov-Modulated Stationary Degree Distribution of a Dynamic Random Graph," IEEE Transactions on Information Theory, vol. 60, no. 10, pp. 6609–6625.

– [Conference Paper] M. Hamdi and V. Krishnamurthy, "A Novel Use of Stochastic Approximation Algorithms for Estimating Degree of Each Node in Social Networks," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'12), Kyoto, Japan, March 2012.

– [Conference Paper] M. Hamdi and V. Krishnamurthy, "The Asymptotics of Duplication-deletion Random Graphs," 44th Asilomar Conference on Signals, Systems and Computers, Nov. 2010.

• Although not presented in this thesis, the discussion and results about social networks that appeared in the following journal article were inspired by the work presented in this thesis:

– [Journal Paper] A. Leshem, M. Hamdi, and V. Krishnamurthy, "Boundary Value Problems in Consensus Networks," submitted to IEEE Transactions on Signal Processing, arXiv preprint arXiv:1407.7170.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
Dedication

1 Introduction
   1.1 Overview
      1.1.1 Bayesian Estimation over Social Networks
      1.1.2 Interactive Social Learning over Networks
      1.1.3 Tracking Degree Distribution of Social Networks
   1.2 Main Contributions
      1.2.1 Bayesian Estimation over Social Networks
      1.2.2 Interactive Social Learning over Networks
      1.2.3 Tracking Degree Distribution of Social Networks
   1.3 Related Works
      1.3.1 Bayesian Estimation over Social Networks
      1.3.2 Interactive Social Learning over Networks
      1.3.3 Tracking Degree Distribution of Social Networks
   1.4 Thesis Outline

Part I Estimation and Learning Over Directed Acyclic Graphs

2 Constrained Estimation Over Random Graphs
   2.1 Introduction
      2.1.1 Chapter Goals
      2.1.2 Main Results and Organization of Chapter
   2.2 Modeling Information Flow in Social Networks
      2.2.1 Constrained Information Flow Protocol
      2.2.2 Benchmark Full Information Flow Protocol
      2.2.3 Modeling Time Evolution of the Information Flow
   2.3 Optimal Mis-information Propagation Removal Algorithm
      2.3.1 Optimal Combination Scheme in Constrained Information Flow Protocol
   2.4 Sub-optimal Mis-information Removal Algorithm Without Complete Knowledge of Information Flow Graph
      2.4.1 Sub-optimal Combination Scheme
   2.5 Numerical Examples
      2.5.1 Example and Intuition on Theorems 2.2.1 and 2.3.2
      2.5.2 Numerical Examples Illustrating Alg. A in Estimation Problem (2.5)
      2.5.3 Numerical Examples Illustrating Alg. B in Estimation Problem (2.14)
   2.6 Closing Remarks
   2.7 Proof of Theorems
      2.7.1 Proof of Theorem 2.2.1
      2.7.2 Proof of Theorem 2.3.1
      2.7.3 Proof of Theorem 2.3.2

3 Mis-information Management Problem in Social Learning Over Directed Graphs
   3.1 Introduction
      3.1.1 Social Learning Protocol on Network
      3.1.2 Chapter Goals
      3.1.3 Main Results and Organization of Chapter
   3.2 Social Learning Over Social Networks
      3.2.1 Constrained Social Learning in Social Networks
   3.3 Data Incest Removal Algorithm
      3.3.1 The Idealized Benchmark for Data Incest Free Social Learning in Social Networks
      3.3.2 The Data Incest Free Belief in the Idealized Social Learning Protocol 2
      3.3.3 Data Incest Removal Algorithm for Problem (3.9) With Constrained Social Learning Protocol 1
      3.3.4 Discussion of Data Incest Removal in Social Learning
   3.4 Numerical Examples
   3.5 Psychology Experiment
      3.5.1 Experiment Setup
      3.5.2 Experimental Results
   3.6 Closing Remarks
   3.7 Proof of Results
      3.7.1 Proof of Lemma 3.2.1
      3.7.2 Proof of Theorem 3.3.1
      3.7.3 Proof of Theorem 3.3.2

Part II Tracking Degree Distribution in Dynamic Social Networks

4 Tracking a Markov Modulated Degree Distribution
   4.1 Introduction
      4.1.1 Chapter Goals
      4.1.2 Main Results and Organization of Chapter
   4.2 Markov-modulated Dynamic Random Graph of Duplication-deletion Type
   4.3 Asymptotic Degree Distribution Analysis for Non-Markov Modulated Case
      4.3.1 Fixed Size Random Graph
      4.3.2 Power Law Exponent for Infinite Duplication-deletion Random Graph
   4.4 Tracking the Degree Distribution of the Fixed Size Markov-modulated Random Graph
      4.4.1 Tracking Error of the Stochastic Approximation Algorithm
      4.4.2 Limit System of Regime-Switching Ordinary Differential Equations
      4.4.3 Scaled Tracking Error
   4.5 Estimating the Degree Distribution of Infinite Duplication-deletion Random Graphs
      4.5.1 Infinite Random Graphs without Markovian Dynamics
      4.5.2 Markov-modulated Probability Mass Functions with Denumerable Support
   4.6 Numerical Examples
   4.7 Closing Remarks
   4.8 Proof of Results
      4.8.1 Proof of Theorem 4.3.1
      4.8.2 Proof of Theorem 4.3.2
      4.8.3 Proof of Lemma 4.8.1
      4.8.4 Proof of Lemma 4.8.2
      4.8.5 Proof of Theorem 4.4.1
      4.8.6 Sketch of the Proof of Theorem 4.4.2
      4.8.7 Sketch of the Proof of Theorem 4.4.3
      4.8.8 Proof of Theorem 4.5.2
      4.8.9 Proof of Theorem 4.5.3
      4.8.10 Proof of Theorem 4.5.4

5 Conclusions
   5.1 Summary of Findings in Part I
   5.2 Summary of Findings in Part II
   5.3 Directions for Future Research and Development

Bibliography

A Some Graph Theoretic Definitions

Appendices

B A Note on Degree-based Graph Construction
List of Tables

3.1 The frequency of the internal and the boundary nodes in a community of 3316 undergraduate students of the University of British Columbia, along with statistics of the time (in milliseconds) required by participants of both types to make their judgments.

List of Figures

1.1 Example of a network of six agents (social sensors) that aim to estimate a parameter (state of nature) interactively; each edge depicts a communication link between two sensors.

1.2 Tracking the underlying state of nature using a Markov-modulated random graph as a social sensor.

1.3 Main results and organization of the thesis.

2.1 Example of a constrained information flow network with S = 2 and K = 3. Circles represent a social group at a specific time indexed by (2.4) in the social network, and each edge depicts a communication link between two nodes.

2.2 The conditional mean of the state of nature given the observations, in estimation with the optimal mis-information removal algorithm compared to the full information network.

2.3 Comparison of the mean squared errors of the estimates obtained by the optimal mis-information removal algorithm, the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation).

2.4 Comparison of the conditional mean of the state of nature x given the observations, obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "accurate estimation" scenario (β = 0.2).

2.5 Comparison of the mean squared errors of the estimates obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "accurate estimation" scenario (β = 0.2).

2.6 Comparison of the conditional mean of the state of nature x given the observations, obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "inaccurate estimation" scenario (β = 0.8).

2.7 Comparison of the mean squared errors of the estimates obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "inaccurate estimation" scenario (β = 0.8).

3.1 Two examples of multi-agent social learning in social networks: (i) target localization, and (ii) online rating and review systems.

3.2 Example of a communication graph with two agents (S = 2) over three event epochs (K = 3). The arrows represent exchange of information regarding actions taken by agents.

3.3 Protocol 1: Constrained social learning in social networks described in Section 3.1.1. As a result of random (unknown) communication delays, data incest arises.

3.4 Protocol 2: Idealized benchmark social learning in social networks. In this protocol, the complete history of actions chosen by agents and the communication graph are known; hence, data incest does not arise. This benchmark protocol is used to design the data incest removal protocol.

3.5 Two examples of networks: (a) satisfies the topological constraint, and (b) does not satisfy the topological constraint.

3.6 Data incest removal algorithm employed by the network administrator in the state estimation problem over a social network. The underlying state of nature could be the geographical coordinates of an event (target localization problem) or the reputation of a social unit (online rating and review systems).

3.7 Three different communication topologies: (a) a communication graph with 41 nodes; (b) agents interact on a fully interconnected graph, and information from one agent reaches the other agents after a delay chosen randomly from {1, 2} with equal probabilities; (c) a star-shaped communication topology with random delay chosen from {1, 2}.

3.8 Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7a.

3.9 Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7a.

3.10 Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7c.

3.11 Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7c.

3.12 Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7b.

3.13 Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4, with the communication graph depicted in Fig. 3.7b.

3.14 Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4, with an arbitrary communication graph.

3.15 Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4, with an arbitrary communication graph.

3.16 Mean squared error of the estimates (of the state of nature) obtained with social learning, with the communication graph depicted in Fig. 3.7a.

3.17 Mean squared error of the estimates (of the state of nature) obtained with social learning, with the communication graph depicted in Fig. 3.7b (complete, fully interconnected graph).

3.18 Mean squared error of the estimates (of the state of nature) obtained with social learning, with the communication graph depicted in Fig. 3.7c (star-shaped communication graph).

3.19 Mean squared error of the estimates (of the state of nature) obtained with social learning, with an arbitrary communication graph.

3.20 Two arrays of circles were shown to each pair of participants on a screen. Their task was to interactively determine which side (left or right) had the larger average diameter. The partner's previous decision was displayed on screen prior to the stimulus.

3.21 Actions of two participants in a group at different trials in one experiment.

3.22 Two scenarios where data incest arose in our experimental studies.

3.23 Social learning with data incest as exercised by groups of students who were asked to perform a conceptual task in our experimental study.

3.24 Actions of two participants in a group at different epochs. Participant 1 can be considered an internal node and Participant 2 a boundary node.

3.25 Optimal weights (which depend on the topology of the communication graph) and the set of available public beliefs are computed in separate units. The user can then compute the most up-to-date, data-incest-free network belief.

4.1 The power law component for the non-Markovian random graph generated according to Procedure 4.5, obtained by (4.11) for different values of p and q in Procedure 4.5.

4.2 The degree distribution of the duplication-deletion random graph satisfies a power law. The parameters are specified in Example 4.6.1 of Section 4.6.

4.3 Degree distribution of the fixed size duplication-deletion random graph. The parameters are specified in Example 4.6.2 of Section 4.6.

4.4 The degree distribution of the fixed size duplication-deletion random graph satisfies a power law when N0 is sufficiently large. The parameters are specified in Example 4.6.2 of Section 4.6.

4.5 The estimates obtained by the SA algorithm (4.14) follow the expected PMF precisely, with no knowledge of the Markovian dynamics. The parameters are specified in Example 4.6.3.

4.6 The average degree of nodes (as a measure of connectivity) of the fixed size Markov-modulated duplication-deletion random graph obtained by Procedure 4.5, for different values of the probability of connection p in Algorithm 4.5. The parameters are specified in Example 4.6.4 of Section 4.6.

4.7 Trace of the covariance matrix of the scaled tracking error, trace(Σ(θ)), versus the average degree of nodes as a measure of connectivity of the network. The parameters are specified in Example 4.6.3 of Section 4.6.

4.8 Trace of the covariance matrix of the scaled tracking error, trace(Σ(θ)), versus the order of delay in the searching problem as a measure of searchability of the network. The parameters are specified in Example 4.6.3 of Section 4.6.

Acknowledgments

I owe my sincere gratitude to my academic advisor, Prof. Vikram Krishnamurthy. I am deeply indebted to him for every bit of guidance, encouragement, and support that I received during the completion of this work.
This feat would not have been possible without his consistent support, fruitful discussions, valuable feedback, and constructive suggestions. I hope that someday I can be as enthusiastic and energetic as he is, and advise an audience as well as he can.

I would like to express my thanks to Prof. George Yin for his time, effort, suggestions, and helpful comments during our collaboration over the past years, which resulted in much improvement of our work. I would also like to extend my thanks to my friends and colleagues in the Statistical Signal Processing Lab for creating a friendly and productive environment.

Furthermore, heartfelt and deep thanks go to my family, and particularly my parents, for all the love, patience, sacrifices, assistance, and support. They have always encouraged me and given their unconditional support throughout my studies at the University of British Columbia. Without their support I could not have completed this thesis, and it is to them that I dedicate it.

Maziyar Hamdi
UBC, Vancouver
October 2015

Dedication

To my beloved family: Mohammadreza, Haydeh, and Mazda.

1 Introduction

1.1 Overview

Social networks are crucial to modern society: they permeate our social, personal, and economic lives. They have changed the way people connect and communicate, vote, select items to purchase, pick hotels to stay in, and adopt new technologies or behaviors. For example, in the new "social media" era, Angry Birds (https://www.angrybirds.com) required only 35 days to reach 50 million users; for comparison, radio took 38 years to reach the same milestone, and the telephone 75 years [5]. Social networks facilitate the transmission of information, communication, and interaction among people, and have grown steadily in size, complexity, and importance [28, 80]. One of the reasons behind this huge growth of social networks, both in science and in society, is the concept of a "network" as a structural pattern of interactions among social actors.
Such a social structure appears in a wide range of contexts in sociology, biology, economics, computer science, and electrical engineering, whenever a set of dyadic ties between objects is observed. In all these areas, the ability to model and analyze these (often huge and complex) social structures is fundamental for computational purposes, and it answers questions about the behavior of the social actors, individually as agents or collectively as a team, in a data-driven manner.

As a result of the enormous effects and applications of social networks in society and in science, there has been growing interest in the modeling and analysis of social networks over the past decade. Due to the constraints imposed by the structure of networks and the nature of humans as interacting sensors, research in social networks requires borrowing techniques from complex networks (dynamics of random graphs) and social analysis (mostly used in economics and sociology) [80, 146]. Statistical inference using social networks is an area that has witnessed remarkable progress recently. Such systems, comprising humans acting as sensors (also called participatory sensing), have received wide attention in the research communities of computer science, economics, marketing, social sciences, and electrical engineering. The proliferation of social media such as real-time microblogging services (Twitter) and online rating and review systems (Yelp) makes real-time sensing of social activities, social patterns, and behavior easier. (About six thousand tweets are sent per second on average; on US Presidential election day in 2012, the rate peaked at 15 thousand tweets per second, resulting in 500 million tweets in the day. Twitter can thus be considered a real-time human-based sensor of social situations.)

In the signal processing community, the term "social sensor" is used to denote an agent (or a group of agents) that provides information about its environment (the state of nature) to others, possibly via a social media channel, after interacting with other agents in a social network. Examples of such channels include Twitter, Facebook, online rating and review systems such as Yelp and TripAdvisor, and e-commerce platforms such as Amazon. The sensing ability of social sensors goes beyond that of physical sensors. For example, the level of satisfaction of attendees of a concert, revealed by sentiment analysis of Facebook statuses and tweets, is impossible to predict using physical sensors; likewise, unlike physical sensors, social sensors can reveal the quality of food in a restaurant from reviews on an online rating system such as Yelp [91, 92].

Statistical signal processing in social networks (inference using social sensors) appears in an enormous range of applications across different industry sectors, including marketing and advertisement, health and medicine, and financial technology. For example, [38, 101] used the content of tweets to provide geo-location services, which are useful in targeted and event advertising. Other examples include detecting influential users with applications in marketing [144], localization of natural disasters [132], and predicting stock markets [26]. It is shown in [11] that a simple model built from the rate of tweets cast about particular topics can outperform market-based predictors.

With the above applications of social sensors, there is a strong motivation to develop a set of mathematical models and algorithmic tools to understand the effects of interactions among social sensors on estimation problems (interactive sensing).
The majority of this thesis is devoted to the development of algorithms and procedures for multi-agent estimation, tracking, and decision making where agents act as social sensors of an underlying state of nature in the presence of uncertainty. Such problems are non-standard in two ways. First, in social networks, agents interact with and influence other agents. For example, ratings posted on online rating and review systems strongly influence the behavior of individuals.³ Such interactions can result in non-standard information patterns as a result of the constraints imposed and correlations introduced by the structure of the underlying social network (the communication topology among social sensors). Second, due to privacy reasons and time constraints, social sensors typically do not reveal raw observations of the underlying state of nature. Instead, they reveal their beliefs (opinions) or actions (tweeting/re-tweeting common trends, giving a thumbs-up on rating and review systems, purchasing an item on e-commerce platforms), which can be viewed as a quantized version of their knowledge about the state of nature formed by raw measurements and interactions with other social sensors. Together with the uncertainty involved in observations of the state of nature, these features result in non-standard estimation and tracking problems, which are the major topics of this thesis. The unifying theme of this thesis is to develop a set of theory and methods for statistical signal processing on graphs (social sensors) that involves adaptive filtering and stochastic approximation, dynamics of random graphs, multi-agent Bayesian estimation, and social learning to understand how sensors interact.

³It is reported in [79] that 81% of hotel managers regularly check Tripadvisor reviews. It is reported in [109] that a one-star increase in Yelp rating maps to a 5-9% revenue increase.
We employ social learning [16, 24, 35], graph theoretic tools, and stochastic approximation [99, 157] as useful mathematical abstractions for modeling the interaction of social sensors. The rest of this section is devoted to an overview of these topics along with the motivation and research goals addressed in this thesis.

1.1.1 Bayesian Estimation over Social Networks

Bayesian filtering, which is a recursive form of the famous Bayes' rule, has been used extensively in the traditional signal processing literature for logical inference on uncertain parameters from prior knowledge and new observations [21]. What statisticians and mathematicians call "Bayesian theory" was originally developed by Thomas Bayes in a famous paper presented at a meeting of the Royal Society of London [18, 138]. The well-known Bayes theorem describes the fundamental probability law for performing logical inference. He states [138]:

"If there be two subsequent events, the probability of the second b/N and the probability of both together P/N, and it being first discovered that the second event has also happened, from hence I guess that the first event has also happened, the probability I am right is P/b."

Bayesian inference is devoted to applying Bayes' rule to statistical inference in order to update the probability of a hypothesis as new observations are obtained [21, 125, 130]. Bayesian models for inference have been applied in decision theory, detection and estimation, communications theory, pattern recognition, machine learning and artificial intelligence, and filtering and parameter estimation [39, 64, 75, 105, 112].

Chapter 2 of this dissertation considers a multi-agent sensing problem where agents (social sensors) interact over a random graph and evaluate their belief (opinion) about an economic or a social parameter, namely the state of nature. This state of nature can be, for example, the quality of food in a restaurant, the occurrence of an earthquake, or the geo-location of a target.
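As a concrete sketch of the recursive Bayesian update underlying this setup, consider a hypothetical two-state state of nature (the restaurant quality example); the likelihood values below are illustrative assumptions, not taken from this thesis:

```python
# Recursive Bayesian update for a discrete state of nature.
# Hypothetical two-state example: the restaurant is "good" or "bad".

def bayes_update(prior, likelihood, observation):
    """One step of Bayes' rule: posterior is proportional to likelihood * prior."""
    unnormalized = {x: likelihood[x][observation] * p for x, p in prior.items()}
    z = sum(unnormalized.values())          # normalization constant
    return {x: v / z for x, v in unnormalized.items()}

# P(observation | state): a "good" restaurant mostly yields positive reviews.
likelihood = {
    "good": {"positive": 0.8, "negative": 0.2},
    "bad":  {"positive": 0.3, "negative": 0.7},
}

belief = {"good": 0.5, "bad": 0.5}          # uniform prior
for obs in ["positive", "positive", "negative"]:
    belief = bayes_update(belief, likelihood, obs)
print(belief)   # belief concentrates on "good" after mostly positive reviews
```

Each pass through the loop applies Bayes' rule once; in the multi-agent setting of Chapter 2, estimates received from other agents enter the update in a similarly multiplicative fashion, which is why inadvertently re-used information can skew the posterior.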
The evaluation of the belief about the state of nature can be made using Bayesian [1, 64] or non-Bayesian models [57, 81]. In this work, we focus on Bayesian inference in social networks. In the setup considered in Chapter 2, each agent acts as a social sensor: (i) she records her observation about the state of nature under uncertainty (by sampling from a conditional probability distribution), (ii) she interacts with other agents and receives estimates from them, and (iii) she updates her belief using recursive Bayesian models and transmits her updated belief over the network, and this process repeats.

As a result of the recursive nature of the Bayesian models and the correlation introduced by the structure of the communication graph between agents, "mis-information propagation," also known as data incest,⁴ arises in such networks. In a general sense, mis-information propagation involves the inadvertent re-use of identical pieces of information (which are naively considered to be independent) in the formation of the belief about the state of nature [71, 89]. The following example illustrates how mis-information can propagate and affect the estimates of social sensors in multi-agent state estimation over graphs.

Example to Illustrate Mis-information Propagation:

Consider six social sensors that aim to estimate an underlying state of nature interactively, with their communication topology illustrated in Figure 1.1. The graph in Figure 1.1 shows the information exchange protocol among social sensors; for example, the link between node 1 and node 3 indicates that the estimate of node 1 is available at node 3. Each node records its own observation and combines it with the estimates received from other nodes in the network (which are available due to the network structure) in order to form its estimate of the state of nature.

⁴These terms are used interchangeably throughout this dissertation.
Then, the updated estimates are broadcast over the network.

Figure 1.1: Example of a network of six agents (social sensors) that aim to estimate a parameter (state of nature) interactively; each edge depicts a communication link between two sensors.

To understand what can go wrong in the above example, note that the estimate of node 1 is used at nodes 3 and 4; therefore, the estimates of these nodes are both functions of the estimate of node 1. Thus, if node 5 naively combines the information received from nodes 3 and 4, it will have double counted the estimate of node 1; this results in mis-information propagation.

Research Goals

As illustrated by the above example, mis-information propagation (also called data incest) arises in interactive sensing over networks due to the recursive nature of Bayesian estimators and the correlation imposed by the structure of the underlying communication graph among agents (social sensors). Mis-information propagation results in a bias in the estimates of agents and in the over-confidence phenomenon, i.e., the variance is underestimated. Therefore, in the presence of mis-information propagation, Bayesian estimators require careful design. Chapter 2 is devoted to the design and development of algorithms that mitigate the effect of mis-information propagation in the multi-agent estimation problem over networks.

Remark 1.1.1. Our implicit assumption throughout Chapter 2 is that agents do not directly broadcast their raw observations over the network. First, the (private) raw observations of other individuals are typically not available because of privacy concerns or time constraints. For example, let x ∈ {1,2,...,5} × {1,2,...,5} denote the quality and affordability of a restaurant (x is a two-dimensional vector). Assume that an individual in the social network (a social sensor) checks out the restaurant and, based on her observations, estimates the quality and cost of that restaurant (the estimated value of x).
But at the time of her future social interactions, she usually does not provide (or even does not remember) details of the "raw" observations; she only shares her belief with others. Second, the dimension of the observations is typically much larger than the dimension of the state of nature. In the restaurant example, the observation vector can include many elements such as the quality of the food, music, lighting, price of food, staff, neighborhood, cost of beverages, etc. Instead of broadcasting all these raw observations, it is more common in social networks to share only the beliefs about the quality and cost of the restaurant.

1.1.2 Interactive Social Learning over Networks

Social learning, which has been applied to understand, model, and predict the behavior of agents in economics, financial markets, political science, and social networks [16, 35, 36, 100], seeks to answer the following question: How do the actions (decisions) of other agents affect the actions of subsequent agents? A social learning model comprises a set of agents seeking to estimate the underlying state of nature not only from their private observations, but also from the actions of previous agents. All agents know the structure of the model; they know that the action of each agent is a rational response to its private observations and, thus, conveys information about the state of nature. Social learning can be considered as the diffusion of (private) information about the state of nature to all agents through the intercommunication of actions in a set of (finite or infinite) agents [35].

Social learning occurs when a human learns from another person's behavior, decision, or action. In social learning, the actions of one agent affect the behavior of others, since they know that those actions are motivated by some type of information that other agents have about the state of nature. To better understand this, let us borrow the umbrella example from [35].
When someone sees other people going out with umbrellas, she also takes an umbrella without checking the weather forecast. This happens because people know that the actions or behavior of others carry some information about the state of nature; this results in rational herding. Social learning is a useful approximation to ordinary human behavior. Classical social learning is used to model the behavior of expected-cost-minimizing agents; social learning can also be generalized to risk-averse agents, see [95, 96].

Chapter 3 considers a social learning model comprising a set of agents (social sensors) that interact over a network to estimate an underlying state of nature. As opposed to the classical social learning model (where agents act once in a pre-determined order), in the social learning model considered in Chapter 3, the structure of the social network dictates who interacts with whom. We use social learning as a mathematical abstraction to model interactions between agents (social sensors) in the interactive sensing problem. From a statistical signal processing point of view, interactive sensing using social learning models, that is, estimating the state of nature via social sensors, is non-standard in two ways. First, agents are influenced by the ratings of other agents; this can result in an interesting phenomenon where rational agents all end up making the same decision (herding and information cascades; [35]). Second (and this effect is more complex), an agent might be influenced by his own rating, leading to data incest (mis-information propagation).
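The overconfidence caused by such double counting can be made concrete with a toy Gaussian fusion calculation on the network of Figure 1.1 (the unit-variance observations and the naive precision-weighted fusion rule below are illustrative assumptions, not the incest removal algorithms of this thesis): node 5 fuses the estimates of nodes 3 and 4, each of which already contains node 1's estimate.

```python
import random

random.seed(0)
x_true = 1.0            # unknown state of nature (hypothetical value)
noise_var = 1.0         # each private observation has unit noise variance

def observe():
    return random.gauss(x_true, noise_var ** 0.5)

def fuse(estimates):
    """Precision-weighted fusion of (mean, variance) pairs.
    Valid ONLY when the estimates are independent."""
    precision = sum(1.0 / v for _, v in estimates)
    mean = sum(m / v for m, v in estimates) / precision
    return mean, 1.0 / precision

# Node 1 forms an estimate from its own observation.
est1 = (observe(), noise_var)
# Nodes 3 and 4 each fuse their own observation with node 1's estimate.
est3 = fuse([(observe(), noise_var), est1])
est4 = fuse([(observe(), noise_var), est1])
# Node 5 NAIVELY fuses nodes 3 and 4 (plus its own observation) as if independent.
naive_mean, naive_var = fuse([(observe(), noise_var), est3, est4])

print(naive_var)   # 0.2, although node 5 only holds 4 independent observations
```

Node 5 actually holds four independent observations (from nodes 1, 3, 4, and 5), so the correct posterior variance is 1/4; the naive fusion reports 1/5 because node 1's observation entered twice. This is precisely the underestimated variance (overconfidence) described above.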
In the following, the main results of Chapter 3 are described briefly.

Research Goals

As is apparent from the above discussion, mis-information can propagate in interactive sensing with social learning due to the correlations imposed by the structure of the underlying social network (the existence of multiple paths between agents in the graph that represents the underlying social network) and the recursive nature of Bayesian inference; see Section 1.1.1. To give an example, suppose an agent writes a poor rating of a restaurant on a rating and review site. Another agent is influenced by this rating and also gives the restaurant a poor rating. The first agent visits the rating and review site again and finds that another agent has also given the restaurant a poor rating; this double-confirms his rating and he enters another poor rating. In a fair system, the first agent would have been aware that the rating of the second agent was influenced by his own rating, so the first agent has effectively double counted his first rating by casting the second poor rating. Mis-information propagation causes the over-confidence problem and results in a bias in the estimate of the state of nature.

Motivated by online rating and review systems, our aim is to manage the mis-information propagation problems associated with interactive sensing using social learning. In particular, our goal is to design and develop a protocol for the administrator of a rating and review system such that it can automatically maintain a fair (data-incest-free) rating and review system, which results in a system with a higher trust rating.

1.1.3 Tracking Degree Distribution of Social Networks

Tracking a time-varying parameter that evolves according to a finite-state Markov chain (the state of nature) is a problem of much interest in signal processing [20, 63, 153].
In the context of social networks, this parameter can be, for example, the level of happiness in a community, the tendency of individuals to expand their networks, the strength of social links between individuals, or the searchability of the network, none of which can be sensed by physical sensors. In such cases, social sensors can go beyond physical sensors and can be used to track these parameters of social networks. A social network with a large number of individuals can be viewed as an interactive sensing tool to obtain information about individuals or the state of nature; this is a social sensor. Motivated by social network applications, a social-sensor-based framework is presented in Chapter 4 to track the degree distribution of Markov-modulated dynamic networks whose dynamics evolve over time according to a finite-state Markov chain.

The question that may arise here is: "What is the importance of the degree distribution of a social network?" The most important parameter of a network that characterizes its structure is the degree distribution. It yields useful information about the connectivity of the random graph [10, 86, 116]. For example, if a majority of nodes in the random graph have relatively high degrees, the graph is highly connected and a message can be transferred between two arbitrary nodes along shorter paths. However, if a majority of nodes have smaller degrees, then longer paths are needed for transmitting a message throughout the network, see [80]. The degree distribution can further be used to investigate the diffusion of information or disease through social networks [108, 146]. The existence of a "giant component"⁵ in complex networks can be studied using the degree distribution of the graph that models the social network. The size and existence of a giant component have important implications in social networks in terms of modeling information propagation and the spread of human disease, see [62, 115, 118].
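As a small numerical illustration of the connectivity information carried by the degree distribution (the Erdős–Rényi model and the parameters below are illustrative assumptions, not the duplication-deletion model of Chapter 4): a giant component emerges once the average degree exceeds one, which can be checked empirically.

```python
import random
from collections import Counter

random.seed(2)

def erdos_renyi(n, avg_degree):
    """Random graph with edge probability chosen so the expected degree is avg_degree."""
    p = avg_degree / (n - 1)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if random.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component, found by depth-first search."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        stack, size = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best

n = 2000
giant_fraction = {}
for avg_deg in (0.5, 2.0):
    graph = erdos_renyi(n, avg_deg)
    degree_dist = Counter(len(nbrs) for nbrs in graph.values())  # empirical degree distribution
    giant_fraction[avg_deg] = largest_component_size(graph) / n

# A component of size O(n) shows up only once the average degree exceeds one.
print(giant_fraction)
```

With average degree 0.5 the largest component covers a vanishing fraction of the graph, while with average degree 2 a single component contains most of the nodes, matching the mean-degree criterion quoted in the footnote.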
The degree distribution is also used to analyze the "searchability" of a network. The "search" problem arises when a specific node in a network faces a problem (request) whose solution is at another node, namely the destination (e.g., delivering a letter to a specific person, or finding a web page with specific information) [4, 146]. The searchability of a social network [146] is the average number of nodes that need to be accessed to reach the destination. The degree distribution is also used to investigate the robustness and vulnerability of a network in terms of the network's response to attacks on its nodes or links [33, 76]. The papers [148, 149] further use degree-dependent tools for the classification of social networks.

Chapter 4 considers a dynamic social network where the interactions between nodes evolve over time according to a Markov process that undergoes infrequent jumps (the state of nature). An example of such a social network is the friendship network among residents of a city, where the dynamics of the network change in the event of a large festival. In this chapter, we propose Markov-modulated random graphs to mimic social networks where the interactions among nodes evolve over time due to the underlying dynamics (the state of nature). For example, the state of nature can be the level of happiness in society, which is impossible to measure using participatory sensors. Here, social networks can be used as social sensors for tracking the underlying state of nature. That is, using noisy measurements of the degree distribution of the network, the jumps in the underlying state of nature can be tracked.

⁵A giant component is a connected component with size O(n), where n is the total number of vertices in the graph. If the average degree of a random graph is strictly greater than one, then there exists a unique giant component with probability one [41], and the size of this component can be computed from the expected degree sequence.
Figure 1.2: Tracking the underlying state of nature using a Markov-modulated random graph as a social sensor.

Research Goals

Chapter 4 considers a Markov-modulated duplication-deletion random graph where, at each time instant, one node can either join or leave the network with probabilities that evolve according to the realization of a finite-state Markov chain (the state of nature). This chapter deals with the following questions: How can one estimate the state of nature using noisy observations of nodes' degrees in a social network? And how good are these estimates? By tracking the degree distribution of a Markov-modulated random graph, we can design a social sensor to track the underlying state of nature using noisy measurements of nodes' connectivity; see Figure 1.2.

Chapter 4 comprises two results. First, motivated by social network applications, we analyze the asymptotic behavior of the degree distribution of the Markov-modulated random graph. From this degree distribution analysis, we can study the connectivity of the network, the size and existence of a large connected component, the delay in searching such graphs, etc. [62, 80, 115, 118]. Second, a stochastic approximation algorithm is presented to track the empirical degree distribution as it evolves over time. We further show that the stationary degree distribution of Markov-modulated duplication-deletion random graphs depends on the dynamics of such graphs and, thus, on the state of nature. This means that, by tracking the empirical degree distribution, the social network can be viewed as a social sensor to track the state of nature. The tracking performance of the proposed stochastic approximation algorithm is analyzed in terms of the mean square error. A functional central limit theorem is further presented for the asymptotic tracking error.
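The kind of constant-step-size stochastic approximation recursion used for such tracking can be sketched as follows (a generic illustration with hypothetical regime distributions and parameters, not the algorithm analyzed in Chapter 4): the estimate is nudged toward each noisy one-hot observation of a sampled node's degree, and the constant step size lets the estimate follow infrequent Markovian jumps.

```python
import random

random.seed(3)

# Two regimes of a slow finite-state Markov chain, each with its own
# "true" degree distribution over degrees {0, 1, 2, 3} (hypothetical values).
pmfs = {0: [0.4, 0.3, 0.2, 0.1], 1: [0.1, 0.2, 0.3, 0.4]}
switch_prob = 0.002          # transition matrix close to identity

def noisy_observation(pmf):
    """One-hot noisy measurement: the degree of one randomly sampled node."""
    d = random.choices(range(4), weights=pmf)[0]
    return [1.0 if i == d else 0.0 for i in range(4)]

eps = 0.05                   # constant step size (enables tracking)
theta = [0.25] * 4           # estimate of the expected degree distribution
state = 0
for k in range(20000):
    if random.random() < switch_prob:
        state = 1 - state    # infrequent jump of the state of nature
    y = noisy_observation(pmfs[state])
    # Stochastic approximation update: theta <- theta + eps * (y - theta)
    theta = [t + eps * (yi - t) for t, yi in zip(theta, y)]

print(theta)   # theta hovers near the degree distribution of the current regime
```

Because each observation y is itself a probability vector, the update is a convex combination and keeps theta on the simplex; a decreasing step size would converge to a fixed distribution but lose the ability to follow jumps, which is why a constant step size is used for tracking.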
1.2 Main Contributions

In the subsequent sections, a brief summary of the major novel contributions of the chapters that constitute this thesis is provided, in the order in which they appear in the thesis. A more detailed description of the contributions and findings of each chapter is provided in the individual chapters.

1.2.1 Bayesian Estimation over Social Networks

As briefly described in Section 1.1.1, Chapter 2 considers a multi-agent Bayesian estimation problem with constraints imposed by the structure of the underlying social network. In such problems, as a result of the recursive nature of Bayesian estimators and the correlation imposed by the communication topology of social sensors, mis-information propagation arises. The main contributions of Chapter 2 are summarized below:

1. The mis-information propagation problem in interactive sensing over social networks, where agents transmit their beliefs about the state of nature instead of raw observations over the network, is mathematically formulated using a family of directed acyclic graphs.

2. A necessary and sufficient condition on the information flow graph⁶ is presented for the exact mis-information removal problem. It is shown in Section 2.3 that, under Constraint 2.3.1, the mis-information associated with the estimates of agents can be completely removed.

3. An optimal information aggregation algorithm is proposed for the multi-agent estimation problem over networks, which mitigates the mis-information associated with the estimates of agents (social sensors) when the information flow graph is known.

4. A sub-optimal mis-information removal algorithm is presented for scenarios where the information flow graph is not completely known.

1.2.2 Interactive Social Learning over Networks

Motivated by online rating and review systems, Chapter 3 considers social learning as a mathematical abstraction to model the interactions among agents (social sensors) in the state estimation problem using interactive sensing.
Agents record their own private observations and then update their private beliefs about the state of nature using Bayes' rule. Based on its belief, each agent then chooses an action (rating) from a finite set and transmits this action over the social network. An important consequence of such social learning over a network is the ruinous multiple re-use of information, known as data incest (or mis-information propagation). In this chapter, the data incest management problem in the social learning context is formulated on a family of directed acyclic graphs. The main contributions of Chapter 3 are summarized below:

1. A social learning model is presented to mimic the behavior of agents in online rating and review systems who aim to estimate a state of nature (for example, the quality of a restaurant on Yelp).

2. A fair rating and review protocol is presented and the criterion for achieving a fair rating is defined. This protocol is used as a benchmark in the data incest management problem in social learning over social networks.

3. An automated incest removal protocol is developed for the administrator of an online rating and review system to deploy, such that the system maintains a fair rating of its entities. This algorithm can easily be applied to any interactive sensing system that involves the transmission of actions and Bayesian inference.

4. Necessary and sufficient conditions on the graph topology of social interactions between social sensors are presented to eliminate data incest.

⁶The information flow graph is a directed acyclic graph that models the flow of information among social sensors; for example, a directed edge from Sensor 1 to Sensor 2 means that the information (or beliefs, in the context of Chapter 2) of Sensor 1 is available at Sensor 2; see Section 2.2 for more details.

1.2.3 Tracking Degree Distribution of Social Networks

Chapter 4 considers dynamical random graphs.
The most important measure that characterizes the structure of a network (especially when the size of the network is large and the connections, i.e., the adjacency matrix of the underlying graph, are not given) is the degree distribution of that network. The degree of a node in a network (also known as its connectivity) is the number of connections the node has in that network. In this chapter, motivated by social network applications, we consider a class of stochastic approximation algorithms to track a time-varying probability mass function that evolves according to a finite-state Markov chain whose transition matrix is close to identity. In the context of social network analysis, the time-varying probability mass function we aim to track is the expected degree distribution of a dynamic random graph. The main contributions of Chapter 4 are summarized below:

1. A family of Markov-modulated duplication-deletion random graphs is introduced in Chapter 4 that mimic social networks where the interactions among agents vary over time according to the realization of a finite-state Markov chain. We consider two categories of such graphs: (i) the fixed-size duplication-deletion random graph, and (ii) the infinite duplication-deletion random graph.

2. An asymptotic degree distribution analysis is presented for the fixed-size Markov-modulated random graph. In particular, it is shown that the expected degree distribution of such graphs at each time can be computed in terms of the expected degree distribution at the previous time and the dynamics of the graph via a recursive equation.

3. We extend the degree distribution analysis to infinite random graphs and prove that the degree distribution of such graphs satisfies a power law with an exponent depending on the dynamics of the graph. An expression is presented to compute the power law exponent in terms of the dynamics of the duplication-deletion model.
4. Chapter 4 further considers the adaptive estimation problem of the degree distribution for a fixed-size Markov-modulated duplication-deletion random graph given noisy observations. A stochastic approximation algorithm is presented for tracking the degree distribution as it evolves over time. The results related to the tracking performance of the stochastic approximation algorithm are as follows.

• Mean square error analysis: Using error bounds on two-time-scale Markov chains and perturbed Lyapunov function methods, the asymptotic mean square error between the expected degree distribution and the estimates obtained via the stochastic approximation algorithm is computed.

• Weak convergence analysis: We show that the asymptotic behavior of the stochastic approximation algorithm converges weakly to the solution of a switched Markovian ordinary differential equation.

• Functional central limit theorem for scaled tracking error: Finally, Chapter 4 investigates the asymptotic behavior of the scaled tracking error of the stochastic approximation algorithm. Similar to [94], it is shown that the interpolated scaled tracking error converges weakly to the solution of a switching diffusion process.

5. Chapter 4 further investigates infinite (denumerable) duplication-deletion random graphs where the number of nodes in the graph (and hence the support of the degree distribution) is no longer fixed and increases over time. A Hilbert-space-valued stochastic approximation algorithm is proposed to track the degree distribution of the infinite graph with support on the set of non-negative integers. To study the tracking performance of this Hilbert-space-valued stochastic approximation algorithm, a limit system characterization and an asymptotic analysis of the scaled tracking error are provided.

1.3 Related Works

This section is devoted to a review of the literature on topics and advances in the fields related to this dissertation.
1.3.1 Bayesian Estimation over Social Networks

Bayesian inference deals with the problem of inferring knowledge about unknown parameters using Bayes' rule [21, 125, 130]. Bayesian theory has many applications in decision theory, detection and estimation, communications theory, pattern recognition, machine learning and artificial intelligence, and filtering and parameter estimation [39, 58, 64, 75, 77, 105, 112]. For a comprehensive survey of Bayesian inference and estimation theory, we refer the interested reader to the books [130, 133, 141]. Bayesian networks and different inference methods for Bayesian networks are investigated in [120]. A model of Bayesian social learning where agents receive private information about the state of nature and observe the actions of their neighbors is investigated in [83]; the authors propose an algorithm to compute the actions of agents on tree-based social networks and analyze their algorithm in terms of efficiency and convergence.

There are several papers discussing the spread of information in social networks; see [7, 31, 36] for a comprehensive survey and tutorial on different methods for the diffusion of information in social networks. Applications in signal processing of gossip algorithms, which are protocols based on the communication of agents with their local neighbors, are studied in [65]. A type of mis-information propagation in social networks caused by "influential" or "forceful" agents is investigated in [3]. The viral propagation of faulty information (for example, mis-information about swine flu) through social media channels is studied in [30, 119]. For the motivation of the mis-information problem addressed in this work, we refer to [29, 49, 78, 106, 128] in sensor networks. Data incest in the sensor network context happens in distributed tracking systems where sensors locally integrate the estimates received from other sensors through a (possibly loopy) communication graph with random delays.
The key requirement is to fuse estimates that share a common information set. An optimal solution for the case of connected tree networks, obtained by combining a decentralized information filter and a channel filter, is presented in [49]. In this thesis, we consider mis-information propagation through social networks with arbitrary topologies. Each agent records its observation of the state of nature in the presence of noise. We use a combination of graph theoretic tools and Bayesian estimation to remove the mis-information generated by different delays in the links.

1.3.2 Interactive Social Learning over Networks

Social learning theory is used to investigate the learning behavior of agents in social and economic networks [2]. There are several papers in the literature discussing Bayesian models [1, 43, 92, 120] and non-Bayesian models [14, 48, 56, 57, 81] for social learning. Different models for the diffusion of beliefs in social networks are presented in [36]. For a comprehensive survey of herding and information cascades in social learning, see [37]. Stochastic control with social learning for sequential change detection problems is considered in [87].

Mis-information in the signal processing literature refers to faulty or inaccurate information that is broadcast unintentionally. A brief summary of works related to mis-information propagation and removal is presented in Section 1.3.1. Mis-information in the context of this chapter is motivated by sensor networks, where the term "data incest" is used [89]. In multi-agent social learning in networks, data incest occurs when the information (action) of one agent is double counted by other agents (due to the lack of information about the topology of the communication graph); this leads to overconfidence.
The overconfidence phenomenon (caused by data incest) also arises in Belief Propagation (BP) algorithms [113, 123], which are used in various fields such as graphical models for learning, computer vision, and error-correcting coding theory. The aim of BP algorithms is to solve inference problems over graphical models such as Bayesian networks (where nodes represent random variables and edges depict dependencies among them) by computing a marginal distribution. BP algorithms require passing local messages over the graph (Bayesian network) at each iteration. These algorithms converge to the exact marginal distribution when the factor graph is a tree (loop free). But for graphical models with loops, BP algorithms are only approximate due to the over-counting of local messages [113, 152], which is similar to data incest in multi-agent social learning.⁷

In Chapter 2 and the papers [89, 90], data incest is considered in a network where agents exchange their private belief states; that is, no social learning is considered. In a social network, agents rarely exchange private beliefs; they typically broadcast actions (votes) over the network. Motivated by trustworthy online rating and review systems, we consider data incest in a social learning context with a social network structure where actions (or, equivalently, the public belief of the social learning model) are transmitted over the network. This is quite different from private belief propagation in social networks. Simpler versions of this information exchange process and estimation were investigated by Aumann [12] and Geanakoplos and Polemarchakis [66]. The results derived in this chapter extend theirs.

Finally, the methodology of Chapter 3 can be interpreted in terms of the recent Time magazine article [145], which provides interesting rules for online rating and review systems. These include: (i) review the reviewers, and (ii) censor fake (malicious) reviewers.
The data incest removal algorithm proposed in this chapter can be viewed as "reviewing the reviews" of other agents to see if they are associated with data incest or not.

⁷ There exist some similarities between BP and social learning in the sense that they are both systematic structures for performing Bayesian inference over graphs. However, they are not related in principle. While graphs represent social interactions among agents in social learning, graphical models in BP depict the conditional dependency between nodes (random variables); they do not imply actual communications. For more detail, see [36].

1.3.3 Tracking Degree Distribution of Social Networks

With a large number of rational agents, social networks can be viewed as social sensors for extracting information about the world or people. For example, the paper [132] presents a social sensor (based on Twitter) for real-time event detection of earthquakes in Japan, namely, the target event. They perform semantic analysis on tweets (which are related to the target event) in order to detect the occurrence and the location of the target event (earthquake). Another example is the use of the social network as a sensor for early detection of contagious outbreaks [40]. Using the fact that central individuals in a social network are likely to be infected sooner than others, a social sensor is designed for the early detection of contagious outbreaks in [40]. The performance of this social sensor was verified during a flu outbreak at Harvard College in 2009—the experimental studies showed that this social sensor provides a significant additional time to react to epidemics in the society. Social sensing in the context where physical sensors present in mobile devices such as GPS or Bluetooth are used to infer social interactions is studied in [6, 32, 34, 54].
Here, we consider a scenario where a social network is viewed as a sensor of social interactions, human activities, or behavior, and we aim to track the degree distribution of a random graph via stochastic approximation algorithms.

Stochastic approximation algorithms have several applications in diverse areas such as system identification, control theory, adaptive filtering, state estimation, wireless communications, target tracking, change detection, and economics [20, 50–52, 93, 99, 110, 111, 150, 153]. The ubiquitous use of stochastic approximation algorithms is mainly due to their ability to track a time-varying unknown parameter of a system; this is called "tracking capability", see [20]. Tracking a time-varying parameter that evolves according to a finite-state Markov chain has several applications in target tracking [63], change detection [20], multi-user detection in wireless systems [153], and economics [93]. The tracking capability of regime-switching stochastic approximation algorithms is further investigated in [154] in terms of the mean squared error. The interested reader is referred to [99] for a comprehensive development of stochastic approximation algorithms.

For the background and fundamentals on social and economic networks, we refer to [80]. Here, the related literature on dynamic social networks is reviewed briefly. The book [53] provides a detailed exposition of random graphs. The dynamics of random graphs are investigated in the mathematics literature; for example, see [41, 60, 104] and the references therein. In [121], a duplication model is proposed where at each time step a new node joins the network. However, the dynamics of this model do not evolve over time. In [41], it is shown that the degree distribution of such networks satisfies a power law. In random graphs which satisfy the power law, the number of nodes with a specific degree depends on a parameter called the "power law exponent" [82, 147].
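The "tracking capability" mentioned above is easiest to see with a constant step-size stochastic approximation recursion. The following is a minimal illustrative sketch, not the thesis's algorithm: the noise level, step size, and the single jump in the true parameter (standing in for a Markov-modulated parameter) are all hypothetical choices.

```python
import random

def sa_track(observations, step=0.05, theta0=0.0):
    """Constant step-size stochastic approximation:
        theta_{k+1} = theta_k + step * (z_k - theta_k).
    The constant step size discounts old data geometrically, which is what
    lets the recursion follow a time-varying parameter."""
    theta, path = theta0, []
    for z in observations:
        theta += step * (z - theta)
        path.append(theta)
    return path

random.seed(0)
# Hypothetical regime change: the true mean jumps from 0 to 5 at k = 2000.
obs = [(0.0 if k < 2000 else 5.0) + random.gauss(0.0, 1.0) for k in range(4000)]
est = sa_track(obs)
# est[-1] settles near the new value 5 despite the mid-stream jump.
```

A decreasing step size would average out the noise better but would lose the ability to follow the jump; the constant step size trades asymptotic accuracy for tracking, which is the trade-off analyzed in [20, 99].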
A generalized Markov graph model for dynamic social networks, along with its application in social network synthesis and classification, is also presented in [149]. The degree distribution analysis of real-world networks has attracted much attention recently [8, 45, 55, 67, 85, 114, 118]. A large network dataset collection can be found in [103], which includes datasets from social networks, web graphs, road, internet, citation, collaboration, and communication networks. The paper [114] investigates the structure of scientific collaboration networks in terms of degree distribution, existence of a giant component, and the average degree of separation. In scientific collaboration networks, two scientists are connected if they have co-authored a paper. Another example is the degree distribution of the affiliation network of actors⁸, which is studied in [118] based on real data from IMDb. In [8], the structure and characteristics of three different online social networks (Cyworld, Myspace, and Orkut) are investigated. The authors use the snowball sampling method⁹ [22, 102] to estimate the degree distribution of the network when it is not possible to access all nodes in the network (especially when the size of the network is large). It is further shown in [55] that the degree distribution of email networks satisfies a power law.

Finally, different applications of social sensors in detection and estimation are investigated in [9, 40, 132]. The differences between social sensors, social sensing, and pervasive sensors, along with challenges and open areas in social sensors, are further presented in [88, 131].

1.4 Thesis Outline

In this section, we present the organization of this dissertation, which is illustrated in Figure 1.3.
The rest of this thesis is divided into two parts and four chapters, as outlined below.

Motivated by different information diffusion patterns in interactive sensing over social networks, Part I considers multi-agent state estimation and learning problems over directed acyclic graphs in social networks and comprises two chapters:

• Chapter 2 considers Bayesian estimation over directed acyclic graphs where agents transmit their beliefs about the state of nature (the posterior distribution of the state of nature given private observations and the beliefs of other agents, which are available due to the structure of the underlying social network) instead of raw observations. It then formulates data incest (also known as mis-information propagation) that arises in such estimation problems using a graph-theoretic setup. It is then shown that, under some necessary and sufficient conditions on the topology of the communication graph among agents, mis-information can be completely removed from the estimates of agents. Assuming that the communication graph is known, an optimal mis-information removal algorithm is proposed. We also provide a sub-optimal algorithm for reducing the effect of mis-information when the communication graph is not completely

⁸ Collaboration network of movie actors.

⁹ There are different methods of sampling a network, for example, link sampling, node sampling, and snowball sampling. For a complete survey, we refer to [102]. In link or node sampling, a given fraction of links or nodes is sampled. In the snowball sampling method, one node is chosen randomly and the next samples are chosen from its neighbors.
known. This chapter is concluded with a numerical study that illustrates the excellent performance of the proposed algorithms.

[Figure: thesis organization chart titled "Signal Processing Methods for Interactive Sensing Using Social Sensors". Part I: Estimation and Learning Over Directed Acyclic Graphs (Bayesian estimation for interactive sensing; multi-agent social learning; data incest in social learning). Part II: Tracking the Degree Distribution in Dynamic Social Networks (degree distribution analysis of Markov-modulated graphs; social sensor of Markovian dynamics (state of nature)). Algorithmic tools: stochastic approximation, Bayesian filtering. Analysis tools: weak convergence analysis, graph theory.]

Figure 1.3: Main results and organization of thesis.

• Motivated by online rating and review systems, Chapter 3 employs social learning to model the interactions among agents in a multi-agent estimation problem where the actions of agents are transmitted over the network (instead of raw observations or private beliefs). In such a setup, each agent—in order to estimate an underlying state of nature—chooses an action from a finite set of actions to minimize a local cost function and then transmits this action over the network. We give necessary and sufficient conditions on the graph topology of social interactions to eliminate data incest. A data incest removal algorithm is then proposed in this chapter such that the public belief of the social learning (and, hence, the actions of agents) is not affected by data incest propagation. This results in an online rating and review system with a higher trust rating. This chapter then presents an actual psychology experiment that was conducted by our colleagues at the Department of Psychology of the University of British Columbia in September and October 2013 to illustrate social learning, data incest, and social influence.
Finally, numerical examples are provided to illustrate the performance of the proposed optimal data incest removal algorithm.

Motivated by applications of degree distribution in social network analysis and tracking the Markovian dynamics of graphs, Part II considers Markov-modulated duplication-deletion random graphs and comprises one chapter:

• Chapter 4 considers a Markov-modulated duplication-deletion random graph where, at each time instant, one node can either join or leave the network; the probabilities of joining or leaving evolve according to the realization of a finite-state Markov chain. This chapter comprises two inter-related research problems. First, motivated by social network applications, the asymptotic behavior of the degree distribution is analyzed. Second, a stochastic approximation algorithm is presented to track the empirical degree distribution as it evolves over time. The tracking performance of the algorithm is analyzed in terms of mean square error, and a functional central limit theorem is presented for the asymptotic tracking error. Chapter 4 then presents a Hilbert-space-valued stochastic approximation algorithm that tracks a Markov-modulated probability mass function with support on the set of nonnegative integers. Finally, this chapter is concluded with numerical examples that illustrate the performance of the tracking algorithms and corroborate the findings of this chapter.

Chapter 5 briefly outlines a summary of the findings in these two parts and provides directions for future research and development in the fields related to this dissertation.

A brief review of some graph-theoretic definitions and tools that are used throughout this dissertation is presented in Appendix A.

An important problem associated with numerical studies of social networks is how to actually construct random graphs via simulation algorithms.
In particular, for large social networks, only the degree sequence is available, and not the adjacency matrix. (The degree sequence is a non-increasing sequence of vertex degrees.) Does a simple graph exist that realizes a particular degree sequence? How can all graphs that realize a degree sequence be constructed? Appendix B presents a discussion of these issues.

Part I: Estimation and Learning Over Directed Acyclic Graphs

2 Constrained Estimation Over Random Graphs

2.1 Introduction

This chapter deals with a Bayesian estimation problem over networks and considers a social network where each group of individuals uses information received from the other social groups and employs Bayesian models of information aggregation to evaluate its belief about the state of nature. In this context, each group of individuals forms a social sensor of an economic or social parameter. The process of exchanging information between social groups is crucial for individuals to evaluate their beliefs. An important parameter that characterizes how the belief evolves is the delay in this information exchange. This delay can be extrinsic (individuals take different amounts of time to form beliefs and communicate them) or intrinsic to the network: highly connected nodes exchange information faster compared to nodes that have fewer connections. The most important consequence of this delay is mis-information (or incest) propagation, as we will explain shortly. Let us first formulate the observation and information exchange using a graph-theoretic notation.

State of Nature: Let x represent a state of nature that individuals in the social network aim to estimate, such as the quality of a restaurant. Assume that x belongs to a finite set X = {x^1, x^2, ..., x^N}. Here, x^i (for 1 ≤ i ≤ N) is in R^d, R_+^d, or N^d, where d is a positive integer.
Assume that x has prior distribution π0.

Observation Protocol: To estimate x, each individual in the social network obtains an M-dimensional observation vector, where M is a positive integer. To simplify the analysis, we assume that the set of individuals in the social network is partitioned into S social groups such that within each social group individuals record the same observations. At time k, the noisy observation of social group s, z⌊s,k⌋, has conditional probability distribution

    p(z⌊s,k⌋ ≤ z̄ | x = x^i) = ∑_{z ≤ z̄} B^i_z,   1 ≤ i ≤ N.   (2.1)

Here ∑_z denotes integration with respect to the Lebesgue measure (in which case B^i_z is the conditional probability density function) or the counting measure (in which case B^i_z is the conditional probability mass function). Assume that the observations z⌊s,k⌋ given x are independent random variables across s and k. Each social group s combines its private observation z⌊s,k⌋ with information received from other groups in the social network to update its belief about the state of nature x. Then, it communicates this updated belief to the other groups in the social network.

Information Exchange Protocol: Let G⌊s,k⌋ = (V⌊s,k⌋, E⌊s,k⌋), k = 1, 2, ..., s = 1, 2, ..., S, denote a sequence of time-dependent directed graphs of information flow in the social network up to and including time k. Here V⌊s,k⌋ denotes the set of vertices,

    V⌊s,k⌋ = {(s, k′) | k′ ≤ k, s ∈ {1, 2, ..., S}},   (2.2)

and E⌊s,k⌋ ⊆ V⌊s,k⌋ × V⌊s,k⌋ is the set of edges, which depicts the connections between vertices in G⌊s,k⌋. For example, if ((s, k′), (s′, k′′)) ∈ E⌊s,k⌋, then the information from social group s at time k′ is available at social group s′ at time k′′ (k′ ≤ k′′ ≤ k). Each social group uses a Bayesian model to estimate the underlying state of nature x.

As a result of the recursive nature of Bayesian estimators, mis-information propagation can arise in a social network with the above information exchange protocol.
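Before turning to an example of how this propagation happens, it is useful to see what a single Bayesian belief update looks like for a finite state space X. The following is a minimal sketch, not code from the thesis; the two-state likelihood values in the example are hypothetical numbers.

```python
def bayes_update(belief, likelihood):
    """One Bayesian belief update over a finite set X = {x^1, ..., x^N}.

    belief[i]     -- current probability assigned to x^{i+1}
    likelihood[i] -- p(z | x = x^{i+1}) for the observation z just recorded
                     (the role played by B^i_z in (2.1))
    Returns the posterior: the normalized elementwise product.
    """
    unnorm = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical example: uniform prior over two states; the observation is
# three times more likely under x^2 than under x^1.
posterior = bayes_update([0.5, 0.5], [0.2, 0.6])  # -> [0.25, 0.75]
```

The recursion that causes trouble is precisely this one: when a received belief already incorporates an observation, feeding it through the update again counts that observation twice.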
For example, assume that the estimate of social group 1 at time 1, θ⌊1,1⌋, reaches social group 2 at time 2. Also suppose the estimate from social group 2, θ⌊2,2⌋, reaches social group 1 at time 3. Since social group 2 used θ⌊1,1⌋, the estimate generated by social group 2 is a function of θ⌊1,1⌋. Therefore, if social group 1 naively combines the estimate of social group 2 received at time 3, θ⌊2,2⌋, with its own private estimates, it would have double-counted its estimate at time 1, θ⌊1,1⌋. In the above graph-theoretic notation, we can depict the graph G⌊2,3⌋ as

    (1,1) → (1,2) → (1,3)
         ↘          ↗
    (2,1) → (2,2) → (2,3)                (2.3)

where the two-tuples denote vertices defined in (2.2) and the arrows denote edges of the directed acyclic graph. The fact that there exist two distinct paths between (1,1) and (1,3) in the graph (2.3) shows that the information in (1,1) is double-counted, leading to mis-information propagation. As the above example shows, mis-information (rumor) propagation can be viewed as the destructive re-use of observation information. It leads to an overconfidence phenomenon, i.e., the variance is under-estimated. The recursive nature of Bayesian estimation requires careful design to cope with the possible ruinous re-use of information. In the more realistic problems considered in this chapter, there are multiple groups in the social network (and thus an arbitrarily complex network topology) together with random delays in the network. For such cases, mis-information management is a non-trivial problem.

2.1.1 Chapter Goals

A network of social groups is considered in this chapter that aims to estimate the underlying state of nature x. Before proceeding, let us introduce the following scalar index n instead of ⌊s,k⌋ for the sake of notational simplicity:

    n ≜ s + S(k − 1),   s ∈ {1, ..., S},  k ∈ {1, 2, 3, ...}.   (2.4)

Notice that n is a composite of time k and social group s. Subsequently, we will refer to n as a "node" of a time-dependent graph, namely the information flow graph.
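Using the node indexing (2.4) with S = 2, the double-counting in example (2.3) can be detected mechanically by counting directed paths between nodes: more than one path between two nodes signals potential data incest. The following is an illustrative sketch (the adjacency matrix encodes graph (2.3); the path-count matrix is not notation used in the thesis):

```python
def count_paths(A):
    """Number of directed paths between all node pairs of a DAG:
    P = A + A^2 + ... + A^{n-1}, where entry (i, j) counts paths i -> j."""
    n = len(A)
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    total = [row[:] for row in A]
    power = [row[:] for row in A]
    for _ in range(n - 2):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total

# Nodes indexed by (2.4), S = 2, shown 0-indexed:
# 0=(1,1), 1=(2,1), 2=(1,2), 3=(2,2), 4=(1,3), 5=(2,3).
A = [[0, 0, 1, 1, 0, 0],   # (1,1) -> (1,2) and (1,1) -> (2,2)
     [0, 0, 0, 1, 0, 0],   # (2,1) -> (2,2)
     [0, 0, 0, 0, 1, 0],   # (1,2) -> (1,3)
     [0, 0, 0, 0, 1, 1],   # (2,2) -> (1,3) and (2,2) -> (2,3)
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
P = count_paths(A)
# P[0][4] == 2: two distinct paths from (1,1) to (1,3), i.e. data incest.
```

For a DAG in topological order the adjacency matrix is nilpotent, so the sum of powers terminates; the example reproduces exactly the two paths identified in the text.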
This estimation problem can be expressed in the following abstract form:

    Estimate x with prior π0 subject to:
        Gn = (Vn, En) is given,
        zn ∼ B^i_z when x = x^i,   (observation process)
        θn = A(Θn, zn).            (filter constraint)      (2.5)

Here, Θn denotes the set of beliefs from nodes (social groups at previous times) available at node n (social group s at time k), which depends on the information flow network Gn. Let θn denote the posterior distribution¹⁰ of x given Θn and zn. In (2.5), A denotes the algorithm used by each node to update the belief θn. The aim of this chapter is to construct the information aggregation algorithm A such that the estimates θn are not affected by mis-information propagation. If A is not constructed properly, then mis-information can propagate in the social network as explained earlier in this chapter. Thus, from an abstract point of view, mis-information removal can be interpreted as optimal Bayesian estimation on a directed acyclic graph with information exchange constraints. This chapter aims to address the following questions:

1. Existence Problem: Under what constraints on the information flow is complete mis-information removal possible?

2. Design Problem: Synthesize an algorithm A such that mis-information propagation is prevented.

3. Reconstruction Problem: If the information flow graph Gn is not completely known at each time, design an algorithm to mitigate mis-information propagation.

¹⁰ For some distributions, instead of transmitting the posterior distribution, it is sufficient to broadcast the sufficient statistic. In the finite case, the sufficient statistic and the posterior distribution are similar.

2.1.2 Main Results and Organization of Chapter

This chapter considers mis-information propagation through a social network with arbitrary network topologies. Each social group records its observation of the state of nature with an arbitrary a posteriori probability distribution and arbitrary observation noise.
The recursive nature of Bayesian estimation in decentralized fusion requires careful design to cope with the possible re-use of information such that the estimates θn are equal to the mis-information-free estimates of the optimal scenario, which is described in more detail in Section 2.2.2. In the more realistic problems considered in this chapter, there are multiple groups together with random delays in the network. For such cases, mis-information management is a non-trivial problem. A combination of graph theory and Bayesian estimation is employed to remove the mis-information generated by different delays in links. The rest of the chapter is organized as follows:

• We represent information flow in a social network by a family of directed acyclic graphs in Section 2.2. The communication among social groups is modeled by information exchange Protocol 2.1, where mis-information propagation may arise. Information exchange Protocol 2.2 is introduced as a benchmark against Protocol 2.1. From the benchmark Protocol 2.2, the mis-information removal algorithm can be specified.

• Section 2.3 presents a necessary and sufficient condition (called Constraint 2.3.1) on the topology of the network that guarantees optimal, mis-information-free estimates.
It is shown that, with full knowledge of the information flow graph, Constraint 2.3.1 leads to an algorithm for exact mis-information removal using the optimal Bayesian estimation defined in Section 2.3.

• A sub-optimal algorithm is proposed in Section 2.4 to remove the mis-information associated with the estimates of social groups when the information flow graph is not completely known at each time.

• Numerical results that show the effect of mis-information propagation and also the excellent performance of the proposed mis-information removal algorithms are presented in Section 2.5.

2.2 Modeling Information Flow in Social Networks

Section 2.1 outlined the goals and main results of this chapter and informally described constrained estimation over social networks. In this section, first, using a graph-theoretic notation, the following types of communication protocols are presented:

• Constrained information flow protocol: This protocol mimics information exchange and inference in a social network where beliefs are communicated among groups (nodes). As stated in Section 2.1, mis-information propagation arises in this protocol due to the abusive repetition of information.

• Full information flow protocol: This protocol is an ideal (and, thus, impractical) communication protocol that prevents mis-information propagation. To devise an algorithm to mitigate mis-information propagation, this protocol is used as a benchmark against the constrained information flow protocol, as we explain shortly.

Then, we assert via Theorem 2.2.1 that the flow of information in a social network can be represented by a family of time-dependent Directed Acyclic Graphs (DAGs).
Some essential graph-theoretic tools that will be used to formulate the mis-information propagation problem are outlined in Appendix A.

2.2.1 Constrained Information Flow Protocol

This protocol refers to a social network where nodes aim to estimate an underlying state of nature. As described in Section 2.1, instead of raw observations, the posterior distributions of the state of nature (beliefs) are broadcast over the network. Due to the information exchange constraint in this protocol, the complete history of beliefs is not available at each node. It is in such a constrained information flow network that mis-information propagation arises.

The information exchange protocol in constrained estimation over social networks described in Section 2.1 can be summarized as follows:

Protocol 2.1 Constrained Information Flow Network Protocol at each node n
Step 1. Observation: Node n (social group s at time k) records its private observation vector zn according to (2.1), that is, zn ∼ B^i_z when x = x^i, where n = s + S(k − 1), see (2.4).
Step 2. Interaction with other social groups: Node n then accesses the network for the beliefs Θn from other social groups at previous time instants.
Step 3. Mis-information removal and Bayesian data fusion: Node n uses the mis-information removal algorithm together with Bayesian data fusion to combine Θn with its private observation zn and updates its belief θn.
Step 4. Transmit the updated belief: Node n then broadcasts the updated belief over the network.

Remarks: We assume a reasonable degree of flexibility that each node deploys for broadcasting its information over the network. In Step 2 above, we assumed for simplicity (to avoid collision of information) that only one social group is allowed to transmit information at each time instant.

2.2.2 Benchmark Full Information Flow Protocol

The goal of this chapter is to solve estimation problem (2.5) subject to the information exchange Protocol 2.1.
We now describe an idealized (and therefore impractical) Protocol 2.2 that will be used as a benchmark against Protocol 2.1. In the benchmark protocol, we assume that, instead of transmitting the posterior distribution θn, each node transmits its own private observation zn together with all raw observations received over the network. In this protocol, since each node has the entire available observation history from previous nodes, there is no room for mis-information propagation, i.e., there is no chance for the inadvertent re-use of private observations by any node. Let Zn be the set of observations from previous nodes (recorded or received until time k at social group s, where n = s + S(k − 1))¹¹. The benchmark protocol proceeds as follows:

Protocol 2.2 Benchmark Full Information Flow Network Protocol at each node n
Step 1. Observation: Node n records its private observation vector zn according to (2.1).
Step 2. Interaction with other social groups: Node n then accesses the network and receives the private observations Zn from other nodes.
Step 3. Information aggregation and Bayesian data fusion: Node n uses zn and Zn to compute its belief yn = p(x|zn, Zn).
Step 4. Transmit augmented data: Node n then broadcasts the set of observations Zn ∪ {zn} over the network.

Since Protocol 2.2 serves as an idealized benchmark for designing mis-information removal algorithms, its efficiency is irrelevant. However, it can be made more efficient by requiring each node to broadcast only those observations which have not already been integrated into the beliefs computed by the other nodes in the network.
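For a finite state space X, Step 3 of Protocol 2.2 reduces to multiplying the prior by the likelihood of every raw observation in Zn ∪ {zn} and normalizing, since the observations are conditionally independent given x. The following is a minimal sketch of this benchmark fusion (the two-state likelihood values are hypothetical numbers):

```python
def benchmark_belief(prior, likelihoods):
    """Compute y_n = p(x | Z_n, z_n) over a finite X from raw observations.

    prior[i]          -- pi_0(x^{i+1})
    likelihoods[m][i] -- p(z_m | x = x^{i+1}) for each raw observation z_m
                         in Z_n together with the private observation z_n
    Conditional independence given x means the posterior is the normalized
    product of the prior and all the observation likelihoods.
    """
    post = list(prior)
    for lik in likelihoods:
        post = [p * l for p, l in zip(post, lik)]
    total = sum(post)
    return [p / total for p in post]

# Hypothetical: two states, uniform prior, three raw observations.
y = benchmark_belief([0.5, 0.5], [[0.2, 0.6], [0.3, 0.4], [0.5, 0.1]])
```

Because the product is commutative, the order in which observations arrive does not matter here, and no observation can be double-counted: this is exactly why the benchmark estimates are mis-information free.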
By comparing the posterior distribution of the state of nature in Protocol 2.1 with that in the benchmark Protocol 2.2, the mis-information removal algorithm is specified in Section 2.3.

With Protocol 2.2 defined above, the benchmark estimation problem can be summarized as:

    Estimate x with prior π0 subject to:
        Gn = (Vn, En) is given,
        zn ∼ B^i_z when x = x^i,   (observation process)
        yn = F(Zn, zn).            (standard estimation problem)      (2.6)

The estimates yn are free of mis-information because node n uses all available raw observations (and not estimates) from previous nodes. Note that estimation problem (2.5) is a dynamic constrained estimation on directed acyclic graphs. One set of constraints is on the topology of the information flow graph (these are also valid for the estimation problem (2.6) in the benchmark protocol). However, there exists another constraint on the algorithm A (which does not hold for the benchmark scenario). As will be shown in Section 2.3, the algorithm A has a specific linear form, which is depicted by (2.11) in Section 2.3.1.

¹¹ In the example (2.3) in Section 2.1, Z⌊1,3⌋ = {z⌊1,2⌋, z⌊2,2⌋, z⌊1,1⌋, z⌊2,1⌋}.

2.2.3 Modeling Time Evolution of the Information Flow

Before proceeding, we refer the interested reader to Appendix A for a summary of the graph-theoretic definitions that will be used to model the information flow graph in Protocols 2.1 and 2.2. Recall from Section 2.1 that Gn denotes the time-dependent information flow graph of the social network. Each node n′ in Gn represents a social group s′ at time k′ such that n′ = s′ + S(k′ − 1), see (2.4). Each directed edge of Gn between node i and node j shows that the information (belief in Protocol 2.1 or observation in Protocol 2.2) of node i is available at node j in the social network represented by Gn. Note that Gn is always a sub-graph of Gn+1.
Therefore, we can use a family of time-dependent Directed Acyclic Graphs (DAGs)¹² to model the time evolution of the information flow in the social network. Indeed, the following theorem shows that the information flow in a (group-based) social network can always be represented by a family of DAGs.

Theorem 2.2.1. The information flow in a social network defined in Protocol 2.1 and Protocol 2.2, comprising S groups up to and including time k, can be represented by a family of DAGs G = {Gn}, n ∈ {1, ..., N}, where N = Sk. Each DAG Gn = (Vn, En) represents the information flow between the first n nodes, where the generic node n is defined by (2.4).

Proof. The proof is presented in Section 2.7.1.

The adjacency and transitive closure matrices of Gn are denoted by An and Tn, respectively (see Appendix A for details). Because the information of each node cannot travel backwards in time, An and Tn are upper triangular matrices.

Memory Requirement: In this chapter, we assume that beliefs are valid for a duration of K time instants, where K is a positive integer; i.e., social groups at time k do not remember beliefs generated before time k − K. This means that the size of the adjacency and transitive closure matrices of each graph in G is limited to N = SK.

2.3 Optimal Mis-information Propagation Removal Algorithm

This section considers the estimation problem (2.5) with the constrained information flow Protocol 2.1 of Section 2.2. The aim is to devise the information aggregation algorithm A in (2.5) such that the estimates of (2.5) with Protocol 2.1 are equal to those of (2.6) with the benchmark Protocol 2.2, i.e., yn = θn. We also provide necessary and sufficient conditions on the information flow graph under which mis-information removal is possible.

¹² See Appendix A.
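The transitive closure matrix Tn used throughout the next subsection can be computed from the adjacency matrix An with Warshall's algorithm. A minimal sketch follows (the example adjacency matrix encodes graph (2.3) of Section 2.1, with nodes indexed by (2.4); this is standard graph machinery, not code from the thesis):

```python
def transitive_closure(A):
    """Warshall's algorithm: T[i][j] = 1 iff there is a directed path i -> j.
    For the information flow DAGs of Theorem 2.2.1, A (and hence T) is upper
    triangular, since information cannot travel backwards in time."""
    n = len(A)
    T = [row[:] for row in A]
    for k in range(n):
        for i in range(n):
            if T[i][k]:
                for j in range(n):
                    if T[k][j]:
                        T[i][j] = 1
    return T

# Graph (2.3), nodes 0..5 indexed by (2.4) with S = 2:
# 0=(1,1), 1=(2,1), 2=(1,2), 3=(2,2), 4=(1,3), 5=(2,3).
A = [[0, 0, 1, 1, 0, 0],
     [0, 0, 0, 1, 0, 0],
     [0, 0, 0, 0, 1, 0],
     [0, 0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0]]
T = transitive_closure(A)
# T[0][4] == 1: a (multi-hop) path exists from (1,1) to (1,3).
```

The nth column of T lists exactly the nodes whose information has reached node n by single-hop or multi-hop paths, which is the role tn plays in (2.9) below.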
2.3.1 Optimal Combination Scheme in Constrained Information Flow Protocol

Consider the estimation problem (2.5) on directed acyclic graphs where social groups deploy Protocol 2.1. In this section, we address the following question: How should each social group combine its private observation with the received information (beliefs) from the network so that its updated belief is mis-information free?

To answer this question, consider estimation problem (2.6) with the idealized benchmark Protocol 2.2, where the sets of raw observations are transmitted over the network and, thus, the estimates are mis-information free. In this scenario, since the history of all observations is available at each node and these observations are independent, the standard Bayesian update is used to evaluate

    yn = p(x | Zn, zn).

The estimates yn are free of mis-information; therefore, to prevent mis-information propagation in the constrained information flow Protocol 2.1, the information aggregation algorithm A should be devised such that

    p(x | Θn, zn) = p(x | Zn, zn),  for n = 1, 2, ....   (2.7)

So, the first step in building the optimal information aggregation algorithm is to compute the estimates yn of the benchmark protocol in terms of the raw observations and the information flow graph. Before proceeding, let us define

    ŷ^full_n = log(yn) = log(p(x | Zn, zn))  for n = 1, 2, ....   (2.8)

Since the logarithm is a monotonically increasing, surjective function, we can work with the logarithm of the estimates, ŷ^full_n, instead of yn¹³. The following theorem gives an expression for the estimates ŷ^full_n in the full information flow network.

Theorem 2.3.1. Consider estimation problem (2.6) with information exchange Protocol 2.2 of Section 2.2. The mis-information-free estimate at node n is

    ŷ^full_n = (tn ⊗ Id) ι_{1:n−1} + ιn,   (2.9)

where ιn denotes log(p(zn|x)) and ι_{1:n−1} ≜ [ι′_1, ..., ι′_{n−1}]′ ∈ R^{(n−1)d×1}. Here ⊗ denotes the Kronecker (tensor) product and Id denotes the d×d identity matrix.
Recall that tn is defined in (A.6) as the first n−1 elements of the nth column of Tn.

Proof. The proof is in Section 2.7.2.

¹³ Because the logarithm of a product is the sum of the individual logarithms, it is more convenient to use logarithms of the estimates in the Bayesian estimation context.

According to Theorem 2.3.1, the optimal mis-information-free estimates can be expressed as linear combinations of the ιi = log(p(zi|x)) in terms of the transitive closure matrix Tn of graph Gn. Eq. (2.9) is quite intuitive. In information exchange Protocol 2.2, a node broadcasts its own raw observations and also passes on the observations received from other nodes, so that each node has the entire history of all possible observations; i.e., if there exists a path from node i to node n, the observation of node i, zi, is available at node n. Therefore, the estimate ŷ^full_n of node n is the sum of the information from all nodes that are connected to node n (by single-hop or multi-hop paths) in the information flow graph Gn.

Having evaluated the estimates of the benchmark Protocol 2.2, we are now ready to tackle the algorithm design problem. Consider the estimation problem (2.5) with information exchange Protocol 2.1 of Section 2.2. The aim is to devise the algorithm A such that (2.7) holds. Define

    ŷn = log θn  and  ŷ_{1:n−1} ≜ [ŷ′_1, ..., ŷ′_{n−1}]′.   (2.10)

With the (n−1)-dimensional vector wn below denoting a weight vector (a more precise construction is given in (2.11) below), and ιn defined in (2.9), we propose the following optimal combination scheme:

    ŷn = (wn ⊗ Id) ŷ_{1:n−1} + ιn.   (2.11)

Before describing why the estimates ŷn in (2.11) are free of mis-information, we introduce the following constraint on the (n−1)-dimensional weight vector wn.

Constraint 2.3.1. Consider the estimation problem (2.5) with information exchange Protocol 2.1. The set of weights {wn}, n ∈ {1, ..., N}, in (2.11) satisfies the topological constraint for the constrained flow network if, ∀ j ∈ {1, ..., n−1} and ∀ n ∈ {1,
,N}an( j) = 0 =⇒ wn( j) = 0, (2.12)where an is defined in (A.6).Constraint 2.3.1 imposes a topological condition on the weight vector wn Assuming that Con-straint 2.3.1 holds, Theorem 2.3.2 below asserts that estimates computed from (2.3.1) are identicalto the optimal, mis-information free estimates of information exchange Protocol 2.2.Theorem 2.3.2. Consider estimation problem (2.5) with information exchange Protocol 2.1 of Sec-tion 2.2. Then the set of weights {wn}n∈{1,...,N} in (2.11) satisfies the topological constraint forconstrained flow network if ∀ j ∈ {1, . . . ,n−1} and ∀n ∈ {1, . . . ,N}. Then the following optimality272.3. Optimal Mis-information Propagation Removal Algorithmproperty holds for the estimates ŷn in the constrained information flow network:ŷn = ŷf ulln ⇐⇒{wn = tn((Tn−1)′)−1and wn satisfy Constraint 2.3.1,(2.13)where ŷn and ŷf ulln are defined in (2.10) and (2.8) respectively. Recall that tn defined in (A.6) as thefirst n−1 elements of the nth column of Tn.Proof. The proof is presented in Appendix 2.7.3In words: A necessary and sufficient condition for ŷn = ŷf ulln to be held is that the weight vectorwn satisfies wn = tn((Tn−1)′)−1. As a result of constrained information flow, wn should simultane-ously satisfy topological constraint (2.12) to ensure that required information for mis-informationremoval is available at node n.Discussion: According to Theorem 2.3.1, the mis-information free estimates of full informationflow protocol at node n is linear in the estimates computed by the previous nodes i.e. ι1:n−1 and theinformation collected ιn at node n. This linearity enables us to remove the mis-information in theconstrained information flow protocol by employing the optimal combination scheme (2.11) withn−1 dimensional weight vector wn defined in Theorem 2.3.2 . However, according to the commu-nication topology described by Adjacency Matrix An, some nodes do not transmit their estimatesto node n. 
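For concreteness, the weight construction of Theorem 2.3.2 and the check of Constraint 2.3.1 can be sketched numerically. The following is a minimal NumPy sketch; the helper names are ours, and the Warshall-style closure is one of several ways to obtain $T_n$ (the thesis defines it algebraically in (A.4)):

```python
import numpy as np

def transitive_closure(A):
    """Reflexive transitive closure of a DAG adjacency matrix via Warshall's
    algorithm: entry (i, j) is 1 iff i == j or a directed path i -> j exists."""
    n = A.shape[0]
    T = A.astype(bool) | np.eye(n, dtype=bool)
    for k in range(n):
        T = T | (T[:, [k]] & T[[k], :])  # allow paths through intermediate node k
    return T.astype(int)

def optimal_weights(T, n):
    """Weight vector of Theorem 2.3.2: w_n = t_n ((T_{n-1})')^{-1},
    obtained by solving T_{n-1} w_n' = t_n' (T is upper triangular, unit diagonal)."""
    t_n = T[: n - 1, n - 1].astype(float)   # first n-1 elements of the n-th column
    T_prev = T[: n - 1, : n - 1]            # T_{n-1}: upper-left block of T_n
    return np.linalg.solve(T_prev, t_n)

def satisfies_constraint(A, w, n, tol=1e-9):
    """Topological Constraint 2.3.1: a_n(j) = 0 must imply w_n(j) = 0."""
    a_n = A[: n - 1, n - 1]
    return bool(np.all((a_n != 0) | (np.abs(w) < tol)))

# Illustrative three-node chain 1 -> 2 -> 3 with a direct shortcut 1 -> 3
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
w3 = optimal_weights(transitive_closure(A), 3)
print(w3)   # [0. 1.]: node 1's observation is already inside node 2's estimate
```

In the three-node example the weight on node 1 is zero: naively adding node 1's estimate to node 2's would double-count $z_1$, and the weight vector removes exactly that redundancy.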
Consequently, the estimates of those nodes are not available at node $n$ and must not be used to compute the estimate $\hat{y}_n$. An obvious way to introduce this restriction into the computation of $\hat{y}_n$ is to set the weight associated with any unavailable estimate to zero; this is Constraint 2.3.1, so it is clear that (2.12) is a necessary condition for exact mis-information removal. Now, assuming that the estimates of the previous $n-1$ nodes are all optimal, mis-information free estimates, i.e., $\hat{y}_{1:n-1} = \hat{y}^{\mathrm{full}}_{1:n-1}$, and are available at node $n$, is it possible to find a vector $w_n$ such that $\hat{y}_n$ equals the optimal mis-information free estimate $\hat{y}^{\mathrm{full}}_n$? Theorem 2.3.2 provides the answer: $w_n = t_n\big((T_{n-1})'\big)^{-1}$. The non-zero elements of $w_n$ identify the nodes whose estimates must be available at node $n$ in order to remove the mis-information. However, due to the topology of the graph, some of the estimates from the previous $n-1$ nodes are not available at node $n$. Constraint 2.3.1 essentially ensures that the estimates essential for removing the mis-information are available at node $n$, and it gives a necessary and sufficient condition on the topology of the graph for $\hat{y}_n = \hat{y}^{\mathrm{full}}_n$. In Section 2.5.1, we give more intuition on Constraint 2.3.1 via a simple example.

2.4 Sub-optimal Mis-information Removal Algorithm Without Complete Knowledge of the Information Flow Graph

So far in this chapter, an optimal information aggregation scheme has been proposed for estimation problem (2.5) with Protocol 2.1. Theorem 2.3.2 asserts that the estimates obtained via the optimal aggregation scheme (2.11) are equal to those of the benchmark Protocol 2.2. Recall that mis-information propagation arises from the unintended repetition of information received from other nodes. This happens at a node, say node $n$, when there exists a node in the information flow graph $G_n$ with two or more paths to node $n$. When the information flow graph is known, each node can identify the origin of the mis-information propagation and remove it (under Constraint 2.3.1) via the optimal information aggregation scheme (2.11). However, this scheme requires full knowledge of the information flow graph $G_n$. In this section, we relax that assumption and propose a mis-information removal algorithm for the case where a node does not know the path of a received message; instead, the "expected adjacency matrix" of the information flow graph is known at all nodes. Our goal is to devise a sub-optimal information aggregation scheme based on (2.11) that reduces the effect of mis-information propagation when the information flow graph is unknown.

Before presenting the estimation problem with unknown information flow, let us take a closer look at the expected adjacency matrix of a graph. Recall that the adjacency (or connectivity) matrix of a graph records the connections among its nodes: the element in row $i$ and column $j$ equals one if there exists a (single-hop) link from node $i$ to node $j$, and zero otherwise. Similarly, the expected adjacency matrix of a graph is defined as

$$\tilde{A}_n = [\tilde{a}_{i,j}], \quad 1 \le i, j \le n,$$

where $\tilde{a}_{i,j}$ denotes the probability of having a link from node $i$ to node $j$. We assume that $\tilde{A}_n$, rather than the adjacency matrix, is known at node $n$. The estimation problem when $G_n$ is unknown can be summarized in the following abstract form:

Estimate $x$ with prior $\pi_0$ subject to:
$$\begin{cases} \tilde{A}_n \text{ is given}, \\ z_n \sim B_{iz}, \; x = x_i, \\ \theta_n = p(x \mid \Theta_n, z_n, \tilde{A}_n) = \mathcal{B}(\Theta_n, z_n, \tilde{A}_n). \end{cases} \qquad (2.14)$$

The aim is to devise an algorithm that reduces the effect of mis-information propagation in estimation problem (2.14). Knowing $\tilde{A}_n$, the nodes that are more likely to have multiple paths to node $n$ can be identified. Note that in the optimal information aggregation scheme (2.11), the weight vector $w_n$ is the only term that depends on the information flow graph.
More specifically, Theorem 2.3.2 computes the optimal weight vector in terms of the transitive closure matrix $T_n$ of the information flow graph. In estimation problem (2.14), $G_n$ (and consequently $T_n$) is unknown, as opposed to estimation problem (2.5). Therefore, the main challenge when $G_n$ is unknown is to approximate $T_n$ and then compute $w_n$ in terms of the expected transitive closure matrix.

2.4.1 Sub-optimal Combination Scheme

Given the expected adjacency matrix, our sub-optimal approach to reduce the effect of mis-information at each node, say node $n$, consists of three steps:

• First, we approximate the transitive closure matrix (the probability of having a path between each pair of nodes in the graph) from the expected adjacency matrix $\tilde{A}_n$, that is,
$$\tilde{T}_n(i,j) = p\big(T_n(i,j) = 1\big), \quad 1 \le i, j \le n,$$
where $p(\cdot)$ denotes the probability of an event.

• Second, from $\tilde{T}_n$, the nodes that are more likely to have a path to node $n$ can be identified. From this, a hard estimate of the transitive closure matrix is constructed, that is,
$$\bar{T}_n(i,j) = \begin{cases} 1, & \tilde{T}_n(i,j) > \lambda_{\mathrm{th}}, \\ 0, & \text{otherwise}, \end{cases} \qquad (2.15)$$
where $\lambda_{\mathrm{th}}$ is a threshold and $\bar{T}_n$ is the hard estimate of the transitive closure matrix of $G_n$.

• Third, having computed $\bar{T}_n$, the aggregation scheme of Algorithm $\mathcal{A}$ (from estimation problem (2.5)) is used to reduce the effect of mis-information propagation:
$$y_n = (w_n \otimes I_d)\, y_{1:n-1} + \iota_n, \qquad (2.16)$$
where the weight vector satisfies $w_n = \bar{t}_n\big((\bar{T}_{n-1})'\big)^{-1}$ and simultaneously satisfies topological Constraint 2.3.1. Here, $\bar{T}_n$ is defined in (2.15), $\bar{t}_n$ is the first $n-1$ elements of the $n$th column of $\bar{T}_n$, and $\iota_n = \log(p(z_n \mid x))$.

Algorithm 2.3 summarizes the sub-optimal mis-information removal procedure without knowledge of the information flow network.

Algorithm 2.3 Algorithm for mis-information removal in Step 3 of Protocol 2.1
For $n = 1, 2, \dots$
1. Reconstruct the weighted adjacency matrix of the information flow, $\tilde{A}_n$.
2. Compute $\tilde{T}_n$ (Section 2.4.1) and $\bar{T}_n$ using the threshold $\lambda_{\mathrm{th}}$ (2.15).
3. Using $\bar{t}_n$, compute $w_n = \bar{t}_n\big((\bar{T}_{n-1})'\big)^{-1}$.
4. Update the estimate $y_n = (w_n \otimes I_d)\, y_{1:n-1} + \iota_n$.

2.5 Numerical Examples

In this section, we first provide an example to give more intuition on the topological Constraint 2.3.1, which is required for exact mis-information removal. Then, the performance of the optimal mis-information removal algorithm of Theorem 2.3.2 is compared with that of the full information flow communication protocol, where all raw observations, rather than beliefs about the state of nature, are transmitted over the network. Finally, the performance of the sub-optimal mis-information removal algorithm (Algorithm 2.3) is investigated in two scenarios: (i) accurate and (ii) inaccurate estimation of the information flow graph.

2.5.1 Example and Intuition on Theorems 2.2.1 and 2.3.2

In this subsection, we provide an example that shows the propagation of mis-information in a simple social network, and illustrates how the mis-information removal algorithm proposed in Section 2.3 prevents it. Consider a social network consisting of two groups with the information flow graph of Fig. 2.1 up to time $K = 3$.

Figure 2.1: Example of constrained information flow network, $S = 2$ and $K = 3$. Circles represent a social group at a specific time, indexed by (2.4); each edge depicts a communication link between two nodes.

There are $S = 2$ groups and the total time duration is $K = 3$. From (2.4), the element indexed by $n = s + 2(k-1)$ in Fig. 2.1 represents group $s$ at time $k$. According to Theorem 2.2.1, we can build the family of $N = SK = 6$ DAGs, namely $\{G_1, G_2, G_3, G_4, G_5, G_6\}$. Based on the information flow in Fig. 2.1, since nodes 1 and 2 do not communicate, $A_1$ and $A_2$ are clearly zero matrices. Nodes 1 and 3, and nodes 2 and 3, communicate, hence $A_3$ has two ones; and so on. The adjacency matrices associated with graphs $G_1$, $G_2$, $G_3$, $G_4$ and $G_5$ are:
$$A_1 = [0], \quad A_2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}, \quad A_3 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}, \quad A_4 = \begin{bmatrix} 0 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}, \quad A_5 = \begin{bmatrix} 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$

The transitive closure matrices $T_n$ are obtained using (A.4). From the adjacency matrices associated with graphs $G_1, \dots, G_5$, the inverse transitive closure matrices are:

$$T_1^{-1} = [1], \quad T_2^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad T_3^{-1} = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}, \quad T_4^{-1} = \begin{bmatrix} 1 & 0 & -1 & -1 \\ 0 & 1 & -1 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \quad T_5^{-1} = \begin{bmatrix} 1 & 0 & -1 & -1 & 1 \\ 0 & 1 & -1 & -1 & 1 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 1 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

Note that $T_n(i,j)$ is non-zero only for $i \le j$ due to causality: information sent by a social group can only arrive at another social group at a later time instant. Also note that since $G_n$ is the subgraph of $G_{n+1}$ with node $n+1$ removed, the adjacency matrix $A_{n+1}$ and transitive closure matrix $T_{n+1}$ contain $A_n$ and $T_n$, respectively, as their upper-left $n \times n$ blocks; see Remark A.2 in Appendix A. The weight vectors are derived from the transitive closure matrices via (2.13):

$$w_2 = [0], \quad w_3 = [1 \;\; 1], \quad w_4 = [1 \;\; 1 \;\; 0], \quad w_5 = [-1 \;\; -1 \;\; 1 \;\; 1].$$

Let us examine these weight vectors. $w_2$ means that node 2 does not use the estimate from node 1; this is consistent with the constrained information flow, because the estimate from node 1 is not available to node 2 (see Fig. 2.1). $w_3$ means that node 3 uses the estimates from nodes 1 and 2. $w_4$ means that node 4 uses only the estimates from nodes 1 and 2; the estimate from node 3 is not available at node 4. As shown in Fig. 2.1, mis-information propagation occurs at node 5. The vector $w_5$ says that node 5 adds the estimates from nodes 3 and 4 and subtracts the estimates from nodes 1 and 2, to avoid double counting the information already integrated into the estimates from nodes 3 and 4. Indeed, using the algorithm and the weight vector proposed in Theorem 2.3.2, mis-information propagation is completely prevented in this example. Now consider the case where the edge between node 3 and node 5 does not exist.
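The weight vector $w_5$ derived above can be checked numerically. The following is a minimal NumPy sketch (the Warshall-style closure is our implementation choice; the thesis defines $T_n$ algebraically in (A.4)) that rebuilds $T_5$ from $A_5$ and solves $T_4 w_5' = t_5'$, which is equivalent to $w_5 = t_5\big((T_4)'\big)^{-1}$ in (2.13):

```python
import numpy as np

# Adjacency matrix A5 of the information flow graph of Fig. 2.1
A5 = np.array([[0, 0, 1, 1, 1],
               [0, 0, 1, 1, 1],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 1],
               [0, 0, 0, 0, 0]])

# Reflexive transitive closure T5 via Warshall's algorithm
T = A5.astype(bool) | np.eye(5, dtype=bool)
for k in range(5):
    T = T | (T[:, [k]] & T[[k], :])
T = T.astype(int)

t5 = T[:4, 4].astype(float)    # first 4 elements of the 5th column of T5
T4 = T[:4, :4]                 # transitive closure matrix of G4 (upper-left block)
w5 = np.linalg.solve(T4, t5)   # w5 = t5 ((T4)')^{-1}
print(w5)                      # [-1. -1.  1.  1.]: subtract nodes 1, 2; add nodes 3, 4
```

Since $T_4$ is upper triangular with a unit diagonal, the solve is a cheap back-substitution, and the recovered $w_5 = [-1, -1, 1, 1]$ matches the weight vector listed above.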
In this scenario $a_5(3) = 0$ while $w_5(3) \ne 0$; therefore Constraint 2.3.1 does not hold and exact mis-information removal is not possible.

2.5.2 Numerical Examples Illustrating Algorithm $\mathcal{A}$ in Estimation Problem (2.5)

In this section, numerical results are given to illustrate the effect of mis-information propagation on the performance of multi-agent Bayesian estimation; the efficacy of the optimal mis-information removal algorithm proposed in Section 2.3 is also corroborated. We consider a social network consisting of $S = 2$ groups that aim to estimate a scalar state of nature $x$ with a given prior distribution. We simulate a social network with communication delays between the groups chosen randomly from $\{1, 2, \dots, 10\}$. The prior distribution of $x$ is the uniform distribution $U[0,4]$ and the observation noise is zero-mean normal, $N(0,1)$. The realized state of nature is $x^* = 2.78$. The simulation is repeated $M = 100$ times. At each iteration $l$, the estimated value of the state of nature at node $n$ is computed via

$$x^l_n = \sum_{j} x_j \, p(x = x_j \mid \Theta_n, z_n), \qquad (2.17)$$

where the sum runs over the support points $x_j$ of the discretized belief. Then, the results of all $M = 100$ iterations $x^l_n$ are averaged to obtain the conditional mean of the state of nature at node $n$ as $\frac{1}{M} \sum_{l=1}^{M} x^l_n$.

Figure 2.2: The conditional mean of the state of nature given the observations, for estimation with the optimal mis-information removal algorithm compared to the full information network.

Fig. 2.2 illustrates the effect of mis-information propagation on the Bayesian estimators. We consider three scenarios:
(i) the full information flow protocol, which is free of mis-information as discussed in Section 2.2;
(ii) the constrained information flow protocol with the standard Bayesian filter (naive mixing of observations and received information), which may contain mis-information;
(iii) the constrained information flow protocol with the optimal mis-information removal algorithm proposed in Section 2.3.

As can be seen in this figure, the performance of the Bayesian estimator is severely degraded in the presence of mis-information propagation. The dashed line, representing standard Bayesian estimation without the mis-information removal algorithm, converges to a value slightly different from 2.78. Fig. 2.2 also shows the excellent performance of the mis-information removal algorithm of Theorem 2.3.2: the expected value of the unknown state of nature $x$ given the estimates obtained by the optimal information aggregation scheme (2.11), depicted by the dash-dot line marked with "♦", coincides with the optimal mis-information free estimate of the full information flow protocol, depicted by the solid line. This verifies the results of Theorem 2.3.2. We also investigate the performance of the mis-information removal algorithm in terms of the mean squared error in the estimate of the state of nature, namely $\sigma_n = \frac{1}{M} \sum_{l=1}^{M} (x^l_n - x^*)^2$, where $x^*$ is the true state of nature and $x^l_n$ is the estimate of node $n$ at iteration $l$, as in (2.17).

We can see in Fig. 2.3 that the mean squared error of the estimates obtained by the mis-information removal algorithm of Section 2.3 (dash-dot line marked with "♦") is lower than that of the constrained information flow protocol without mis-information removal (dashed line).

2.5.3 Numerical Examples Illustrating Algorithm $\mathcal{B}$ in Estimation Problem (2.14)

The performance of sub-optimal Algorithm 2.3 is studied in a social network consisting of $S = 2$ groups over a duration of $K = 25$ time instants. From (2.4), social group $s$ at time $k$ is represented by node $n = s + 2(k-1)$ in the information flow graph.
Information from one social group reaches the others after a random delay. In our numerical study, the prior of $x$ is the uniform distribution $U[0,4]$, the observation noise is zero-mean normal $N(0,1)$, and the realized state of nature is $x^* = 2.78$. In this study, we assume that the information flow graph is not known at each node; instead, a weighted adjacency matrix (which can be considered a noisy version of the true adjacency matrix) is available at each node. This is motivated by the fact that nodes can construct an estimate of the adjacency matrix $A_n$ from side information about the communication topology, for example, the distribution of communication delays. Each node can therefore construct the weighted adjacency matrix of the information flow graph, whose element $(i,j)$ is the probability that the information of node $i$ reaches node $j$.

Figure 2.3: Comparison of the mean squared errors of the estimates obtained by the optimal mis-information removal algorithm, the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation).

To investigate the performance of the sub-optimal mis-information removal algorithm, we choose the elements of the weighted adjacency matrix $\tilde{A}_n$ as follows:

$$\tilde{a}_{ij} = \begin{cases} a_{ij} - \beta u_{ij}, & a_{ij} \ne 0, \\ 0, & a_{ij} = 0, \end{cases} \qquad (2.18)$$

where $u_{ij}$ has a uniform distribution $U[0,1]$ and $\beta$ is a positive real number in $(0,1)$. We choose $\beta = 0.2$ for the "accurate estimation" scenario and $\beta = 0.8$ for the "inaccurate estimation" scenario of the information flow graph.
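Algorithm 2.3 can be prototyped end to end on the small graph of Fig. 2.1. In the sketch below (NumPy; all helper names are ours), the expected transitive closure of Step 1 is approximated by Monte Carlo sampling of graphs from $\tilde{A}$, a stand-in for the direct computation referenced in Section 2.4.1; the threshold $\lambda_{\mathrm{th}} = 0.6$ is the value used later in Section 2.5.3:

```python
import numpy as np

rng = np.random.default_rng(0)

def closure(A):
    """Reflexive transitive closure (Warshall) of a 0/1 DAG adjacency matrix."""
    n = A.shape[0]
    T = A.astype(bool) | np.eye(n, dtype=bool)
    for k in range(n):
        T = T | (T[:, [k]] & T[[k], :])
    return T

# True adjacency matrix of the 5-node example graph (Fig. 2.1)
A = np.array([[0, 0, 1, 1, 1],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1],
              [0, 0, 0, 0, 0]])

beta = 0.2                                   # "accurate estimation" scenario, Eq. (2.18)
A_tilde = np.where(A != 0, A - beta * rng.uniform(size=A.shape), 0.0)

# Step 1: estimate the expected transitive closure p(T(i,j) = 1) by Monte Carlo
samples = [closure(rng.uniform(size=A.shape) < A_tilde) for _ in range(500)]
T_tilde = np.mean(samples, axis=0)

# Step 2: hard-threshold to get the estimated closure, Eq. (2.15)
T_bar = (T_tilde > 0.6).astype(int)          # lambda_th = 0.6

# Step 3: weights w_5 = t_5 ((T_4)')^{-1}, then enforce Constraint 2.3.1 (2.12)
t5, T4 = T_bar[:4, 4].astype(float), T_bar[:4, :4]
w5 = np.linalg.solve(T4, t5)
w5[A[:4, 4] == 0] = 0.0                      # zero weights of unavailable estimates
print(w5)                                    # recovers [-1. -1.  1.  1.] here
```

With $\beta = 0.2$ every true edge survives sampling with probability at least 0.8, so the thresholded closure coincides with the true one and the optimal weights are recovered; with $\beta = 0.8$ edges are frequently dropped and the recovered weights degrade, mirroring the accurate versus inaccurate scenarios studied below.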
The estimation problem (2.14) is investigated in the following four cases:
(i) the full information flow network with full knowledge of the information flow graph at each time (the optimal, mis-information free scenario), shown by the solid line;
(ii) standard (in this context, naive) Bayesian estimation in the constrained information flow network with full knowledge of the information flow graph at each time, depicted by the dashed line;
(iii) the optimal mis-information removal algorithm in the constrained information flow network with full knowledge of the information flow graph at each time, shown by the dotted line marked with "♦";
(iv) the sub-optimal mis-information removal algorithm in the constrained information flow network without knowledge of the information flow graph (using the weighted adjacency matrix with $\lambda_{\mathrm{th}} = 0.6$), shown by the dash-dot line marked with "×".

Fig. 2.4 illustrates the expected value of the state of nature in the above four scenarios.

Figure 2.4: Comparison of the conditional mean of the state of nature $x$ given the observations obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "accurate estimation" scenario ($\beta = 0.2$).

Similar to Section 2.5.2, the estimates are found by means of Monte Carlo simulation with $M = 100$ iterations.
As can be seen in this figure, the estimate of the state of nature obtained with the sub-optimal mis-information removal algorithm in the "accurate estimation" scenario ($\beta = 0.2$ in (2.18)) is very close to the mis-information free estimates of the full information flow network, and to those obtained by the optimal mis-information removal algorithm that knows the exact adjacency matrix of the information flow graph.

The mean squared errors associated with the four scenarios studied in this section are compared in Fig. 2.5.

Figure 2.5: Comparison of the mean squared errors of the estimates obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "accurate estimation" scenario ($\beta = 0.2$).

As can be seen in the figure, the mean squared error of the estimates obtained by the sub-optimal mis-information removal algorithm is lower than that of the estimates obtained by standard Bayesian estimation without mis-information removal. Figures 2.4 and 2.5 show the excellent performance of the proposed sub-optimal algorithm when the information flow graph is not completely known but a good approximation of it is available at each time.

To study the effect of the estimated adjacency matrix of the information flow graph on the performance of the sub-optimal algorithm for mitigating mis-information propagation, we repeat the simulation with $\beta = 0.8$ in (2.18) (the inaccurate estimation scenario). The expected value of the state of nature given the available information, in the four scenarios described above, is depicted in Fig. 2.6.

Figure 2.6: Comparison of the conditional mean of the state of nature $x$ given the observations obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "inaccurate estimation" scenario ($\beta = 0.8$).

Fig. 2.7 shows the mean squared errors of the estimates of the state of nature in the four scenarios under investigation with $\beta = 0.8$ in (2.18). As can be inferred from Figures 2.6 and 2.7, the performance of the sub-optimal algorithm depends on the estimated adjacency matrix of the information flow graph. When an accurate approximation is available, the performance of the sub-optimal algorithm is very close to that of the optimal mis-information removal algorithm. However, in the presence of high-variance noise in the estimate of the adjacency matrix of the information flow graph, the performance of the sub-optimal algorithm for mitigating mis-information drops.

2.6 Closing Remarks

In this chapter, the problem of mis-information propagation among different groups in social networks was addressed. We considered the most general scenario, with arbitrary observation noise and an arbitrary prior distribution of the state of nature. A necessary and sufficient condition for exact mis-information removal was derived based on the topology of the information flow network.
The performance of the proposed mis-information removal algorithm was also illustrated in numerical examples. We further proposed a sub-optimal algorithm to mitigate mis-information propagation when the information flow network is not known, and presented numerical results illustrating the performance of the proposed algorithms.

Figure 2.7: Comparison of the mean squared errors of the estimates obtained by the sub-optimal mis-information removal algorithm, the optimal mis-information removal algorithm (knowing the exact information flow graph), the Bayesian estimator in the full information flow network (free of mis-information), and the standard Bayesian estimator in the constrained information flow network (with mis-information propagation), in the "inaccurate estimation" scenario ($\beta = 0.8$).

Note that the data incest considered in this chapter may also arise in any set of sensors that interact over graphs (possibly with random communication delays) and employ Bayesian models for information aggregation. Examples of such sensor network setups include decentralized target tracking, localization, and fault detection. Although this chapter is motivated by social networks, the data incest removal algorithm presented in Section 2.3 can also be applied to remove the mis-information associated with the estimates of sensors in such setups.

2.7 Proof of Theorems

Here, we present the proofs of the propositions and results of this chapter in the order in which they appeared.

2.7.1 Proof of Theorem 2.2.1

To prove that a graph $G_n = (V_n, E_n)$ from the family $\mathcal{G}_n$ is a directed acyclic graph, we only need to show that the adjacency matrix of $G_n$ is a strictly upper triangular matrix; then, from Lemma A.1, the graph $G_n$ is a directed acyclic graph. Suppose that $v_i$ and $v_j$ are two vertices of $G_n$, that is, $v_i, v_j \in V_n$. From the re-indexing scheme (2.4), $v_i$ and $v_j$ represent agents $s_i$ and $s_j$ at time instants $k_i$ and $k_j$, respectively; we have $v_i = s_i + S(k_i - 1)$ and $v_j = s_j + S(k_j - 1)$. Because of the information flow, the information of an agent can only become available at other agents at later time instants: a message cannot travel back in time! This means that if $k_i < k_j$, there cannot be an edge from $v_j$ to $v_i$, i.e., $(v_j, v_i) \notin E_n$. By the re-indexing scheme, if $k_i < k_j$ then $v_i < v_j$ (because $k_i$ and $k_j$ are integers and $s_i, s_j \le S$). Therefore, we deduce that

$$i < j \implies (v_j, v_i) \notin E_n. \qquad (2.19)$$

Consequently, the adjacency matrix is strictly upper triangular, so that $G_n$ is a DAG. It then follows from the construction of the DAGs that $\mathcal{G}_N$ is a family of DAGs.

2.7.2 Proof of Theorem 2.3.1

In the full information flow protocol, each node uses Bayesian estimation to update the probability distribution of $x$ given the set of available observations. The following recursive equation is used to update the estimate of the probability distribution at node $n+1$:

$$p(x \mid Z_{v(G_{n+1})}, z_{n+1}) = \pi_0\, p(z_{n+1} \mid x) \prod_{i \in v(G_{n+1})} p(z_i \mid x). \qquad (2.20)$$

There is a path from each node $j \in v(G_n)$ to node $n$; therefore, from (A.6), $t_n(j) = 1$. Thus, $(t_n \otimes I_d)\,\iota_{1:n-1}$ can be written as

$$(t_n \otimes I_d)\,\iota_{1:n-1} = \sum_{j \in v(G_n)} \iota_j = \sum_{j \in v(G_n)} \log\big(p(z_j \mid x)\big). \qquad (2.21)$$

Taking logarithms of (2.20) yields

$$\log\big(p(x \mid Z_{v(G_n)}, z_n)\big) = \sum_{j \in v(G_n)} \log\big(p(z_j \mid x)\big) + \log\big(p(z_n \mid x)\big). \qquad (2.22)$$

Using (2.21) and (2.22), we can rewrite $\hat{y}^{\mathrm{full}}_n$ and complete the proof:

$$\hat{y}^{\mathrm{full}}_n = \sum_{j \in v(G_n)} \log\big(p(z_j \mid x)\big) + \log\big(p(z_n \mid x)\big) = (t_n \otimes I_d)\,\iota_{1:n-1} + \iota_n. \qquad (2.23)$$

(Note that for simplicity we omit $\log \pi_0$, which is the same for both the full and the constrained information flow protocols.)

2.7.3 Proof of Theorem 2.3.2

We first show that the left-hand side of (2.13), i.e., $\hat{y}_n = \hat{y}^{\mathrm{full}}_n$, implies the right-hand side. Starting from (2.11) and replacing $\hat{y}_n$ with $\hat{y}^{\mathrm{full}}_n$ yields

$$\hat{y}^{\mathrm{full}}_n = (w_n \otimes I_d)\,\hat{y}^{\mathrm{full}}_{1:n-1} + \iota_n. \qquad (2.24)$$

Using Theorem 2.3.1, it follows that $\hat{y}^{\mathrm{full}}_{1:n-1} = (T'_{n-1} \otimes I_d)\,\iota_{1:n-1}$. Substituting this into (2.24) yields

$$\hat{y}^{\mathrm{full}}_n = (w_n \otimes I_d)(T'_{n-1} \otimes I_d)\,\iota_{1:n-1} + \iota_n. \qquad (2.25)$$

From Theorem 2.3.1, we also have $\hat{y}^{\mathrm{full}}_n = (t_n \otimes I_d)\,\iota_{1:n-1} + \iota_n$. Equating the right-hand sides of this equation and (2.25) yields

$$(t_n \otimes I_d)\,\iota_{1:n-1} = (w_n \otimes I_d)(T'_{n-1} \otimes I_d)\,\iota_{1:n-1} = \big((w_n T'_{n-1}) \otimes I_d\big)\,\iota_{1:n-1}. \qquad (2.26)$$

The last equality follows from the mixed-product property of Kronecker products. Since (2.26) holds for any information vector $\iota_{1:n-1}$, it implies $t_n = w_n T'_{n-1}$, and hence $w_n = t_n (T'_{n-1})^{-1}$, since $T_n$ is upper triangular with ones on the diagonal¹⁴ and therefore invertible. To complete this direction of the proof, recall that the information structure of the constrained flow is such that if $a_n(j) = 0$, then certain components of the vector $\hat{y}_{1:n-1}$ are not available at node $n$. If the corresponding weight $w_n(j)$ is non-zero, it is impossible to reconstruct $\hat{y}_n$ according to (2.11) so as to equal $\hat{y}^{\mathrm{full}}_n$; this is simply the topological constraint (2.12). Showing that the right-hand side of (2.13) implies the left-hand side is very similar to the above argument and is omitted.

¹⁴ See Appendix A.

3 Mis-information Management Problem in Social Learning Over Directed Acyclic Graphs

3.1 Introduction

Motivated by online rating and review systems, we investigate social learning in a network where agents interact on a time-dependent graph to estimate an underlying state of nature. Agents record their own private observations, then update their private beliefs about the state of nature using Bayes' rule.
Based on its belief, each agent then chooses an action (rating) from a finite set and transmits this action over the social network. An important consequence of such social learning over a network is the ruinous multiple re-use of information known as data incest (or mis-information propagation). In this chapter, the data incest management problem in the social learning context is formulated on a directed acyclic graph. We give necessary and sufficient conditions on the graph topology of social interactions to eliminate data incest. A data incest removal algorithm is proposed such that the public belief of social learning (and hence the actions of agents) is not affected by data incest propagation. This results in an online rating and review system with a higher trust rating. Numerical examples are provided to illustrate the performance of the proposed optimal data incest removal algorithm.

In social learning, agents aim to estimate the state of nature using their private observations and the actions of other agents [2]. The process by which agents update their beliefs can be modeled using Bayesian models [1, 64] or non-Bayesian models [56, 57]. Classical social learning models the behavior of expected-cost-minimizing agents; it can also be generalized to risk-averse minimizers, and the resulting risk-averse social learning filter is studied in [95].

In this chapter, we consider Bayesian social learning that models expected cost minimizers in the presence of data incest (mis-information propagation). This results in a non-standard information pattern for Bayesian estimation. Before proceeding to the formal definition of data incest in learning over social networks, let us describe the social learning model.

3.1.1 Social Learning Protocol on a Network

Consider a social network comprising $S$ agents that aim to estimate (localize) an underlying state of nature (a random variable).
Let x ∈ {x1,x2, · · · ,xX} represent a state of nature (such as qualityof a hotel) with known prior distribution π0 where X denotes the dimension of the state space. Letk = 1,2,3, . . . depict epochs at which events occur. These events comprise of taking observations,evaluating beliefs and choosing actions as described below. The index k depicts the historical orderof events and not necessarily absolute time. However, for simplicity, we refer to k as “time” in therest of this chapter. Assume that there exists a network administrator who provides the networkbelief π−⌊s,k⌋ defined in Step 5 to node s at time k. Network belief can be considered as a summaryof information received from nodes whose actions are available at node [s,k] due to the constraintsimposed by the structure of social network. The agents use the following Bayesian social learningprotocol to estimate the state of nature:Step 1. Private observations: To estimate the state of nature x, each agent records its M-dimensional private observation vector. At each time k= 1,2,3, . . ., each agent s (1≤ s≤ S) obtainsa noisy private observation z[s,k] from the finite set15 Z= {z1,z2, . . . ,zZ}with conditional probabilityBi j = p(z[s,k] = z j|x= xi). (3.1)It is assumed that the observations z[s,k] are independent random variables with respect to agent sand time k16.Step 2. Private belief: After obtaining its private observation, each agent combines its privateobservation with the network belief to evaluate its private belief about state of nature. Each agents combines its private observation z[s,k] with the network belief (which is provided by the networkadministrator) and evaluates its private belief of state of nature17. Private belief, µ[s,k], is evaluatedvia Bayesian models from the network belief and private observations, that isµ[s,k] = (µ[s,k](i),1 ≤ i≤ X),where µ[s,k](i) = p(x= xi|π−[s,k],z[s,k]). 
(3.2)

Note that the private belief of each agent is available only to that agent and not to the other agents or to the network administrator (this is why the term "private" is used).

Footnote 15: The results of this chapter also apply to continuous-valued observations. We consider discrete-valued observations since humans typically record discrete observations.

Footnote 16: It is not necessary for agents to record observations at each time k, and this does not interfere with the common-knowledge assumption of social learning, whereby all agents know the structure of the social learning model. Agents at different time instants are treated as different nodes in our graphical model. The assumption that agents record an observation at each time k simplifies notation.

Footnote 17: The scenario where agents choose their actions according to the network belief is similar to the classical social learning formulation [35] where actions are transmitted over the network.

Step 3. Myopic action: Based on its private belief µ[s,k], agent s at time k chooses an action a[s,k] from a finite set A = {1, 2, …, A} to minimize its expected cost (given the information currently available on the network). That is,

a[s,k] = argmin_{a∈A} E{C(x,a) | µ[s,k]}. (3.3)

Here E denotes expectation and C(x,a) denotes the cost incurred by the agent if action a is chosen when the state of nature is x. In the context of rating and review systems, the cost function can be interpreted as the cost of losing reputation in the review system. For example, if one under-rates a good restaurant, one's reputation is affected negatively, which is costly. After agent s at time k records its action a[s,k], the network administrator automatically computes the updated "public belief" at this node by combining the action a[s,k] with the network belief π−[s,k]; that is, the public belief π[s,k] is

π[s,k] = (π[s,k](i), 1 ≤ i ≤ X), where π[s,k](i) = p(x = x_i | π−[s,k], a[s,k]). (3.4)

Step 4.
Social network: Unlike private beliefs, public beliefs are visible to the other agents and are broadcast over the social network. These public beliefs are observed by other agents after a random (communication) delay. We model this information exchange using a family of directed acyclic graphs. Let

G[s,k] = (V[s,k], E[s,k]), k = 1, 2, 3, …, s = 1, 2, …, S, (3.5)

denote a sequence of time-dependent graphs of information flow in the social network up to and including time k. Each vertex in V[s,k] represents an agent s in the social network at time k, and each edge ([s′,k′], [s′′,k′′]) in E[s,k] ⊆ V[s,k] × V[s,k] indicates that the public belief (or action) of agent s′ at time k′ reaches agent s′′ at time k′′.

Footnote 18: We assume that multiple agents can transmit simultaneously over the network without interfering with each other. This is realistic in a social network, since the time required to exchange (broadcast) information is substantially smaller than the time needed to record observations, update beliefs or take actions.

Step 5. Network belief: For past actions (those from other agents at previous time instants), the network administrator has already computed the public beliefs; see Step 3. Define

Θ[s,k] = {π[i,j]; for all [i,j] ∈ V[s,k] such that ([i,j], [s,k]) ∈ E[s,k]}.

At each node, the automated network administrator fuses all the available public beliefs (namely Θ[s,k]) into a single network belief; that is, the network belief is

π−[s,k] = p(x | Θ[s,k]) = A(Θ[s,k]), (3.6)

where A denotes the information fusion algorithm used to aggregate the public beliefs received from the network.
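Steps 1–3 of the protocol above can be sketched for a single node as follows. This is a minimal sketch, not the thesis's implementation: the state space, observation matrix B and cost matrix C below are hypothetical numerical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical model dimensions and parameters (not from the thesis).
X, Z, A = 3, 3, 3                       # states, observations, actions
B = np.array([[0.8, 0.1, 0.1],          # B[i, j] = p(z = z_j | x = x_i), eq. (3.1)
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
C = 1.0 - np.eye(X)                     # C[i, a]: cost of action a when state is x_i

def private_belief(pi_minus, z):
    """Step 2, eq. (3.2): Bayes update of the network belief with observation z."""
    mu = B[:, z] * pi_minus
    return mu / mu.sum()

def myopic_action(mu):
    """Step 3, eq. (3.3): choose the action minimising the expected cost C'_a mu."""
    return int(np.argmin(C.T @ mu))

# One node's turn: Step 1 (observe), Step 2 (private belief), Step 3 (act).
x_true = 1
pi_minus = np.full(X, 1.0 / X)          # network belief supplied by the administrator
z = rng.choice(Z, p=B[x_true])          # Step 1: noisy private observation
mu = private_belief(pi_minus, z)
a = myopic_action(mu)
```

With a 0–1 cost, the myopic action is simply the maximum a posteriori state under the private belief, which is why the sketch's argmin over expected cost coincides with an argmax over µ.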
If algorithm A is not constructed properly, mis-information propagation (data incest) occurs.

As we will see shortly, a major issue with the above protocol under naive information aggregation in Step 5 is the inadvertent re-use of information (actions of previous agents), which biases the estimates of the state of nature; this is data incest.

3.1.2 Chapter Goals

The above protocol models the interaction of agents in a social network that aim to estimate the underlying state of nature x. One example is users who aim to localize a target event by tweeting the location of the detected "target" on Twitter [132]. Another example is where the state of nature is the true quality of a social unit (such as a restaurant). Online rating and review systems such as Yelp or Tripadvisor maintain logs of votes by agents (customers). Each agent visits a restaurant based on the reviews on a review website such as Yelp. The agent then obtains a private noisy measurement of the state (the quality of food in the restaurant) and reviews the restaurant on that website. Such a review is typically a quantized version (for example, a rating) of the total information (private belief) gathered by the agent. With such a protocol, how can agents obtain a fair (unbiased) estimate of the underlying state? The aim of this chapter is for the network administrator to maintain an unbiased rating and review system, or alternatively to modify the actions of agents, so as to avoid incest.

From a statistical signal processing point of view, estimating the state of nature x using the above five-step protocol is non-standard in two ways. First, agents are influenced by the ratings of other agents; that is, their prior influences their posterior and hence their rating. This effect of agents learning from the actions (ratings) of other agents along with their own private observations is termed "social learning" in the economics literature.
Social learning can result in an interesting phenomenon whereby rational agents all end up making the same decision (herding and information cascades; [35]). Second (and this effect is more complex), an agent might be influenced by his own rating, leading to data incest.

Footnote 19: The dimension of private beliefs is typically larger than that of actions. Also, individuals tend not to provide their private beliefs in their subsequent social interactions. Therefore, agents map their beliefs to a finite set of actions, which are easier to broadcast.

Footnote 20: Obtaining fair estimates of the quality of a social unit is a problem of much interest in business. Most hotel managers (81%) regularly check the reviews on Tripadvisor [79]. In [109], it is found that a one-star increase in a business's average Yelp rating maps to roughly a 5–9% increase in revenue.

[Figure 3.1: Two examples of multi-agent social learning in social networks: (i) target localization, where the state of nature x is the geographical coordinates of the target and the action a_n is the region of the detected target; and (ii) online rating and review systems, where x is the quality of a social unit and a_n is its rating.]

To explain what can go wrong with the above protocol, suppose an agent wrote a poor rating of a restaurant on a social media site at time 1. Another agent is influenced by this rating and also gives the restaurant a poor rating at time 2. Assume that the information exchange is modeled by the graph depicted in Fig. 3.2. The first agent visits the social media site at time 3 and sees that another agent has also given the restaurant a poor rating; this double-confirms his rating and he enters another poor rating.
In a fair system, the first agent should have been aware that the rating of the second agent was influenced by his own rating; by casting the second poor rating, the first agent has effectively double-counted his first rating. Data incest is a consequence of the recursive nature of Bayesian social learning and of the communication graph. Data incest in a social network is defined as the naive re-use of the actions of other agents in the formation of an agent's belief when those actions could have been initiated by that very agent. In Figure 3.2, the fact that there exist two distinct paths between Agent 1 at time 1 and Agent 1 at time 3 (depicted in red) implies that the information of Agent 1 at time 1 is double counted, thereby leading to a data incest event.

The twin effects of social learning and data incest lead to non-standard information patterns in state estimation. A herd occurs when the public belief overrides the private observations, so that the actions of agents are independent of their private observations. An extreme case is an information cascade, in which the public belief of social learning hits a fixed point and no longer evolves; each agent in a cascade acts according to the fixed public belief and social learning stops [35].

Footnote 21: There are subtle differences between an individual agent herding, a herd of agents and an information cascade; see for example [35, 91].

[Figure 3.2: Example of a communication graph with two agents (S = 2) over three event epochs (K = 3). The arrows represent the exchange of information regarding actions taken by agents.]

Data incest biases the public belief as a consequence of the unintentional re-use of identical actions in the formation of the public belief of social learning: the information gathered by each agent is mistakenly considered to be independent. This results in over-confidence and bias in the estimates of the state of nature.
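A back-of-the-envelope way to see the double counting is to track, for each node, how many times each private log-likelihood ν_i enters its naively fused log-belief. The sketch below uses an edge set assumed to be consistent with the Fig. 3.2 discussion (the exact edges of the figure are not reproduced here): the direct edge 1 → 5 and the indirect path 1 → 4 → 5 form the two distinct paths from Agent 1 at time 1 (node 1) to Agent 1 at time 3 (node 5).

```python
import numpy as np

# Assumed edges (hypothetical, consistent with the two-path discussion).
edges = [(1, 4), (4, 5), (1, 5)]
N = 5

# theta[n] counts how many times each node's private log-likelihood nu_i
# enters the naively fused log-belief of node n (own nu_n counted once).
theta = {n: np.eye(N, dtype=int)[n - 1] for n in range(1, N + 1)}
for n in range(1, N + 1):                 # nodes in causal (topological) order
    for (i, j) in edges:
        if j == n:
            theta[n] += theta[i]          # naive fusion: add the sender's log-belief

print(theta[5])  # prints [2 0 0 1 1]: nu_1 is counted twice at node 5
```

The count of 2 for ν_1 at node 5 is exactly the double counting the text describes; an incest-free fusion rule must bring every such count back to 1.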
Because agents lack information about the topology of the communication graph, data incest arises in Bayesian social learning over social networks. The Bayesian social learning protocol therefore requires careful design to ensure that data incest is mitigated. The aim of this chapter is to modify the five-step protocol presented in Section 3.1.1 such that data incest does not arise. As we will see in Section 3.3.4, the proposed data incest removal algorithm can be applied to the state estimation problems shown in Fig. 3.1.

3.1.3 Main Results and Organization of Chapter

With the above five-step social learning protocol in social networks, we are now ready to outline the main results of this chapter:

1. In Section 3.2, the data incest problem is formulated on a family of time-dependent directed acyclic graphs.

2. In Section 3.3, a necessary and sufficient condition on the graph is provided for exact data incest removal. This constraint is on the topology of communication delays (the communication graph). Examples where exact incest removal is not possible are also illustrated.

3. A data incest removal algorithm is proposed for the five-step social learning protocol in Section 3.3. The data incest removal algorithm is employed by the network administrator to update the network belief in Step 5 of the social learning protocol of Section 3.1.1.

Footnote 22: In this chapter we consider Bayesian estimation over a finite time horizon. We do not consider the asymptotic agreement of social learning or consensus formation in social networks. Consensus formation is asymptotic and typically non-Bayesian. From a practical point of view, information exchange in a social network is typically over a finite horizon.

Finally, in Section 3.4, numerical examples are provided that illustrate the data incest removal algorithm.

3.2 Social Learning Over Social Networks

The five-step social learning protocol was introduced in Section 3.1.1.
We also noted that, as a result of the loopy information-exchange graph, data incest (mis-information propagation) arises from the improper re-use of the information of other agents. Here, using the graph-theoretic definitions provided in Appendix A, we discuss the diffusion of information in the social network in more detail. Before proceeding, for notational simplicity, instead of [s,k] we use the scalar index

n ≜ s + S(k−1), s ∈ {1, …, S}, k ∈ {1, 2, 3, …}. (3.7)

Note that, in the social learning model considered in this chapter, the historical order of events is important and k denotes the order in which events occur in real time. Subsequently, we refer to n as a "node" of the time-dependent communication graph G_n. Recall from Section 3.1 that G_n = (V_n, E_n) denotes the time-dependent communication graph of the social network. Each node n′ in G_n represents an agent s′ at time k′ such that n′ = s′ + S(k′−1); see (3.7). Each directed edge of G_n represents a communication link in the social network: if (n, n′) ∈ E_n, then agent s′ at time k′ uses the information of agent s at time k to update his private belief about the underlying state of nature x. Note that, with the communication graph defined this way, G_n is always a sub-graph of G_{n+1}. Therefore, as the following theorem shows, the diffusion of actions can be modeled via a family of time-dependent directed acyclic graphs (DAGs).

Theorem 3.2.1. The information flow in social learning over a social network comprising S agents for k = 1, 2, 3, …, K can be represented by a family of DAGs G = {G_n}, n ∈ {1, …, N}, where N = SK. Each DAG G_n = (V_n, E_n) represents the information flow between the first n nodes, where the generic node n is defined by (3.7).

Proof. The proof is similar to that presented in Section 2.7.1.

The adjacency and transitive closure matrices of G_n are denoted by A_n and T_n, respectively (see Appendix A).
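The re-indexing (3.7) and its inverse can be sketched as follows (a minimal sketch, assuming agents and times are 1-indexed as in the text):

```python
def node_index(s: int, k: int, S: int) -> int:
    """Eq. (3.7): map agent s at time k to the scalar node index n = s + S(k-1)."""
    return s + S * (k - 1)

def agent_time(n: int, S: int) -> tuple:
    """Invert (3.7): recover (s, k) from the node index n."""
    s = (n - 1) % S + 1
    k = (n - 1) // S + 1
    return (s, k)
```

For instance, with S = 2 agents, Agent 1 at time 3 maps to node n = 1 + 2·2 = 5, which is consistent with nodes being numbered in the causal order of events.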
Using the adjacency and transitive closure matrices of G_n, we introduce the following two sets, which are involved in the formulation of the data incest problem in social learning over networks:

F_n = {k : A_n(k,n) = 1},
H_n = {k : T_n(k,n) = 1}. (3.8)

Footnote 23: See (A.3) in Appendix A.

[Figure 3.3: Protocol 1, constrained social learning in social networks as described in Section 3.1.1: observation process, Bayesian belief update, and choice of local action, with communication delays modeled via G_n with adjacency matrix A_n; the topology of the communication graph is not known, and the network belief is equivalently the set of public beliefs of other agents after the communication delay. As a result of random (unknown) communication delays, data incest arises.]

In words, F_n consists of all nodes that have a single-hop link (edge) to node n, and H_n consists of those with either a single-hop or multi-hop (path) link to node n.

3.2.1 Constrained Social Learning in Social Networks

The five-step constrained social learning protocol introduced in Section 3.1.1 is illustrated in Fig. 3.3. Note that in the constrained social learning problem, agents do not have information about the communication graph; this is why the term "constrained" is used. Constrained social learning in social networks can be summarized in abstract form as

Choose action a_n = argmin_{a∈A} E{C′_a µ_n} subject to: (3.9)

z_n ∼ B_{iz}, x = i, (observation process)
Θ_n = {π_i; i ∈ F_n}, (network constraint)
π−n = A(Θ_n), (network belief formation)
µ_n = p(x | π−n, z_n). (private belief evaluation)

In (3.9), A is the automated information fusion algorithm employed by the network administrator to compute the network belief.
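The sets F_n and H_n of (3.8) can be computed directly from the adjacency matrix; a sketch (with 1-indexed nodes as in the text, and the transitive closure obtained by a Warshall-style pass) is:

```python
import numpy as np

def transitive_closure(Adj: np.ndarray) -> np.ndarray:
    """Reflexive transitive closure T of a DAG adjacency matrix (Warshall)."""
    n = Adj.shape[0]
    T = Adj.astype(bool) | np.eye(n, dtype=bool)
    for m in range(n):
        T |= np.outer(T[:, m], T[m, :])   # add paths routed through node m
    return T.astype(int)

def F_set(Adj, n):
    """F_n of eq. (3.8): nodes with a direct edge into node n (1-indexed)."""
    return {k + 1 for k in range(n - 1) if Adj[k, n - 1] == 1}

def H_set(Adj, n):
    """H_n of eq. (3.8): nodes with a path (of any length) into node n."""
    T = transitive_closure(Adj)
    return {k + 1 for k in range(n - 1) if T[k, n - 1] == 1}

# Example: a chain 1 -> 2 -> 3, so F_3 = {2} but H_3 = {1, 2}.
Adj = np.zeros((3, 3), dtype=int)
Adj[0, 1] = Adj[1, 2] = 1
```

The distinction between the two sets is exactly the source of incest: node n directly sees only F_n, yet the information reaching it has been shaped by all of H_n.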
Due to the loopy communication graph and the recursive nature of Bayesian models, data incest (mis-information propagation) arises in constrained social learning if algorithm A is not designed properly. The aim of this chapter is to devise the algorithm A such that the public belief of social learning (and consequently the actions a_n for all n = 1, 2, …) is not affected by data incest.

The following lemma summarizes the social learning filters in (3.9).

Lemma 3.2.1. Consider the five-step social learning protocol presented in Section 3.1.1 with S agents and communication graph G_n. Let π−n denote the network belief of the social network at node n. Then the social learning elements (private belief, action, and public belief) of node n with observation z_n = z_l can be computed as (1 ≤ m ≤ X)

µ_n(m) = p(x = x_m | Θ_n, z_n) = c π−n(m) B_{ml},
a_n = argmin_{a∈A} E{C(x,a) | Θ_n, z_n} = argmin_{a∈A} E{C′_a µ_n},
π_n(m) = c π−n(m) Σ_{j=1}^{Z} [ Π_{â∈A−{a_n}} I(C′_{a_n} B_j π−n < C′_{â} B_j π−n) ] B_{mj}, (3.10)

where c is a generic normalizing constant, B_j = diag(B_{1j}, …, B_{Xj}), and I(·) is the indicator function. Here C_a is the cost vector C_a = [C(1,a), C(2,a), …, C(X,a)]. In matrix notation,

µ_n = B_{z_l} π−n / (1′ B_{z_l} π−n),

where 1 denotes the all-ones vector. The public belief π_n can be written as

π_n = R^{π−n}_{a_n} π−n / (1′ R^{π−n}_{a_n} π−n),

where R^{π−n}_{a_n} = diag(r_1, …, r_X) with r_m = Σ_{j=1}^{Z} [ Π_{â∈A−{a_n}} I(C′_{a_n} B_j π−n < C′_{â} B_j π−n) ] B_{mj}.

Proof. The proof is presented in Section 3.7.1.

Lemma 3.2.1 summarizes the social learning problem considered in this chapter. Each node combines the network belief with its private observation to evaluate its private belief. Based on this private belief, the action a_n is chosen to minimize a local cost function. The action a_n is used by the network administrator to automatically update the public belief of social learning, which is then transmitted over the network. As described in Section 3.1, a major issue with this protocol is data incest.
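The public-belief update of Lemma 3.2.1 can be sketched as follows: the administrator sums B_j over exactly those observations z_j that would have led a rational agent holding π−n to choose the observed action a_n (this is the diagonal of R^{π−n}_{a_n}). The B and C matrices below are hypothetical values for illustration, not from the thesis.

```python
import numpy as np

# Hypothetical model: X = 2 states, Z = 2 observations, A = 2 actions.
B = np.array([[0.7, 0.3],
              [0.2, 0.8]])             # B[i, j] = p(z_j | x_i)
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])             # C[i, a]: cost of action a in state x_i

def public_belief_update(pi_minus, a_n, B, C):
    """Lemma 3.2.1: pi_n(m) ∝ pi_minus(m) * sum_j I{a_n optimal under B_j pi_minus} B[m, j]."""
    X, Z = B.shape
    r = np.zeros(X)
    for j in range(Z):
        mu = B[:, j] * pi_minus        # un-normalised private belief B_j pi_minus
        costs = C.T @ mu               # expected cost of each action under mu
        if np.argmin(costs) == a_n:    # observation z_j would have produced a_n
            r += B[:, j]
    pi = r * pi_minus                  # R^{pi_minus}_{a_n} pi_minus, then normalise
    return pi / pi.sum()

# Observing action 0 under a uniform network belief shifts the belief toward x_1.
pi_n = public_belief_update(np.array([0.5, 0.5]), 0, B, C)
```

Note that the update conditions only on the action, not on the underlying observation: the administrator inverts the agent's decision rule to infer which observations are consistent with a_n.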
The aim of this chapter is to devise a data incest removal algorithm for the network administrator to deploy such that the estimates of agents are unbiased.

Remark 3.2.1. To choose an action from the finite set of possible actions, agents minimize a cost function. This cost function can be interpreted in terms of the reputation of agents in online rating and review systems. For example, if the quality of a restaurant is good and an agent writes a bad review for it on Yelp, and he continues to do so for other restaurants, his reputation among Yelp users declines. Consequently, other people ignore the reviews of that (low-reputation) agent when forming their opinions about the social unit under study (the restaurant). Agents therefore minimize the penalty of writing inaccurate reviews (equivalently, increase their reputations) by choosing proper actions. This behavior is modeled by cost minimization in our social learning model.

Remark 3.2.2. In contrast to the public belief, which can be computed by the network administrator (who monitors the agents' actions and the communication graph), the agents' private beliefs cannot be computed by the network administrator: a private belief depends on the local observation, which is not available to the network. Note that in Step 2 of the constrained social learning Protocol 1, the results of Lemma 3.2.1 are used to compute µ_n from z_n and π−n.

Remark 3.2.3. The constrained social learning protocol is practiced in many online rating and review systems such as Yelp or Tripadvisor.

3.3 Data Incest Removal Algorithm

So far in this chapter, the Bayesian social learning model and the communication among agents in social networks have been described. This section presents the main result of this chapter, namely the solution to the constrained social learning problem (3.9).
We propose a data incest removal algorithm such that the public belief of social learning (and consequently the chosen actions) is not affected by data incest. To devise the algorithm, an idealized framework that prevents data incest is first presented, as described shortly. By comparing the public belief of this idealized framework with that of constrained social learning, the data incest removal algorithm is specified. This algorithm is used by the network administrator and replaces Step 5 of the social learning protocol presented in Section 3.1.1. A necessary and sufficient condition for data incest removal is also presented in this section.

3.3.1 The Idealized Benchmark for Data Incest Free Social Learning in Social Networks

In this subsection, an idealized (and therefore impractical) framework is described that will serve as a benchmark for deriving the constrained social learning protocol. In the idealized protocol, it is assumed that the entire history of actions, along with the communication graph, is known at each node. Because the dependencies among actions are thereby known, data incest does not arise. Define

Θ^full_n = {a_i; i ∈ H_n}, (3.11)

Footnote 24: In Section 3.3, a discussion is presented of data incest removal algorithms in consumer rating websites such as Yelp (www.yelp.com).

Footnote 25: In the constrained social learning algorithm, each node receives the most recent public beliefs of its neighbors, or equivalently the updated public belief.
where H_n is defined in (3.8).

[Figure 3.4: Protocol 2, idealized benchmark social learning in social networks: observation process, Bayesian belief update, and choice of local action, with communication delays modeled via G_n with transitive closure matrix T_n; the topology of the communication graph is known at each node, and each node receives the actions of all nodes that have a path to it, Θ^full_n = {a_i; T_n(i,n) = 1}, together with the dependencies among these actions. In this protocol, the complete history of actions chosen by agents and the communication graph are known; hence data incest does not arise. This benchmark protocol will be used to design the data incest removal protocol.]

In the idealized framework, the network belief can be written as

π^full_−n = p(x | Θ^full_n) ∝ π_0 Π_{a_i ∈ Θ^full_n} p(a_i | x, S_i), (3.12)

where S_i ⊂ Θ^full_n denotes the set of actions on which a_i depends. The public belief in idealized social learning is free of data incest, as can be inferred from (3.12). Idealized social learning in social networks (Protocol 2) is illustrated in Fig. 3.4. The private belief of node n in the idealized social learning is denoted by µ̂_n.

Note that if there exists a path from node i to node n, then a_i ∈ Θ^full_n. Since the history of actions and the dependencies among them (the communication topology) are available in idealized social learning, π^full_−n is free of data incest.

3.3.2 The Data Incest Free Belief in the Idealized Social Learning Protocol 2

The goal of this chapter is to replace Step 5 of the five-step constrained social learning protocol with an algorithm that mitigates data incest. As described earlier, to solve the data incest management problem, we introduced the idealized social learning protocol, which prevents data incest.
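Equation (3.12) simply fuses the prior with each distinct action likelihood exactly once, which is why no double counting can occur. A minimal sketch, with hypothetical two-state likelihood values and the conditioning on S_i suppressed for brevity:

```python
import numpy as np

def idealized_network_belief(pi0, action_likelihoods):
    """Eq. (3.12): fuse the prior with the likelihood of every distinct action
    in Theta_full_n exactly once, so no action is double counted."""
    post = pi0.copy()
    for lik in action_likelihoods:    # lik[i] = p(a | x = x_i), one per action
        post *= lik
    return post / post.sum()

# Hypothetical two-state example (values chosen only for illustration).
pi0 = np.array([0.5, 0.5])
liks = [np.array([0.8, 0.4]),         # likelihood of the first observed action
        np.array([0.7, 0.5])]         # likelihood of the second observed action
pi_full = idealized_network_belief(pi0, liks)
```

The constrained protocol cannot run this computation directly, since neither the full action history nor the dependency structure S_i is available at a node; the remainder of the section recovers the same posterior from the public beliefs alone.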
By comparing the network belief (or equivalently the public beliefs of agents) in the idealized social learning protocol with that in the constrained social learning Protocol 1, the data incest removal algorithm can be derived. Our aim is to devise algorithm A in (3.9) (equivalently, Step 5 of the five-step social learning protocol in Section 3.1) such that

p(x | Θ^full_n) = π−n. (3.13)

First, an expression is derived for the public beliefs of agents in the idealized social learning Protocol 2. Then, using this expression, algorithm A is constructed such that (3.13) holds; that is, data incest is mitigated. Let θ^full_n denote the logarithm of the public belief of node n in the idealized social learning Protocol 2, that is

θ^full_n = log(p(x | Θ^full_n, a_n)). (3.14)

Theorem 3.3.1 below gives an expression for θ^full_n in the idealized social learning Protocol 2.

Theorem 3.3.1. Consider problem (2.6) with the idealized social learning Protocol 2. The data incest free public belief of node n (which represents agent s at time k according to the re-indexing equation (3.7)) is

θ^full_n = Σ_{i=1}^{n−1} t_n(i) ν_i + ν_n, (3.15)

where ν_k denotes log(p(a_k | x, S_k)). Recall that t_n is defined in (A.6) in Appendix A as the first n−1 elements of the n-th column of T_n.

Proof. The proof is presented in Section 3.7.2.

As can be seen from (3.15), the (logarithm of the) public belief of node n is a linear function of the ν_i with coefficients t_n. Because of this linearity, the data incest removal algorithm can be constructed as explained later in this section. Equation (3.15) also implies that the optimal data incest free public beliefs of agents in the idealized social learning Protocol 2 depend on the communication graph explicitly through the transitive closure matrix (see (A.3) in Appendix A). The non-zero elements of t_n indicate all nodes that have a path to node n, whose actions therefore contribute to the formation of the private belief of node n. Eq.
(3.15) is quite intuitive, given that each agent employs a recursive Bayesian filter to combine its private observation with the information received from the network.

3.3.3 Data Incest Removal Algorithm for Problem (3.9) with Constrained Social Learning Protocol 1

Given the expression for the public belief of the idealized social learning Protocol 2, the aim here is to propose an optimal information aggregation scheme (replacing Step 5) such that the public belief of the constrained social learning Protocol 1 equals that of the idealized social learning Protocol 2 (which is free of data incest). That is, (3.13) holds, or equivalently

p(x | a_i; i ∈ H_n) = p(x | π_i; i ∈ F_n). (3.16)

Analogously to θ^full_n, let θ̂_n denote the logarithm of the post-action public belief of node n,

θ̂_n = log(p(x | Θ_n, a_n)). (3.17)

We propose the following optimal information aggregation scheme, which evaluates the public belief using an (n−1)-dimensional weight vector w_n:

θ̂_n = Σ_{i=1}^{n−1} w_n(i) θ̂_i + ν_n, (3.18)

where the elements w_n(i) (1 ≤ i ≤ n−1) of w_n are defined precisely in (3.20). Using the optimal information aggregation scheme (3.18) together with (3.10) in Lemma 3.2.1, algorithm A in (3.9) is fully specified.

Remark 3.3.1. The optimal information aggregation scheme (3.18) is deployed by the automated network administrator in Step 5 of the social learning protocol of Section 3.1.1: the administrator combines the received information (beliefs, or equivalently actions) from other nodes and computes Σ_{i=1}^{n−1} w_n(i) θ̂_i, which is the network belief at node n. Node n then updates its private belief based on this most recent network belief (provided by the administrator), chooses its action a_n accordingly, and transmits it over the network. The network administrator then automatically evaluates ν_n and updates the public belief by computing θ̂_n = Σ_{i=1}^{n−1} w_n(i) θ̂_i + ν_n.

The weight vector w_n depends on the communication graph and can be computed simply from (3.20).
Theorem 3.3.2 below asserts that, by using the optimal information aggregation scheme (3.18) with w_n defined in (3.20), data incest can be completely mitigated. However, for some network topologies it is not possible to remove data incest completely. The following constraint gives the necessary and sufficient condition on the network for exact data incest removal.

Constraint 3.3.1. Consider the constrained social learning problem (3.9) with Protocol 1. The weight vector w_n used in the optimal information aggregation scheme (3.18) satisfies the topological constraint if, for all j ∈ {1, …, n−1} and all n ∈ {1, …, N},

b_n(j) = 0 ⟹ w_n(j) = 0, (3.19)

where b_n is defined in (A.6) and denotes the n-th column of the adjacency matrix of G_n. Essentially, Constraint 3.3.1 imposes an "availability constraint" on the communication graph: if the information of node j is required at node n (w_n(j) ≠ 0), there must be a communication link between node j and node n (b_n(j) ≠ 0). Assuming that Constraint 3.3.1 holds, Theorem 3.3.2 below ensures that the public belief of the nodes in problem (3.9) with the constrained social learning Protocol 1 is identical to that of problem (2.6) with the idealized social learning Protocol 2.

Theorem 3.3.2. Consider problem (3.9) with the constrained social learning Protocol 1 of Section 3.2. Using the optimal information aggregation scheme (3.18), data incest can be mitigated with the optimal set of weights {w_n}, n ∈ {1, …, N}, provided the topological Constraint 3.3.1 is satisfied. The optimal weight vector is

w_n = t_n ((T_{n−1})′)^{−1}. (3.20)

With the optimal combination scheme (3.18) and the optimal weight vector (3.20), data incest in the social learning problem (3.9) is completely mitigated, that is, θ̂_n = θ^full_n, provided w_n satisfies the topological Constraint 3.3.1; here θ̂_n and θ^full_n are defined in (3.17) and (3.14) respectively.
Recall that t_n is defined in (A.6) as the first n−1 elements of the n-th column of T_n.

Proof. The proof is presented in Section 3.7.3.

Theorem 3.3.2 proves that if w_n = t_n ((T_{n−1})′)^{−1}, then θ̂_n = θ^full_n; that is, data incest (mis-information propagation) is mitigated.

Using the optimal information aggregation scheme (3.18), the five-step Bayesian social learning protocol of Section 3.1.1 with the data incest removal algorithm can be summarized as:

Algorithm 3.4 Constrained Bayesian social learning with data incest removal at each node n

Step 1. Observation process: The private observation z_n is obtained according to (3.1).

Step 2. Private belief: Node n accesses the network and evaluates its private belief according to (3.2) using the most recent network belief at node n,

µ_n = B_{z_l} π−n / (1′ B_{z_l} π−n).

Step 3. Myopic action: The action a_n is chosen via (3.3), that is

a_n = argmin_{a∈A} E{C′_a µ_n}.

Then, the automated network administrator computes the public belief at node n defined in (3.4), that is

π_n = R^{π−n}_{a_n} π−n / (1′ R^{π−n}_{a_n} π−n).

Step 4. Social network: The social network model is as in Step 4 of the protocol presented in Section 3.1.1.

Step 5. Network belief update: The automated network administrator evaluates the network belief using Θ_n = {π_i, i ∈ H_n} and the optimal weight vector

w_n = t_n ((T_{n−1})′)^{−1}.

Discussion of the topological constraint (3.19): The non-zero elements of w_n correspond to the nodes whose information is required at node n to remove data incest. This imposes a topological constraint on the communication graph: if w_n(j) is non-zero, the information of node j is needed at node n, so there must be an edge in G_n connecting node j to node n; this is the topological Constraint 3.3.1.
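As a numerical sanity check of Theorem 3.3.2, the sketch below computes the weights (3.20) on a small assumed DAG (not from the thesis) and verifies that the recursion (3.18) reproduces the incest-free log-belief (3.15). Two assumptions: the transitive closure is taken reflexive, so T_{n−1} is upper triangular with unit diagonal and hence invertible; and the ν_n below are random scalar stand-ins for log p(a_n | x), since (3.15) and (3.18) act componentwise over the state space.

```python
import numpy as np

def closure(Adj):
    """Reflexive transitive closure of a DAG adjacency matrix."""
    n = len(Adj)
    T = Adj.astype(bool) | np.eye(n, dtype=bool)
    for m in range(n):
        T |= np.outer(T[:, m], T[m, :])
    return T.astype(float)

def incest_weights(Adj, n):
    """Eq. (3.20): w_n = t_n ((T_{n-1})')^{-1}, i.e. solve T_{n-1} w = t_n."""
    if n == 1:
        return np.zeros(0)
    T = closure(Adj)
    return np.linalg.solve(T[: n - 1, : n - 1], T[: n - 1, n - 1])

def theta_hat(nu, w_all):
    """Recursion (3.18): theta_n = sum_i w_n(i) theta_i + nu_n."""
    theta = []
    for n in range(1, len(nu) + 1):
        wn = w_all[n - 1]
        theta.append(sum(wn[i] * theta[i] for i in range(n - 1)) + nu[n - 1])
    return np.array(theta)

# Assumed example DAG: edges 1->2, 1->3, 2->4, 3->4, 1->4 (1-indexed nodes).
Adj = np.zeros((4, 4))
for (i, j) in [(1, 2), (1, 3), (2, 4), (3, 4), (1, 4)]:
    Adj[i - 1, j - 1] = 1

nu = np.random.default_rng(1).normal(size=4)     # stand-ins for log p(a_n | x)
T = closure(Adj)
w_all = [incest_weights(Adj, n) for n in range(1, 5)]
theta = theta_hat(nu, w_all)

# Theorem 3.3.2: the recursion matches the incest-free theta_full of (3.15).
theta_full = np.array([T[: n - 1, n - 1] @ nu[: n - 1] + nu[n - 1] for n in range(1, 5)])
assert np.allclose(theta, theta_full)
```

On this graph, w_4 = [−1, 1, 1]: the negative weight subtracts the copy of ν_1 that reaches node 4 twice (through nodes 2 and 3), which is the algebraic signature of incest removal.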
Constraint 3.3.1 ensures that the elements essential for data incest removal are available at node n, and Theorem 3.3.2 specifies the exact data incest removal algorithm. From Theorem 3.3.2, it is straightforward to show that Constraint 3.3.1 is a necessary and sufficient condition for data incest removal in the learning problem (3.9). Consider the two example communication graphs shown in Figure 3.5.

The optimal weight vector at node 5 for both networks of Fig. 3.5, computed from (3.20), is w_5 = [−1, −1, 1, 1]. According to the topological constraint (3.19), this means that there must be a link between node 2 and node 5 for exact data incest removal. Hence, Constraint 3.3.1 does not hold for the network of Figure 3.5b, while it is satisfied by the network depicted in Fig. 3.5a. Note also from the network in Fig. 3.5a that the communication graph need not be a tree.

[Figure 3.5: Two example networks on six nodes: (a) satisfies the topological constraint, and (b) does not satisfy the topological constraint.]

3.3.4 Discussion of Data Incest Removal in Social Learning

Here, we discuss the application of the data incest removal Algorithm 3.4 (presented in Theorem 3.3.2) in the two examples of multi-agent state estimation introduced at the beginning of this chapter: (i) online rating and review systems, and (ii) target localization using social networks; see Fig. 3.1. Both problems can be formulated using the five-step constrained social learning protocol of Section 3.1.1. As illustrated in Fig. 3.6, agents observe the underlying state of nature in noise and practice social learning to choose an action that minimizes a local cost function. But as a result of the unknown communication graph and the recursive nature of Bayesian estimators, data incest, i.e., the improper re-use of information, occurs. To mitigate data incest, the network administrator plays an intermediating role.
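The w_5 = [−1, −1, 1, 1] example can be checked numerically. The edge sets below are assumptions chosen to be consistent with the discussion (the exact edges of Fig. 3.5 are not reproduced here): nodes 3 and 4 each hear nodes 1 and 2, node 5 hears nodes 1, 3 and 4, and graph (a) additionally has the direct link 2 → 5 that graph (b) lacks.

```python
import numpy as np

def closure(Adj):
    """Reflexive transitive closure of a DAG adjacency matrix."""
    n = len(Adj)
    T = Adj.astype(bool) | np.eye(n, dtype=bool)
    for m in range(n):
        T |= np.outer(T[:, m], T[m, :])
    return T.astype(float)

def w(Adj, n):
    """Optimal incest-removal weights of eq. (3.20) for node n (1-indexed)."""
    T = closure(Adj)
    return np.linalg.solve(T[: n - 1, : n - 1], T[: n - 1, n - 1])

def adjacency(edges, N=6):
    A = np.zeros((N, N))
    for i, j in edges:
        A[i - 1, j - 1] = 1
    return A

# Assumed edges (hypothetical, consistent with the w_5 example in the text).
common = [(1, 3), (2, 3), (1, 4), (2, 4), (3, 5), (4, 5), (1, 5)]
A_a = adjacency(common + [(2, 5)])   # graph (a): direct link 2 -> 5 present
A_b = adjacency(common)              # graph (b): link 2 -> 5 missing

w5 = w(A_a, 5)                       # the same [-1, -1, 1, 1] on both graphs

def constraint_ok(Adj, n):
    """Topological constraint (3.19): w_n(j) != 0 requires an edge j -> n."""
    return all(Adj[j, n - 1] != 0 for j, wj in enumerate(w(Adj, n)) if abs(wj) > 1e-9)
```

Since w_5(2) = −1 is non-zero, the constraint check fails on graph (b), exactly as the text argues: the corrective subtraction of node 2's contribution cannot be performed if node 2's public belief never reaches node 5 directly.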
Instead of transmitting the communication graph and the complete history of actions, the network administrator monitors all information exchanges and provides the data incest free network belief of social learning at each node. To compute the data incest free public belief, the network administrator uses the optimal information aggregation scheme (3.18) with the optimal weight vector w_n; see (3.20). Using the most recent public belief and its own private observation z_n, node n evaluates its private belief. Based on this private belief (which is free of data incest), the action a_n is chosen and transmitted over the network. Given that the communication graph satisfies the topological Constraint 3.3.1, Theorem 3.3.2 ensures that, by means of the optimal weight vector w_n, the action a_n is not affected by data incest and, therefore, the performance of the state estimation process is improved.

[Figure 3.6: Data incest removal algorithm employed by the network administrator in the state estimation problem over a social network. The underlying state of nature could be the geographical coordinates of an event (target localization problem) or the reputation of a social unit (online rating and review systems).]

3.4 Numerical Examples

In this section, numerical examples are given to illustrate the performance of the data incest removal Algorithm 3.4 presented in Section 3.3. As described in the five-step protocol of Section 3.1, agents interact on a graph to estimate an underlying state of nature (which represents the location of a target event in the target localization problem, or the reputation of a social unit in online rating and review systems). The underlying state of nature x is a random variable drawn uniformly from X = {1, 2, …, 20}, and actions are chosen from A = {1, 2, …, 10}. We consider the following three scenarios for each of four different types of social networks:

1. Constrained social learning without the data incest removal algorithm (data incest occurs), depicted with a dash-dot line.

2.
Constrained social learning with Protocol 1 with data incest removal algorithm depicted withdashed line3. Idealized framework where each node has the entire history of raw observations and thusdata incest cannot propagate. This scenario is only simulated for comparison purposes and isdepicted by solid line.The effect of data incest on estimation problem and the performance of the data incest removalalgorithm, proposed in Section 3.3, is investigated for the networks shown in Fig.3.7.583.4. Numerical ExamplesWe first consider a communication graph with 41 nodes. The communication graph under study,which is shown in Fig.3.7a, satisfies the topological constraint (3.19). The action of node 1 reachesall other nodes and node 41 receives all actions of previous 40 nodes (some edges are omitted fromthe figure to make it more clear).(a) (b)(c)Figure 3.7: Three different communication topologies: (a) the communication graph with 41 nodes,(b) agents interact on a fully interconnected graph and the information from one agent reach otheragents after a delay chosen randomly from {1,2} with the same probabilities, (c) star-shaped com-munication topology with random delay chosen from {1,2}.As can be seen in Fig.3.8, data incest makes agents’ actions in the constrained social learningwithout data incest removal different from the same in the idealized framework. Also Fig.3.8 cor-roborates the excellent performance of data incest removal Algorithm 3.4. As illustrated in Fig.3.8,the actions of agents in social learning with data incest removal algorithm are exactly similar tothose of the idealized framework without data incest. The social learning problem over the graphshown in Fig.3.7a is simulated 100 times to investigate the difference between the estimated stateof nature with the true one (x = 10). The estimates of state of nature (obtained in three differentscenarios discussed in the beginning of the section) are depicted in Fig.3.9. 
As can be seen from the figure, the estimates obtained with the data incest removal algorithm are very close to the data incest free estimates of Scenario (iii). The bias in the estimates in the presence of data incest is also clear in this figure.

In the next simulation, a different communication topology is considered. We repeat the simulation for a star-shaped communication graph comprising six agents (S = 6) at four time instants, K = 4, so the total number of nodes in the communication graph is 24, see Fig. 3.7c. The communication delay is randomly chosen from {1, 2} with equal probability. We simulated the social learning in the three scenarios discussed above to investigate the effect of data incest on the actions and the estimates of agents in the star-shaped social network. The actions chosen by the nodes are depicted in Fig. 3.10. As can be seen from Fig. 3.10, using the data incest removal algorithm, the agents' actions in the constrained social learning with Protocol 1 are very close to those of the idealized social learning with Protocol 2, which are free of data incest. The estimates of the state of nature are also much closer to the true value than those of the constrained social learning without the data incest removal algorithm. Note that, as expected, the effect of data incest in this communication topology is different for each agent; the agent who communicates with all other nodes is affected more by data incest. This fact is verified in Figures 3.10 and 3.11.

In the third example, a complete fully interconnected graph (where agents communicate with all other agents) is considered. In this example, the action of each agent becomes available at all other agents after a random delay chosen from {1, 2} with equal probability. The agents' actions are shown in Fig. 3.12.
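The time-expanded star topology described above (S agents over K time instants, so S·K nodes in total, with random delays in {1, 2}) can be sketched roughly as follows. The choice of agent 1 as the hub, the exact edge rules, and the dropping of messages that would arrive after the horizon are all assumptions made for illustration; the formal construction follows the protocol of Section 3.1.

```python
import random

def star_time_expanded(S=6, K=4, seed=0):
    """Sketch of a time-expanded star communication graph.

    Node (s, k) is agent s at time instant k, flattened to index n = (k-1)*S + s,
    so there are S*K nodes in total. Agent 1 is assumed to be the hub; each
    transmitted action arrives after a random delay in {1, 2} (equal probability).
    Returns a set of directed edges between flattened node indices.
    """
    random.seed(seed)

    def node(s, k):
        return (k - 1) * S + s

    edges = set()
    for k in range(1, K + 1):
        for s in range(2, S + 1):               # spokes exchange actions with the hub only
            for src, dst in ((s, 1), (1, s)):
                delay = random.choice([1, 2])
                if k + delay <= K:              # messages past the horizon are dropped
                    edges.add((node(src, k), node(dst, k + delay)))
    return edges
```

With S = 6 and K = 4 this reproduces the 24-node graph of Fig. 3.7c in spirit: every edge points forward in time, which is what makes the transitive closure matrix of the proofs upper triangular.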
Similar to the star-shaped graph, using data incest removal Algorithm 3.4 makes the agents' actions in the constrained social learning very similar to those of the idealized (data incest free) framework. The excellent performance of data incest removal Algorithm 3.4 in the estimation problem is also depicted in Fig. 3.13.

We also extend our numerical studies to an arbitrary random network with five agents, S = 5, K = 4. We consider a fully connected network and assume that the interaction between two arbitrary agents (say agent i and agent j) at time k has four (equiprobable) possible statuses: (i) connected with delay 1, (ii) connected with delay 2, (iii) connected with delay 3, and (iv) not connected. If the link is connected with delay τ, the information from agent i at time k becomes available at agent j at time k + τ. If the link is not connected, the information of agent i at time k never reaches agent j. We verify that the underlying communication graph, Gn, satisfies the topological Constraint 3.3.1. Fig. 3.14 depicts the agents' actions in the three scenarios (with data incest, without data incest, and with the data incest removal algorithm). The simulation results show that, even in this case of an arbitrary network (that satisfies the topological constraint), the actions obtained by the constrained social learning with the data incest removal algorithm are very close to those of the idealized social learning. As expected, using the data incest removal algorithm, the data incest associated with the estimates of agents is mitigated completely, as shown in Fig. 3.15.

Figure 3.8: Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7a.

Figure 3.9: Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7a.

Figure 3.10: Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7c.

Figure 3.11: Mean of the estimated state of nature in the state estimation with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7c.

Figure 3.12: Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7b.

Figure 3.13: Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4 with the communication graph depicted in Fig. 3.7b.

Figure 3.14: Actions of agents obtained with social learning over social networks in the three scenarios described in Section 3.4 with the arbitrary communication graph.

Figure 3.15: Mean of the estimated state of nature in the state estimation problem with social learning over social networks in the three scenarios described in Section 3.4 with the arbitrary communication graph.

Here, we also present numerical studies to investigate the accuracy of the state estimation considered in this chapter in terms of mean squared error. The mean squared error of the estimates obtained in social learning with the three scenarios discussed at the beginning of this section (with data incest, with the data incest removal algorithm, and the idealized framework) is computed for each of the four networks considered in our numerical studies. Fig. 3.16 depicts the mean squared error of estimates obtained in the first example with social learning over the communication graph of Fig. 3.7a. As can be seen from this figure, the mean squared error associated with the estimates of the constrained social learning with data incest removal Algorithm 3.4 is lower than that of the constrained social learning in the presence of data incest. This means that the performance of the estimation problem with social learning is improved using the data incest removal algorithm proposed in this chapter.

Figures 3.17 and 3.18 show the mean squared error of estimation with the communication graphs presented in Figures 3.7b and 3.7c, respectively, and Figure 3.19 depicts the same for the arbitrary random network with five agents and random communication delays described earlier in this section. As can be seen in these figures, as a result of herding, in the star-shaped and random communication topologies the mean squared error of the estimates obtained with the data incest removal algorithm remains slightly above that of the idealized framework at each time (though well below that of the scenario without data incest removal).

3.5 Psychology Experiment

This section presents an experimental study to investigate the learning and decision making behavior of individuals in a human society. Social learning is used as the mathematical basis for modeling the interaction of individuals that aim to perform a perceptual task interactively. A psychology experiment was conducted on a group of undergraduate students at the University of British Columbia to examine whether the decision (action) of one individual affects the decisions of subsequent individuals. The major experimental observation that stands out is that the participants of the experiment (agents) were affected by the decisions of their partners in a relatively large fraction (60%) of trials. We fit a social learning model that mimics the interactions between participants of the psychology experiment.
Mis-information propagation (also known as data incest) within the society under study is further investigated in this experiment.

3.5.1 Experiment Setup

Here, a detailed description of the psychology experiment we carried out to study the learning behavior of individuals in a human society is presented:

• Experiment Date: The psychology experiment was conducted in September and October 2013.

• Society under study: The participants were 36 undergraduate students of the Department of Psychology of the University of British Columbia who completed the experiment for course credit.

Figure 3.16: Mean squared error of estimates (of the state of nature) obtained with social learning with the communication graph depicted in Fig. 3.7a.

Figure 3.17: Mean squared error of estimates (of the state of nature) obtained with social learning with the communication graph depicted in Fig. 3.7b (complete fully interconnected graph).

Figure 3.18: Mean squared error of estimates (of the state of nature) obtained with social learning with the communication graph depicted in Fig. 3.7c (star-shaped communication graph).

Figure 3.19: Mean squared error of estimates (of the state of nature) obtained with social learning with the arbitrary communication graph.

• Experiment Setup: Participants were asked to perform a perceptual task interactively. Two arrays of circles were given to each pair of participants; they were then asked to judge which array had the larger average diameter, that is, to pick their actions. On each trial, two 4×4 grids of circles were generated by randomly drawing from the radii {20, 24, 29, 35, 42} (in pixels). The average diameter of each grid was computed, and if the means differed by more than 8% or less than 4%, new grids were made; i.e., each trial had arrays of circles differing in average diameter by 4-8% (see footnote 27). One participant was chosen randomly and started the experiment by choosing an action according to his observation. Thereafter, each member saw their partner's previous response (action) and their own previous action prior to making their own judgment; this is social learning. The participants continued choosing actions until their responses stabilized for a run of at least three (the two participants did not necessarily agree, but each was fixed in her responses). In this experimental study, each participant chose an action in A = {0, 1}: a = 0 when she judged that the left array of circles had the larger diameter, and a = 1 when her judgment was that the right array had the larger diameter. In each experiment, the judgments (actions) of participants are recorded along with the amount of time taken to make each judgment. Fig. 3.24 shows the judgments of the two participants within a group at different trials in one experiment. In this experiment, the average diameter of the left array of circles was 32.1875 pixels and that of the right array was 30.5625 pixels.

3.5.2 Experimental Results

The results of our experimental study, which are summarized in Fig. 3.23, are as follows:

• Social learning Model: As mentioned above, the experiment for each pair of participants was continued until both participants' responses stabilized.
A question that may arise here is: in what percentage of these experiments was an agreement reached between the two participants? The answer to this question reveals whether "herding" occurred in our experiments; in other words, it reveals whether participants exercised social learning (were influenced by their partners) or not. Interestingly, our experimental study shows that in 66% of the experiments (1102 out of 1658), participants reached an agreement; that is, herding occurred. Further, our experimental studies show that in 32% of the experiments, the social learning was successful and both participants made the right judgment after a few interactions. To find a proper social learning model, we focus on the experiments where both participants reached an agreement. Define the social learning (SL) success rate as

SL Success Rate = (No. of experiments where both participants chose the correct side) / (No. of experiments where both participants reached an agreement).

Footnote 27: These numbers are based on the work in the Treisman et al. paper [142].

Figure 3.20: Two arrays of circles were given to each pair of participants on a screen. Their task was to interactively determine which side (either left or right) had the larger average diameter. The partner's previous decision was displayed on screen prior to the stimulus.

In this experimental study, the state of nature belongs to x ∈ {0, 1}, where x = 0 when the left array of circles has the larger diameter and x = 1 when the right array has the larger diameter. The initial belief for both participants is taken to be π0 = [0.5, 0.5]. The observation state is assumed to be z ∈ {0, 1}. We fit a social learning model to our experimental data which gives the same success rate as the experimental study.
The social learning parameters (the observation probabilities Biz = p(zk = z|x = i), i ∈ {0, 1}, z ∈ {0, 1}, and the cost function C(i, a), i ∈ {0, 1}, a ∈ A), obtained by exhaustive search, are as follows:

B = [0.61 0.39; 0.41 0.59],  C = [0 2; 2 0],

where the (i, z) entry of B is Biz and the (i, a) entry of C is C(i, a).

• Data incest: Here, we study the effect of data incest on the judgments of participants in our experimental study. Since we do not have access to the private observations of individuals (almost no one has such information!), we cannot verify exactly whether data incest changed the judgment of an individual in each trial of the experiment. However, the two scenarios depicted in Fig. 3.22 are used to find data incest events in the experiments.

Figure 3.21: Actions of two participants in a group at different trials in one experiment.

In these two events, as can be seen in Fig. 3.22, the action of the first participant at time k influences the action of the second participant at time k+1, and thus is double counted by the first participant at time k+2. As discussed above, since we do not have access to the private observations of participants, we cannot say exactly whether data incest affects the action of the first participant at time k+2 or not. However, it is clear that in these events data incest occurs. As we expect from the communication topology between partners, data incest occurred in a relatively large percentage of trials in the experiment. More precisely, in 79% of the experiments one of the data incest events shown in Fig. 3.22 occurred (1303 experiments with data incest among 1658 experiments). Our experimental study further shows that in 21% of the experiments, data incest resulted in changing the decision of one of the participants in the group; i.e., the judgment of a participant at time k+1 differed from her judgments at times k+2 and k in the events shown in Fig. 3.22.
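For illustration, one step of the fitted social learning model (B, C, and π0 as reported above) might be sketched as follows; the function name and the tie-breaking rule are our own assumptions.

```python
def social_learning_step(public_belief, z, B, C):
    """One Bayesian social learning step: form the private belief, pick the myopic action.

    public_belief: prior over the states {0, 1}; z: index of the observed signal.
    Returns (action, private_belief).
    """
    # Private belief mu(i) proportional to B[i][z] * pi(i)  (Bayes' rule)
    unnorm = [B[i][z] * public_belief[i] for i in range(len(public_belief))]
    total = sum(unnorm)
    mu = [u / total for u in unnorm]
    # Myopic action minimizes the expected cost C'_a mu (ties broken toward lower a)
    costs = [sum(C[i][a] * mu[i] for i in range(len(mu))) for a in range(len(C[0]))]
    action = min(range(len(costs)), key=costs.__getitem__)
    return action, mu

# Fitted parameters reported in the text.
B = [[0.61, 0.39], [0.41, 0.59]]    # B[i][z] = p(z_k = z | x = i)
C = [[0, 2], [2, 0]]                # C[i][a] = cost of action a when the state is i
pi0 = [0.5, 0.5]

action, mu = social_learning_step(pi0, z=0, B=B, C=C)   # a signal favouring x = 0 yields action 0
```

With the symmetric cost matrix above, the myopic action simply follows whichever state the private belief favours, which is why a single contrary observation can flip a participant's judgment.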
This experimental study reveals that data incest is quite common in social learning in human societies (it happened frequently even in our simple social learning setup) and, therefore, social learning protocols require careful design to handle and mitigate data incest.

• Discussion: Among the 3316 (non-unique) participants of this experiment, 1336 participants (around 40%) did not change their judgments after observing the action of their partners, while the other 60% changed their initial judgment and were influenced by the action of their partners. An experimental observation that stands out here is that the individuals can be divided into two types: (i) boundary agents who stand firm on their decisions during the trial, i.e., their decisions are independent of the decisions of the other agents, and (ii) internal agents who are affected by the decisions of the other agents. Fig. 3.24 shows the sample path of two participants in a group; Participant 1 is an internal node while Participant 2 is a boundary node.

Figure 3.22: Two scenarios where data incest arose in our experimental studies.

To study the decision making behavior of individuals of each type, we investigate the time taken by each participant to make his judgment. Let µjudg. and σjudg. denote the mean and the standard deviation of the time taken by participants to make their judgments, in milliseconds. The results of our experimental study, which are presented in Table 3.1, show that the internal nodes on average required more time to make their judgments than the boundary nodes; this is quite intuitive given that the boundary nodes stood firm on their decisions, ignored the judgments of their partners, and thus required less time to make their judgments.

Type of nodes | Relative frequency | µjudg. | σjudg.
Internal      | 40%                | 1058 ms | 315 ms
Boundary      | 60%                | 861 ms  | 403 ms

Table 3.1: The frequency of the internal and the boundary nodes in a community of 3316 undergraduate students of the University of British Columbia, along with the statistics of the time required by participants (of both types) to make their judgments in milliseconds.

3.6 Closing Remarks

In this chapter, the state estimation problem in social networks with social learning is investigated. The state of nature could be the geographical coordinates of an event (target localization problem) or the quality of a social unit (online rating and review system). In online rating and review systems, privacy concerns impose a constraint on the resolution of information that users reveal to other people. People are more likely to share a lower-resolution action with others rather than their detailed private observations.

Figure 3.23: Social learning with data incest exercised by groups of students who were asked to perform a perceptual task in our experimental study.

As discussed in the chapter, data incest arises in this setup as a result of the recursive nature of Bayesian estimation and random communication delays in social networks.
In this chapter, we proposed a data incest removal algorithm for multi-agent social learning in social networks, along with a topological necessary and sufficient condition for data incest free estimation. The main difference between this work and the data incest removal algorithms in Chapter 2 is that here we considered a social learning context where only the public beliefs of agents (which can be computed directly from actions) are transmitted over the network, while in the previous chapter the private beliefs of agents, which depend on their private observations, are transmitted through the network.

The results of this chapter can also be applied to a scenario where the network administrator provides the public beliefs to agents (instead of the updated network belief). In this scenario, agents combine the received public beliefs using the optimal weight vector wn to compute the updated public belief and then evaluate their private beliefs accordingly. The optimal weights (which depend on the topology of the communication graph) and the set of available public beliefs are the essential ingredients needed to compute the network belief.

Footnote 28: One of the issues that comes with the proliferation of online social networks (especially content-aware recommender systems), due to the astronomical amounts of information that these sites have about their users, is privacy. People are not usually willing to disclose their private information to a large audience. That is why finding a trade-off between the accuracy and the privacy of users has been studied widely in the literature on recommender systems [47, 126, 129, 151].

Figure 3.24: Actions of two participants in a group at different epochs. Participant 1 can be considered an internal node and Participant 2 can be viewed as a boundary node.
As can be seen in Figure 3.25, these ingredients can be computed by separate units (that do not communicate with each other), and the user can aggregate the two to compute the most up-to-date network belief. This means that only he has access to the updated network belief and, thus, his privacy is preserved.

Footnote 29: Another type of privacy concern that may arise here is the fact that the network administrator can compute the network belief for users, and thus can predict the actions that users are about to make.

Figure 3.25: The optimal weights (which depend on the topology of the communication graph) and the set of available public beliefs are computed in separate units. The user can then compute the most up-to-date data incest free network belief.

3.7 Proof of Results

Here, we present proofs for the propositions and results of this chapter in the order of their appearance.

3.7.1 Proof of Lemma 3.2.1

Proof. We assume that each node has the most up-to-date public belief of social learning, π−n = p(x|Θn). This node records its own private observation zn = zl. The private belief is

µn(m) = p(x = xm|Θn, zn). (3.21)

Using Bayes' theorem, (3.21) can be written as

µn(m) = p(x = xm|Θn, zn) = c p(zn|x = xm) p(x = xm|Θn) = c π−n(m) Bml. (3.22)

The normalizing factor c makes µn a true probability mass function, that is, ∑_{m=1}^{X} µn(m) = 1. The expected cost given µn is C′a µn, thus the action an is an = argmin_{a∈A} {C′a µn}. To complete the proof, we need to compute the public belief π+n = p(x|Θn, an). Applying Bayes' theorem, the after-action public belief can be written as

π+n(m) = p(x = xm|Θn, an) = c p(an|Θn, x = xm) p(x = xm|Θn) = c p(an|x = xm, π−n) π−n(m) = c π−n(m) ∑_{j=1}^{Z} p(an|x = xm, z = zj, π−n) Bmj. (3.23)

Knowing the observation and the public belief, the private belief can be computed.
From the private belief, the action an is specified. Thus

p(an|x, z = zj, π−n) = 1 if an = argmin_{a∈A} {C′a Bj π−n}, and 0 otherwise, (3.24)

where Bj = diag(B1j, ..., BXj). Using the indicator function I(·), Eq. (3.24) can be rewritten as

p(an|x, z = zj, π−n) = ∏_{â∈A−{an}} I(C′an Bj π−n < C′â Bj π−n). (3.25)

Substituting (3.25) into (3.23) completes the proof:

π+n(m) = c π−n(m) ∑_{j=1}^{Z} [∏_{â∈A−{an}} I(C′an Bj π−n < C′â Bj π−n)] Bmj. (3.26)

3.7.2 Proof of Theorem 3.3.1

Proof. The logarithm of the public belief of learning problem (2.6) with the benchmark information exchange Protocol 2, θfulln, is log(p(x|Θfulln, Gn)). Recall that Θfulln denotes the entire history of actions from previous nodes that have a path to node n, and Si denotes the set of all actions that ai depends on. Also, from the definition of the transitive closure matrix (A.3) and tn in (A.6), the nodes that have a path to node n correspond to the non-zero elements of tn: if tn(i) = 1, then there exists a path from node i to node n. Therefore, the public belief can be written as

p(x|Θfulln, an, Gn) = c p(an|Sn, x) p(x|{ai; ai ∈ Θfulln}) = c π0 p(an|Sn, x) ∏_{ai∈Θfulln} p(ai|Si, x). (3.27)

Note that Bayes' theorem is used recursively to expand p(x|{ai; ai ∈ Θfulln}), and Si includes the actions (from Θfulln) that ai depends on. Taking the logarithm of both sides of (3.27) yields

θfulln = log(p(x|Θfulln, an, Gn)) = log(p(an|Sn, x)) + ∑_{tn(i)≠0} log(p(ai|Si, x)) = ∑_{i=1}^{n−1} tn(i) νi + νn, (3.28)

where νi denotes log(p(ai|x, Si)). Note that the normalizing constant c and the prior π0 are omitted for simplicity, as they are the same for both learning problem (3.9) with the constrained social learning Protocol 1 and learning problem (2.6) with the benchmark Protocol 2.

3.7.3 Proof of Theorem 3.3.2

Proof. The aim here is to show that if wn = tn(T′n−1)−1, then θ̂n defined in (3.18) is exactly equal to θfulln in (3.14).
Before proceeding, let us first rewrite (3.18) and (3.14) using the following notation:

θfulln = νn + (tn ⊗ Id) ν1:n−1,  θ̂n = νn + (wn ⊗ Id) θ̂1:n−1, (3.29)

where θ̂1:n−1 := [θ̂′1, ..., θ̂′n−1]′ and ν1:n−1 := [ν′1, ..., ν′n−1]′ ∈ R^{(n−1)d×1}. Here ⊗ denotes the Kronecker (tensor) product and Id denotes the d×d identity matrix.

To prove Theorem 3.3.2, we first start from

θ̂n = θfulln. (3.30)

Assume that (3.30) holds for all i with 1 ≤ i ≤ n. From (3.18), θfulln can then be written as

θfulln = θ̂n = (wn ⊗ Id) θ̂1:n−1 + νn = (wn ⊗ Id) θfull1:n−1 + νn, (3.31)

where the last equality uses θ̂1:n−1 = θfull1:n−1, which follows from (3.30) holding for all i with 1 ≤ i ≤ n. From (3.14) in Theorem 3.3.1, θfull1:n−1 can be expressed as

θfull1:n−1 = (T′n−1 ⊗ Id) ν1:n−1. (3.32)

Note that in the derivation of (3.32) we use the definition of tn−1 in (A.6) as the first n−2 elements of Tn−1, and so on. Using (3.32), (3.31) can be written as

θfulln = (wn ⊗ Id)(T′n−1 ⊗ Id) ν1:n−1 + νn. (3.33)

From (3.14) in Theorem 3.3.1, we have another expression for θfulln. Comparing (3.33) and (3.14) yields

(tn ⊗ Id) ν1:n−1 = (wn ⊗ Id)(T′n−1 ⊗ Id) ν1:n−1 = ((wn T′n−1) ⊗ Id) ν1:n−1, (3.34)

where, in going from the first expression to the second, the mixed-product property of the Kronecker product is used. From (3.34) it can be inferred that tn = wn T′n−1. As presented in Appendix A, Tn is an upper triangular matrix with ones on the diagonal. Therefore, Tn−1 is invertible and wn = tn(T′n−1)−1. To complete the proof, we need to start from wn = tn(T′n−1)−1 and obtain θ̂n = θfulln; this direction is straightforward and thus omitted from the chapter. Note that the topological Constraint 3.3.1 says that if bn(j) = 0, then the j-th entry of ν1:n−1 is not available to node n, and thus the corresponding element of the weight vector, wn(j), should be equal to zero as well.
Also note that νn is computed by the network administrator, and the data incest free public belief, π−n, is available to the network administrator.

Part II

Tracking Degree Distribution in Dynamic Social Networks

4 Tracking a Markov Modulated Degree Distribution

4.1 Introduction

Dynamic random graphs have been widely used to model social networks, biological networks [42] and Internet graphs [41]. Such dynamic models can be viewed as a sequence of graphs where the random graph at each time may depend on all the earlier graphs (snapshots of the evolving graph at earlier times) [41]. Motivated by analyzing social networks, we introduce Markov-modulated duplication-deletion random graphs, where at each time instant nodes can either be added to or eliminated from the graph with probabilities that change according to a finite-state Markov chain. Such graphs mimic social networks where the interactions between nodes evolve over time according to a Markov process that undergoes infrequent jumps. An example of such a social network is the friendship network among residents of a city, where the dynamics of the network change in the event of a large festival.

Social networks can be viewed as complex sensors that provide information about interacting individuals and an underlying state of nature. In this chapter, we consider a dynamic social network where at each time instant one node can join or leave the network. The probabilities of joining or leaving evolve according to the realization of a finite-state Markov chain that represents the state of nature. This chapter presents two results.
First, motivated by social network applications, the asymptotic behavior of the degree distribution of the Markov-modulated random graph is analyzed. Second, using noisy observations of nodes' connectivity, a "social sensor" is designed for tracking the underlying state of nature as it evolves over time.

Footnote 30: The duplication-deletion procedure for Markov-modulated random graphs is described in Section 4.2.

Footnote 31: For example, real-time event detection from Twitter posts is investigated in [132], and the early detection of contagious outbreaks via social networks is studied in [40].

4.1.1 Chapter Goals

As explained above, in this chapter Markov-modulated dynamic random graphs are introduced to mimic social networks whose evolution varies over time. The most important parameter of a network that characterizes its structure is the degree distribution. It yields useful information about the connectivity of the random graph [10, 86, 116]. For example, if a majority of nodes in the random graph have relatively high degrees, the graph is highly connected and a message can be transferred between two arbitrary nodes along shorter paths. However, if a majority of nodes have smaller degrees, then longer paths are needed to transmit a message throughout the network, see [80]. The degree distribution can further be used to investigate the diffusion of information or disease through social networks [108, 146]. The existence of a "giant component" in complex networks can be studied using the degree distribution. The size and existence of a giant component have important implications in social networks in terms of modeling information propagation and the spread of human disease [62, 115, 118]. The degree distribution is also used to analyze the "searchability" of a network.
The “search” problem arises when a specific node in a network faces a problem (request) whose solution is at another node, namely, the destination (e.g., delivering a letter to a specific person, or finding a web page with specific information) [4, 146]. The searchability of a social network [146] is the average number of nodes that need to be accessed to reach the destination. The degree distribution is also used to investigate the robustness and vulnerability of a network in terms of the network’s response to attacks on its nodes or links [33, 76]. The papers [148, 149] further use degree-dependent tools for classification of social networks.

The first goal of this chapter is to provide a degree distribution analysis that allows us to determine the relation between the structure of the network (in terms of connectivity) and the underlying state of nature. Indeed, it will be shown in Section 4.3 that there exists a unique stationary degree distribution for the Markov-modulated graph for each state of the underlying Markov chain. It thus suffices to estimate the degree distribution in order to track the underlying state of nature. The second goal of the chapter is to propose a stochastic approximation algorithm to track the empirical degree distribution of the Markov-modulated random graph.
In particular, our goals are to address the following two questions in Section 4.4:

• How can a social sensor estimate (track) the empirical degree distribution using a stochastic approximation algorithm with no knowledge of the Markovian dynamics?

• How accurate are the estimates generated by the stochastic approximation algorithm when the random graph evolves according to the duplication-deletion model with Markovian switching?

By tracking the degree distribution of a Markov-modulated random graph, we can design a social sensor to track the underlying state of nature using the noisy measurements of nodes’ connectivity.

Footnote 32: A giant component is a connected component with size O(n), where n is the total number of vertices in the graph. If the average degree of a random graph is strictly greater than one, then there exists a unique giant component with probability one [41], and the size of this component can be computed from the expected degree sequence.

4.1.2 Main Results and Organization of Chapter

Section 4.2 describes the construction of Markov-modulated duplication-deletion random graphs. Section 4.3 provides an asymptotic degree distribution analysis for the non-Markov-modulated case in two different scenarios: (i) a fixed size duplication-deletion random graph, and (ii) an infinite duplication-deletion random graph. Theorem 4.3.1 in Section 4.3.1 asserts that the expected degree distribution of the fixed size Markov-modulated random graph at each time can be computed in terms of the expected degree distribution of the graph at the previous time and the dynamics of the graph via the recursive equation (4.9). Section 4.3.2 extends the results of Section 4.3.1 to infinite random graphs.
Theorem 4.3.2 parameterizes the degree distribution of such a graph by the power law exponent, which depends on the dynamics of the graph.

Section 4.4 considers the problem of adaptively estimating the degree distribution of a fixed size Markov-modulated duplication-deletion random graph given observations of the degree distribution. A stochastic approximation algorithm is presented for tracking the degree distribution as it evolves over time. In particular, Section 4.4 presents three results regarding the tracking performance of the stochastic approximation algorithm:

• Mean square error analysis: Theorem 4.4.1 analyzes the asymptotic mean square error between the expected degree distribution and the estimate obtained via the stochastic approximation algorithm. Deriving this result uses error bounds on two-time-scale Markov chains and perturbed Lyapunov function methods.

• Weak convergence analysis: Theorem 4.4.2 shows that the asymptotic behavior of the stochastic approximation algorithm converges weakly to the solution of a switched Markovian ordinary differential equation.

• Functional central limit theorem for scaled tracking error: Finally, Theorem 4.4.3 investigates the asymptotic behavior of the scaled tracking error. Similar to [94], it is shown that the interpolated scaled tracking error converges weakly to the solution of a switching diffusion process.

Section 4.5 extends the results of Section 4.4 to infinite (denumerable) duplication-deletion random graphs, where the number of nodes in the graph (and so the support of the degree distribution) is no longer fixed and increases over time. A Hilbert-space-valued stochastic approximation algorithm is proposed to track the degree distribution of the infinite graph with support on the set of non-negative integers. To study the tracking performance of such a Hilbert-space-valued stochastic approximation algorithm, a limit system characterization and an asymptotic analysis of the scaled tracking error are provided.
Numerical examples are presented in Section 4.6.

4.2 Markov-modulated Dynamic Random Graph of Duplication-deletion Type

This section outlines the construction of Markov-modulated dynamic random graphs of duplication-deletion type. Let n = 0, 1, 2, ... denote discrete time. Denote by θ_n a discrete-time Markov chain with state space

\[ \mathcal{M} = \{1, 2, \ldots, M\}, \tag{4.1} \]

and initial probability distribution π_0.

Assumption 4.2.1. The Markov chain θ_n evolves according to the transition matrix

\[ A_\rho = I + \rho Q. \tag{4.2} \]

Here, I is an M × M identity matrix, ρ is a small positive real number, and Q = [q_{ij}] is an irreducible (footnote 33) generator of a continuous-time Markov chain satisfying

\[ q_{ij} > 0 \ \text{for } i \neq j, \quad \text{and} \quad Q\mathbf{1} = \mathbf{0}, \tag{4.3} \]

where 1 and 0 represent column vectors of ones and zeros, respectively. The transition probability matrix A_ρ is therefore close to the identity matrix. Here and henceforth, we refer to such a Markov chain θ_n as a “slow” Markov chain. The initial distribution π_0 is assumed independent of ρ.

A Markov-modulated duplication-deletion dynamic random graph is parameterized by the 7-tuple (M, A_ρ, π_0, r, p, q, G_0). Here, p and q are M-dimensional vectors with elements p(i) and q(i) ∈ [0,1], i = 1, ..., M, where p(i) denotes the connection probability, and q(i) denotes the deletion probability. Also, r ∈ [0,1] denotes the probability of the duplication step, and G_0 denotes the initial graph at time 0. In general, G_0 can be any finite simple connected graph. For simplicity, assume that G_0 is a simple connected graph with size N_0.
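As an illustration of Assumption 4.2.1, the following sketch (with a made-up two-state generator Q and an illustrative value of ρ, neither taken from the thesis) constructs A_ρ = I + ρQ and simulates the slow chain θ_n, confirming that jumps are infrequent:

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative two-state generator Q (irreducible, rows sum to zero) and the
# resulting "slow" transition matrix A_rho = I + rho*Q of Assumption 4.2.1.
Q = np.array([[-0.8,  0.8],
              [ 0.5, -0.5]])
rho = 1e-3                          # small rho: A_rho is close to the identity
A = np.eye(2) + rho * Q
assert (A >= 0).all() and np.allclose(A.sum(axis=1), 1.0)

# Simulate theta_n; jumps are rare because off-diagonal entries are O(rho).
theta, jumps = 0, 0
for n in range(10_000):
    nxt = rng.choice(2, p=A[theta])
    jumps += int(nxt != theta)
    theta = nxt
print("jumps observed:", jumps)     # only a handful over 10,000 steps
```

Because the off-diagonal entries of A_ρ are O(ρ), the chain stays in each state for roughly 1/ρ steps on average, which is the "infrequent jumps" behavior the construction is meant to capture.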
The duplication-deletion random graph is constructed via the duplication-deletion Procedure 4.5 (footnote 34).

The Markov-modulated random graph generated by the duplication-deletion Procedure 4.5 mimics social networks where the interactions between nodes evolve over time due to the underlying dynamics (state of nature) such as seasonal variations (e.g., the high school friendship social network evolving over time with different winter/summer dynamics).

Footnote 33: The irreducibility assumption implies that there exists a unique stationary distribution π ∈ R^{M×1} for this Markov chain such that π′ = π′A_ρ.

Footnote 34: In Procedure 4.5, Step 1 is executed with probability r. Then, regardless of execution of Step 1, Step 2 is implemented. For convenience in the analysis, assume that a node generated in the duplication step cannot be eliminated in the deletion step immediately after its generation. Also, nodes whose degrees change in the edge-deletion part of Step 2 remain unchanged in the duplication part of Step 2 at that time instant. Finally, to prevent formation of isolated nodes, assume that the neighbor of a node with degree one cannot be eliminated in the deletion step. Note also that the duplication step in Step 2 ensures that the graph size does not decrease.

Procedure 4.5 Markov-modulated Graph parameterized by (M, A_ρ, π_0, r, p, q, G_0)

At time n, given the graph G_n and Markov chain state θ_n, simulate the following events:

Step 1: Duplication step: With probability r implement the following steps:
• Choose node u from graph G_n randomly with uniform distribution.
• Vertex-duplication: Generate a new node v.
• Edge-duplication:
  – Connect node u to node v. (A new edge between u and v is added to the graph.)
  – Connect each neighbor of node u with probability p(θ_n) to node v.
These connection events are statistically independent.

Step 2: Deletion step: With probability q(θ_n) implement the following steps:
• Edge-deletion: Choose node w randomly from G_n with uniform distribution. Delete node w along with its connected edges in graph G_n.
• Duplication step: Choose a node x randomly from G_n and implement the vertex-duplication and edge-duplication processes described in Step 1.

Step 3: Denote the resulting graph by G_{n+1}. Generate θ_{n+1} (Markov chain) using transition matrix A_ρ. Set n → n+1 and go to Step 1.

In such cases, the connection/deletion probabilities p, q depend on the state of nature and evolve with time. Procedure 4.5 models these time variations as a finite-state Markov chain θ_n with transition matrix A_ρ.

Discussion: The connection/deletion probabilities p, q can be determined by the solution of a utility maximization problem. Let U^join : [0,1] × M → R denote a utility function that gives the payoff to an individual who considers expanding his neighborhood in the “edge-duplication step” of Procedure 4.5, as a function of (p, θ). Similarly, let U^leave : [0,1] × M → R denote a utility function that gives the payoff to an individual who considers leaving the network in the “deletion step” of Procedure 4.5, as a function of (q, θ). With the above utility functions, the probabilities of connection/deletion when the state of nature is θ can
be viewed as the solutions of the following maximization problems:

\[ p(\theta) = \arg\max_{p}\,\{U^{\text{join}}(p,\theta)\}, \qquad q(\theta) = \arg\max_{q}\,\{U^{\text{leave}}(q,\theta)\}. \tag{4.4} \]

These utility functions can be interpreted in terms of mutual benefits and privacy concerns. One example could be U^join(p, θ) = b^join(p, θ) − v, where b^join(p, θ) is the benefit one obtains by expanding his network with probability p when the underlying state of nature is θ, and v is the cost incurred by sacrificing his “privacy”. In this example, when an individual decides to leave the network, the utility he obtains will be U^leave(q, θ) = b^leave(q, v) − c(θ), where b^leave(q, v) is the benefit he earns by preserving privacy and c(θ) is the benefit he loses by leaving the network when the underlying state of nature is θ.

4.3 Asymptotic Degree Distribution Analysis for the Non-Markov-Modulated Case

This section presents a degree distribution analysis for duplication-deletion random graphs generated according to Procedure 4.5 for the non-Markov-modulated case, i.e., M = 1. The stationary degree distribution obtained in Section 4.3.1 below will be used in the Markov-modulated case. The results in this section constitute a minor extension of [41] to duplication-deletion random graphs.

Notation: At each time n, let N_n denote the number of nodes of graph G_n. Also, let f_n be an N_n-dimensional vector whose i-th element, f_n^i, denotes the number of vertices of graph G_n with degree i. Clearly f_n′1 = N_n, where 1 denotes the vector of ones. Here, ′ is used to denote the transpose of a vector or matrix. Define the “empirical vertex degree distribution” as

\[ g_n = (g_n^i,\; i = 1, 2, \ldots), \quad \text{where } g_n^i = \frac{f_n^i}{N_n}. \]
(4.5)

Note that g_n can be viewed as a probability mass function since all of its elements are non-negative and g_n′1 = 1.

4.3.1 Fixed Size Random Graph

This subsection analyzes the evolution of the expected degree distribution for a fixed size duplication-deletion random graph generated according to Procedure 4.5 with r = 0, M = 1. (Recall r denotes the probability of Step 1 in Procedure 4.5.) Therefore, the number of vertices in the graph remains fixed, i.e., N_n = N_0 for n = 0, 1, 2, .... Theorem 4.3.1 below gives a recursion for the expected degree distribution of the fixed size Markov-modulated duplication-deletion random graph.

Theorem 4.3.1. Consider the fixed size duplication-deletion random graph generated according to Procedure 4.5, where r = 0, M = 1. Let g_n denote the expected degree distribution of nodes at time n. Then, g_n satisfies the recursion

\[ g_{n+1} = \Big(I + \frac{1}{N_0}\,L'\Big) g_n, \tag{4.6} \]

where L is a generator matrix (footnote 35) with elements (for 1 ≤ i, j ≤ N_0):

\[ l_{ji} = \begin{cases} 0, & j < i-1,\\[2pt] q\,p^{i-1} + q\,\big(1 + p(i-1)\big), & j = i-1,\\[2pt] i\,q\,p^{i-1}(1-p) - q\,(i + 2 + pi), & j = i,\\[2pt] q\binom{i+1}{i-1} p^{i-1}(1-p)^{2} + q\,(i+1), & j = i+1,\\[2pt] q\binom{j}{i-1} p^{i-1}(1-p)^{\,j-i+1}, & j > i+1. \end{cases} \tag{4.7} \]

Proof. The proof is presented in Appendix 4.8.1.

Theorem 4.3.1 shows that the evolution of the expected degree distribution in a fixed size Markov-modulated duplication-deletion random graph satisfies (4.6). One can rewrite (4.6) as

\[ g_{n+1} = B_{N_0}' g_n, \quad \text{where } B_{N_0} = I + \frac{1}{N_0}\,L. \tag{4.8} \]

Since L is a generator matrix, B_{N_0} can be considered as the transition matrix of a slow Markov chain. It is also straightforward to show that B_{N_0} is irreducible and aperiodic (footnote 36). Hence, there exists a unique stationary distribution g = (g^i, i = 1, 2, ...) such that

\[ g = B_{N_0}' g. \tag{4.9} \]

The stationary distribution g is the stationary expected degree distribution of a fixed size duplication-deletion random graph generated according to Procedure 4.5 with r = 0. Note that the underlying Markov chain {θ_n} depends on the small parameter ρ.

Footnote 35: That is, each row adds to zero and each non-diagonal element of L is positive.

Footnote 36: It is straightforward to show that all elements of (B_{N_0})^{N_0} are strictly greater than zero. Therefore, B_{N_0} is irreducible and aperiodic.
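The fixed point in (4.9) is straightforward to compute numerically once L is known. The sketch below uses a small made-up generator L (not the problem-specific entries of (4.7)) and finds g by power iteration on B_{N0} = I + L/N0:

```python
import numpy as np

# A small made-up generator L (rows sum to zero, off-diagonals nonnegative);
# in the chapter, L is specified entrywise by (4.7).
L = np.array([[-0.5,  0.5,  0.0],
              [ 0.3, -0.7,  0.4],
              [ 0.0,  0.6, -0.6]])
N0 = 10
B = np.eye(3) + L / N0              # B_{N0} = I + L/N0: a slow transition matrix

g = np.full(3, 1.0 / 3)             # any probability vector as initial guess
for _ in range(5000):               # power iteration: g <- B' g, cf. (4.9)
    g = B.T @ g
assert np.allclose(B.T @ g, g)      # g is the stationary distribution of B
print(g)
```

Since B_{N0} is irreducible and aperiodic (footnote 36), the iterates converge to the unique stationary distribution regardless of the starting probability vector; the small step 1/N0 only slows the convergence rate, not the limit.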
The main idea is that, although θ_n is time-varying, it is piecewise constant (since ρ is a small parameter); it changes slowly over time. Further, in light of (4.6), the evolution of g_n depends on 1/N_0. Our assumption throughout this chapter is that ρ ≪ 1/N_0. Therefore, the evolution of g_n is faster than the evolution of θ_n. That is, g_n reaches its stationary distribution g before the state of θ_n changes. From (4.9), the expected degree distribution of the fixed size Markov-modulated duplication-deletion random graph can be uniquely computed for each state of the underlying Markov chain θ_n = θ.

Example: Searchability of a Network

So far in this section, an asymptotic analysis of the degree distribution was presented for a random graph generated according to Procedure 4.5. We now comment briefly on how the degree distribution can be used to investigate the searchability of the network. This also motivates the stochastic approximation algorithm presented in Section 4.4, as described below. The search problem arises in a network when a specific node faces a problem (request) whose solution is at another node (e.g., delivering a letter to a specific person or finding a web page with specific information). Assume [146] that on receiving a search request, each node follows the following protocol: (a) it addresses the request if it or its neighbors have the solution; otherwise (b) it relays the request to one of its neighbors chosen uniformly. The objective is to find the expected search delay, that is, the expected number of steps until the request is addressed.

Lemma 4.3.1.
Consider the sequence of fixed size Markov-modulated duplication-deletion random graphs {G_n} obtained by Procedure 4.5 with (M, A_ρ, π_0, r, p, q, G_0), where A_ρ = I + ρQ and r = 0, with expected degree distribution g_n. The expected search delay is

\[ \lambda(N_0) = O\!\left(\frac{N_0\,\delta}{d_2 - \delta}\right), \tag{4.10} \]

as n → ∞, where δ = \sum_{i=1}^{N_0} i\, g_n(i) and d_2 = \sum_{i=1}^{N_0} i^2\, g_n(i).

Proof. See Chapter 5 of [146] and recall that the size of the considered random graph is N_0.

Lemma 4.3.1 implies that, if the empirical degree distribution of the possibly time-varying network can be tracked accurately, then such an estimate can be used to track the searchability of the network. Also, using the estimated degree distribution and Lemma 4.3.1, we can address the following design problem: How can p and q in Procedure 4.5 be chosen so that the average delay does not exceed a threshold? Using the stochastic approximation algorithm in (4.14) (see Section 4.4 below for the convergence proof), we can estimate the expected degree distribution, ĝ_n, and from that, we can compute δ and d_2. Then, from Lemma 4.3.1 we can find the measure of searchability, compare it with the maximum acceptable average delay, and modify the parameters of Procedure 4.5 accordingly. We illustrate searchability in the numerical examples given in Section 4.6.

4.3.2 Power Law Exponent for the Infinite Duplication-deletion Random Graph

The degree distribution analysis provided in the previous subsection was for a fixed size random graph generated according to the duplication-deletion Procedure 4.5 with r = 0. This section extends the analysis to infinite duplication-deletion random graphs (obtained by choosing r = 1). Assume that G_0 is an empty set. Since r = 1, at time n the graph G_n has n nodes. By employing the same approach as in the proof of Theorem 4.3.1, it will be shown that the infinite duplication-deletion random graph without Markovian dynamics satisfies a power law.
An expression is further derived for the power law exponent.

Definition 4.3.1 (Power Law Distribution). The degree distribution g = (g^i, i = 1, 2, ...) of a graph G has a power law distribution (footnote 37) if there exists an integer i* such that for all i ≥ i*,

\[ \log g^i = \alpha - \beta \log i, \]

where α is a constant (footnote 38) and β > 1. The parameter β is called the power law exponent.

The power law is satisfied in many networks such as WWW graphs, peer-to-peer networks, phone call graphs, co-authorship graphs and various massive online social networks (e.g., Yahoo, MSN, Facebook) [17, 27, 44, 135, 139]. The following theorem states that the graph generated according to Procedure 4.5 with r = 1 and M = 1 satisfies a power law.

Theorem 4.3.2. Consider the infinite random graph without Markovian dynamics, G_n, obtained by Procedure 4.5 with 7-tuple (1, 1, 1, 1, p, q, G_0), with expected degree distribution g_n. Then, if log p + p < q/(1+q) < p, the expected degree of nodes in G_n has a power law distribution with exponent β > 1. The power law exponent is computed from

\[ (1+q)\big(p^{\beta-1} + p\beta - p\big) = 1 + \beta q. \tag{4.11} \]

Here, p and q are the probabilities defined in the duplication and deletion steps, respectively.

Proof. The proof is similar to that of Theorem 4.3.1 with some modifications; see [72]. Here, we only present an outline of the proof, which comprises two steps: (i) finding the power law exponent, and (ii) showing that the degree distribution converges to a power law with the computed exponent as n → ∞. To find the power law exponent, we derive a recursive equation for the number of nodes with degree i+1 at time n+1, denoted by f_{n+1}^{i+1}, in terms of the degrees of nodes in

Footnote 37: There is a difference between “power law” and “power law distribution”. A power law is a functional relationship between two parameters where one parameter is proportional to a power of the other, i.e., x ∝ y^{−β}, where β can be any real number. In comparison, the exponent of a power law distribution is strictly greater than one [117].
Otherwise, the probability distribution does not add up to one.

Footnote 38: The normalization constant α is computed from α = −log[ζ(β, i*)], where ζ(β, i*) = \sum_{k=i^*}^{\infty} k^{−β} denotes the incomplete Riemann ζ-function.

graph G_n. Then, rearranging this recursive equation yields an equation for the power law exponent. To prove that the degree distribution satisfies a power law, we show that lim_{n→∞} \sum_{k=1}^{i} E\{f_n^k\} = \sum_{k=1}^{i} c\,k^{−β}, where β > 1 is the power law exponent computed in the first step and f_n^k is the k-th element of f_n.

Theorem 4.3.2 asserts that the infinite duplication-deletion random graph without Markovian dynamics generated by Procedure 4.5 satisfies a power law and provides an expression for the power law exponent. The significance of this theorem is that it ensures that, with the use of one single parameter (the power law exponent), we can describe the degree distribution of graphs with a relatively large number of nodes. The above result slightly extends [42, 121], where only a duplication model was considered. Theorem 4.3.2 allows us to explore characteristics (such as searchability, diffusion, and existence/size of the giant component) of large networks which can be modeled with infinite duplication-deletion random graphs.

Remark 1. Outline of Proof: The proof of Theorem 4.3.2, which is presented in Appendix 4.8.2, consists of two steps: (i) finding the power law exponent and (ii) showing that the degree distribution converges to a power law as n → ∞. To find the power law exponent, we derive a recursive equation for the number of nodes with degree i+1 at time n+1, f_{n+1}(i+1), in terms of the degrees of nodes in graph G_n. Then, this recursive equation is rearranged to obtain an equation for the power law exponent.
To prove that the degree distribution satisfies a power law, we define a new parameter h_n(i) = \frac{1}{n}\sum_{k=1}^{i} E\{f_n(k)\} and show that lim_{n→∞} h_n(i) = \sum_{k=1}^{i} C\,k^{−β}, where β is the power law exponent computed by solving the recursive equation.

Remark 2. Power Law Exponent: Let β* denote the solution of (4.11). Then the power law exponent is defined as β = max{1, β*}. Fig. 4.1 shows the power law exponent and β* versus p for different values of the probability of deletion, q. As can be seen in Fig. 4.1, the power law exponent is increasing in q and decreasing in p.

[Figure 4.1: The power law exponent for the non-Markovian random graph generated according to Procedure 4.5, obtained from (4.11) for different values of p and q in Procedure 4.5; curves for q = 0, 0.1, 0.3, 0.5, 0.6, 0.8, showing both β* and β = max{1, β*}.]

4.4 Estimating (Tracking) the Degree Distribution of the Fixed Size Markov-modulated Duplication-deletion Random Graph

In Section 4.3.1, an expression was given for the unique stationary degree distribution g for the non-Markov-modulated case; see (4.9). In this section, we consider fixed size Markov-modulated duplication-deletion random graphs. Consider Procedure 4.5 and assume that there are M possible stationary degree distributions, namely g = {g(1), g(2), ..., g(M)}, corresponding to the M states of a Markov chain.
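As an aside to the power-law discussion above, the curves of Fig. 4.1 can be reproduced numerically: β = 1 is always a trivial root of (4.11), and under the condition of Theorem 4.3.2 there is a unique nontrivial root β* > 1, which plain bisection recovers. A sketch (the sample values of p and q are arbitrary):

```python
import math

def power_law_exponent(p, q, tol=1e-10):
    """Nontrivial root beta* > 1 of (4.11):
       (1+q)(p**(b-1) + p*b - p) = 1 + b*q.
    b = 1 always solves the equation; under log(p) + p < q/(1+q) < p the
    left-minus-right side f is convex in b, negative just above 1, and
    eventually positive, so bisection isolates the second root."""
    assert math.log(p) + p < q / (1 + q) < p, "condition of Theorem 4.3.2"
    f = lambda b: (1 + q) * (p ** (b - 1) + p * b - p) - (1 + b * q)
    lo, hi = 1.0 + 1e-6, 2.0
    while f(hi) < 0:                 # expand bracket: f grows like ((1+q)p - q)*b
        hi *= 2.0
    while hi - lo > tol:             # bisection invariant: f(lo) < 0 < f(hi)
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

beta = power_law_exponent(p=0.5, q=0.1)
print(round(beta, 3))                # roughly 2.7 for these sample values
```

Note how the two inequalities of Theorem 4.3.2 map onto the bracketing: log p + p < q/(1+q) makes f decreasing at β = 1 (so f is negative just above the trivial root), and q/(1+q) < p makes the asymptotic slope positive (so the upper bracket is eventually found).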
Here each g(i) is computed using (4.9), where the corresponding parameters p(i), q(i) are used. At each time n, a stationary distribution g(θ_n) ∈ g is chosen, where θ_n evolves according to an M-state Markov chain as described in Section 4.2. We assume that the stationary degree distribution of the graph is sampled by a network administrator. How can the network administrator track the expected degree distribution of the fixed size Markov-modulated duplication-deletion random graph without knowing the dynamics of the graph? The motivation for tracking the stationary degree distribution stems from social networks where the dynamics of the degree distribution evolve on a faster time scale than the Markov chain θ_n. Therefore, it suffices to track g(θ_n) given observations.

At each time n, the network administrator samples a node from the graph based on the degree distribution g(θ_n) and records its degree y_n(θ_n). Let z_n(θ_n) = e_{y_n(θ_n)} denote the observation vector, where e_i ∈ R^{N_0×1} is the i-th standard unit vector. Such a sampling procedure can be time-correlated. Therefore, we allow z_n(θ_n) to be a mixing process with the following assumption:

Assumption 4.4.1. For each θ ∈ M, the sequence {y_n(θ)} is stationary φ-mixing with sufficiently fast mixing rate such that the sequence {y_n(1), ..., y_n(M)} is independent of {θ_n} and that for each θ ∈ M, {z_n(θ)} is a stationary φ-mixing sequence with mixing rate ψ_n satisfying \sum_{j=0}^{\infty} \psi_j^{1/2} < \infty.

Remark 4.4.1. Because {y_n(θ)} is a stationary φ-mixing sequence for each θ ∈ M, {z_n(θ)} is a bounded sequence of φ-mixing processes for each θ ∈ M [97, p. 82] (see also [25, p. 170]).
The stationarity implies that

\[ E z_n(\theta) = E z_1(\theta) = \sum_{i=0}^{\infty} e_i\, P(y_1(\theta) = i) = \sum_{i=0}^{\infty} g^i(\theta)\, e_i = g(\theta). \tag{4.12} \]

The given mixing rate requires that for any positive integers i and j,

\[ \begin{aligned} \| E_k\, I\{y_n(\theta) = i\} - g^i(\theta) \| &\le \psi_{n-k} \quad \text{for } n \ge k,\\ \| E_k\, [I\{y_l(\theta) = j\} - g^j(\theta)]\,[I\{y_n(\theta) = i\} - g^i(\theta)] \| &\le \psi_{n-l}^{1/2}\, \psi_{l-k}^{1/2} \quad \text{for any } k < l < n, \end{aligned} \tag{4.13} \]

where E_k denotes the conditional expectation on the past data up to time k (i.e., conditioning on the σ-algebra F_k generated by {z_j(θ) : j ≤ k}) and I{·} denotes the indicator function. Here, ∥·∥ is used to denote the Euclidean norm.

The analysis in this chapter can be generalized to include certain non-stationary cases for the observation process {y_n(θ)}. For example, for each θ ∈ M, suppose {ζ_n(θ)} is an ergodic finite-state Markov chain (footnote 39). Let y_n(θ) = f̃(ζ_n(θ)). The n-step transition probability matrix of the Markov chain converges to a matrix (with identical rows consisting of its stationary distribution) at an exponential rate. Then it can be verified, similarly to [25, pp. 178], that y_n(θ) is mixing. Although (4.12) does not hold, the analysis using mixing inequalities can still be carried out.

Given the observation sequence z_n(θ_n), n = 0, 1, 2, ..., the aim is to adaptively estimate g(θ_n) via the following stochastic approximation algorithm with (small positive) constant step size ε:

\[ \hat{g}_{n+1} = \hat{g}_n + \varepsilon\,\big(z_n(\theta_n) - \hat{g}_n\big), \qquad \hat{g}_0 = e_1. \tag{4.14} \]

To summarize, the evolution of the slow Markov chain θ_n and the stochastic approximation algorithm (4.14) form a two-time-scale Markovian system as follows, where ρ, ε = o(1/N_0):

\[ \begin{cases} \text{True system:} & g(\theta_n) \in \{g(1), \ldots, g(M)\}, \text{ where } \theta_n \text{ evolves according to } A_\rho = I + \rho Q,\\ \text{Algorithm:} & \hat{g}_{n+1} = \hat{g}_n + \varepsilon\,(z_n(\theta_n) - \hat{g}_n), \quad z_n(\theta_n) = e_{y_n(\theta_n)}, \text{ where } y_n(\theta_n) \sim g(\theta_n). \end{cases} \tag{4.15} \]

Note that the stochastic approximation algorithm (4.14) does not assume any knowledge of the Markov-modulated dynamics of the graph. The Markov chain assumption for the random graph dynamics is only used in our convergence and tracking analysis.
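The two-time-scale system (4.15) is easy to simulate. The sketch below uses made-up stationary distributions g(1), g(2) and illustrative values of ρ and ε (all hypothetical, not from the thesis), and runs the update (4.14) against a slowly switching chain:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up example: M = 2 regimes, degrees supported on {1,...,5} (N0 = 5).
g = np.array([[0.40, 0.30, 0.20, 0.07, 0.03],   # g(1)
              [0.05, 0.10, 0.20, 0.30, 0.35]])  # g(2)
rho, eps = 1e-4, 1e-2                  # slow chain vs. adaptation rate
Q = np.array([[-1.0, 1.0], [1.0, -1.0]])
A = np.eye(2) + rho * Q                # A_rho = I + rho*Q

theta = 0
g_hat = np.zeros(5); g_hat[0] = 1.0    # g_hat_0 = e_1
for n in range(50_000):
    y = rng.choice(5, p=g[theta])      # sampled degree y_n ~ g(theta_n)
    z = np.zeros(5); z[y] = 1.0        # observation z_n = e_{y_n}
    g_hat += eps * (z - g_hat)         # stochastic approximation update (4.14)
    theta = rng.choice(2, p=A[theta])  # slow Markov chain theta_n
print(np.round(g_hat, 2))              # hovers near g(theta_n), up to O(sqrt(eps)) noise
```

Note that the update never uses ρ, Q, or the identity of θ_n, in line with the remark above: the estimator adapts to whichever regime currently generates the samples.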
By means of the stochastic approximation algorithm (4.14), the network administrator can track the stationary expected degree distribution g(θ_n).

Footnote 39: Respondent-driven sampling (RDS) was introduced in [74] as an approach for sampling from hidden populations in social networks. RDS has been selected by the U.S. Centers for Disease Control and Prevention as part of the HIV behavioral surveillance system. RDS can be viewed as a form of Markov chain Monte Carlo sampling [68].

4.4.1 Tracking Error of the Stochastic Approximation Algorithm

The goal here is to analyze how well algorithm (4.14) tracks the empirical degree distribution of the fixed size Markov-modulated duplication-deletion graph. Define the tracking error as g̃_n = ĝ_n − g(θ_n). Theorem 4.4.1 below shows that the difference between the sample path and the stationary degree distribution is small, implying that the stochastic approximation algorithm can successfully track the Markov-modulated node distribution given the noisy measurements. We again emphasize that no knowledge of the Markov chain parameters is required by the algorithm. The theorem also gives the order of this difference in terms of ε and ρ.

Theorem 4.4.1. Consider the random graph (M, A_ρ, π_0, p, q, r, G_0). Suppose ρ² ≪ ε and Assumptions 4.2.1 and 4.4.1 hold (footnote 40). Then, for sufficiently large n, the tracking error of the stochastic approximation algorithm (4.14) is

\[ E\|\tilde{g}_n\|^2 = O\Big(\varepsilon + \rho + \frac{\rho^2}{\varepsilon}\Big). \tag{4.16} \]

Proof. The proof uses the perturbed Lyapunov function method and is provided in Appendix 4.8.5.

Remark 4.4.2. Most existing literature analyzes stochastic approximation algorithms for tracking a parameter that evolves according to a “slowly time-varying” sample path of a continuous-valued process, so that the parameter changes by small amounts over small intervals of time.

Footnote 40: In this work, we assume that ρ = O(ε). Therefore, ρ² ≪ ε.
When the rate of change of the underlying parameter is slower than the adaptation rate of the stochastic approximation algorithm (e.g., a slow random walk), the mean square tracking error can be analyzed as in [19, 69, 99, 111, 127, 137]. In comparison, our analysis covers the case where the underlying parameter evolves with discrete jumps that can be arbitrarily large in magnitude on short intervals of time. Also, the jumps occur on the same time scale as the speed of adaptation of the stochastic approximation algorithm. We explicitly consider this Markovian time-varying parameter in our mean square error and weak convergence analysis.

As a corollary of Theorem 4.4.1, we obtain the following mean square error convergence result.

Corollary 4.4.1. Under the conditions of Theorem 4.4.1, if ρ = O(ε),

\[ \lim_{n\to\infty} E\|\tilde{g}_n\|^2 = O(\varepsilon). \]

Therefore,

\[ \lim_{\varepsilon\to 0}\,\lim_{n\to\infty} E\|\tilde{g}_n\|^2 = 0. \]

4.4.2 Limit System of Regime-Switching Ordinary Differential Equations

The following theorem asserts that the sequence of estimates generated by the stochastic approximation algorithm (4.14) follows the dynamics of a Markov-modulated ordinary differential equation (ODE). Before proceeding with the main theorem below, let us recall a definition.

Definition 4.4.1 (Weak Convergence). Let Z_k and Z be R^r-valued random vectors. We say Z_k converges weakly to Z (Z_k ⇒ Z) if for any bounded and continuous function f(·), E f(Z_k) → E f(Z) as k → ∞.

Weak convergence is a generalization of convergence in distribution to a function space (footnote 41).

Theorem 4.4.2. Consider the Markov-modulated random graph generated according to Procedure 4.5, and the sequence of estimates {ĝ_n} generated by the stochastic approximation algorithm (4.14). Suppose Assumptions 4.2.1 and 4.4.1 hold and ρ = O(ε). Define the continuous-time interpolated process

\[ \hat{g}^\varepsilon(t) = \hat{g}_n, \quad \theta^\varepsilon(t) = \theta_n \quad \text{for } t \in [n\varepsilon, (n+1)\varepsilon). \]
(4.17)

Then, as ε → 0, (ĝ^ε(·), θ^ε(·)) converges weakly to (ĝ(·), θ(·)), where θ(·) is a continuous-time Markov chain with generator Q, and ĝ(·) satisfies the Markov-modulated ODE

\[ \frac{d\hat{g}(t)}{dt} = -\hat{g}(t) + g(\theta(t)), \qquad \hat{g}(0) = \hat{g}_0, \tag{4.18} \]

with g(θ) ∈ g.

The above theorem asserts that the limit system associated with the stochastic approximation algorithm (4.14) is the Markovian switched ODE (4.18). As mentioned in Section 4.1, this is unusual, since typically in averaging of stochastic approximation algorithms, convergence occurs to a deterministic ODE. The intuition behind this is that the Markov chain evolves on the same time scale as the stochastic approximation algorithm. If the Markov chain evolved on a faster time scale, then the limiting dynamics would be a deterministic ODE weighted by the stationary distribution of the Markov chain. If the Markov chain evolved slower than the dynamics of the stochastic approximation algorithm, then the asymptotic behavior would also be a deterministic ODE, with the Markov chain being a constant.

4.4.3 Scaled Tracking Error

Next, we study the behavior of the scaled tracking error between the estimates generated by the stochastic approximation algorithm (4.14) and the expected degree distribution. The following theorem states that the tracking error also satisfies a switching diffusion equation and provides a functional central limit theorem for this scaled tracking error. Let ν_k = (ĝ_k − g(θ_k))/√ε denote the scaled tracking error.

Theorem 4.4.3. Suppose Assumptions 4.2.1 and 4.4.1 hold. Define ν^ε(t) = ν_k for t ∈ [kε, (k+1)ε). Then, (ν^ε(·), θ^ε(·)) converges weakly to (ν(·), θ(·)) such that ν(·) is the solution of the following Markovian switched diffusion process

\[ \nu(t) = -\int_0^t \nu(s)\, ds + \int_0^t \Sigma^{1/2}(\theta(\tau))\, d\omega(\tau). \tag{4.19} \]

Here, ω(·) is an R^{N_0}-dimensional standard Brownian motion.

Footnote 41: We refer the interested reader to [99, Chapter 7] for further details on weak convergence and related matters.
The covariance matrix $\Sigma(\theta)$ in (4.19) can be explicitly computed as
$$\Sigma(\theta) = Z(\theta)'D(\theta) + D(\theta)Z(\theta) - D(\theta) - g(\theta)g'(\theta). \tag{4.20}$$
Here, $D(\theta) = \mathrm{diag}(g(\theta))$ and $Z(\theta) = \big(I - B_{N_0}(\theta) + \mathbf{1}g'(\theta)\big)^{-1}$, where $g(\theta) \in G$. For each $\theta \in \mathcal{M}$, $B_{N_0}(\theta)$ is computed using (4.8) with the corresponding parameters $p(\theta), q(\theta)$.

For general switching processes, we refer to [157]; in fact, more complex continuous-state-dependent switching, rather than Markovian switching, is considered there. Equation (4.20) reveals that the covariance matrix of the tracking error depends on $B_{N_0}(\theta)$ and $g(\theta)$ and, consequently, on the parameters $p$ and $q$ of the random graph. Recall from Section 4.2 that $B_{N_0}(\theta)$ is the transition matrix of the Markov chain which models the evolution of the expected degree distribution in duplication-deletion random graphs and can be computed from Theorem 4.3.1.

4.5 Estimating the Degree Distribution of Infinite Duplication-deletion Random Graphs

This section has two results. First, the results of Section 4.4 are extended to infinite random graphs without Markovian dynamics generated according to Procedure 4.5. Second, we show how this analysis can be extended to Markov-modulated probability mass functions with denumerable support. The analysis is non-standard, since it is formulated on a Hilbert space.

4.5.1 Infinite Random Graphs without Markovian Dynamics

Consider the infinite duplication-deletion random graph without Markovian dynamics generated according to Procedure 4.5 with 7-tuple $(1,1,1,1,p,q,G_0)$. In this section, let $g_n$ represent the degree distribution of the infinite graph with support on the set of non-negative integers; its elements are denoted by $g_n^i$, $i = 0,1,2,\ldots$. Recall from Section 4.3.2 that the size of such a graph increases by one at each time step, so the size of the graph at time $n$ equals $n$; that is, $N_n = n$. Therefore, the maximum degree of the graph at time $n$ cannot exceed $n-1$, and $g_n^j = 0$ for $j \ge n$. Similar to the proof of Theorem 4.3.1, the following theorem asserts that the expected degree distribution of the infinite duplication-deletion random graph satisfies a recursive equation.

Theorem 4.5.1. Consider the infinite duplication-deletion random graph without Markovian dynamics generated according to Procedure 4.5 with 7-tuple $(1,1,1,1,p,q,G_0)$. Let $\bar g_n = \mathbb{E}\{g_n\}$ denote the expected degree distribution of nodes with support on the set of non-negative integers. Then $\bar g_n$ satisfies the recursion
$$\bar g_{n+1} = \bar g_n + \frac{1}{n}L(n)'\bar g_n, \tag{4.21}$$
where $L(n)$ is a generator matrix of infinite size with elements
$$l^{(n)}_{ji} = \begin{cases}
(1+q)\big(p^{i-1} + 1 + p(i-1)\big), & j = i-1 \text{ and } 1 \le i,j \le n,\\
(1+q)\big(ip^{i-1}(1-p) + 1 + pi\big) - q(i+1), & j = i \text{ and } 1 \le i,j \le n,\\
(1+q)\binom{i+1}{i-1}p^{i-1}(1-p)^2 + q(i+1), & j = i+1 \text{ and } 1 \le i,j \le n,\\
(1+q)\binom{j}{i-1}p^{i-1}(1-p)^{j-i+1}, & j > i+1 \text{ and } 1 \le i,j \le n,\\
0, & \text{otherwise.}
\end{cases} \tag{4.22}$$

Proof. The proof is similar to the proof of Theorem 4.3.1 and is omitted due to lack of space.

Remark 4.5.1. Theorem 4.3.2 in Section 4.3.2 asserts that the expected degree distribution converges to a power law probability distribution $g$ with exponent $\beta > 1$ if $\log p + p < \frac{q}{1+q} < p$; that is, $\lim_{n\to\infty}\bar g_n^i = \frac{i^{-\beta}}{\zeta(\beta)}$. We assume that the dynamics of the degree distribution evolve on a faster time scale than the stochastic approximation algorithm. Therefore, it suffices to track the stationary degree distribution $g$ given observations.

At each time $n$, the network administrator samples from the graph and records the degree of a randomly chosen vertex, denoted by $y_n$. Let $z_n = e_{y_n}$ denote the observation vector. Here, $e_i$ is the $i$-th standard unit vector with support on the set of non-negative integers (i.e., $e_i = (0,\ldots,1,\ldots)' \in \mathbb{R}^\infty$).
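Remark 4.5.1 above characterizes the limiting degree distribution through the power law exponent $\beta$, which (per Appendix 4.8.2) is the root above $\beta = 1$ of Eq. (4.47), $(1+q)\big(p^{\beta-1} + p\beta - p\big) = 1 + \beta q$. The following bisection sketch is our own illustration, not part of the thesis; with the parameters of Example 4.6.1 ($p = 0.5$, $q = 0.1$) it recovers the exponent $\beta \approx 2.68$ plotted in Fig. 4.2.

```python
def power_law_exponent(p, q, lo=1.5, hi=10.0, tol=1e-10):
    """Find the root beta > 1 of (1+q)*(p**(beta-1) + p*beta - p) = 1 + beta*q
    by bisection. beta = 1 is always a trivial root, so we bracket above it."""
    f = lambda b: (1 + q) * (p ** (b - 1) + p * b - p) - (1 + b * q)
    assert f(lo) < 0 < f(hi), "bracket must straddle the nontrivial root"
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return 0.5 * (lo + hi)

# Example 4.6.1 parameters: p = 0.5, q = 0.1.
print(round(power_law_exponent(0.5, 0.1), 2))  # 2.68, the slope shown in Fig. 4.2
```

The default bracket `lo=1.5` is an assumption that the nontrivial root lies above 1.5; for other $(p, q)$ pairs the bracket may need adjusting.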
The following stochastic approximation algorithm is used to estimate the expected degree distribution of the graph from such samples:
$$\hat g_{n+1} = \hat g_n + \varepsilon\,(z_n - \hat g_n). \tag{4.23}$$
Here, $\varepsilon > 0$ denotes a small positive step size and $\hat g_0 = e_1$. Therefore, (4.23) is a Hilbert-space-valued stochastic approximation algorithm. By means of the stochastic approximation (4.23), the network administrator can track the expected degree distribution of the infinite graph as its size increases over time.

Define
$$\hat g^\varepsilon(t) = \hat g_n \quad \text{for } t \in [n\varepsilon, n\varepsilon + \varepsilon).$$
Then $\hat g^\varepsilon(\cdot) \in D([0,\infty) : \ell_2)$, the space of functions defined on $[0,\infty)$ taking values in $\ell_2 = \{z \in \mathbb{R}^\infty : \sum_{i=0}^\infty \|z^i\|^2 < \infty\}$ that are right continuous and have left limits, endowed with the Skorohod topology. Here, we obtain a weak convergence result for the interpolated sequence of iterates. Theorem 4.5.2 below asserts that the mean square tracking error is bounded and that the sequence of estimates obtained by (4.23) converges to the solution of an ODE.

Theorem 4.5.2. Suppose Assumption 4.4.1 holds with the modification that $M = 1$, i.e., there are no Markovian dynamics. Define $\tilde g_n = g - \hat g_n$. Then $\lim_{n\to\infty}\mathbb{E}\|\tilde g_n\|^2 = O(\varepsilon)$. Also, $\hat g^\varepsilon(\cdot)$ is tight in $D([0,\infty) : \ell_2)$, and any convergent subsequence has a limit $\hat g(\cdot)$ that is the solution of the differential equation
$$\frac{d\hat g(t)}{dt} = g - \hat g(t), \qquad \hat g(0) = e_1. \tag{4.24}$$

Proof. The proof is presented in Appendix 4.8.8. It is divided into several steps, using techniques from stochastic approximation [99] with the modification that $\ell_2$ is a Hilbert space (see [61, 98]).

The above result concerns $n \to \infty$ and $\varepsilon \to 0$ with $\varepsilon n$ remaining bounded. We next obtain a result with $\varepsilon \to 0$, $n \to \infty$, and $\varepsilon n \to \infty$.

Corollary 4.5.1. Consider $\hat g^\varepsilon(\cdot + t_\varepsilon)$, where $t_\varepsilon \to \infty$ as $\varepsilon \to 0$. Under the conditions of Theorem 4.5.2, $\hat g^\varepsilon(\cdot + t_\varepsilon) \to g$ in probability as $\varepsilon \to 0$.

Proof. Note that $\{\hat g_k\}$ is tight. Define $\hat g^{\varepsilon,\mathrm{large}}(\cdot) = \hat g^\varepsilon(\cdot + t_\varepsilon)$.
Using the same approach, we can show that $\{\hat g^{\varepsilon,\mathrm{large}}(\cdot)\}$ is tight. We extract a weakly convergent subsequence of $(\hat g^{\varepsilon,\mathrm{large}}(\cdot), \hat g^{\varepsilon,\mathrm{large}}(\cdot - T))$ with limit denoted by $(\hat g(\cdot), \hat g_T(\cdot))$. We note that $\hat g(0) = \hat g_T(T)$ and that $\hat g_T(0)$ belongs to a set that is bounded in probability. Writing the limit in variational form, we obtain
$$\hat g_T(T) = e^{-T}\hat g_T(0) + \int_0^T e^{-(T-t)}g\,dt = e^{-T}\hat g_T(0) + g - g\int_T^\infty e^{-t}\,dt \;\to\; g \quad \text{as } T \to \infty.$$
The desired result then follows.

To study the rate of variation of the estimation error, we define the sequence of scaled estimation errors $\nu_n = (\hat g_n - g)/\sqrt{\varepsilon}$. Theorem 4.5.3 asserts that the scaled estimation error satisfies a stochastic differential equation and provides a weak convergence result for it.

Theorem 4.5.3. Suppose the assumptions of Theorem 4.5.2 hold. Then, for sufficiently small $\varepsilon$, there is an $N_\varepsilon$ such that $\mathbb{E}\{\langle\nu_n,\nu_n\rangle\} = O(1)$ for all $n > N_\varepsilon$. Define the continuous-time interpolation of the estimation error as
$$\nu^\varepsilon(t) = \nu_n \quad \text{for } t \in [(n-N_\varepsilon)\varepsilon, (n-N_\varepsilon+1)\varepsilon).$$
Under the assumptions of Theorem 4.5.2, $\{\nu^\varepsilon(\cdot)\}$ is tight in $D([0,\infty);\ell_2)$. Moreover, if $\nu^\varepsilon(0)$ converges weakly to $\nu(0)$, then $\nu^\varepsilon(\cdot)$ converges weakly to $\nu(\cdot)$, the solution of the following stochastic differential equation
$$d\nu(t) = -\nu(t)\,dt + dW(t). \tag{4.25}$$
Here, $W(t) = \sum_{i=0}^\infty W_i(t)e_i$ and the covariance operator is given by
$$\mathbb{E}\langle W(t), v\rangle\langle W(t), z\rangle = t\langle z, \Gamma v\rangle = t\sum_{i=0}^\infty \sigma_i^2\langle e_i, v\rangle\langle e_i, z\rangle \quad \text{for } v, z \in \ell_2, \tag{4.26}$$
where $W_i(\cdot)$ is a real-valued Wiener process with covariance $t\sigma_i^2$ and
$$\sigma_i^2 = \mathbb{E}\big[\langle z_0 - g, e_i\rangle\big]^2 + 2\sum_{j=1}^\infty \mathbb{E}\langle z_0 - g, e_i\rangle\langle z_j - g, e_i\rangle.$$

Proof. The proof is presented in Appendix 4.8.9.

Note that the covariance $\sigma_i^2$ depends on the stationary expected degree distribution $g$ and is thus a function of the power law exponent $\beta$.

4.5.2 Markov-modulated Probability Mass Functions with Denumerable Support

Here, we extend the above results to the problem of tracking a time-varying probability mass function with infinite support.
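To fix ideas before the formal setup, the following minimal simulation sketch illustrates this kind of tracking problem. All numerical choices here (two regimes, the truncation level $K$, the exponents, and the step sizes) are invented for illustration; only the constant-step-size recursion $\hat g_{n+1} = \hat g_n + \varepsilon(z_n - \hat g_n)$ used throughout this section is taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sketch (all numbers invented): track a PMF with support
# {1,...,K} that switches between two truncated power laws g(theta),
# using the constant-step-size recursion g_hat <- g_hat + eps*(z_n - g_hat).
K = 100                                    # truncation of the infinite support
eps = 0.01                                 # adaptation step size
rho = eps ** 2                             # regime switches an order slower
G = np.array([np.arange(1, K + 1, dtype=float) ** (-b) for b in (2.0, 3.0)])
G /= G.sum(axis=1, keepdims=True)          # row theta holds g(theta)
I = np.eye(K)                              # unit vectors e_1, ..., e_K

theta, g_hat = 0, I[0].copy()              # g_hat_0 = e_1
for n in range(50000):
    if rng.random() < rho:                 # slow Markov modulation
        theta = 1 - theta
    y = rng.choice(K, p=G[theta])          # sample y_n ~ g(theta_n)
    g_hat += eps * (I[y] - g_hat)          # observation z_n = e_{y_n}

# After a transient, the estimate stays close to the active PMF g(theta).
print(np.sum((g_hat - G[theta]) ** 2))
```

Because the update is a convex combination of a probability vector and a unit vector, $\hat g_n$ remains a probability vector for all $n$, which is one reason this simple recursion is a natural estimator here.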
The aim is to track a probability mass function with support on the set of non-negative integers that evolves according to a slow Markov chain $\theta_n$ with $M$ states and initial probability distribution $\pi_0$. The state space $\mathcal{M}$ and the transition probability matrix $A^\rho$ of the underlying Markov chain $\theta_n$ are defined in (4.1) and (4.2), respectively. For each $\theta \in \mathcal{M}$, let
$$g(\theta) = [g^1(\theta), g^2(\theta), \ldots]' \tag{4.27}$$
be a probability mass function with support on the set of non-negative integers such that $\sum_{i=1}^\infty g^i(\theta) = 1$ and $g^i(\theta) \propto i^{-\beta_\theta}$, where $\beta_\theta > 1$. When the underlying Markov chain $\theta_n$ jumps from one state to another within $\mathcal{M}$, $g(\theta_n)$ switches accordingly.

At each time $n$, we sample $y_n(\theta_n)$ from the PMF $g(\theta_n)$; that is, $y_n(\theta_n) \sim g(\theta_n)$. Let $z_n(\theta_n) = e_{y_n(\theta_n)}$ denote the observation vector. To estimate $g(\theta_n)$, the following constant step size stochastic approximation algorithm is deployed:
$$\hat g_{n+1} = \hat g_n + \varepsilon\big(z_n(\theta_n) - \hat g_n\big). \tag{4.28}$$
Here, $\varepsilon > 0$ denotes a small positive step size and $\hat g_0 = e_1$. We further assume that the Markov chain is slowly changing, in that its rate of change is an order of magnitude slower than the adaptation rate of (4.28); that is, $\rho = \varepsilon^2$.

To analyze the asymptotic properties of the stochastic approximation algorithm, we define the continuous-time interpolation $\hat g^\varepsilon(t) = \hat g_n$ for $t \in [n\varepsilon, n\varepsilon + \varepsilon)$. Similar to what has been obtained thus far for the non-Markovian case, with the details omitted, we obtain the following weak convergence results. Theorem 4.5.4 states that the sequence of estimates obtained via the Hilbert-space-valued stochastic approximation algorithm (4.28) converges weakly to the solution of an ODE which depends on the initial distribution of the underlying Markov chain.

Theorem 4.5.4. Suppose Assumptions 4.2.1 and 4.4.1 hold.
Then $\hat g^\varepsilon(\cdot)$ is tight in $D([0,\infty) : \ell_2)$, and any convergent subsequence has a limit $\hat g(\cdot)$ that is the solution of the differential equation
$$\frac{d\hat g(t)}{dt} = \sum_{\theta=1}^{M} g(\theta)p_\theta - \hat g(t), \qquad \hat g(0) = e_1, \tag{4.29}$$
where
$$g(\theta) = \sum_{i=0}^{\infty} g^i(\theta)e_i, \quad \text{and} \quad (p_\theta : \theta \le M) = \pi_0$$
is the initial probability distribution of the Markov chain.

Proof. The proof is presented in Appendix 4.8.10.

Furthermore, we can obtain the following corollary. The proof is similar to that of Corollary 4.5.1 and is thus omitted.

Corollary 4.5.2. Consider $\hat g^\varepsilon(\cdot + t_\varepsilon)$, where $t_\varepsilon \to \infty$ as $\varepsilon \to 0$. Under the conditions of Theorem 4.5.4, $\hat g^\varepsilon(\cdot + t_\varepsilon) \to \bar g = \sum_{\theta=1}^{M} p_\theta g(\theta)$ in probability as $\varepsilon \to 0$.

Redefine $\nu_n = (\hat g_n - \bar g)/\sqrt{\varepsilon}$. It can be shown that there exists $N_\varepsilon$ such that the sequence $\{\nu_n : n \ge N_\varepsilon\}$ is tight. Next, redefine $\nu^\varepsilon(t) = \nu_n$ for $t \in [\varepsilon(n-N_\varepsilon), \varepsilon(n-N_\varepsilon) + \varepsilon)$. With a little more effort, we can also obtain the associated rate of convergence result, stated in the next theorem.

Theorem 4.5.5. Suppose Assumptions 4.2.1 and 4.4.1 hold. Then $\{\nu^\varepsilon(\cdot)\}$ is tight in $D([0,\infty);\ell_2)$. Moreover, if $\nu^\varepsilon(0)$ converges weakly to $\nu(0)$, then $\nu^\varepsilon(\cdot)$ converges weakly to $\nu(\cdot)$ such that $\nu(\cdot)$ is the solution of the following stochastic differential equation (SDE)
$$d\nu(t) = -\nu(t)\,dt + \sum_{\theta=1}^{M} p_\theta\,dW(\theta, t), \tag{4.30}$$
where, for each $\theta \in \mathcal{M}$, $W(\theta, \cdot)$ is a Wiener process as given in Theorem 4.5.3.

Proof. The proof is similar to the proof of Theorem 4.5.3, with modifications similar to those of the proof of Theorem 4.5.4.

4.6 Numerical Examples

In this section, numerical examples are given to illustrate the results from Section 4.2 and Section 4.4. The main conclusions are:

1. The infinite duplication-deletion random graph without Markovian dynamics generated by the duplication-deletion Procedure 4.5 satisfies a power law, as stated in Theorem 4.3.2; see Example 4.6.1.

2. The degree distribution of the fixed size duplication-deletion random graph generated by the duplication-deletion Procedure 4.5 can be computed from Theorem 4.3.1.
When $N_0$ (the size of the random graph) is sufficiently large, numerical results show that the degree distribution satisfies a power law as well; see Example 4.6.2.

3. The estimates obtained by the stochastic approximation algorithm (4.14) follow the expected probability distribution precisely, without information about the Markovian dynamics; see Example 4.6.3.

4. The larger the trace of the asymptotic covariance of the scaled tracking error, the greater the average degree of nodes and the searchability of the graph. This is illustrated in Example 4.6.4 below.

Example 4.6.1. Consider an infinite duplication-deletion random graph without Markovian dynamics (so $M = 1$) generated by Procedure 4.5 with $p = 0.5$ and $q = 0.1$. Theorem 4.3.2 implies that the degree sequence of the resulting graph satisfies a power law with exponent computed using (4.11). Fig. 4.2 displays the un-normalized degree distribution on a logarithmic scale. The linearity in Fig. 4.2 (excluding the nodes with very small degree) implies that the graph resulting from the duplication-deletion process satisfies a power law. As can be seen in Fig. 4.2, the power law is a better approximation for the middle points than for both ends.

Example 4.6.2. Consider the fixed size duplication-deletion random graph generated by Procedure 4.5 with $r = 0$, $N_0 = 10$, $p = 0.4$, and $q = 0.1$. We consider $M = 1$ (no Markovian dynamics) to illustrate Theorem 4.3.1. Fig. 4.3 depicts the normalized degree distribution of the fixed size duplication-deletion random graph obtained by Theorem 4.3.1. As can be seen in Fig. 4.3, the computed degree distribution is close to that obtained by simulation. The numerical results show that the degree distribution of the fixed size random graph also satisfies a power law for some values of $p$ when the size of the random graph is sufficiently large. Fig.
4.4 shows the number of nodes with a specific degree for the fixed size random graph obtained by Procedure 4.5 with $r = 0$, $N_0 = 1000$, $p = 0.4$, and $q = 0.1$, on a logarithmic scale for both the horizontal and vertical axes.

Example 4.6.3. Consider the fixed size Markov-modulated duplication-deletion random graph generated by Procedure 4.5 with $r = 0$ and $N_0 = 500$. Assume that the underlying Markov chain has three states, $M = 3$. We choose the following values for the probabilities of connection and deletion: state (1): $p = q = 0.05$; state (2): $p = 0.2$ and $q = 0.1$; state (3): $p = 0.4$ and $q = 0.15$. The sample path of the Markov chain jumps at time $n = 3000$ from state (1) to state (2) and at $n = 6000$ from state (2) to state (3). As the state of the Markov chain evolves, the expected degree distribution $g(\theta)$, obtained by (4.9), evolves over time. The corresponding values of the expected degree distribution for nodes of degree $i = 3$ are displayed in Fig. 4.5 using a dotted line. The estimated probability mass function $\hat g_n$, obtained by the stochastic approximation algorithm (4.14), is plotted in Fig. 4.5 using a solid line. The figure shows that the estimates obtained by the stochastic approximation algorithm (4.14) follow the expected degree distribution (4.9) satisfactorily, even though the algorithm has no information about the underlying Markovian dynamics.

Example 4.6.4. Consider the fixed size Markov-modulated duplication-deletion random graph obtained by Procedure 4.5 with $M = 91$, $r = 0$, and $N_0 = 1000$. For each value of $p(\theta) = 0.04 + \theta \times 0.01$, $\theta \in \{1,2,\ldots,91\}$, and $q \in \{0.05, 0.1, 0.15, 0.2\}$, we compute $L(\theta)$ from (4.7) and, consequently, the stationary distribution $g(\theta)$ from (4.9). As expected, the stationary distribution does not depend on $q$, because only the deletion step in Procedure 4.5 occurs with probability $q$. From $g(\theta)$, we compute the average degree of nodes, $\delta$. Fig. 4.6 shows the average degree of nodes versus the probability of connection in Procedure 4.5.
As can be seen in Fig. 4.6, the average degree of nodes in the graph (which is a measure of the connectivity of the graph; see [41]) increases with the probability of connection in Procedure 4.5. Then, for each value of $p(\theta) = 0.04 + \theta \times 0.01$, $\theta \in \{1,2,\ldots,91\}$, and $q \in \{0.05, 0.1, 0.15, 0.2\}$, the covariance matrix is computed using (4.6). Fig. 4.7 depicts the trace of the covariance matrix, $\mathrm{trace}(\Sigma(\theta))$, for each value of $p$ and $q$, versus the corresponding average degree of nodes (for each value of $p$). As can be seen in Fig. 4.7, the trace of the covariance matrix is larger when the average degree of nodes is higher (the graph is highly connected).

Recall from Lemma 4.3.1 that the order of delay in the searching problem can be computed by $\lambda(N_0) = O\big(N_0^{\,\delta/(d_2-\delta)}\big)$. Knowing the degree distribution $g(\theta)$, $\delta$ and $d_2$ can be computed for each value of $p \in \{0.05, 0.06, \ldots, 0.95\}$. Fig. 4.8 shows the trace of the covariance matrix versus $\big(\frac{\delta}{d_2-\delta}\big)$, as a measure of the searchability, for each value of $q \in \{0.05, 0.1, 0.15, 0.2\}$. As can be seen in Fig. 4.8, the trace of the covariance matrix is larger when the order of delay in the search problem in (4.10) is smaller (see footnote 42).

Footnote 42: This means that the target node can be found in the search problem with a smaller number of steps.

Figure 4.2: Degree distribution of the duplication-deletion random graph satisfies a power law (log-log plot of the number of nodes versus degree, compared with a line of slope $-\beta = -2.68$ obtained by Eq. (2.3)). The parameters are specified in Example 4.6.1 of Section 4.6.

Figure 4.3: Degree distribution of the fixed size duplication-deletion random graph (degree distribution obtained by simulation versus that obtained by Theorem 2.1.1). The parameters are specified in Example 4.6.2 of Section 4.6.

Figure 4.4: Degree distribution of the fixed size duplication-deletion random graph satisfies a power law when $N_0$ is sufficiently large (log-log plot, compared with a line of slope $-2.3$). The parameters are specified in Example 4.6.2 of Section 4.6.

Figure 4.5: The estimates obtained by the SA algorithm (4.14) follow the expected PMF precisely with no knowledge of the Markovian dynamics (probability of a node having degree 3 versus iteration; stochastic approximation estimate versus the expected degree distribution obtained by Eq. (2.9)). The parameters are specified in Example 4.6.3.

Figure 4.6: The average degree of nodes (as a measure of connectivity) of the fixed size Markov-modulated duplication-deletion random graph obtained by Procedure 4.5 for different values of the probability of connection, $p$. The parameters are specified in Example 4.6.4 of Section 4.6.

Figure 4.7: Trace of the covariance matrix of the scaled tracking error, $\mathrm{trace}(\Sigma(\theta))$, versus the average degree of nodes as a measure of connectivity of the network. The parameters are specified in Example 4.6.4 of Section 4.6.

Figure 4.8: Trace of the covariance matrix of the scaled tracking error, $\mathrm{trace}(\Sigma(\theta))$, versus the order of delay in the searching problem as a measure of searchability of the network. The parameters are specified in Example 4.6.4 of Section 4.6.

4.7 Closing Remarks

Markov-modulated duplication-deletion random graphs were analyzed in terms of their degree distribution. When the size of the graph is fixed ($r = 0$) and $\rho$ is small, the expected degree distribution of the Markov-modulated duplication-deletion random graph can be computed from (4.6) for each state of the underlying Markov chain. This result allows us to express the structure of the network (degree distribution) in terms of the dynamics of the model. We also showed that the infinite duplication-deletion random graph without Markovian dynamics generated according to Procedure 4.5 ($r = 1$, $M = 1$) satisfies a power law with exponent computed from (4.11). The importance of this result is that a single parameter (the power law exponent) characterizes the structure of a possibly very large dynamic network.

Also, a stochastic approximation algorithm was presented to adaptively estimate the degree distribution of random graphs. The stochastic approximation algorithm (4.14) does not assume knowledge of the Markov-modulated dynamics of the graph. Theorem 4.4.1 showed that the tracking error of the stochastic approximation algorithm is small, of order $O(\varepsilon)$. As a result of this bound, we showed that the scaled tracking error converges weakly to a diffusion process. Motivated by the analysis of social networks, we presented a Hilbert-space-valued stochastic approximation algorithm to estimate the expected degree distribution of the infinite duplication-deletion random graph without Markovian dynamics. The asymptotic behaviour of such an algorithm was analyzed in terms of the power law degree distribution. Finally, we extended the analysis to a Hilbert-space-valued stochastic approximation algorithm that aims to track a Markov-modulated probability mass function with denumerable support.
Using weak convergence methods, it was shown that the estimates obtained via such an algorithm converge weakly to the solution of an ordinary differential equation. It was also shown that the interpolated sequence of scaled tracking errors converges weakly to the solution of a stochastic differential equation.

4.8 Proof of Results

4.8.1 Proof of Theorem 4.3.1

The proof is based on the proof of Lemma 4.1 in [41, Chapter 4, p. 79]. To compute the expected degree distribution of the Markov-modulated random graph, we find a relation between the number of nodes with a specific degree at time $n$ and the degree distribution of the graph at time $n-1$. Recall that the $i$-th element of $f_n$, denoted $f_n^i$, is the number of vertices with degree $i$ at time $n$. Given the resulting graph at time $n$, the aim is to find the expected number of nodes with degree $i+1$ at time $n+1$. The following events can occur that result in a node with degree $i+1$ at time $n+1$:

• The degree of a node with degree $i$ increments by one in the duplication step (Step 1 of the duplication-deletion Procedure 4.5) and remains unchanged in the deletion step (Step 2):

– A node with degree $i$ is chosen in the duplication step as a parent node and remains unchanged in the deletion step. The probability of occurrence of such an event is
$$r\left(1 - \frac{q(i+1) + q(1+pi) - q(1+pi)(i+1)/N_n}{N_n}\right)\frac{f_n^i}{N_n};$$
the probability of choosing a node with degree $i$ is $\frac{f_n^i}{N_n}$, and the probability of the event that this node remains unchanged in the deletion step is (see footnote 43)
$$1 - \frac{q(i+1) + q(1+pi) - q(1+pi)(i+1)/N_n}{N_n}.$$

– One neighbor of a node with degree $i$ is selected as a parent node; the parent node connects to its neighbors (including the node with degree $i$) with probability $p$ in the edge-duplication part of Step 1.
The probability of such an event is
$$r\,\frac{f_n^i\,pi}{N_n}\left(1 - \frac{q(i+2) + q(1+p(i+1)) - q(1+p(i+1))(i+2)/N_n}{N_n}\right).$$
Note that the node whose degree is incremented by one in this event should remain unaffected in Step 2; the probability of such a node remaining unchanged in Step 2 is
$$1 - \frac{q(i+2) + q(1+p(i+1)) - q(1+p(i+1))(i+2)/N_n}{N_n}.$$

• A node with degree $i+1$ remains unchanged in both Step 1 and Step 2 of Procedure 4.5:

– Using the same argument as above, the probability of such an event is
$$f_n^{i+1}\left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\left(1 - r\,\frac{p(i+1)+1}{N_n}\right).$$

• A new node with degree $i+1$ is generated in Step 1:

– The degree of the most recently generated node (in the vertex-duplication part of Step 1) increments to $i+1$; the new node connects to $i$ neighbors of the parent node and remains unchanged in Step 2. The probability of this scenario is
$$r\left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\sum_{j\ge i}\frac{f_n^j}{N_n}\binom{j}{i}p^i(1-p)^{j-i}.$$

• The degree of a node with degree $i+2$ decrements by one in Step 2:

– A node with degree $i+2$ remains unchanged in the duplication step and one of its neighbors is eliminated in the deletion step. The probability of this event is
$$q\left(\frac{i+2}{N_n}\right)\left(1 - \frac{p(i+2)+1}{N_n}\right).$$

• A node with degree $i+1$ is generated in Step 2:

– The degree of the node generated in the vertex-duplication part of the duplication step within Step 2 increments to $i+1$. The probability of this event is
$$q\sum_{j\ge i}\frac{1}{N_n}f_n^j\binom{j}{i}p^i(1-p)^{j-i}.$$

• The degree of a node with degree $i$ increments by one in Step 2:

– A node with degree $i$ remains unchanged in Step 1 and its degree increments by one in the duplication part of Step 2. The corresponding probability is
$$\frac{q(1+pi)}{N_n}\left(1 - \frac{1+pi}{N_n}\right).$$

Footnote 43: The deletion step (Step 2 of Procedure 4.5) comprises an edge-deletion step and a duplication step. The probability that the degree of a node with degree $i$ changes in the edge-deletion step is $\frac{q(i+1)}{N_n}$; either this node or one of its neighbors should be selected in the edge-deletion step. Also, given that the degree of this node does not change in the edge-deletion step, if either this node or one of its neighbors is selected in the duplication step (within Step 2), then the degree of this node increments by one with probability $\frac{1+pi}{N_n}$. Therefore, the probability that the degree of a node of degree $i$ remains unchanged in Step 2 is
$$1 - \frac{q(i+1) + q(1+pi) - q(1+pi)(i+1)/N_n}{N_n}.$$
Note that, for simplicity in our analysis, it is assumed that the nodes whose degrees change in the edge-deletion part of Step 2 remain unchanged in the duplication part of Step 2 at that time instant. Also, the new node, which is generated in the vertex-duplication part of Step 1, remains unchanged in Step 2.

Let $\Omega$ denote the set of all arbitrary graphs and let $\mathcal{F}_n$ denote the sigma-algebra generated by the graphs $G_\tau$, $\tau \le n$. Considering the above events that result in a node with degree $i+1$ at time $n+1$, the following recurrence formula can be derived for the conditional expectation of $f_{n+1}^{i+1}$:
$$\begin{aligned}
\mathbb{E}\{f_{n+1}^{i+1} \mid \mathcal{F}_n\} ={}& \left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\left(1 - r\,\frac{p(i+1)+1}{N_n}\right)f_n^{i+1}\\
&+ r\left(1 - \frac{q(i+1)+q(1+pi) - q(1+pi)(i+1)/N_n}{N_n}\right)\left(\frac{1+pi}{N_n}\right)f_n^i\\
&+ r\left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\sum_{j\ge i}\frac{f_n^j}{N_n}\binom{j}{i}p^i(1-p)^{j-i}\\
&+ q\sum_{j\ge i}\frac{f_n^j}{N_n}\binom{j}{i}p^i(1-p)^{j-i} + q\left(\frac{i+2}{N_n}\right)\left(1 - \frac{p(i+2)+1}{N_n}\right)f_n^{i+2}\\
&+ \frac{q(1+pi)}{N_n}\left(1 - \frac{1+pi}{N_n}\right)f_n^i.
\end{aligned}\tag{4.31}$$

Let $\bar f_n = \mathbb{E}\{f_n\}$. Taking expectations of both sides of (4.31) with respect to the trivial sigma-algebra $\{\Omega, \emptyset\}$, the smoothing property of conditional expectations yields
$$\begin{aligned}
\bar f_{n+1}^{i+1} ={}& \left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\left(1 - r\,\frac{p(i+1)+1}{N_n}\right)\bar f_n^{i+1}\\
&+ r\left(1 - \frac{q(i+1)+q(1+pi) - q(1+pi)(i+1)/N_n}{N_n}\right)\left(\frac{1+pi}{N_n}\right)\bar f_n^i\\
&+ r\left(1 - q\,\frac{i+3+p(i+1) - (1+p(i+1))(i+2)/N_n}{N_n}\right)\sum_{j\ge i}\frac{\bar f_n^j}{N_n}\binom{j}{i}p^i(1-p)^{j-i}\\
&+ q\sum_{j\ge i}\frac{\bar f_n^j}{N_n}\binom{j}{i}p^i(1-p)^{j-i} + q\left(\frac{i+2}{N_n}\right)\left(1 - \frac{p(i+2)+1}{N_n}\right)\bar f_n^{i+2}\\
&+ \frac{q(1+pi)}{N_n}\left(1 - \frac{1+pi}{N_n}\right)\bar f_n^i.
\end{aligned}\tag{4.32}$$

Assuming that the size of the graph is sufficiently large, each term of order $\bar f_n^i/N_n^2$ can be neglected, and (4.32) can be written as
$$\bar f_{n+1}^{i+1} = \left(1 - \frac{q(i+2) + (r+q)(p(i+1)+1)}{N_n}\right)\bar f_n^{i+1} + \left(\frac{(1+pi)(r+q)}{N_n}\right)\bar f_n^i + q\left(\frac{i+2}{N_n}\right)\bar f_n^{i+2} + (r+q)\sum_{j\ge i}\frac{1}{N_n}\bar f_n^j\binom{j}{i}p^i(1-p)^{j-i}. \tag{4.33}$$

Using (4.32), we can write the following recursion for the $(i+1)$-th element of $\bar g_{n+1}$:
$$\bar g_{n+1}^{i+1} = \left(\frac{N_n - \big(q(i+2)+(r+q)(p(i+1)+1)\big)}{N_{n+1}}\right)\bar g_n^{i+1} + \left(\frac{(1+pi)(r+q)}{N_{n+1}}\right)\bar g_n^i + q\left(\frac{i+2}{N_{n+1}}\right)\bar g_n^{i+2} + (r+q)\sum_{j\ge i}\frac{1}{N_{n+1}}\bar g_n^j\binom{j}{i}p^i(1-p)^{j-i}. \tag{4.34}$$

Since the probability of the duplication step is $r = 0$, the number of vertices does not increase. Thus $N_n = N_0$, and (4.34) can be written as
$$\bar g_{n+1}^{i+1} = \left(1 - \frac{1}{N_0}\big(q(i+2) + q(p(i+1)+1)\big)\right)\bar g_n^{i+1} + \frac{1}{N_0}\Big((1+pi)q\,\bar g_n^i + q(i+2)\,\bar g_n^{i+2}\Big) + \frac{1}{N_0}q\sum_{j\ge i}\bar g_n^j\binom{j}{i}p^i(1-p)^{j-i}. \tag{4.35}$$

It is clear from (4.35) that the vector $\bar g_{n+1}$ depends on the elements of $\bar g_n$. In matrix notation, (4.35) can be expressed as
$$\bar g_{n+1} = \left(I + \frac{1}{N_0}L'\right)\bar g_n, \tag{4.36}$$
where $L$ is defined in (4.7).

To prove that $L$ is a generator, we need to show that $l_{ii} < 0$ and $\sum_{i=1}^{N_0} l_{ki} = 0$. Accordingly,
$$\sum_{i=1}^{N_0} l_{ki} = -\big(q(k+1) + q(1+pk)\big) + (1+pk)q + qk + q\sum_{i-1\le k}\binom{k}{i-1}p^{i-1}(1-p)^{k-i+1} = -q + q\sum_{i-1\le k}\binom{k}{i-1}p^{i-1}(1-p)^{k-i+1}. \tag{4.37}$$

Let $m = i-1$. Then (4.37) can be rewritten as
$$\sum_{i=1}^{N_0} l_{ki} = -q + q\sum_{m=0}^{k}\binom{k}{m}p^m(1-p)^{k-m} = -q + q(1-p)^k\sum_{m=0}^{k}\binom{k}{m}\left(\frac{p}{1-p}\right)^m. \tag{4.38}$$

Knowing that $\sum_{m=0}^{k}\binom{k}{m}a^m = (1+a)^k$, (4.38) can be written as
$$\sum_{i=1}^{N_0} l_{ki} = -q + q(1-p)^k\left(\frac{1}{1-p}\right)^k = 0. \tag{4.39}$$

Also, it can be shown that $l_{ii} < 0$: since $p^{i-1} \le 1$, we have $ip^{i-1}(1-p) < i+2+ip$, and consequently $iqp^{i-1}(1-p) - q(i+2+ip) < 0$. Therefore $l_{ii} < 0$ and the desired result follows.

4.8.2 Proof of Theorem 4.3.2

Proof. To prove Theorem 4.3.2, we first compute the power law exponent $\beta$, and then we prove that the expected degree distribution converges to the power law distribution with exponent $\beta$. Let $\bar f_n(i) = \mathbb{E}\{f_n(i)\}$. Similar to (4.31), $\bar f_{n+1}(i+1)$ can be written as
$$\begin{aligned}
\bar f_{n+1}(i+1) ={}& \left(1 - q\,\frac{(i+2) + (1+p(i+1))}{N_n}\right)\left(1 - \frac{p(i+1)+1}{N_n}\right)\bar f_n(i+1) + \left(1 - q\,\frac{(i+1) + (1+pi)}{N_n}\right)\left(\frac{1+pi}{N_n}\right)\bar f_n(i)\\
&+ \left(1 - q\,\frac{(i+2) + (1+p(i+1))}{N_n}\right)\sum_{j\ge i}\frac{1}{N_n}\bar f_n(j)\binom{j}{i}p^i(1-p)^{j-i} + q\sum_{j\ge i}\frac{1}{N_n}\bar f_n(j)\binom{j}{i}p^i(1-p)^{j-i}\\
&+ q\left(\frac{i+2}{N_n}\right)\left(1 - \frac{p(i+2)+1}{N_n}\right)\bar f_n(i+2) + \frac{q(1+pi)}{N_n}\left(1 - \frac{1+pi}{N_n}\right)\bar f_n(i) + q\left(\frac{i+2}{N_n}\right)\left(\frac{p(i+1)+1}{N_n}\right)\bar f_n(i+1).
\end{aligned}\tag{4.40}$$

To compute the power law exponent, we can heuristically assume that $\bar f_n(i) = a_i n$ as $N_n = n$ goes to infinity (we will prove this precisely later in this section).
Therefore, each term of order $\bar f_n(i')/N_n^2$ can be neglected as $n$ approaches infinity, so (4.40) can be rewritten as
$$\bar f_{n+1}(i+1) = \left(1 - \frac{q(i+2) + (1+q)(p(i+1)+1)}{N_n}\right)\bar f_n(i+1) + \left(\frac{(1+pi)(1+q)}{N_n}\right)\bar f_n(i) + q\left(\frac{i+2}{N_n}\right)\bar f_n(i+2) + (1+q)\sum_{j\ge i}\frac{1}{N_n}\bar f_n(j)\binom{j}{i}p^i(1-p)^{j-i}. \tag{4.41}$$

Substituting $\bar f_\tau(j) = a_j\tau$ and $N_n = n$ in (4.41) yields
$$a_{i+1}(n+1) = a_{i+1}n - a_{i+1}\big((1+p(i+1))(1+q) + q(i+2)\big) + (1+q)(1+pi)a_i + q(i+2)a_{i+2} + (1+q)\sum_{j\ge i}a_j\binom{j}{i}p^i(1-p)^{j-i}. \tag{4.42}$$

Taking all terms containing $a_{i+1}$ to the left-hand side, we have
$$a_{i+1}\big(1 + (1+q)(1+p(i+1)) + q(i+2)\big) = (1+q)\left((1+pi)a_i + \sum_{j\ge i}a_j\binom{j}{i}p^i(1-p)^{j-i}\right) + q(i+2)a_{i+2}. \tag{4.43}$$

Dividing both sides of (4.43) by $a_i$ yields
$$\frac{a_{i+1}}{a_i}\big(1 + (1+q)(1+p(i+1)) + q(i+2)\big) = (1+q)\left(1+pi + \sum_{j\ge i}\frac{a_j}{a_i}\binom{j}{i}p^i(1-p)^{j-i}\right) + q(i+2)\frac{a_{i+2}}{a_i}. \tag{4.44}$$

Solving Equation (4.43) for $a_i$ completes the proof of Theorem 4.3.2. The following lemma, whose proof can be found in [42], is used to solve the recurrence relation for $a_i$.

Lemma 4.8.1.
$$\sum_{j\ge i}\frac{a_j}{a_i}\binom{j}{i}p^i(1-p)^{j-i} = p^{\beta-1} + O\!\left(\frac{1}{i}\right). \tag{4.45}$$

Proof. The proof is presented in Appendix 4.8.3.

To solve (4.43) for $a_i$, we can further assume that $a_i = Ci^{-\beta}$ [41]. Therefore, $\frac{a_{i+\alpha}}{a_i} = \left(\frac{i+\alpha}{i}\right)^{-\beta} = 1 - \frac{\alpha\beta}{i} + O\!\left(\frac{1}{i^2}\right)$, and substituting this together with Lemma 4.8.1 into (4.44) gives
$$\left(1 - \frac{\beta}{i}\right)\big(1 + (1+q)(1+p(i+1)) + q(i+2)\big) = (1+q)\left(1 + pi + p^{\beta-1}\right) + O\!\left(\frac{1}{i}\right) + q(i+2)\left(1 - \frac{2\beta}{i}\right). \tag{4.46}$$

Neglecting the $O\!\left(\frac{1}{i}\right)$ terms yields
$$(1+q)\left(p^{\beta-1} + p\beta - p\right) = 1 + \beta q. \tag{4.47}$$

Note that the proof presented above depends on a few assumptions. To give a rigorous proof, the following steps should be carried out, as described in [41]:

• First, we need to show that the limit $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}\{f_n(i)\}$ exists.

• Let $a_i$ be the solution of (4.43) such that $\sum_{i=1}^{\infty}a_i = 1$ and $a_0 = 0$; then we need to show that
$$\lim_{n\to\infty}\frac{1}{n}\mathbb{E}\{f_n(i)\} = a_i. \tag{4.48}$$

• Finally, we should show that $a_i$ is proportional to $i^{-\beta}$, where $\beta$ is the root of (4.47).

To complete the proof, we define a new function $h_n(i) = \frac{1}{n}\sum_{k=1}^{i}\mathbb{E}\{f_n(k)\}$, which can be interpreted as the CDF of the degree of a node in the random graph. It is sufficient to show that, for all $i > 0$,
$$\lim_{n\to\infty}h_n(i) = \sum_{k=1}^{i}a_k, \tag{4.49}$$
where $a_i$ is the solution of (4.43). Clearly, if (4.49) holds, then $h_n(i) - h_n(i-1) \to a_i$, and thus $\lim_{n\to\infty}\frac{1}{n}\mathbb{E}\{f_n(i)\} = a_i$, as stated in (4.48). The following lemma gives a recurrence formula for $h_{n+1}(i)$.

Lemma 4.8.2.
$$h_{n+1}(i) = D_{n+1}(i)h_n(i) + B_{n+1}(i)h_n(i-1) + C_{n+1}(i)h_n(i+1) + \frac{1+q}{n+1}\sum_{j\ge i-1}h_n(j)F(j,i-1,p), \tag{4.50}$$
where
$$D_{n+1}(i) = \frac{n - \big(q(i+2) + (1+q)(pi+1)\big)}{n+1}, \qquad B_{n+1}(i) = \frac{(1+q)(1+pi)}{n+1}, \qquad C_{n+1}(i) = \frac{q(i+1)}{n+1},$$
$$F(j,i,p) = \sum_{m=0}^{i}\binom{j}{m}p^m(1-p)^{j-m} - \sum_{m=0}^{i}\binom{j+1}{m}p^m(1-p)^{j+1-m}.$$

This lemma can be proved by induction; the complete proof can be found in Appendix 4.8.4. The recursive equation presented in Lemma 4.8.2 is used below to prove that the degree distribution converges to a power law.

Lemma 4.8.3. Let $s_i = \sum_{k=1}^{i}a_k$ and
$$\omega(n) = \sup_{i\ge 1}\frac{h_n(i)}{s_i}, \tag{4.51}$$
It is obvious if (4.49) holds, hn(i)−hn(i−1) = ai and thuslimn→∞1nE{ fn(i)} = ai(as presented in (4.48)). The following lemma gives a recurrence formula to compute the value ofh(n+1, i).Lemma 4.8.2.hn+1(i) = Dn+1(i)hn(i)+Bn+1(i)hn(i−1)+Cn+1(i)hn(i+1)+ 1+qn+1 ∑j≥i−1hn( j)F( j, i−1, p),(4.50)whereDn+1(i) =⎛⎝n−(q(i+2)+ (1+q)(pi+1))n+1⎞⎠ ,Bn+1(i) =(1+q)(1+ pi)n+1,Cn+1(i) =q(i+1)n+1,F( j, i, p) =i∑k=0(jk)pk(1− p) j−k−i∑k=0(j+1k)pk(1− p) j+1−k.This lemma can be proved by induction. The complete proof can be found in Appendix 4.8.4.The recursive equation presented in Lemma 4.8.2 is used later to prove that the degree distributionconverges to a power law.Lemma 4.8.3. Let si = ∑ik=1 ai andω(n) = supi≥1hn(i)si, (4.51)1114.8. Proof of Resultswhere hn(i) satisfies (4.50). Then the limit limn→∞ω(n) exists and we have limn→∞ω(n) = 1.Sketch of the proof Knowing that hn(i) satisfies the recurrence formula (4.50), the proof is similarto [41]. Plugging i= n in (4.51) yields ω(n) ≥ hn(n)sn ≥ 1sn ≥ 1. Using the Lemma 4.8.2 and similarto [41], it can be shown that ω(n+ 1) ≤ ω(n). ω(n) is bounded and decreasing, so the limit oflimn→∞ω(n) exists. To show limn→∞ω(n) = 1, we assume that limn→∞ω(n) = c. It can be shownthat if c ̸= 1, ω(n)≤ 1 is violated. Thus c= 1 and the proof is complete.4.8.3 Proof of Lemma 4.8.1Proof.∑j≥ia jai(ji)pi(1− p) j−i =∑j≥i(ij)β(ji)pi(1− p) j−i=∑j≥i(ij)β(jj− i)pi(1− p) j−i=(1+O(1i))∑j≥i(j−βj− i)pi(1− p) j−i=(1+O(1i))pi∑k=0(k+ i−βk)(1− p)k=(1+O(1i))pi∑k=0(β − i−1k)(−1)k(1− p)k=(1+O(1i))pipβ−i−1 =(1+O(1i))pβ−1. (4.52)4.8.4 Proof of Lemma 4.8.2Proof. We prove the lemma by induction on i:For i= 1 It is sufficient to show that:h(n+1,1) =Dn+1(1)h(n,1)+Cn+1(1)h(n,2)+1n+1 ∑ j≥1 h(n, j)F( j,0, p). Also using the definitionof F( j, i, p), we can rewrite F( j,0, p) as (1− p) j− (1− p) j+1. The number of nodes with degree1124.8. Proof of Resultsone at time n+1 can be written as followingE{ f (n+1,1)} =(1− (1+q)(1+ p)+qn)E{ fn(1)}+ 2qnE{ fn(2)}+(1+q)∑j≥11nE{ fn( j)}(1− p) j. 
(4.53)

Note that (4.53) is slightly different from the general equation (4.58) for arbitrary i. As described in Section 4.1, neighbors of a node with degree one cannot be eliminated from the graph, in order to maintain connectivity. Therefore, a node with degree one can only be affected by the deletion step if that node itself is selected for deletion (with probability q). Using (4.53), h(n+1,1) can be written as
\[
h(n+1,1) = \frac{1}{n+1}E\{f(n+1,1)\}
= \frac{1}{n+1}\Big(\Big(1-\frac{(1+q)(1+p)+q}{n}\Big)E\{f_n(1)\} + \frac{2q}{n}E\{f_n(2)\}\Big) + \frac{1}{n+1}\sum_{j\ge 1}\frac{1+q}{n}E\{f_n(j)\}(1-p)^j. \tag{4.54}
\]

We know that h(n,0) = 0 for all n. Using the definition of h(·,·) and (4.53), (4.54) can be rearranged as
\[
h(n+1,1) = \frac{1}{n+1}\Big(\big(n-((1+q)(1+p)+q)\big)h(n,1) + 2q\big(h(n,2)-h(n,1)\big) + (1+q)\sum_{j\ge 1}\big(h(n,j)-h(n,j-1)\big)(1-p)^j\Big)
\]
\[
= \frac{1}{n+1}\Big(\big(n-(3q+(1+q)(1+p))\big)h(n,1) + 2q\,h(n,2)\Big) + \frac{1+q}{n+1}\sum_{j\ge 1}\big(h(n,j)-h(n,j-1)\big)(1-p)^j. \tag{4.55}
\]

The sum \sum_{j\ge 1}(h(n,j)-h(n,j-1))(1-p)^j can be written in terms of F(j,i,p):
\[
\sum_{j\ge 1}\big(h(n,j)-h(n,j-1)\big)(1-p)^j = \sum_{j\ge 1}h(n,j)(1-p)^j - \sum_{j\ge 1}h(n,j-1)(1-p)^j
\]
\[
= \sum_{j\ge 1}h(n,j)(1-p)^j - \sum_{j\ge 1}h(n,j)(1-p)^{j+1}
= \sum_{j\ge 1}h(n,j)\big((1-p)^j-(1-p)^{j+1}\big) = \sum_{j\ge 1}h(n,j)F(j,0,p). \tag{4.56}
\]

Substituting (4.56) in (4.55) yields
\[
h(n+1,1) = \frac{1}{n+1}\Big(\big(n-((1+q)(1+p)+3q)\big)h(n,1) + 2q\,h(n,2) + (1+q)\sum_{j\ge 1}h(n,j)F(j,0,p)\Big)
\]
\[
= D_{n+1}(1)h(n,1) + C_{n+1}(1)h(n,2) + \frac{1+q}{n+1}\sum_{j\ge 1}h(n,j)F(j,0,p). \tag{4.57}
\]

Thus (4.50) holds for i = 1. Now assume that (4.50) holds for i = k; we show that it also holds for i = k+1. We have
\[
E\{f(n+1,k+1)\} = \Big(1-\frac{q(k+2)+(1+q)(p(k+1)+1)}{n}\Big)E\{f(n,k+1)\} + \frac{(1+q)(1+pk)}{n}E\{f_n(k)\}
\]
\[
+ \frac{q(k+2)}{n}E\{f_n(k+2)\} + (1+q)\sum_{j\ge k}\frac{E\{f_n(j)\}}{n}\binom{j}{k}p^k(1-p)^{j-k}. \tag{4.58}
\]

From the definition of h(n,k), we have E\{f_n(k)\} = n(h(n,k)-h(n,k-1)). Eq. (4.58) can then be rewritten as
\[
E\{f(n+1,k+1)\} = \big(n-(q(k+2)+(1+q)(p(k+1)+1))\big)\big(h(n,k+1)-h(n,k)\big) + (1+q)(1+pk)\big(h(n,k)-h(n,k-1)\big)
\]
\[
+ q(k+2)\big(h(n,k+2)-h(n,k+1)\big) + (1+q)\sum_{j\ge k}\big(h(n,j)-h(n,j-1)\big)\binom{j}{k}p^k(1-p)^{j-k}.
\]
(4.59)

Using the Abel summation identity, and knowing that
\[
F(j,k,p) = \sum_{m=0}^{k}\binom{j}{m}p^m(1-p)^{j-m} - \sum_{m=0}^{k}\binom{j+1}{m}p^m(1-p)^{j+1-m},
\]
the last term can be written as
\[
\sum_{j\ge k}\big(h(n,j)-h(n,j-1)\big)\binom{j}{k}p^k(1-p)^{j-k}
= \sum_{j\ge k}h(n,j)\Big(\binom{j}{k}p^k(1-p)^{j-k}-\binom{j+1}{k}p^k(1-p)^{j+1-k}\Big) - p^k h(n,k-1) \tag{4.60}
\]
\[
= -p^k h(n,k-1) + \sum_{j\ge k}h(n,j)\big(F(j,k,p)-F(j,k-1,p)\big). \tag{4.61}
\]

Substituting (4.60) in (4.59) yields
\[
E\{f(n+1,k+1)\} = q(k+2)\,h(n,k+2) + \big(n-(2q(k+2)+(1+q)(p(k+1)+1))\big)h(n,k+1)
\]
\[
+ \big((1+q)(2+p(2k+1))+q(k+2)-n\big)h(n,k) - (1+q)\big(1+pk+p^k\big)h(n,k-1) + (1+q)\sum_{j\ge k}h(n,j)\big(F(j,k,p)-F(j,k-1,p)\big). \tag{4.62}
\]

The value of h(n+1,k+1) can be computed from h(n+1,k) and E\{f(n+1,k+1)\} as
\[
h(n+1,k+1) = h(n+1,k) + \frac{1}{n+1}E\{f(n+1,k+1)\}. \tag{4.63}
\]

Eq. (4.62) gives an expression for E\{f(n+1,k+1)\} in terms of the values of h(·,·) at time n. Substituting (4.62) in (4.63) gives a recursive equation for computing h(n+1,k+1):
\[
h(n+1,k+1) = D_{n+1}(k)h(n,k) + B_{n+1}(k)h(n,k-1) + C_{n+1}(k)h(n,k+1) + \frac{1+q}{n+1}\sum_{j\ge k-1}h(n,j)F(j,k-1,p)
\]
\[
+ \frac{1}{n+1}\Big(q(k+2)h(n,k+2) + \big(n-(2q(k+2)+(1+q)(p(k+1)+1))\big)h(n,k+1) + \big((1+q)(2+p(2k+1))+q(k+2)-n\big)h(n,k)
\]
\[
- (1+q)\big(1+pk+p^k\big)h(n,k-1) + (1+q)\sum_{j\ge k}h(n,j)\big(F(j,k,p)-F(j,k-1,p)\big)\Big). \tag{4.64}
\]

By the induction hypothesis, (4.50) holds for i = k; substituting the values of D_{n+1}(k), B_{n+1}(k), and C_{n+1}(k) from (4.50) in (4.64) and collecting terms yields
\[
h(n+1,k+1) = \frac{q(k+2)}{n+1}h(n,k+2) + \frac{n-\big(q(k+3)+(1+q)(p(k+1)+1)\big)}{n+1}h(n,k+1)
\]
\[
+ \frac{(1+q)(1+p(k+1))}{n+1}h(n,k) + \frac{1+q}{n+1}\sum_{j\ge k}h(n,j)F(j,k,p). \tag{4.65}
\]

Eq. (4.65) can be written as
\[
h(n+1,k+1) = D_{n+1}(k+1)h(n,k+1) + B_{n+1}(k+1)h(n,k) + C_{n+1}(k+1)h(n,k+2) + \frac{1+q}{n+1}\sum_{j\ge k}h(n,j)F(j,k,p). \tag{4.66}
\]

Thus (4.50) holds for i = k+1, and the proof is completed by induction.

4.8.5 Proof of Theorem 4.4.1

Proof. Define the Lyapunov function V(x) = (x'x)/2 for x \in R^{N_0}. Use E_n to denote the conditional expectation with respect to the \sigma-algebra \mathcal{H}_n generated by \{z_j(\theta_j), \theta_j,\; j\le n\}.
Then,
\[
E_n\{V(\tilde g_{n+1})-V(\tilde g_n)\} = E_n\big\{\tilde g_n'\big[-\varepsilon\tilde g_n+\varepsilon(z_n(\theta_n)-g(\theta_n))+g(\theta_n)-g(\theta_{n+1})\big]\big\} + E_n\big\{\|-\varepsilon\tilde g_n+\varepsilon(z_n(\theta_n)-g(\theta_n))+g(\theta_n)-g(\theta_{n+1})\|^2\big\} \tag{4.67}
\]
where z_n(\theta_n) and g(\theta_n) are vectors in R^{N_0} with elements z_n^i(\theta_n) and g^i(\theta_n), 1 \le i \le N_0, respectively. Due to the Markovian assumption and the structure of the transition matrix of \theta_n defined in (4.2),
\[
E_n\{g(\theta_n)-g(\theta_{n+1})\} = E\{g(\theta_n)-g(\theta_{n+1})\,|\,\theta_n\} = \sum_{i=1}^{M}E\{g(i)-g(\theta_{n+1})\,|\,\theta_n=i\}\,I\{\theta_n=i\}
\]
\[
= \sum_{i=1}^{M}\Big[g(i)-\sum_{j=1}^{M}g(j)A^{\rho}_{ij}\Big]I\{\theta_n=i\} = -\rho\sum_{i=1}^{M}\sum_{j=1}^{M}g(j)q_{ij}\,I\{\theta_n=i\} = O(\rho) \tag{4.68}
\]
where I\{\cdot\} denotes the indicator function. Similarly, it is easily seen that
\[
E_n\{\|g(\theta_n)-g(\theta_{n+1})\|^2\} = O(\rho). \tag{4.69}
\]

Using K to denote a generic positive constant (with the conventions KK = K and K + K = K), the familiar inequality ab \le \frac{a^2+b^2}{2} yields
\[
O(\varepsilon\rho) = O(\varepsilon^2+\rho^2). \tag{4.70}
\]

Moreover, we have \|\tilde g_n\| = \|\tilde g_n\|\cdot 1 \le (\|\tilde g_n\|^2+1)/2. Thus,
\[
O(\rho)\|\tilde g_n\| \le O(\rho)\big(V(\tilde g_n)+1\big). \tag{4.71}
\]

Then, detailed estimates lead to
\[
E_n\big\{\|-\varepsilon\tilde g_n+\varepsilon(z_n(\theta_n)-g(\theta_n))+g(\theta_n)-g(\theta_{n+1})\|^2\big\}
\le K E_n\big\{\varepsilon^2\|\tilde g_n\|^2 + \varepsilon^2\|z_n(\theta_n)-g(\theta_n)\|^2 + \varepsilon^2\|\tilde g_n'(z_n(\theta_n)-g(\theta_{n+1}))\|
\]
\[
+ \varepsilon\|\tilde g_n'(g(\theta_n)-g(\theta_{n+1}))\| + \varepsilon\|(z_n(\theta_n)-g(\theta_n))'(g(\theta_n)-g(\theta_{n+1}))\|\big\} + E_n\{\|g(\theta_n)-g(\theta_{n+1})\|\}^2. \tag{4.72}
\]

It follows that
\[
E_n\big\{\|-\varepsilon\tilde g_n+\varepsilon(z_n(\theta_n)-g(\theta_n))+g(\theta_n)-g(\theta_{n+1})\|^2\big\} = O(\varepsilon^2+\rho^2)\big(V(\tilde g_n)+1\big). \tag{4.73}
\]

Furthermore,
\[
E_n\{V(\tilde g_{n+1})-V(\tilde g_n)\} = -2\varepsilon V(\tilde g_n) + \varepsilon E_n\{\tilde g_n'[z_n(\theta_n)-g(\theta_n)]\} + E_n\{\tilde g_n'[g(\theta_n)-g(\theta_{n+1})]\} + O(\varepsilon^2+\rho^2)\big(V(\tilde g_n)+1\big). \tag{4.74}
\]

To obtain the desired bound, define V_1^{\rho} and V_2^{\rho} as follows:
\[
V_1^{\rho}(\tilde g,n) = \varepsilon\sum_{j=n}^{\infty}\tilde g' E_n\{z_j(\theta_j)-g(\theta_j)\},\qquad
V_2^{\rho}(\tilde g,n) = \sum_{j=n}^{\infty}\tilde g' E_n\{g(\theta_j)-g(\theta_{j+1})\}. \tag{4.75}
\]

It can be shown that
\[
|V_1^{\rho}(\tilde g,n)| = O(\varepsilon)\big(V(\tilde g)+1\big),\qquad |V_2^{\rho}(\tilde g,n)| = O(\rho)\big(V(\tilde g)+1\big). \tag{4.76}
\]

Define W(\tilde g,n) as
\[
W(\tilde g,n) = V(\tilde g) + V_1^{\rho}(\tilde g,n) + V_2^{\rho}(\tilde g,n). \tag{4.77}
\]

This leads to
\[
E_n\{W(\tilde g_{n+1},n+1)-W(\tilde g_n,n)\} = E_n\{V(\tilde g_{n+1})-V(\tilde g_n)\} + E_n\{V_1^{\rho}(\tilde g_{n+1},n+1)-V_1^{\rho}(\tilde g_n,n)\} + E_n\{V_2^{\rho}(\tilde g_{n+1},n+1)-V_2^{\rho}(\tilde g_n,n)\}. \tag{4.78}
\]

Moreover,
\[
E_n\{W(\tilde g_{n+1},n+1)-W(\tilde g_n,n)\} = -2\varepsilon V(\tilde g_n) + O(\varepsilon^2+\rho^2)\big(V(\tilde g_n)+1\big). \tag{4.79}
\]

Equation (4.79) can be rewritten as
\[
E_n\{W(\tilde g_{n+1},n+1)-W(\tilde g_n,n)\} \le O(\varepsilon^2+\rho^2)\big(W(\tilde g_n,n)+1\big) - 2\varepsilon W(\tilde g_n,n).
\]
(4.80)

If \varepsilon and \rho are chosen small enough, then there exists a small \lambda > 0 such that -2\varepsilon + O(\rho^2) + O(\varepsilon^2) \le -\lambda\varepsilon. Therefore, (4.80) can be rearranged as
\[
E_n\{W(\tilde g_{n+1},n+1)\} \le (1-\lambda\varepsilon)W(\tilde g_n,n) + O(\varepsilon^2+\rho^2). \tag{4.81}
\]

Taking expectations on both sides yields
\[
E\{W(\tilde g_{n+1},n+1)\} \le (1-\lambda\varepsilon)E\{W(\tilde g_n,n)\} + O(\varepsilon^2+\rho^2). \tag{4.82}
\]

Iterating on (4.82) then results in
\[
E\{W(\tilde g_{n+1},n+1)\} \le (1-\lambda\varepsilon)^{n-N_{\rho}}E\{W(\tilde g_{N_\rho},N_\rho)\} + \sum_{j=N_\rho}^{n}O(\varepsilon^2+\rho^2)(1-\lambda\varepsilon)^{j-N_\rho}. \tag{4.83}
\]

As a result,
\[
E\{W(\tilde g_{n+1},n+1)\} \le (1-\lambda\varepsilon)^{n-N_\rho}E\{W(\tilde g_{N_\rho},N_\rho)\} + O(\varepsilon+\rho^2/\varepsilon). \tag{4.84}
\]

If n is large enough, one can approximate (1-\lambda\varepsilon)^{n-N_\rho} = O(\varepsilon). Therefore,
\[
E\{W(\tilde g_{n+1},n+1)\} \le O(\varepsilon+\rho^2/\varepsilon). \tag{4.85}
\]

Finally, using (4.76) and replacing W(\tilde g_{n+1},n+1) with V(\tilde g_{n+1}), we obtain
\[
E\{V(\tilde g_{n+1})\} \le O\Big(\rho+\varepsilon+\frac{\rho^2}{\varepsilon}\Big). \tag{4.86}
\]

4.8.6 Sketch of the Proof of Theorem 4.4.2

Proof. Since the proof is similar to [154, Theorem 4.5], we only indicate the main steps in what follows and omit most of the verbatim details.

Step 1: First, we show that the two-component process (\hat g^{\varepsilon}(\cdot), \theta^{\varepsilon}(\cdot)) is tight in D([0,T] : R^{N_0}\times\mathcal{M}). Using techniques similar to [156, Theorem 4.3], it can be shown that \theta^{\varepsilon}(\cdot) converges weakly to a continuous-time Markov chain generated by Q. Thus, we mainly need to consider \hat g^{\varepsilon}(\cdot). We show that
\[
\lim_{\Delta\to 0}\limsup_{\varepsilon\to 0}E\Big[\sup_{0\le s\le\Delta}E_t^{\varepsilon}\|\hat g^{\varepsilon}(t+s)-\hat g^{\varepsilon}(t)\|^2\Big] = 0 \tag{4.87}
\]
where E_t^{\varepsilon} denotes conditioning on the past information up to t. Then, the tightness follows from the criterion of [97, p. 47].

Step 2: Since (\hat g^{\varepsilon}(\cdot), \theta^{\varepsilon}(\cdot)) is tight, we can extract a weakly convergent subsequence by the Prohorov theorem; see [99]. To characterize the limit, we show that (\hat g^{\varepsilon}(\cdot), \theta^{\varepsilon}(\cdot)) is a solution of the martingale problem with operator L_0. For each i \in \mathcal{M} and continuously differentiable function f(\cdot,i) with compact support, the operator is given by
\[
L_0 f(\hat g,i) = \nabla f'(\hat g,i)\big[-\hat g+g(i)\big] + \sum_{j\in\mathcal{M}}q_{ij}f(\hat g,j),\qquad i\in\mathcal{M}. \tag{4.88}
\]

We can further demonstrate that the martingale problem with operator L_0 has a unique solution (in the sense of distribution). Thus, the desired convergence property follows.

4.8.7 Sketch of the Proof of Theorem 4.4.3

Proof.
The proof comprises four steps, as described below.

Step 1: First, note that
\[
\nu_{n+1} = \nu_n - \varepsilon\nu_n + \sqrt{\varepsilon}\big(y_{n+1}-Eg(\theta_n)\big) + \frac{E[g(\theta_n)-g(\theta_{n+1})]}{\sqrt{\varepsilon}}. \tag{4.89}
\]
The approach is similar to that of [154, Theorem 5.6]; therefore, we will be brief.

Step 2: Define an operator
\[
\mathcal{L}f(\nu,i) = -\nabla f'(\nu,i)\nu + \frac{1}{2}\mathrm{tr}\big[\nabla^2 f(\nu,i)\Sigma(i)\big] + \sum_{j\in\mathcal{M}}q_{ij}f(\nu,j),\qquad i\in\mathcal{M}, \tag{4.90}
\]
for functions f(\cdot,i) with compact support that have continuous partial derivatives with respect to \nu up to the second order. It can be shown that the associated martingale problem has a unique solution (in the sense of distribution).

Step 3: It is natural now to work with a truncated process. For a fixed but otherwise arbitrary r_1 > 0, define the truncation function
\[
q^{r_1}(x) = \begin{cases}1, & \text{if } x\in S_{r_1},\\ 0, & \text{if } x\in R^{N_0}\setminus S_{r_1},\end{cases}
\]
where S_{r_1} = \{x\in R^{N_0} : \|x\|\le r_1\}. Then, we obtain the truncated iterates
\[
\nu^{r_1}_{n+1} = \nu^{r_1}_n - \varepsilon\nu^{r_1}_n q^{r_1}(\nu^{r_1}_n) + \Big(\sqrt{\varepsilon}\big(y_{n+1}-Eg(\theta_n)\big) + \frac{E[g(\theta_n)-g(\theta_{n+1})]}{\sqrt{\varepsilon}}\Big)q^{r_1}(\nu^{r_1}_n). \tag{4.91}
\]

Define \nu^{\varepsilon,r_1}(t) = \nu^{r_1}_n for t \in [\varepsilon n, \varepsilon n+\varepsilon). Then \nu^{\varepsilon,r_1}(\cdot) is an r_1-truncation of \nu^{\varepsilon}(\cdot); see [99, p. 284] for a definition. We then show that the truncated process (\nu^{\varepsilon,r_1}(\cdot), \theta^{\varepsilon}(\cdot)) is tight. Moreover, by Prohorov's theorem, we can extract a convergent subsequence with limit (\nu^{r_1}(\cdot), \theta(\cdot)) such that the limit (\nu^{r_1}(\cdot), \theta(\cdot)) is the solution of the martingale problem with operator \mathcal{L}^{r_1} defined by
\[
\mathcal{L}^{r_1}f^{r_1}(\nu,i) = -\nabla' f^{r_1}(\nu,i)\nu + \frac{1}{2}\mathrm{tr}\big[\nabla^2 f^{r_1}(\nu,i)\Sigma(i)\big] + \sum_{j\in\mathcal{M}}q_{ij}f^{r_1}(\nu,j) \tag{4.92}
\]
for i \in \mathcal{M}, where f^{r_1}(\nu,i) = f(\nu,i)q^{r_1}(\nu).

Step 4: Letting r_1 \to \infty, we show that the untruncated process also converges and that the limit, denoted by (\nu(\cdot),\theta(\cdot)), solves precisely the martingale problem with operator \mathcal{L} defined in (4.90). The limit covariance can further be evaluated as in [154, Lemma 5.2].

4.8.8 Proof of Theorem 4.5.2

Proof. The proof of the theorem is divided into several steps and uses techniques from stochastic approximation [99], with the modification that \ell^2 is a Hilbert space (see [61, 98]).
Whenever possible, we only indicate the main idea and refer to the literature on stochastic approximation.

Step 0: Note that (4.24) has a unique solution for each initial condition since it is linear in \hat g(\cdot).

Step 1: Preliminary estimates. From (4.23), we obtain that for 0 < \varepsilon < 1 the elements of \hat g_n are non-negative and add up to one. Thus, \hat g_n is bounded. In addition, define V(\hat g) = \frac{1}{2}\langle\hat g-g, \hat g-g\rangle, which can be thought of as a Lyapunov function. Then, using a perturbed Lyapunov function argument [99], it can be shown that
\[
EV(\hat g_n) = O(\varepsilon). \tag{4.93}
\]

Step 2: Tightness of \{\hat g^{\varepsilon}(\cdot)\}. Henceforth, we often use t/\varepsilon and (t+s)/\varepsilon to denote \lfloor t/\varepsilon\rfloor and \lfloor (t+s)/\varepsilon\rfloor, the integer parts of t/\varepsilon and (t+s)/\varepsilon, respectively. Using the boundedness of \{\hat g_n\} established in the first step together with the Hölder inequality, we have for each 0 < T < \infty, any t \ge 0, any \delta > 0, any 0 < s \le \delta, and \varepsilon > 0,
\[
E_t^{\varepsilon}\|\hat g^{\varepsilon}(t+s)-\hat g^{\varepsilon}(t)\|^2
\le E_t^{\varepsilon}\Big\|\varepsilon\sum_{j=t/\varepsilon}^{(t+s)/\varepsilon-1}[z_j-\hat g_j]\Big\|^2
\le K\varepsilon\Big(\frac{t+s}{\varepsilon}-\frac{t}{\varepsilon}\Big) \le Ks,
\]
where K > 0 is independent of \varepsilon and E_t^{\varepsilon} denotes the conditional expectation with respect to \mathcal{F}_t^{\varepsilon}. Thus
\[
E_t^{\varepsilon}\|\hat g^{\varepsilon}(t+s)-\hat g^{\varepsilon}(t)\|^2 \le Ks,\qquad
\lim_{\delta\to 0}\limsup_{\varepsilon\to 0}E\Big[\sup_{0<s\le\delta}E_t^{\varepsilon}\|\hat g^{\varepsilon}(t+s)-\hat g^{\varepsilon}(t)\|^2\Big] = 0. \tag{4.94}
\]
The tightness criterion (see [97, Theorem 3, p. 47] with R^r replaced by \ell^2; see also [61]) enables us to conclude that \{\hat g^{\varepsilon}(\cdot)\} is tight in D([0,\infty) : \ell^2).

Step 3: Characterization of the limit process. Since \{\hat g^{\varepsilon}(\cdot)\} is tight, by Prohorov's theorem we can extract a convergent subsequence. Select such a sequence and still denote it by \hat g^{\varepsilon}(\cdot), with the limit denoted by \hat g(\cdot). By the Skorohod representation, with a slight abuse of notation, we may assume that \hat g^{\varepsilon}(\cdot) converges to \hat g(\cdot) w.p.1 and that the convergence is uniform on any bounded time interval.
We shall show that \hat g(\cdot) is a solution of the martingale problem with operator
\[
\mathcal{L}f(\hat g) = \langle\nabla f(\hat g), [g-\hat g]\rangle
\]
for any f(\cdot)\in C_0^1(\ell^2 : R) (the collection of real-valued C^1 functions defined on \ell^2 with compact support). We need to show that
\[
f(\hat g(t)) - f(\hat g(0)) - \int_0^t \mathcal{L}f(\hat g(\tau))\,d\tau \text{ is a martingale.}
\]

To prove the martingale property, we pick any bounded and continuous function h(\cdot) defined on \ell^2, any T < \infty, any 0 < t, s \le T, any positive integer \kappa, and t_{l_1} \le t for any l_1 \le \kappa. To derive the desired property, we need only show that
\[
E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\Big(f(\hat g(t+s))-f(\hat g(t))-\int_t^{t+s}\mathcal{L}f(\hat g(\tau))\,d\tau\Big) = 0. \tag{4.95}
\]

To prove (4.95), we work with the process indexed by \varepsilon. First, by the weak convergence and the Skorohod representation,
\[
\lim_{\varepsilon\to 0}E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\big[f(\hat g^{\varepsilon}(t+s))-f(\hat g^{\varepsilon}(t))\big] = E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\big[f(\hat g(t+s))-f(\hat g(t))\big]. \tag{4.96}
\]

Choose a sequence of integers \{m_\varepsilon\} such that m_\varepsilon \to \infty as \varepsilon \to 0 but \Delta_\varepsilon = \varepsilon m_\varepsilon \to 0. Next, we note
\[
f(\hat g^{\varepsilon}(t+s))-f(\hat g^{\varepsilon}(t)) = f(\hat g_{(t+s)/\varepsilon})-f(\hat g_{t/\varepsilon})
= \sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\big[f(\hat g_{lm_\varepsilon+m_\varepsilon})-f(\hat g_{lm_\varepsilon})\big]
\]
\[
= \varepsilon\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}[z_j-\hat g_j]\Big\rangle + o(1)
= \sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}[z_j-\hat g_j]\Big\rangle + o(1),
\]
where o(1) \to 0 in probability as \varepsilon \to 0. The stationarity and the mixing condition imply that
\[
\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}z_j \to Ez_0 = \sum_{i=0}^{\infty}e_i P(y_0=i) = g \quad\text{in probability.}
\]

Therefore,
\[
E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}z_j\Big\rangle\Big]
= E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}z_j\Big\rangle\Big]
\]
\[
\to E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\Big(\int_t^{t+s}\langle\nabla f(\hat g(\tau)),g\rangle\,d\tau\Big)\quad\text{as }\varepsilon\to 0. \tag{4.97}
\]

Likewise,
\[
E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[-\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}\hat g_j\Big\rangle\Big]
\to E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\Big[-\int_t^{t+s}\langle\nabla f(\hat g(\tau)),\hat g(\tau)\rangle\,d\tau\Big]. \tag{4.98}
\]

Combining (4.96)–(4.98), (4.95) follows.

4.8.9 Proof of Theorem 4.5.3

Proof. In the proof of Theorem 4.5.3, we use the lemmas and propositions described below. From (4.23),
\[
\nu_{n+1} = \nu_n - \varepsilon\nu_n + \sqrt{\varepsilon}(z_n-g). \tag{4.99}
\]

Lemma 4.8.1.
Under the assumptions of Theorem 4.5.2, for sufficiently small \varepsilon, there is an N_\varepsilon such that EV(\nu_n) = O(1) for all n \ge N_\varepsilon.

Proof: The proof uses a perturbed Lyapunov function argument.

To proceed, recall the definitions of a covariance operator and a Wiener process [46, 98] on \ell^2. A covariance \Gamma of an \ell^2-valued random variable y is an operator from \ell^2 to \ell^2 defined by \Gamma v = Ey\langle v,y\rangle for any v \in \ell^2. A process W(\cdot) is a zero-mean (stationary-increment) \ell^2-valued Wiener process if there are mutually independent real-valued zero-mean Wiener processes \{W_i(\cdot)\} with covariances t\rho_i satisfying \sum_{i=0}^{\infty}\rho_i < \infty, and an orthonormal sequence \{\beta_i\} with \beta_i \in \ell^2, such that W(t) = \sum_{i=0}^{\infty}W_i(t)\beta_i. For v, z \in \ell^2, the covariance operator of W(t) is defined by
\[
E\langle W(t),v\rangle\langle W(t),z\rangle = t\langle z,\Gamma v\rangle = t\sum_{i=0}^{\infty}\rho_i\langle\beta_i,v\rangle\langle\beta_i,z\rangle. \tag{4.100}
\]

Lemma 4.8.2. Assume the conditions of Theorem 4.5.2. For any natural number i \in N, define
\[
W_i^{\varepsilon}(t) = \sqrt{\varepsilon}\sum_{j=0}^{t/\varepsilon-1}\langle z_j-g,e_i\rangle.
\]
Then W_i^{\varepsilon}(\cdot) converges weakly to a real-valued Wiener process W_i(\cdot) with covariance t\sigma_i^2, where
\[
\sigma_i^2 = E[\langle z_0-g,e_i\rangle]^2 + 2\sum_{j=1}^{\infty}E\langle z_0-g,e_i\rangle\langle z_j-g,e_i\rangle. \tag{4.101}
\]

Proof: Note that, with the use of the inner product in \ell^2, \{\langle z_n-g,e_i\rangle\} is a real-valued mixing sequence with mean 0. The desired convergence follows from the functional invariance principle for mixing processes; see [99, Chapter 7] (see also [25, 61]).

Lemma 4.8.3. Under the conditions of Lemma 4.8.2, for i \ne l, EW_i^{\varepsilon}(t)W_l^{\varepsilon}(t) = 0. As a result, the limit Wiener processes W_i(\cdot) and W_l(\cdot) are independent.

Proof: It is straightforward that
\[
EW_i^{\varepsilon}(t)W_l^{\varepsilon}(t) = \varepsilon E\sum_{k=0}^{t/\varepsilon-1}\sum_{j=0}^{t/\varepsilon-1}\langle z_j-g,e_i\rangle\langle z_k-g,e_l\rangle
= \varepsilon E\sum_{k=0}^{t/\varepsilon-1}\sum_{j=0}^{t/\varepsilon-1}\langle z_j-g,e_ie_l'(z_k-g)\rangle = 0
\]
since e_ie_l' = 0 \in R^{\infty\times\infty}. Since EW_i^{\varepsilon}(t) = 0, we conclude that \Sigma(W_i^{\varepsilon}(t),W_l^{\varepsilon}(t)) = 0. Consequently, \Sigma(W_i(t),W_l(t)) = 0, and as a result W_i(t) and W_l(t) are independent Wiener processes.

Proposition 4.8.4. Under the conditions of Lemma 4.8.2, define
\[
W^{\varepsilon}(t) = \sqrt{\varepsilon}\sum_{j=0}^{t/\varepsilon-1}[z_j-g].
\]
(4.102)

Then W^{\varepsilon}(\cdot) converges weakly to W(\cdot) such that
\[
W(t) = \sum_{i=0}^{\infty}W_i(t)e_i, \tag{4.103}
\]
and the covariance operator is given by
\[
E\langle W(t),v\rangle\langle W(t),z\rangle = t\langle z,\Gamma v\rangle = t\sum_{i=0}^{\infty}\sigma_i^2\langle e_i,v\rangle\langle e_i,z\rangle\quad\text{for } v,z\in\ell^2, \tag{4.104}
\]
where \sigma_i^2 is defined in (4.101).

Proof: In view of the definition (4.102), for any \delta > 0, t > 0, 0 < s \le \delta, with E_t^{\varepsilon} denoting the conditional expectation with respect to \mathcal{F}_t^{\varepsilon}, using the mixing properties we can show that
\[
\lim_{\delta\to 0}\limsup_{\varepsilon\to 0}\Big[\sup_{0\le s\le\delta}E_t^{\varepsilon}\big\langle W^{\varepsilon}(t+s)-W^{\varepsilon}(t),W^{\varepsilon}(t+s)-W^{\varepsilon}(t)\big\rangle\Big] = 0.
\]
Thus W^{\varepsilon}(\cdot) is tight in D([0,\infty);\ell^2). We can extract any weakly convergent subsequence and denote the limit by W(\cdot). We next characterize this limit. Again, using (4.102),
\[
W^{\varepsilon}(t) = \sum_{i=0}^{\infty}W_i^{\varepsilon}(t)e_i = \sqrt{\varepsilon}\sum_{i=0}^{\infty}\sum_{j=0}^{t/\varepsilon-1}\langle z_j-g,e_i\rangle e_i.
\]
Therefore, for each l \in N,
\[
E[\langle W^{\varepsilon}(t),e_l\rangle]^2 = E\|W_l^{\varepsilon}(t)\|^2 = t\sigma_l^2.
\]
By virtue of the decay property g_i \propto i^{-\beta}, \sum_{l=0}^{\infty}\sigma_l^2 < \infty. By Lemma 4.8.2, W_i^{\varepsilon}(\cdot) converges weakly to W_i(\cdot). By virtue of Lemma 4.8.3, the W_i(\cdot) are independent Wiener processes. In view of the definition of a Wiener process on \ell^2, we conclude that W^{\varepsilon}(\cdot) converges weakly to W(\cdot) such that (4.103) holds. In addition, the structure of the covariance operator (4.104) is obtained.

We proceed to obtain the desired weak convergence of \nu^{\varepsilon}(\cdot). Since the stochastic differential equation (4.25) is linear, there is a unique solution for each initial condition. The rest of the proof is similar to the finite-dimensional counterpart, with modifications similar to those in the proof of Theorem 4.5.2.

4.8.10 Proof of Theorem 4.5.4

Proof. Before proceeding to the main proof, we first state a preliminary result. The proofs of the assertions below can be found in [156, Theorem 3.6 and Theorem 4.3], respectively, and are thus omitted.

Lemma 4.8.5. Under Assumption 4.2.1, the following claims hold:

(a) Denote p_n^{\rho} = [P(\theta_n^{\rho}=1),\dots,P(\theta_n^{\rho}=M)] and the n-step transition probability by (A^{\rho})^n, with A^{\rho} given in (4.2) and \rho = \varepsilon^2.
Then
\[
p_n^{\rho} = p(\rho n) + O\big(\rho + e^{-k_0 t/\rho}\big),\qquad
(A^{\rho})^{n-n_0} = \Xi(\rho n,\rho n_0) + O\big(\rho + e^{-k_0(t-t_0)/\rho}\big), \tag{4.105}
\]
where p(t) \in R^{1\times M} and \Xi(t,t_0) \in R^{M\times M} are the continuous-time probability vector and transition matrix satisfying
\[
\frac{dp(t)}{dt} = p(t)Q,\quad p(0)=p_0,\qquad
\frac{d\Xi(t,t_0)}{dt} = \Xi(t,t_0)Q,\quad \Xi(t_0,t_0)=I, \tag{4.106}
\]
with t_0 = \rho n_0 and t = \rho n.

(b) \theta^{\rho}(\cdot) converges weakly to \theta(\cdot), a continuous-time Markov chain generated by Q.

To analyze the algorithm, the techniques developed in the proof of Theorem 4.5.2 are used along with the ideas and methods developed in [155]. The developments are similar in approach and results, but are more complex due to the added switching process. For example, with modifications, Step 1 in the proof of Theorem 4.5.2 can still be carried out; Step 2 can also be proved, so the sequence \{\hat g^{\varepsilon}(\cdot)\} is tight.

To characterize the limit, we again use martingale averaging techniques. We shall only highlight the main difference here. In carrying out the analysis similar to that of Step 3 in the proof of Theorem 4.5.2, we encounter the term
\[
E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}Y_j(\theta_j)\Big\rangle\Big]
= E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}Y_j(\theta_j)\Big\rangle\Big]
\]
\[
= E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{\theta=1}^{M}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}Y_j(\theta)I\{\theta_j=\theta\}\Big\rangle\Big]. \tag{4.107}
\]

Since Y_j(\theta) and \theta_j are independent, we have
\[
\frac{1}{m_\varepsilon}\sum_{\theta=1}^{M}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}Y_j(\theta)I\{\theta_j=\theta\}
= \frac{1}{m_\varepsilon}\sum_{\iota_0=1}^{M}\sum_{\theta=1}^{M}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}Y_j(\theta)P(\theta_j=\theta\,|\,\theta_{lm_\varepsilon}=\iota_0)I\{\theta_{lm_\varepsilon}=\iota_0\}. \tag{4.108}
\]

For each \theta \in \mathcal{M}, the averaging of Y_j(\theta) can be carried out as in Case 1. We concentrate on the term involving the Markov chain. By virtue of Lemma 4.8.5, noting \rho = \varepsilon^2 and using (4.105), we have
\[
[A^{\rho}]^{j-lm_\varepsilon} = \Xi(\varepsilon^2 j,\varepsilon^2 lm_\varepsilon) + O\big(\varepsilon^2 + e^{-k_0(\varepsilon^2 j-\varepsilon^2 lm_\varepsilon)/\varepsilon^2}\big),
\]
because we are working with (4.28) and the step size is \varepsilon. In the interval [l\Delta_\varepsilon, l\Delta_\varepsilon+\Delta_\varepsilon) with \Delta_\varepsilon = \varepsilon m_\varepsilon, it is readily seen that \Xi(\varepsilon^2 j,\varepsilon^2 lm_\varepsilon) \to \Xi(0,0) = I as \varepsilon \to 0.
As a result,
\[
P(\theta_j=\theta\,|\,\theta_{lm_\varepsilon}=\iota_0) = \delta_{\iota_0,\theta} + o_\varepsilon(1) =
\begin{cases}1, & \text{if } \iota_0=\theta\\ 0, & \text{otherwise}\end{cases} + o_\varepsilon(1),
\]
where o_\varepsilon(1) \to 0 as \varepsilon \to 0. Putting the above estimates in (4.108), we obtain that the limit in probability of
\[
\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}E_{lm_\varepsilon}Y_j(\theta)P(\theta_j=\theta\,|\,\theta_{lm_\varepsilon}=\iota_0)I\{\theta_{lm_\varepsilon}=\iota_0\}
\]
is the same as that of
\[
\sum_{i=1}^{\infty}e_i\theta_i(\theta)\delta_{\iota_0\theta}I\{\theta^{\varepsilon^2}(\varepsilon^2 lm_\varepsilon)=\theta_0\}.
\]
This further leads to, as \varepsilon \to 0,
\[
E\,h(\hat g^{\varepsilon}(t_{l_1}) : l_1\le\kappa)\Big[\sum_{lm_\varepsilon=t/\varepsilon}^{(t+s)/\varepsilon-1}\Delta_\varepsilon\Big\langle\nabla f(\hat g_{lm_\varepsilon}),\frac{1}{m_\varepsilon}\sum_{j=lm_\varepsilon}^{lm_\varepsilon+m_\varepsilon-1}Y_j(\theta_j)\Big\rangle\Big]
\to E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\Big[\int_t^{t+s}\big\langle\nabla f(\hat g(\tau)),\theta_i(\theta)P(\theta(0)=\theta)\big\rangle\,d\tau\Big]
\]
\[
= E\,h(\hat g(t_{l_1}) : l_1\le\kappa)\Big[\int_t^{t+s}\big\langle\nabla f(\hat g(\tau)),\theta_i(\theta)e_i\,p_\theta\big\rangle\,d\tau\Big].
\]

5 Conclusions

The unifying theme of this thesis was to devise a set of theories and methods for statistical signal processing on (possibly random) graphs, involving multi-agent Bayesian estimation, social learning, stochastic approximation algorithms and adaptive filtering, and the dynamics of random graphs, in order to understand the effects of the interactions among agents on estimation and tracking problems. Part I of this dissertation was devoted to the mis-information management problem in multi-agent state estimation over social networks. Part II of this thesis dealt with tracking the time-varying degree distribution of a dynamic social network using noisy observations. This chapter concludes this work and presents a summary of findings along with some directions for future research and development.

5.1 Summary of Findings in Part I

Over the last decade there has been a growing interest in social networks, which facilitate our day-to-day activities such as decision-making, social communication, and sharing news or stories. Many of these activities involve estimation, learning, or decision making using social networks such as rating and review systems, micro-blogging platforms, and online social networks. In these estimation problems, the structure of the underlying social network imposes a communication constraint and dictates who talks to whom.
The first part of this thesis was motivated by such social networks, which comprise a set of agents (social sensors) that seek to estimate an underlying state of nature interactively. Part I dealt with the mis-information management problem in two different information exchange protocols:

• Chapter 2 considered an information exchange protocol where agents broadcast their (private) beliefs over the network. In such a protocol, each agent records its (private) observations and combines them with the information received from other agents to form its belief about the state of nature. Finally, it transmits the updated belief over the network. Note that this is not social learning, since the (private) beliefs of agents are transmitted.

• Chapter 3 used social learning to model the interactions among agents. In this model, each agent computes its private belief using the local observation and the information received from the network. Then, based on its private belief, it chooses an action from a finite set such that a local cost function is minimized. In the social learning context considered in Chapter 3, as opposed to the protocol of Chapter 2, the private beliefs of agents, which depend on the private observations, are not communicated to other agents. Instead, their public beliefs, which can be computed directly from their actions, are broadcast over the network.

In both of the above protocols, mis-information propagation arises as a result of the correlation introduced by loops in the communication graph and the recursive nature of Bayesian models. In Chapter 2, we presented an optimal information aggregation scheme that completely removes the mis-information associated with the estimates of agents under some conditions on the topology of the communication graph. The optimal mis-information removal algorithm proposed in Chapter 2 requires knowledge of the transitive closure matrix of the communication graph.
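The transitive closure matrix that this algorithm requires (which agents can influence which, directly or indirectly) can be computed from the adjacency matrix of the communication graph. The sketch below uses Warshall's algorithm; the 4-agent example graph and the function name are our illustrative choices, not taken from the thesis.

```python
# Transitive closure of a directed communication graph via Warshall's
# algorithm: T[i][j] = 1 iff agent j is reachable from agent i.
# The example 4-agent DAG below is hypothetical.

def transitive_closure(adj):
    """adj[i][j] = 1 if there is a directed edge i -> j."""
    n = len(adj)
    T = [row[:] for row in adj]     # work on a copy
    for k in range(n):
        for i in range(n):
            if T[i][k]:
                for j in range(n):
                    if T[k][j]:
                        T[i][j] = 1
    return T

# Example: agents 0 -> 1 -> 3 and 0 -> 2 -> 3.
A = [[0, 1, 1, 0],
     [0, 0, 0, 1],
     [0, 0, 0, 1],
     [0, 0, 0, 0]]
T = transitive_closure(A)   # T[0][3] == 1: agent 0 influences agent 3 indirectly
```

The closure runs in O(n^3) time, which is adequate for the moderate-size communication graphs considered in this setting.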
For scenarios where, due to random delays in communications among agents, the transitive closure matrix is not completely known, a sub-optimal algorithm is proposed to mitigate the "double counting" events that are most likely to happen.

The social learning model considered in Chapter 3 results in two interesting phenomena: (i) herding, where all rational agents end up choosing the same action, and (ii) mis-information propagation, which produces a bias in the public belief and results in overconfidence. Inspired by online rating and review systems, Chapter 3 presented a 5-step protocol to mimic the interactions among agents (social sensors) that aim to estimate an underlying state of nature. It then introduced a fair protocol that prevents mis-information propagation and was used as a benchmark. Building on this benchmark, an algorithm was devised for the administrator of the rating system to deploy in order to maintain fair ratings. We also reported the results of a psychology experiment conducted by our colleagues at the University of British Columbia on a group of undergraduate students to study the learning behavior of humans in a society. The experiment showed that the interactions of the agents can be modeled using a social learning model. We further showed that, as a result of the information exchange protocol (between individuals within a group) and the recursive nature of the decision-making process, data incest arises in a large fraction of trials in the experiment.

Tools

Graph theory started 250 years ago with a paper written by Leonhard Euler on the Seven Bridges of Königsberg, published in 1763 [23]. Since then, it has become a powerful tool for modeling networks. Each vertex denotes an agent (or group of individuals) in the social network and each edge depicts a relationship between agents in the social network. In this work, graph-theoretic tools and definitions are used to model the flow of information through the network.
Also, the necessary and sufficient condition for complete mis-information removal is presented in terms of the adjacency and transitive closure matrices of the underlying communication graph. Social learning is another mathematical abstraction used in this work to model the interactions among agents (social sensors) in social networks.

5.2 Summary of Findings in Part II

The second part of this thesis was motivated by the importance of the degree distribution in the analysis of social networks. The interaction between nodes in dynamic social networks is not always fixed and may evolve over time. An example of such time-varying dynamics is the seasonal variation in friendships among college students. Chapter 4 considered social networks where the dynamics of the underlying graph evolve according to the realization of a Markov chain. The Markov-modulated random graph generated by Algorithm 4.5 mimics such networks, where the dynamics (the connection/deletion probabilities p, q) depend on the state of nature and evolve over time. Algorithm 4.5 models these time variations as a finite-state Markov chain {θ_n}. This model forms our basis for the analysis of social networks.

Markov-modulated duplication-deletion random graphs are analyzed in terms of their degree distribution. When the size of the graph is fixed (r = 0) and ρ is small, the expected degree distribution of the Markov-modulated duplication-deletion random graph can be uniquely computed from (4.6) for each state of the underlying Markov chain. This result allows us to express the structure of the network (its degree distribution) in terms of the dynamics of the model. We also showed that, when the size of the graph is fixed and there are no Markovian dynamics, the random graph generated according to Algorithm 4.5 satisfies a power law with exponent computed from (4.11).
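Algorithm 4.5 itself is not reproduced in this chapter; the following is only a loose sketch of a Markov-modulated duplication-deletion step in the spirit described above. The parameter table `PARAMS`, the switching rate `RHO`, and the step details are our illustrative assumptions — the thesis's algorithm additionally handles size preservation and connectivity maintenance, which are omitted here.

```python
import random

# Loose sketch of a Markov-modulated duplication-deletion graph step:
# the connection/deletion probabilities (p, q) switch with a slow
# two-state Markov chain. All parameters are illustrative only.

random.seed(0)
PARAMS = {0: (0.3, 0.05), 1: (0.6, 0.10)}   # state -> (p, q), hypothetical values
RHO = 0.01                                   # slow switching probability per step

def step(adj, state):
    """One duplication (a new node copies each neighbour of a random
    parent, and the parent itself, each w.p. p) followed by a possible
    deletion of a uniformly chosen node (w.p. q)."""
    p, q = PARAMS[state]
    parent = random.choice(list(adj))
    new = max(adj) + 1
    adj[new] = set()
    for v in adj[parent] | {parent}:
        if random.random() < p:
            adj[new].add(v)
            adj[v].add(new)
    if random.random() < q and len(adj) > 2:
        victim = random.choice(list(adj))
        for v in adj.pop(victim):
            adj[v].discard(victim)
    if random.random() < RHO:                # slow Markov modulation of (p, q)
        state = 1 - state
    return adj, state

adj, state = {0: {1}, 1: {0}}, 0
for _ in range(200):
    adj, state = step(adj, state)
n_nodes = len(adj)
degrees = sorted(len(nbrs) for nbrs in adj.values())
```

Running such a simulation and fitting the tail of `degrees` on a log-log scale is one informal way to observe the power-law behaviour characterized by (4.11).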
The importance of this result is that a single parameter (the power law exponent) characterizes the structure of a possibly very large dynamic network.

Moreover, a stochastic approximation algorithm is presented to estimate the empirical degree distribution of the finite duplication-deletion random graph using noisy observations. The proposed stochastic approximation algorithm (4.14) does not assume any knowledge of the Markov-modulated dynamics of the graph (the state of nature). Since the expected degree distribution can be uniquely computed for each state of the underlying Markov chain, a social sensor can be designed based on (4.14) to track the state of nature using noisy observations of nodes' degrees. These noisy observations can be samples of the degree sequence of the nodes; that is, some nodes are randomly chosen and inquired about the number of connections they have. Using a perturbed Lyapunov function, we showed that the tracking error of the stochastic approximation algorithm is small and bounded.

Chapter 4 then presented a Hilbert-space-valued stochastic approximation algorithm to track the expected degree distribution of the infinite duplication-deletion random graph without Markovian dynamics. The asymptotic behavior of this algorithm is analyzed in terms of the power law degree distribution. Finally, we extended the analysis to a Hilbert-space-valued stochastic approximation algorithm that aims to track a Markov-modulated probability mass function with
Such techniques were previously used in analysis of complex networks in the socialand economic networks literature, see [41, 80]. Adaptive filtering, stochastic approximation, andMarkov-switched systems are among the tools which are employed in this chapter in order to esti-mate the expected degree distribution and consequently the state of underlying Markov chain. Weakconvergence analysis and functional central limit theorem are used as mathematical abstractions toanalyze the performance of such tracking algorithms.5.3 Directions for Future Research and DevelopmentThere is clearly much work to be done in the area of signal processing on complex (and possiblyrandom) networks to understand the behavior of agents in social networks. In this section we presentsome of the immediate extensions of the work presented in this dissertation.Mis-information removal algorithms in Bayesian quickest-time change detection:In change detection problem, the objective is to detect a random change time by optimizing thetrade-off between number of observations and delay penalty [124, 134]. This problem is very simi-lar to the sensing problems that have been considered in the first part of this dissertation and involveinteractive sensing with the goal of detecting a random change in state of nature. Multi-agentBayesian change detection involves a set of agents (sensors) where each agent estimates an under-lying state and then, using Bayesian models, updates the posterior distribution of the change. Thenit sends this updated posterior distribution (or a myopic action obtained based on a local cost opti-mization [87]) over the network. This process repeats until a global decision maker detects a changeand all agents stops making observations. In other words, using all the local information (decisionsor distributions), the goal in the quickest-time change detection is to detect a change and make aglobal decision. 
Because of the recursive nature of the Bayesian estimators and the information exchange protocol, mis-information propagation can arise in such a system. One immediate extension of the work presented in this dissertation is to investigate the effect of mis-information propagation in such interactive sensing scenarios, and possibly to devise a mis-information removal algorithm for each agent to employ, such that the decision of the global decision maker is not affected by mis-information propagation.

Analyzing the spread of contagious disease through a network

Diffusion of information and disease through society has been studied extensively in the social network analysis literature; see [80, 107, 108, 122, 146]. One of the models used to describe the diffusion of information in a social network is the Susceptible-Infected-Susceptible (SIS) model [122]. Consider a social network where each agent interacts with the other nodes dictated by the structure of the network. Each agent in such a social network can be in one of two states: (i) infected, or (ii) not infected but susceptible to becoming infected. In the SIS model, infected nodes can recover and become susceptible again. In the model considered in [107, 108], the assumption is that the degrees of nodes remain fixed. One extension of the work presented in the second part of this thesis is to study the diffusion of information in dynamic graphs (where the graph is not fixed and evolves according to Algorithm 4.5 presented in Chapter 4). Investigating contagion in such dynamic networks is non-standard in two ways: First, the spread of disease⁴⁴ is dynamic and depends on the number and the distribution of infected nodes over the network. Second, the structure of the underlying graph (which dictates who talks to whom) is evolving over time.
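On a fixed graph, one sweep of the SIS dynamics just described can be sketched as follows. The ring topology, infection probability `beta`, and recovery probability `delta` are illustrative choices of ours, not values from the thesis; the proposed extension would run the same dynamics on an evolving duplication-deletion graph.

```python
import random

# Minimal SIS sketch on a fixed ring graph: each infected node infects
# each susceptible neighbour w.p. beta per step, and recovers (back to
# susceptible) w.p. delta per step. All parameters are illustrative.

random.seed(2)
N, beta, delta = 50, 0.4, 0.3
neighbors = {i: {(i - 1) % N, (i + 1) % N} for i in range(N)}
infected = {0}                       # start with a single infected node

for _ in range(100):
    new_infections = {v for u in infected for v in neighbors[u]
                      if v not in infected and random.random() < beta}
    recoveries = {u for u in infected if random.random() < delta}
    infected = (infected - recoveries) | new_infections

final_count = len(infected)
```

Replacing the fixed `neighbors` map with a graph that is itself rewired each step by duplication-deletion dynamics is precisely the non-standard setting identified above.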
Analyzing the diffusion process in such networks (both in the stationary state and in the transient phase) can be a possible avenue for further research and development.

Mis-information removal algorithms in non-Bayesian models

In our work, we addressed the mis-information management problem in two scenarios: (i) constrained filtering and (ii) social learning over social networks. In both of these scenarios, Bayesian models for information aggregation have been used. There exists a large body of work in the literature that considers Bayesian models to formulate the learning behavior of humans [1, 13, 16, 24, 136]. On the other hand, a body of the social learning literature focuses on non-Bayesian social learning models, see [3, 14, 15, 56, 81]. In these models, agents use a simple rule of thumb to update their beliefs from their private information about the state of nature and the information received from other agents. Similar to the learning problems considered in Part I of this dissertation, as a result of the recursive nature of the information aggregation schemes and possible loops in the communication graph, mis-information propagation may arise in such non-Bayesian learning over social networks. Mis-information management in learning with non-Bayesian models over social networks can be a possible direction for further research and development.

44 Instead of the spread of disease or information, we can also consider a scenario where nodes are interactively deciding whether to adopt a technology (infected) or not (remain susceptible). The aim is then to study the rate of adoption of the new technology in the network.

Mis-information propagation mitigation for specific network structures

In the mis-information removal algorithm proposed in this thesis, we used a combination of the previous estimates in order to completely mitigate data incest. However, the optimality of this approach demands that a topological constraint hold.
An extension to the work presented in this dissertation is to investigate the mis-information management problem in graphs with specific structures that do not necessarily satisfy such a constraint. An alternative approach is to "censor" the estimates of some nodes that are most likely to be polluted with mis-information. This can be viewed as deliberately cutting some edges in order to lower the risk of mis-information propagation. Cutting these edges can be useful for some specific graph structures (like bipartite graphs), especially when the topological constraint is not satisfied and, thus, optimal mis-information removal is not possible.

Bibliography

[1] D. Acemoglu, M. A. Dahleh, I. Lobel, and A. Ozdaglar. Bayesian learning in social networks. The Review of Economic Studies, 78(4):1201–1236, 2011.

[2] D. Acemoglu and A. Ozdaglar. Opinion dynamics and learning in social networks. Dynamic Games and Applications, 1(1):3–49, Mar. 2011.

[3] D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi. Spread of (mis)information in social networks. Games and Economic Behavior, 70(2):194–227, 2010.

[4] L. Adamic and E. Adar. How to search a social network. Social Networks, 27(3):187–203, Jul. 2005.

[5] T. Aeppel. 50 million users: The making of an angry birds internet meme. Wall Street Journal, Mar. 2015.

[6] C. C. Aggarwal and T. Abdelzaher. Integrating sensors and social networks. Social Network Data Analytics, pages 397–412, 2011.

[7] D. Agrawal, C. Budak, and A. El Abbadi. Information diffusion in social networks: Observing and affecting what society cares about. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM '11, pages 2609–2610, New York, NY, USA, 2011. ACM.

[8] Y. Y. Ahn, S. Han, H. Kwak, S. Moon, and H. Jeong. Analysis of topological characteristics of huge online social networking services. In Proceedings of 16th International Conference on World Wide Web, pages 835–844, Banff, AB, Canada, May 2007.

[9] M. Albakour, C. Macdonald, and I.
Ounis. Identifying local events by using microblogs as social sensors. In Proceedings of 10th Conference on Open Research Areas in Information Retrieval, pages 173–180, Lisbon, Portugal, May 2013.

[10] R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, Jan. 2002.

[11] S. Asur and B. A. Huberman. Predicting the future with social media. In 2010 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, volume 1, pages 492–499, Toronto, ON, Canada, 2010.

[12] R. J. Aumann. Agreeing to disagree. The Annals of Statistics, 4(6):1236–1239, Nov. 1976.

[13] R. J. Aumann. Correlated equilibrium as an expression of Bayesian rationality. Econometrica, 55(1):1–18, Jan. 1987.

[14] V. Bala and S. Goyal. Learning from neighbours. The Review of Economic Studies, 65(3):595–621, 1998.

[15] V. Bala and S. Goyal. Conformism and diversity under social learning. Economic Theory, 17(1):101–120, 2001.

[16] A. Banerjee. A simple model of herd behavior. Quarterly Journal of Economics, 107(3):797–817, Aug. 1992.

[17] A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509, 1999.

[18] T. Bayes. An essay toward solving a problem in the doctrine of chances. Philosophical Transactions of the Royal Society of London, 53, 1764.

[19] A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer-Verlag, 1990.

[20] A. Benveniste, M. Métivier, and P. Priouret. Adaptive Algorithms and Stochastic Approximations. Springer, 2012.

[21] J. M. Bernardo and A. Smith. Bayesian Theory, volume 405. John Wiley & Sons, 2009.

[22] P. Biernacki and D. Waldorf. Snowball sampling: Problems and techniques of chain referral sampling. Sociological Methods & Research, 10(2):141–163, Nov. 1981.

[23] N. L. Biggs, E. K. Lloyd, and R. J. Wilson. Graph Theory. Oxford University Press, 1986.

[24] S. Bikchandani, D. Hirshleifer, and I. Welch.
A theory of fads, fashion, custom, and cultural change as informational cascades. Journal of Political Economy, 100(5):992–1026, Oct. 1992.

[25] P. Billingsley. Convergence of Probability Measures, volume 493. John Wiley & Sons, 2009.

[26] J. Bollen, H. Mao, and X. Zeng. Twitter mood predicts the stock market. Journal of Computational Science, 2(1):1–8, Mar. 2011.

[27] B. Bollobás, O. Riordan, J. Spencer, and G. Tusnády. The degree sequence of a scale-free random graph process. Random Structures and Algorithms, 18(3):279–290, 2001.

[28] P. Bonhard and M. A. Sasse. "Knowing me, knowing you" – using profiles and social networking to improve recommender systems. BT Technology Journal, 24(3):84–98, 2006.

[29] T. Brehard and V. Krishnamurthy. Optimal data incest removal in Bayesian decentralized estimation over a sensor network. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pages 173–176, Honolulu, Hawaii, Apr. 2007.

[30] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, WWW '11, pages 665–674. ACM, 2011.

[31] D. Budak and A. El Abbadi. Information diffusion in social networks: Observing and influencing societal interests. Proceedings of the VLDB Endowment, 4(12), 2011.

[32] J. Burke, D. Estrin, M. Hansen, A. Parker, N. Ramanathan, S. Reddy, and M. B. Srivastava. Participatory sensing. In Workshop on World-Sensor-Web: Mobile Device Centric Sensor Networks and Applications, Boulder, CO, 2006.

[33] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts. Network robustness and fragility: Percolation on random graphs. Physical Review Letters, 85(25):5468–5471, Dec. 2000.

[34] A. T. Campbell, S. B. Eisenman, N. D. Lane, E. Miluzzo, R. Peterson, H. Lu, X. Hong, X. Zheng, M. Musolesi, K. Fodor, and G. Ahn. The rise of people-centric sensing. IEEE Internet Computing, 12(4):12–21, 2008.

[35] C.
Chamley. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2004.

[36] C. Chamley, A. Scaglione, and L. Li. Models for the diffusion of beliefs in social networks: An overview. IEEE Signal Processing Magazine, 30(3):16–29, May 2013.

[37] C. P. Chamley. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2003.

[38] Z. Cheng, J. Caverlee, and K. Lee. You are where you tweet: A content-based approach to geo-locating Twitter users. In Proceedings of 19th ACM International Conference on Information and Knowledge Management, pages 759–768, Toronto, ON, Canada, Oct. 2010.

[39] W. H. Chin, D. B. Ward, and A. G. Constantinides. Semi-blind MIMO channel tracking using auxiliary particle filtering. In Global Telecommunications Conference, GLOBECOM '02, volume 1, pages 322–325. IEEE, 2002.

[40] N. A. Christakis and J. H. Fowler. Social network sensors for early detection of contagious outbreaks. PLoS ONE, 5(9), Sep. 2010.

[41] F. Chung and L. Lu. Complex Graphs and Networks. Conference Board of the Mathematical Sciences, National Science Foundation (U.S.), 2006.

[42] F. Chung, L. Lu, T. G. Dewey, and D. G. Galas. Duplication models for biological networks. Journal of Computational Biology, 10(5):677–687, Oct. 2003.

[43] T. Conley and C. Udry. Social learning through networks: The adoption of new agricultural technologies in Ghana. American Journal of Agricultural Economics, 83(3):668–673, Aug. 2001.

[44] C. Cooper and A. Frieze. A general model of web graphs. Random Structures and Algorithms, 22(3):311–335, 2003.

[45] R. Corten. Composition and structure of a large online social network in the Netherlands. PLoS ONE, 7(4), Apr. 2012.

[46] R. F. Curtain and A. J. Pritchard. Infinite dimensional linear systems theory. Lecture Notes in Control and Information Sciences, 8, 1978.

[47] B. Debatin, J. P. Lovejoy, A. K. Horn, and B. N. Hughes. Facebook and online privacy: Attitudes, behaviors, and unintended consequences.
Journal of Computer-Mediated Communication, 15(1):83–108, 2009.

[48] M. H. DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[49] P. Dodin and V. Nimier. Distributed tracking systems and their optimal inference topology. In Proceedings of 5th International Conference on Information Fusion, volume 1, pages 585–592, Annapolis, MD, Jul. 2002.

[50] X. G. Doukopoulos and G. V. Moustakides. Adaptive power techniques for blind channel estimation in CDMA systems. IEEE Transactions on Signal Processing, 53(3):1110–1120, Mar. 2005.

[51] X. G. Doukopoulos and G. V. Moustakides. Blind adaptive channel estimation in OFDM systems. IEEE Transactions on Wireless Communications, 5(7):1716–1725, Jul. 2006.

[52] X. G. Doukopoulos and G. V. Moustakides. Fast and stable subspace tracking. IEEE Transactions on Signal Processing, 56(4):1452–1465, Apr. 2008.

[53] R. Durrett. Random Graph Dynamics. Cambridge Series on Statistical and Probabilistic Mathematics. Cambridge University Press, 2007.

[54] N. Eagle and A. Pentland. Reality mining: Sensing complex social systems. Personal and Ubiquitous Computing, 10(4):255–268, May 2006.

[55] H. Ebel, L. I. Mielsch, and S. Bornholdt. Scale-free topology of e-mail networks. Physical Review E, 66, 2002.

[56] G. Ellison and D. Fudenberg. Rules of thumb for social learning. Journal of Political Economy, 101(4):612–643, Aug. 1993.

[57] G. Ellison and D. Fudenberg. Word-of-mouth communication and social learning. The Quarterly Journal of Economics, 110(1):93–125, 1995.

[58] Y. Ephraim. A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Transactions on Signal Processing, 40(4):725–735, 1992.

[59] P. Erdős and T. Gallai. Graphs with given degrees of vertices. Mat. Lapok, 11:264–274, 1960.

[60] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci., 5:17–61, 1960.

[61] S. N. Ethier and T. G. Kurtz.
Markov Processes: Characterization and Convergence, volume 282. John Wiley & Sons, 2009.

[62] S. Eubank, H. Guclu, V. S. Anil Kumar, M. V. Marathe, A. Srinivasan, Z. Toroczkai, and N. Wang. Modelling disease outbreaks in realistic urban social networks. Nature, 429:180–184, May 2004.

[63] R. Evans, V. Krishnamurthy, G. Nair, and L. Sciacca. Networked sensor management and data rate control for tracking maneuvering targets. IEEE Transactions on Signal Processing, 53(6):1979–1991, Jun. 2005.

[64] D. Gale and S. Kariv. Bayesian learning in social networks. Games and Economic Behavior, 45(2):329–346, Nov. 2003.

[65] A. G. Dimakis, S. Kar, J. M. F. Moura, M. G. Rabbat, and A. Scaglione. Gossip algorithms for distributed signal processing. Proceedings of the IEEE, 98(11):1847–1864, Nov. 2010.

[66] J. D. Geanakoplos and H. M. Polemarchakis. We can't disagree forever. Journal of Economic Theory, 28(1):192–200, Oct. 1982.

[67] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[68] S. Goel and M. J. Salganik. Respondent-driven sampling as Markov chain Monte Carlo. Statistics in Medicine, 28(17):2209–2229, Jul. 2009.

[69] L. Guo, L. Ljung, and G. J. Wang. Necessary and sufficient conditions for stability of LMS. IEEE Transactions on Automatic Control, 42(6):761–770, 1997.

[70] S. L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph. I. Journal of the Society for Industrial and Applied Mathematics, 10(3):496–506, 1962.

[71] M. Hamdi and V. Krishnamurthy. Removal of data incest in multi-agent social learning in social networks. arXiv:1309.6687 [cs.SI], Sep. 2013.

[72] M. Hamdi, V. Krishnamurthy, and G. Yin. Tracking the empirical distribution of a Markov-modulated duplication-deletion random graph. arXiv:1303.0050 [cs.IT], Feb. 2013.

[73] V. Havel. A remark on the existence of finite graphs. Časopis Pěst. Mat., 80:477, 1955.

[74] D. D.
Heckathorn. Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44(2):174–199, May 1997.

[75] Y. C. Ho and R. C. K. Lee. A Bayesian approach to problems in stochastic estimation and control. IEEE Transactions on Automatic Control, 9(4):333–339, 1964.

[76] P. Holme, B. J. Kim, C. N. Yoon, and S. K. Han. Attack vulnerability of complex networks. Physical Review E, 65(5), May 2002.

[77] B. Huang and T. Jebara. Maximum likelihood graph structure estimation with degree distributions. In Analyzing Graphs: Theory and Applications, NIPS Workshop, volume 14, 2008.

[78] M. B. Hurley. An information theoretic justification for covariance intersection and its generalization. In Proceedings of the 5th International Conference on Information Fusion, volume 1, pages 505–511, Annapolis, MD, Jul. 2002.

[79] B. Ifrach, C. Maglaras, and M. Scarsini. Monopoly pricing in the presence of social learning. NET Institute Working Paper No. 12-01, 2011.

[80] M. O. Jackson. Social and Economic Networks. Princeton University Press, 2008.

[81] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi. Non-Bayesian social learning. Games and Economic Behavior, 76(1):210–225, 2012.

[82] H. Jeong, S. P. Mason, A. L. Barabási, and Z. N. Oltvai. Lethality and centrality in protein networks. Nature, 411(6833):41–42, 2001.

[83] Y. Kanoria and O. Tamuz. Tractable Bayesian social learning on trees. In Proceedings of IEEE International Symposium on Information Theory, pages 2721–2725, Cambridge, MA, Jul. 2012.

[84] H. Kim, Z. Toroczkai, P. L. Erdős, I. Miklós, and L. A. Székely. Degree-based graph construction. Journal of Physics A: Mathematical and Theoretical, 42(39), Oct. 2009.

[85] G. Kossinets and D. J. Watts. Empirical analysis of an evolving social network. Science, 311(5757):88–90, Jan. 2006.

[86] P. L. Krapivsky, G. J. Rodgers, and S. Redner. Degree distributions of growing networks. Physical Review Letters, 86(23):5401–5404, Jun. 2001.

[87] V.
Krishnamurthy. Quickest detection POMDPs with social learning: Interaction of local and global decision makers. IEEE Transactions on Information Theory, 58(8):5563–5587, 2012.

[88] V. Krishnamurthy, O. Namvar Gharehshiran, and M. Hamdi. Interactive sensing and decision making in social networks. Foundations and Trends in Signal Processing, 7(1-2):1–196, 2013.

[89] V. Krishnamurthy and M. Hamdi. Data fusion and mis-information removal in social networks. In Proceedings of 15th International Conference on Information Fusion, pages 1142–1149, Singapore, Jul. 2012.

[90] V. Krishnamurthy and M. Hamdi. Mis-information removal in social networks: Constrained estimation on dynamic directed acyclic graphs. IEEE Journal of Selected Topics in Signal Processing, 7(2):333–346, Apr. 2013.

[91] V. Krishnamurthy and H. V. Poor. Social learning and Bayesian games in multiagent signal processing: How do local and global decision makers interact? IEEE Signal Processing Magazine, 30(3):43–57, May 2013.

[92] V. Krishnamurthy and H. V. Poor. A tutorial on interactive sensing in social networks. IEEE Transactions on Computational Social Systems, 1(1), 2014.

[93] V. Krishnamurthy and T. Ryden. Consistent estimation of linear and non-linear autoregressive models with Markov regime. Journal of Time Series Analysis, 19(3):291–307, May 1998.

[94] V. Krishnamurthy, K. Topley, and G. Yin. Consensus formation in a two-time-scale Markovian system. SIAM Journal on Multiscale Modeling and Simulation, 7(4):1898–1927, 2009.

[95] V. Krishnamurthy and S. Bhatt. Sequential detection of market shocks using risk-averse agent based models. arXiv preprint arXiv:1511.01965, 2015.

[96] V. Krishnamurthy and W. Hoiles. Online reputation and polling systems: Data incest, social learning, and revealed preferences. IEEE Transactions on Computational Social Systems, 1(3):164–179, 2014.

[97] H. J. Kushner.
Approximation and Weak Convergence Methods for Random Processes, with Applications to Stochastic Systems Theory. MIT Press, Cambridge, MA, 1984.

[98] H. J. Kushner and A. Shwartz. Stochastic approximation in Hilbert space: Identification and optimization of linear continuous parameter systems. SIAM Journal on Control and Optimization, 23(5):774–793, 1985.

[99] H. J. Kushner and G. Yin. Stochastic Approximation and Recursive Algorithms and Applications, volume 37 of Stochastic Modeling and Applied Probability. Springer-Verlag, 2nd edition, 2003.

[100] P. J. Lamberson. Social learning in social networks. The B.E. Journal of Theoretical Economics, 10(1), 2010.

[101] R. Lee and K. Sumiya. Measuring geographical regularities of crowd behaviors for Twitter-based geo-social event detection. In Proceedings of 2nd ACM SIGSPATIAL International Workshop on Location Based Social Networks, pages 1–10, San Jose, CA, Nov. 2010.

[102] S. H. Lee, P.-J. Kim, and H. Jeong. Statistical properties of sampled networks. Physical Review E, 73(1), Jan. 2006.

[103] J. Leskovec. SNAP library. http://snap.stanford.edu/data/index.html.

[104] E. Lieberman, C. Hauert, and M. A. Nowak. Evolutionary dynamics on graphs. Nature, 433(7023):312–316, Jan. 2005.

[105] T. T. Lin and S. S. Yau. Bayesian approach to the optimization of adaptive systems. IEEE Transactions on Systems Science and Cybernetics, 3(2):77–85, 1967.

[106] L. Chen, P. O. Arambel, and R. K. Mehra. Estimation under unknown correlation: Covariance intersection revisited. IEEE Transactions on Automatic Control, 47(11):1879–1882, Nov. 2002.

[107] D. López-Pintado. Contagion and coordination in random networks. International Journal of Game Theory, 34(3):371–381, Oct. 2006.

[108] D. López-Pintado. Diffusion in complex social networks. Games and Economic Behavior, 62(2):573–590, Mar. 2008.

[109] M. Luca. Reviews, reputation, and revenue: The case of Yelp.com. Technical Report 12-016, Harvard Business School, 2011.

[110] R. Maben, G. V.
Moustakides, and J. S. Baras. Adaptive sampling for linear state estimation. SIAM Journal on Control and Optimization, 50(2):672–702, 2012.

[111] G. V. Moustakides. Exponential convergence of products of random matrices: Application to adaptive algorithms. International Journal of Adaptive Control and Signal Processing, 12(7):579–597, 1998.

[112] K. P. Murphy. Machine Learning: A Probabilistic Perspective. MIT Press, 2012.

[113] K. P. Murphy, Y. Weiss, and M. I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of 15th Conference on Uncertainty in Artificial Intelligence, pages 467–475, 1999.

[114] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.

[115] M. E. J. Newman. Assortative mixing in networks. Physical Review Letters, 89, Oct. 2002.

[116] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

[117] M. E. J. Newman. Power laws, Pareto distributions and Zipf's law. Contemporary Physics, 46(5):323–351, 2005.

[118] M. E. J. Newman, D. J. Watts, and S. H. Strogatz. Random graph models of social networks. Proceedings of the National Academy of Sciences, 99(S1):2566–2572, 2002.

[119] N. P. Nguyen, G. Yan, M. T. Thai, and S. Eidenbenz. Containment of misinformation spread in online social networks. In Proceedings of the 4th Annual ACM Web Science Conference, WebSci '12, pages 213–222. ACM, 2012.

[120] T. D. Nielsen and F. V. Jensen. Bayesian Networks and Decision Graphs. Information Science and Statistics. Springer, 2007.

[121] R. Pastor-Satorras, E. Smith, and R. V. Solé. Evolving protein interaction networks through gene duplication. Journal of Theoretical Biology, 222(2):199–210, May 2003.

[122] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200, 2001.

[123] J. Pearl. Fusion, propagation, and structuring in belief networks.
Artificial Intelligence, 29(3):241–288, Sep. 1986.

[124] H. V. Poor and O. Hadjiliadis. Quickest Detection. Cambridge University Press, 2008.

[125] S. J. Press. Subjective and Objective Bayesian Statistics: Principles, Models, and Applications, volume 590. John Wiley & Sons, 2009.

[126] N. Ramakrishnan, B. J. Keller, B. J. Mirza, A. Y. Grama, and G. Karypis. Privacy risks in recommender systems. IEEE Internet Computing, 5(6):54–63, 2001.

[127] J. Reason and W. Ren. Estimating the optimal adaptation gain for the LMS algorithm. In Proceedings of the 32nd IEEE Conference on Decision and Control, pages 1587–1588. IEEE, 1993.

[128] S. Reece and S. Roberts. Robust, low-bandwidth, multi-vehicle mapping. In Proceedings of 8th International Conference on Information Fusion, volume 2, Philadelphia, PA, Jul. 2005.

[129] F. Ricci, L. Rokach, and B. Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.

[130] C. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer Science & Business Media, 2007.

[131] A. Rosi, M. Mamei, F. Zambonelli, S. Dobson, G. Stevenson, and Y. Juan. Social sensors and pervasive services: Approaches and perspectives. In Proceedings of IEEE International Conference on Pervasive Computing and Communications Workshops, pages 525–530, Seattle, WA, Mar. 2011.

[132] T. Sakaki, M. Okazaki, and Y. Matsuo. Earthquake shakes Twitter users: Real-time event detection by social sensors. In Proceedings of 19th International Conference on World Wide Web, pages 851–860, Raleigh, NC, Apr. 2010.

[133] L. Scharf. Statistical Signal Processing, volume 98. Addison-Wesley, Reading, MA, 1991.

[134] A. N. Shiryaev. On optimum methods in quickest detection problems. Theory of Probability and its Applications, 8(1):22–46, 1963.

[135] J. Shrager, T. Hogg, and B. A. Huberman. A graph-dynamic model of the power law of practice and the problem-solving fan-effect. Science, 242(4877):414–416, 1988.

[136] L.
Smith and P. Sørensen. Pathological outcomes of observational learning. Econometrica, pages 371–398, 2000.

[137] V. Solo and X. Kong. Adaptive Signal Processing Algorithms: Stability and Performance. Prentice-Hall Information and System Sciences Series. Prentice Hall, 1995.

[138] S. M. Stigler. The true title of Bayes's essay. Statistical Science, 28(3):283–288, 2013.

[139] S. H. Strogatz. Exploring complex networks. Nature, 410(6825):268, 2001.

[140] R. Taylor. Constrained switchings in graphs. In K. L. McAvaney, editor, Combinatorial Mathematics VIII, volume 884 of Lecture Notes in Mathematics, pages 314–336. Springer Berlin Heidelberg, 1981.

[141] H. L. Van Trees. Detection, Estimation, and Modulation Theory. John Wiley & Sons, 2004.

[142] A. M. Treisman and G. Gelade. A feature-integration theory of attention. Cognitive Psychology, 12(1):97–136, 1980.

[143] A. Tripathi and S. Vijay. A note on a theorem of Erdős and Gallai. Discrete Mathematics, 265(1–3):417–420, Apr. 2003.

[144] M. Trusov, A. V. Bodapati, and R. E. Bucklin. Determining influential users in internet social networks. Journal of Marketing Research, XLVII:643–658, Aug. 2010.

[145] B. Tuttle. Fact-checking the crowds: How to get the most out of hotel-review sites. Time magazine, Jul. 2013.

[146] F. Vega-Redondo. Complex Social Networks. Econometric Society Monographs. Cambridge University Press, 2007.

[147] A. Wagner. The yeast protein interaction network evolves rapidly and contains few redundant duplicate genes. Molecular Biology and Evolution, 18(7):1283–1292, 2001.

[148] T. Wang and H. Krim. Statistical classification of social networks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pages 3977–3980, Kyoto, Japan, Mar. 2012.

[149] T. Wang, H. Krim, and Y. Viniotis. A generalized Markov graph model: Application to social network analysis. IEEE Journal of Selected Topics in Signal Processing, 7(2):318–332, Apr. 2013.

[150] X. Xie and R. J. Evans.
Multiple target tracking and multiple frequency line tracking using hidden Markov models. IEEE Transactions on Signal Processing, 39(12):2659–2676, Dec. 1991.

[151] Y. Xin and T. Jaakkola. Controlling privacy in recommender systems. In Advances in Neural Information Processing Systems, pages 2618–2626, 2014.

[152] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory, 51(7):2282–2312, Jul. 2005.

[153] G. Yin and V. Krishnamurthy. LMS algorithms for tracking slow Markov chains with applications to hidden Markov estimation and adaptive multiuser detection. IEEE Transactions on Information Theory, 51(7):2475–2490, Jul. 2005.

[154] G. Yin, V. Krishnamurthy, and C. Ion. Regime switching stochastic approximation algorithms with application to adaptive discrete stochastic optimization. SIAM Journal on Optimization, 14(4):1187–1215, 2004.

[155] G. Yin, Y. Sun, and L. Y. Wang. Asymptotic properties of consensus-type algorithms for networked systems with regime-switching topologies. Automatica, 47(7):1366–1378, 2011.

[156] G. Yin and Q. Zhang. Discrete-Time Markov Chains: Two-Time-Scale Methods and Applications, volume 55 of Applications of Mathematics. Springer-Verlag, New York, NY, 2005.

[157] G. Yin and C. Zhu. Hybrid Switching Diffusions: Properties and Applications, volume 63 of Stochastic Modeling and Applied Probability. Springer, 2010.

Appendix A
Some Graph Theoretic Definitions

Graph, Directed Graph, Path and Directed Acyclic Graph (DAG):

• A graph GN comprising N nodes is a pair (V, E), where V = {v1, . . .
, vN} is a set of nodes (also called vertices), and E ⊂ V × V is a set of edges between the nodes.

• A graph GN is an undirected graph if (vi, vj) ∈ E implies (vj, vi) ∈ E; a graph is said to be directed if (vj, vi) ∈ E is not a consequence of (vi, vj) ∈ E.

• A path is an alternating sequence of nodes and edges, beginning and ending with a node, in which each node is incident to the two edges that precede and follow it in the sequence.

• A Directed Acyclic Graph (DAG) is a directed graph with no path that starts and ends at the same node.

• A family of DAGs GN is defined as a set of DAGs {G1, . . . , GN} where Gn is the subgraph of Gn+1 such that, for n = 1, . . . , N − 1,

Vn = Vn+1 \ {vn+1},  En = En+1 \ {(vi, vn+1) ∈ En+1 | vi ∈ Vn+1}.   (A.1)

Adjacency and Transitive Closure matrices:

Let GN = (V, E) denote a graph with N nodes V = {v1, . . . , vN}.

• The Adjacency Matrix A of GN is an N × N matrix whose elements A(i, j) are given by

A(i, j) = 1 if (vj, vi) ∈ E, and 0 otherwise; in addition, A(i, i) = 0.   (A.2)

• The Transitive Closure Matrix T of GN is an N × N matrix whose elements T(i, j) are given by T(i, i) = 1 and

T(i, j) = 1 if there is a path from vj to vi, and 0 otherwise.   (A.3)

The following lemma shows the special form of the adjacency matrix of a directed acyclic graph and provides a closed-form expression to compute the transitive closure matrix from the adjacency matrix of a directed acyclic graph.

Lemma A.1. A sufficient condition for a graph GN to be a DAG is that the adjacency matrix A is an upper triangular matrix. For a DAG GN, the transitive closure matrix T is related to the adjacency matrix by

T = Q({IN − A}^{−1}).   (A.4)

Here, IN is the N × N identity matrix, and Q denotes the matrix-valued "quantization" function such that, for any N × N matrix B, Q(B) is the N × N matrix with elements

Q(B)(i, j) = 0 if B(i, j) = 0, and 1 if B(i, j) ≠ 0.   (A.5)

Proof: This result is derived from the classical interpretation of the matrix {IN − A}^{−1}.
The entry in row i and column j of this matrix gives the number of paths from node vj to node vi. #

To deal with information flow in a social network, we now introduce the concept of a family of DAGs.

Remark A.1. For the sake of notational simplicity, let us define two vector representatives of the adjacency and transitive closure matrices of a directed acyclic graph. For each graph Gn ∈ GN, let the n × n matrices An and Tn denote, respectively, the adjacency matrix and the transitive closure matrix. Define the following:

tn ∈ {0,1}^{1×(n−1)}: transpose of the first n − 1 elements of the nth column of Tn,
bn ∈ {0,1}^{1×(n−1)}: transpose of the first n − 1 elements of the nth column of An.   (A.6)

Remark A.2. As follows straightforwardly from the construction of the adjacency and transitive closure matrices in (A.1), for a family of DAGs GN = {G1, . . . , GN} and any n ∈ {1, . . . , N − 1}, the adjacency matrix An and transitive closure matrix Tn of graph Gn are, respectively, the n × n upper-left submatrices of the adjacency matrix An+1 and transitive closure matrix Tn+1 of graph Gn+1.

Appendix B
A Note on Degree-based Graph Construction

The first step in numerical studies of social networks is the graphical modeling of such networks. A graph can be uniquely determined by the adjacency matrix (also known as the connectivity matrix) of the graph. However, in the graphical modeling of social networks (especially when the size of the network is relatively large), the only available information is the degree sequence of the nodes, not the adjacency matrix of the network.

Definition B.1. The degree sequence, denoted by d, is a non-increasing sequence comprising the vertex degrees of the graph's vertices.

The degree sequence, in general, does not specify the graph uniquely; there can be a large number of graphs that realize a given degree sequence. It is straightforward to show that not all integer sequences represent a true degree sequence of a graph.
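This graphicality question can be checked mechanically using the Erdős-Gallai conditions recalled below as Theorem B.1. The sketch below is an illustrative implementation, not code from the thesis: for clarity it checks the inequality for every k rather than only for k ≤ s as the refinement of [143] permits, and it discards zero-degree (isolated) nodes first, as noted below.

```python
def is_graphical(d):
    """Erdos-Gallai test: True iff d is the degree sequence of a simple graph."""
    d = sorted((x for x in d if x > 0), reverse=True)  # isolated nodes are irrelevant
    n = len(d)
    if sum(d) % 2 != 0:                                # handshake lemma: sum must be even
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(k, di) for di in d[k:])
        if lhs > rhs:
            return False
    return True

print(is_graphical([2, 1, 1]))  # True: realized by a path on three nodes
print(is_graphical([3, 2, 1]))  # False: no simple graph has this sequence
```

The two printed cases are exactly the examples discussed in this appendix; the same routine is the graphicality check invoked inside the construction algorithms below.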
For example, the sequence d = {2,1,1} represents a tree with two edges, but d = {3,2,1} cannot be realized as the degree sequence of a simple graph. Motivated by social network applications, this section addresses the following two questions, given a degree sequence d:

• Existence Problem: Is there any simple graph that realizes d?

• Construction Problem: How can we construct all simple graphs that realize a true degree sequence d?

There are two well-known results that address the existence problem: (i) the Erdős-Gallai theorem [59] and (ii) the Havel-Hakimi theorem [70, 73]. These theorems provide necessary and sufficient conditions for a sequence of non-negative integers to be a true degree sequence of a simple graph. Here, we recall these results without proofs.

Theorem B.1 (Erdős-Gallai, [59]). Let d1 ≥ d2 ≥ · · · ≥ dn > 0 be integers. Then the degree sequence d = {d1, · · · , dn} is graphical if and only if

1. ∑_{i=1}^{n} di is even;

2. for all 1 ≤ k < n:
   ∑_{i=1}^{k} di ≤ k(k − 1) + ∑_{i=k+1}^{n} min{k, di}.   (B.1)

It is shown in [143] that there is no need to check (B.1) for all 1 ≤ k ≤ n − 1; it suffices to check (B.1) for 1 ≤ k ≤ s, where s is chosen such that ds ≥ s and ds+1 < s + 1. Note that, in degree-based graph construction, we only care about nodes of degree greater than zero; zero-degree nodes are isolated nodes which can be added to the graph consisting of nodes of degree greater than zero.

The Havel-Hakimi theorem also provides necessary and sufficient conditions for a degree sequence to be graphical. It also gives a greedy algorithm to construct a graph from a given graphical degree sequence.

Theorem B.2 (Havel-Hakimi, [70, 73]). Let d1 ≥ d2 ≥ · · · ≥ dn > 0 be integers.
Then, the degree sequence d = {d1, · · · , dn} is graphical if and only if the degree sequence d′ = {d2 − 1, d3 − 1, · · · , d_{d1+1} − 1, d_{d1+2}, · · · , dn} is graphical.

In the following, we provide algorithms to construct a simple graph from a true degree sequence. In the construction problem, the degree sequence is treated as a collection of half-edges; a node with degree di has di half-edges. One end of each of these half-edges is fixed at node i, while the other end is free. An edge between node i and node j is formed by connecting a half-edge from node i to a half-edge from node j. The aim is to connect all these half-edges such that no free half-edge is left.

The Havel–Hakimi theorem provides a recursive procedure to construct a graph from a graphical degree sequence. This procedure is presented in Algorithm B.6.

Using Algorithm B.6, one can sample from the graphical realizations of a given degree sequence. In this algorithm, each vertex is first connected to nodes with lower degrees. Therefore, Algorithm B.6 generates graphs where high-degree nodes tend to connect to low-degree nodes; the resulting graph is disassortative [84, 115]. To overcome this problem, one way is to perform edge swapping repeatedly until the final graph loses its disassortativity. In the edge swapping method, two edges (for example, (1,2) and (3,4)) can be swapped (to (1,4) and (2,3)) without changing the degree sequence. The edge swapping method is also used to generate all samples from a given degree sequence: one sample is generated via Algorithm B.6, and then, by use of a Markov chain Monte Carlo algorithm based on edge swapping [140], other samples from the graphical realizations of the degree sequence are obtained.

In [84], a swap-free algorithm is proposed to generate all graphical realizations of a true degree sequence. Before proceeding to Algorithm B.7, we first provide definitions which will be used in this algorithm.

Definition B.2.
Let d = {d1, · · · , dn} be the degree sequence of a simple graph and let N(i) be the set of nodes adjacent to node i. Then, the degree sequence reduced by N(i) is denoted by d|N(i) = {d1|N(i), · · · , dn|N(i)}, with elements defined as follows:

    dk|N(i) = dk − 1, if k ∈ N(i);   0, if k = i;   dk, otherwise.   (B.2)

Definition B.3. Let (a1, a2, . . . , an) and (b1, b2, . . . , bn) be two sequences. Then, (a1, a2, . . . , an) <CR (b1, b2, . . . , bn) if and only if there exists an index m such that 1 ≤ m ≤ n, am < bm, and ai = bi for all m < i ≤ n.

Let d be a non-increasing graphical degree sequence. In order to construct the graph, we need to find all possible neighbor sets N(i) (the "allowed sets") of each node i such that if we connect this node to its allowed set, then the resulting reduced degree sequence d|N(i) is also graphical, i.e., graphicality is preserved. Algorithm B.7 provides a systematic (swap-free) way to generate all graphical realizations of a true degree sequence (by means of finding all possible neighbors of each node).

Algorithm B.6 Creating a sample graph from a given degree sequence
Given a graphical sequence d1 ≥ d2 ≥ · · · ≥ dn > 0, start from i = 1:
(i) Initialize k = n.
(ii) Tentatively connect (one half-edge of) node i to (a half-edge of) node k.
(iii) Check whether the resulting degree sequence is graphical:
  – if Yes:
    1. Save the connection between node i and node k.
    2. If node i has any half-edges left, let k = k−1 and repeat (ii).
  – if No:
    1. Remove the tentative connection.
    2. Let k = k−1 and repeat (ii).
(iv) If i < n, then let i ← i+1 and repeat (i).

Algorithm B.7 Constructing all graphs from a graphical degree sequence [84]
Given a graphical sequence d1 ≥ d2 ≥ · · · ≥ dn > 0, start from i = 1:

Step 1: Find the highest-index neighbors of node i. The aim is to find AR(i):
(i) Initialize k = n.
(ii) Tentatively connect node i to node k.
(iii) Check whether the resulting degree sequence is graphical:
  – if Yes:
    1. Save the connection between node i and node k.
    2. If node i has any stubs left, let k = k−1 and repeat (ii).
  – if No:
    1. Remove the tentative connection.
    2. Let k = k−1 and repeat (ii).

Step 2: Find all possible neighbor sets of node i. With <CR defined in Definition B.3, the aim is to find

    A(i) = {N(i) = {v1, · · · , vdi} : N(i) <CR AR(i) and d|N(i) is graphical}.

(i) Find all sets of nodes that are colexicographically smaller than AR(i) (prospective neighbor sets).
(ii) Connect node i to each such set and check whether the resulting reduced degree sequence is graphical.

Step 3: For every N(i) ∈ A(i):
• Connect node i to N(i).
• Discard node i and compute the reduced degree sequence d|N(i).
• Create all graphs from the degree sequence d|N(i) using this algorithm.
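As a concrete illustration, the graphicality test of Theorem B.1 and the tentative-connection loop of Algorithm B.6 can be sketched in Python. The function names (`is_graphical`, `build_graph`) are ours, and the sketch runs a plain Erdős–Gallai check after each tentative connection rather than the faster restricted tests of [143]:

```python
def is_graphical(d):
    """Erdos-Gallai test of Theorem B.1; zero entries are ignored,
    matching the convention that isolated nodes are added afterwards."""
    d = sorted((x for x in d if x > 0), reverse=True)
    n = len(d)
    if sum(d) % 2:                      # condition 1: even degree sum
        return False
    # condition 2: inequality (B.1) for every prefix length k
    return all(
        sum(d[:k]) <= k * (k - 1) + sum(min(k, d[i]) for i in range(k, n))
        for k in range(1, n + 1)
    )


def build_graph(d):
    """Sketch of Algorithm B.6: scan candidate neighbors from the
    highest index down, keeping a tentative connection only if the
    residual degree sequence stays graphical.  Returns an edge list,
    or None if d is not graphical."""
    if not is_graphical(d):
        return None
    n = len(d)
    residual = list(d)                  # unmatched half-edges per node
    edges = []
    for i in range(n):
        k = n - 1
        while residual[i] > 0 and k > i:
            if residual[k] > 0 and (i, k) not in edges:
                residual[i] -= 1        # tentatively place edge (i, k)
                residual[k] -= 1
                if is_graphical(residual):
                    edges.append((i, k))    # keep the connection
                else:
                    residual[i] += 1        # undo the tentative connection
                    residual[k] += 1
            k -= 1
    return edges


print(is_graphical([3, 2, 1]))   # False
print(build_graph([2, 2, 2]))    # [(0, 2), (0, 1), (1, 2)] -- a triangle
```

Since node 0 (highest degree) is wired to the highest-index (lowest-degree) nodes first, the output graphs carry the disassortative bias discussed above; an edge-swapping pass would be layered on top of this sketch to remove it.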
