Tracking Infection Diffusion inSocial NetworksbyTavis Joseph PedersenBASc. Engineering Physics, University of British Columbia, Canada, 2015A THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFMASTER OF APPLIED SCIENCEinThe Faculty of Graduate and Postdoctoral Studies(Electrical and Computer Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)August 2017c© Tavis Joseph Pedersen 2017AbstractThis thesis explores the problem of tracking the diffusion of contagion processes on socialnetworks. Infection (or Information) diffusion is modeled using the Susceptible-Infected-Susceptible (SIS) model. Mean field approximation is employed to approximate the discretevalued infection dynamics by a deterministic ordinary differential equation, thereby yieldinga generative model for the infection diffusion. The infection is shown to follow polynomialdynamics and is estimated using an exact non-linear Bayesian filter. We compute posteriorCrame´r-Rao bounds to obtain the fundamental limits of the filter which depend on thestructure of the network. The SIS model is extended to include homophily, and filtering onthese networks using the exact non-linear Bayesian filter is illustrated. With considerationfor the collaborative or antagonistic nature of some social processes on networks, we presentan alternative, game theoretic, model for the spread of information based on the evolutionaryMoran process. The diffusion of collaboration following this Moran model is estimated usinga particle filter. Additionally, we validate the efficacy of our method with diffusion datafrom a real-world online social media platform, Twitter. We find that SIS model is a goodfit for the information diffusion and the non-linear filter effectively tracks the informationdiffusion.iiLay SummaryIn this thesis we use expert knowledge in the form of models for infectious disease in com-bination with statistical algorithms to better understand how an infection is spreading andwho the afflicted are. Electrical engineers are practiced at using models of underlying sys-tems to combine observations and form predictions, and these techniques can be extendedbeyond the world of electrons, bits and binary. By better understanding the flow of virusesand disease we can better mobilize our knowledge and efforts to manage, slow and eventuallyeradicate microscopic dangers. While these contagions pose real threat, they are not as ever-present as other viral phenomena in the developed world. Our lives are more immediatelyimpacted by viral news, entertainment, and opinion. The same methods which allow us tobetter understand the presence of Ebola in a population can be utilized to better understandthe population’s outlook on, or exposure to, a political movement.iiiPrefaceThe following publications have resulted from the research presented in this thesis:• V. Krishnamurthy, S. Bhatt, and T. Pedersen, “Tracking Infection Diffusion in SocialNetworks: Filtering Algorithms and Threshold Bounds”, IEEE Transactionson Signaland Information Processing over Networks, vol. 3, no. 2, 2017Statement of AuthorshipI am a co-author of the publication listed above. I have been responsible for developingoriginal ideas, deriving mathematical solutions, and generating simulation results for thispublication with valuable frequent suggestions, advice, contributions, and feedback fromProf. Vikram Krishnamurthy and Sujay Bhatt.A version of the content presented in the Chapter 4 has been submitted as a courseproject for UBC MATH 564: Evolutionary Dynamics.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xList of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiiDedication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiv1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.0.1 From Electrical Engineering to Infectious Disease . . . . . . . . . . . 11.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.1.1 SIS model and Mean Field Dynamics . . . . . . . . . . . . . . . . . . 51.1.2 Non-linear filter and PCRLB for Bayesian Tracking of Infected Popu-lations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7vTable of Contents1.1.3 Extended Models in Network Analysis: Homophily and Game Theo-retic Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.1.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.2 Main Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111.2.1 SIS model and Mean Field Dynamics . . . . . . . . . . . . . . . . . . 111.2.2 Non-linear filter and PCRLB for Bayesian Tracking of Infected Popu-lations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121.2.3 Extended Models in Network Analysis: Homophily and Game Theo-retic Dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121.2.4 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131.3 Outline of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132 SIS Model and Mean-field Approximation . . . . . . . . . . . . . . . . . . 152.1 Social Networks and SIS dyamics . . . . . . . . . . . . . . . . . . . . . . . . 152.1.1 Social Networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.1.2 Infection Processes on Social Networks: SIS Model . . . . . . . . . . 172.1.3 The Mean Field Approximation . . . . . . . . . . . . . . . . . . . . . 202.2 SIS Model and Mean Field Dynamics . . . . . . . . . . . . . . . . . . . . . 212.3 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 252.3.1 Uniform Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.3.2 MCMC Based Respondent-Driven Sampling (RDS) . . . . . . . . . . 262.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283 Tracking Infection and Filtering Bounds . . . . . . . . . . . . . . . . . . . . 293.1 Optimal Filtering of Infected Populations . . . . . . . . . . . . . . . . . . . 303.1.1 Mean Field Polynomial Dynamics . . . . . . . . . . . . . . . . . . . 303.1.2 Optimal Filter for Polynomial Dynamics . . . . . . . . . . . . . . . . 333.2 Posterior Crame`r-Rao Bounds . . . . . . . . . . . . . . . . . . . . . . . . . 34viTable of Contents3.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 384 Extended Diffusion Models: Homophily and Game Theoretic Dynamics 424.1 Infected Population Estimation in Homophily Networks . . . . . . . . . . . 434.1.1 SIS Model with Homophily . . . . . . . . . . . . . . . . . . . . . . . 434.2 Moran process: Game Theoretic Network Dynamics . . . . . . . . . . . . . 464.2.1 Moran Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464.2.2 Mean Field and Pair Approximations to the Moran Process . . . . . 514.2.3 Tracking Cooperation in Game theoretic Networks . . . . . . . . . . 584.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 615 Numerical Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625.1 Example: Tracking the Infected Population State . . . . . . . . . . . . . . . 625.2 Factors Affecting Filter Performance . . . . . . . . . . . . . . . . . . . . . . 645.2.1 Effect of Sampling on Filtering . . . . . . . . . . . . . . . . . . . . . 645.2.2 Sensitivity of Filter Performance to Misspecified Model . . . . . . . . 665.3 Extended and Alternative Diffusion Models . . . . . . . . . . . . . . . . . . 665.3.1 Numerical Exploration of Homophily . . . . . . . . . . . . . . . . . . 665.3.2 Numerical Exploration of Game Theoretic Dynamics . . . . . . . . . 705.4 Analysis and Validation on Twitter Dataset . . . . . . . . . . . . . . . . . . 715.4.1 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 725.4.2 SIS model for Twitter data . . . . . . . . . . . . . . . . . . . . . . . 755.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 796 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . 82Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86viiTable of ContentsAppendixA Related Proofs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95A.1 Proof of Theorem 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95viiiList of Tables5.1 Goodness of fit for the SIS model: The average and maximum deviationsbetween the Twitter data and SIS model predictions are presented. . . . . . 77ixList of Figures3.1 Filtered estimate log mean square error and PCRLB for deterministic meanfield evolution with scale-free and Erdo˝s-Re´nyi degree distributions. . . . . . 393.2 Filtered estimate log mean square error and PCRLB of simulated networks -scale-free and Erdo˝s-Re´nyi. . . . . . . . . . . . . . . . . . . . . . . . . . . . 405.1 Diffusion of infected population states and their corresponding filtered esti-mates in a scale-free network. . . . . . . . . . . . . . . . . . . . . . . . . . . 635.2 Filtered estimate log mean square error of infection population states for twonetwork types: Erdo˝s Re´nyi and scale-free. . . . . . . . . . . . . . . . . . . 655.3 Comparison of the estimate log mean square error between a Bayesian filter,a misspecified Bayesian filter and a moving average estimator. . . . . . . . . 675.4 Diffusion of infected population states and their corresponding filtered esti-mates in a scale-free network. . . . . . . . . . . . . . . . . . . . . . . . . . . 685.5 Deterministic mean field trajectories of an SIS model with and without ho-mophily are compared. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 705.6 Cooperating population state of an Erdo˝s Re´nyi network, as well as stateobservations, and particle filter state estimate. . . . . . . . . . . . . . . . . . 715.7 Simulated and mean field approximation of cooperating population state un-der strong selection w = 0.9. . . . . . . . . . . . . . . . . . . . . . . . . . . . 725.8 Simulated and mean field approximation of cooperating population state un-der weak selection w = 0.01. . . . . . . . . . . . . . . . . . . . . . . . . . . . 73xList of Figures5.9 Comparison of infected population state of d = 1, 2, 3 for varying decay pa-rameter ξ. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 755.10 The true Twitter infection and Twitter infection as predicted by the deter-ministic mean field dynamical equations are compared for nodes of degreed = 1, 2, 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 785.11 The true Twitter infection and the filter estimates of the Twitter infectedpopulation state are compared. . . . . . . . . . . . . . . . . . . . . . . . . . 805.12 The MSE of the filtered estimate of the true twitter infection probability stateand the MSE of the estimate of the infected population state of a simulatednetwork are shown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81xiList of AbbreviationsSIS : Susceptible-Infected-SusceptibleMFT : Mean Field TheoryMFD : Mean Field DynamicsCRLB : Crame`r-Rao Lower BoundPCRLB : Posterior Crame`r-Rao Lower BoundMCMC : Markov Chain Monte CarloRDS : Respondent Driven SamplingHMM : Hidden Markov ModelER : Erdo˝s-Re´nyiSF : Scale-FreeMSE : Mean Square ErrorxiiAcknowledgementsFirstly, I would like to thank my advisor, Professor Vikram Krishnamurthy, whose curios-ity, pragmatism and dedication to excellence have elevated my own work and continue toelevate the work of those around him. Without the intellectual and financial support of Dr.Krishnamurthy, this work would never have been possible. I also owe considerable thanksto Sujay Bhatt, whose abilities cannot be overstated. Sujay’s talented mind, patience, andsense of humour were beacons which guided me during my journey through my MASc.I would like to thank the professors and administration of the University of BritishColumbia. My educational experience has been enriched immeasurably by the care andsupport of this exceptional group. The brilliance of the Faculty under whom I have studied,researched, and worked has unveiled the abundance of joy that can be found in the undyingpursuit of knowledge.I would also like to thank my colleagues in the Statistical Signal Processing laboratory fortheir support and many kindnesses during my MASc. Furthermore, I am grateful to AnahitaShojaei, Philip Edgecumbe, Malik Nasser, Buddhika Netasinghe, Jorden Heatherington, andFelix Funk for their friendship and academic support during my MASc research. Outsidethe boundaries of the university, I lend my thanks and love to my many friends who havecarried me, taught me, and with whom I have shared so much joy.Finally, I am unendingly grateful for the love and support of my mother, brothers, andsister. My family has always been the foundation of all my strength, and anything that Iaccomplish for good in my life will have been because of them.xiiiDedicationTo my cherished mother Cheri Pedersen, brothers Liam and Tyler Pedersen, and sister Cam-ryn PedersenWith love and deepest appreciation.xivChapter 1Introduction1.0.1 From Electrical Engineering to Infectious DiseaseCompared to social, political and economic structures, which are often influenced by enig-matic desires or hidden impetuses, the world of electrical signals is relatively straightforward.We have a good and ever improving understanding of the physical processes that underlie asignal’s generation and propagation. We can, for instance, look to Schroo¨dinger’s equation orthe telegrapher’s equations to understand electrical phenomena on scales as small as atomsor as large as international electrical grids. We have studied for centuries these phenomenaand can summarize their states with words like phase, voltage, current, potential, and fieldstrength. These are all measurable quantities and, if we want to know the voltage of a lineon a computer, we can make use of tools such as a digital voltmeter or a galvanometer (anold type of voltmeter). Sometimes we need to make measurements quickly and accurately,more accurately than our tool can provide in any one measurement. We can utilize ourknowledge of electrodynamics and multiple measurements to improve our estimate. If wemeasure the current of a power line directly and measure the magnetic field at some knowndistance from the line, we know, through Maxwell’s Equations, that our measurement ofthe magnetic field provides information about the current in the line. Thus, we have mul-tiple sources of information about a signal. The combination of imperfect information intobetter and better estimates through knowledge of the underlying system is the foundationof statistical signal processing. Electrical signals were a natural cradle for the developmentof these processing algorithms, but the same algorithms that allow us to better estimatethe voltage of a connector on a motherboard can be used to combine expert knowledge and11.1. Overviewimperfect measurements in other areas. In this thesis, we consider measurements of the lev-els of infection in a population. We can combine models borrowed from epidemiology withsignal processing algorithms to eke out more information from any measurements of how aninfection is propagating through a population. By doing this we heighten the value of everydata point collected. A longitudinal survey of communities in a disease ridden area can beused in combination with model informed algorithms to more accurately estimate how manypeople are infected and which subpopulations are infected. Furthermore, we realize that epi-demiological models are increasingly useful outside of epidemiology. The internet has manyviral phenomena, which spread in a manner similar to diseases. These social infections canrelate to politics, marketing, religion, or entertainment and play an increasingly large rolein our lives as individuals and society as a whole.1.1 OverviewThe dark spectre of disease has stalked humanity since our collective infancy. Mankind’sgreatest enemy in the kingdom Animalia is neither the noble tiger nor the colossal greatwhite. These beasts, as terrifying as they may have been to our ancestors, pose no morethreat to humanity than falling vending machines. Our greatest animal adversary is thepetite and easily squished mosquito. It is through disease that the mosquito, a potentialvector of many illnesses, earns this distinction. More deadly than the conquistadors and theirguns were their germs. More deadly than the trenches and munitions of the First World Warwere the beautifully and woefully efficient distribution systems that brought men from everycorner of the world to the European theater and sent back the Spanish Flu. For most ofhuman history, mankind’s relative immobility has been a bulwark against the spread ofdisease. Even though trade did spread the Black Death throughout Asia and Europe, theplague spread at the speed of sail and could only fly as fast as the feet that carried it. Today,those feet can travel more quickly than ever. A flight attendant might have breakfast in Abu21.1. OverviewDhabi, lunch in Frankfurt, and find themselves in California the following day. This posesa new-dramatic risk for humankind, how can we defend against an infectious disease in aworld that is more connected than ever before? Scares such as SARS, the Avian influenzaand Ebola have garnered public interest, even panic, but have not yet laid down our defenses.Today, we fight the spread of these diseases and infections through the efforts of medicalprofessionals and epidemiologists. Quarantines, vaccines, massive laboratory efforts all serveto stem the tide of infectious disease, but more tools are necessary in this fight. We need tobetter understand the spread of infections if we are to effectively curb the next microscopicthreat to humanity, which may be frozen just beneath some now precariously warm layerof permafrost. To better understand the spread of infections, we need to track them inreal time. We must gather information about who is infected, how quickly the infection isspreading and who we can target with health services to most effectively diminish its spread.The filtering algorithms of engineering literature have a role to play in this titanic effort.Through statistical algorithms, we can more accurately estimate the level of infection inpopulations. It is the development of this idea that we pursue in this thesis.The speed of infection spread depends upon the structure of human interactions. As peo-ple become more connected, infections can more easily spread between distinct portions of thepopulation. The structure over which human interactions occur is the human social network.Our local social environments dictate our exposure to new information, susceptibility to dis-ease, and the network of these connected local environments determines how vulnerable anentire population is to contagion. In this thesis, we will speak of infection, cooperation, andinformation diffusion as analogous processes, a parallel motivated by the knowledge that vi-ral phenomena online behave similarly to virulent disease spreading throughout a population[74]. We therefore extend analysis performed on biological networks and epidemic models tothe spread of knowledge and technology through social and online networks [58]. Motivatingthe inclusion of these more generalized contagions is the importance of understanding thespread of technologies, social learning theory, and the economic theory of interacting agents.31.1. OverviewFurther to this motivation, the statistical filtering techniques of this work are more readilyapplicable in the context of social contagions than to classical disease propagation1.When considering adoption of a new technology, influence often seeps in from thosearound us, indirectly by observing their behaviour or directly through recommendations.This diffusive tide of information, central to technological adoption, has been studied in[18, 31] where the diffusion of hybrid corn and CNC machines were investigated, respectively.The application of epidemic and, in particular, Susceptible-Infected-Susceptible (SIS) modelapproximations to the diffusion of technology are considered in [42, 55, 59].The adoption of a technology or behaviour because of a neighbour’s adoption can beviewed as an act of social learning. Adoption is an action which represents a piece ofinformation about the technology or behaviour itself and its quality [18]. The diffusionmodels considered in this work can therefore be considered in the context of agents who arelearning in a social network. The statistical methods presented for tracking infections arethus applicable to tracking the process by which a population learns [49]. These diffusionand learning processes depend upon interaction between and influence by individuals, oragents, in the network. Agent based models have also been used to model infection diffusion[72], highlighting the interrelation between the two models.Models for infection and information diffusion on social networks utilize establishedknowledge to predict how these processes spread, however, in the real world, randomnesscan dominate the ultimate outcome. Consumers may flock to a product for unpredictablereasons, and diseases may be spread or stopped by the whims of a single traveler. Because ofthese random factors, we cannot formulate models which can predict infection/informationdiffusion exactly. To better estimate the state of the diffusion process, we utilize observationsfrom the real world. These observations can come in the form of polls of political preferencesor surveyor information reporting on the state of an epidemic. Since information can rarely1Health related fields adopt new technologies at a highly conservative pace. Significant further researchand development of the ideas presented in this thesis are necessary before the work can be applied in thecontext of mitigating contagion and saving lives. This work serves as a stepping stone, by which we stridecloser to that ultimate goal.41.1. Overviewbe gathered about the entire population of interest, observations provide noisy (containingerror) estimates. To combine the expert information engrained in diffusion models with thecrucial updates provided by real world observations, we use tracking algorithms borrowedfrom previous work in statistical signal processing.In this thesis, we consider the problem of attempting to track the diffusion of informationor infections on social networks. We formulate a model for infection diffusion, apply Bayesianfiltering algorithms to estimate the infected population state, provide extensions to theinfection diffusion model, and finally demonstrate the filtering methods on a dataset derivedfrom analyzing real world tweets relating to the 2011 Arab Spring Uprising.1.1.1 SIS model and Mean Field DynamicsIndividuals interacting with one another may share knowledge, pathogens, political opinions,or particularly funny jokes. Common to all of these is an underlying social mechanism oftransfer: Contact is made, either online or in person, and the phenomena spreads. Byformulating and analyzing a mathematical framework for contact diffusion processes, wemay reveal a realm in which we can better understand and control their spread. A naturalstarting point is in epidemiology, the study of the spread of infectious disease. Infectiousdiseases have a profound impact on human societies. Some infections insidiously lay withina population and damage their hosts without warranting action on a massive scale, whileothers collapse towns, cities and even civilizations. In the threat they pose, they gather agreat deal of interest. One attempt to model infectious disease is the SIS model [71]. In theSIS model, an individual who is susceptible may become infected if they come into contactwith another infected individual and, should they become infected, may recover to becomesusceptible again. The transitions from infected to susceptible (and vice versa) are assumedto be probabilistic. In the SIS model, real world details about specific interactions betweenindividuals, for instance whether an individual sneezed during a encounter, are encapsulatedinto a single transition probability. The scenario outlined by the SIS model emulates a disease51.1. Overviewwhich may be inflicted upon the same individual repeatedly - superficially, like the commoncold. The evolution of such an infection occurs over a structured population. This structureis formed of social connections. Our friends, family, acquaintances, and colleagues form oursocial environment, and the overlap of all social environments form a social network. Asdescribed in [59], there is a wide range of social phenomena such as diffusion of technologicalinnovations, cultural fads, and economic conventions [18] where individual decisions areinfluenced by the decisions of others. A large body of research on social networks has beendevoted to the diffusion of information (e.g., ideas, behaviors, trends) [34], and particularlyon finding a set of target nodes so as to maximize the spread of a given product [19, 65].In the SIS model, given an individual, its neighbourhood and its current state, its nextstate can be considered a weighted coin toss. While these individual interactions are ex-tremely simple, when the entire network is considered, the exact spread of the infection iscombinatorially complex. In statistical physics, a similar scaling of complexity was encoun-tered in the study of spin glasses, lattices of particles with interacting magnetic dipoles.Except in trivial structures, such as lines, the spin systems could not be analyzed exactly.To model these systems, physicists replaced all the exact interactions between particles withinteractions between individual particles and the ensemble average of all other particles.This approximation is called mean field theory (MFT) and can be applied to other manyparticle systems, such as the SIS model.We apply a mean field theory that discriminates between nodes of differing degree.Through this discrimination, we are able to encode network structure into the dynamicsof the infection diffusion. These degree-based mean field dynamics (MFD) approximationsfor SIS model have been derived in [59, 74].There are several models for studying the spread of infection and technology in com-plex networks including Susceptible-Infected-Susceptible (SIS), Susceptible-Alert-Infected-Susceptible (SAIS), and Susceptible-Exposed-Infected-Vigilant (SEIV); see [27, 41]. Susceptible-Infected-Susceptible (SIS) models have been extensively studied in [42, 49, 58, 59, 71, 82, 83,61.1. Overview88] to model information/infection diffusion, for example, the adoption of a new technologyin a consumer market. In this thesis we follow a similar approach and utilize mean fielddynamics to approximate the discrete-valued infection dynamics by a deterministic ordinarydifferential equation. The mean field dynamics yield a tractable model for Bayesian filter-ing in order to estimate the infection dynamics given a sampling procedure for the socialnetwork.1.1.2 Non-linear filter and PCRLB for Bayesian Tracking ofInfected PopulationsThe infection (or information diffusion) dynamics of the SIS model, and the underlying dif-fusion processes which it represents, are non-deterministic. Every disease, product launch,rumor, or news piece is influenced by factors which can best be described as random. Theserandom factors make it impossible to predict apriori (before observation) how many indi-viduals in a network are infected. Since even precise models would be unable to predictthese random elements, we are motivated to observe the state of the system directly. Theseobservations may be formed by sampling 2 a representative subset of the population. If everyindividual in the network could be sampled, we would know the precise infected populationstate at that instant. Unfortunately, sampling can be expensive or prohibitively difficult. Inthe case of infection, sampling may require physically interacting with infected individuals; adangerous task for those involved. Additionally, the individuals we want to sample may notbe known to us. For example, we may be interested in learning about how infection spreadsamongst drug users. In this case, members may be willing to anonymously share informationfor compensation, but the structure and members of this population may be unknown toresearchers. For all of these cases, sampling an entire population at one time instant may notbe possible, and repeatedly sampling all individuals to track an infection over time is even2We use sampling to refer to learning the state of individuals by some means. The mechanism of learningis not specified but may be surveying or querying directly.71.1. Overviewmore challenging. The SIS model for the underling infection dynamics, discussed in Sec. 2.2,contributes information about how the infection evolves. We can utilize this information topredict how the state evolves. To combine this model based knowledge of state evolutionwith the direct observations of the state, we can use a Bayesian filtering algorithm.Bayesian filtering algorithms allow us to combine observations with apriori state estimatesto form a posteriori (after observation) estimate. The most commonly known Bayesian filteris the Kalman filter. The Kalman filter is a finite dimensional optimal filter that combinesnoisy (contains random errors) linear observations of a linearly evolving noisy (affected byrandom changes) state. It assumes Gaussian observation noise and Gaussian state noise.When the state evolution is not linear, or noise is non-Gaussian, other filtering algorithmsmay produce more accurate estimates of the state. The SIS dynamics described in Sec. 2.2are not linear, but we show that the Mean Field Dynamics (MFD) are polynomial in thestate. That is, in MFD the state at timestep n + 1 depends polynomially on the state attime n. We utilize a Bayesian filter which computes an optimal estimate of the state inpolynomially evolving systems with Gaussian state and observation noise.Now that we have a filtering algorithm, we are interested in knowing how “good” theestimates generated by these filtering algorithms are. We compute the Posterior Cramer-Rao Lower Bound (PCRLB) of the non-linear filter. The PCRLB provides a lower boundon the error covariance of our filter estimate. We examine via numerical examples andposterior Crame´r-Rao lower bounds, how state estimation over large networks is affectedby the network; see [79] for posterior Crame´r-Rao bounds for non-linear filters. Numericalexamples illustrate the difference in performance between power law (scale-free) and Erdo˝s-Re´nyi graphs.In this thesis, since we focus on optimal Bayesian estimation (filtering) of the infectiondynamics by sampling the network, we use the SIS model. In small sized sensor networksrepresented by graphs, [75] derived an optimal estimation algorithm, based on the Kalmanfilter, to estimate the state at each sensor. Further, on specific structures like trees, [75] pro-81.1. Overviewvides expressions for the steady-state covariance; see also [69]. Finally, for Kalman filteringof the Susceptible-Exposed-Infected-Recovered model, see [26].1.1.3 Extended Models in Network Analysis: Homophily andGame Theoretic DynamicsThe infection diffusion of Chapter 2 models contagions which spread only as a function ofthe immediate local environment of each individual. This model is sensible for diseasesand some types of information flow, however, it can fail to represent interactions betweenself-interested, rational characters who are interacting on a cooperative or competitive way.Consider the scenario where your decision depends upon the decisions of your neighbours, andyou neighbour’s decision on their neighbours, in this scenario each agent’s actions dependsnot only on their immediate environment, but also on the network as a whole. Take forinstance, the adoption of a new smart phone, the strategy of an individual [buy, not buy]may be dependent on their neighbourhood. Do they have a significant number of friendswith the new phone with whom they can take advantage of shared features, or is the quality(or fitness) of such a strategy insufficient to warrant the new adoption? They may chooseto copy the choice of their friends, but they may also take into account how happy theirfriends are with their current phones, which in turn depends on the current state of all theirneighbours. In this way, the fitness of any one individual’s choice is effected by the entirestate of the network.Game theory provides a mathematical framework for interactions between rational selfinterested persons, where each involved individual, or player, attempts to predict the choicesof its counterpart and act accordingly to maximize its own reward. Evolutionary game theorywas developed as a framework for researchers to explore the survival of traits or strategieswithin a population in a biological context. In evolutionary game theory, individuals in apopulation have traits or strategies which my promote, suppress or be uninfluential in the in-91.1. Overviewdividual’s survival, ultimately affecting their survival fitness in the population. Importantly,evolutionary graph theory explores the temporal trend of these populations as individualsof each type die and reproduce at a rate which is related to their fitness. Concepts such asfitness, birth and death are easily generalized to non-biological systems, where fitness maybe related to monetary/wellbeing outcomes and birth and death may be of ideas, ratherthan beings. Extending the example of consumers deciding whether or not to choose asmartphone, we may consider the level of satisfaction their friends have with their currentsmartphones. In this scenario, satisfaction is analogous to biological fitness. An individualis more likely to copy the smartphone choice, or strategy, of friends who they know to behighly satisfied with their smartphones.The model of infection diffusion considered primarily in this thesis was the SIS model,however, these models do not consider the fitnesses of individuals and their state. To in-corporate this fitness information into interactions we consider the Moran process [73]. Thefitness of individuals is modeled by the predicted payoff of a two player game.In this thesis, we consider the role of network structure in both the pair approximationas well as the mean field approximation. We develop mean field and pair approximationmodels for the Moran Process. Mean field models are developed for death-Birth updating,Birth-death updating and for death-Rebirth updating (Diffusion). A heterogeneous model isalso developed for death-Birth updating in the Pair Approximation. All of these frameworksthen admit the investigation of the effects of network structure on the long term levels ofcooperation. The deterministic trajectories of these models are then compared against sim-ulated trajectories of the level of cooperation in a synthetic network with the desired degreedistribution. Further, we highlight that the mean field and pair-approximation methods arenot amenable to the optimal filtering of Chapter 3, but the infected population state (orcooperation state) can be estimated using particle filtering methods.101.2. Main Contributions1.1.4 Numerical ResultsIn Sec. 2.2 of this thesis, we provide theoretical justification for using a mean field dynamicalmodel of the state evolution, and in Sec. 3.1 we utilize this model in a filtering algorithm. Thetheoretical justification relies upon the assumptions of an accurate model for the underlyingsystem, sufficiently large network size, sufficiently long time horizon, and Gaussian state andobservation noise. To explore the validity of these assumptions we illustrate the performanceof the filtering algorithm on synthetic and real world data. Sec. 5.2 illustrates the SIS modeland the performance of the Bayesian filter on simulated data and examines the sensitivity ofthe filter to the underlying graph model (Erdo˝s-Re´nyi vs scale-free). Sec. 5.4 also presentsa detailed example using a real Twitter dataset. It is shown via a goodness of fit test thatSIS is a reasonable model for information propagation in the Twitter dataset and that theinfected population state can be tracked satisfactorily over time via the Bayesian filter.1.2 Main ContributionsIn the following subsections, a brief list of contributions of each chapter of this thesis is pro-vided. Detailed discussions of each contribution are provided in the corresponding chapters.1.2.1 SIS model and Mean Field DynamicsAs explained in Section 1.1.1, in Chapter 2, a formulation for infection diffusion based onthe SIS model is developed, and the mean field dynamics of the model are presented. Themain contributions of this chapter are:1. A formulation of discrete time infection dynamics on a social network based on the SISmodel.2. Showing that mean field dynamics can be used to approximate the discrete time infec-tion dynamics with a deterministic differential equation.111.2. Main Contributions3. Presenting two sampling methods that can be used to observe the infected populationstate of a social network.1.2.2 Non-linear filter and PCRLB for Bayesian Tracking ofInfected PopulationsAs explained in Section 1.1.2, in Chapter 3, a non-linear Bayesian filter is presented andapplied to the infection diffusion model of 2. The main contributions of this chapter are:1. A formulation of the non-linear Bayesian filter applied to an infection diffusion model2. The Posterior Cramer Rao Lower Bound of the Bayesian filter is computed for syntheticnetworks and compared to the mean square error of the filter estimate for both scale-free and Erdo¨s-Renyi networks.1.2.3 Extended Models in Network Analysis: Homophily andGame Theoretic DynamicsAs explained in Section 1.1.3, in Chapter 4, we consider extensions and alternatives to theSIS model of Chapter 2. The main contributions of this chapter are:1. An extension to the SIS model to include network homophily is presented.2. A framework based upon the Moran process in evolutionary game theory is presentedfor the spread of infection.3. The mean field and pair approximations to the Moran process are extended to complexnetworks.4. A particle filtering method is presented for filtering game theoretic network dynamics.121.3. Outline of the Thesis1.2.4 Numerical ResultsAs explained in Section 1.1.4, Chapter 5, the filter developed in Chapter 3 is applied tosimulated and real world infections. The main contributions of this chapter are:1. The effect of model misspecification and performance of the filter discussed in Chapter3 is examined.2. Numerical examples of the model extensions presented in Chapter 4 are presented.3. The performance of the filter discussed in Chapter 3 when applied to a real worldTwitter dataset is presented.1.3 Outline of the ThesisThe outline of the rest of this thesis is as follows.Chapter 2 formulates a model for representing the diffusion of infection, information,or opinion on a social network. First, we provide a mathematical description of a socialnetwork and introduce two specific types of networks: Erdo¨s-Renyi and scale-free. Next wepresent mean field theory for interacting particle systems. Then, we present a formulationof infection diffusion model based on the SIS model. We show that this discrete time modelcan be approximated by an ordinary differential equation using mean field theory, and thatthe mean field system evolves with polynomial dynamics. Finally, two sampling methodsare presented for observing the infected population.Chapter 3 considers the problem of attempting to track the infection diffusion modeledin Chapter 2. It begins by formulating the mean field dynamics as a polynomial state spaceevolution with observations given by the sampling methods of Chapter 2. An optimal non-linear Bayesian filter is then presented for estimating the infected population state. ThePCRLB is calculated to give a theoretical performance limit of the filtering algorithm forErdo¨s-Renyi and scale-free network types.131.3. Outline of the ThesisChapter 4 considers an extension to the infection model presented in Chapter 2 to modelnetworks with homophily, as well as an alternative game theoretic model based on workdone in Evolutionary Dynamics. The evolutionary model allows for the representation ofcompetition within the population, expanding the scope of diffusive scenarios which can beaccurately approximated. The problem of tracking the infected population state is consideredfor both models; limitations of the Bayesian filter are considered for the game theoretic modeland an particle filter is presented to track the infected population state.Chapter 5 presents numerical simulations and investigations to establish the performanceof the filters described in this thesis. First effect of misspecified models and sampling onfilter performance is investigated. Further, numerical examples of filtering on networks withhomophily and on networks with game theoretic diffusion are presented. Finally, a real worlddataset from Twitter is used to demonstrate the performance of the filter.Finally, Chapter 6 provides a conclusion of the work presented in this thesis and somefuture research directions.14Chapter 2SIS Model and Mean-fieldApproximationThis section discusses the discrete time SIS model [4], [17] and mean field dynamics for thediffusion of information in a social network.2.1 Social Networks and SIS dyamics2.1.1 Social NetworksHumans are social creatures, and prior to the information age, social networks were predom-inantly the networks formed by connection in the physical world. In this social network, anindividual might be considered connected to another if the two know each other, met reg-ularly, or exchanged letters. With the rise of the Internet, the term “Social Networks” hasbecame ubiquitous. It, or its translated variants, are used around the World, and massiveOnline Social Networks have become a cornerstone of communication, entertainment, infor-mation dissemination, and, increasingly, politics. In 2017, among the most dominant SocialNetworks are Internet leviathans Facebook, Twitter, RenRen, Sina Weibo, and Linkedin.The usage of online Social Networks span socio-economic condition, class, education, andinterest. The ubiquity of social networks motivates increased analysis to comprehend howthese networks effect our lives on a personal and societal level. The processes that occuron Social networks are often dependent on the network structure, and so understanding theinterplay between the social network structure and the processes that evolve on the social152.1. Social Networks and SIS dyamicsnetwork is critical to understanding the spread of disease, political influence, and technolog-ical adoption.Mathematically, a social network is represented by graph G:G = (V,E), where V = {1, . . . ,M}, and E ⊆ V × V. (2.1)Here V denotes the set of M vertices (nodes) and E denotes the set of edges (relationships).The degree of a node m is its number of neighbors3:Ξ(m) = |{g ∈ V : g,m ∈ E}|, | · | denotes cardinality.M(d) denotes the number of nodes in the social network G with degree d, and ρ(d) denotesthe degree distribution. That is, for d = 0, 1, . . . , D,M(d) =∑m∈VI(Ξ(m) = d), ρ(d) = M(d)M.Here I(·) denotes the indicator function andD denotes the maximum degree. Since∑d ρ(d) =1, ρ(d) can be viewed as the probability that a node selected randomly on V has degree d.The degree distribution of a network is a powerful summary measure which gives insightinto connectedness and structure of networks. Throughout this work, we concern ourselveswith complex networks. A complex network is a network with non-trivial network topo-logical features, such as a high degree of clustering or heavy-tailed degree distribution. Inparticular, in this thesis, we consider complex networks with two commonly studied degreedistributions: Erdo¨s-Renyi and scale-free networks.3A vertex g is adjacent to a vertex m if there is an edge between them. The neighbors of a node m areall the vertices that are adjacent to m.162.1. Social Networks and SIS dyamicsErdo¨s-Renyi NetworksErdo¨s-Renyi networks are generated by adding links between nodes randomly with someprobability p. These networks are predominantly of interest because of how well studiedthey are in literature. The degree distribution of Erdo¨s-Renyi networks isρ(d) =(M − 1d)pd(1− p)M−1−d (2.2)Scale-Free NetworksA scale-free network is a network with a degree distribution that follows a power law, thatis, ρ(d) ∝ d−γ. These networks are of interest because many real world networks are knowto be scale-free. Notably, network of outgoing links on the Internet has a scale-free degreedistribution [8], and an example considered in this work, the Social Network Twitter has ascale-free Degree distribution.2.1.2 Infection Processes on Social Networks: SIS ModelWe desire to know how infectious processes evolve on Social networks. Take for instance theEbola epidemic that occurred in West Africa from 2013-2016. Ebola is a virulent diseasethat spreads when an uninfected person comes into contact with the bodily fluids of aninfected individual. Infection can then only spread by people who have a physical connection:their neighbours in a network. Understanding this contact process and the role of networkstructure in the process dynamics may give researchers and health workers insight into howto slow, stop, and prevent the spread of such a deadly virus. Modeling such a process isa fundamental problem in epidemiology. One simple, but heavily investigated model is theSusceptible-Infected-Susceptible (SIS) model [4] [17]. In the SIS model, individuals can beinfected or susceptible, that is, they are in a state which is susceptible to becoming infected.172.1. Social Networks and SIS dyamicsAt each time instant n = 0, 1, . . ., an individual, or node, m has statess(m)n ∈ {1 (infected) 2 (susceptible) }.and the entire population can be described by the infected population state at time n, denotedby x¯n, the fraction of nodes with degree d at time n that are infected. That is4x¯n(d) =1M∑mI(Ξ(m) = d, s(m)n = 1), d = 1, . . . , D. (2.3)Infection can pass from infected individuals to their susceptible neighbours. In this thesis,this infection process is modeled probabilistically. At any discrete timepoint a susceptibleindividual, or node, has some probability of becoming infected, dependent on the number ofinfected neighbours and total neighbours it has. An entire population can then be describedas a state of a Markov chain where the transition probabilities depend on the degree of thenode and the number of infected neighbors.The infection dynamics evolves as follows5,6: :Step a.) A node m is chosen uniformly from the vertices in V . Suppose the node has degreeΞ(m) = d and the number of infected neighbors F(m)n = a. The state s(m)n of the node mat time n evolves from state i to j with transition probabilities P¯ij, where i, j ∈ {1, 2}.These transition probabilities depend on the node’s degree d and its number of infected4Note that M(d) = ρ(d)M and the infected population state is scaled by ρ(d) to have a common denom-inator.5The state of the network (2.1) at time n is given by an M dimensional vector with elements 1 or 2.The state space is given by {1, 2}M and the dynamics can be formulated in terms of a transition matrixbetween all possible states [74],[78]. In this thesis, we group the 2M states into D subsets, one for eachdegree, resulting in a state space of dimension 2M(d) for each degree d.6As an example, suppose there are M = 100 nodes, with 30 nodes having degree 1 and 70 nodes havingdegree 2. Then d ∈ {1, 2}, M(1) = 30, M(2) = 70. Suppose the fraction of infected nodes of degree 1 and2, are respectively, x¯n(1) = 3/30, x¯n(2) = 60/70. Then x¯n+1(1) ∈ {2/30, 3/30, 4/30} if a node of degree 1 ischosen in Step (a), while x¯n+1(2) ∈ {59/70, 60/70, 61/70} if a node of degree 2 is chosen in Step (a).182.1. Social Networks and SIS dyamicsneighbors a asP¯ij(d, a) = P(s(m)n+1 = j|s(m)n = i,Ξ(m) = d, F (m)n = a). (2.4)In real world applications, these transition probabilities can be chosen by experts, suchas a team of virologists who can estimate the probability that an individual interactingwith an Ebola infected person becomes, themselves, infected, or determined empirically,if many individual interactions can be observed.Step b.) The population state of degree d is updated as x¯n+1(d)1− x¯n+1(d) = x¯n(d)1− x¯n(d)+ 1M×I(s(m)n+1 = 1, s(m)n = 2)− I(s(m)n+1 = 2, s(m)n = 1)I(s(m)n+1 = 2, s(m)n = 1)− I(s(m)n+1 = 1, s(m)n = 2) .(2.5)Here 1 − x¯n+1(d) denotes the fraction of susceptible nodes of degree d at time n + 1.According to (2.5), the infected population state, x¯n+1, changes by1Mat every time-step depending on the transition probabilities.Here we have presented the SIS model with respect to its classical application in infection,but we can similarly model the spread of influence, interest, or ideas online. Informationcan be passed between individuals, for example by web posts or tweets, and this informationspreads in a way analogous to an infection. A user may tweet about a news event, forexample a shark attack in Sweden. This would be sensational, and the contacts of this usermay see that tweet and share or produce their own tweets about the topic. These people maythen continue to share, and so on and so forth, as the infection propagates throughout theonline social network. This process is known to spread in a way that mimics viral infections[84].192.1. Social Networks and SIS dyamics2.1.3 The Mean Field ApproximationThe SIS model provides a framework for understanding the spread of an infection, but inpractice, even for small network size, there are too many interactions to account for exactly,and the dimensionality of the SIS model, namely 2M - states, becomes intractable for modelingor signal processing algorithms. The social networks we would like to consider are sometimesmassive, such as the Facebook Social network of 1.86 billion active users [1]. To hope toutilize this model in any real world applications, we must use an approximation. For thiswe look towards mean field theory, and in this section we present a mean field dynamicsapproximation to the SIS model.Mean field theory was developed in statistical physics to approximate interactions be-tween large number of particles. Instead of modeling interactions between individual par-ticles, mean field approximations simplify the process by having particles interact with theensemble averages of all other particles. In social network analysis, the we apply this tech-nique to groups of individuals, instead of particles. We consider the propagation of a processspreading throughout the network, for instance a virulent disease or infection.In its most simplistic form, in the mean field evolution of a system, a summarizing statisticof the network, denoted here by α, evolves as a function of only itself:αn+1 − αn = θ[T +(αn)− T −(αn)] (2.6)Here T + is the probability that αn increases by θ, and T − is the probability that αndecreases by θ. θ > 0 is the diffusion parameter which corresponds to the fraction of thepopulation that interacts per timestep. This diffusion parameter, along with the transmissionprobabilities define the timescale of the infection7.This simple mean field theory does not permit a description of network structure beyond7Here we further note that θ may be dependent on time, n, and thus may non-linearly rescale the evolutionof the system. If, for instance, interactions between individuals occur more frequently on a Saturday than aTuesday, this information can be encoded into θ, to allow for more complex modeling.202.2. SIS Model and Mean Field Dynamicsregular networks and thus may fail to capture structures that may change the rate of ofinfection. For example, these mean field models do not adequately account for influentialindividuals who are well connected in a network. Consider a Twitter user with a greatnumber of followers, if they adopt an idea, they may dramatically increase the rate at whichthat idea spreads. Similarly, a social individual may singlehandedly sustain the spread of aninfection by repeatedly interacting with a great number of susceptible persons. To encodesome of these network effects, we consider a more complex mean field description whenmodeling infection on networks.Mean Field Approximation with Network StructureMean field theory can incorporate information about the network degree distribution toallow for the modeling of complex networks [86]. To represent a simple mean field systemwe utilized a single summarizing statistic, α. We can incorporate network structure intothe mean field dynamics by formulating the evolution of a vector of summarizing statistics.In this case we will consider the infected population state, which is differentiated by degreex¯n(d). Now the change in each infection probability x¯n(d) depends on degree d and the entirecurrent state x¯n:x¯n+1(d)− x¯n(d) = θ[T +(x¯n, d)− T −(x¯n, d)] (2.7)Here θ is the fractional change per interaction, and T +(x¯n, d) and T +(x¯n, d) denote theprobabilities that the fraction of infected nodes of degree d increases or decreases, respec-tively. This incorporation of degree information allows us to model networks and differentiatebetween networks based on their degree distributions.2.2 SIS Model and Mean Field DynamicsIf we again consider our social network, over which an infection evolves according to SISdynamics, we can express mean field dynamical equation in terms of our infected population212.2. SIS Model and Mean Field Dynamicsstates x¯n.The probability that a link, chosen uniformly at random among all links of a node chosenuniformly at random in the network at time n, points to an infected node is given by α(x¯n)as the infected link probability. Clearly,α(x¯n) =∑Dd=1 (# of links from infected node of degree d)∑Dd=1 (# of links of degree d)(2.8)=∑Dd=1 d ρ(d) x¯n(d)∑Dd=1 d ρ(d).With αn, the probability that a susceptible node with d infected neighbors has exactly ainfected neighbors is given by the binomial distribution as(da)αan(1−αn)d−a. Implicit to thisbinomial expression is the assumption that nodes associate randomly with other nodes in anetwork, or the homogeneous mixing assumption. This is a common assumption in modelingSIS dynamics and is discussed in [59]. We can now characterize the transition probabilitiesof the entire population process x¯n(d), whose sample path evolves according to (2.5). Thetransition probability from susceptible to infected is:P(x¯n+1(d) = x¯n(d) +1M(d))=d∑a=0P¯21(d, a)P (a out of d neighbours infected)=d∑a=0P¯21(d, a)(da)αan(1− αn)d−a. (2.9)The transition probability from infected to susceptible is evaluated similarly as:P(x¯n+1(d) = x¯n(d)− 1M(d))=d∑a=0P¯12(d, a)P (a out of d neighbours infected)=d∑a=0P¯12(d, a)(da)αan(1− αn)d−a. (2.10)With the transition probabilities (2.9) and (2.10), define the fractional change of infected222.2. SIS Model and Mean Field Dynamicsand susceptible nodes in the population as:P21(d, x¯n(d))∆=(1− x¯n(d))P(x¯n+1(d) = x¯n(d) +1M)(2.11)P12(d, x¯n(d))∆= x¯n(d)P(x¯n+1(d) = x¯n(d)− 1M)(2.12)where P¯12, P¯21 are defined in (2.4). Here (2.11) is interpreted as the fraction of susceptiblenodes (1−x¯n(d)) that become infected (P¯21) and (2.12), the fraction of infected nodes (x¯n(d))that become susceptible (P¯12).Below we formalize the mean field approximation and show that the deviation betweenthe approximation and the true state satisfies an exponential bound, given a time horizonthat is of the same order as the size of the network.Define the following 2− dimensional simplices for d = 1, 2, . . . , D,Sd = x¯n(d)1− x¯n(d) : x¯n(d) ∈ [0, 1]and the following product space, S = S1 × S2 × . . .× SD.With 1 denoting the D dimensional vector of ones, let∼xn = [x¯′n, (1− x¯n)′],P21(∼xn) = [P21(1, x¯n(1)), . . . , P21(D, x¯n(D)), P12(1, x¯n(1)),. . . , P12(1, x¯n(2)), . . . , P12(D, x¯n(D))]P12(∼xn) = [P12(1, x¯n(1)), . . . , P12(D, x¯n(D)), P21(1, x¯n(1)),. . . , P21(1, x¯n(2)), . . . , P21(D, x¯n(D))]. (2.13)Theorem 1 (Mean Field Dynamics). 1. The population dynamics Markov chain∼xn evolvesaccording to the following martingale difference driven stochastic difference equation:∼xn+1 =∼xn +1M[P21(∼xn)− P12(∼xn)] + ζn. (2.14)232.2. SIS Model and Mean Field DynamicsHere ζn is a 2D-dimensional martingale difference process8 with ‖ζn‖2 ≤ ΓM for somepositive constant Γ.2. Consider the following deterministic mean field dynamics process associated with thepopulation state:xn+1 = xn +1M[P21(xn)− P12(xn)] (2.15)where xn ∈ S, xn = [x′n, (1− xn)′], P21(xn) and P12(xn) are defined as in (2.13) andx0 =∼x0.Then for a time horizon of T points, the deviation between the mean field dynamics xnin (2.15) and actual population state∼xn in (2.14) satisfiesP{max0≤n≤T∥∥∥xn+1 − ∼xn+1∥∥∥∞ ≥ } ≤ C1 exp(−C22M)for some positive constants C1 and C2 providing T = O(M)9.Theorem 1 says that for a time horizon of T points, the deviation between the mean fielddynamics (2.15) and actual population state (2.14) satisfies an exponential bound. Since theexponential bound∑M exp(−C2M) is summable, the Borel-Cantelli lemma applies. Thisin turn implies that, if the deterministic population flow remains forever in some subset ofthe state space, then the stochastic process will remain in the same subset space with aprobability arbitrarily close to one, provided that the population is large enough, see [12].The proof of Theorem 1 given in the appendix, is inspired by the proof in [12] for approx-imating stochastic evolution in games. An important feature of the mean field dynamics isthat it has a state of dimension D compared to the intractable state dimensionD∏d=1(M(d)+1)of the infected population state x¯n10, with x¯n(d) ∈ {0, 1M , 2M , . . . , ρ(d)}.For the purposes of this thesis, the key outcome of Theorem 1 is that the mean field8ζn is a martingale difference process if E{ζn|Fn−1} = 0, where Fn−1 denotes the sigma-algebra generatedby {∼x0, ∼x1, · · · ,∼xn−1}.9Here ‖x‖∞ = maxi |x| denotes the l∞ norm of vector x.10Note that here x¯n is scaled by ρ(d).242.3. Samplingsystem xn has polynomial dynamics, where for every d ∈ {1, 2, . . . , D} we have for theinfected population state,xn+1(d) = xn(d) +1M[P21(d, xn(d))− P12(d, xn(d))] . (2.16)These polynomial dynamics will be exploited in chapter 3 for tracking the infected populationstate.2.3 SamplingTo track an infection we must have some method by which to learn about the underlyingstate of the system. The SIS model derived earlier in this chapter informs us on the approx-imate trajectory, however, for finite networks, there is state noise which causes the infectedpopulation state to drift. The state noise in this example arises because even if we knowthe exact network size, structure, and have estimates of the transmission probabilities thatcorrespond exactly to the true transmission probabilities, the interactions between users areprobabilistic. We may know how frequently an individual interacts with their neighbours,but may not know with whom they interact on a particular Sunday, or if they sneeze duringthis interaction. We encode this shortcoming in of knowledge in the probabilistic model.Even knowing the true probabilities, we do not know a priori, except in trivial cases withtransmission probabilities of 0 and 1, if an infection will spread from one user to another ata particular time. We therefore seek to observe the network and learn about the infectedpopulation state through sampling.We consider sampling the social network (2.1). For social networks with large numbers ofnodes, it is prohibitive to query each node. This necessitates choosing a sampling methodol-ogy in order to estimate the infected population state x from a subset of the total nodes. Eachsampled node is asked if it is infected or not and the reply (measurement) noted. Below, weconsider two popular methods for sampling large networks, see [3, 16, 20, 30, 38, 39, 49, 54]252.3. Samplingfor an overview of these methods:2.3.1 Uniform SamplingWhen a list of all members of the network is available, but perhaps have no precise informa-tion of network structure, we can use uniform sampling to estimate the infected populationstate. We can obtain an estimate of the infected population state from sampling a sequenceof ν individuals uniformly from the population. For each degree, ν(d) are sampled generat-ing a sequence of independent observations denoted by {ml, l ∈ {1, 2, . . . , ν(d)}} from thepopulation M(d). The estimate at time n is then given byxˆn(d) =1ν(d)ν(d)∑l=1I(s(ml)n = 1). (2.17)At each time n, xˆn can be viewed as noisy observation of the infected population state xn.2.3.2 MCMC Based Respondent-Driven Sampling (RDS)Respondent-driven sampling (RDS) was introduced by Heckathorn [38, 39] as an approachfor sampling from hidden populations in social networks and has gained enormous popularityin recent years. In RDS sampling, current sample members recruit future sample members.RDS sampling does not require knowledge of the structure of the target population. This isimportant as it allows observation of hidden populations. These populations may be hiddenby choice, such as sex workers and the MSM community11.The RDS procedure is as follows: A small number of people in the target populationserve as seeds. After participating in the study, the seeds recruit other people they knowthrough the social network in the target population. The sampling continues according to11MSM is the health community’s term for men who have sex with men and is a classic example of ahidden social network. Members of the social network may know other members, but they may be reluctantto identify themselves to researchers due to the danger or prejudice involved in being known as a memberof the community.262.3. Samplingthis procedure with current sample members recruiting the next wave of sample membersuntil the desired sampling size is reached. In real world applications, this recruitment istypically driven by monetary incentives for participants who successfully recruit their peers.RDS can be viewed as a form of Markov Chain Monte Carlo (MCMC) sampling (see [33]for an excellent exposition). Let {ml, l ∈ {1, 2, . . . , ν(d)}} be the realization of an aperiodicirreducible Markov chain with state spaceM(d) comprising of nodes of degree d. This Markovchain models the individuals of degree d that are sampled, namely, the first individual m1is sampled and then recruits the second individual m2 to be sampled, who then recruits m3and so on. Instead of the independent sample estimator (2.17), an asymptotically unbiasedMCMC estimate is then generated as∑ν(d)l=1I(s(ml)n =1)pi(ml)∑ν(d)l=11pi(ml)(2.18)where pi(m), m ∈M(d), denotes the stationary distribution of the Markov chain. For exam-ple, a reversible Markov chain with prescribed stationary distribution is straightforwardlygenerated by the Metropolis Hastings algorithm.In RDS, the transition matrix and, hence, the stationary distribution pi in the esti-mate (2.18) is specified as follows: Assume that edges between any two nodes i and j havesymmetric weights Wij (i.e., Wij = Wji, equivalently, the network is undirected). Node irecruits node j with transition probability Wij/∑jWij. Then, it can be easily seen that thestationary distribution is pi(i) =∑j∈V Wij/∑i∈V,j∈V Wij. Using this stationary distribution,along with the above transition probabilities for sampling agents in (2.18), yields the RDSalgorithm.Since a Markov chain over a non-bipartite connected undirected network is aperiodic [85],the initial seed for the RDS algorithm can be picked arbitratily, and the above estimator isasymptotically unbiased [33].The key outcome of Sec. 2.3 is that by the central limit theorem (for an irreducible272.4. Summaryaperiodic finite state Markov chain), the estimate of the probability that a node is infectedin a large population (given its degree) is Gaussian. For a sufficiently large number ofsamples, observation of the infected population state is approximately Gaussian, and thesample observations can be expressed asyn = Cxn + vn (2.19)where vn ∼ N (0,R) is the observation noise with the covariance matrix R and observationmatrix C dependent on the sampling process and xn ∈ RD is the infected population statewhich evolves according to the polynomial dynamics (2.16).2.4 SummaryThis chapter presented a formulation of a social network and infection diffusion dynamicsover that social network. Mean field theory was then utilized to approximate the discretetime diffusion dynamics with an ordinary differential equation. This SIS model and meanfield approximation incorporate information about the degree distribution of the underly-ing network, and so the infection dynamics depend on the underlying network structure.Furthermore, the mean field dynamics are polynomial in the infected population state, aproperty which will be utilized in the Bayesian filtering algorithm of Chapter 3. Finally, uni-form and respondent-driven sampling were presented as methods of generating observationsof the infected population state.28Chapter 3Tracking Infection and FilteringBoundsIn Sec. 2.2, we formulated the mean field dynamics for the infected population state as asystem with polynomial dynamics (2.16) and linear Gaussian observations (2.19) obtainedby sampling the network. In this section, we consider Bayesian filtering of the infectedpopulation state for large networks.The practical applications of knowing the infected probability state include better esti-mating the trajectories of social and infectious epidemics. Consider an infection on a networkof known degree distribution. Again, we consider a viral infection in a real world social net-work, such as the Ebola epidemic that occurred in West Africa from 2013-2016. It may beexpensive or prohibitively dangerous to generate large and frequent samples of the popula-tion to learn about the infection and how it is spreading, but the high level structure of thesocial network itself, such as the network degree distribution, may be approximately known(consider that the social network in the infected areas would likely have a similar underlyingnetwork structure to uninfected areas). We could use expert knowledge to estimate trans-mission matrices P12 and P21. Using these pieces of information we could perform smallersamples over time and use the filtering techniques outlined in this section to generate anestimate that is as accurate as much larger samples. Thus the ability to filter the sampledinformation and determine the infected probability state saves time, money and decreasesthe amount of exposure surveyors will have with infected individuals.It may similarly by difficult to sample on online social networks to learn about techno-293.1. Optimal Filtering of Infected Populationslogical adoption or information diffusion. Online social networks are massive, and responserates to petitions by interested parties are often very low. What is more, individuals becomequickly apathetic to barrages of requests online. Responses are therefore more likely if theycan be incentivised, but incentives dramatically increase the cost per sample request. Thuswe are again motivated to sample less frequently, and with smaller samples. It is throughfiltering that we can do this, while maintaining or surpassing the accuracy of observationsof population states generated by larger samples.3.1 Optimal Filtering of Infected PopulationsWe first describe how to express the mean field dynamics (2.16) in a form amenable toemploying the non-linear filter described in [40]. The non-linear filter in [40] can be usedwith state spaces with polynomial state space evolutions and linear observations, both ofwhich may have Gaussian noise. One of the benefits of integrating domain knowledge of thestate space evolution of our system into a dynamical mean field equation is that this equationhas polynomial dynamics. That is that the mean field equation 2.16 can be expressed,xn+1 = f(xn), where f(·) is a polynomial function. Once the state space has been thuslyexpresed, we can directly implement the filter, here described in Sec. 3.1.2.3.1.1 Mean Field Polynomial DynamicsConsider a D-dimensional polynomial vector f(x) ∈ RD:f(x) = A0 + A1x+ A2xx′ + A3xxx′ + . . . (3.1)303.1. Optimal Filtering of Infected Populationswhere the co-efficients A0, A1, . . . , Ai denote rank 1, 2, . . . , (i + 1) tensors. Aixx . . . x′ is avector with rth entry given byAixx . . . x′(r) =∑j1,j2,j3,...,jiAi(r, j1, j2, . . . , ji)xj1xj2 . . . xjiwhere Ai(r, j1, j2, . . . , ji) is the r, j1, j2, . . . , ji entry of tensor Ai and xj is the jth entry of x.Equation (2.16) has polynomial dynamics and can be expressed in the form of (3.1) byconstructing the tensors Ai from P¯12, P¯21 and ρ(d). We illustrate how to express the meanfield dynamics of (2.16) in the form (3.1) for12 d = 2. First, we note the average degree is∑Dd=1 d ρ(d) and the link probability given in (2.8) can be expressed as αn = φ′xn, where φis defined as:φ =[ρ(1)∑Dd=1 d ρ(d),2ρ(2)∑Dd=1 d ρ(d), . . . ,Dρ(D)∑Dd=1 d ρ(d)]′.Mean field dynamics for d = 2 in (2.16) is given as:xn+1(2) = xn(2) +1M[P21(2, xn)− P12(2, xn)] . (3.2)For all terms with factor P¯12 there is a corresponding term with factor P¯21, so for conveniencewe will account for all of the former with Ω and the latter with Ω∗.xn+1(2) = xn(2) + Ω− Ω∗ (3.3)12The procedure is same for all degrees.313.1. Optimal Filtering of Infected Populationswhere Ω is all the terms in an expanded (2.16), with d = 2, containing P¯12 andΩ = θ[1MP¯12(2, 0)(φ′xn)2 +2MP¯12(2, 1)(φ′xn) − 2MP¯12(2, 1)(φ′xn)2 +1MP¯12(2, 2)− 2MP¯12(2, 2)(φ′xn) +1MP¯12(2, 2)(φ′xn)2 − xnMP¯12(2, 0)(φ′xn)2 − 2xnMP¯12(2, 1)(φ′xn)+2xnMP¯12(2, 0)(φ′xn)2 − xnMP¯12(2, 2) +2xnMP¯12(2, 2)(φ′xn)− xnMP¯12(2, 2)(φ′xn)2]. (3.4)By grouping all of the terms of (3.3) by their order in xn(d), we can generate the tensors of(3.1). The contributions to the tensors of (3.1) by Ω are:A0(2) =θP¯12(2, 2)MA1(2, :) = φ[2θ(P¯12(2, 1)− P¯12(2, 0))M]A2(2, :, :) = φφ′[θ(P¯12(2, 0)− 2P¯12(2, 1) + P¯12(2, 2))M]A2(2, 2, :) = φ[2θ(P¯12(2, 1)− P¯12(2, 0))M]A3(2, 2, :, :) = −φφ′[θ(P¯12(2, 0)− 2P¯12(2, 1) + P¯12(2, 2))M](3.5)where Ai(j1, j2, . . . , ji−1, :, :) is a submatrix of tensor Ai and Ai(j1, j2, . . . , ji, :) is a subvectorof tensor Ai. By following (3.5) for Ω and Ω∗ for all d, we can generate all the coefficientsin the tensors of (3.1) from P¯12, P¯21, and ρ(d). We note that the polynomial that definesthe dynamics of the network is of order D∗ + 1, where D∗ is the highest degree node withcomplex dynamics, i.e: P¯21(d, a) = P¯12(d, a) = κ for all d > D∗ and for all a, where κ isconstant with respect to both d and a.323.1. Optimal Filtering of Infected Populations3.1.2 Optimal Filter for Polynomial DynamicsOptimal Bayesian filtering refers to recursively computing the conditional mean estimateE{xk|Yk}, for k = 1, 2, · · · where Yk denotes the observation sequence y1, . . . , yk. (Theterm optimal refers to the fact that the conditional mean estimate is the minimum varianceestimate). To estimate the infected population state using the sampled observations (2.19),we employ an optimal Bayesian filter recently developed for polynomial systems. In generalfor nonlinear or non-Gaussian systems, there is no finite dimensional filtering algorithm;however, it is shown in [40] that for Gaussian systems with polynomial dynamics, one candevise a finite dimensional filter (based on the Kalman filter) to compute the conditionalmean estimate.Below we present only the relevant terms in the model that are used in the filtering equa-tions, see [40] for details. The non-linear filter prediction and update equations are given as:Prediction step:xˆ−n = E[xn|Yn−1] = E[f(xn−1)|Yn−1]H−n = E[(xn − xˆn)(xn − xˆn)′|Yn−1]= E[(f(xn−1)− E[f(xn−1)|Yn−1] + vn−1)× (f(xn−1)− E[f(xn−1)|Yn−1] + vn−1)′|Yn−1]= E[f(xn−1)f(xn−1)′|Yn−1]− E[f(xn−1)|Yn−1]× E[f(xn−1)|Yn−1]′ + Qn−1(3.6)where xˆ−n denotes the priori expectation of the state vector (infected population state) xn;Yn = {Yn−1, yn} denotes the observation process; H−n denotes the priori state co-varianceestimate at time n; and vn denotes the Gaussian random variable representing the statenoise at time n, with covariance Qn.H−n and xˆn are initialized with some initial estimate xˆ0 and estimate covariance H−0 . Thefilter relies upon being able to compute the expectation E[f(xn−1)f ′(xn−1)|Yn−1] in terms ofxˆn−1 and H−n . When f(·) is a polynomial, f(xn−1)f(xn−1)′ is a function of xn−1, and the333.2. Posterior Crame`r-Rao Boundsconditional expectations in (3.7) can be expressed only in terms of xˆn−1 and H−n , permittinga closed form13 prediction step.Update step:xˆ+n = E[xn|Yn] = xˆ−n +H−n C ′(Rn + CH−n C ′)−1(yn − Cxˆ−n )Kn = H−n C′(Rn + CH−n C′)−1H+n = (I −KnC)H−n (I −KnC)′ +KnRnK ′n(3.7)where xˆ+n denotes the posteriori expectation of the state vector; H+n denotes the posterioristate co-variance estimate at time n; C denotes the state observation matrix; Rn denotes theobservation noise co-variance matrix; Kn denotes the filter gain; and I denotes the identitymatrix.Since the dynamics of (2.16) are polynomial, the prediction and update steps of (3.6) and(3.7) can be implemented without approximation. These expressions constitute the optimalnon-linear filter and can be used to track the evolving infected population state.3.2 Posterior Crame`r-Rao BoundsThe PCRLB provides a theoretical performance limit for an unbiased estimator in a nonlinearfiltering problem [79]. While not of direct practical application, the PCRLB provides anunderstanding of the fundamental limits of our filtering algorithm, and provides a methodby which we can investigate the properties of a network. The PCRLB can, for instance,inform us on the difference in performance limit of different network types, or under differentsampling schemes. Understanding the PCRLB therefore allows us to better design our datagathering methodologies. In PCRLB, the Fisher Information Matrix (FIM) is derived byevaluating the expectation with respect to the joint distribution of the measurements andthe system states up to the current time. It is determined only by model parameters and13For an explicit implementation of such a filter for a third order system with an exact priori updateequation for H−n and xˆ−n , see [40].343.2. Posterior Crame`r-Rao Boundsthe initial state and is thus independent of any specific realization of the system state.Below we compute the PCRLB for the polynomial dynamical system (2.16) with linearobservations with Gaussian noise (2.19). We first consider the dynamical system with meanzero Gaussian state noise having the covariance matrix Q, then formulate the PCRLB forthe polynomial dynamical system (2.16) without any state noise.Let Xn = {Xn−1, xn} denote the state sequence at time n and Jn denote the ((n+1)D×(n+1)D) Fisher information matrix14 of Xn [79]. The following recursion is used to evaluate Jnin [79]. Let P(X, Y ) = pn denote the joint distribution and ∆yx = ∇x∇′y denote the vectordifferential operator. Then [79],Jn = E{−∆xnxn log(pn)} − E{−∆Xn−1xn log(pn)}[E{−∆Xn−1Xn−1 log(pn)}]−1E{−∆xnXn−1 log(pn)}and corresponds to the inverse of the D ×D right lower block of [Jn]−1. The recursion forJn is given by:Jn+1 = Λ22n − Λ21n(Jn + Λ11n)−1Λ12nwhereΛ11n = E{(∇xnf ′n(xn))Q−1n (∇xnf ′n(xn))′}Λ12n = E{∇xnf ′n(xn)}Q−1nΛ21n = {D12n }′Λ22n = Q−1n + CR−1n C′(3.8)where C is the linear observation matrix and R is the observation noise covariance matrix14The error covariance P is,P = E{(Xˆ −X)(Xˆ −X)′} ≥ J−1where J is the Fisher Information matrix, X is the state, Xˆ is the state estimate, and Y measured data.The elements of the Fisher information matrix J are given by:Jij = E(∂2 log px,y(X,Y )∂Xi∂Xj)353.2. Posterior Crame`r-Rao Boundsof (2.19) and ∇xnf ′n(xn) is given as:∇xnf ′n(xn) =[∂f∂xn(0),∂f∂xn(1), . . . ,∂f∂xn(D)]′∂f∂xn(0)= 0 +∂∂xn(0)[A1xn] +∂∂xn(0)[A2xnx′n] +∂∂xn(0)[A3xnxnx′n] (3.9)Thus∇xnf ′n(xn) = A1 + (A2 + A′2)xn + (A3ijk + A3jki + A3kji)xnx′n (3.10)where A3ijk indicates the ordering of indices of the tensor A3 and is analogous to a higherdimensional transpose.When there is no state noise in the system evolution, as in the mean field dynamicsequation (2.16), to compute the PCRLB, we perturb the state evolution in (2.16) withpairwise independent Gaussian random vectors having covariance matrix Q = I, replacingthe singular state evolution by a perturbed system p(xn+1|xn). For the purturbed systemwe have,− log p(xn+1|xn) = c+ 12{xn+1 − fn(xn)}′Q−1 {x¯n+1 − fn(xn)} (3.11)where c is a constant. The recursion for Jn is then given by:Jn+1 = Λ22,n − Λ21,n(Jn + Λ11,n)−1Λ12,nwhereΛ11,n =1E{(∇xnf ′n(xn))(∇xnf ′n(xn))′}Λ12,n =1E{∇xnf ′n(xn)}Λ21,n = {Λ12n }′Λ22,n =1I + CR−1n C′(3.12)363.2. Posterior Crame`r-Rao BoundsPCRLB: Erdo˝s-Re´nyi vs Scale-FreeWe used the above formulas to compute the PCRLB for the mean field dynamics model(2.16), under two simulation schemes.Scheme A.) Degree distribution based: The deterministic mean field evolution given in (2.16) issimulated with linear Gaussian observations (2.19) and zero state noise. The infectiontransition probabilities P¯12 and P¯21 were chosen uniformly at random over [0, 1]. Theobservation noise covariance matrix R is a random positive definite matrix15 withentries in [0, 10−6]. We note that the diffusion parameter θ = 1, observation matrixC = I, and maximum degree D = 20 unless otherwise stated.Scheme B.) Network based: The network based simulation consisted of generating a network, prop-agating an infection according to Sec.2.1.2 over this network, and then finally samplingthat network using the uniform sampling mechanism described in Sec.2.3. For the net-work simulations, scale-free networks of M = 10000 nodes were generated using theBarabasi-Albert model and Erdo˝s-Re´nyi networks of M = 10000 were generated byrandomly creating links between nodes. The infection was initialized by infecting nodesat time n = 0 with probability 0.01. This infection was propagated for 10000 timesteps.At each timestep 5000 samples were obtained according to the uniform sampling de-scribed in Sec.2.3 to generate an observation at that timestep. These observations werethen filtered. State and observation covariance matrices Q and R used in filtering werecomputed empirically from the data. State and observation noise covariance matriceswere estimated to be constant for all time steps n, that is Qn = Q and Rn = R for alln.We computed the PCRLB for two different network types:Type a.) Scale-free network with degree distribution ρ(d) ∝ d−γ where γ = 2.7. Such networks15One way is to generate a sample from the probability measure on real symmetric positive-definite matricessuch as Inverse-Wishart.373.3. Summaryarise in online social networks such as Twitter [51] and in the link network of the WorldWide Web [2].Type b. Erdo˝s-Re´nyi network with degree distribution ρ(d) ∝ e−λdd!where λ = 2.7. 16Fig. 3.1 displays the PCRLB and mean square error of the degree based simulations.Interestingly, it can be seen from Fig. 3.1 that both PCRLB and its slope are insensitiveto the underlying network structure when observation noise covariance, R in (2.19), is notnetwork dependent. Under the mean field model, differences between PCRLB for differentnetwork types can therefore be attributed to either different state or different observationnoise covariances. The PCRLB and Bayesian filter estimate mean square error of networkbased simulations are shown in Fig. 3.2. In Fig. 3.2, a network of 10000 nodes is generatedand an infection is propagated through the network over 10000 time steps. This infection isthen sampled 5000 times at each timestep to generate observations of the infected populationstates. State and observation covariances are computed empirically and assumed constantthroughout the duration of the system dynamics. The slower convergence of the filter esti-mate on an Erdo˝s-Re´nyi when compared to a scale-free network filter estimate correspondsto the Erdo˝s-Re´nyi network observations having a larger observation noise covariance. It isseen that PCRLB gives a lower bound on the MSE.3.3 SummaryThis Chapter presents a filtering method for tracking the infected population state of asocial network. This enables the combination of apriori knowledge of the infection dynamicswith observations to better estimate the fraction of infected users in a network. First, weexpressed the mean field dynamics of Sec. 2.1.2 in a polynomial form, which is amenablethe non-linear Bayesian filtering algorithm described in Sec. 3.1. The non-linear Bayesianalgorithm filters observations to produce an optimal estimate of the infected population16The value λ = 2.7 was chosen since it is similar to the the out-degree of the World Wide Web, see [15]383.3. SummaryTime step(k)0 50 100 150 200 250 300Log-MeanSquareError-13-12-11-10-9-8-7-6-5-4Bayesian optimal filter - Scale-Free NetworkBayesian optimal filter - Erdos Renyi NetworkPCRLB - Scale-Free NetworkPCRLB - Erdos Renyi NetworkFigure 3.1: Filtered estimate log mean square error and PCRLB for deterministic mean fieldevolution with scale-free and Erdo˝s-Re´nyi degree distributions.393.3. SummaryTime step(k)0 1000 2000 3000 4000 5000Log-MeanSquareError-9-8-7-6-5-4-3-2-10Bayesian optimal filter - Scale-Free NetworkBayesian optimal filter - Erdos Renyi NetworkPCRLB - Scale-Free NetworkPCRLB - Erdos Renyi NetworkFigure 3.2: Filtered estimate log mean square error and PCRLB of simulated networks -scale-free and Erdo˝s-Re´nyi.403.3. Summarystate. This online estimation tracks the infected population state as it evolves over time.Further, the PCRLB was computed to provide a theoretical performance limit of the filteringproblem. The PCRLB was computed for two network types, scale-free networks and Erdo˝s-Re´nyi networks, and compared.41Chapter 4Extended Diffusion Models:Homophily and Game TheoreticDynamicsThe SIS model, like any model, is a simplification of a real world phenomena. Infections inthe real world operate with far more complexity than can be adequeately summarized by asimple SIS model. To better describe real world infections and opinion diffusion we considera powerful extension to the SIS model, as well as an alternative model for the underlyinginfection dynamics. We extend the SIS model to incorporate homophily. Homophily is thetendency for individuals to engage with people similar to themselves [63]. These similaritiescharacterize distinct groups, and these groups may affect how infections spread through anetwork. For example these distinct groups may be formed by the political affiliations ofindividuals within the network. This affiliation may then impact how knowledge of a newsitem diffuses through the network. This incorporation of types and groups into the modeldoes not impede the use of the Bayesian polynomial filter, and thus we can still use thisoptimal filtering algorithm to track infections in networks with homophily. The alternativemodel for infection dynamics we present in this thesis is one which inorporates game theoreticinteractions into the network dynamics. The Moran process from evolutionary dynamics isadopted as the underlying model for infection (or rather, in this case cooperation) diffusion.The Moran process models the dynamics of a population where individuals belonging tosubtypes or substrategies interacts temporally on a stuctured network. This model, when424.1. Infected Population Estimation in Homophily Networksapproximated with mean field dynamics, does not generally have a polynomial state evolu-tion, which means the polynomial filter can not be applied without futher approximation.We also consider the pair approximation for the Moran process, in which the state incorpo-rates additional information about the local infection/cooperation structures of the network.The pair approximation does not admit a polynomial state evolution. We therefore utilizea particle filter for tracking the infected/cooperating population state which can be utilizedalong with the mean field or pair approximations.4.1 Infected Population Estimation in HomophilyNetworksAn infection doesn’t affect the elderly the same way it affects the young. Minor annoyancesfor a student athlete can be hospitalizations for the very young or immunocomprimised.Similarly, your engagment with a news article may depend on your prior previous affiliations,and your initial exposure to that article may depend on how many of your friends share yourpolitical affiliation or subscribe to different beliefs. This more complex diffusion and networkdependence on individual types can be modeled by a network with homophily. In this sectionwe extend the SIS model presented in Sec. 2.2 to include homophily, using the approach in[57].4.1.1 SIS Model with HomophilyLet the social network be comprised of B distinct types of individuals17. Let µb denote thefraction of individuals of type b, for b ∈ {1, 2 . . . ,B}. We define the total number of nodesof type b asM b =∑m∈VI(T(m) = b). (4.1)17For example these types may be groups characterized by political affiliation, age, or income.434.1. Infected Population Estimation in Homophily Networkswhere we define T(m) to be the type of node m. We define the total number of nodes ofdegree d and type b asM b(d) =∑m∈VI(Ξ(m) = d,T(m) = b). (4.2)The degree distributions of each type can be different from one another, and so we specifythe degree distribution of each type b as ρ[b](d), whereρ[b](d) =M b(d)M b. (4.3)These different types interact with each other according to the transition matrix Υ, withentries ςbc = P (individual of type c meets individual of type b), and is defined as:Υ =ς11 . . . ς1B... . . ....ςB1 . . . ςBB (4.4)We denote by x¯n,b(d) the fraction of agents of type b and degree d that are infected at time n.Recall that in Sec. 2.1.2 we defined the infected link probability, α. Since types of individualsinteract with others at different rates, ς, they do not share the same infected link probability.Instead of one infected link probability, we define a vector of infected link probabilities, onefor each type:αb(x¯n) =B∑c=1ςbc∑Dd=1 d ρ[c](d) x¯n,c(d)∑Dd=1 d ρ[c](d). (4.5)In an SIS model with homophily, the transition probabilities between infected and susceptiblestates, first introduced in (2.4), may depend on the type b of the node m. Therefore we defineP¯ bij(d, a) = P(s(m)n+1 = j|s(m)n = i,Ξ(m) = d, F (m)n = a,T(m) = b)(4.6)As in Sec. 2.1.2, with αn,i and P¯bij, the transition probability from susceptible to infected444.1. Infected Population Estimation in Homophily Networksfor an individual m of type b can be evaluated as:P(x¯n+1,b(d) = x¯n,b(d) +1M b(d))=d∑a=0θP¯ b21(d, a)(da)αan,b(1− αn,b)d−a. (4.7)where M(d, b) =∑m∈V I(Ξ(m) = d,T(m) = b)is the total number of nodes of degree d andtype b. Similarly, the transition probability from infected to susceptible is evaluated as:P(x¯n+1,b(d) = x¯n,b(d)− 1M b(d))=d∑a=0θP¯ b12(d, a)(da)αan,b(1− αn,b)d−a. (4.8)With the transition probabilities (4.7) and (4.8), we define the change on the fraction ofinfected and susceptible nodes in the population as:P b21(d, x¯n,b(d))∆=(1− x¯n,b(d))ρ[b](d)P(x¯n+1,b(d) = x¯n,b(d) +1M b(d))(4.9)P b12(d, x¯n,b(d))∆= x¯n,b(d)ρ[b](d)P(x¯n+1,b(d) = x¯n,b(d)− 1M b(d))(4.10)where P¯ b12, P¯b21 are defined in (4.6). As in Sec. 2.2, we use Pb12 and Pb21 along with Theorem 1 toexpress a mean field system xn with polynomial dynamics where for every d ∈ {0, 1, 2, . . . , D}and every b ∈ {1, 2, . . . ,B} we have for the infected population state,xn+1,b(d) = xn,b(d) +1M[P b21(d, xn,b(d))− P b12(d, xn,b(d))]. (4.11)The addition of homophily to the SIS model changes the state evolution, but P bij(d, xn,b(d))of (4.11) and Pij(d, xn(d)) of (2.16) are both polynomial functions of their respective statesxn,b(d) and xn(d). Thus the addition of homophily preserves the polynomial dynamics of(2.16) and we can again exploit this polynomial state evolution to apply the optimal filterof Sec. 3.1.454.2. Moran process: Game Theoretic Network Dynamics4.2 Moran process: Game Theoretic NetworkDynamicsIn this section, we present an alternative model for infection and opinion diffusion on socialnetworks: the Moran process. The Moran process is a model used in mathematical biol-ogy to investigate and analyze evolution and the adoption of mutations in a population.In the Moran process, diffusion is dependent on the defined by the expected outcomes ofgames between individuals on the network. This process is distinguished from the SIS modelbecause the infection/cooperation transmission probabilities may depend on the current in-fection/cooperation state of the network. We then present mean field approximations anda pair approximation to the Moran process which incorporate network structure throughthe network’s degree distribution, extending upon existing work in evolutionary dynamicswhich presents mean field and pair approximations without additional network structureinformation [56] [68].4.2.1 Moran ProcessWe consider the Moran process with frequency dependent fitness to model the evolution ofcooperation in finite populations [56] [68]. Similar to the discrete SIS model, the Moranprocess considers a population of constant size M . In the Moran process individuals interact-play a game- with their environment. These individuals can adopt one of two states: theycan cooperate (C) with their neighbors or defect (D). Thus the state of each node is binary,as in Sec. 2.2, and at each time instant n = 0, 1, . . ., an individual, or node, m has statess(m)n ∈ {1 (Cooperating), 2 (Defecting) }.The individual is then assumed to receive some payoff which is dependent on their action andthe actions of their neighbors. Payoffs for games between two players can be summarized464.2. Moran process: Game Theoretic Network Dynamicswith a 2x2 table, where the row player is the individual chosen for interaction and the columnplayer is a neighbour with which he interacts:C D C a bD c dThat is, if an individual chooses to cooperate, and the neighbour with whom he is inter-acting chooses to also cooperate, he receives a payoff of a. Similarly, if he cooperates and hisneighbour defects, he receives a payoff b. The quintessential example of a two-player game isthe prisoner’s dilemma (PD), a game in which c > a and d > b. In the prisoner’s dilemma,two criminals are being individually questioned for a crime they committed. The investiga-tors have enough evidence to convict each prisoner for minor sentences, but need testimonyfor major convictions. To coax each prisoner to betray the the other, the investigators of-fer a deal: “Squeal on your fellow prisoner and we’ll reduce your sentence”. Cooperatingwith your fellow prisoner would then be staying silent, and not squealing on them. We canimagine a payoff matrix which looks like:C D C −1 −5D 0 −3Thus, if both prisoners stay silent each receives a minor sentence, but if one squeals and theother stays silent, then the silent individual is punished heavily and the prisoner who squealsgoes free. If both prisoners squeal, both receive a moderate sentence. This scenario is ofinterest, because the cooperating strategy is dominated by the defecting strategy. Regardlessof what the other prisoner does, a rational prisoner should choose to defect to minimize itssentence. This dominant strategy does not, however, minimize jail time for the group. The474.2. Moran process: Game Theoretic Network Dynamicslowest net jail time is both prisoner’s cooperating with each other and staying silent. Inthe structured Moran process, we model a population which interacts through playing thesegames.In the Moran process, the specific neighbour with whom an individual interacts is chosenfrom amongst its set of neighbours. Under frequency dependent selection, we define thepayoff for a cooperator with d neighbors, a cooperating neighbors, and d − a defectingneighbors to bepi†1(d, a) = a · a+ (d− a)bSimilarly, the payoff for a defector with d neighbors, a cooperating neighbors, and d − adefecting neighbors ispi†2(d, a) = c · a+ (d− a)d.We do not always know the exact neighbourhood of the individuals we are interested in.We thus look to the average fitnesses, given some infected link probability or, analogously,cooperating link probability αn:pi1(d, αn) = a · αnd+ (d− αnd)b pi2(d, αn) = c · αnd+ (d− αnd)d. (4.12)We presume that the fitness of an individual is linear in these payoffs. For some selec-tion strength w ∈ [0, 1] we have that the fitness of an individual with d neighbours and acooperating neighbours is:f †1(d, αn) = 1− w + wpi†1(d, a) , f †2 = 1− w + wpi†2(d, a)and a randomly selected cooperating/defecting individual have respective fitnesses:f1(d, αn) = 1− w + wpi1(d, αn) , f2 = 1− w + wpi2(d, αn).484.2. Moran process: Game Theoretic Network DynamicsNote that the standard Moran process is for an unstructured population, that is allindividuals interact with all others. This is equivalent to the complete network case, whereindividuals are arranged in a complete graph structure with d = N − 1, where N is the totalnumber of individuals in the network.Types of UpdatingIn the Moran process, at each timepoint n an update occurs. The Moran process assumes aconstant network size of M individuals, and at each timepoint n an update occurs in whichone individual “dies” or becomes open to changing their opinion/adopt a product/becomeinfected and another individual gives “birth” or influences/infects the now open individ-ual. The probability that a particular individual “dies” or “gives birth” may depend ontheir fitness. We first present two types of updating commonly investigated in evolutionarydynamics [67].1. death-Birth updating: One individual in the population, m, is selected uniformly atrandom to die. A new individual is born at the vacant spot and its neighbors com-pete to influence its strategy s(m)n+1. The strategy of the newborn individual is chosenprobabilistically from its neighbour’s strategies with probability weight according tothe fitness of the competing strategies.P¯21(d, a, αn) = P(s(m)n+1 = 1|s(m)n = 2,Ξ(m) = d, F (m)n = a, αn)=af1(d, αn)af1(d, αn) + (d− a)f2(d, αn) .(4.13)2. Birth-death updating: One individual from the population, m′, is chosen with proba-bility proportional to their fitness to reproduce. This individual then picks one of itsneighbours, m, uniformly at random, and changes the strategy of that neighbour to itsown.First, we consider the probability that a node m′ is picked from the population to494.2. Moran process: Game Theoretic Network Dynamicsreproduce. A randomly chosen node has degree d and a cooperating neighbours withprobability x¯n(d)ρ(d)(da)αdn(1 − αn)(d−a) and has fitness f2(d, a). We can, therefore,express the probability that the reproducing node is a cooperator with degree d anda cooperating neighbours as the fractional fitness of all nodes with degree d and aneighbours over the fitness of the entire network.P(s(m′)n = 2,Ξ(m′) = d, F (m′)n = a)=1Zx¯n(d)ρ(d)(da)αdn(1− αn)(d−a)f2(d, a) (4.14)where Z is a normalization function representing the total fitness in the network:Z =D∑d=1d∑a=1x¯n(d)ρ(d)(da)αdn(1− αn)(d−a)f2(d, a)+D∑d=1d∑a=1(1− x¯n(d))ρ(d)(da)αdn(1− αn)(d−a)f1(d, a).(4.15)Second, given thatm′ is chosen to reproduce, it selects a neighbour uniformly at randomto influence. The node m′ has a cooperating neighbours and d defecting neighbours,so a defecting neighbour is influenced with probability:P¯21(d, a, αn) = P(s(m)n+1 = 1|s(m)n = 2, s(m′)n = 2,Ξ(m′) = d, F (m′)n = a, αn)=d− ad(4.16)We also present a third updating method, death-Rebirth updating, because the popula-tion dynamics can be expressed using the SIS dynamics presented in Sec. 2.2.1. death-Rebirth updating (Diffusion): One individual in the population is selected uni-formly at random to die. This focal individual is removed and a new individual isborn in its place. The newborn then chooses a strategy according to the fitness of each504.2. Moran process: Game Theoretic Network Dynamicsstrategy available to it, given its neighbourhood.P¯21(d, a, αn) = P(s(m)n+1 = 1|s(m)n = 2,Ξ(m) = d, F (m)n = a, αn)=f †1(d, a)f †1(d, a) + f†2(d, a). (4.17)4.2.2 Mean Field and Pair Approximations to the Moran ProcessHere we present mean field state evolutions for each of the updating scenarios described inSec II.Mean Field Dynamics: death-BirthIn the death-Birth mean field approximation, a node m of degree d and state s(m)n = 2(defecting strategy) is chosen at time n with probability ρd(1 − x¯n(d)), and this node hasa neighbours with a cooperating strategy with probability(da)αan(1 − αn)(d−a) , finally acooperating neighbour is chosen to influence m with probability given in 4.13. Analogousprobabilities exist for calculating the probability that a cooperating strategy is replaced bya defecting strategy. Thus the update equation for population state x¯n in the death-Birthscenario is:x¯n+1(d)− x¯n(d) = 1M(T +(x¯n, d)− T −(x¯n, d))T +(x¯n, d) =d∑a=0(da)αan(1− αn)(d−a)af1(d, αn)af1(d, αn) + (d− a)f2(d, αn)(1− x¯n(d))T −(x¯n, d) =d∑d′=0(da)αan(1− αn)(d−a)(d− a)f2(d, αn)af1(d, αn) + (d− a)f2(d, αn) x¯n(d)(4.18)Recall that αn is a function of x¯n. In the Moran process, when the prisoner’s dilemma isplayed under weak selection w << 1 there may paradoxically be a surviving fraction ofcooperators in the network.514.2. Moran process: Game Theoretic Network DynamicsMean Field Dynamics: Birth-deathIn the Birth-death mean field approximation, a node is chosen with probability proportionalto its fitness, so a node with a cooperating strategy and degree d is chosen and replaces adefecting node m with probability.P(s(m)n = 1|s(m)n = 2)=P(s(m)n = 1|s(m)n = 2, s(m′)n = 2,Ξ(m′) = d, F (m′)n = a)× P(s(m′)n = 2,Ξ(m′) = d, F (m′)n = a)=1ZD∑d=1d∑a=1x¯n(d)ρ(d)(da)αdn(1− αn)(d−a)f2(d, a)d− adFor convenience we defineΩ1 =1ZD∑d=1d∑a=1(1− x¯n(d))ρ(d)(da)αdn(1− αn)(d−a)f1(d, a)adΩ2 =1ZD∑d=1d∑a=1x¯n(d)ρ(d)(da)αdn(1− αn)(d−a)f2(d, a)d− adIf a cooperating node reproduces and replaces a defecting node then the probability that thedefecting node has degree d is given by ρ(d)(1−x¯n(d))d∑d ρ(d)(1−x¯n(d))d . We notice, however, that the focalnode can have any degree and still reproduce on a node of different degree. Thus we mustsum the contributions from all degrees and our update is given:x¯n+1(d)− x¯n(d) = 1M(T +(x¯n, d)− T −(x¯n, d))T +(x¯n, d) = ρ(d)(1− x¯n(d))d∑d ρ(d)(1− x¯n(d))dΩ2ZT −(x¯n, d) = ρ(d)x¯n(d)d∑d ρ(d)x¯n(d))dΩ1Z(4.19)524.2. Moran process: Game Theoretic Network DynamicsAll degrees have had their dynamics tied together by the terms ΩiZ , i ∈ {1, 2}. This mean fieldapproximation does not allow for nodes of different degree to reach different equilibria. Thereason why cooperation cannot persist when the prisoner’s dilemma is played is that, dueto the nature of random replacement (regardless of fitness), local structures do not promotecooperation.Mean Field Dynamics: death-Rebirth (Diffusion)In the death-Rebirth mean field approximation, a node m of degree d and state s(m)n = 2(defecting strategy) is chosen at time n with probability ρd(1− x¯n(d)), and this node has aneighbours with a cooperating strategy with probability(da)αan(1− αn)(d−a). It now changesits state according to its own fitness in its neighbourhood. It therefore becomes a coop-erator with probability proportional to the fitness of the cooperating strategy. Analogousprobabilities exist for calculating the probability that a cooperating strategy is replaced bya defecting strategy. Thus the update equation for population state x¯n in the death-Rebirthscenario is:x¯n+1(d)− x¯n(d) = 1M(T +(x¯n, d)− T −(x¯n, d))T +(x¯n, d) =d∑a=0(da)αan(1− αn)(d−a)f †1(d, a)f †1(d, a) + f†2(d, a)(1− x¯n(d))T −(x¯n, d) =d∑d′=0(da)αan(1− αn)(d−a)f †2(d, a)f †1(d, a) + f†2(d, a)x¯n(d)(4.20)Because the fitnesses of the reproducing node does not depend on the current cooperationstate of the network, the dynamics of the moran process with death-Rebirth updating can berepresented exactly with the SIS model. The death-Rebirth Moran process can be trackedusing the optimal Bayesian filter of Chapter 3.Mean field approximations do not incorporate local structural information that can be534.2. Moran process: Game Theoretic Network Dynamicseffect long term cooperation (or infection) levels. We can extend our analysis to determinethe effect network structure has on long term cooperation. This can be done using the pairapproximation.Pair Approximation: death-BirthThe mean field description of spreading cooperation does not incorporate local structuralinformation. The network structure is assumed to be entirely encoded in the degree dis-tribution, and the population is assumed to be mixed such that there are no connectioncorrelations between nodes of the same state. That is, cooperating nodes do not have ahigher probability of being connected to other cooperating nodes than defecting nodes beingconnected to cooperating nodes. This assumption can be relaxed using the pair approxima-tion. Local information can be critical to appropriately characterizing the Moran process.Under weak selection prisoner’s dilemma cannot result in a positive steady state fraction ofcooperators, however under the pair approximation, the steady state cooperating populationstate may have a positive fraction of cooperators. This is because local areas of cooperatorsmay be more fit than defectors, resulting in higher population fitness, even when the indi-vidual stategy for defecting is always the better (dominant) individual choice. In the pairapproximation, the state of the network is summarized not simply as the degree segregatedcooperation probabilities, x¯n, but also the conditional probabilities that a neighbour of acooperating individual is cooperating/defecting, the conditional infected population state u¯,where:u¯i|j(d) =1MM∑m=1I(Ξ(m) = d, s(m)n = i∣∣s(g)n = j : g ∈ V : g,m ∈ E), d = 1, . . . , D. (4.21)These conditional probabilities provide a more local description of the cooperation state.Note that we can also represent this conditional probability as the joint probability thata node of degree d and a randomly chosen neighbour are both cooperators u¯11(d) over the544.2. Moran process: Game Theoretic Network Dynamicsprobability that a node of degree d is a cooperator x¯n(d), that is u¯i|i(d) =u¯ii(d)x¯n(d). We canrepresent the state of the network can be described in as an augmented state:z¯n(d) = [x¯n(d), u¯11(d)] , d = 1, . . . , D. (4.22)We can express all joint and conditional probabilities relating cooperating and defectingneighbour pairs as follows:u¯11(d) = u¯1|1(d) · x¯n(d) (4.23)u¯21(d) = u¯12(d) = (1− u¯1|1(d))x¯n(d) (4.24)u¯22(d) = 1− x¯n(d)(2− u¯1|1(d)) (4.25)u¯2|2(d) = 1−x¯n(d)(1− u¯1|1(d))(1− x¯n(d)) (4.26)u¯1|2(d) =x¯n(d)(1− u¯1|1(d))(1− x¯n(d)) (4.27)u¯2|1(d) = 1− u¯1|1(d) (4.28)Pair approximation can, for instance, more accurately describe a population in which thereare high cooperator-cooperator/defector-defector correlations. These correlations may allowcooperation to persist. The superiority of this approximation is made evident in the settingwhere two dense network substructures are connected by a chain of individuals, such thatthe network may be visualized in a dumbbell shape. Consider network dynamics wherein anode m has fitness 1 if it has more cooperating neighbours than defecting neighbours:P¯21(d, a, αn) =1, a > d20, a < d20.5, a = d2, P¯12(d, a, αn) =1, a < d20, a > d20.5, a = d2If one dense subnetwork is of all cooperators and the other all defectors, then a positivefraction of cooperators is expected in the steady state system. Consider this same sce-554.2. Moran process: Game Theoretic Network Dynamicsnario under the mean field approximation, here the populations of cooperators and defectorsare presumed mixed and the steady state system will drift towards the state (cooperat-ing/defecting) of the largest subnetwork. Clearly, the pair approximation provides a betterdescription of the system dynamics. In evolutionary game theory, these local state cor-relations are important as it is known that in regular networks, the pair approximationsupports a positive fraction of cooperators in the defection dominated prisoner’s dilemmaunder death-Birth updating. Previous derivations of heterogeneous pair approximation didnot lend themselves to the Moran process with general fitness, motivating us to explorefurther [64] [13].Much like in the mean field approximation we have derived degree dependent updateequation, only now we must update both the cooperating population state and the condi-tional cooperating population state simultaneously. To perform this update, we generalizethe cooperating link probability αn to the conditional cooperating link probabilities αn,i|j forstates i, j ∈ {1, 2}:αi|1(z¯n) =∑Dd=1 (# of links from infected node of degree d and type 1 to type i)∑Dd=1 (# of links of degree d and type 1)(4.29)=∑Dd=1 d ρ(d) u¯i|1(d)∑Dd=1 d ρ(d)x¯n(d).With this conditional link probability, we can represent the expected payoff of a cooper-ating/defecting individual:pi1(d, αn,1|1, αn,2|1) = a·αn,1|1d+(d−αn,2|1d)b pi2(d, αn,2|2, αn,1|2) = c·αn,1|2d+(d−αn,2|2d)d.(4.30)In the death-Birth pair approximation, the infected population state changes in much thesame way as in the mean field approximation, only now the cooperating link probabilitydepends on the state of the node being chosen. A node m of degree d and state s(m)n = 2564.2. Moran process: Game Theoretic Network Dynamics(defecting strategy) is chosen at time n with probability ρd(1 − x¯n(d)), and this node hasa neighbours with a cooperating strategy with probability(da)αan,1|2(1 − αn,1|2)(d−a) , finallya cooperating neighbour is chosen to influence m with probability given in 4.13. Analogousprobabilities exist for calculating the probability that a cooperating strategy is replaced bya defecting strategy. Thus the update equation for population state x¯n in the death-Birthscenario is:x¯n+1(d)− x¯n(d) = 1M(T +(x¯n, u¯11, d)− T −(x¯n, u¯11, d))T +(x¯n, u¯11, d) =d∑a=0(da)αan,1|1(1− αn,1|1)(d−a)af1(d, αn)af1(d, αn) + (d− a)f2(d, αn)(1− x¯n(d))=d∑a=0U+(x¯n, u¯11, d, a)T −(x¯n, u¯11, d) =d∑d′=0(da)αan,1|2(1− αn,1|2)(d−a)(d− a)f2(d, αn)af1(d, αn) + (d− a)f2(d, αn) x¯n(d)=d∑a=0U−(x¯n, u¯11, d, a)(4.31)Where U+(x¯n, u¯11, d, a) is the contribution of nodes of degree d with a cooperating neigh-bours to the increase in the cooperating population state. For each individual m that changesits state from cooperating to defecting the joint probability of an edge connecting two cooper-ating individuals decreases by ad, and there is a similar increase for defecting to cooperatingtransitions. We can, therefore, update the joint probability of neighbouring cooperatingnodes as:u¯11,n+1(d)− u¯11,n(d) = 1Md∑a=0(ad(U+(x¯n, u¯11, d, a)− U−(x¯n, u¯11, d, a)))+D∑d′=0,d′ 6=dd′∑a=0(aρ(d)d(U+(x¯n, u¯11, d′, a)− U−(x¯n, u¯11, d′, a))) (4.32)574.2. Moran process: Game Theoretic Network DynamicsThe second term accounts for the probability that a change in a node not of degree dcauses a change in the probability that a cooperating node of degree d interacts with anothercooperating node of arbitrary degree.4.2.3 Tracking Cooperation in Game theoretic NetworksIn Chapter 3, we presented a non-linear optimal filter for tracking the infected populationstate of a network. This filter cannot be directly applied to the cooperation models presentedin 4.2.2 for all updating methods because the mean field approximation and pair approxima-tions presented in these sections do not have a polynomial state evolution. In the mean fieldapproximation, the in death-Birth and Birth-death updating, the update contains a termin the denominator which depends on the state18. In the pair approximation, no matterwhich updating type is chosen (although we have only explored death-Birth in this thesis),the state does not have a polynomial evolution. As with the simpler polynomial infectedpopulation state, we are motivated to track the cooperating population state. Through thesampling mechanisms presented in Sec. 2.3, we can generate observations of the cooperatingpopulation state. The Bayesian polynomial filter of Chapter 3 cannot be applied withoutapproximation, and so we are motivated to utilize other filtering methods. We, therefore, uti-lize the suboptimal filtering method, particle filtering, to tracked the cooperating populationstate.In this thesis, we provide a brief overview of particle filtering for the nonlinear dynamics ofthe presented mean field and pair approximations to the Moran process, for a more completeexposition we refer readers to [7]. Particle filtering is a Monte Carlo algorithm for recursiveBayesian estimation of the system state, here the system state is the cooperating populationstate. In particle filtering, weighted random samples are used to approximate the posteriordensities of the state prediction and update. As the number of particles increases, this18We can however note that death-Rebirth updating has polynomial dynamics that match the polynomialdynamics of Sec. 2.1.3.584.2. Moran process: Game Theoretic Network Dynamicsapproximation converges to the true posterior densities and the approximation approachesan optimal Bayesian estimate.As in the polynomial filter, the filtering problem consists of prediction and updating.The posterior density is approximated by a set of Np particles and weights {xin,win}Np :p(xn|yn) ≈∑iwinδ(xn − xin)At each time step we sample particles from a proposal density xin ∼ pi(x0:n, Yn) and eachparticle is weighted according towin =p(xi0:n|y1:n)pi(xi0:n|Yn)and we can approximate the filtered posterior density as:p(xn|yn) ≈∑iwin∑j wjnδ(xi0:n) (4.33)The posterior state estimate can be computed directly from the filtered posterior density,and, therefore, approximated by the weighted particles:xˆn+1 = E(xn|Yn) ≈∑ixinwin∑j wjnThe proposal density is updated in real timepi(x0:n|Yn) = pi(x0)n∏t=1pi(xn|x0:n−1, Yn) (4.34)pi(xn|xn−1, Yn) = p(xn|xin−1, yn) =p(yn|xn)p(xn|xn−1)p(yn|xn−1)which in turn yields a recursion for the particle weightswin =p(xi0:n|Yn)pi(xi0:n|Yn)∝ p(yn, Yn−1, xi0:n)p(xin|xi0:n−1)pi(xin|xi0:n−1, Yn)win−1. (4.35)594.2. Moran process: Game Theoretic Network DynamicsFrom Eq. 4.35 and Eq. 4.35, we obtain the optimal update equation for the particle weights.win = p(yn|xin−1)win−1 (4.36)Since our system has nonlinear dynamics and linear measurements 2.19 and both haveGaussian noise with known covariance, there is a closed form expression p(yn|xn−1) andp(xn|xn−1, yn). First we define mean vector m and precision matrix Σ. Recall that Q denotesthe state noise covariance and R denotes the observation noise covariance. From eq. 4.18,we can express the mean field evolution of the state, xn, with death-Birth updating as thenonlinear function f(x):xn+1(d) = f(xn) = xn(d) +1M(T +(xn, d)− T −(xn, d)) .The state update can be expressed as similar nonlinear functions for the pair approximationas well as mean field approximations with other updating types. Utilizing this functionalupdate as well as the state and observation noise covariances we defineΣ−1 =Q−1 + C ′R−1Cm =Σ(Q−1f(xn−1) + C ′R−1yn)(4.37)Utilizing this mean vector and covariance we present analytic evaluations of the conditionaldensities which can be used to re -sample particles and update their respective weights:p(yn|xn−1) =N (Cf(xn−1), (Q + CRC ′))p(xn|xn−1, yn) =N (m,Σ). (4.38)The presented particle filtering algorithm does not depend on the state evolution havingpolynomial dynamics and can, therefore, be employed to filter the mean field death-Birth and604.3. SummaryBirth-death Moran processes, as well as the pair approximation to these Moran processes.This more general filtering method is not optimal for finite numbers of particles and, as such,the state estimates produced using particle filtering are inferior to the state estimates of thepolynomial Bayesian filter presented in Chapter 3 when the underlying state has polynomialdynamics. Even when the underlying system follows polynomial dynamics, it may be sensibleto employ a particle filter over the Bayesian filter. The particle filter computation time scaleswith the number of particles, whereas the polynomial filter computation time scales with thedimension of the system. Since the dimension of the proposed system scales directly withthe maximum degrees of the considered networks, for extremely large networks, it may besensible to utilize a particle filtering method to generate state estimates.4.3 SummaryThis chapter presented two extensions to the SIS model of chapter 2. The first extensionconsidered was the addition of homophily in the network, wherein individuals were delin-eated by type, and the frequency and outcome of interactions between individuals dependedupon the types of individuals involved. The second extension considered is an alternative un-derlying model, the Moran process. The Moran process is a model for network evolutionarydynamics which is considered in the context of modeling game theoretic cooperation. Thedynamics of the Moran process are approximated using both the mean field approximationconsidered in chapter 2 and the pair approximation. A particle filter is presented for trackingthe cooperating population state of a network.61Chapter 5Numerical ResultsSec. 5.1 first presents an example of an infected population state, observations of the in-fected population state, and a filtered state estimate. The parameters for this network aredetermined empirically from the Twitter dataset, as outlined explicitly in Sec. 5.4.Sec. 5.2 below examines the effect of sampling and model misspecification on the perfor-mance of the non-linear filter discussed in Sec.3.1. This analysis is useful for selecting thesampling methodology and for assessing the performance trade-off due to imperfect knowl-edge of the degree distribution of the underlying network.Sec. 5.3 presents numerical examples of filtering on networks with homophily as well asfiltering on game theoretic networks. Additional network simulations illustrate the role ofhomophily in modeling SIS dynamics and mean field approximations to the Moran process.Sec. 5.4 presents the performance analysis on a real-world Twitter dataset. First, usinga goodness-of-fit test, we validate the sufficiency of the SIS model to capture the infectionpropagation on the Twitter network. Second, the non-linear filter of Sec.3.1 is shown totrack the infection diffusion satisfactorily.5.1 Example: Tracking the Infected Population StateFollowing Scheme B of Sec. 3.2, we generated scale-free networks with M = 13000 (numberof nodes) to emulate the Twitter Social network. The infections were initialized by infectingnodes at time 0 with probability 0.01. This infection was propagated for 2× 104 timesteps19with a healing probability ξ = 0.001 and P¯12 generated empirically from Twitter data of19Recall from Theorem 1 that the duration is of the order of number of nodes.625.1. Example: Tracking the Infected Population StateSec. 5.4. At each timestep, 104 samples were obtained according to the uniform samplingdescribed in Sec.2.3. The state and observation covariance matrices Qn and Rn used infiltering were computed empirically from the true underlying state and observation data,and were estimated to be constant for all time steps n. The resulting observations andfiltered estimates are shown in Fig. 5.1, where the network was simulated, sampled andfiltered according to ‘Scheme B’ of Sec. 3.2. It can be seen that the estimates convergetowards the true state for all degrees. The mean square error of the filter estimates areshown in Fig. 5.12 and compared to the filtered estimate for the Twitter data of Sec. 5.4.The displayed mean square errors are the average of 50 independent simulations.Time step(k)0 1000 2000 3000 4000 5000 6000Degreeinfectionprobability0.050.10.150.20.250.30.35Observed infection probability d = 2Observed infection probability d = 1Observed infection probability d = 0Filtered infection probability d = 2Filtered infection probability d = 1Filtered infection probability d = 0True infection probability d= 2True infection probability d= 1True infection probability d= 0Figure 5.1: Diffusion of infected population states and their corresponding filtered estimatesin a scale-free network.635.2. Factors Affecting Filter Performance5.2 Factors Affecting Filter PerformanceAs in Sec. 3.2, we consider ER and SF networks and analyze the effect of sampling andmodel misspecification on the filter performance. To isolate these simulation tests from theapproximation of Rn, as well as state noise, we utilize Scheme A of Sec. 3.2, which simulatesa deterministic trajectory with noisy observations.5.2.1 Effect of Sampling on FilteringThe observation noise variance R in (2.19) depends on the sampling method employed. Usingthe uniform sampling mechanism outlined in Sec.2.3, the effect of sampling on filtering isillustrated.Under uniform sampling, the noise variance of each observation depends upon the networkdegree distribution. In the uniform sampling of Sec. 2.3, for any given degree, an observationat time n will have a variance which corresponds to that of a scaled binomial distributionσ2(xˆn(d)) =xn(d)(1−xn(d))ν(d), where ν(d) are the number of samples of nodes of degree d andν =∑d ν(d). For a large number of independent samples, by the central limit theorem,ν(d) ≈ νρ(d), and the observation noise variance is inversely related to the probability thata node is of degree d. Consequently, the noise covariance depends upon degree distributionparameters γ and λ. In Fig. 5.2, the mean square error of the filter estimate is seen todepend on the network parameters λ and γ.In these simulations, the observation error covariance matrices were chosen as the errorcovariance matrices from network based simulations(Scheme B). We isolate the experimentfrom finite network state noise, which can dominate the MSE of the filter estimate. Bysimulating the system with Scheme A, with observation noise informed by the syntheticnetwork simulations of Scheme B, we can observe the effect of sampling more clearly.645.2. Factors Affecting Filter PerformanceTime step(k)0 50 100 150 200 250 300Log-MeanSquareError-12-10-8-6-4-20ER parameter = 10ER parameter = 5ER parameter = 3ER parameter = 2ER parameter = 1(a) Filtered estimate log mean square error of infection population states onErdo˝s Re´nyi degree distributions.Time step(k)0 50 100 150 200 250 300Log-MeanSquareError-10-9-8-7-6-5-4-3-2-10γ = 5γ = 4γ = 3γ = 2γ = 1(b) Filtered estimate log mean square error of infection population states onscale-free degree distributions.Figure 5.2: Filtered estimate log mean square error of infection population states for twonetwork types: Erdo˝s Re´nyi and scale-free.655.3. Extended and Alternative Diffusion Models5.2.2 Sensitivity of Filter Performance to Misspecified ModelThe SIS model in Sec. 2.1.2 is misspecified if the degree distribution ρ is not known/specifiedexactly. The Bayesian filter (Sec. 3.1) implemented using a misspecified model is referred toas a misspecified filter. This section compares the MSE of a Bayesian filter and a misspecifiedfilter, both formulated for the same underlying network. The misspecified filter was derivedwith the degree distribution ρ(d) = 1D∀d. For further comparison, we use a moving average(linear) estimator:xˆn = ϑ1yn−1 + ϑ2yn−2 + ϑ3yn−3 + · · ·+ ϑιyn−ι + ϑ0. (5.1)Here xˆn is the moving average estimate at time n, the matrices ϑi are computed using mul-tivariate least squares estimation, and time delay ι was chosen to be 10, for simplicity. InFig. 5.3, the infection transition probabilities P¯12 and P¯21 were chosen uniformly at randomover [0, 1]. The observation noise covariance matrix R is a random positive definite matrixwith entries in [0, 3 × 10−2]. The initial state was simulated as x0 = 12∀d. The misspecifiedfilter exhibits a plateau in performance. This plateau corresponds to the incorrect computa-tion of the priori state estimate, xˆ−n , due to the misspecified filter parameters. It is observedin Fig. 5.3 that, even when the degree distribution is misspecified, the Bayesian filters out-perform the moving average filter with an MSE of the order of 10−6, compared to 10−4 ofthe moving average filter.5.3 Extended and Alternative Diffusion Models5.3.1 Numerical Exploration of HomophilyIn this section, we illustrate the performance of the filter on a social network having ho-mophily as an additional attribute. The network was simulated according to Scheme B ofSec. 3.2 with two population types, b and c, as shown in Fig. 5.4. These populations interact665.3. Extended and Alternative Diffusion ModelsTime step(k)0 200 400 600 800 1000Log-MeanSquareError-11-10-9-8-7-6-5-4-3-2-1Mis-specified filterBayesian optimal filterMoving average estimatorFigure 5.3: Comparison of the estimate log mean square error between a Bayesian filter, amisspecified Bayesian filter and a moving average estimator.more with members of their own type than those of the other type, specifically ςbb = ςcc = 0.8.To differentiate the populations, we choose the infection probabilities for population b to befour times20 that of population c , that is P¯ b21 = 4P¯c21. The recovery probabilities and scale-free degree distributions are the same for both populations. In Fig. 5.4, we plot the infectedpopulation state, observed state and filtered state estimate for both population types, us-ing degree d = 2 as an example. The network was simulated according to ‘Scheme B’ ofSec. 3.2 and each node was labeled as type b or c with equal probability. The network wasthen sampled and filtered according to ‘Scheme B’ of Sec. 3.2. The filtered estimate of theinfected population state converges to the true infected population state. The two different20Population b is therefore more resistant to infection.675.3. Extended and Alternative Diffusion Modelspopulation types can be observed to behave differently as the low infection probabilities ofthe type c population and insular network structure, cause a low infected population statein nodes of c.Time step(k)0 2000 4000 6000 8000 10000Degreeinfectionprobability0.050.10.150.20.25Observed infection probability d = 2, type= bObserved infection probability d = 2, type= cFiltered infection probability d = 2, type= bFiltered infection probability d = 2, type= cTrue infection probability d = 2, type= bTrue infection probability d = 2, type= cFigure 5.4: Diffusion of infected population states and their corresponding filtered estimatesin a scale-free network.The SIS model with homophily also enables the use of additional degree distributioninformation to generate a more accurate mean field description of a system. In a network thedegree distribution of nodes attached to each degree d may not equal the degree distributionof the entire network. For example, solitary people may be more likely to also associatewith other solitary people, resulting in high level of internal connections between all nodesof low degree. Thus the degree distribution of low degree nodes may differ from the degreedistribution of high degree nodes. These degree-degree correlations are not accounted for in685.3. Extended and Alternative Diffusion Modelsthe SIS model presented in Sec. 2.2, however, we can use the SIS model with homophily toincorporate this information. To model additional degree information, we map each degreein the network to a type b, so that the number of types B equals the maximum degree in thenetwork plus one21 D+ 1, and allowing each type to have its own degree distribution ρ[b](d),which may be different from the network’s degree distribution ρ.In Fig. 5.5, we illustrate the effect additional degree distribution information can haveon the dynamics of a network by simulating and comparing the deterministic mean fieldtrajectories for two SIS models, one with homophily and one without. We consider a scale-free network segregated by degree, where individuals with d neighbours associate primarilywith others having d neighbours. The network has an overall degree distribution ρ(d) ∝ d−γ,where γ = 3.0, and maximum degree D = 20. The degree distribution of each type b ∈{1, . . . ,B} is given by:ρ[b](d) = (1− )I(d = b− 1) + ρ(d). (5.2)where I is an indicator function. In Fig. 5.5, the infection probabilities, maximum degree,and overall degree distributions are identical for both systems. The degree distributionsused in both simulations are scale-free degree distributions, ρ, with γ = 3.0, and the degreespecific degree distributions, ρ[b](d), in the simulation with homophily is given by (5.2) with = 0.02. The additional degree distribution information incorporated into the model withhomophily changes the system dynamics, and a difference between the models with andwithout homophily can be observed. The key observation here is that under a simple SISmodel as presented in Sec. 2.2, both these systems are identical, however, we can utilizeadditional degree distribution information and the SIS model with homophily to bettercapture the mean field dynamics of the population.21There are D + 1 types because degree d = 0 is also modeled.695.3. Extended and Alternative Diffusion ModelsTime step(k)0 2000 4000 6000 8000 10000DegreeInfectionProbability00.10.20.30.40.50.60.70.80.9Infection dynamics, no homophily, d=1Infection dynamics, no homophily, d=2Infection dynamics, no homophily, d=3Infection dynamics, with homophily, d=1Infection dynamics, with homophily, d=2Infection dynamics, with homophily, d=3Figure 5.5: Deterministic mean field trajectories of an SIS model with and without homophilyare compared.5.3.2 Numerical Exploration of Game Theoretic DynamicsIn this section, we investigate game theoretic network dynamics. The networks structuresinvestigated were generated with N = 10, 000 and were Erdo˝s Re´nyi networks with parameterλ = 2.0. We observed the dynamics of the prisoner’s dilemma with payoffs [a = 4, b = 0, c =5, d = 2] and death-Birth updating. In Fig. 5.6, we plot the cooperating population state,observed state and filtered state of degree d = 2 as an example. The filtered state estimateis generated by the particle filter described in Sec. 4.2.3 using 1000 particles and is seento track the true cooperation state. In Fig. 5.7 and Fig. 5.8, we plot the mean field stateand simulated state under strong (w = 0.9) and weak selection (w = 0.01), respectively.The mean field approximation is fairly accurate under strong selection, however, it fails to705.4. Analysis and Validation on Twitter Datasetaccurately describe the population state under weak selection. Under weak selection, thecooperating population state does not decay towards 0 as predicted by mean field dynamics.Instead, local structures can promote cooperation as predicted in regular networks [56].Time step(k)0 100 200 300 400 500 600 700 800 900 1000Degreecooperationprobability0.060.070.080.090.10.110.120.13Observed cooperation probability d = 1Observed cooperation probability d = 2Observed cooperation probability d = 3Filtered cooperation probability d = 1Filtered cooperation probability d = 2Filtered cooperation probability d = 3True cooperation probability d = 1True cooperation probability d = 2True cooperation probability d = 3Figure 5.6: Cooperating population state of an Erdo˝s Re´nyi network, as well as state obser-vations, and particle filter state estimate.5.4 Analysis and Validation on Twitter DatasetThis section illustrates the tracking of infection diffusion on social networks using real dif-fusion data from the microblog platform Twitter. Twitter is a social media platform overwhich users can communicate in short < 140 character messages, images and video files. Weanalyze the diffusion of information through the Twitter Social network to demonstrate theeffectiveness of the SIS model of Sec. 2.2.Twitter played a critical role in the Egyptian revolution of 2011 or January 25th (#Jan25)715.4. Analysis and Validation on Twitter DatasetTime step(k)×10 40 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5DegreeCooperationProbability00.020.040.060.080.10.12True cooperation state, d=1True cooperation state, d=2True cooperation state, d=3Mean field cooperation state, d=1Mean field cooperation state, d=2Mean field cooperation state, d=3True cooperation state (averaged), d=1True cooperation state (averaged), d=2True cooperation state (averaged), d=3Figure 5.7: Simulated and mean field approximation of cooperating population state understrong selection w = 0.9.uprising [70]. Twitter was used by protesters to organize the protest and recruit membersand as a medium to discuss and share information about the protest. Below, we refer tothe interest and engagement with the news of the uprising as infection and track the dis-tribution of infection over time. Modeling this online process as an epidemic is supportedby the virality of the #Jan25 hashtag. Viral online information is found to follow complexcontagion models in [84].5.4.1 DatasetThe dataset consists of tweets sampled between January 23rd and February 8th, 2011 andare available from Twitter (http://trec.nist.gov/data/tweets/). The tweet collection period725.4. Analysis and Validation on Twitter DatasetTime step(k)×10 40 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5DegreeCooperationProbability0.080.0850.090.0950.10.1050.110.1150.120.1250.13True cooperation state, d=1True cooperation state, d=2True cooperation state, d=3Mean field cooperation state, d=1Mean field cooperation state, d=2Mean field cooperation state, d=3True cooperation state (averaged), d=1True cooperation state (averaged), d=2True cooperation state (averaged), d=3Figure 5.8: Simulated and mean field approximation of cooperating population state underweak selection w = 0.01.encapsulates the time-frame of the first major developments relating to the January 25thuprising event. In Twitter, a “hashtag” follows the discussion topics, i.e., a word or a phraseprefixed with the number sign #. We make use of the hashtags to track the spreading of aspecific topic on Twitter. The most used hashtag related to this protest is “#Jan25”. Toobtain the information spreading among users participating in this protest, we filtered 26,313tweets containing “#Jan25” published by 13,341 different users, from around 10 milliontweets. These tweets contain the event of interest and the social network is (re-)constructedfrom them as follows: two users are connected if one user has mentioned another user(“@username”) in the tweet containing “#Jan25” at least once over the duration of interest.735.4. Analysis and Validation on Twitter DatasetOn this constructed social network, information diffusion is analysed.All users in the constructed social network are assumed to be susceptible initially. Userswho initiate tweets on the event of interest are assumed to be infected and act as seeds forthe spread of information. Once a user, say User#A becomes infected, it has some constantprobability, say ξ, of becoming susceptible in each time period. This modeling assumption ismotivated by the frequently observed Poisson-like decay of an individual’s interest in socialmedia topics [23]. The decay probability was chosen to be 0.001, motivated heuristically byan average interest duration of 2 hours. Our dataset is thus a hybrid dataset, wherein thenetwork and infection process is informed by true Tweets between users, and these infectedindividuals become susceptible according the synthetic Poisson-like process described above.The effect the choice of ξ has on the infection dynamics can be seen in Fig. 5.9. The decayparameter does affect the twitter infection trajectory, however, for decay parameters thatcorrespond to an average forgetting timescale on the order of hours, the trajectories appearsimilar.745.4. Analysis and Validation on Twitter DatasetTime step(k)×10 40 0.5 1 1.5 2Degreeinfectionprobability0.10.20.30.40.50.60.70.8x(d): d = 2, ξ = 0.01x(d): d = 1, ξ = 0.01x(d): d = 0, ξ = 0.01x(d): d = 2, ξ = 0.001x(d): d = 1, ξ = 0.001x(d): d = 0, ξ = 0.001x(d): d = 2, ξ = 0.0001x(d): d = 1, ξ = 0.0001x(d): d = 0, ξ = 0.0001Figure 5.9: Comparison of infected population state of d = 1, 2, 3 for varying decay parameterξ.5.4.2 SIS model for Twitter dataIn case of the spread of engagement in Twitter, individuals can either be inactive or activedepending on if they are willing and able to spread information and interest on a topic. Activeusers can be considered ‘infected’, and inactive users can become ‘infected’ by interactingwith other ‘infected’ individuals, in particular, any of its active neighbors in a social network.In this way, engagement and knowledge of a topic spreads throughout the network. Peoplecan also become disinterested in a subject they have already been exposed to, in this way theyare not currently engaged, but may become engaged if contacted by an infected neighbor;755.4. Analysis and Validation on Twitter Datasetthus inactive individuals are assumed to be susceptible. The SIS model has been applied toinformation diffusion in [51], [46].We adopt the SIS model for the Twitter data, mapping engagement in the January 25th, 2011uprising as an infection spreading over the Twitter network created from Twitter users.Below we analyze the goodness-of-fit of the SIS model for the Twitter data using theKolmogorov-Smirnov test [24], and measure the square and absolute difference between thedata and model predictions.Model Evaluation (Goodness of Fit for SIS model): We used the KS test on the empiricalinfected degree distribution at the final timepoint to evaluate the goodness of fit of the SISmodel. The KS test statistic was 0.2286 with p-value 0.2813. The null hypothesis for thisstatistical test is that both the observed Twitter and SIS model infected degree distributionsare samples of the same infected degree distribution. At a confidence level of 0.01, the nullhypothesis cannot be rejected. This test therefore provides little evidence for or againstthe quality of SIS model for Twitter. To further explore the agreement between the modeland the true data, we computed the average and maximum square difference between theTwitter data and predicted SIS degree infection probabilities. These values can be seen inTable 5.1 and the trajectories are shown in Fig. 5.10. The underlying degree distribution ofthe Twitter network is scale-free, and thus degree and number of nodes of that degree areinversely related. In Fig. 5.10, the simulated trajectories use ξ = 0.001 and the empiricallygenerated ρ and ˆ¯P12 (5.3). Recall that without added state noise, or finite networks, thesimulations for the mean field dynamics are deterministic. The deterministic mean fielddynamics effectively capture the trajectory of the Twitter infection. Empirically, we observethat the fewer nodes there are of a particular degree, the higher the probability that theempirical infection deviates from the asymptotic case.The low magnitude of the model deviations in Table 5.1 for the Twitter dataset and the fail-ure to reject the hypothesis that the Twitter data and model infected degree distributionscome from the same distribution, suggest that the SIS model is a satisfactory model with765.4. Analysis and Validation on Twitter DatasetTable 5.1: Goodness of fit for the SIS model: The average and maximum deviations betweenthe Twitter data and SIS model predictions are presented.Degree 1 2 3+Average SquareDifference0.0011 0.0014 0.0235AverageAbsoluteDifference0.0273 0.0294 0.1000MaximumAbsoluteDifference0.0644 0.0719 0.8403respect to the infection dynamics of interest in the January 25th uprising.Sampling for tracking the infected population stateThe mean field dynamics for the SIS model can be used to track and predict the evolutionof the infection on Twitter. We must generate estimates of P¯12, P¯21, and determine thedegree distribution from samples obtained from (2.19). P¯12 is given by ξ, since all infectednodes become susceptible with probability ξ at each time point. We compute the empiricaltransmission rates ˆ¯P21 directly by observing the frequency with which an infected individualwith d neighbors, a of which are infected, becomes infected.ˆ¯P12(d, a) =T∑n=0M∑m=1I(s(m)n+1 = 2|s(m)n = 1,Ξ(m) = d, F (m)n = a)T∑n=0M∑m=1I(s(m)n = 1,Ξ(m) = d, F(m)n = a) (5.3)The degree distribution used is the empirical degree distribution. The true degree infectionprobabilities are computed directly from the entire network for each 1 minute time interval.Next, we sample the data using the RDS sampling scheme described in Sec. 2.3, every1 minute and track the infected population states over time using the non-linear BayesianFiltering technique described in Sec. 3.1. The parameters used in the Bayesian filter areξ = 0.001, the empirically generated ρ and ˆ¯P12. The filter estimates are shown in Fig. 5.11. It775.4. Analysis and Validation on Twitter DatasetTime step(k)×10 40 0.5 1 1.5 2DegreeInfectionProbability00.050.10.150.20.250.30.350.40.45Twitter Infection, d=0Twitter Infection, d=1Twitter Infection, d=2Twitter Infection, d=3Simulated Infection, d=0Simulated Infection, d=1Simulated Infection, d=2Simulated Infection, d=3Figure 5.10: The true Twitter infection and Twitter infection as predicted by the determin-istic mean field dynamical equations are compared for nodes of degree d = 1, 2, 3.785.5. Summaryis seen that the filtered estimates track the true infected population states over time. Fig. 5.12shows the mean square errors of the filter estimate22 of Twitter infection and the mean squareerrors of the filter estimate of the numerical example shown in Fig. 5.1. This figure illustratesthe superiority of the filter on the numerical example and highlights the shortcomings of theTwitter dataset. The filter performs adequately for both the Twitter data and the simulatednetwork, however the estimate for the simulated network, the infection of which followsthe SIS model, performs dramatically better at a filtered estimate MSE of 10−11 versus thefiltered estimate of the Twitter data around 10−6.5.5 SummaryIn this Chapter, we numerically illustrated the performance of the filtering algorithms pre-sented in Chapters 3 and 4 on scale-free and Erdo˝s Re´nyi networks. We investigated thesensitivity of the non-linear filtering algorithm to errors in model specification. Further, themodel with homohphily was numerically investigated and estimates of simulated infectedpopulation states were generated using the non-linear filtering algorithm of Chapter 3, andthe game theoretic model based on the Moran process was investigated and filtered usingthe particle filter presented in 4. We then introduced, motivated and analyzed a real worlddataset from Twitter. This dataset consists of a corpus of tweets taken during the ArabSpring uprising of 2011. The non-linear filtering algorithm was applied to this dataset andshown to satisfactorily estimate the infected population state.22The MSE of the Twitter estimate starts extremely low. During early timesteps in the Twitter data,there are almost no infected nodes. The sampled observations similarly estimate nearly 0 infection, whichis why the filtered infection appears so accurate at these early timesteps. Thus, as the system dynamicsevolves from the trivial 0 infection, the state observations become less accurate.795.5. SummaryTime step(k)×10 40 0.5 1 1.5 2DegreeInfectionProbability00.050.10.150.20.250.30.350.40.45Observed Infection, d=1Observed Infection, d=2Observed Infection, d=3Filtered Estimate, d=1Filtered Estimate, d=2Filtered Estimate, d=3Twitter Infection, d=1Twitter Infection, d=2Twitter Infection, d=3Figure 5.11: The true Twitter infection and the filter estimates of the Twitter infectedpopulation state are compared.805.5. SummaryTime step(k)×10 40 0.5 1 1.5 2Log-MeanSquareError-14-12-10-8-6-4-20Bayesian filter - Simulated NetworkBayesian filter - Twitter DataMoving average filter - Twitter DataFigure 5.12: The MSE of the filtered estimate of the true twitter infection probability stateand the MSE of the estimate of the infected population state of a simulated network areshown.81Chapter 6Conclusions and Future WorkIn this thesis, we developed a method for tracking infections, information and opinionsas they diffuse through social networks. By utilizing statistical signal processing filteringalgorithms, we are able to combine domain knowledge of the underlying infection processand observations to generate accurate estimates of the infected population state. Trackingdiffusion in social networks is applicable to the commercialization, political analysis, andfundamental understanding of large online social networks as well as policy making in publichealth through deepening our epidemiological understanding of the evolution of a disease.The second chapter presented a formulation of infection diffusion dynamics over socialnetworks, a mean field approximation to this complex process, and finally sampling methodsfor generating observations of the underlying infected population state. Two types of socialnetwork structures were primarily considered: Erdo˝s-Re´nyi and scale-free networks. Aninfection diffusion model was developed based on the SIS model. We presented a generaldescription of mean field theory and its ability to approximate complex interacting systems.Mean field dynamics for the SIS model were then formally presented, including a theoremwhich states that the difference between the underlying population state and the mean fieldapproximation satisfies an exponential bound, and therefore the mean field dynamics can beutilized as a model which can generate the observable infection dynamics. Under discretetime mean field dynamics, the state at time n+ 1 depends polynomially on the state at timen. These polynomial dynamics are exploited by the optimal Bayesian filter of Chapter 3.The third chapter describes an optimal non-linear Bayesian filter for estimating the in-fected population state. First the polynomial mean field dynamics are presented so that82Chapter 6. Conclusions and Future Workthe polynomial dependence is explicit. A non-linear Bayesian filter is then described foroptimally estimating the infected population state of the social network. The fundamentallimits of filtering the mean field dynamics are then explored through the computation of thePCRLB. The PCRLB is computed for Erdo˝s-Re´nyi and scale-free networks.In the fourth chapter we incorporate homophily into to the SIS model of chapter 2 andpresent a game theoretic contagion model. The homophily extension improves the generalityand expressive power of the underlying SIS model. The model with homophily allows thepopulation to be stratified by type, permits these different types to interact with differentfrequencies, and enables the use of different infection transmission probabilities based ontype. The model with homophily does this without sacrificing any of the polynomial prop-erties of the state evolution that make the model amenable to optimal Bayesian filtering.We also present a game theoretic contagion model which uses the Moran process as theunderlying model for infection dynamics. This model, which is similar to the SIS model,allows for interactions to be game theoretic. Under the mean field and pair approximationswe develop in this chapter, this Moran model introduces a dependence of the infection trans-mission probabilities on the infected/cooperating population state of the network. Finally,in this chapter we present a particle filter for tracking the infected/cooperating populationstate under when the underlying infection follows this Moran model.The fifth chapter presented numerical results substantiating the infection diffusion modeland filtering algorithms presented in earlier chapters. In this section, first, the effect of net-work structure on the filtered infected population state estimate as investigated. Next, theeffect of model misspecification on the mean square error of the filter estimate was inves-tigated. Numerical examples of the influence of homophily as well as examples of filteringgame theoretic contagion using a particle filter are presented. The non-linear filtering algo-rithm is then applied to a dataset based on real world tweets from the social media websiteTwitter. The filtering algorithm is shown to perform satisfactorily. Finally, a two timescaleco-evolving network is simulated and the underlying degree distribution and infected popu-83Chapter 6. Conclusions and Future Worklation state are estimated.In future work, more work could be done to extend the analysis of scale-free and Erdo˝s-Re´nyi networks. Currently, we have begun to explore the effect that network structure has onour ability to estimate the infected population state on these networks, however, simulationsof larger networks may provide useful information to future researchers wishing to trackdiffusion on networks of known network type. Further, we considered two sampling typesin this thesis, RDS sampling and simple random sampling. These sampling methods canbe further assessed in the context of large scale-free and Erdo˝s-Re´nyi networks, with andwithout homophily, to better understand the effect of the sampling type has on the filtergenerated state estimates.In this thesis, the networks and infection parameters we analyzed and investigated werestatic. The infection transmission probabilities P¯12/P¯21 as well as diffusion parameter θ wereindependent of time n in all numerical investigations, however, the models presented wereamenable to dynamics parameters, so long as the dynamics of these parameters is knownapriori. The effect of time varying infection parameters can be explored numerically, todetermine the effect these parameters have on the mean square error of the filter estimate.A dynamic diffusion parameter is especially relevant to modeling real world interactions,because human contact processes rarely occur at a uniform rate. We may, for instance,tweet more in the morning and evenings, or engage with more people on weekends. Further,future work could be done to extend this work to filtering on evolving networks. In [48],two-timescale network dynamics are outlined, as well as a Hidden-Markov-Model (HMM)filter for tracking evolving network structure. This two-timescale model requires the strongassumption that network dynamics are much slower than infection dynamics, which does notalways hold. Analytic and numerical investigations of more complex co-evolving networkscould prove enlightening and extend the set of networks on which the infected populationstate can be adequately filtered.The game theoretic contagions provide a rich arena for exploration. In this thesis, the84Chapter 6. Conclusions and Future Workprisoner’s dilemma was presented as the persistent example of the game underlying theMoran process, but other two-player or even multi-player games can be considered. Gamessuch as the Hawk-Dove game and Stag-hunt game both provide dramatically different coop-eration dynamics. Further, the network-based mean field approximations can be analyzedto explore the effect network structure has on the equilibrium states of the various games.Finally, the filtering model can be applied to more real world data to entrench its effi-cacy. In this thesis, we analyzed a dataset from Twitter and showed that the polynomialfilter performed adequately, however, it was observed that the polynomial filter performedsignificantly better at filtering the infected population state of synthetic network contagions.This discrepancy may be investigated further both to establish that the real world ”infec-tions” are well modeled by the SIS model, as well as utilizing a larger share of the tools atthe modeler’s disposal, such as dynamic diffusion parameters, infection transmission proba-bilities, game theoretic contagion models and models with homophily to better model andultimately filter the infection process.85Bibliography[1] Facebook, Inc. (2016, Oct. 10). Company information: Stats [Online]. Available:https://newsroom.fb.com/company-info/[2] L. A. Adamic and B. A. Huberman, “Power-law distribution of the world wide web,”Science, vol. 287, no. 5461, pp. 2115–2115, 2000.[3] N. K. Ahmed,J. Neville, and R. Kompellaa, “Network sampling: From static to stream-ing graphs,” ACM Transactions on Knowledge Discovery from Data (TKDD), vol. 8,no. 2, pp. 7, 2014.[4] L. Allen, “Some discrete-time SI, SIR, and SIS epidemic models,” Mathematical Bio-sciences, vol. 124, no. 1, pp. 83 - 105, 1994.[5] J. Alstott, E. Bullmore, and D. Plenz, “powerlaw: a Python package for analysis ofheavy-tailed distributions,” PloS One, vol. 9, no. 1, pp. e85777, 2014.[6] T. W. Anderson, “Multivariate statistical analysis,” Wiley and Sons, 1984.[7] M. Arulampalam, S. Maskell, N. Gordon, and T. Clapp “A tutorial on particle filtersfor online nonlinear/non-Gaussian Bayesian tracking,” IEEE Transactions on signalprocessing, vol. 50, no. 2, pp. 174-188, 2002.[8] A. Baraba´si, “Scale-free networks: a decade and beyond,” Science, vol. 325, no. 5939,pp. 412–413, 2009.[9] A. Baraba´si and R. Albert, “Emergence of scaling in random networks,” Science,vol. 286, no. 5439, pp. 509–512, 1999.86Bibliography[10] A. Baraba´si, R. Albert, and H. Jeong, “Scale-free characteristics of random networks:The topology of the world-wide web,” Physica A: Statistical Mechanics and its Appli-cations, vol. 281, no. 1, pp. 69–77, 2000.[11] R. Albert and A. Baraba´si, “Statistical mechanics of complex networks,” Reviews ofModern Physics, vol. 74, pp. 47–97, 2002.[12] M. Benaim and J. Weibull, “Deterministic approximation of stochastic evolution ingames,” Econometrica, vol. 71, no. 3, pp. 873–903, 2003.[13] J. Benoit, A. Nunes, and M. Telo da Gama, “Pair approximation models for diseasespread,” The European Physical Journal B-Condensed Matter and Complex Systems,vol. 50, no. 1, pp. 177–181, 2006.[14] S. Boucheron, G. Lugosi, and P. Massart, “Concentration inequalities: A nonasymptotictheory of independence,” Oxford university press, 2013.[15] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins,and J. Wiener, “Graph structure in the web,” Computer Networks, vol. 33, no. 1, pp.309–320, 2000.[16] P. J. Carrington, J. Scott, and S. Wasserman, “Models and methods in social networkanalysis,” Cambridge University Press, vol. 28, 2005.[17] C. Castillo-Chavez and A. Yakubu, “Discrete-time SIS models with complex dynamics,”Nonlinear Analysis: Theory, Methods and Applications, vol. 47, no. 7, pp. 4753–4762,2001.[18] C. Chamley, “Rational Herds: Economic models of social learning,” Cambridge Uni-versity Press, 2004.[19] N. Chen, “On the approximability of influence in social networks,” SIAM Journal onDiscrete Mathematics, vol. 23, no. 3, pp. 1400–1415, 2009.87Bibliography[20] S. Chen, R. Varma, A. Sandryhaila, and J. Kovacˇevic´, “Discrete signal processing ongraphs: Sampling theory,” IEEE Transactions on Signal Processing, vol. 63, no. 24, pp.6510–6523, 2015.[21] F. Chung and L. Lu, “Complex Graphs and Networks,” American Mathematical Society,vol. 107, 2006.[22] A. Clauset, C. R. Shalizi, and M. E. Newman, “Power-law distributions in empiricaldata,” SIAM Review, vol. 51, no. 4, pp. 661–703, 2009.[23] R. Crane and D. Sornette, “Robust dynamic classes revealed by measuring the responsefunction of a social system,” roceedings of the National Academy of Sciences, vol. 105,no. 41, pp. 15649–15653, 2008.[24] D. Darling, “The Kolmogorov-Smirnov, Cramer-von Mises tests,” The Annals of Math-ematical Statistics, vol. 28, no. 4, pp. 823 –838, 1957.[25] G. Das, N. Koudas, M. Papagelis, and S. Puttaswamy, “Efficient sampling of informationin social networks,” Proceedings of the ACM workshop on search in Social Media, pp.67–74, 2008.[26] V. Dukic, H.F. Lopes, and N.G. Polson. ”Tracking epidemics with Google flu trendsdata and a state-space SEIR model.” Journal of the American Statistical Association,vol. 107, no. 500, pp. 1410-1426, 2012.[27] D. Easley and J. Kleinberg, “Networks, crowds, and markets: Reasoning about a highlyconnected world,” Cambridge University Press, 2010.[28] R. J. Elliott, L. Aggoun, and J. B. Moore, “Hidden Markov models: estimation andcontrol,” Springer International Publishing, vol. 29, 2008.88Bibliography[29] K. Fang and R. Li, “Some methods for generating both an NT-net and the uniformdistribution on a Stiefel manifold and their applications,” Computational Statistics &Data Analysis, vol. 24, no. 1, pp. 29–46 1997.[30] O. Frank, “Network sampling and model fitting,” Models and methods in social networkanalysis, pp. 31–56, 2005.[31] P. Geroski, “Models of technology diffusion,” Research policy, vol. 29, no. 4, pp. 603–625,2000.[32] G. Ghoshal, L. Chi, and A. L. Baraba´si, “Uncovering the role of elementary processesin network evolution,” Scientific Reports, vol. 3, pp. 2920, 2013.[33] S. Goel and J. Salganik, “Respondent-driven sampling as Markov chain Monte Carlo,”Statistics in Medicine, vol. 28, no. 17, pp. 2202–2229, 2009.[34] M. Granovetter, “Threshold models of collective behavior,” American Journal of Soci-ology, pp. 1420–1443, 1978.[35] M. Granovetter, “Network sampling: Some first steps,” American Journal of Sociology,pp. 1287–1303, 1976.[36] M. Hamdi, V. Krishnamurthy, and G. Yin, “Tracking a Markov-modulated stationarydegree distribution of a dynamic random graph,” IEEE Transactions on InformationTheory, vol. 60, no. 10, pp. 6609–6625, 2014.[37] J. D. Hamilton, “Time series analysis,” Princeton University Press, vol. 2, 1994.[38] D. D. Heckathorn, “Respondent-driven sampling: A new approach to the study ofhidden populations,” Social Problems, vol. 44, no. 2, pp. 174–199, 1997.[39] D. D. Heckathorn, “Respondent-driven sampling II: Deriving valid population estimatesfrom chain-referral samples of hidden populations,” Social Problems, vol. 49, no. 1, pp.11-34, 2002.89Bibliography[40] M. Herna´ndez-Gonza´lez and M. Basin, “Discrete-time filtering for nonlinear polynomialsystems over linear observations,” International Journal of Systems Science, vol. 45,no. 7, pp. 1461–1472, 2014.[41] H. Hethcote, “The mathematics of infectious diseases,” SIAM Review, vol. 42, no. 4,pp. 599–653, 2000.[42] M. O. Jackson, “Social and Economic networks,” Princeton University Press, vol. 3,2008.[43] M. O. Jackson and B. W. Rogers, “Relating network structure to diffusion propertiesthrough stochastic dominance,” The BE Journal of Theoretical Economics, vol. 7, no. 1,2007.[44] M. O. Jackson and L. Yariv, “Diffusion of behavior and equilibrium properties in net-work games,” The American Economic Review, vol. 97, no. 2, pp. 92–98, 2007.[45] A. Java, X. Song, T. Finin, and B. Tseng, “Why we Twitter: understanding microblog-ging usage and communities,” Proceedings of the 9th WebKDD and 1st SNA-KDD work-shop on Web Mining and Social Network Analysis, pp. 56–65, 2007.[46] F. Jin, E. Dougherty, P. Saraf, Y. Cao, and N. Ramakrishnan, “Epidemiological model-ing of news and rumors on Twitter,” Proceedings of the 7th Workshop on Social NetworkMining and Analysis, pp. 8, 2013.[47] V. Krishnamurthy, “Partially Observed Markov Decision Processes,” Cambridge Uni-versity Press, 2016.[48] V. Krishnamurthy, S. Bhatt, and T. Pedersen, ”Tracking Infection Diffusion in SocialNetworks: Filtering Algorithms and Threshold Bounds”, IEEE Transactionson Signaland Information Processing over Networks, vol. 3, no. 2, 2017.90Bibliography[49] V. Krishnamurthy, O. N. Gharehshiran, and M. Hamdi, “Interactive Sensing and Deci-sion Making in Social Networks,” Foundations and Trends in Signal Processing, vol. 7,no. 1, pp. 1–196, 2014.[50] H. Kushner, “Weak convergence methods and singularly perturbed stochastic controland filtering problems,” Springer Science & Business Media, 2012.[51] H. Kwak, C. Lee, H. Park, and S. Moon, “What is Twitter, a social network or anews media?,” Proceedings of the 19th international conference on World wide web, pp.591–600, 2010.[52] S. Lee, “Understanding respondent driven sampling from a total survey error perspec-tive,” Survey Practice, vol. 2, no. 6, 2013.[53] J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, “Realistic, mathematicallytractable graph generation and evolution, using Kronecker multiplication,” EuropeanConference on Principles of Data Mining and Knowledge Discovery, pp. 133–145, 2005.[54] J. Leskovec and C. Faloutsos, “Sampling from large graphs,” Proceedings of the 12thACM SIGKDD international conference on Knowledge Discovery and Data Mining, pp.631–636, 2006.[55] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst, “Patterns of cas-cading behavior in large blog graphs,” In Proceedings of the 2007 SIAM internationalconference on data mining, pp. 551–556, 2007.[56] E. Lieberman, C. Hauert, and M. A. Nowak, “Evolutionary Dynamics on Graphs,”nature, vol. 433, no. 7023, pp. 312–316, 2005.[57] D. Lo´pez-Pintado, “An Overview of Diffusion in Complex Networks,” Springer Inter-national Publishing, 2016.91Bibliography[58] D. Lo´pez-Pintado, “Contagion and coordination in random networks,” InternationalJournal of Game Theory, vol. 34, no. 3, pp. 371–381, 2006.[59] D. Lo´pez-Pintado, “Diffusion in complex social networks,” Games and EconomicBehavior, vol. 62, no. 2, pp. 573–590, 2008.[60] G. Lotan,E. Graeff, M. Ananny, D. Gaffney, and I. Pearce, “The Arab Spring therevolutions were tweeted: Information flows during the 2011 Tunisian and Egyptianrevolutions,” International Journal of Communication, vol. 5, no. 31, 2011.[61] K. V. Mardia and C. G. Khatri, “Uniform distribution on a Stiefel manifold,” Journalof Multivariate Analysis, vol. 7, no. 3, pp. 468–473 1977.[62] P. L. Odell and A. H. Feiveson, “A numerical procedure to generate a samplecovariance matrix,” Pearson Education India, vol. 61, no. 313, pp. 199–203, 1966.[63] M. McPherson, L. Smith-Lovin, and J. M. Cook, “Birds of a feather: Homophily insocial networks,” Annual Review of Sociology, pp. 415–444, 2001.[64] A. Mata, R. Ferreira, and S. Ferreira, “Heterogeneous pair-approximation for thecontact process on complex networks,” New Journal of Physics, vol. 16, no. 5, 2014.[65] E. Mossel and S. Roch, “On the submodularity of influence in social networks,”Proceedings of the thirty-ninth annual ACM symposium on Theory of Computing, pp.128–134, 2007.[66] A. Mu¨ller and D. Stoyan, “Comparison Methods for Stochastic Models and Risks,”Wiley, vol. 389, 2002.[67] M. A. Nowak, C. Tarnita, and T. Antal “Evolutionary dynamics in structuredpopulations,” Philosophical Transactions of the Royal Society of London B: BiologicalSciences, vol. 365, no. 1537, pp. 19-30, 2010.92Bibliography[68] H. Ohtsuki, C. Hauert, E. Lieberman, and M. A. Nowak, “A simple rule for theevolution of cooperation on graphs and social networks,” Nature, vol. 441, no. 7092,pp. 502-505, 2006.[69] R. Olfati-Saber, “Distributed Kalman filtering for sensor networks,” 46th IEEEConference on Decision and Control, pp. 5492–5498, 2007.[70] Z. Papacharissi and M. de F. Oliveira, “Affective news and networked publics: Therhythms of news storytelling on #Egypt,” Journal of Communication, vol. 62, no. 2,pp. 266–282, 2012.[71] R. Pastor-Satorras and A. Vespignani, “Epidemic spreading in scale-free networks,”Physical Review Letters, vol. 86, no. 14, pp. 3200, 2001.[72] L. Perez and S. Dragicevic, “An agent-based approach for modeling dynamics ofcontagious disease spread,” International journal of health geographics, vol. 8, no. 1,pp.50, 2009.[73] J. Poncela, J. Go´mez-Gardenes, J. Moreno, and Y. M. Flor´ıa, “Consolidatingbirth-death and death-birth processes in structured populations,” InternationalJournal of Bifurcation and Chaos, vol. 20, no. 3, pp. 849-857, 2010.[74] M. A. Porter, J. P. Gleeson, “Dynamical systems on networks: A tutorial,” Springer,2016.[75] L. Shi, “Kalman filtering over graphs: Theory and applications,” IEEE Transactionson Automatic Control, vol. 54, no. 9, pp. 2230–2234, 2009.[76] H. A. Simon, “On a class of skew distribution functions,” Biometrika, vol. 42, no. 3,pp. 425–440, 1955.[77] G. Szabo´ and G. Fath, “Evolutionary games on graphs,” Physics reports, vol. 446,no. 4, pp. 97–216, 2007.93[78] M. Taylor, “Exact and approximate epidemic models on networks: theory andapplications,” University of Sussex, 2013.[79] P. Tichavsky, C. Muravchik, and A. Nehorai, “Posterior Crame´r-Rao bounds fordiscrete-time nonlinear filtering,” IEEE Transactions on Signal Processing, vol. 46,no. 5, pp. 1386–1396, 1998.[80] D. Topkis, “Supermodularity and Complementarity,” Princeton University Press, 1998.[81] H. L. Van Trees, “Detection, estimation, and modulation theory,” John Wiley & Sons,2004.[82] F. Vega-Redondo, “Complex social networks,” Cambridge University Press, no. 44,2007.[83] Z. Wang, W. Zhang, and C. W. Tan, “On inferring rumor source for SIS model undermultiple observations,” International Conference on Digital Signal Processing (DSP),pp. 755–759, 2015.[84] L. Weng, F. Menczer, and Y. Y. Ahn, “Virality prediction and community structure insocial networks,” Scientific Reports, vol. 3, 2013.[85] R. J. Wilson, “An introduction to graph theory,” Pearson Education India, 1970.[86] H. Yang, W. Zhi-Xi, and D. Wen-Bo, “Evolutionary games on scale-free networks withtunable degree distribution,” EPL (Europhysics Letters), vol. 99, no. 1, 2012.[87] D. H. Zanette and S. Risau-Gusma´n, “Infection spreading in a population withevolving contacts,” Journal of Biological Physics, vol. 34, no. 1, pp. 135–148, 2008.[88] J. Zhang and M. F. J. Moura, “Diffusion in social networks as SIS epidemics:beyond full mixing and complete graphs,” IEEE Journal of Selected Topics in SignalProcessing, vol. 8, no. 4, pp. 537–551, 2014.94Appendix ARelated ProofsIn this Appendix, we provide a proof of the Mean Field Approximation used in Chapter 2.A.1 Proof of Theorem 1(Mean Field Dynamics) [48]Part 1 of the theorem is a standard martingale representation of a Markov chain, see forexample [47, page 20].The proof of the mean field dynamics approximation in [12] is not readily accessible toan engineering reader. We show below that the proof is a simple consequence of Azuma-Hoeffding inequality and Gronwall’s inequality.Theorem 2 (Azuma-Hoeffding Inequality [14]). Suppose ST =∑Tτ=1 ζτ + S0 where {ζτ} isa martingale difference process with bounded differences satisfying |ζτ | ≤ ∆τ almost surelywhere ∆τ are finite constants. Then for any > 0,P(|ST − S0| ≥ ) ≤ 2 exp(− 22∑Tτ=1 ∆2τ)A bound on the deviation between the mean field dynamics xn in (2.15) and infectedpopulation state∼xn in (2.14) is evaluated in the form of two lemmas, namely, Lemma 1and Lemma 2 below. Recall that ζτ is a 2D-dimensional finite-state martingale incrementprocess defined in (2.14) with ‖ζτ‖2 ≤ ΓM for some positive constant Γ.95A.1. Proof of Theorem 1Lemma 1. Let ϕn = xn − ∼xn. Then ‖ϕn‖ satisfies‖ϕn+1‖∞ ≤ ‖ϕ0‖∞ + βMn∑τ=1‖ϕτ‖∞ + ST .Lemma 2. Let ST = max1≤n≤T ‖∑nτ=1 ζτ‖∞. ThenP(ST ≥ ) ≤ 2 exp(−2M22ΓT)Proof of Theorem 1: With Lemmas 1 and 2, the proof of Theorem 1 is as follows.Applying Gronwall’s inequality23 to Lemma 1 yields ‖ϕn‖∞ ≤ ST exp[βnM], which in turnimpliesmax1≤n≤T‖ϕn‖∞ ≤ ST exp[βTM].As a resultP( max1≤n≤T‖ϕn‖∞ > ) ≤ P(ST exp[βTM]> )= P(ST > exp[−βTM])Next applying Lemma 2 to the right hand side yieldsP( max1≤n≤T‖ϕn‖∞ > ) ≤ 2 exp(− exp(−2βTM)2M22ΓT)Finally choosing T = c1M , for some positive constant c1 yieldsP( max1≤n≤T‖ϕn‖∞ > ) ≤ 2 exp(−C22M)23Gronwall’s inequality: if {xn} and {bn} are non-negative sequences and a ≥ 0, thenx¯n ≤ a+n−1∑j=1xjbj =⇒ xn ≤ a exp(n−1∑j=1bj)96A.1. Proof of Theorem 1where C2 = exp(−2βc1) 12Γc1 .Proof of Lemma 1: Define the following 2D− dimensional vectors:H(xn) = P21(xn)− P12(xn),H(∼xn) = P21(∼xn)− P12(∼xn).Recall from (2.14) and (2.15),ϕn+1 = ϕn +1M[H(xn)−H(∼xn)] + ζn= ϕ0 +1Mn∑τ=1[H(xτ )−H(∼xτ )] +n∑τ=1ζτ‖ϕn+1‖∞ ≤ ‖ϕ0‖∞ + 1Mn∑τ=1‖H(xτ )−H(∼xτ )‖∞ + ‖n∑τ=1ζτ‖∞≤ ‖ϕ0‖∞ + βMn∑τ=1‖ϕτ‖∞ + STThe last inequality is justified as follows: From (2.11) and (2.12), P21(d, x¯n(d)) = ρ(d)(1 −x¯n(d))P¯21(d, a) and P12(d, x¯n(d)) = ρ(d)x¯n(d)P¯12(d, a) for some a, where a is the number ofinfected neighbors. HenceH(xn, i)−H(∼xn, i) ≤ β(x¯n(i)− xn(i)),where β = maxd[ρ(d)(P¯21(d, a) + P¯12(d, a))]is bounded.Proof of Lemma 2: ‖∑nτ=1 ζτ‖∞ = maxi |∑nτ=1 e′iζτ | = |∑nτ=1 e′i∗ζτ | for some i∗,where ei denotes the vector having 1 at the ith position and zero elsewhere. Since e′i∗ζτ is amartingale difference process with |e′i∗ζτ | ≤√Γ/M applying the Azuma-Hoeffding inequality(Theorem 2) yieldsP(‖n∑τ=1ζτ‖∞ ≥ ) = P(|n∑τ=1e′i∗ζτ | ≥ ) ≤ 2 exp[−2M22Γn]97A.1. Proof of Theorem 1The right hand side is increasing with n. So clearly,P( max1≤n≤T‖n∑τ=1ζτ‖∞ ≥ ) ≤ 2 exp[−2M22ΓT]98
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Tracking infection diffusion in social networks
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Tracking infection diffusion in social networks Pedersen, Tavis Joseph 2017
pdf
Page Metadata
Item Metadata
Title | Tracking infection diffusion in social networks |
Creator |
Pedersen, Tavis Joseph |
Publisher | University of British Columbia |
Date Issued | 2017 |
Description | This thesis explores the problem of tracking the diffusion of contagion processes on social networks. Infection (or Information) diffusion is modeled using the Susceptible-Infected-Susceptible (SIS) model. Mean field approximation is employed to approximate the discrete valued infection dynamics by a deterministic ordinary differential equation, thereby yielding a generative model for the infection diffusion. The infection is shown to follow polynomial dynamics and is estimated using an exact non-linear Bayesian filter. We compute posterior Cramer-Rao bounds to obtain the fundamental limits of the filter which depend on the structure of the network. The SIS model is extended to include homophily, and filtering on these networks using the exact non-linear Bayesian filter is illustrated. With consideration for the collaborative or antagonistic nature of some social processes on networks, we present an alternative, game theoretic, model for the spread of information based on the evolutionary Moran process. The diffusion of collaboration following this Moran model is estimated using a particle filter. Additionally, we validate the efficacy of our method with diffusion data from a real-world online social media platform, Twitter. We find that SIS model is a good fit for the information diffusion and the non-linear filter effectively tracks the information diffusion. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2017-08-10 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0353169 |
URI | http://hdl.handle.net/2429/62557 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2017-09 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2017_september_pedersen_tavis.pdf [ 1.79MB ]
- Metadata
- JSON: 24-1.0353169.json
- JSON-LD: 24-1.0353169-ld.json
- RDF/XML (Pretty): 24-1.0353169-rdf.xml
- RDF/JSON: 24-1.0353169-rdf.json
- Turtle: 24-1.0353169-turtle.txt
- N-Triples: 24-1.0353169-rdf-ntriples.txt
- Original Record: 24-1.0353169-source.json
- Full Text
- 24-1.0353169-fulltext.txt
- Citation
- 24-1.0353169.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0353169/manifest