Average-Consensus in aTwo-Time-Scale Markov SystembyKevin James TopleyB.Sc., Dalhousie University, 2003M.Math., Waterloo University, 2005A THESIS SUBMITTED IN PARTIAL FULFILMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinThe Faculty of Graduate and Postdoctoral Studies(Electrical and Computer Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)November, 2014c© Kevin James Topley 2014AbstractIn a spatially distributed network of sensors or mobile agents it is often required to compute theaverage of all local data collected by each member of the group. Obtaining the average of alllocal data is, for example, sufficient to conduct robust statistical inference; identify the groupcenter of mass and direction of motion; or evenly assign a set of divisible tasks among processors.Due to the spatial distribution of the network, energy limitations and geographic barriers mayrender a data fusion center to be infeasible or highly inefficient in regard to averaging the localdata. The problem of distributively computing the network average - also known as the average-consensus problem - has thus received significant attention in the signal processing and controlresearch communities. Efforts in this direction propose and study distributed algorithms thatallow every agent in the network to compute the global average via communication with onlya subset of fellow agents.The thesis will present a framework in which to analyze distributed algorithms for bothdynamic and static consensus formation. For dynamic consensus we consider a two-time-scaleMarkov system wherein each sensor node can observe the state of a local Markov chain. As-suming each Markov chain has a stationary distribution with slowly switching regime, we showthat a local stochastic approximation algorithm in conjunction with linear distributed averagingcan imply that each node estimate converges weakly to the current average of all stationarydistributions. Each node can thus track the average of all stationary distributions, providedthe regime switching is sufficiently slow. We then consider static consensus formation whenthe inter-node communication pattern is a priori unknown and signals possess arbitrarily longtime-delays. In this setting, four distributed algorithms are proposed and shown to obtainaverage-consensus under a variety of general communication conditions.iiPrefaceThe work in this thesis is based primarily on the results contained in the following three journalpapers:• V. Krishnamurthy, K. Topley, and G. Yin, Consensus Formation in a Two-Time-ScaleMarkovian System, SIAM MMS, Vol.7, No.4, 2009.• K. Topley and V. Krishnamurthy, Average-Consensus Algorithms in a DeterministicFramework: Part 1 Strong Connectivity, IEEE Trans. on Signal Processing, Vol.60, No.12,Dec. 2012.• K. Topley and V. Krishnamurthy, Average-Consensus Algorithms in a DeterministicFramework: Part 2 Central Connectivity, IEEE Trans. on Signal Processing, Vol.60,No.12, Dec. 2012. 1All three of the above papers analyze algorithms for consensus formation in a networkof sensor nodes. The first paper deals with a dynamic consensus based on local stochasticapproximation algorithms operating in parallel at each node. The second two papers considerstatic consensus formation when the inter-node communication pattern is a priori unknownand arbitrary (but finite) communication time-delays may occur.In the current presentation of these results we have combined the framework of all threepapers into a single comprehensive model. This approach allows the reader to see each algorithmas a special case of a larger algorithmic class and observation model. Furthermore, our approachsheds light on the variety of trade-offs that may exist between communication resource costs,signal pattern assumptions, and the convergence rate at which consensus formation occurs.1The author was on-leave during the periods of Jan.2010-Sept.2011 and Sept.2012-Jun.2014.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x1 Introduction and Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1.1 An Illustrative Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21.2 Thesis Statement and Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.1 Thesis Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.2.2 Thesis Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.3 Thesis Contributions and Organization . . . . . . . . . . . . . . . . . . . . . . . 71.3.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 71.3.2 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 Average-Consensus Problem Descriptions . . . . . . . . . . . . . . . . . . . . . 92.1 Preliminary Notations, Assumptions, and Definitions . . . . . . . . . . . . . . . 92.1.1 Consensus Protocols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.1.2 Communication Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 The Average-Consensus Problems . . . . . . . . . . . . . . . . . . . . . . . . . . 12ivTable of Contents3 ACP1: Consensus in a Two-Time-Scale Markov System . . . . . . . . . . . . 143.1 ACP1: Local Consensus Input Dynamics . . . . . . . . . . . . . . . . . . . . . . 163.2 ACP1: Inter-node Communication Pattern . . . . . . . . . . . . . . . . . . . . . 173.3 ACP1: Algorithm Description and Relation to Past Literature . . . . . . . . . . 193.4 ACP1: Algorithmic Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253.4.1 ACP1: Stochastic Approximation and Consensus Protocol . . . . . . . . 253.4.2 ACP1: Convergence Results . . . . . . . . . . . . . . . . . . . . . . . . . 273.4.3 Scaled Tracking Error and Cumulative Distribution Function (CDF) Es-timation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.4.4 Average-Consensus Exchange Graph Conditions . . . . . . . . . . . . . . 333.5 ACP1: Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 363.6 ACP1: Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 394 ACP2: The Static Consensus Problem with a priori Unknown Communica-tion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 424.1 ACP2: Local Consensus Input Dynamics . . . . . . . . . . . . . . . . . . . . . . 434.2 ACP2: Inter-node Communication Pattern and Problem Statement . . . . . . . 434.3 ACP2: Relation to Past Literature . . . . . . . . . . . . . . . . . . . . . . . . . . 454.4 ACP2: Algorithmic Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 474.4.1 ACP2: Consensus Protocols . . . . . . . . . . . . . . . . . . . . . . . . . 474.4.2 ACP2: UCP Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.4.3 ACP2: Convergence Results . . . . . . . . . . . . . . . . . . . . . . . . . 554.4.4 ACP2: Discussion of Results . . . . . . . . . . . . . . . . . . . . . . . . . 564.4.5 ACP2: Example SVC˜C Sequence . . . . . . . . . . . . . . . . . . . . . . 594.5 ACP2: Numerical Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604.6 ACP2: Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . 655 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675.1 Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 675.2 Directions of Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71Appendices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A Proofs of Convergence Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79A.1 Proof of Results Contained in Chapter 3. . . . . . . . . . . . . . . . . . . . . . . 79vTable of ContentsA.1.1 Proof of Theorem 3.4.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79A.1.2 Proof of Theorem 3.4.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82A.1.3 Proof of Theorem 3.4.4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84A.1.4 Proof of Theorem 3.4.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84A.1.5 Proof of Lemma 3.4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85A.1.6 Proof of Lemma 3.4.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88A.1.7 Proof of Lemma 3.4.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88A.1.8 Proof of Corollary 3.4.11 . . . . . . . . . . . . . . . . . . . . . . . . . . . 88A.2 Proof of Results Contained in Chapter 4. . . . . . . . . . . . . . . . . . . . . . . 88A.2.1 Proof of Theorem 4.4.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89A.2.2 Proof of Theorem 4.4.10 . . . . . . . . . . . . . . . . . . . . . . . . . . . 103A.3 Data Dissemination Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110A.4 Preliminary Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111A.5 Fixed Ring Graph Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115A.5.1 Statement of Theorem A.5.1. . . . . . . . . . . . . . . . . . . . . . . . . . 115A.5.2 Proof of Theorem A.5.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 115A.5.3 Supporting Lemmas for Theorem A.5.1. . . . . . . . . . . . . . . . . . . 117A.6 Corollary A.6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118A.6.1 Statement of Corollary A.6.1. . . . . . . . . . . . . . . . . . . . . . . . . 118A.6.2 Proof of Corollary A.6.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 118A.6.3 Supporting Lemmas for Corollary A.6.1. . . . . . . . . . . . . . . . . . . 119A.7 Conjecture A.7.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119A.7.1 Statement of Conjecture A.7.1. . . . . . . . . . . . . . . . . . . . . . . . 120A.8 Extensions to Time-Delayed and Asynchronous Communication Sequences . . . 120A.8.1 Statement of Extensions to Cor.A.6.1 and Conj.A.7.1 . . . . . . . . . . . 120A.8.2 Proof of Theorem A.8.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 120A.8.3 Proof of Conjecture A.8.2. . . . . . . . . . . . . . . . . . . . . . . . . . . 121A.8.4 Supporting Lemmas for Thm.A.8.1 and Conj.A.8.2 . . . . . . . . . . . . 121viList of Tables3.1 Numerical results of the ratio comparisons between Models 1 and 2 . . . . . . . . 39viiList of Figures1.1 Underwater sensor nodes estimating the ambient concentration of biohazards. . . 31.2 A centralized underwater communication network. . . . . . . . . . . . . . . . . . 41.3 Time-varying Consensus with a priori Unknown Communication Links. . . . . . 53.1 Sensor estimation of the average-consensus under both Models 1 and 2 . . . . . . 373.2 The absolute scaled error, plotted consecutively over the simulation time for eachstate in S . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383.3 The analytic and sample variance of the scaled tracking error under both Models1 and 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393.4 Unordered Sensor Diffusion Coefficients, Absolute Error, and Functions of Weights 404.1 Illustrations of the SVCC and SVC˜C communication conditions . . . . . . . . . . 594.2 Convergence Under Example 3 in [1] . . . . . . . . . . . . . . . . . . . . . . . . . 614.3 Convergence under Asynchronous Communication with Time-Varying Asymmet-rical Time-Delays; Gossip, Random, and OH Protocols . . . . . . . . . . . . . . . 624.4 Convergence under Asynchronous Communication with Time-Varying Asymmet-rical Time-Delays; IA, DA, and DDA Protocols . . . . . . . . . . . . . . . . . . . 634.5 Convergence under Asynchronous Communication with Time-Varying Asymmet-rical Time-Delays and Unknown Network Size . . . . . . . . . . . . . . . . . . . . 64A.1 A diagram of the Signal Notation in Def.A.4.10 . . . . . . . . . . . . . . . . . . . 113A.2 A diagram of Thm.A.8.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121A.3 A diagram of Conj.A.8.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121viiiList of Acronymstrcp time-repecting communication pathntrcp non-time-repecting communication pathtrcs strongly-connected time-repecting communication sequencentrcs strongly-connected non-time-repecting communication seqeuencermcs regime-modulated communication seqeuenceSVSC singly V strongly connectedIVSC infinitely V strongly connectedSVCC singly V centrally connectedIVCC infinitely V centrally connectedSVC˜C singly V dual-centrally connectedACP average-consensus problemDLA distributed linear averagingDA distributed-averaging algorithmDDA discretized distributed-averaging algorithmIA ideal distributed algorithmOH one-hop distributed algorithmRG ring-graphRGF fixed ring-graphRGSC strongly-connected ring-graphRGFSC strongly-connected fixed ring-graphixAcknowledgmentsFirstly, I would to thank my supervisor Dr. V. Krishnamurthy, my supervisory committee, aswell as the studies coordinator Dr. L. Lampe, for their helpful suggestions.The main results presented in this thesis are derived from three journal papers, co-written bythe author. The contribution to each journal paper by the respective authors is now describedhere.The work Consensus Formation in a Two-Time-Scale Markovian System is a research paperbased on collaboration with Dr. Yin from Wayne State University and Dr. Krishnamurthy. Theconception and presentation of the project is credited to Dr. Krishnamurthy, whereas Dr.Yindeveloped all of the weak-convergence proofs. The author’s contributions to the project were thedevelopment of the dynamic consensus algorithm analyzed in the work; the connectivity con-ditions on the network graph shown to be necessary and sufficient for average-consensus underthis algorithm; and finally all of the simulations, which illustrate a numerical implementationof the algorithm in a two-time-scale Markov system.The algorithms and convergence proofs presented in the two papers Average-Consensus Al-gorithms in a Deterministic Framework: Part 1 Strong Connectivity and Average-ConsensusAlgorithms in a Deterministic Framework: Part 2 Central Connectivity were collectively devel-oped by the author. Dr. Krishnamurthy aided substantially in the presentation of the works.xChapter 1Introduction and Overview1.1 IntroductionSensor networks are comprised of wireless networks of low-cost sensors that can be embeddedin an environment for purposes such as security surveillance, monitoring and data aggregation.Over the past few decades, both sensor networks and ad hoc networks of mobile agents haveattracted many researchers from diverse backgrounds in computer science, signal processing,robotics, and control theory [2–11]. Information processing in sensor networks is also closelyrelated to traditional areas of research in systems and control such as multi-target tracking andsensor fusion [12–15].The introduction of sensor, wireless ad hoc and peer-to-peer networks has motivated thedesign of distributed and fault-tolerant computation algorithms. This is largely due to thefact that distributed networks are characterized by numerous operational constraints. Suchconstraints include a lack of knowledge at each node in regard to the network topology [1,12,16];the lack of a central node to facilitate the required computations [3, 17–19]; or, in the case ofsensor networks, the computational power and energy resources of each node may be very limited[20–22]. These constraints necessitate the design of distributed algorithms for computationwherein each node can send information to only a sparse subset of fellow nodes within anygiven time period. The purpose of such algorithms in this setting is to quickly determine ateach node the value of a global computation using minimum exchange of local data. Examplesof such global computations include finding the minimum value, mode value, and average valueamong all nodes.A particularly useful instance of the above distributed problem is that of computing theaverage input at each node. A considerable amount of work in recent years has been devoted toascertaining the communication conditions under which the average of node input values canbe accurately estimated by all nodes in a sparsely connected network (a sample of the literatureincludes [16,17,19,23–26]). The distributed linear averaging protocol has been a primary focalpoint in this area of research due to its analytic tractability, decentralized nature, and minimummemory storage required at each node [8,27]. The basic paradigm of this protocol parameterizesthe network communication at any given time instant by a graph G(t) = {V,E(t),W (t)} with11.1. Introductiondirected (possibly time-delayed) edge set E(t) ⊆ V ×V , node set V = {1, . . . , n}, and weightedadjacency matrix W (t) with elements Wij(t) = 0 if (j, i) /∈ E(t). The weighted adjacencymatrix W (t) determines how each node consensus estimate is updated as a linear average of theestimates received from neighboring nodes. Under various conditions on the communicationgraph G(t), research in this area have proven convergence results in regard to the distributedlinear averaging of all local data. This computation can be used as a primary for more complexdistributed calculations such as hypothesis testing in Bayesian networks, maximum likelihoodestimation, solving general convex optimization problems, and load balancing within a networkof processors. In this thesis we will provide convergence results under a general collectionof “consensus protocols” that lead to an average-consensus estimate at all nodes. The nodesthereby obtain average-consensus by only a decoupled interaction among autonomous sets ofagents that are collectively connected within a multi-agent system.The convergence to a common value among a collection of autonomous agents was firsttechnically analyzed in the seminal works [28, 29]. Although these papers preempted an entireline of research (i.e. [23,25]) that expanded on these topics in various directions, a key differencebetween this line of work and the present thesis is that we focus on convergence to the averageof each node’s input values, rather than simply convergence to an arbitrary common value. Thelatter is of little use in many practical applications that require utilization of the average valuefor purposes such as maximum likelihood estimation, hypothesis testing, or load balancing.The average-consensus problem was largely revitalized by the work [18], which emphasizedthe reality that in many applications an average of node initial values is required. In thisthesis we focus on average-consensus, and thus our results have a direct applicability to thevarious practical situations wherein the average initial value, rather than merely an arbitraryagreement value, is obtained. In the proofs regarding the network weak-convergence to average-consensus in a two-time-scale Markov system we require a detailed martingale formulation ofthe problem in conjunction with a well established matrix eigen-decomposition. With regard toaverage-consensus in a priori unknown communication sequences we rely primarily on extensivealgebraic manipulation, the Cauchy-Schwarz inequality, as well as the triangle inequality.1.1.1 An Illustrative ExampleAs a simple example, consider a set of 4 sensor nodes dispersed under water with the intentof monitoring the amount of ambient biohazardous material in their vicinity. In this case it isreasonable to assume that at each time instant each sensor node should take a local measurementand average this to obtain the current ambient concentration of biohazards (see Fig.1.1).Suppose at each time instant t ∈ {0, 1, 2, . . .} each node i ∈ {1, 2, 3, 4} transmits a localconsensus estimate sˆi(t) to their neighboring node as shown in Fig.1.1. It is easy to show that if21.1. IntroductionNode 1Node 2Node 3Node 4s (0)s (0)s (0)s (0)1234Water Surfacepppp( Biohazardous Material )Figure 1.1: Underwater sensor nodes estimating the ambient concentration of biohazardousmaterial by means of a distributed linear averaging protocol.each node initializes their consensus estimate by the measurement si(0), and each receiving nodej updates their local consensus estimate as sˆj(t+1) = (1−p)sˆj(t)+psˆi(t), then for any p ∈ (0, 1)each node consensus estimate will asymptotically converge to the average s¯(0) = 14∑4i=1 si(0),thereby obtaining at each node the ambient concentration of hazardous material in the water.Unfortunately however, the communication sequence and weighting scheme depicted inFig.1.1 requires a number of operational constraints, such as synchronization of the sensornodes clocks, an a priori agreed upon update weight p, as well as an initial node positioningsuch that (i) each node sends only one other node its consensus estimate and (ii) the unionof communicated signals constitute a strongly-connected network. These constraints could beby-passed if a centralized communication network was assumed, as shown in Fig.1.2. However,if the number of sensing nodes was large (say, 100 or 1000), then a centralized node in suchcircumstances would be clearly infeasible due to physical limitations of the network, such asnode transmission power and geological barriers. Furthermore, any fixed scheme to route eachnode’s data to a fusion center could be compromised if the network communication graph wasto change with time. With these problems in mind, we nonetheless note that a centralized nodeas shown in Fig.1.2 could solve the average-consensus problem trivially by simply aggregatingits received data and dividing by the total number of nodes.In this thesis we will generalize the basic average-consensus problem in two diametrically31.1. IntroductionNode 1Node 2Node 3Node 4Water SurfaceCentral NodeFigure 1.2: A centralized underwater communication network.opposite directions. First, we assume the initial input parameters si(0) are observed in a time-varying noise process, and second we allow the inter-node communication pattern to possessarbitrary time-delays and not adhere to any fixed graph structure (see Fig.1.3). In the un-derwater example described above, it is easy to imagine that swells in the ocean water couldcause certain nodes to move in and out of each other’s communication radius, thus changingthe network communication graph.In these terms, we thus refer to the average-consensus problem depicted in Fig.1.1 as a“static” consensus problem with a priori known communication links. The motivation foranalyzing an average-consensus protocol with time-varying (dynamic) inputs is natural in anyreal-time setting, such as security surveillance and environmental monitoring. The motivationfor average-consensus using a priori unknown communication links is again natural, since mostwireless networks are deployed in dangerous and/or unknown environmental conditions (i.e.the swells in the ocean may disable certain communication links and enable other links atunpredictable time intervals).Both the static and dynamic average-consensus problem has been analyzed in the past andapplied in a variety of practical settings such as formation control (i.e. UAV co-ordination[22,30]); distributed message routing [31,32]; the analysis of swarming behaviors [23]; oscillatorsynchronization [33]; and network load balancing [16].Obtaining the average of all local data is useful for theoretical network operations such as41.1. IntroductionNode 1Node 2Node 3Node 4Water Surfaces (t)s (t)s (t)124s (t)3Figure 1.3: Time-varying Consensus with a priori Unknown Communication Links.solving a set of local convex optimization problems [34, 35]; distributed Kalman filtering [8];conducting maximum likelihood estimation [16,36]; and machine learning [37]. The coordinationof an average-consensus within a multi-agent system has thus been considered under a wide-range of assumptions regarding the inter-agent communication capabilities [4, 38–48]. Thespatial distribution of the network, energy limitations and geographic barriers may all rendera data fusion center to be infeasible with respect to the averaging of local data. The problemof distributively computing the network average has received particularly significant attention.The majority of research in this area propose and study distributed algorithms that allow everyagent in the network to compute the global average via communication with only a subset offellow agents. In this thesis we will focus on a specific class of consensus algorithms underwhich a network consensus is obtained by the linear averaging of data held at neighboringnodes. Our contribution to this body of literature includes an extension of the input consensusdynamics to a two-time-scale Markov system, as well as an extension to arbitrarily time-delayed,asynchronous, and time-varying network graph topologies.This thesis will employ a general paradigm within which the same average-consensus prob-lems as studied in e.g. [17, 26] have been considered. Specifically, we will investigate how eachnode can obtain the average of all initial node values by means only of averaging communicatedvalues under a stochastically switching network communication graph G(t). In addition, weconsider a general communication model that eliminates the possibility of average-consensusby any linear averaging algorithm with fixed averaging weights. In the former case, to obtain51.2. Thesis Statement and Objectivesthe desired average-consensus value we propose a stochastic approximation (SA) implementedindependently at each node. The SA acts as an on-line “learning” algorithm and can be usedto make local weight adjustments that will asymptotically eliminate the distance between eachnode state-value and the collective average when employing the distributed linear averaging pro-tocol. In the latter case, we are required to employ “normal” consensus estimates that maintaina known linear dependence of each consensus estimate on the initial set of node values. Thedrawback to using the normal consensus estimates is that they require an O(n) memory storageat each node, regardless of the initial consensus input dimensions. We motivate our contribu-tions by providing numerous practical examples from the past literature that necessitate O(n)consensus estimates, thus nullifying the cost of using the normal consensus estimates whennetwork communication is asynchronous, time-delayed, and a priori unknown at every node.1.2 Thesis Statement and Objectives1.2.1 Thesis StatementThis thesis addresses two diametrically opposed aspects of the average-consensus problem. Thefirst aspect concerns average-consensus in a two-time-scale Markov system when inter-nodecommunication is synchronous, instantaneous, and adheres to a fixed network graph. Thesecond aspect concerns average-consensus with node input variables that do not vary overtime, yet the inter-node communication may be asynchronous, time-delayed, and adhere to nofixed network graph. Although in the Appendix we attempt to combine the difficulties of bothaspects by proposing a collection of data dissemination results that permit average-consensus ina two-time-scale Markov system under general time-delayed inter-node communication patterns,the main efforts in this thesis tackle the two aforementioned contrasting sides of the average-consensus problem.1.2.2 Thesis ObjectivesBased on the description in Sec.1.1, it is evident that average-consensus can in general be amulti-faceted and intricate network optimization problem. Its applications involve significantresearch abridging stochastic optimization, communication theory, and network science. More-over, the involved difficulties vary with the underlying network infrastructure that support thedata transmitted by any given consensus protocol. Therefore, the goals of this research includethe following:1. efficient data aggregation given a priori knowledge of the underlying communication pat-terns.61.3. Thesis Contributions and Organization2. stochastic and non-stochastic solutions to average-consensus problems in noisy and noise-less settings.3. distributed algorithms that solve combinations of the above problem features.To address these objectives we employ techniques in graph theory, stochastic approximation,and convex optimization.1.3 Thesis Contributions and Organization1.3.1 Summary of ContributionsThe Chapters 3,4, and 5 detail our contributions relative to the current literature. Here wesummarize the main contributions of this thesis at a more abstract level:• distributed protocols that weakly-converge to average-consensus in a two-time-scale Markovsystem.• average-consensus protocols that converge under a priori unknown communication pat-terns.• data dissemination protocols that can operate under various communication sequenceassumptions.By utilizing the appropriate data dissemination results, we attempt to combine all 3 of theabove contributions by suggesting efficient consensus protocols that will weakly-converge toaverage-consensus under a priori unknown communication patterns.1.3.2 Thesis OrganizationIn addition to this introductory chapter, the thesis includes two chapters which were originallyprepared for journal publication and have been slightly modified in order to offer a logicalprogression in the thesis. The final chapter discusses the conclusions and directions for futurework.The remaining chapters in this thesis are organized as follows:Chapter 2 introduces the notation and general distributed algorithm assumptions that willbe made through-out the thesis. We also define in general terms the 2 average-consensusproblems that will be addressed in the subsequent 2 chapters.In Chapter 3 we define a two-time-scale Markov system and describe a distributed consensusprotocol that weakly-converges to the current average stationary distribution. Subsequently,71.3. Thesis Contributions and Organizationthe network graph weights in this setting are analyzed with respect to the conditions necessaryand sufficient for asymptotic convergence to average-consensus.Chapter 4 considers a static average-consensus with a priori unknown communication pat-terns. Four consensus protocols are defined and the various communication conditions necessaryand/or sufficient for each protocols’ convergence are defined. In the Appendix we prove variousdata dissemination results (yet unpublished) that are required for the latter set of protocols toprovide weak-convergence results when the consensus input variables adhere to the two-time-scale Markov system defined in Chapter 3.Finally, we draw conclusions in Chapter 5 thereby wrapping up the contributions of thisthesis and proposing several suggestions for future research on the discussed topics.8Chapter 2Average-Consensus ProblemDescriptionsIn this chapter we define two diametrically opposed average-consensus problems that will beaddressed in this thesis. The first problem (ACP1) assumes each node is associated with aninput parameter observed in a slowly changing Markov noise process; the average-consensusproblem then consists of having all nodes weakly-converge to the average of all current inputnoise distributions. The second problem (ACP2) assumes each node is associated with a fixedinitial consensus input; the average-consensus problem then consists of having all nodes convergeto the average of all the initial node inputs. Although the second problem is a special case ofthe first in regard to the node input parameters, we will consider the second consensus problemunder far more general inter-node communication assumptions than the first.To formalize these problems we require some preliminary notation and definitions. Theseare presented in Sec.2.1. Subsequently in Sec.2.2 we define the two aforementioned consensusproblems.2.1 Preliminary Notations, Assumptions, and DefinitionsLet N denote the non-negative integers and t ∈ N denote discrete time. At initial time t = 0,consider a finite set of arbitrarily numbered nodes V = {1, . . . , n} and a set of d-dimensionalvectors {si(0) ∈ Rd×1 : i ∈ V}. The set {si(0) ∈ Rd×1 : i ∈ V} is referred to as the setof “initial input consensus vectors”. We generally will consider the input consensus vectors{si(t) ∈ Rd×1 : i ∈ V , t ∈ N} to have the ability to vary with time. Through-out the thesis,we define s¯(t) as the “current average-consensus value”,s¯(t) =1nn∑i=1si(t) . (2.1)Assume each node i ∈ V can locally store a “knowledge set” Ki(t) that consists of an orderedset of elements each with a distinct meaning.92.1. Preliminary Notations, Assumptions, and Definitions• (A1): (knowledge set assumption) At any time t ∈ N, each node i ∈ V is equipped with adevice that can update and store a “knowledge set” Ki(t). For each i ∈ V, the knowledgeset Ki(t) may have a time-varying cardinality.The algorithms proposed in this thesis will assume that every knowledge set Ki(t) contains alocal “consensus estimate” sˆi(t) ∈ Rd×1 that represents the “belief” of node i in regard to thecurrent average-consensus vector s¯(t) defined in (2.1).• (A2): Ki(t) ⊇ {sˆi(t)} , ∀ i ∈ V , ∀ t ∈ N, where sˆi(t) ∈ Rd×1 represents the local estimateof s¯(t) at node i and time t.Next we assume that an ordered set Sij(tij0 , tij1 ) can be transmitted from node j at a timedenoted tij0 , and received at node i at a time denoted as tij1 , where due to causality tij1 ≥ tij0 .Through-out this work we will for notational convenience denote t0 = tij0 and t1 = tij1 , since thesuperscripts are necessary only when multiple signals are considered in a single equation. Werefer to Sij(t0, t1) as a “signal”, or “signal set”.• (A3): (signal set assumption) At any time t0 ∈ N, each node j ∈ V has the ability totransmit a “signal set” Sij(t0, t1) ⊆ Kj(t0) that will be received at some node i ∈ V attime t1 ∈ N, where t1 ≥ t0.The signal Sij(t0, t1) has a time-delay (t1 − t0) ≥ 0. As our final condition, we assume that atany time t ∈ N, each node i ∈ V will “know” its unique node identifier value i, the network sizen, the current input consensus vector si(t), and the local consensus estimate vector sˆi(t). Thisis formalized as,• (A4): At any time t ∈ N, the knowledge set Ki(t) of each node i ∈ V satisfies Ki(t) ⊇{i, n, sˆi(t), si(t)}.In [27] we explain how the proposed distributed algorithms can be modified if the network sizen is not initially known at any node. We assume Ki(t) ⊃ {n} for all i ∈ V and t ∈ N only forsimplicity of presentation.2.1.1 Consensus ProtocolsGiven (A1)-(A4), we define a “consensus protocol” (P) in terms of its “knowledge set updatingrule” fPK {·} together with a “signal specification rule” fPS {·}. The knowledge set updating rulefPK {·} defines the effect that a set of signals has on the knowledge set Ki(t1) at the receiving nodei. The signal specification rule fPS {·} specifies the elements contained in the signal Sij(t0, t1)given as a function of the knowledge set Kj(t0) at the transmitting node j.102.1. Preliminary Notations, Assumptions, and DefinitionsThe Consensus Protocol (P) under (A1)-(A4):Knowledge Set Updating Rule: fPK : Ki(t1)⋃Sij(t0, t1) 7−→ Ki(t1+1) (2.2)Signal Specification Rule: fPS : Kj(t0) 7−→ Sij(t0, t1) (2.3)Any given consensus protocol (P) may or may not solve a particular average-consensusproblem, as we shall see in each subsequent chapters.2.1.2 Communication PatternsIn addition to (A1)-(A4) and the consensus protocol (2.2) − (2.3), to state the two consensusproblems mentioned above we still require four definitions pertaining to the inter-node com-munication. Based on the definition of a signal as stated in (A3), we define a “communicationsequence” as follows.Definition 2.1.1 (Communication Sequence) For any 0 ≤ t0 ≤ t1, a “communication se-quence” Ct0,t1 is the set of all signals transmitted no earlier than time t0 and received no laterthan time t1, that is,Ct0,t1 = {Si1j1 , Si2j2 , Si3j3 , . . .}where we have omitted the time indices but it is understood that the transmission time t0 andreception time t1 of each signal Si`j` belong to the interval [t0, t1]. Next we define the notion of a “non-time-respecting communication path”, which will be ab-breviated as “ntrcp”.Definition 2.1.2 A communication sequence Ct0,t1 contains a “non-time-respecting communi-cation path” (ntrcp) from node j to node i if Ct0,t1 contains a sub-sequence Cijt0(ij),t1(ij) withthe following connectivity property,Cijt0(ij),t1(ij) = {S`1j , S`2`1 , S`3`2 , . . . , S`k(ij)`k(ij)−1 , Si`k(ij)}where we have omitted the time indices of each signal but it is understood that the transmissiontime t0 and reception time t1 of each signal Si`+1j` belong to the interval [t0(ij), t1(ij)]. Note that the sub-sequence Cijt0(ij),t1(ij) in Def.2.1.2 has a finite cardinality |Cijt0(ij),t1(ij)| =k(ij) + 1 ≥ 1.For the next definition we let V−i = V \ i for each node i ∈ V. The notion of a “non-time-respecting” strongly-connected communication sequence (ntrcs) is as follows.112.2. The Average-Consensus ProblemsDefinition 2.1.3 A communication sequence Ct0,t1 is “non-time-respecting strongly-connected”(ntrcs) if Ct0,t1 contains a ntrcp from each node i ∈ V to every node j ∈ V−i. We let Ct0,t1 ∈ ntrcs denote that Ct0,t1 is a ntrcs.Lastly, we define a “regime-modulated” communication sequence (rmcs).Definition 2.1.4 Let {rk(t0(k), t1(k)): k ∈ N} denote a sequence of regimes over the integerst ∈ N, with t0(k) ≤ t1(k) < t0(k + 1) for all k ∈ N. A communication sequence Ct0,t1 is“regime-modulated” (rmcs) if Ct0,t1 =⋃k∈NCt0(k),t1(k), Sijt,t ∈ Ct0(k),t0(k) if and only if (iff)Sijt,t ∈ Ct(k),t(k) for all t(k) ∈ {t0(k), t0(k) + 1, . . . , t1(k)}, and Ct0(k),t1(k) ∈ ntrcs for eachk ∈ N. We let Ct0,t1 ∈ rmcs denote that Ct0,t1 is a rmcs. Note that every signal in a regime-modulatedcommunication sequence has a zero time-delay, and that for each particular regime a node jsends a signal to node i at each time instant iff node j sends a signal to node i at any timeinstant in that regime. In this sense, each communication sequence Ct0(k),t1(k) is “fixed” overthe time span of each regime.2.2 The Average-Consensus ProblemsGiven the above definitions, we may now state the two consensus problems that will be consid-ered in this thesis.Definition 2.2.1 (ACP1) Under (A1)-(A4), assume the communication sequence C0,∞is regime-modulated and that within each regime r(k) the sequence of each input consensusparameter {si(t) : t ∈ [t0(k), t1(k)]} is an i.i.d. discrete random variable with finitestate-space S = {s1, s2, . . . , sd} and probability distribution pii(r(k))∈ Rd×1. A consensusprotocol (P) (cf. (2.2) − (2.3)) then solves the average-consensus problem (ACP1) if theconsensus estimate sˆi(t) of each node weakly-converges to p¯i(r(k))= 1n∑i∈V pii(r(k))forall t ∈ [t0(k), t1(k)] and each k ∈ N. The above consensus problem requires the consensus estimate sˆi(t) of each node to weakly-converge to the average of all input variables’ regime-modulated probability distribution. Wethus have a consensus in distribution rather than a consensus in regard to a fixed value. Fur-thermore, the probability distribution of each input variable is regime-modulated and thus can122.2. The Average-Consensus Problemsvary with time. The latter differences are two of the primary distinctions between the consensusproblems (ACP1) and (ACP2) defined next.Definition 2.2.2 (ACP2) Under (A1)-(A4), assume the communication sequence C0,∞is a priori unknown and that the input consensus parameters {si(t) : i ∈ V} remain fixedat their initial values {si(0) : i ∈ V} for all t ∈ N. A consensus protocol (P) (cf. (2.2)−(2.3)) then solves the average-consensus problem (ACP2) if the consensus estimate sˆi(t) ofeach node converges (in the Euclidean norm) to the average initial input parameter s¯(0) =1n∑i∈V si(0). Note that in both (ACP1) and (ACP2) the average-consensus problem is solved when eachnode consensus estimate converges to the “current average-consensus value” s¯(t) defined in(2.1). The differences between the two problems is that (i) in (ACP1) s¯(t) is the average of aset of time-varying probability distributions whereas in (ACP2) it remains fixed at the initialinput value, and (ii) in (ACP1) the communication sequence is regime-modulated (synchronous,without time-delay, and fixed) whereas in (ACP2) it is completely a priori unknown.In the subsequent chapters we shall present consensus protocols that solve each of theseACPs. In each chapter we first define the local consensus input dynamics as well as the as-sumed inter-node communication patterns. We then review the literature with respect to thealgorithms that researchers in this area have presented. Our own solutions are then describedin detail (the proofs of each result can be found in the Appendix). Numerical examples of ouralgorithms are presented and their performance compared where possible with those of previ-ously proposed algorithms in the literature. Lastly, some concluding comments and potentialdirections for future work are discussed.13Chapter 3ACP1: Consensus in aTwo-Time-Scale Markov SystemWireless sensor networks are often a preferential candidate to monitor environmental parameterssuch as moisture degree, presence of contaminants, pressure, and temperature. In many of suchapplications, the sensors will first sense the environment, and then average their measurementsto compute the final global estimator in a completely distributed fashion. However, there isusually no meaningful way to decide when the sensing stage should be terminated and theaveraging procedure initiated. In this chapter we present a consensus protocol that consistsof both a local stochastic approximation (SA) algorithm that operates in parallel with thedistributed linear averaging (DLA) protocol. By utilizing both local SA and strongly-connectedDLA we show that an average-consensus can be maintained while simultaneously procuring newobservations via the local sensing procedure. We note that if the initial consensus inputs remainfixed (or are drawn from a deterministic source) than there is no need to obtain new observationswhile simultaneously computing the current average (this particular issue is explored in moredetail in Chapter 4).Specifically, this chapter will propose a consensus protocol that solves a generalized versionof ACP1 (cf. Def.2.2.1). We will assume the input consensus variable Xi at each node i ∈V obeys a two-time-scale Markov system wherein the stationary distribution pii(t) switchesaccording to a slow Markov chain denoted by the hyper-parameter θ(t). We propose a schemewhere the sensing and the averaging stages are simultaneous: the network continues collectingdata while computing on-line the distributed estimator of the current average input value.The asymptotic behavior of the consensus protocol is investigated and compared with that ofclassical consensus protocols; the impact of the network topology is discussed; and numericalexamples are presented.Motivation:In many applications of sensor networks, the system is deployed in dangerous areas, subjectto some kind of risk such as geomorphological, chemical, radioactive, and so forth. In thesecases the wireless architecture is usually preferred to the wired one for the absence of hardware14Chapter 3. ACP1: Consensus in a Two-Time-Scale Markov Systeminfrastructures connecting the sensors. The different nodes of the network independently mea-sure various input parameters and the final estimate of the desired quantity is usually obtainedin a fully decentralized fashion. Instead of delivering the locally measured data to a commonfusion center, the nodes exchange their data by performing decoupled averaging and therebyconverging, after a sufficient amount of time, to a common value for all the nodes. The finalglobal estimate is recorded at each node of the network and thus can be recovered from any ofthe existing nodes.However, in the above analysis an important scheduling aspect is ignored. Usually theinstant of time when a sufficient amount of sensing data has been locally procured is unavailableto any given node, implying that the system designer is not able to decide when the sensing stageshould be terminated and the averaging protocol that leads to the final consensus should begin.If the sensing stage is prematurely terminated, many potentially useful observations are lost. Onthe other hand, prolonging too much the sensing stage in the attempt of collecting more samplesentails the risk of duplicate data processing and the collection of uninformative measurements.In this chapter we propose a consensus protocol where the stages of sensing and of averagingare not separated but they evolve simultaneously. We make explicit reference to the gossipalgorithms [17], where the final aim is that of making available, to any node, the arithmeticmean of all the observations globally collected by the entire network. In addition to convergenceresults, our contribution is investigate how the new measurements incorporated in the consensusprocedure trade off with the convergence conditions: averaging the observations cuts down thedifferences between the sensors (consensus), but the occurrences of new observations representa form of diversity among the nodes.We generalize the observation input model in the existing literature by considering a two-time-scale Markov system. In such a system it is shown in [49] that the continuous-timeinterpolation of the iterationsˆi(k + 1) = (1− µ)sˆi(k) + µXi(θ(k)) , k ∈ N (3.1)will, under appropriate scaling conditions, weakly-converge to the slowly-switching ODEdsi(t)dt= −si(t) + pii(θ(t)). (3.2)In this chapter we generalize (3.1) from a single node to a multi-agent system of nodes thatcommunicate via an instantaneous, fixed, and synchronous network communication graph. Inaddition to this generalization of (3.2), we also consider the estimation of the average cumulativedistribution function Π¯(θ(t)), the scaled tracking error vi(t) at each node i ∈ V, as well as theset-valued generalization of (3.2).153.1. ACP1: Local Consensus Input DynamicsThe outline of this chapter is as follows. In Sec.3.1 we define the local consensus inputdynamics, and in Sec.3.2 we define the inter-node communication patterns that are assumedunder ACP1. Also in Sec.3.2 the definition of a consensus protocol as well as the generalizedversion of the consensus problem ACP1 are re-stated for the readers’ convenience. A literaturereview with respect to consensus protocols that past research in this area have analyzed ispresented in Sec.3.3. The implementation of our proposed protocol is then described in Sec.3.4.Numerical examples of the protocol are subsequently presented (Sec.3.5), and lastly, we providesome concluding comments and potential directions for future work (Sec.3.6).3.1 ACP1: Local Consensus Input DynamicsIn this section we define a two-time-scale Markov system that independent sensors positionedat each node will observe. This system entails a set of randomly switching input consensusvariables observed in Markov noise. By reaching average-consensus each node essentially tracksthe average value of the current set of slowly-switching input consensus variables.At each discrete-time index k ∈ N suppose each node i ∈ V observes the state of an S-stateMarkov chain Xi(k) with state-space {e1, . . . , eS}, where each e` is the `th S × 1 standardunit vector. Next consider a continuous time-scale t ≥ 0 such that tk = kµ for a small scaleparameter µ 1. Assume there exists a hyper-parameter θ for which each Xi(k) is θ-dependentfor θ ∈M = {θ1, . . . , θm} such that the transition matrix of Xi(k) conditioned on θ is given byAi(θ) = [ailj(θ)], whereailj(θ) = P (Xi(k + 1) = ej |Xi(k) = el, θ(k) = θ).To proceed, we pose the following three conditions.(B) 1. For each θ ∈ M , the transition matrix Ai(θ) is irreducible and aperiodic for eachi ∈ V.2. Parameterize the transition probability matrix of θ asP ε = I + εQ,where ε is a small parameter satisfying 0 < ε 1 and Q is the generator of acontinuous-time finite-state Markov chain.3. The process θ(k) is slow in the sense ε = O(µ). For simplicity, we take ε = µhenceforth.163.2. ACP1: Inter-node Communication PatternCondition (B) specifies our observation model as a two-time-scale Markov system. As statedin (B), each transition matrix Ai(θ) is irreducible and aperiodic. Accordingly we will let pii(θ(k))denote the stationary distribution of Xi(k). Note that since ε = O(µ), the hyper-parameter θwill switch states on a slow time-scale as compared to the state of the input parameter Xi(k),which will switch states on a relatively fast time-scale.3.2 ACP1: Inter-node Communication PatternIn this section we define the inter-node communication pattern that will be assumed through-out this chapter. To do so we require the definition of a “non-time-respecting communicationpath”, which will be abbreviated as “ntrcp”.Definition 3.2.1 A communication sequence Ct0,t1 contains a “non-time-respecting communi-cation path” (ntrcp) from node j to node i if Ct0,t1 contains a sub-sequence Cijt0(ij),t1(ij) withthe following connectivity property,Cijt0(ij),t1(ij) = {S`1j , S`2`1 , S`3`2 , . . . , S`k(ij)`k(ij)−1 , Si`k(ij)}where we have omitted the time indices of each signal. Note that the sub-sequence Cijt0(ij),t1(ij) in Def.3.2.1 has a finite cardinality |Cijt0(ij),t1(ij)| = k(ij)+1 ≥ 1. We next define the notion of a “non-time-respecting” strongly-connected communicationsequence (ntrcs).Definition 3.2.2 A communication sequence Ct0,t1 is “non-time-respecting strongly-connected”(ntrcs) if Ct0,t1 contains a ntrcp from each node i ∈ V to every node j ∈ V−i. We let Ct0,t1 ∈ ntrcs denote that Ct0,t1 is a ntrcs.The network communication in this chapter will be assumed to be synchronous, directed,instantaneous, and regime-modulated. Specifically we have the following three conditions.• (A5) At each time instant {tk = kµ : k ∈ N} every member i in a set of nodesVout(tk) ⊆ V sends an instantaneous signal to at least one member j in a set of nodesV in(tk) ⊆ V. We define the n × n “communication adjacency matrix” C(tk) to have its(ij)th element equal to 1 if node i sends a signal to node j at time tk and zero otherwise.• (A6) At each time instant {tk = kµ : k ∈ N} the matrix C(tk) determined by the stateof the hyper-parameter θ(tk).• (A7) the communication sequence Ctk,tk is ntrcs for each tk ≥ 0.173.2. ACP1: Inter-node Communication PatternSince the inter-node communication is fully defined at time tk by the communication adjacencymatrix C(tk), conditions (A5) and (A6) together specify a “regime-modulated communicationsequence”, with regime dictated by the state of the hyper-parameter θ(tk). Under (A5)-(A6),the condition (A7) is necessary for consensus under any consensus protocol, as is shown inChapter 4 (see Thm.4.4.7).We now re-state assumptions (A1)-(A4) as well as the definition of a consensus protocol forthe reader’s convenience. We will assume,• (A1): (knowledge set assumption) At any time t ∈ N, each node i ∈ V is equipped with adevice that can update and store a “knowledge set” Ki(t). For each i ∈ V, the knowledgeset Ki(t) may have a time-varying cardinality.• (A2): Ki(t) ⊇ {sˆi(t)} , ∀ i ∈ V , ∀ t ∈ N, where sˆi(t) ∈ Rd×1 represents the local estimateof s¯(t) at node i and time t.• (A3): (signal set assumption) At any time tk ∈ N, each node j ∈ V has the ability totransmit a “signal set” Sij(tk, tk) ⊆ Kj(tk) that will be received at some node i ∈ V attime tk ∈ N.• (A4): At any time tk > 0, the knowledge set Ki(tk) of each node i ∈ V satisfies Ki(tk) ⊇{i, n, sˆi(tk), Xi(tk)}.Given (A1)-(A4), we define a “consensus protocol” (P) in terms of its “knowledge set updatingrule” fPK {·} together with a “signal specification rule” fPS {·}. The knowledge set updating rulefPK {·} defines the effect that a set of signals has on the knowledge set Ki(tk) at the receiving nodei. The signal specification rule fPS {·} specifies the elements contained in the signal Sij(tk, tk)given as a function of the knowledge set Kj(tk) at the transmitting node j.The Consensus Protocol (P) under (A1)-(A4):Knowledge Set Updating Rule: fPK : Ki(tk)⋃Sij(tk, tk) 7−→ Ki(tk+1) (3.3)Signal Specification Rule: fPS : Kj(tk) 7−→ Sij(tk, tk)(3.4)The consensus problem ACP1 is now re-stated in terms (B), (A1)-(A7) and (3.3)− (3.4).183.3. ACP1: Algorithm Description and Relation to Past LiteratureDefinition 3.2.3 (ACP1) Under (B) and (A1)-(A7), a consensus protocol (P) (cf. (3.3)−(3.4)) solves the average-consensus problem (ACP1) if the consensus estimate sˆi(tk) of eachnode weakly-converges to p¯i(θ(tk))=∑i∈V1npii(θ(tk))for all tk > 0 as µ→ 0.To solve ACP1 (cf. Def.3.2.3) we will propose a consensus protocol that satisfies (3.3)−(3.4)under the communication assumptions (A1)-(A7). We define this protocol in Sec.3.4.1, andpresent the formal convergence results in Sec.3.4.2. Before doing so, we review the past literaturerelated to the results presented in this chapter.3.3 ACP1: Algorithm Description and Relation to PastLiteratureThe coordination of a consensus within a multi-agent system is a problem that has been con-sidered in a variety of general settings and solved under a wide-range of assumptions regardingthe inter-agent communication capabilities. Algorithms that result in “consensus” have thusin general taken on a variety of forms, for some examples see [4, 22, 38–48, 50]. Here we willfocus on a specific consensus algorithm under which a network consensus is obtained by linearaveraging the local data held at neighboring sensor nodes.Linear averaging has been a particular research focal point regarding consensus forma-tion [19, 24, 51, 52]. In these and many other works on consensus, the multi-agent system isdefined as a collection of sensors with no extensive memory base, hence the linear averagingscheme is analyzed with respect to a group of distributed agents. This particular networkparadigm and consensus mechanism will be together referred to here as distributed linear aver-aging (DLA). Under a DLA algorithm, every sensor has a direct linear effect on the estimates ofeach of its neighbors. Thus the DLA algorithm is scalable and can be robust under a variety ofnetwork communication conditions [16,18,25,53]. This is in contrast to consensus protocols forspecialized data fusion problems that have complex algorithmic forms that cannot be simplifiedto mere averaging [19,24,39,40,48].In precise terms, the DLA consensus protocol is an iterative procedure wherein each sensornode linearly averages the information it holds with the information it receives from othersensors whom with it directly communicates. This computation is performed unanimously byall sensors, thus DLA requires minimum data storage, labeling of data, or specialization ofindividual sensors, i.e. use of base nodes. For a network of n sensors, the DLA protocol canbe parameterized by a weighted digraph G = {V, E ,W} where V = {1, . . . , n} denotes the set193.3. ACP1: Algorithm Description and Relation to Past Literatureof sensor nodes, E ⊆ V × V is a set of directed edges specifying which sensors are connected(i.e. can transmit or receive information), and the weight matrix W ∈ Rn×n that denotes theweights with which neighboring sensors average their held information. We will by definitionlet the (i, j)th element of W equal zero if (i, j) /∈ E .Distributed Consensus-Tracking Algorithm. In this chapter, a discrete-time DLAalgorithm is assumed to operate together with a linear stochastic approximation (SA) trackingalgorithm based locally at each sensor i ∈ V. Each SA tracking algorithm aims to estimate thestationary distribution pii ∈ RS of a fast S-state ergodic Xi ∈ RS that is observed uniquely bysensor node i. Motivated by applications in adaptive tracking in sensor networks, we assumethe transition matrix of each Markov chain {Xi : i ∈ V} jump changes slowly with time.More specifically, we assume that the transition matrix of each Markov chain {Xi : i ∈ V} isconditioned on a hyper-parameter θ taking values in a finite set M . The dynamics of θ aredetermined by a slowly changing m-state Markov chain with state-space M = {θ1, . . . , θm} andtransition matrix P ε = I + εQ, where ε > 0 is a small parameter and Q = (qij) ∈ Rm×m is agenerator of a continuous-time Markov chain. Since ε is small, θ is a slowly varying process andwill jump infrequently among different states relative to the state of each Markov chain Xi(k),i ∈ V.For each θ ∈ M and i ∈ V, Xik is an S-state Markov chain with a transition matrixmodulated by the slow Markov chain θk, thus we describe the sequence {θk, Xik : i ∈ V, k ∈ N}as a switched Markov system. As a result, each stationary distribution pii(θ) is piece-wise fixedbut with a jump change at every time switch in θk. When θk switches its value from one stateto another within M, the stationary distribution pii(θk) switches accordingly, thus it followsthat the sensor network must track these time-varying distributions.We next give an intuitive explanation of consensus formation. Given the averaging weightsW and graph edge set E of the sensor network described above, each node i ∈ V computes viathe distributed consensus tracking algorithm (the iteration (3.16) below) a state-value, denotedsˆi(tk) ∈ RS×1.We will prove that for a suitably scaled continuous-time, the state-value sˆi(t) convergesweakly to a linear combination p˜ii(θ(t)) =∑nl=1 ψilpil(θ(t)) ∈ RS of the stationary distributions{pii(θ(t)) : i ∈ V}. Here the coefficients ψil are determined by the choice ofW and E . Consensusformation deals with the assumptions on the information exchange between sensors requiredto obtain a specific structure of p˜ii(θ(t)) in the above expression. The sensor network achievesa consensus if for any given l ∈ V, ψil = ψjl for all i, j ∈ V. The average-consensus occurs ifψil =1n for all i, l. Due to the dependence of pii on θ, the values of p˜ii(θ) stochastically switchamong a finite set of values, each corresponding to a particular θ ∈M. The average-consensusin this setting is thus a random process. To supplement the weak convergence results, we203.3. ACP1: Algorithm Description and Relation to Past Literaturederive conditions on the averaging weights required for each sensor to obtain the stochasticaverage-consensus p¯i(θ(t)).Assuming both the DLA and local tracking algorithms use a constant step-size µ at eachdiscrete-time iteration k ∈ N, it will be seen that in order for all sensors to track the averagedistribution p¯i(θ) each sensor should distributively average both its observed signal, denotedXik, and its state-value sˆik. Define the observation exchange graph Go = {V, Eo,Wo} and state-value exchange graph Gv = {V, Ev,Wv} that respectively determine how each type of data,Xi (observation) and sˆi (state), is distributively averaged. The distributed consensus-trackingalgorithm we then consider is as follows:sˆk+1 = (I − µDv + µWv − µDo)sˆk + µWoXk = (I − µH)sˆk + µWoXksˆ(0) = X(0) , Dp = diag(Wp1) p ∈ {o, v} , H = Do +Dv −Wv,(3.5)where sˆk = [sˆ1k, . . . , sˆnk ]′ ∈ RSn×1, Xk = [X1k , . . . , Xnk ]′ ∈ RSn×1, the term 1 denotes a columnvector of ones with appropriate dimension, and we let diag(r) be the d×d diagonal matrix withelements corresponding to those of the vector r ∈ Rd. We will henceforth let each element of theweight matrices {Wv,Wo} represent an S×S identity matrix scaled by the respective element,that is we let W = W⊗IS×S , where⊗denotes the Kronecker product [54]. As a final note,throughout this chapter we let subscript k indicate dependence on a discrete-time in N (e.g.,θk), whereas dependence on continuous-time t ≥ 0 will be denoted (t) (e.g. θ(t)). The singleexception to this rule is the initial data, which is always denoted (0) for both discrete-time andcontinuous-time systems.The above algorithm can be viewed as comprising two DLA protocols together with a familyof local stochastic approximations; each DLA protocol averages with the local state-value eitherwith the state value at each neighboring node (as determined by Gv), or the observations ateach neighboring node (as determined by Go). For a sufficiently small step-size µ, the iteration(3.16) implies that at each discrete time k ∈ N, the state-value sˆik computed at any sensor i ∈ Vis updated linearly in three ways as follows (the superscripts * and ** below are intermediatevariables for illustration only and will not be used subsequently):1. by its own local observation Xik,(sˆik)∗ = sˆik + µWoii(Xik − sˆik), (3.6)2. by the local observation Xjk of each neighboring sensor j ∈ {j : (i, j) ∈ Eo} ,(sˆik)∗∗ = (sˆik)∗ + µ∑j 6=iWoij(Xjk − (sˆik)∗),213.3. ACP1: Algorithm Description and Relation to Past Literature3. by the state-value sjk of each neighboring sensor j ∈ {j : (i, j) ∈ Ev} ,sˆik+1 = (sˆik)∗∗ + µ∑j 6=iWvij(sˆjk − (sˆik)∗∗).As µ vanishes these three nested steps may be written as the single iteration (3.16) because inthis limit all second and third order terms in µ become negligible compared to the first orderterms that are retained in (3.16). We also note that due to the similarity between steps 1 and2, each row i ∈ V of the exchange graph Go may either specify which sensors communicatetheir observed information Xj to sensor i, or equivalently which Markov chains Xj sensor ican actually observe, assuming that the same SA algorithm (3.6) with update weights Woij areapplied to each Markov chain Xj .Context. The tracking algorithm (3.16) was first introduced in [8] as the distributedconsensus filter. However, in [8] it is assumed (i) Wo = Wv + I, (ii) the diagonal of Wv iszero, and (iii) each non-zero averaging weight is unity. Furthermore, the observation variables{X1k , . . . , Xnk } are reduced to a single time-varying scalar observed in i.i.d. Gaussian noise byall sensors. In contrast to [8], recall that we assume a Markov-modulated dynamic model whereeach sensor i ∈ V observes the Markov chain Xik. Our observation model involves a two-time-scale formulation with parameters µ and ε. The step-size µ is used in the recursive trackingand averaging algorithms, whereas the step-size ε represents the transition rate of the Markovchain θ.To analyze the continuous-time limit of (3.16), it is assumed in [8] that the observed scalarhas a uniformly bounded derivative as the step-size µ approaches zero. Here, we assumeε = O(µ), indicating that (3.16) has a tracking rate on the same time-scale as the Markovmodulating process. Our main result shows that under suitable conditions on the networkexchange graphs {Gv,Go}, the assumption ε = O(µ) implies the limit of a piece-wise constantcontinuous-time interpolation of the sequence of each sensors estimates sˆi(tk) converges weaklyto the solution of a Markov modulated ODE as µ→ 0. Considering also the sequence of sensortracking errors, we demonstrate under similar conditions that an interpolation of the normal-ized errors converges weakly to a switching diffusion. Under additional weight constraints itthen follows as a special case of the above results, that the estimates sˆi(tk) converge weakly tothe average-consensus p¯i(θ).Within this general framework we also address two modifications of the consensus-trackingalgorithm (3.16). One modification assumes that rather than a single chain Xi each sensor iobserves a family of Markov chains Xik ∈ Xik, thus implying the sensor state-values convergeweakly to solutions of a switched differential inclusion. The second modification considerssensor estimation of the cumulative distribution function (CDF) Πi(θ(t)), i ∈ V, θ ∈ M =223.3. ACP1: Algorithm Description and Relation to Past Literature{θ1, . . . , θm} by using empirical measures. In this case the sensor scaled tracking error convergesweakly to a switched Brownian bridge. We deal with each of these modifications separately inSec.3.4.2 and Sec.3.4.3.Related Work. We briefly review here some of the related literature on consensus forma-tion. The distributed consensus-tracking algorithm (3.16) can be viewed as the combinationand extension of two sub-algorithms: a local sensor adaptive SA tracking algorithm,sik+1 = sik + µ(Xik − sik), si(0) = Xi(0), 0 < µ 1 1operating in parallel with the (DLA) or so-called “Laplacian” consensus algorithm,sk+1 = (I − µL)sk, L = Dv −Wv. (3.7)The consensus algorithm (3.7) has been explored in works such as [16, 18] regarding its abilityto achieve an average-consensus on node initial state-values s(0) = [s1(0), . . . , sn(0)], that iss¯(0) = 1n∑ni=1 si(0). It is clear the average-consensus s¯(0) is constant at all times, thus thealgorithm (3.7) by itself can result in a static consensus that is not driven by some externalparameters, such as the observed Markov chains Xik considered here (see also [51] and referencestherein).As proposed in [8], we extend the Laplacian consensus algorithm to include not only aver-aging of the sensor state-values si(0), but also of the sensor observations Xik. This extensioncan be very useful for DLA to ensure average-consensus when considered within the generalityof the current setting, specifically when each sensor observes and estimates a unique parameter.Unlike certain works on distributed averaging such as [8, 24], we allow {Gv,Go} to be bothdirected and weighted, such as those considered for instance in [18, 55, 56]. By directed it ismeant (i, j) ∈ E ; (j, i) ∈ E , and by weighted it is meantW need not have each element belongto the set {0, w} for some fixed w ∈ (0, 1). We also note that Wij 6= 0 implies (j, i) ∈ E , andthus knowledge of {Wv,Wo} is sufficient to completely define {Gv,Go}.The Laplacian consensus algorithm (3.7) derives its name due to its asymptotic equivalencewith the gradient system s˙(t) = −∇ΦG(s(0)) associated with the Laplacian potentialΦG(s(0)) =12n∑i,j=1Wvij(sj(0)− si(0))2 =12s(0)′Ls(0),where L = diag(Wv1) − Wv is defined as the Laplacian matrix of an undirected graph G ={V, Ev,Wv} and each node i ∈ V has an initial state-value si(0) ∈ R [18]. Many works [19, 24,25,57,58] consider the Laplacian consensus dynamics (3.7) in isolation from any tracking model,233.3. ACP1: Algorithm Description and Relation to Past Literaturesuch works are thus concerned with static, rather than time-varying, consensus formation. Thegeneral theme addressed in these works regard the network graph conditions under which (3.7)solves the static average-consensus problem, that is, having all sensors reach s¯(0) given thatEv has time-varying or stochastic properties. These concerns are closely related to the issuesof consensus under time-delayed or asynchronous communication within sensor networks andother multi-agent systems, we refer the reader to works such as [18, 26, 43, 56, 57, 59–62] fordetailed consideration.In contrast, [17] assumes the communication edge set Ev is fixed, undirected, and strongly-connected (that is, each node can be reached from all other nodes by traversing the edges inEv). Under these conditions it is shown in [17] that the matrix W = I − µL satisfying,minimize ‖W − 1n11′ ‖2subject to W ∈ Ev, W1 = 1, 1′W = 1′,(3.8)will imply the maximum asymptotic per-step convergence factor rstep(W ) of the sensor state-value estimates sk to the average-consensus value s¯(0), whererstep(W ) = supsk 6=s¯(0)‖ sk+1 − s¯(0) ‖2‖ sk − s¯(0) ‖2.In (3.8), the operation ||·||2 indicates spectral norm, thus (3.8) is a convex optimization problemand in [17] cast as a semi-definite program. The relation of [17] to the present work is emphasizedsince the same constraints of (3.8) are sufficient to ensure sensor estimation of the average-consensus p¯i(θ), and thus the weights W that solve (3.8) may also be argued as the optimalweights for fast convergence to p¯i(θ).The consensus-tracking framework of our paper is also similar to recent works of [7,8,61,63].These papers deploy (3.7) as part of a distributed Kalman filter or in conjunction with localsensor estimation of a time-varying parameter, thus these works also deal with a time-varyinginput consensus value. For example, assuming that each sensor node i observes in continuous-time the m-dimensional signal data pair {zi(t), z˙i(t)}, the following result is proven in [7]: forany sequence of weighted matrices Wi(t) the dynamical system,x˙i = W−1i (xj − xi) + z˙i +W−1i W˙i(zi − xi) , xi(0) = zi(0), (3.9)ensures that every sensor estimate xi tracks the weighted-average consensuslimt→∞xi(t) = limt→∞(n∑i=1Wi(t))−1(Wi(t) zi(t)), i = 1, . . . , n,243.4. ACP1: Algorithmic Solutionswith zero steady-state error, provided the Laplace transforms of both zi(t) and Wi(t) eachhave all poles in the left-hand plane with at most one pole at zero. In the switched Markovobservation model considered here, we do not assume knowledge of z˙i(t) since this would implydirect observation of the parameter {θ(tk)}. In our framework each sensor observes θ onlyindirectly through changes in the approximated stationary distribution of the observed Markovchains Xik, thus an algorithm such as (3.9) cannot be applied to the observation model presumedin [7].In light of extensive research conducted regarding multi-agent coordination, there are severalpractical applications of the DLA algorithm in sensor networks, for instance the synchronizationof node clocks [64] or distributed load balancing [65]. The problem of synchronizing coupledoscillators by means of DLA was discussed in [33], whereas in [66] the authors consider DLA asa mechanism to ensure a team of unmanned air vehicles (UAV’s) will approach a steady-statewherein each UAV surveys an equal portion of a one-dimensional perimeter. Conversely [58]models a set of mobile agents in the plane and develops a distributed control law for stableflocking behavior. Similarly [23] applies the decentralized network paradigm of coupled agentsto explain the consensus behavior of a group of self-driven particles during phase transition,as was previously considered in [67]. In this chapter we will generalize these average-consensusobservation models to incorporate (i) input parameters observed in noise that, (ii) jump-changeaccording to a slowly-switching Markov process.3.4 ACP1: Algorithmic SolutionsThis section will describe a consensus protocol that solves ACP1. The protocol involves a localstochastic approximation algorithm that runs in parallel with a distributed linear averagingprotocol. In Sec.3.4.1 we define the specific consensus protocol that will be analyzed, and inSec.3.4.2 we formally state the convergence results for this protocol. In Sec.3.4.4 we discussthe communication graph weight conditions regarding the average-consensus formation, andnumerical examples are presented in Sec.3.5. Lastly, we provide conclusions and directions offuture work in Sec.3.6.3.4.1 ACP1: Stochastic Approximation and Consensus ProtocolWe begin by considering the assumption that each node i ∈ V maintains a local stochasticapproximation algorithm in relation to the input Xik ∈ RS×1, that is,s˜i(k + 1) = (1− µ)s˜i(k) + µXik , s˜i0 = Xi0 . (3.10)253.4. ACP1: Algorithmic SolutionsThe knowledge set Ki(t) of each node is then assumed to satisfy the following modification to(A4),• (A4’): At any time tk > 0, the knowledge set Ki(tk) of each node i ∈ V satisfies Ki(tk) ⊇{i, n, sˆi(tk), s˜i(tk), Xi(tk)}.Next we define the consensus protocol that will be analyzed in Sec.3.4.2. The protocol mustconform to the regime-modulated communication sequence defined by (A5)-(A7) and thus it ischaracterized by a set of synchronous, directed, instantaneous signals. We propose a “dis-tributed linear averaging” (DLA) scheme in which each node’s consensus estimate sˆi(t) isupdated to a weighted average of its current value and the current values of the consensusestimates held at each node from which it receives a signal. In terms of the consensus protocol(3.3)− (3.4) we then have the specification:Consensus Protocol: Distributed Linear Averaging (DLA)Signal Specification: Sij(tk, tk) = Kj(tk)\{j, n, s˜j(tk+1)} , (i, j) ∈ V2(3.11)Knowledge Set Update: Ki(tk+1) = {i, n, sˆi(tk+1), s˜i(tk+1), Xi(tk+1)}(3.12)Stochastic Approximation Update: s˜i(tk+1) = (1− µ)s˜i(tk) + µXi(tk)(3.13)Consensus Estimate Update: sˆi(tk+1) =(1 − µ(wsij(tk) + woij(tk)))sˆi(tk)+µwsij(tk)sˆj(tk) + µwoij(tk)Xj(tk)(3.14)Estimate Initialization: sˆi(0) = Xi(0) , s˜i(0) = Xi(0) . (3.15)The coefficients wsij(tk) and woij(tk) are the averaging weights with which the DLA protocolrespectively assigns to the consensus estimate sˆj(tk) and observation Xj(tk) contained in thereceived signal Sij(tk, tk).Due to the instantaneous and synchronous nature of the communication sequence, we canwrite the consensus estimate (3.14) update more compactly via the linear equation,sˆ(tk+1) =(ISn − µ(Ds(tk) + Do(tk)−Ws(tk)))sˆ(tk) + µWo(tk)X(tk) (3.16)where sˆ(tk+1) = [sˆ1(tk), . . . , sˆn(tk)] ∈ R(Sn)×1, X(tk) = [X1(tk), . . . , Xn(tk)]⊗IS , W o(tk) =W o(tk)⊗IS , W s(tk) = W s(tk)⊗IS , W o(tk) = [woij(tk)], Ws(tk) = [wsij(tk)],⊗denotes the263.4. ACP1: Algorithmic SolutionsKronecker product, Do(tk) = W o(tk)1Sn, Ds(tk) = W s(tk)1Sn, 1` denotes an `× 1 vector ofunit valued elements, and I` is the ` × ` identity matrix. Letting H(tk) =(ISn − µ(Ds(tk) +Do(tk)−W s(tk)))we can then re-write (3.16) as,sˆ(tk+1) = H(tk)sˆ(tk) + µWo(tk)X(tk) . (3.17)In Sec.3.4.2 we discuss the coefficient weight conditions necessary and sufficient for average-consensus to be obtained under the DLA protocol (3.11)− (3.15) or equivalently (3.17).3.4.2 ACP1: Convergence ResultsThis section states the weak-convergence results obtained for the DLA protocol. In additionto the convergence of the iteration (3.17), we also characterize the convergence of an estimatefor the cumulative distribution of p¯i(tk), as well as the scaled tracking error and set-valuedswitching ODE.To proceed, we pose the following conditions (Condition (B) is re-stated here for the reader’sconvenience).(B) 1. For each θ ∈ M , the transition matrix Ai(θ) is irreducible and aperiodic for eachi ∈ V.2. Parameterize the transition probability matrix of θ asP ε = I + εQ,where ε is a small parameter satisfying 0 < ε 1 and Q is the generator of acontinuous-time finite-state Markov chain.3. The process θk is slow in the sense ε = O(µ). For simplicity, we take ε = µ henceforth.(C) All eigenvalues of the matrix H = (L+Do) have positive real parts, or non-negative realparts. We denote these exchange graph conditions as 0 ≺ H and 0 4 H respectively.Condition (B) specifies our observation model as a two-time-scale Markov system. Condition(C) is a constraint on the network exchange graphs {Go,Gv} and ensures bounded stability of(3.16) in the limit as µ→ 0. We note that 0 ≺ H implies −H is a Hurwitz matrix.Theorem 3.4.1 Assume conditions (B)-(C) and suppose s(0) is independent of µ. Define thecontinuous-time interpolated sequences of iteratessµ(t) = sk, θµ(t) = θk for t ∈ [kµ, (k + 1)µ).273.4. ACP1: Algorithmic SolutionsThen as µ→ 0, (sµ(·), θµ(·)) converges weakly to (s(·), θ(·)) such that θ(·) is a continuous-timeMarkov chain with generator Q and s(·) satisfiesds(t)dt= −Hs(t) +Wopi(θ(t)), t ≥ 0, s(0) = X(0). (3.18)Note that if s(0) = sµ(0), we require sµ(0) converges to s(0) weakly. However, for simplicity,we choose s(0) to be independent of µ. The above theorem implies the sensor iterates resultingfrom (3.16) converge weakly to a Markov switched ODE. This is in contrast to the standardanalysis of stochastic approximation algorithms [68] where the limiting process is a determin-istic ODE. The proof of the theorem uses the martingale problem formulation of Stroock andVaradhan, see also [49]. If a consensus is obtained by all sensors, that is sˆi(t) = sˆj(t) ∀ i, j ∈ V,then this consensus is a stochastic process dictated by θ. Again this is in contrast to worksconcerned with a static consensus formation (i.e. [16, 18, 19, 24, 25]), as well as others whereinthe average-consensus estimate is a linear combination of time-varying signals unanimouslyobserved by all sensors [8] or with observed rates of change [7]. In comparison, the consen-sus estimate we obtain is an average of piece-wise fixed finite-state Markov chain stationarydistributions, where it may be assumed each Markov chain is observed by only one sensor.Long-Time Horizon. Theorem 3.4.1 states a convergence result for small µ and largek such that µk remains bounded. However, if the consensus-tracking algorithm (3.16) is inoperation for a long time, we would like to establish its behavior for a large-time horizon. Wethus next consider the case when µ is small and k is large such that µk →∞. This is essentiallya stability result corresponding to the limit of the switching ODE (3.18) as t → ∞. For thisresult we assume that Q is irreducible or equivalently, the associated continuous-time Markovchain θ(t) is irreducible.Recall that the irreducibility means that the system of equations,ν ′Q = 01′mν = 1,(3.19)has a unique solution satisfying νj > 0 for all j = 1, . . . ,m. The vector ν = (ν1, . . . , νm)′ ∈ Rmis then the stationary distribution of θ . Since θ is a finite-state Markov chain, we note that if0 ≺ H thenm∑i=1∫ T0exp(−H(T − u))Wopi(θi)I{θ(u)=θi}duconverges w.p.1 as T → ∞ due to the exponential dominance of the term exp(−H(t − u)). A283.4. ACP1: Algorithmic Solutionscloser scrutiny permits an expression of this limit in closed form. Denotes∗ = H−1m∑i=1Wopi(θi)νi.Theorem 3.4.2 Assume conditions of Thm.3.4.1 with the modifications that Q is irreducibleand −H is Hurwitz. Then for any sequence {tµ} satisfying tµ → ∞ as µ → 0, sµ(· + tµ)converges weakly to s∗. Moreover, for any 0 < T < ∞, sup|t|≤T |sµ(t + tµ) − s∗| → 0 inprobability.In Theorem 3.4.2, the generator Q is assumed irreducible. We may also consider the casewhen Q has an absorbing state and thus the remaining m− 1 states are transient. This impliesthere is a zero row in Q. In this case, although the Markov chain is not irreducible, the limitprobability distribution still exists. This distribution is a vector with one component equal tounity (corresponding to the absorbing state), and the remaining components equal to zero. Asa result, the techniques used in Thm.3.4.2 can still be applied. We state the following result,but omit its proof for brevity.Theorem 3.4.3 Assume the conditions of Thm.3.4.2 with the modification that the Markovchain θ has an absorbing state. Without loss of generality, denote this state by θ1. Then forany sequence {tµ} satisfying tµ →∞ as µ→ 0, sµ(·+ tµ) converges weakly to sa, wheresa = H−1Wopi(θ1).Moreover, for any 0 < T <∞, sup|t|≤T |sµ(t+ tµ)− sa| → 0 in probability.Set-Valued Consensus Formation. In Theorem 3.4.1, it is assumed that for each θ ,the observed Markov chains {Xik : i ∈ V} will each have a respective transition matrix Ai(θ)that is irreducible and aperiodic. This is an ergodicity condition that may be generalized tothe case when the observed stationary distributions are not unique, but instead each belongto a set Ai(θ) of transition matrices with corresponding set of stationary distributions Γipi(θ).By the same DLA algorithm (3.16) then each sensor can reach a set-valued average-consensusregarding the collection of sets Γ(θ) = [Γ1pi(θ), . . . ,Γnpi(θ)]. As each observed Markov chain Xikwill have a stationary distribution belonging to exactly one element in the set Γipi(θ), then under(3.16) the collection of values that the estimate si will converge weakly to is given by,P i(θ) =⋃pij∈Γjpi(θ){n∑j=1ψijpij(θ)} (3.20)293.4. ACP1: Algorithmic Solutionswhere Λ = (ψij) is a constant matrix. The set-valued average-consensus P¯(θ) may likewise beexpressed,P¯(θ) =⋃pii∈Γipi(θ){1nn∑i=1pii(θ)}. (3.21)Due to the linearity of (3.16), the assumption of set-valued transition matrices Ai(θ) isin fact equivalent to considering set-valued exchange graphs {Gv,Go}, since different exchangegraphs result in each sensors estimation of a different linear combination of the time-varyingsingletons, {pi1(θk), . . . , pin(θk)}. Although some of these combinations may imply the formationof a consensus (i.e. limt→∞ si(t) = sj(t) for all i, j ∈ V and each θ ∈M) this is certainly not ingeneral true, and thus again leads to consideration of consensus formation not to a fixed point,but rather to a set such as (3.20) or (3.21). We next consider the case of set-valued transitionmatrices Ai(θ) and demonstrate that, in the limit as µ vanishes, the switching ODE (3.18) isreplaced by a switching differential inclusion.Theorem 3.4.4 Assume the conditions of Thm.3.4.1 with the following modification. In lieu ofthe irreducibility and aperiodicity assumption of A(θ) in condition (B), assume for each θ ∈Mthere is an invariant measure pi ∈ Γpi(θ) such thatdist( 1nn+`−1∑k=`E`Xk(θ),Γpi(θ))→ 0 in probability as n→∞,where dist(x,B) = infy∈B |x− y| is the usual distance function. The conclusion of Thm.3.4.1 isthen modified as (sµ(·), θµ(·)) converges weakly to (s(·), θ(·)) such thatds(t)dt∈ −Hs(t) +WoΓpi(θ(t)). (3.22)We note the conditional set of estimates reached by each sensor may be viewed as the resultof uncertainty regarding the observed Markov transition matrices, or in other words uncertaintyin the signal dynamics. Equivalently, the conditional set of estimates can be seen as the resultof uncertainty in the network exchange graphs, and hence the use of set-valued parametersrather than singletons.3.4.3 Scaled Tracking Error and Cumulative Distribution Function (CDF)EstimationSince the network consensus converges to a stochastic process, it is important to examine theasymptotic convergence rate of the algorithm (3.16). We proceed here with the study of the303.4. ACP1: Algorithmic Solutionsscaled tracking errors of the sensor state-values sk. Then we consider consensus formation whensensors estimate the empirical CDF instead of the probability mass function.We have shown that (3.18) implies the sensor estimates s(t) will converge weakly to astochastically switching steady-state Λpi(θ(t)) for some fixed equilibrium matrix Λ ∈ RSn×Sn.We now define the scaled tracking errorvk =sk − ΛE(pi(θk))√µ(3.23)where sk is the vector of state-value estimates obtained from (3.16).Next, assume that there exists a constant kµ > 0 sufficiently large such that {vk : k ≥ kµ}is tight, which implies that the normalized error process scaled by 1/√µ does not blow upas k → ∞ and µ → 0. Let us explain this assumption. The requirement k ≥ kµ is owingto the effect of initial conditions; the sequence of scaled errors needs to have some time tosettle down. Recall that {vk : k ≥ kµ} is tight, if for any δ > 0, there is a Kδ such that forall n ≥ max(kµ,Kδ), P (|vk| ≥ Kδ) ≤ δ. To verify this condition, by virtue of Tchebeshev’sinequality, it suffices to have a suitable Liapunov function U˜(·) such that EU˜(vk) is bounded. Ifthe random noise is uncorrelated, this boundedness can be obtained easily. If correlated noise isinvolved, tightness is established using the perturbed Liapunov function approach [49] (see also[68, Chapter 10] for detailed discussion and references). In this approach, small perturbationsare added to the Liapunov function to deal with the correlation, and the perturbations areconstructed so as to result in the needed cancellations. We omit the verbatim proof and referthe reader to the aforementioned references for further details.Theorem 3.4.5 Assume the conditions of Theorem 3.4.1. Then the interpolated sequence ofiterates vµ(·) defined by vµ(t) = vk, for t ∈ [(k − kµ)µ, ((k + 1) − kµ)µ) converges weakly to asolution v(·) of the switching diffusiondv(t) = −Hv(t)dt+ (WoΣ(θ(t))Wo ′)1/2dw, (3.24)where w(·) is a standard Brownian motion and for a fixed θ , Σ(θ) is the covariance given byΣ(θ) = limn→∞1nEn+`−1∑k=`n+`−1∑j=`(Xk(θ)− EXk(θ))(Xj(θ)− EXj(θ))′. (3.25)From (3.24) it is clear the sensor averaging weights Wo have a direct effect on the sensorstracking error, in particular we see that any scaling of the averaging weights Wo implies thesame scaling of the sensor diffusion process.313.4. ACP1: Algorithmic SolutionsAverage-Consensus on the Cumulative Distribution Function (CDF). So far wehave assumed each sensor j ∈ V observes the state of a fast Markov chain Xj and tracks by(3.6) the associated stationary probability mass function pij(θ(t)). We now consider, for a givenj ∈ V, the approximation of the CDF of Xjk by means of empirical measures. For each θ ∈M ,the CDF associated with Xj is denoted by Πj(θ, x) for any x ∈ RS . For any j ∈ V, 0 < T <∞,and any x ∈ RS , define the empirical measureηk =1kk−1∑k1=0I{Xjk1≤x}, 0 ≤ k ≤ T/ε.Note that y ≤ x with x = (xι) ∈ RS and y = (yι) ∈ RS is understood to hold component-wise(i.e., yι ≤ xι for ι = 1, . . . , S). The sequence ηk may be written recursively asηk+1 = ηk −1k + 1ηk +1k + 1I{Xjk≤x}.So for sufficiently large k, the empirical CDF can be estimated asηk+1 = ηk + µI{Xjk≤x}, (3.26)for arbitrarily small µ > 0. Define the continuous-time interpolated process ηµ(t) = ηk fort ∈ [µk, µk + µ). In the above, for simplicity, we have suppressed the j-dependence in both ηkand ηµ·.The following theorem is analogous to Theorem 3.4.1 but deals with average-consensus ofthe empirical CDF.Theorem 3.4.6 Under condition (B), (ηµ(·), θµ(·)) converges weakly to (η(·), θ(·)) such thatη(·) satisfies the switching ordinary differential equationη˙ = pij(θ(t))Πj(θ(t), x). (3.27)Remark 3.4.7 In analogy to Theorem 3.4.5 (which dealt with the probability mass function),we now comment on the scaled tracking error for average-consensus on the CDF when using(3.26). To proceed, defineξµ,j(t, x) =√µbt/µc−1∑k=0[I{Xjk≤x}−Πj(θk, x)].Using a combination of the techniques in the proof, Markov averaging and standard results incentered and scaled errors for empirical measures, we can show that ξµ,j(t, x) converges weakly323.4. ACP1: Algorithmic Solutionsto ξ(t, x), a switching Brownian bridge process. However, unlike the proof of Theorem 3.4.1,we can no longer use the martingale problem formulation. Nevertheless, we can use the basictechniques of weak convergence; see [69] for the related study on Brownian bridge processes.We illustrate in Sec.3.5 a sample path of the Brownian bridge resulting from the sensor scaledtracking error when a consensus regarding the average Π¯(θ(t)) = 1n∑ni=1 Πi(θ(t)) is obtainedvia (3.16).Compared to pi(θ(t)), estimation of Π(θ(t)) implies one less state-value of Xi for each i ∈ Vneed be transmitted between sensors, thus reducing by a factor of (1 − 1/S) any number ofphysical constraints on the sensor operations per communication link. If the number of statesS is near unity, then estimation of Π(θ(t)) may result in a significant reduction of the resourcesrequired for sensor communication and thus consensus. The results of estimating Π(θ(t)) ascompared to pi(θ(t)) are illustrated in Sec.3.5, where it is clear estimation of the distributionfunction also results in greater scaled tracking error among the mid-states of S than the trackingerror among extremum states.3.4.4 Average-Consensus Exchange Graph ConditionsThe continuous-time sensor estimates s(t) resulting from (3.16) was stated in Sec.3.4.2 to con-verge weakly to the solutions of (3.18) under the conditions (B) − (C). In particular, it wasassumed that {Gv,Go} are such that all eigenvalues of H = (L + Do) have either positive realparts or non-negative real parts. These basic constraints on the network exchange graphs aresufficient for the asymptotic convergence of (3.16), but are not sufficient to ensure each sensori ∈ V asymptotically obtains the average-consensus estimate p¯i(θ(t)). To proceed in this direc-tion we will assume the following property on Gv holds in addition to the conditions (B)− (C)stated in Sec.3.4.2.(D) The graph Gv is non-time-respecting strongly connected (ntrcs), thus for any two nodesi, j ∈ V, there exists a path (a sequence of directed edges in Ev) that starts at node i andends at j.The reason for condition (D) is to simplify the general weight conditions required for average-consensus. In fact, with regard to consensus formation, only when assuming this connectivitycondition on Gv may we assume a decentralized structure in Go. To see this, and as well motivatecondition (D), we note that if Ev were not ntrcs then there would exist a non-empty set ofordered pairs V\SC = {(i1, j1) . . . , (ih, jh)} for each element of which Ev contains no path. Tothen reach consensus by DLA would require (il, jl) ∈ Eo for each (il, jl) ∈ V\SC , thus implyinga non-scalable and “centralized” structure in Go. Similarly, to ensure a network agreement each333.4. ACP1: Algorithmic Solutionsnode contained in an element of V\SC would require an individual treatment. Although thisis possible, we assume Gv is ntrcs to avoid any centralization assumptions on Go and as wellavoid a more detailed scrutiny that if necessary may be left as future research.It is emphasized that for generality we seek to reach the average-consensus p¯i(θ(t)) underthe condition that in each regime the observed stationary distributions {pi1(θ(t)), . . . , pin(θ(t))}are distinct, that ispii(θ(t)) 6= pij(θ(t)) for all i, j ∈ V and any θ(t) ∈M. (3.28)If it were specifically known that two sensors may observe the same stationary distribution, i.e.pii(θ∗) = pij(θ∗) for some i, j ∈ V and θ∗ ∈M , then the average-consensus requirements statedin the lemma below could be relaxed conditional on the fact θ(t) = θ∗. We consider the mostgeneral case and do not make any assumptions regarding equality among the pii(θ), in otherwords we assume (3.28) holds.Lemma 3.4.8 Assume conditions (A)− (D), then each sensor estimate based on the discrete-time algorithm (3.16) will in the limit as µ→ 0 possess the steady-state p¯i(θ(t)) conditional onθ(t) if and only if one of the following two conditions hold,• Condition (1) : 0 4 (L+Do), L is balanced, and Wo = α(L+Do), where α ∈ R .• Condition (2) : 0 ≺ (L+Do), and the following equation holds,(L+Do)−1Wo = Λ (3.29)for any matrix Λ with the form,Λ = I + β(11′ − nI), β = diag(β1IS×S , . . . , βnIS×S) ∈ RSn×Sn (3.30)where β is non-singular.We here clarify that under condition (C), if 0 4 (L + Do) then we have Do = 0 and thematrix H = L + Do will have only one eigenvalue with a real part equal to zero, with thecorresponding eigenvector equal to 1. To see this requires only noting that Do is diagonal andLv = 0 if and only if v = 1 when Gv is ntrcs [18], the result that Hv = 0 if and only if v = 1follows immediately, as does the requirement Do = 0. Thus under conditions (A)− (C) we haveboth Wo = αL and Eo = Ev. Despite this simplification, we may still write Wo = α(L + Do)since it technically holds and also maintains a clear notational consistency between conditions(1) and (2).343.4. ACP1: Algorithmic SolutionsThe consensus-tracking algorithm (3.16) requires either 0 4 (L + Do) or 0 ≺ (L + Do) inorder for the sensor estimates sk to remain bounded in the limit as µ vanishes and t increases.This is in parallel with the constraints 0 4 L and 0 ≺ L that are required for the static Laplacianconsensus algorithm (3.7) to remain bounded under the same asymptotic limits. Thus, unlikethe works of [17, 18, 24], which consider the static algorithm (3.7), we may permit L havenegative eigenvalues. Denoting the minimum of these λ, by then taking Do = |λ|I + V for anydiagonal matrix V ≥ 0, we have (L + Do) = U(J + |λ|I)U−1 + V = UJ∗U−1 + V = L˜ + D˜o,where now 0 4 L˜ = UJ∗U−1 and D˜o = V .Necessary Exchange Graph Edge SetsThe necessary and sufficient condition for average-consensus to be obtained by sensors under(3.16) and its sub-algorithms is that either one of the conditions stated in Lemma (3.4.8) holds.Each of these conditions specifically assume either 0 ≺ H or 0 4 H (recall H = L + Do). Inboth cases, the constraints posed on the exchange graphs {Go,Gv} are based on the requirementthat for average-consensus Λ must have, in each row, identical non-diagonal terms that do notequal zero. By then supplementing the estimates sˆ(t) with the local SA tracking estimatess˜(t) and normalizing by (nβ)−1, the average p¯i(θ(t)) is obtained under Condition (2). UnderCondition (1) a similar linear procedure results in the estimate p¯i(θ(t)) at all nodes.We now characterize the exchange graph edge sets {Eo, Ev} for which either of the conditionsstated in Lemma 3.4.8 are feasible. Without loss of generality we take S = 1, we also definethe neighborhood of sensor i as Evi = {j : (i, j) ∈ Ev} and the complementary edge set of Eas E¯ , that is (i, j) ∈ E¯ if and only if (i, j) /∈ E . We assume that no sensor receives informationdirectly from all other sensors, since otherwise we would have a centralized data fusion problemthat would render a distributed algorithm such as (3.16) unnecessary. As a consequence of thisassumption each row of either matrix {Wo,Wv} has at least one non-diagonal element equalto zero, thus there exists for each row i an element joi such that (i, joi ) /∈ Eo and an element jvisuch that (i, jvi ) /∈ Ev.Below we state 3 results related to the necessary and sufficient communication graph edgesets for an average-consensus formation.Lemma 3.4.9 If Eo = Ev then average-consensus is possible only if Condition (1) of Lemma(3.4.8) holds.Lemma 3.4.10 If Eo 6= Ev then average-consensus is possible only if Condition (2) of Lemma(3.4.8) holds and Eoi ⊇ E¯vi for at least one sensor i ∈ V.Corollary 3.4.11 If Eo ⊂ Ev then average-consensus is not possible.353.5. ACP1: Numerical ExamplesThe statement Eoi ⊇ E¯vi implies that every sensor that does not send sensor i state-value datamust send sensor i observation data. For sparse networks we might presume the cardinalityof Ev is |Ev| n, and thus when Eo 6= Ev by Lemma (3.4.10) an average-consensus requires|Eoi∗ | ≈ n for some i∗ ∈ V. We note this implies that either a large group of sensors h ⊂ Vsend observation data to sensor i∗, or equivalently that sensor i∗ can itself observe the Markovchains that are observed by a large group of other sensors in V.3.5 ACP1: Numerical ExamplesIn this section we will provide numerical examples to illustrate how the theorems in Sec.3.4.2- 3.4.3 result in the weak-convergence of each sensors estimate to the average-consensus distri-bution p¯i(θ(t)). We consider two communication networks that each satisfy one of the followingconditions,• (Model 1) Wo = −L, Wv is balanced, and {Gv,Go} are such that Ev = Eo ,• (Model 2) {Wo,Wv} satisfy {Gv,Go} and are such that Ev 6= Eo .In the first Model H 4 0, whereas in Model 2 H ≺ 0. In both cases we set the network sizeat n = 24 sensors, and consider S = 40 observed states. Thus each sensor i ∈ V will observeXi in one of 40 states upon each iteration. In our simulation θ takes on 6 different states. Foreach of these regimes, the stationary distribution pii(θ) of each Markov chain Xi was randomlygenerated from the uniform distribution, thus validating an a priori knowledge set assumption.The total number of communication links, that is the total number of elements in Eo and Ev,is fixed at 2n(n−1) = 1104. We also fix the sum of absolute averaging weights at approximatelythe constant 2755,|Ev|+ |Eo| = 2n(n− 1), 1′(|L|+ |Wo|)1 ≈ 2755.Neither of these constraints are necessary, they are set identical for either model to bettercompare the two different average-consensus graph conditions. We note that under condition(1) the minimum number of edges required for an average-consensus can be proven 2n, whereasunder condition (2) this number is always strictly greater than 2n.Setting µ = 10−9 for both models, the sample path of the sensor iterates s(t) under (3.16)is plotted as the unadjusted sensor estimates; see Fig.3.1 (each sensor estimate is plotted in adifferent color).The estimates s(t)(1) and s(t)(2) are plotted as the adjusted sensor estimates under Model 1and 2 respectively. We find that, in accordance with the adaptation rate λ2, Model 1 converges363.5. ACP1: Numerical Examples0 50 100 150 200 250 300 350 400−2−1.5−1−0.500.511.522.5 Model 1 ( H ≥ 0 )Time IterationsEstimation Values Sensor EstimatesUnadjusted Consensus Values0 50 100 150 200 250 300 350 400−0.500.511.52Time IterationsEstimation Values Sensor EstimatesAverage−Consensus Values0 50 100 150 200 250 300 350 400−30−20−10010203040 Model 2 ( H > 0 )Time IterationsEstimation Values Sensor EstimatesUnadjusted Consensus Values0 50 100 150 200 250 300 350 400−0.500.511.52Time IterationsEstimation Values Sensor EstimatesAverage−Consensus ValuesFigure 3.1: Sensor estimation of the average-consensus p¯i(θ(t)) under both Models 1 and 2. Forillustrative purposes we have, without loss of generality and for this figure only, set X(0) = 0and magnified the sensor observations of Xi by a factor proportional to the number of previousstates θ has occupied in the simulation.quicker to p¯i(θ) as compared to Model 2, see also Table 3.1. Also indicated by Table 3.1 isthat our simulations imply the ratio holds in approximation (we note that the two values underUsamp refer to sensor estimation of pi(θ) and Π(θ) respectively).As a trade-off to the improved adaptation rate, there is an increase in the scaled error underModel 1 when averaged across sensors. This is shown in Fig.3.2, where we have plotted for bothmodels the sample scaled tracking error when estimating pi(θ), as well as estimation of Π(θ) bythe empirical measure described in Sec.3.4.3. In Fig.3.2 the scaled error associated with eachobserved state (S=40) is plotted in a different color.We see that Model 1 results in an increased average scaled error as compared to Model2. The exact relation between the scaled error and adaptation rate r is not explored here indetail. The analytic and sample variance of the scaled tracking error, as measured relative tothe sensor trajectories, are plotted in Fig.3.3.Our purpose in Fig.3.3 is to illustrate how the coefficients of the Brownian motion dw varybetween models (each sensor and observed state (Sn = 24 · 40 = 960) are plotted in a differentcolor). For state ` in Xi and sensor i ∈ V under (3.16), the variance of the Brownian motion373.5. ACP1: Numerical Examples0 500 1000 1500 2000 2500 3000 3500 400000.511.522.533.54 Average Scaled Error, Model 1 (H ≥ 0)Sample Path per Observed StateCDF Estimates Scaled Error0 500 1000 1500 2000 2500 3000 3500 400000.511.52Sample Path per Observed StatePMF Estimates Scaled Error0 500 1000 1500 2000 2500 3000 3500 400000.20.40.60.811.21.4 Average Scaled Error, Model 2 (H > 0)Sample Path per Observed StateCDF Estimates Scaled Error0 500 1000 1500 2000 2500 3000 3500 400000.20.40.60.81Sample Path per Observed StatePMF Estimates Scaled ErrorFigure 3.2: The absolute scaled error Usamp, plotted consecutively over the entire simulationtime [T0, T1] for each state in S. The results indicate a Brownian bridge across the stateestimates (S = 40) of the average cumulative distribution Π¯(θ(t)).in (3.24) was computed analytically as a function of the averaging weights,σ2i` =Sn∑j=1(Wo · Σ)2((i−1)S+1+`)j , (Wo · Σ).= (WoΣWo ′)1/2. (3.31)We have plotted in Fig.3.3 the sample values of σi` in ascending order, the correspondinganalytic values have been plotted in the same order to illustrate the similarity between theanalytic and simulated results, thus verifying (3.24) as the correct expression for the scaledsensor tracking error. We note that σ¯2 =∑S`=1∑ni=1 σ2i`.Finally, in Fig.3.4, we have plotted various features of our simulations per sensor (each sensorand observed state (Sn = 24 · 40 = 960) are plotted in a different color). It is apparent that theratios between these models regarding analytic variance σ2sensor i =∑S`=1 σ2i`, Frobenius norm||Wo||2, either summed squared weight values 1′|Wo|21 or 1′|H|21, and the absolute samplederror Usamp, hold in approximation.In these numerical examples we employed the adaptation rate r and σ¯2 to explain ourempirical results. We leave open the question of which model is superior in terms of maximizingr while minimizing σ¯2 and as well the weights and edges, or complexity thereof, required383.6. ACP1: Conclusions and Future Work0 100 200 300 400 500 600 700 800 900 100000.511.522.533.5 x 10−7Ordered Sensors, Model 1 (H ≥ 0)Diffusion Coefficients 0 100 200 300 400 500 600 700 800 900 100001234567 x 10−6Sample Diffusion CoefficientsOrdered Sensors, Model 2 (H > 0)Diffusion Coefficients0 100 200 300 400 500 600 700 800 900 100000.511.522.533.5 x 10−7Ordered Sensors, Model 1 (H ≥ 0)Diffusion Coefficients0 100 200 300 400 500 600 700 800 900 100001234567 x 10−6Analytic Diffusion CoefficientsOrdered Sensors, Model 2 (H > 0)Diffusion CoefficientsFigure 3.3: The analytic and sample variance σi` of the scaled tracking error under bothModels 1 and 2, as given by (3.31). A point exists for each sensor i ∈ V and observed state` ∈ {1, . . . , S}. The results have been ordered according to the sample variances to better showthe similarity between the analytic and simulated variances.for {Go,Gv} to ensure average-consensus. This question is of importance since the featuresof {Go,Gv} can be naturally associated with the communication costs assumed of the sensornetwork [18].3.6 ACP1: Conclusions and Future WorkWe have proven that under the linear distributed averaging protocol and a local stochasticapproximation algorithm, the sensor state-values within a network communication graph mayModel λ2 1′Usamp σ¯2model 1′|Wo|21 ||Wo||2 1′|H|211 7.12 3.563, 0.944 6.875 10−7 442 106 6343 2.999 1052 1.44 0.805, 0.207 1.556 10−7 841.5 105 1463 1.3775 105Ratio model1model2 0.1994 0.2259, 0.2193 0.2278 0.1843 0.2049 0.2177Table 3.1: Numerical results of the ratio comparisons. The average ratio between Models 1 and2 is 0.2094.393.6. ACP1: Conclusions and Future Work0 500 100000.511.522.533.54 x 10−7Sample VarianceUn−ordered SensorsSample Summed Absolute Error200 400 600 80000.050.10.150.20.250.30.350.4Sample Absolute ErrorUn−ordered SensorsSample Summed Absolute Error200 400 600 800050100150200σsensorσ sensorUn−ordered Sensors 200 400 600 800050100150200Summed Weights |W|oSummed Rows of |W|oUn−ordered Sensors 200 400 600 800010002000300040005000600070008000Frobenius Norm, Model 1Frobenius NormUn−ordered Sensors0 500 100001234567 x 10−6Sample VarianceSample Summed Absolute ErrorUn−ordered Sensors 200 400 600 80000.050.10.150.20.250.30.350.4Sample Absolute ErrorSample Summed Absolute ErrorUn−ordered Sensors 0 500 10000500100015002000250030003500σsensorσ sensorUn−ordered Sensors 200 400 600 800050100150200Summed Weights |W|oSummed Rows of |W|oUn−ordered Sensors 0 500 100002468101214 x 104Frobenius Norm, Model 2Frobenius NormUn−ordered SensorsFigure 3.4: Unordered Sensor Diffusion Coefficients, Absolute Error, and Functions of Weights.The similarities between each quantity is apparent by the localization of larger values underModel 2, as compared to the uniform values obtained from Model 1. The approximate ratio isalso seen here to generally hold per sensor, although only with Wo replaced by H as seen inTable 3.1.under suitable averaging weights converge weakly to an average-consensus regarding the state ofa two-time-scale Markov system. In our framework this consensus is based on the slowly time-varying stationary distribution of a Markov chain observed by each sensor. The result has beenextended to the case of multiple ergodic classes in the observed Markov chains, or equivalentlyto network exchange graphs {Gv,Go} that belong to a set rather than a fixed parametrizationG = {V, E ,W}. The limiting switched ODE is in this case replaced by a switched differentialinclusion, thus motivating the idea of set-valued consensus. In addition to this, the sensorscaled tracking error was proven to satisfy a switching diffusion equation and, in the case ofCDF estimation by an empirical measure, a switching Brownian bridge.Necessary and sufficient conditions were derived in regard to the network communicationgraph edge set and weights required to ensure average-consensus formation. As a result, compu-tation of the correct consensus estimate requires (3.16) in conjunction with its two componentsub-algorithms. Thus, obtaining the average-consensus does not require any greater complex-ity in sensor computing ability or network communication than (3.16) itself assumes. Lastly,we considered the adaptation rate and magnitude of diffusion present in the sampled sensor403.6. ACP1: Conclusions and Future Worktrajectories, and proposed an optimization problem as well as an approximate ratio that relateboth to various features of the sensor averaging weights. Future work may consider the edge setand averaging weights required for fast convergence to the average-consensus estimate withoutrequiring strong connectivity assumptions or increased sum of the absolute averaging weights.41Chapter 4ACP2: The Static ConsensusProblem with a priori UnknownCommunicationIn this chapter we address the consensus problem ACP2 (cf. Def.2.2.2), that is obtaining at eachnode the average initial input when inter-node communication is completely a priori unknown.In particular we assume the communication sequence is asynchronous and possesses arbitrarytime-delays. Our main contribution is Thm.4.4.9, which shows that average-consensus can beobtained for general communication patterns if a vector of O(n) is transmitted along with theconsensus estimates, where n is the network size. For consensus estimates of O(n) this resultimplies an order of magnitude less data to be communicated than an “ideal” protocol thatfloods the initial data through-out the network. In Sec.4.4.4 we provide 3 practical examplesfrom the past literature that assume consensus estimates of O(n).There exist several results in previous literature that have proven the convergence of dis-tributed algorithms to an asymptotic average-consensus under asynchronous communicationand/or time-delayed signals. We note that asynchronous communication without time-delaysconstitute a special case of the framework considered in this thesis. In particular [44, 70–73]emphasize the asynchronous nature of their results, but (i) do not allow time-delayed signals,and (ii) assume a balanced communication graph at each iteration. We require neither of theseassumptions, and thus our consensus protocols achieve average-consensus under more generalcommunication conditions than the cited works, albeit with the cost of an extra O(n) memoryand signal storage (recall however that this extra cost may be negligible in various practicalapplications of the static average-consensus problem ACP2).Assuming the initial node states remain fixed, a distributed linear averaging protocol willimply that the local consensus estimate at each node can be expressed as a linear combination ofthe set of initial node states. A novel feature of the protocols we propose in this chapter is thateach node transmits not only their local consensus estimate, but also a local “normal” vectorrepresenting the linear dependence their local estimate has on the set of all initial consensusvectors. Under an appropriate least-squares update rule it can be shown that when all elements424.1. ACP2: Local Consensus Input Dynamicsof the local normal vector are equal, the node will know that its local consensus estimate equalsthe average-consensus value. It is by maintaining the local normal vector that each node canobtain average-consensus in the presence of time-varying and asymmetrical time-delays.To proceed we first define the local consensus input dynamics (Sec.4.1) as well as the inter-node communication assumptions under ACP2 (Sec.4.2). We then re-state ACP2 in thesespecific terms. A review of the literature with respect to the past algorithms frtom research inthis area is provided in Sec.4.3. Subsequently we present the details of our proposed protocols(Sec.4.4). Numerical examples of the protocols are presented (Sec.4.5) and their performanceis compared with those of previously proposed algorithms in the literature. Lastly, in Sec.4.6we provide some concluding comments and potential directions for future work.4.1 ACP2: Local Consensus Input DynamicsBoth ACPs described in Chapter 2 assume each node i ∈ V is initially associated with a localconsensus vector si(0) ∈ Rd×1. Unlike ACP1, the consensus problem ACP2 assumes that theinitial local consensus vector is static and does not vary over time. Thus for ACP2, the currentaverage-consensus value s¯(t) defined in (2.1) remains equal to s¯(0) = 1n∑i∈V si(0) for all t ∈ N.This is why we refer to ACP2 as a “static” ACP.4.2 ACP2: Inter-node Communication Pattern and ProblemStatementThe ACP2 requires a consensus protocol (cf. (4.1) − (4.2)) that operates under an “a prioriunknown communication pattern” (UCP). This is the most general form of inter-node commu-nication that can be assumed, and we must further define this type of communication beforeproceeding with any of the consensus protocols’ convergence results. To do so, we now state 3assumptions that all consensus protocols must abide by under a UCP.Definition 4.2.1 (UCP) A consensus protocol operating under an “ a priori unknown com-munication pattern” (UCP) satisfies the following conditions,• (A5): for each signal Sij(t0, t1), node j does not have the ability to know the value of ior t1.• (A6): for each signal Sij(t0, t1), node i does not have the ability to control the value of j,t0, or t1.434.2. ACP2: Inter-node Communication Pattern and Problem Statement• (A7): when a signal Sij(t0, t1) is received, the knowledge set Ki(t1) of the receiving nodeis updated by the time instant t1 + 1. (e.g. the “processing time” of each signal is oneunit of time).Assumptions (A5)-(A6) imply the communication process is a priori unknown at everynode, furthermore there are no constraints on the time-delay between the transmission andreception time of any signal. We emphasize that together (A5)-(A6) allow for more generalcommunication patterns than those in [18, 62, 72, 74–76]. The assumption (A7), together with(4.2) below, imply that a signal Sij(t0, t1) may contain information that has been received atnode j preceding time t0. This fact will be utilized later in Def. A.4.3, formalizing our notionof a “time-respecting communication path” between two nodes. Assumption (A7) is realistic(as well as natural in our discrete-time framework) since the protocols we propose will requireonly a few arithmetic operations in the update process.We now re-state assumptions (A1)-(A4) as well as the definition of a consensus protocol forthe readers convenience. We will assume,• (A1): (knowledge set assumption) At any time t ∈ N, each node i ∈ V is equipped with adevice that can update and store a “knowledge set” Ki(t). For each i ∈ V, the knowledgeset Ki(t) may have a time-varying cardinality.• (A2): Ki(t) ⊇ {sˆi(t)} , ∀ i ∈ V , ∀ t ∈ N, where sˆi(t) ∈ Rd×1 represents the local estimateof s¯(t) at node i and time t.• (A3): (signal set assumption) At any time t0 ∈ N, each node j ∈ V has the ability totransmit a “signal set” Sij(t0, t1) ⊆ Kj(t0) that will be received at some node i ∈ V attime t1 ∈ N, where t1 ≥ t0.• (A4): At any time t ∈ N, the knowledge set Ki(t) of each node i ∈ V satisfies Ki(t) ⊇{i, n, sˆi(t), si(t)}.Given (A1)-(A4), we define a “consensus protocol” (P) in terms of its “knowledge set updatingrule” fPK {·} together with a “signal specification rule” fPS {·}. The knowledge set updating rulefPK {·} defines the effect that a set of signals has on the knowledge set Ki(t1) at the receiving nodei. The signal specification rule fPS {·} specifies the elements contained in the signal Sij(t0, t1)given as a function of the knowledge set Kj(t0) at the transmitting node j.The Consensus Protocol (P) under (A1)-(A4):Knowledge Set Updating Rule: fPK : Ki(t1)⋃Sij(t0, t1) 7−→ Ki(t1+1) (4.1)444.3. ACP2: Relation to Past LiteratureSignal Specification Rule: fPS : Kj(t0) 7−→ Sij(t0, t1) (4.2)The consensus problem ACP2 is now re-stated in terms (A1)-(A7) and (4.1)− (4.2).Definition 4.2.2 (ACP2) Under (A1)-(A7), assume that the input consensus parameters{si(t) : i ∈ V} remain fixed at their initial values {si(0) : i ∈ V} for all t ∈ N. A consensusprotocol (P) (cf. (4.1) − (4.2)) then solves the average-consensus problem (ACP2) at timet∗ if as t approaches t∗, the consensus estimate sˆi(t) of each node i ∈ V converges (in theEuclidean norm || · ||) to the average initial input parameter,limt→t∗||sˆi(t)− s¯(0)|| = 0 . (4.3)To solve ACP2 (cf. Def.4.2.2) we will propose 4 consensus protocols that each requirea specific condition on the UCP C0,∞ sufficient for their convergence. We devote Sec.4.4.1to defining the 4 consensus protocols, Sec.4.4.2 to describing the respective UCP conditionssufficient for their convergence, and Sec.4.4.3 to formally state these results. Before proceedinghowever, we briefly provide an overview of past literature related to the results presented inthis chapter.4.3 ACP2: Relation to Past LiteratureThis chapter considers average-consensus in the presence of asynchronous and time-delayed com-munication. Several past papers have proposed distributed algorithms for asymptotic average-consensus under asynchronous communication and/or time-delayed signals, but they all alsorequire communication constraints extraneous to ours. For instance, the works [44, 70–73]emphasize the asynchronous nature of their results, but they do not allow time-delayed sig-nals and furthermore assume a balanced communication graph at each iteration. Conversely,[18, 62, 74, 76, 77] show that in the presence of time-delays each local consensus estimate willconverge (in some norm) to a common limit, yet they require extra communication constraintsthat afford a global conservation property. To obtain the conservation property the cited worksrequire some form of symmetry in the time-delays,• (R1): undirected communication links, with a common time-delay in both directions ofeach link, e.g. [18, 74,76,77]454.3. ACP2: Relation to Past Literature• (R2): a “response” signal for any initiating signal [62].Both [74, 77] claim that it is often impossible to obtain average-consensus in the presence ofasymmetrical time-delays, despite the previous results of [62]. Furthermore, the algorithmproposed in [36] obtains average-consensus in the presence of asymmetrical and time-invarianttime-delays. However, in [36] each node must have a priori knowledge (obtained in a dis-tributive fashion during a “start up phase”) of the underlying (time-invariant) communicationdigraph. Due to the “normal” vectors, the algorithms we present achieve average-consensusin the presence of (i) time-varying and asymmetrical time-delays, and (ii) time-varying com-munication digraphs. We do not require (R1) or (R2), nor do we require that the sequence oftime-delays or communication digraphs adhere to any fixed pattern.Similar to most consensus protocols (e.g. [16, 18, 43, 62]), the 4 protocols proposed hererequire each node to update a local estimate of the true average-consensus value via a weightedaverage of their current local estimate and the local estimate received from a neighboring node.Linear consensus protocols of this type imply that each local consensus estimate can alwaysbe expressed as a linear combination of the set of initial node states. A novel feature of theprotocols we propose is that each node transmits not only their local consensus estimate, butalso a local “normal” vector representing the linear dependence their local estimate has on theset of all initial consensus variables. When all elements of the local normal vector are equal,the node knows their local consensus estimate equals the average-consensus value. It is bymaintaining a local normal vector that each node can obtain average-consensus in the presenceof time-varying and asymmetrical time-delays.Similar to our proofs, the works [43, 78, 79] do not require strict conservation propertiesor a priori knowledge of the communication pattern. However, these works can only proveasymptotic consensus on some unpredictable value, which is not useful in many practical appli-cations [16, 36, 62, 80]. In contrast, the weighted gossip algorithm proposed in [75] guaranteesasymptotic average-consensus without a conservation property of the local consensus estimates.However, [75] employs additional local “sum” and “weight” scalar variables that each obey aglobal conservation property. The algorithm in [75] obtains average-consensus under asyn-chronous communication digraphs, yet they do not allow time-delays, and furthermore theyrequire the transmitting node knows how many nodes will receive their signal. Similarly, thealgorithms proposed in [21,81–84] converge relatively fast to average-consensus, yet they requirewell-behaved communication patterns and/or detailed knowledge at each node of the underly-ing network topology. In summary, the algorithms proposed in this chapter use local “normal”vectors to obtain average-consensus in the presence of asynchronous directed communicationwith asymmetrical time-varying time-delays. The drawback is that the local normal vectorsrequire node communication and storage costs to scale linearly with network size. The draw-464.4. ACP2: Algorithmic Solutionsback is nullified when an element-wise average-consensus is required on a set of vectors whosedimension is on the same order of magnitude as the network size. Three practical examples ofthis are discussed in Sec.4.4.4.4.4 ACP2: Algorithmic SolutionsIn this section we describe 4 consensus protocols that solve ACP2 under specific UCP conditions.The first section Sec.4.4.1 defines the 4 consensus protocols, Sec.4.4.2 describes the respectiveUCP conditions sufficient for their convergence, and Sec.4.4.3 formally states these results. Wethen go on to discuss the implications of these results in relation to the literature on consensusformation (Sec.4.4.4), present numerical examples (Sec.4.5), and provide directions of futurework (Sec.4.6).4.4.1 ACP2: Consensus ProtocolsIn this section we define the 4 consensus protocols that will be analyzed in this chapter. Wefirst will discuss the common features among all 4 protocols, we then move on to specificallydefine each respective protocol in terms of the general consensus protocol (4.1)− (4.2).All 4 protocols have the common feature of assuming that the knowledge set of each nodecontains a “normal” consensus estimate vˆi(t) ∈ Rn×1 representing the linear dependence theconsensus estimate sˆi(t) has on the matrix S = [s1(0); s2(0); · · · ; sn(0)] ∈ Rd×n.• (A8): Ki(t) ⊇ {vˆi(t)} , ∀ i ∈ V , ∀ t ∈ N, where vˆi(t) ∈ Rn×1 satisfies sˆi(t) = Svˆi(t), andwe denote S = [s1(0); s2(0); · · · ; sn(0)] ∈ Rd×n.Combining (A4) and (A8) yields the minimum knowledge set for each of the 4 protocols,• (A4′): Ki(t) ⊇ {i, n, sˆi(t), vˆi(t), si(0)} , ∀ i ∈ V .Assumption (A8) implies that sˆi(t) = s¯(0) if vˆi(t) = 1n1n, where 1n ∈ Rn×1 denotes a vectorthat consists only of unit-valued elements. Thus, in terms of the consensus problem ACP2,assumption (A4′) imply the average-consensus problem ACP2 is solved at time t if vˆi(t) = 1n1nfor all nodes i ∈ V. Motivated by this fact, consider the following update scheme for eachnormal consensus estimate vˆi(t). At any time t, let Vi denote a matrix with each columnequal a unique member of the set Hi = {v ∈ Ki(t(+)) : s = Sv , s ∈ Ki(t(+))}, whereKi(t(+)) = Ki(t)⋃Sij(t0, t). Note that (A4′) implies Hi has cardinality |Hi| ≥ 1. At anytime instant t ∈ N, the proposed protocols will update each normal consensus estimate vˆi(t)474.4. ACP2: Algorithmic Solutionsaccording to the following optimization problem,vˆi(t+ 1) = Viaˆ ,aˆ = arg mina∈R|Hi| ||Via−1n1n||2 ,(4.4)where || · || denotes the Euclidean norm (the exception to this is the DDA protocol, whichfurther requires each element of vˆi(t) to be either 0 or (1/n)). In order to maintain (A8), theupdate (4.4) implies that the consensus estimate sˆi(t) must be updated as follows,sˆi(t+ 1) = Siaˆ , (4.5)where Si is a matrix with each column equal to the vector s ∈ Ki(t(+)) that satisfies s = Svwith v equal to the corresponding column in Vi. In other words, Si = SVi. Note that ifnode i does not receive a signal at time t, then Ki(t(+)) = Ki(t) and hence (4.4) − (4.5)imply (A8) holds. In the remaining sections we show that by utilizing (A4′), (4.4) and (4.5),a suitable consensus protocol (4.1)− (4.2) can achieve average-consensus under relatively weakcommunication conditions when compared to past algorithms in the literature.Protocol 1: Idealized Algorithm (IA)The first protocol we define is the Idealized Algorithm (IA). The IA protocol presented inthis section obtains average-consensus trivially. We propose it formally because the 3 otherprotocols are specific reductions of it in the sense that any data transmitted by the latter willalso be transmitted by the IA algorithm, but not vice-versa. Furthermore, the IA protocolrequires communication conditions that will be used in the non-trivial result Thm.4.4.9. TheIA protocol implies that all of the initial consensus vectors {si(0) : i ∈ V} are floodedthrough-out the network and stored at each node. Thus IA can be seen as a trivial solutionto the “multi-piece dissemination” problem [73]. The general methodology of the IA protocolhas been discussed previously in the literature (e.g. [16, 62]) and it is often referred to as“flooding”. For completeness of our results, we show in Theorems 4.4.7− 4.4.8 that regardlessof the communication pattern between nodes, there exists no consensus protocol (4.1) − (4.2)that can obtain average-consensus before the IA algorithm. This is why we have named it the“Idealized” algorithm.Let δ[·] denote the Kronecker delta function applied element-wise. The IA signal specifi-cation (4.2) and knowledge set update (4.1) are respectively defined as (4.6) and (4.7), underwhich the updates (4.4)− (4.5) imply (4.8)− (4.10). Notice that in (4.7)− (4.9) below we willfor notational convenience utilize a vector w ∈ Rn×1 with `th element denoted w`.484.4. ACP2: Algorithmic SolutionsProtocol 1: Idealized Algorithm (IA)Signal Specification: Sij(t0, t1) ={Kj(t0) \ {j, n, sˆj(t0)} if vˆj(t0) 6= 1n1nKj(t0) if vˆj(t0) = 1n1n, (i, j) ∈ V2(4.6)Knowledge Set Update:w =1n(1n − δ[vˆi(t1) + vˆj(t0)])Ki(t1 + 1) ={{i, n, vˆi(t1 + 1), sˆi(t1 + 1), s`(0) ∀ ` s.t. w` = 1} if w 6= 1n{vˆi(t1 + 1), sˆi(t1 + 1)} if w = 1n(4.7)Normal Consensus Estimate Update: vˆi(t1+1) = w (4.8)Consensus Estimate Update: sˆi(t1+1) ={ ∑n`=1 w`s`(0) if vˆj(t0) 6=1n1ns¯(0) if vˆj(t0) = 1n1n(4.9)Estimate Initialization: vˆi(0) =1nei , sˆi(0) =1nsi(0) . (4.10)Besides flooding the initial consensus vectors, the IA protocol (4.6)−(4.10) has an additionalfeature that is not necessary but is practical: once a node j obtains all initial vectors {si(0) :i ∈ V}, any signal thereafter transmitted from node j will contain only the consensus estimatesˆj(t) = s¯(0) and normal consensus estimate vˆj(t) = 1n1n. In this way the average-consensusvalue s¯(0) can be propagated through-out the network without requiring all n initial consensusvectors to be contained in every signal (this feature, however, requires that the network size nis known a priori). With or without this feature, IA implies communication and storage of avector with a potentially very large dimension in the range O(nd) and Ω(d).Protocol 2: Distributed-Averaging (DA)We now define the non-trivial Distributed-Averaging (DA) consensus protocol. To define theDA update procedures, let V + denote the pseudo-inverse of an arbitrary matrix V, and eidenote the ith standard unit vector in Rn×1. The DA signal specification (4.2) and knowledgeset update (4.1) are defined as (4.11) and (4.12), under which the updates (4.4) − (4.5) imply(4.13)− (4.15).494.4. ACP2: Algorithmic SolutionsProtocol 2: Distributed Averaging (DA)Signal Specification: Sij(t0, t1) = Kj(t0)\{j, n, sj(0)} , (i, j) ∈ V2(4.11)Knowledge Set Update: Ki(t1 + 1) = {i, n, vˆi(t1 + 1), sˆi(t1 + 1), si(0)}(4.12)Normal Consensus Estimate Update: vˆi(t1+1) = V(DA)V+(DA)1n1n , V(DA) = [vˆi(t1), vˆj(t0),ein](4.13)Consensus Estimate Update: sˆi(t1 +1) = VsV+(DA)1n1n , Vs = [sˆi(t1), sˆj(t0),1nsi(0)] (4.14)Estimate Initialization: vˆi(0) =1nei , sˆi(0) =1nsi(0) . (4.15)Notice that the DA knowledge set update (4.12) implies the initial consensus vector si(0)remains stored in the respective knowledge set of each node i ∈ V. Without this propertythe DA protocol does not achieve average-consensus for the general class of communicationpatterns assumed in Thm. 4.4.9. Next observe that V(DA) is a n× 3 matrix. It is shown in [27]that the pseudo-inverse V +(DA) in (4.13) − (4.14) can be computed using only a few arithmeticoperations. Lastly, note that the DA signal specification (4.11) implies every signal must containonly the consensus estimate sˆj(t) ∈ Rd×1 and normal consensus estimate vˆj(t) ∈ Rn×1, thusthe DA protocol requires communication and storage of a vector with dimensions in the rangeO(n + d) and Ω(d) 2. Most consensus protocols based only on distributed linear averaging(e.g. [43, 72, 75, 76]) have Θ(d) communication and storage costs in our setting. Similarly,the class of “gossip” algorithms considered in [73] are required to have O(poly(log(n)) + d)storage costs, however [73] only considers bi-directional instantaneous communication patterns(such patterns comprise a very small subset of the communication sequences considered in thischapter).Protocol 3: Discretized Distributed-Averaging (DDA)Next we define the non-trivial Discretized Distributed-Averaging (DDA) consensus protocol.Let ei denote the ith standard unit vector in Rn×1, vˆi`(t) denote the `th element of vˆi(t),and vˆ−`i (t) ∈ R(n−1)×1 denote the vector vˆi(t) with element vi`(t) deleted. The DDA signal2The lower bound can be achieved by a mapping to reduce the signal dimensionality. For example, theinitialization (4.15) implies the ith element of vˆi(0) equals (1/n) and all other elements equal zero. The n-dimensional vector vˆi(0) can thus be mapped to the 2-dimensional vector [i, (1/n)].504.4. ACP2: Algorithmic Solutionsspecification (4.2) and knowledge set update (4.1) are defined as (4.16) and (4.17), under whichthe updates (4.4)− (4.5) imply (4.18)− (4.21).Protocol 3: Discretized Distributed-Averaging (DDA)Signal Specification: Sij(t0, t1) = Kj(t0)\{j, n, sj(0)} , (i, j) ∈ V2 (4.16)Knowledge Set Update: Ki(t1+1) = {i, n, vˆi(t1+1), sˆi(t1+1), si(0)} (4.17)Normal Consensus Estimate Update:vˆi(t1 + 1) = aˆvˆi(t1) + bˆvˆj(t0) + cˆei (4.18)(aˆ, bˆ, cˆ) =(1, 1, −vˆji(t0))if vˆ−ii (t1)′vˆ−ij (t0) = 0(aˆ, bˆ, cˆ) =(0, 1, 1n − vˆji(t0))if vˆ−ii (t1)′vˆ−ij (t0) > 0 , ||vˆ−ii (t1)||2 < ||vˆ−ij (t0)||2(aˆ, bˆ, cˆ) =(1, 0, 0)if vˆ−ii (t1)′vˆ−ij (t0) > 0 , ||vˆ−ii (t1)||2 ≥ ||vˆ−ij (t0)||2(4.19)Consensus Estimate Update: sˆi(t1+1) = aˆsˆi(t1)+ bˆsˆj(t0)+ cˆsi(0) (4.20)Estimate Initialization: vˆi(0) =1nei , sˆi(0) =1nsi(0) . (4.21)The only difference between DDA and the DA consensus protocols, is that DDA requiresvˆi(t) ∈ {0, 1n}n whereas DA requires vˆi(t) ∈ Rn×1, this is why we use the name “DiscretizedDistributed-Averaging”. It follows that, similar to the DA algorithm, the DDA requires commu-nication and storage of a vector with dimension size in the range O(n+ d) and Ω(d). However,in Sec.4.4.4 we discuss how quantization issues imply that in practice the DDA may havesignificantly lower resource costs than DA. Note that most consensus protocols based only ondistributed linear averaging (e.g. [43,62,72,75,76,85,86]) have Θ(d) communication and storagecosts in our setting.Protocol 4: One-Hop (OH)Under the OH consensus protocol each signal Sij(t0, t1) will either contain the local initialconsensus vector sj(0) and transmitting node identity j, or (if n is known a priori at eachnode) the average-consensus value s¯(0) and a scalar 0 to indicate that the transmitted vector isthe true average-consensus value. For this reason the OH protocol requires communication of avector with dimension size Θ(d), but similar to the DA and DDA protocols OH requires storageof a vector with dimension size in the range O(n + d) and Ω(d). Let Sij1 (t0, t1) denote thefirst element in the signal set Sij(t0, t1), and δ[·] denote the Kronecker delta function applied514.4. ACP2: Algorithmic Solutionselement-wise. The OH signal specification (4.1) and knowledge set update (4.2) are definedas (4.22) and (4.23), under which the updates (4.4) − (4.5) imply (4.24) − (4.26). Note thatin (4.23) − (4.24) below we will for notational convenience utilize a vector w ∈ Rn×1 with `thelement denoted w`.Protocol 4: One-Hop (OH)Signal Specification: Sij(t0, t1) ={{j, sj(0)} if vˆj(t0) 6= 1n1n{0, sˆj(t0)} if vˆj(t0) = 1n1n, (i, j) ∈ V2(4.22)Knowledge Set Update:w ≡ 1n − δ[vˆi(t1) + ej ]Ki(t1 + 1) ={{i, n, vˆi(t1 + 1), sˆi(t1 + 1), si(0)} if w 6= 1n1n{vˆi(t1 + 1), sˆi(t1 + 1)} if w = 1n1n(4.23)Normal Consensus Estimate Update: vˆi(t1 + 1) ={1nw if 0 6= Sij1 (t0, t1)1n1n if 0 = Sij1 (t0, t1)(4.24)Consensus Estimate Update: sˆi(t1 + 1) ={sˆi(t1) +(1n − vˆij(t1))sj(0) if 0 6= Sij1 (t0, t1)sˆj(t0) if 0 = Sij1 (t0, t1)(4.25)Estimate Initialization: vˆi(0) =1nei , sˆi(0) =1nsi(0) . (4.26)The condition 0 = Sij1 (t0, t1) requires n is a priori known at each node. If n is initiallyunknown at each node, then the condition 0 = Sij1 (t0, t1) must be removed from the above OHprotocol, but otherwise the protocol remains the same.Summary: In this section we have defined 4 consensus protocols that in Sec.4.4.3 will beshown to solve ACP2. The algorithms can be viewed as special cases of the consensus protocol(4.1) − (4.2). For initial consensus vectors with dimension d ≥ n, the communication costs ofthe DA and DDA protocols become O(d), thereby making these protocols comparable to manyof the other average-consensus algorithms in the literature that have Θ(d) costs. In Sec.4.4.4we provide three practical examples from the literature to justify the assumption d ≥ n.524.4. ACP2: Algorithmic Solutions4.4.2 ACP2: UCP ConditionsIn this section we define 5 conditions on the UCP (cf. (A5)-(A7)) that will be shown necessaryand/or sufficient for convergence of the 4 consensus protocols defined in Sec.4.4.1.Singly V Strongly Connected UCPA UCP communication sequence Ct0,t1 is “Singly V Strongly Connected” (SVSC) if each nodehas “time-respecting communication path” to every other node in the network. We define a“time-respecting communication path” as follows.Definition 4.4.1 A communication sequence Ct0,t1 contains a “time-respecting communicationpath” (trcp) from node j to node i if Ct0,t1 contains a sub-sequence Ct0(ij),t1(ij) with the followingconnectivity property,Ct0(ij),t1(ij) = {S`1j , S`2`1 , S`3`2 ,. . . , S`k(ij)`k(ij)−1 , Si`k(ij)}(4.27)where we have omitted the time indices but it is understood that the transmission time t0 of eachsignal S`q+1`q occurs after the reception time t1 of the preceding signal S`q`q−1. Note that the sub-sequence Ct0(ij),t1(ij) defined in (4.27) has a finite cardinality |Ct0(ij),t1(ij)| =k(ij) + 1 ≥ 1.With Def.4.4.1 we can now define a “Singly V Strongly Connected” (SVSC) communicationsequence.Definition 4.4.2 (SVSC) A communication sequence Ct0,t1 is “singly V strongly connected”(SVSC) if Ct0,t1 contains trcp from each node i ∈ V to every node j ∈ V−i. For any finite t0 ∈ N, we will let “Ct0,t1 ∈ SVSC” denote that the sequence Ct0,t1 is SVSC.Infinitely V Strongly Connected UCPA UCP communication sequence C0,∞ is “Infinitely V Strongly Connected” (IVSC) if it can bepartitioned into an infinite number of disjoint SVSC communication sequences.Definition 4.4.3 (IVSC) A communication sequence C0,∞ is “infinitely V strongly connected”(IVSC) if for each finite time instant t ∈ N there exists a finite time T (t) ∈ N such thatCt,T (t) ∈ SVSC. We will let “C0,∞ ∈ IVSC” denote that the sequence C0,∞ is IVSC.534.4. ACP2: Algorithmic SolutionsSingly V Centrally Connected UCPA UCP communication sequence Ct0,t1 is “Singly V Centrally Connected” (SVCC) if all nodeshave a communication path to some node iˆ, and then the node iˆ transmits a signal to all othernodes. In this sense, the network is “centered” around at least one node iˆ.Definition 4.4.4 (SVCC) A communication sequence Ct0,t1 is “singly V centrally connected”(SVCC) if there exists a time t1/2 and node iˆ ∈ V, such thatCt0 (ˆij),t1 (ˆij) ⊂ Ct0,t1/2−1 , ∀ j ∈ V−iˆSjiˆ(t0, t1) ∈ Ct1/2,t1 , ∀ j ∈ V−iˆ .We let “Ct0,t1 ∈ SVCC” denote that a sequence Ct0,t1 is SVCC.Infinitely V Centrally Connected UCPA UCP communication sequence C0,∞ is “Infinitely V Centrally Connected” (IVCC) if it canbe partitioned into an infinite number of disjoint SVCC communication sequences.Definition 4.4.5 (IVCC) A communication sequence C0,∞ is “infinitely V centrally con-nected” (IVCC) if for any finite time instant t ∈ N there exists a finite time T (t) ∈ N such thatCt,T (t) ∈ SVCC. For an IVCC sequence the “central” node iˆ defined in Def.4.4.4 can vary between each SVCCsub-sequence, and furthermore we do not assume that any node knows the identity of the centralnode iˆ. Let “Ct0,t1 ∈ IVCC” denote that a sequence Ct0,t1 is IVCC.Dual of a Singly V Centrally Connected UCPNext we define a UCP condition that will be shown sufficient for the OH protocol. To do so wecan define the “dual” C˜t0,t1 of a communication sequence Ct0,t1 as follows,Sij(t0, t1) ∈ C˜t0,t1 if Ct0(ij),t1(ij) ∈ Ct0,t1Ct0(ij),t1(ij) ∈ C˜t0,t1 if Sij(t0, t1) ∈ Ct0,t1 .(4.28)We use the term “dual” because (4.28) implies that if Ct0,t1 contains a signal (resp. communi-cation path) from node j to i, then so does ˜˜Ct0,t1 .A communication sequence Ct0,t1 is “Singly V dual-Centrally Connected” (SVC˜C) if allnodes transmit a signal to some node iˆ, and then the node iˆ has a communication path to544.4. ACP2: Algorithmic Solutionsall other nodes. In other words, Ct0,t1 is “Singly V dual-Centrally Connected” (SVC˜C) ifC˜t0,t1 ∈ SVCC.Definition 4.4.6 (SVC˜C) A communication sequence Ct0,t1 is “singly V dual-centrally con-nected” (SVC˜C) if C˜t0,t1 ∈ SVCC. Let “Ct0,t1 ∈ SVC˜C” denote that a sequence Ct0,t1 is SVC˜C.4.4.3 ACP2: Convergence ResultsGiven the 5 UCP conditions defined in Sec.4.4.2 we can now present the necessary and sufficientcommunication conditions for the convergence of the 4 consensus protocols defined in Sec.4.4.1.These are stated in the following 5 theorems.The theorem below states the necessary communication conditions for convergence of anyconsensus protocol (4.1) − (4.2). The theorem thus applies to each of the IA, DA, OH, andDDA protocols.Theorem 4.4.7 (Consensus Protocol (P) Necessary Communication Conditions)A consensus protocol (cf. (4.1) − (4.2)) cannot solve ACP2 for any communication sequenceC0,t /∈ SVSC (cf. Def.4.4.2). The next theorem states the sufficient UCP conditions for convergence of the IA protocol.Note that for the IA protocol the SVSC condition is both necessary and sufficient.Theorem 4.4.8 (IA Sufficient UCP Conditions) The IA protocol (4.6) − (4.10) solvesACP2 at time (t1 + 1) for any UCP C0,t1 ∈ SVSC (cf. Def.4.4.2). Although the above two theorems (Thm.4.4.7, 4.4.8) can be proven trivially (see [87]), theywill be referred to later and are presented for completeness of our results. The following theoremis non-trivial and states the sufficient UCP conditions for convergence of the DA protocol.Theorem 4.4.9 (DA Sufficient UCP Conditions) The DA protocol (4.11)− (4.15) solvesACP2 as t→∞ for any UCP C0,∞ ∈ IVSC (cf. Def.4.4.3). The next theorem is also non-trivial and states the sufficient UCP conditions for convergenceof the DDA protocol.Theorem 4.4.10 (DDA Sufficient UCP Conditions) The DDA protocol (4.16) − (4.21)solves ACP2 as t→∞ for any UCP C0,t ∈ IVCC (cf. Def.4.4.5). 554.4. ACP2: Algorithmic SolutionsThe final theorem can be proven trivially (see [87]) and states the necessary and sufficientUCP conditions for convergence of the OH protocol.Theorem 4.4.11 (OH Necessary and Sufficient UCP Conditions) The OH protocol(4.22)− (4.26) solves ACP2 at time (t1 + 1) for any UCP C0,t1 ∈ SVC˜C (cf. Def.4.4.6). Note that Thm.4.4.11 together with Thm.4.4.7 imply that any SVSC communication se-quence is also SVC˜C.Summary: In this section we have stated necessary and sufficient UCP conditions for theconvergence of the IA, DA, OH, and DDA protocols. The UCP conditions were defined inSec.4.4.2 and termed “Singly V Strongly Connected” (SVSC), “Infinitely V Strongly Connected”(IVSC), “Singly V Centrally Connected” (SVCC), “Infinitely V Centrally Connected” (IVCC),and “Singly V dual-Centrally Connected” (SVC˜C). In the subsequent section we relate ourresults to those in the literature, expand on the conclusions made in this section, and alsodescribe an analogy in terms of the sufficient UCP conditions for convergence of the IA andDA, and OH and DDA protocols.4.4.4 ACP2: Discussion of ResultsThe IVSC condition (4.4.3) assumes a recurring connectivity among nodes. It does not requireany node to have a priori knowledge of the communication sequence, nor does it require anyspecific structure in the (directed) signal pattern and time-delays. The DA protocol solvesACP2 for any UCP satisfying the IVSC condition. The drawback to the DA protocol is that itrequires O(n + d) communication and storage costs, whereas most past algorithms have onlyΘ(d) costs (see e.g. [1, 32] for exceptions to this).There has been much research considering algorithms (with only Θ(d) costs) that can achieveconsensus for arbitrary IVSC communication sequences (e.g. [43,78,79]). Typically, these “con-sensus algorithms” require only that there is a recurring “root” node (see [79] for definitions),whereas Theorem 4.4.9 requires that all nodes are a recurring “root” node. However, a majordrawback of the cited algorithms is that they only guarantee a final consensus on some unknownlinear combination of the initial consensus vectors, and such a consensus is not useful in manypractical applications [16,36,62,80].Theorem 4.4.9 implies the DA algorithm will obtain average-consensus asymptotically un-der more general communication conditions than previous works such as [16, 18, 62, 74, 76, 77].However, we clarify (i) there exist communication sequences for which the DA algorithm ob-tains average-consensus in finite time (see [27]), and (ii) Theorem 4.4.9 does not imply that theDA protocol always achieves average-consensus before all other known consensus protocols. InSec.4.5, we apply the DA protocol to an example considered in [1]. Although the consensus564.4. ACP2: Algorithmic Solutionsprotocol of [1] obtains an exact average-consensus after a finite number of received signals, itis shown that for this example the DA protocol obtains a satisfactory approximation to theaverage-consensus at even an earlier time.Analogy Between the IA and DA, and OH and DDA ProtocolsThe necessary and sufficient condition for average-consensus under the OH protocol requiresthat each node j ∈ V will either (a) receive a signal from all other nodes, or (b) have acommunication path from some node ij ∈ V−j that has already received a signal from all othernodes q ∈ V−ij . Consider the latter possibility where ij = iˆ for all j ∈ V−iˆ, that is the SVC˜CUCP condition holds. The SVC˜C UCP condition is the weakest condition for average-consensusunder the OH protocol in the sense that only one node must receive a signal from all othernodes (one is the minimum number of such nodes for convergence under the OH algorithm).Utilizing SVC˜C (cf. Def.4.4.6), we now describe how the (IA, DA) protocols are analogousto (OH, DDA) with respect to (i) the sufficient communication conditions, and (ii) the datarequired to be communicated.In terms of communicated data, note that (i) the IA protocol requires communication of allcurrently known initial consensus vectors, whereas the OH protocol requires communication ofonly the initially known consensus vector, and (ii) the DDA protocol is a discretized version ofthe DA protocol (as discussed after (4.21)). Next observe that Theorems 4.4.7 − 4.4.8 implythe weakest sufficient condition for average-consensus under the IA protocol is the SVSC UCPcondition (cf. Def.4.4.2). The dual of any SVSC sequence implies that every node j ∈ Vreceives a signal from every other node i ∈ V−j . It follows that if the dual of the weakestsufficient condition under the IA protocol occurs on an infinite number of contiguous time-intervals, then both the DA and DDA protocols obtain average-consensus. Likewise, if the dualof the SVC˜C UCP condition occurs on an infinite number of contiguous time-intervals, boththe DA and DDA protocols obtain average-consensus, since the dual of SVC˜C is precisely theSVCC condition.In summary, if the dual of the minimum sufficient condition for convergence of the IAprotocol occurs on an infinite number of disjoint time intervals then the OH, DA and DDAprotocols converge. Likewise if the dual of the minimum sufficient condition for convergence ofthe OH protocol occurs on an infinite number of disjoint time intervals then the IA, DA andDDA protocols converge. We also find that the OH is minimized version of IA in terms of datacommunicated, and likewise is the DDA protocol a minimized version of the DA protocol.574.4. ACP2: Algorithmic SolutionsPractical Examples for the DA and DDA AlgorithmsThe DA and DDA protocols require communication and storage of a vector with dimension oforder O(n + d), whereas the IA requires O(nd) costs. Most average-consensus algorithms inthe literature require only Θ(d) costs (e.g. [52, 71, 75, 79, 85, 86] and see [1, 32] for algorithmsthat potentially require larger costs). Our main results in this chapter imply the DA andDDA protocols can obtain average-consensus under general asynchronous and time-varyingasymmetric time-delayed communication, which is a unique (modulo “flooding”) and desirablefeature within the average-consensus literature. The issue of greater communication and storagecosts is effectively nullified if d ≥ n, in which case the resource costs of the DA and DDA belongto the same order of magnitude as that of most consensus algorithms and thereby motivate useof DA and DDA.There are several broad examples in the literature where an average-consensus on d ≥ nelements may be needed. As our first example, suppose each node observes a√n dimensionalprocess in noise, then a distributed Kalman filter requires an average-consensus on d = n+√ndistinct scalars [8]. Similarly, if each node observes a√n dimensional parameter in noise, theneither a distributed maximum-likelihood estimate or best linear unbiased estimate requiresaverage-consensus on d = n +√n distinct scalars [16, 36]. Lastly, consider a sensor networkwhere each node makes an observation on some q dimensional parameter z ∈ Rq×1, where foreach node i the observation z is an independent realization of a random variable that obeys oneof p different known probability distributions. A distributed hypothesis test (DHT) on the mostlikely hypothesis (the maximum a posterior estimate) then requires an average-consensus onpq scalar quantities [88]. It follows that whenever p ≥ n/q a DHT requires average-consensuson d ≥ n scalar values. For example, suppose n nodes are dispersed in a region of space andseek to locate a fixed energy source (q = 1). A DHT with p = n (each node’s coverage area)could then obtain the maximum likely source location by obtaining average-consensus on d = nscalars. The above examples do not benefit the IA algorithm, since O(nd) O(d) for large nand d ≥ n.Quantization of the Normal Consensus EstimateFor simplicity we have assumed that the communication and storage of a vector with dimensionq implies resource costs of order O(q). The rationale for this is due to signal quantization. Ifeach communicated scalar is quantized to the nearest p digits, then for any fixed p the numberof digits in a signal with dimension q is O(q).Signal quantization issues cannot be ignored in practice, and there has been growing interestin the effect of quantization has on the ability to obtain an accurate average-consensus [71,72].584.4. ACP2: Algorithmic SolutionsFor the IA, OH, and DDA protocols each normal consensus estimate vˆi(t) belongs to the discreteset {0, 1n}n. For this reason the latter three protocols allow each element in vˆi(t) to be expressedby the binary values {0, 1}, given that the integer value of n is known. The DA protocol doesnot possess this feature. The update coefficients for the consensus estimate sˆi(t) are identical tothose of the normal consensus vector vˆi(t), and thus an extensive quantization of the latter willimply a larger margin of error with respect to the optimal update coefficients for the former.It follows that in practice, signal quantization will in general affect the IA, OH, and DDAprotocols far less than the DA protocol.4.4.5 ACP2: Example SVC˜C SequenceIn this section we provide an example communication sequence under which the DDA protocoldoes not obtain average-consensus. The communication sequence that will be defined containsan infinite number of disjoint SVC˜C sub-sequences (cf. Def.4.4.6). The purpose of this exampleis to illustrate that the SVC˜C condition implies a fusion center (i.e. there exists a node towhich all nodes send a signal directly) whereas the sufficient condition for convergence of theDDA protocol requires there exist a node from which all nodes receive a signal. The latter canbe accomplished via a single broad-casting node, whereas the former will in general require allnodes (except one) to broad-cast their initial data (see Fig.4.1).SVCC SVCC~Figure 4.1: Illustrations of the SVCC and SVC˜C communication conditions for n = 5 nodes.Consider the communication sequence for n = 4 nodes (we choose n = 4 for simplicity; thegeneralization to an arbitrary number of nodes can be obtained in an analogous manner by thefollowing example).594.5. ACP2: Numerical ExamplesC0,1 = {S14(0, 0), S24(0, 0), S34(0, 0), S41(0, 0),S12(0, 1), S23(0, 1), S42(0, 1), S31(0, 1)}Ct,t = {S14(t, t), S21(t, t), S32(t, t), S41(t, t), S42(t, t), S43(t, t)} ∀ t ≥ 2(4.29)Let Vˆ (t) = [vˆ1(t), vˆ2(t), vˆ3(t), vˆ4(t)] ∈ R4×4. At time t = 2 the communication sequence(4.29) implies the following normal consensus node state values under both the OH and DDAprotocol.Vˆ (2) =1n 01n1n1n1n 01n0 1n1n 01n1n1n1n(4.30)The communication sequence (4.29) implies that under the DDA protocol the normal consensusstate values in (4.30) do not change for all t ≥ 2, and thus the DDA protocol does not converge.On the other hand, the OH protocol will obtain average-consensus at time t = 4 under (4.29).Together with the fact the SVC˜C condition is necessary for convergence under the OH protocol,the example (4.29) illustrates why we need to define both the SVC˜C and SVCC communicationsequences to characterize the convergence conditions of the OH and DDA protocols.4.5 ACP2: Numerical ExamplesThis section presents numerical examples of the 4 protocols defined in Sec.4.4.1 (IA, DA, DDA,and OH). The first example is taken from Example 3 in [1], where 6 nodes possess synchronousbi-directional communication with no time-delays. Within this communication model, theprotocol in [1] can obtain average-consensus in finite time under various assumptions: (i) givena priori knowledge of the network topology [1] obtains average-consensus after 3 iterations(which is the same time as IA, and thus from Theorems 4.4.7 and 4.4.8 this is the earliesttime any consensus protocol can obtain average-consensus), (ii) distributed computation of thenetwork topology using n-dimensional vectors, which implies [1] obtains average-consensus after8 iterations (5 to obtain the network topology at each node, and 3 to obtain the consensus) 3,and (iii) distributed computation of the network topology using 1-dimensional vectors, which3In [1] this is the worst case scenario . If additional storage costs are permitted, then the algorithm in [1] canachieve consensus in as few as 5 iterations. However, the DA algorithm will obtain consensus after 4 iterationsif multiple signals can be processed simultaneously, which we did not assume for simplicity of the convergenceproof.604.5. ACP2: Numerical Examplesimplies [1] obtains average-consensus after 33 iterations (30 to obtain the network topology ateach node, and 3 to obtain the consensus). The DA and DDA protocols are comparable to(ii) above, since DA and DDA also require communication of an n-dimensional vector. Fig.4.2illustrates the convergence of (IA, DA, DDA, OH) along with the “Max-Degree algorithm” [16],and an algorithm “Random” that assigns random weights (that sum to one) to each update.Vertical lines indicate the 3,8, and 33 time points corresponding to the assumptions (i)-(iii)mentioned above. Note that “Normal Consensus Error” at time t is defined aslimt→t1n∑i=1||vˆi(t)−1n1n||2 = 0 .0 5 10 15 20 25 30 3500.10.20.30.40.50.60.70.80.91Discrete−Time IterationsNormal Consensus ErrorAverage−Consensus with Bi−Direct. Comm. and No Delays IADADDAOHMax−DegreeRandomFigure 4.2: Convergence Under Example 3 in [1]. Random Weights and OH do not achieveaverage-consensus, but all other protocols do. The DDA protocol obtains consensus after only4 iterations, and at time t = 8 the DA normal consensus error has been reduced by 94%, thusimplying a close approximation to the true average-consensus.614.5. ACP2: Numerical ExamplesNext, we illustrate the convergence of the (IA, DA, DDA, OH) protocols as compared to(i) a “Gossip” algorithm that averages each received estimate with the current local estimate,and (ii) a “Random” algorithm wherein each node assigns a random weight w to each receivedestimate, and adds this to (1 − w) of its current local estimate. Consider n = 100 nodes thatcommunicate asynchronously with time-varying delays. Specifically, a digraph on 100 nodes wasconstructed by adding randomly chosen edges until the graph was strongly connected. At eachiteration a random (directed) link is selected and used to communicate a signal transmitted ata random time point in the past (by random we mean uniformly random).0 2000 4000 6000 8000 10000 12000 14000 1600000.10.20.30.40.50.60.70.80.91Discrete−Time IterationsNormal Consensus ErrorAverage−Consensus with Asynch. and Delayed Comm. OHGossipRandomFigure 4.3: Convergence under Asynchronous Communication with Time-Varying Asymmetri-cal Time-Delays. Gossip, Random, and OH do not achieve consensus. The latter implies thatno node has received a signal from all other nodes.In Fig.4.3, it is apparent that node estimates under the “Random” and “Gossip” protocolshover around the same error and hence do not obtain average-consensus. Also in Fig.4.3 itis clear that the OH protocol does not converge, which by Thm.4.4.11 implies that no node624.5. ACP2: Numerical Examples0 500 1000 1500 2000 2500 3000 3500 4000 4500 500000.10.20.30.40.50.60.70.80.91Discrete−Time IterationsNormal Consensus ErrorAverage−Consensus with Asynch. and Delayed Comm. IADADDAFigure 4.4: Convergence under Asynchronous Communication with Time-Varying Asymmetri-cal Time-Delays. The (IA,DA,DDA) protocols all obtain consensus.634.5. ACP2: Numerical Examples100 200 300 400 500 600 700 800 90000.511.522.5Discrete−Time IterationsNormal Consensus ErrorAverage−Consensus with Asynch. and Delayed Comm. (n unknown) IADADDAFigure 4.5: Convergence under Asynchronous Communication with Time-Varying Asymmet-rical Time-Delays and unknown network size. The initial (transient) normal consensus erroris larger when n is unknown, however the steady-state of the (IA,DA,DDA) protocols are notaffected (the vertical line indicates the IA time of consensus).644.6. ACP2: Conclusions and Future Workhas received a signal from all other nodes. In Fig.4.4 we plot the (IA,DA,DDA) protocolsunder the same communication sequence as that of Fig.4.3. In contrast to (Random, Gossip,OH), the (IA,DA,DDA) protocols all obtain average-consensus. Lastly, in Fig.4.5 we simulatethe same communication sequence for the (IA,DA,DDA) protocols but now assume that n isunknown initially at each node. By plotting the same normal consensus error as was shownin Fig.4.4, it is evident that if n is unknown initially at each node, the at the time that IA(with n known) obtains average-consensus, the normal consensus estimates of the (IA,DA,DDA)protocols converge exactly to the respective estimates of each protocol when n is known (see [27]for a proof of this result). From Thm.4.4.7 it then follows that the (IA,DA,DDA) protocolswill obtain average-consensus at the same time as they would when assuming n is known. TheOH protocol, on the other hand, does not transmit the normal consensus estimates, hence thenormal consensus error under the OH protocol is largely affected by a priori knowledge of n(the OH error is too large to plot in Fig.4.5, with a final normal consensus error reaching 850%of what it is when n is known). We note that if n is unknown, then to obtain average-consensusthe OH algorithm requires a communication sequence that conforms only to a fully connectedcommunication graph [86].4.6 ACP2: Conclusions and Future WorkThis chapter has described and analyzed 4 consensus protocols designed to solve the average-consensus problem under general unidirectional connectivity conditions with asymmetrical andtime-varying time-delays. We derived necessary and sufficient conditions for the convergence toaverage-consensus under each respective protocol. The conditions for convergence were basedon 5 types of UCP connectivity conditions, namely the (SVSC), (IVSC), (SVCC), (IVSC), and(SVC˜C) communication sequences. All 5 connectivity conditions allow arbitrary (finite) delaysin the transmission time of each directed signal; the communication digraphs need not repeatany fixed pattern; and furthermore we did not assume that the sending node knows the identities(or cardinality) of the receiving nodes. A drawback of the non-trivial protocols (DA) and (DDA)is that they require communication and storage of a vector with dimension of order O(n+ d),whereas most consensus protocols based only on weighted linear updates require Θ(d) costsin the current setting. Numerical examples of these protocols were presented and comparedwith previous algorithms in the consensus literature, and practical examples of the non-trivialprotocols were explained. A unique feature of each protocol is the requirement to store theinitial consensus vector si(0) in each node’s respective knowledge set Ki(t). An interestingavenue of future work includes obtaining UCP conditions for which the same protocols achieveaverage-consensus when the initial consensus vector is not stored in each node’s respective654.6. ACP2: Conclusions and Future Workknowledge set. Further work also still need be done regarding the protocols application todynamic consensus formation, when the initial consensus vectors change with time.As a final note, all 4 protocols (IA, DA, OH, DDA) can obtain a consensus on any linearcombination of the initial consensus vectors {si(0) ∈ Rd : i ∈ V} under the exact samecommunication conditions as stated in Thm.4.4.9−4.4.11. In other words, if we assume a vectorw ∈ Rn×1 is initially known at each node, then if the (IA,DA,DDA,OH) protocols update thenormal consensus estimate based on (4.4) with 1n1n replaced by the vector w, then the sameconditions stated in Sec.4.4.3 will imply limt→t1 ||sˆi(t) −∑i∈V si(0)wi|| = 0 for all i ∈ V. Theproof of these results follow by identical arguments to those presented in [27] and [89], simplyby replacing 1n1n by the vector w.66Chapter 5Conclusion5.1 Summary of ContributionsThis thesis has addressed two diametrically opposed aspects of the average-consensus problem.The first of these concerns average-consensus in a two-time-scale Markov system when inter-node communication is synchronous, instantaneous, and adheres to a fixed network graph.In this setting we showed that by using a local stochastic approximation algorithm togetherwith the appropriate update weights and network communication graph, each node consensusestimate will weakly-converge to the average stationary distribution of all observed Markovchains. We also characterized the convergence rate of this algorithm in terms of a suitablyscaled switched diffusion equation; showed how an empirical measure allows estimation of thecumulative distribution function of each stationary distribution; and also derived a set-valuedswitching ODE in the case when each node observes a set of Markov chains rather than asingle chain. In addition to all of this, if the regime switching is slow enough we have showneach node consensus estimate is capable of tracking the slowly-switching average of the inputdistributions. This two-time-scale model constitutes a set of slowly-switching input parametersobserved in Markovian noise.The second aspect concerning the average-consensus problem that was addressed in thisthesis was that of reaching an average-consensus on the initial local input variables whenthe inter-node communication is asynchronous, time-delayed, and adheres to no fixed net-work graph. In this setting we showed that by employing normal consensus estimates of orderO(n), average-consensus is still possible even when the inter-node communication is completelya priori unknown. For initial consensus inputs of O(n), these results imply an order of mag-nitude reduction in the required transmitted data as compared to an ideal consensus protocolthat simply floods the initial consensus inputs. Three practical examples of such instancesin the consensus literature were discussed, thus motivating the practical implementation ofour consensus protocols in this setting. These three examples in particular were hypothesistesting in Bayesian networks, Kalman filtering, and maximum likelihood estimation. Besidesflooding, the only other algorithm known to achieve average-consensus under such general com-munication conditions is a randomized algorithm that generates exponential random variables675.2. Directions of Future Workof order O(rd), where larger values of r imply a more accurate approximation of the O(d)average-consensus value. Further work is required to investigate how our O(n+ d) distributedaveraging algorithm compares in terms of communication complexity and accuracy with therandomized algorithm.In the appendix we proved that given an upper bound on the time until a communicationsequence is non-time-respecting strongly-connected, there exists an O(n!) upper bound on thecompletion time of the ideal protocol. We further conjectured in such a case that there isan O(n) upper bound. These results allow for the consensus protocols described in Chapter4 to weakly-converge to the average-consensus under the input dynamics of Chapter 3, thuscombining the most difficult features of both average-consensus problems. In particular, whenany node’s input changes in the two-time-scale Markov system, a general method to achieveaverage-consensus for a priori unknown communication sequences would then consist of thatnode sending out a signal with a time counter (initialized at (n − 1)) along with either (i)a message to reset each node’s consensus estimates or (ii) containing the new input valueat that node. The upper bound on the completion time of the ideal protocol also allows for acomplete stopping of any distributed averaging algorithm once a sufficiently close approximationto the true average-consensus value has been reached. We note, however, that the only knownconsensus protocols that allow for such an approximation to be computed are the IA,DA,OHand DDA protocols, and that is by their common use of the normal consensus estimates.5.2 Directions of Future WorkThere exists a variety of directions for future work based on the results discussed in this thesis.We list these now,1. worst case convergence rate of the DA (resp. DDA) consensus protocol for uniformlystrongly (resp. centrally) connected communication sequences.2. extension of the IA, DA, OH, and DDA consensus protocols to noisy and/or changinglocal input variables (i.e. the two-time-scale Markov system).3. application of the DA consensus protocol to hypothesis testing in Bayesian networks.4. extension of the DA and DDA consensus protocols to the distributed convex optimizationproblem.We next will discuss each of these directions individually.Note that in Chapter 4 we provided results on the convergence of the DA and DDA con-sensus protocols. Furthermore, in [27] we provide worst case error bounds on the consensus685.2. Directions of Future Workestimates sˆi(t) given a priori knowledge of upper and lower bounds on the element-wise entriesof the local consensus inputs si(0). However, we do not know of any worst case or asymptoticconvergence rate for either protocol under general strongly-connected or centrally-connectedcommunication sequences. Theoretically such bounds could be computed by assuming eitherthat (i) the communication sequence is uniformly strongly (resp. centrally) connected or (ii)deriving a bound based on counting the number of previous disjoint time intervals over whichthe communication sequence was singly-strongly (resp. centrally) connected.In Chapter 4 it was assumed that the network sought after an average-consensus on theinitial local input variables (i.e. a static average-consensus). In contrast, Chapter 3 assumedthe local input variables at each node were random variables (Markov chains) with stochasticallyswitching stationary distributions. The latter input dynamics constitute a set of changing inputs(because of stationary distributions are stochastically switching) that are also observed in noise(because the stationary distributions do not in general describe a deterministic value). It ispossible that by modifying the IA, DA, OH, and/or DDA protocols the static average-consensusproblem addressed in Chapter 4 can be extended to the dynamic average-consensus addressedin Chapter 3. This modification would then incorporate the most difficult features of bothaverage-consensus problems, namely a consensus on the current average stationary distributionusing an a priori unknown, asynchronous, and time-delayed communication sequence.It was shown in [88] how linear distributed averaging can be used to conduct hypothesistests in a Bayesian network when node communication is synchronous, fixed, and without time-delays. The DA consensus protocol generalizes these results to allow hypothesis testing for apriori unknown communication sequences (asynchronous, with time-delays, and no fixed net-work topology). Furthermore, unlike the work in [88], by using the normal consensus estimatesthe DA protocol permits an absolute computation of the most likely hypothesis, that is to sayeach node will know when it has computed the most likely hypothesis. Finally, as mentionedabove, the distributed averaging can also come to a complete stop once the most likely hypoth-esis has been computed by any given node, simply if that node sends out a flag signal indicatingthe most likely hypothesis and a time counter initialized at (n− 1).Lastly, a more complicated distributed computation that uses the general problem of average-consensus as a primary is that of distributive convex optimization [35]. Specifically, if each nodei ∈ V has an associated convex function fi(z1, . . . , zd) and the network desires to obtain at eachnode the vector z that minimize∑i∈V fi(z1, . . . , zd), then a sub-gradient push-sum protocolcan be shown to achieve this result for time-varying directed communication graphs if eachnode knows its out-degree [35]. The DA and DDA protocols provide a possible extension of thiswork insofar as that these consensus protocols might be generalized to solve the aforementioneddistributed convex optimization problem for a priori unknown, asynchronous, and time-delayed695.2. Directions of Future Workcommunication sequences.All of these avenues constitute possible directions of future work based on the results de-scribed in this thesis.70REFERENCES[1] S. Sundaram and C. Hadjicostis, “Distributed Consensus and Linear Functional Calcula-tion: An Observability Perspective,” in Proc. of IPSN 2007, Cambridge, MA, 2007.[2] Y. Bar-Shalom and T. E. Fortmann, Tracking and data association. Academic Press,Boston, MA, 1988.[3] T. K. J. Cortes, S. Martinez and F. Bullo, “Coverage control for mobile sensing networks,”IEEE Trans. on Robotics and Automation, vol. 20, no. 2, p. 243255, April 2004.[4] J. H. D. Estrin, R. Govindan and S. Kumar, “Next century challenges: scalable coordi-nation in sensor networks,” in Proc. ACM/IEEE Intern. Conf. on Mobile Computing andNetworking, 1999, pp. 263–270.[5] M. P. S. Meguerdichian, F. Koushanfar and M. B. Srivastava, “Coverage problems inwireless ad-hoc sensor networks,” in Proc. of IEEE Infocom, April 2001, p. 13801387.[6] J. K. W.R. Heinzelman and H. Balakrishnan, “Adaptive protocols for information dissem-ination in wireless sensor networks,” in Proc. of Mobile Computing and Networking, 1999,p. 174185.[7] D. S. R. Murray, R. Olfati-Saber, “Distributed Sensor Fusion using Dynamic Consensus,”in 16th IFAC World Congress, Prague, Czech., 2005.[8] R. Olfati-Saber, “Distributed Kalman Filter with Embedded Consensus Filters,” in Proc.Joint 44th IEEE Conf. Decision and Control, Spain, 2005.[9] H. D.-W. B.S.Y. Rao and J. Sheen, “A fully decentralized multi-sensor system for trackingand surveillance,” Int. Journal of Robotics Research, vol. 12, no. 1, p. 2044, 1993.[10] R. O.-S. D.P. Spanos and R. Murray, “Approximate distributed Kalman filtering in sensornetworks with quantifiable performance,” in Fourth International Symposium on Informa-tion Processing in Sensor Networks, April 2005, p. 133139.71REFERENCES[11] S. B. L. Xiao and S. Lall, “A scheme for asynchronous distributed sensor fusion basedon average consensus,” in Fourth International Symposium on Information Processing inSensor Networks, April 2005.[12] H. H. M. Chu and F. Zhao, “Scalable information-deriven sensor querying and routingfor ad hoc heterogeneous sensor networks,” International Journal of High-PerformanceComputing Applications, vol. 16, no. 3, p. 293313, 2002.[13] D. Reid, “An algorithm for tracking multiple targets,” IEEE Trans. on Automatic Control,vol. 24, no. 6, p. 843854, 1979.[14] J. L. Speyer, “Computation and transmission requirements for a decentralized linearquadratic-gaussian control problem,” IEEE Trans. on Automatic Control, vol. 24, no. 2,p. 266269, 1979.[15] B. Zhou and N. K. Bose, “Multi-target tracking in clutter: fast algorithms for data as-sociations,” IEEE Trans. on Aerospace and Electronic Systems, vol. 29, no. 2, p. 352363,1993.[16] L. X. S. Boyd and S. Lall, “A Scheme for Robust Distributed Sensor Fusion based on Av-erage Consensus,” in Proc. of the 4th IEEE Symposium on Info. Proc. in Sensor Networks,Los Angeles, CA, 2005.[17] S. Boyd and L. Xiao, “Fast Linear Iterations for Distributed Averaging,” Systems andControl Letters, vol. 53, pp. 65–78, 2004.[18] R. Murray and R. Olfati-Saber, “Consensus Problems in Networks of Agents with Switch-ing Topology and Time-Delays,” IEEE Trans. Automat. Control, vol. 49, no. 9, pp. 1520–1532, 2004.[19] M. Porfiri and D. Stilwell, “Consensus Seeking Over Weighted Directed Graphs,” IEEETrans. Automat. Control, vol. 52, no. 9, pp. 1767–1773, 2007.[20] K. T. V. Krishnamurthy and G. Yin, “Consensus Formation in a Two-Time-Scale Marko-vian System,” SIAM MMS, vol. 7, no. 4, 2009.[21] A. S. A. Dimakis and M. Wainwright, “Geographic Gossip: Efficient Aggregation for SensorNetworks,” in Proc. ACM/IEEE Symp. Info. Processing in Sensor Networks, 2006.[22] J. Cortes, “Distributed Algorithms for Reaching Consensus on General Functions,” Auto-matica (Journal of IFAC), vol. 44, no. 3, pp. 726–737, March 2008.72REFERENCES[23] J. L. A. Jadbabaie and A. Morse, “Coordination of Groups of Mobile Autonomous Agentsusing Nearest Neighbor Rules,” IEEE Trans. Automat. Control, vol. 48, pp. 988–1001,2003.[24] Y. Hatano and M. Mesbahi, “Agreement Over Random Networks,” IEEE Trans. Automat.Control, vol. 50, no. 11, pp. 1867–1872, 2005.[25] L. Moreau, “Stability of Multiagent Systems with Time-dependent Communication Links,”IEEE Trans. Automat. Control, vol. 50, no. 2, pp. 169–182, 2005.[26] B. H. V. Gupta and R. Murray, “Stability analysis of stochastically varying formations ofdynamic agents,” Proc. of 42nd IEEE Conference on Decision and Control, Maui, Hawaii,pp. 504–509, 2003.[27] K. Topley and V. Krishnamurthy, “Average-Consensus Algorithms in a DeterministicFramework: Part 1 Strong Connectivity,” IEEE Trans. on Signal Processing, vol. 12,no. 60, 2012.[28] D. P. B. J. N. Tsitsiklis and M. Athans, “Distributed Asynchronous Deterministic andStochastic Gradient Optimization Algorithms,” IEEE Transactions on Automatic Control,vol. 31, no. 9, pp. 803–812, 1986.[29] D. Bertsekas and J. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods.Prentice Hall, 1989.[30] e. a. D. Kingston, “Decentralized Perimeter Surveillance Using a Team of UAVs,” IEEETrans. on Robot., vol. 24, no. 6, pp. 1394–1404, 2008.[31] A. D. D. Kempe and J. Gehrki, “Gossip-Based Computation of Aggregate Information,” inProc. of the 44th Annual IEEE Symposium on Foundations of Computer Science, October2003, pp. 482–491.[32] D. Mosk-Aoyama and D. Shah, “Fast Gossip Algorithm for Computing Separable Func-tions,” IEEE Trans. on Info. Theory, vol. 54, no. 7, 2008.[33] G. Ermentrout, “Stable periodic solutions to discrete and continuum arrays of weaklycoupled oscillators,” SIAM J. Appl. Math., vol. 52, no. 6, pp. 1665–1687, December 1992.[34] A. Nedic´, “Asynchronous Broadcast-Based Convex Optimization over a Network,” IEEETransactions on Automatic Control, vol. 56, no. 6, pp. 1337–1351, 2011.73REFERENCES[35] A. Nedic´ and A. Olshevsky, “Distributed Optimization over Time-Varying DirectedGraphs,” IEEE Transactions on Automatic Control, accepted, 2014.[36] S. B. G. Scutari and L. Pescosolido, “Distributed Decision Through Self-SynchronizingSensor Networks in the Presence of Propagation Delays and Asymmetric Channels,” IEEETrans. on Signal Processing, vol. 56, no. 4, 2008.[37] S. L. K.I. Tsianos and M. Rabbat, “Consensus-Based Distributed Optimization: PracticalIssues and Applications in Large-Scale Machine Learning,” in Proc. of the 50th AllertonConference on Communication, Control, and Computing, 2012.[38] S. Grime and H. Durrant-Whyte, “Data Fusion in Decentralized Sensor Networks,” ControlEngineering Practice, vol. 2, no. 5, pp. 849–863, 1994.[39] A. R. I.D. Schizas and G. Giannakis, “Consensus in ad hoc WSNs with noisy links, Part I:Distributed Estimation of Deterministic Signals,” IEEE Trans. Signal Processing, vol. 56,no. 1, pp. 350–364, 2008.[40] M. Franceschelli and A. Gasparri, “Decentralized Centroid Estimation for Multi-AgentSystems in Absence of any Common Reference Frame,” in American Control Conference,St. Louis, Missouri, June 2009.[41] M. S. S. Stankovic and D. Stipanovic, “Decentralized Parameter Estimation by ConsensusBased Stochastic Approximation,” in Proc. of the 46th IEEE Conference on Decision andControl, New Orleans, Dec 2007.[42] S. Chatterjee and E. Seneta, “Towards consensus: Some convergence theorems on repeatedaveraging,” J. Appl. Probab., vol. 14, no. 1, pp. 89–97, March 1977.[43] A. O. V. Blondel, J. Hendrickx and J. Tsitsiklis, “Convergence in Multi-agent Coordina-tion, Consensus, and Flocking,” in Proc. of IEEE Conference on Decision and Control,Seville, Spain, 2005, pp. 2996–3000.[44] B. P. S. Boyd, A. Ghosh and D. Shah, “Randomized Gossip Algorithms,” IEEE Trans. onInfo. Theory, vol. 52, no. 6, pp. 2508–2530, 2006.[45] K. Plarre and P. Kumar, “Extended Message Passing Algorithm for Inference in LoopyGaussian Graphical Models,” Ad Hoc Networks, vol. 2, pp. 153–169, 2004.[46] A. L. E. Nakamura and A. Frery, “Information Fusion for Wireless Sensor Networks: Meth-ods, Models, and Classificaitons,” ACM Computing Surveys, vol. 39, no. 3, 2007.74REFERENCES[47] M. DeGroot, “Reaching a consensus,” J. Am. Statistical Assoc., vol. 69, no. 345, pp. 118–121, 1974.[48] D. Scherber and H. Papadopoulos, “Distributed Computation of Averages over ad hocNetworks,” IEEE J. Sel. Areas Commun., vol. 23, no. 4, pp. 776–787, April 2005.[49] V. K. G. Yin and C. Ion, “Regime Switching Stochastic Approximation Algorithms withApplication to Adaptive Discrete Stochastic Optimization,” SIAM J. Optim., vol. 14, no. 4,pp. 1187–1215, 2004.[50] D. Scherber and H. Papadoupoulos, “Locally Constructed Algorithms for Distributed Com-putations in Ad-hoc Networks,” in Proc. of the 3rd International Symposioum on Infor-maiton Processing in Sensor Neworks, Berkely CA, ACM Press, New York, April 2004,pp. 11–19.[51] S.-J. K. S. Boyd and L. Xiao, “Distributed average consensus with least mean-squaredeviation,” J. Parallel Distrib. Comput., vol. 67, pp. 33–46, January 2007.[52] W. Ren and R. Beard, “Consensus Seeking in Multi-Agent Systems under DynamicallyChanging Interaction Topologies,” IEEE Trans. on Automatic Control, vol. 50, no. 5, 2005.[53] M. Huang and J. Manton, “Stochastic consensus seeking with measurement noise: con-vergence and asymptotic normality,” in Proc. American Control Conf., Seattle, WA, June2008, pp. 1337–1342.[54] R. Horn and C. Johnson, Matrix Analysis. Cambridge University Press, 1990.[55] C. Wu, “Synchronization and convergence of linear dynamics in random directed net-works,” IEEE Trans. Automat. Control, vol. 51, no. 7, pp. 1207–1210, July 2006.[56] P. Antsaklis and L. Fang, “Information Consensus of Asynchronous Discrete-time Multi-agent Systems,” in Amer. Control Conf., 2005.[57] P. Lin and Y. Jia, “Average consensus in networks of multi-agents with both switchingtopology and coupling time-delay,” Physica A, vol. 387, pp. 303–313, 2008.[58] A. J. H. Tanner and G. Pappas, “Stable Flocking of Mobile Agents, Part I: Fixed Topol-ogy,” in Proc. of the 42nd IEEE Conf. on Decision and Control, 2003.[59] J. Slotine and W. Wang, “A Study of Synchronization and Group Cooperation using PartialContraction Theory,” in Block Island Workshop on Cooperative Control, Springer-Verlag,2003.75REFERENCES[60] B. F. M. Broucke and Z. Lin, “Local Control Strategies for Groups of Mobile AutonomousAgents,” IEEE Trans. Automat. Control, vol. 49, no. 4, pp. 622–629, 2004.[61] M. A. R. Rahman and V. Saligrama, “Distributed Tracking in Multi-hop Sensor NetworksWith Communication Delays,” IEEE Trans. on Signal Processing, vol. 55, no. 9, 2007.[62] J. P. S. L. M. Mehyar, D. Spanos and R. Murray, “Asynchronous Distributed Averagingon Communication Networks,” IEEE Trans. on Networking, vol. 15, no. 3, 2007.[63] P. Alricksson and A. Rantzer, “Distributed Kalman filtering using weighted averaging,” in17th Intl. Symp. Math. Theory of Networks and Systems, Kyoto, Japan, 2006.[64] R. Pagliari and A. Scaglione, Implementation of Average Consensus Protocols for Com-mercial Sensor Networks Platforms, ser. Grid Enabled Remote Instrumentation. SpringerUS, 2009.[65] G. Cybenko, “Load balancing for distributed memory multi-processors,” Journal of Par-allel and Distributed Computing, vol. 7, pp. 279–301, 1989.[66] R. Beard and D. Kingston, “Discrete-Time Average-Consensus under Switching NetworkTopologies,” in Proc. of the 2006 American Control Conference, Minneapolis, Minnesota,June 2006.[67] E. B.-J. I. C. T. Vicsek, A. Cziro´k and O. Shochet, “Novel type of phase transition in asystem of self-driven particles,” Phys. Rev. Lett., vol. 75, 1995.[68] H. Kushner and G. Yin, Stochastic Approximation and Recursive Algorithms and Applica-tions, 2nd Edition. Springer-Verlag, New York, NY, 2003.[69] P. Billingsley, Convergence of Probability Measures. J. Wiley, New York, NY, 1968.[70] S. Kar and J. Moura, “Distributed Consensus Algorithms in Sensor Networks With Imper-fect Communication: Link Failures and Channel Noise,” IEEE Trans. on Signal Processing,vol. 57, no. 1, 2009.[71] ——, “Distributed Consensus Algorithms in Sensor Networks: Quantized Data and Ran-dom Link Failures,” IEEE Trans. on Signal Processing, vol. 58, no. 3, 2010.[72] A. O. A. Nedic´, A. Olshevsky and J. Tsitsiklis, “On Distributed Averaging Algorithms andQuantization Effects,” IEEE Trans. on Automatic Control, vol. 54, no. 11, 2009.[73] D. Shah, “Gossip Algorithms,” Foundations and Trends in Networking, vol. 3, no. 1, 2008.76REFERENCES[74] P. Bliman and G. Ferrari-Trecate, “Average consensus problems in networks of agents withdelayed communications,” Automatica, vol. 44, 2008.[75] P. T.-J. T. F. Be´ne´zit, V. Blondel and M. Vetterli, “Weighted Gossip: Distributed Aver-aging Using Non-Doubly Stochastic Matrices,” in Proc. of ISIT, 2010.[76] T. Zhang and H. Yu, “Average Consensus in Networks of Multi-agent with Multiple Time-varying Delays,” International Journal of Communications and System Sciences, vol. 3,2010.[77] P. Lin and Y. Jia, “Multi-Agent Consensus with Diverse Time-Delays and Jointly-Connected Topologies,” Automatica, vol. 47, 2011.[78] D. Angeli and P. Bliman, “Extension of a result by Moreau on stability of leaderless multi-agent systems,” in Proc. Joint 44th IEEE Conf. Decision and Control, Spain, 2005.[79] L. Fang and P. Antsaklis, “Information Consensus of Asynchronous Discrete-time Multi-agent Systems,” in Proc. of the American Control Conference, Oregon, 2005.[80] J. M.-M. R. A. Dimakis, S. Kar and A. Scaglione, “Gossip Algorithms for DistributedSignal Processing,” Proc. of the IEEE, vol. 98, no. 11, 2010.[81] W. Li and H. Dai, “Location-Aided Fast Distributed Consensus,” IEEE Trans. on Info.Theory, 2008.[82] E. Kokiopoulou and y. v. n. P. Frossard, journal=IEEE Trans. on Signal Processing,“Polynomial Filtering for Fast Convergence in Distributed Consensus.”[83] M. B.Oreshkin and M.Rabbat, “Optimization and Analysis of Distributed Averaging WithShort Node Memory,” IEEE Trans. on Signal Processing, vol. 58, no. 5, 2010.[84] C. Moallemi and B. V. Roy, “Consensus Propagation,” IEEE Trans. Info. Theory, vol. 52,no. 11, 2006.[85] A. Tahbaz-Salehi and A. Jadbabaie, “A Necessary and Sufficient Condition for ConsensusOver Random Networks,” IEEE Trans. on Automatic Control, vol. 53, no. 3, 2008.[86] D. Kingston and R. Beard, “Discrete-Time Average-Consensus under Switching NetworkTopologies,” in Proc. of the 2006 American Control Conf., 2006.[87] K. Topley and V. Krishnamurthy, “Average-Consensus Algorithms in a DeterministicFramework,” arXiv:1106.4346v1 [cs.DC].77[88] E. F. R. Olfati-Saber, E. Franco and J. Shamma, “Belief Consensus and Distributed Hy-pothesis Testing in Sensor Networks,” in Workshop on Network Embedded Sensing andControl, Notre Dame University, IN, 2005.[89] K. Topley and V. Krishnamurthy, “Average-Consensus Algorithms in a DeterministicFramework: Part 2 Central Connectivity,” IEEE Trans. on Signal Processing, vol. 12,no. 60, 2012.[90] G. Yin and Q. Zhang, Discrete-Time Markov Chains:Two-Time-Scale Methods and Appli-cations. Springer-Verlag, New York, NY, 2005.[91] H. Kushner, Approximation and Weak Convergence Methods for Random Processes, withapplications to Stochastic Systems Theory. MIT Press, Cambridge, MA, 1984.[92] G. Yin and Q. Zhang, Continuous-time Markov Chains and Applications: A SingularPerturbations Approach. Springer-Verlag, New York, NY, 1998.[93] S. Karlin and H. Taylor, A Second Course in Stochastic Processes. Academic Press, NewYork, NY, 1981.Appendix AProofs of Convergence ResultsIn this appendix we provide proof of the convergence results stated in this thesis. In Sec.A.1 wedetail the proofs of Thm.3.4.1-Thm.3.4.6 as well as the Lem.3.4.8-Lem.3.4.10 that are containedin Chapter 3. In Sec.A.2 we prove Thm.4.4.9−4.4.10 that were stated in Chapter 4. The proofsof Thm.4.4.7-4.4.8 and Thm.4.4.11 are relatively trivial and can be found in [87]. Lastly, inSec.A.3 we discuss 3 data dissemination results that can be utilized to combine the protocolsin Chapter 4 with the input dynamics assumed in Chapter 3.A.1 Proof of Results Contained in Chapter 3.This section details the proofs of Thm.3.4.1-Thm.3.4.6 and Lem.3.4.8-Lem.3.4.10.A.1.1 Proof of Theorem 3.4.1We divide the proof into three distinct steps. Some of the steps are formulated as lemmas toimprove clarity of the presentation.Step 1: In this step, we show that a moment bound holds for the iterates under consideration.The assertion is stated as a lemma.Lemma A.1.1 For each 0 < T <∞, under the conditions of Thm.3.4.1,sup0≤k≤T/µE|sk|2 <∞. (A.1)Proof of Lem.A.1.1. Define V (s) = s′s/2. The gradient and Hessian of V (s) are givenby ∇V (s) = s and ∇2V (s) = I, respectively. Under the Condition (C) we assume H has noeigenvalues with negative real parts, hence there is a λ(0) > 0 such that s′Hs ≥ λ(0)V (s).In addition, since Xk is a conditional Markov chain with a finite state space, it is boundeduniformly w.p.1, which implies thats′kWoXk ≤12(|sk|2 + |Wo|2|Xk|2)≤ K(V (sk) + 1)79A.1. Proof of Results Contained in Chapter 3.for some K > 0. Here and hereafter, K is used to represent a generic constant, whose valuesmay change for different appearances. Using (3.16) and the above estimates, we obtainV (sk+1) = V (sk) + µs′k(−Hsk +WoXk) +O(µ2)(V (sk) + 1)≤ V (sk)− λ(0)µV (sk) + µs′kWoXk +O(µ2)(V (sk) + 1)≤ (1− λ(0)µ)V (sk) +O(µ)(V (sk) + 1).(A.2)Taking expectations and iterating on the resulting sequence, we obtainEV (sk+1) ≤ (1−λ(0)µ)k+1EV (s(0))+O(µ)k∑j=0(1−λ(0)µ)k−jEV (sj)+O(µ)k∑j=0(1−λ(0)µ)k−j .An application of the Gronwall’s inequality yields thatEV (sk+1) ≤ O(1) exp(µk) ≤ O(1) exp(µ(T/µ)) ≤ O(1).Taking sup0≤k≤T/µ on both sides above yields the desired result. Note that by means of Tchebyshev’s inequality and Lem.A.1.1, for any δ > 0 sufficientlysmall, there is a Kδ = (1/√δ) sufficiently large such thatP (|sk| ≥ Kδ) ≤ E|sk|2K2δ≤ sup0≤k≤T/µE|sk|2K2δ≤ KK2δ≤ Kδ.Thus {sk} is tight. Since θk takes values in a finite set, it is also tight. Thus, we obtain thefollowing corollary.Corollary A.1.2 The pair of sequences {sk, θk} is tight. Step 2: We now show that the process (sµ(·), θµ(·)) is tight in D([0, T ] : RS ×M). Sincethe tightness of θµ(·) has been demonstrated in [90], we focus on the tightness of sµ(·). For anyδ > 0, t > 0, and 0 < u ≤ δ, we havesµ(t+ u)− sµ(t) = µ(t+u)/µ∑k=t/µ(−Hsk +WoXk).Thus by repeating the proof of Lem.A.1.1, we can showEµt |sµ(t+ u)− sµ(t)|2 ≤ Kµ2∑(t+u)/µ−1k=t/µ (Eµt |sk|2 + 1)≤ Kµ2( t+uµ −tµ)2 ≤ Ku2 ≤ Kδ,for u sufficiently small, where Eµt denotes the conditional expectation with respect to the σ-80A.1. Proof of Results Contained in Chapter 3.algebra Ft = {sµ(u), θµ(u) : u ≤ t}. Thuslimδ→0lim supµ→0EEµt |sµ(t+ u)− sµ(t)|2 = 0.The claim then follows from the well-known tightness criteria [p. 47] [91] (see also [Section7.3] [68]).Step 3: Characterization of the limit. Owing to Step 2, (sµ(·), θµ(·)) is tight. By Prohorov’stheorem, we can extract a weakly convergent subsequence. Without loss of generality, still indexthe selected sequence by µ and denote the limit by s(·), θ(·). By the Skorohod representation,we may assume (with slight abuse of notation), (sµ(·), θµ(·)) converges to (s(·), θ(·)) w.p.1 andthe convergence is uniform in any bounded t interval. For each t, u > 0, partition [t, t+ u] intosmall segments with the use of kµ →∞ as µ→ 0 but δµ := µkµ → 0 as µ→ 0. Then we havesµ(t+ u)− sµ(t) =t+u∑l:lδµ=tδµ1kµlkµ+kµ−1∑k=lkµ(−Hsk +WoXk),where∑t+ul:lδµ=t denotes the sum over l in the range t ≤ lδµ < t+ u. For the following analysis,it is crucial to recognize the two-time-scale structure of the algorithm. In the segment lkµ ≤k ≤ lkµ + kµ − 1, compared with sk and θk, Xk varies much faster. Thus, in this segment, skand θk can be viewed as fixed. As a result, although Xk depends on θ, the slowly varying θkenables us to treat Xk as a “noise” with θk fixed at a specific value θ. In the end, Xk will beaveraged out and replaced by its stationary measure. More precisely, let lδµ → u˜ as µ → 0.Then for all lkµ ≤ k ≤ lkµ + kµ − 1, µk → u˜. To emphasize the θ dependence in Xk, we writeit as Xk(θk). It then follows that1kµ∑lkµ+kµ−1k=lkµWoXk = 1kµ∑lkµ+kµ−1k=lkµWoXk(θk)= 1kµ∑lkµ+kµ−1k=lkµWoXk(θlkµ) + o(1)→Wopi(θ(u˜)) in probability as µ→ 0,where o(1) → 0 in probability as µ → 0. The last line follows from the ergodicity of the θdependent Markov chain Xk(θ). As a result∑(t+u)l:lδµ=tδµ 1kµ∑lkµ+kµ−1k=lkµWoXk →∫ t+ut Wopi(θ(u˜))du˜. (A.3)Likewise, we can work with the term involving sk by using the continuity in s of the following81A.1. Proof of Results Contained in Chapter 3.expression, which leads to(t+u)∑l:lδµ=tδµ1kµlkµ+kµ−1∑k=lkµHsk →∫ t+utHs(u˜)du˜. (A.4)Now combining (A.3) and (A.4), following the argument as in the proof of Theorem 4.5 in [49],it can be shown that (s(·), θ(·)) is a solution of the martingale problem associated with theoperator given byLf(s, θi) = ∇f ′(s, θi)[−Hs+Wopi(θi)] +Qf(s, ·)(θi), (A.5)whereQf(s, ·)(θi) =m∑j=1qijf(s, θj).The desired results thus follow. A.1.2 Proof of Theorem 3.4.2The proof can be shown in three steps as follows.(i) Defines˜µ(·) = sµ(·+ tµ), θ˜µ(·) = θµ(·+ tµ),where tµ is given in the statement of the theorem. Then (s˜µ(·), θ˜µ(·)) is tight, which canbe proven similar to that in Thm.3.4.1. For 0 < T < ∞, extract a convergent subsequence{s˜µ(·), s˜µ(· − T )} with a limit denoted by (s(·), sT (·)). Note that s(0) = sT (T ). Also note that{sk} is tight, which can be proven as in Cor.A.1.2. The tightness of {sk} then implies that{sT (0)} is tight. By using the following representation of the solution of the switched ODE andnoting T is arbitrary, we havesT (T ) = exp(−HT )sT (0) +∑mi=1∫ T0 exp(−H(T − u))Wopi(θi)I{θ(u)=i}du= exp(−HT )sT (0) +∑mi=1∫ T0 exp(−H(T − u))duWopi(θi)νi+∑mi=1∫ T0 exp(−H(T − u))Wopi(θi)[I{θ(u)=i} − νi]du.(A.6)(ii) We claim that as T →∞, the last term above goes to 0 in probability. To show this, itsuffices to work with a fixed i. Defineξ(T ) = E∣∣∣∣∫ T0exp(−H(T − u))Wopi(θi)[I{θ(u)=i} − νi]du∣∣∣∣2.82A.1. Proof of Results Contained in Chapter 3.Then it is readily seen that|ξ(T )| ≤ K∣∣∣E∫ T0 (Wopi(θi))′ exp(−H ′(T − t))dt∣∣∣×∣∣∣∫ t0 exp(−H(T − t))Wopi(θi)[I{θ(u)=i)} − νi][I{θ(t)=i} − νi]du∣∣∣(A.7)By using the Markov property, it can be shown that∣∣E[I{θ(u)=i)} − νi][I{θ(t)=i} − νi]∣∣= |[P (θ(u) = i)− νi][P (θ(t) = i|θ(u) = i)− νi]|≤ K exp(−κ0u) exp(−κ0(t− u)) ≤ K exp(−κ0t),where κ0 > 0 is a constant representing the spectrum gap in the Markov chain owing to theirreducibility of Q (see [92, p. 300]). In the above and hereafter in the proof, we use K asa generic positive constant whose value may change in different appearances. Since −H isHurwitz,∫ t0| exp(−H(T − u))|du ≤∫ t0exp(−λH(T − u))du ≤ K exp(−λH(T − t)),where λH > 0. Then we have∣∣∣E∫ t0 exp(−H(T − u))Wopi(θi)[I{θ(u)=i)} − νi][I{θ(t)=i} − νi]du∣∣∣≤ K exp(−λH(T − t)) exp(−κ0t).Then|ξ(T )| ≤∣∣∣K∫ T0 |Wopi(θi)|| exp(−H ′(T − t))| exp(−λH(T − t)) exp(−κ0t)dt∣∣∣ .If λH ≤ κ0, exp((λH − κ0)t) ≤ 1, so|ξ(T )| ≤∣∣∣K∫ T0 exp(−λH(T − t)) exp(−λHT )dt∣∣∣ ≤ K exp(−λHT )→ 0 as T →∞.If λH > κ0,|ξ(T )| ≤ K∫ T0 exp(−λH(T − t)) exp(−(λH − κ0)T ) exp(−κ0T )dt≤ K exp(−κ0T )→ 0 as T →∞.Thus the claim in (ii) is verified.(iii) The tightness of {sT (0)} together with −H being Hurwitz implies the first term onthe right-hand side of (A.6) tends 0 as T → ∞. The second term in (A.6) converges to s∗ as83A.1. Proof of Results Contained in Chapter 3.T →∞ by the hypothesis of this theorem. The desired result thus follows the above and (ii).A.1.3 Proof of Theorem 3.4.4The proof here is similar to Thm.3.4.1 with the modification of the inclusion. We omit the proofand refer the reader to [68] for related results on averaging leading to differential inclusions.Note that the main difference of the current result compared with Thm.3.4.1 is: In Thm.3.4.1,assumption (B) implies the existence of the unique invariant measure for each θ. Here thiscondition is relaxed. For Markov chains with multiple ergodic classes, we refer the readerto [93, Chapter 1] for further details.To proceed, we show that the sequence (vµ(·), θµ(·)) converges weakly to (v(·), θ(·)) suchthat the limit is a solution of the martingale problem with the operator given byLvf(v, θi) = −∇f ′(v, θi)Hv +12tr[∇2f(v, i)WoΣ(θi)Wo ′] +Qf(v, ·)(θi), θi ∈M. (A.8)The rest of the proof is similar to that of [49]. The details are omitted. A.1.4 Proof of Theorem 3.4.6The well-known Glivenko-Cantelli theorem (see [69, p. 103]) for mixing processes implies that1kµlkµ+kµ−1∑k=lkµI{Xk(θj1 )≤x} → F (θj1 , x) w.p.1 as µ→ 0 and hence kµ →∞.The Markov structure implies that(A(θj1))k−lkµ → 1pi(θj1) as µ→ 0.The limit is a matrix with identical rows containing the stationary distribution pi(θj1). SinceI{θlkµ=θj1} can be written as I{θµ(lδµ)=θj1}. As µ→ 0 and kδµ → u˜, we can show1kµ∑lkµ+kµ−1k=lkµElkµI{Xjk(θ)≤x}→∑mj1=1 pij(θj1)F (θj1 , x)I{θ(u˜)=θj1}= pij(θ(u˜))F (θ(u˜), x).Thus we obtain that the limit η(·) satisfiesη(t)− η(u) =∫ tupij(θ(u˜))F (θ(u˜), x)du˜.The desired result then follows immediately. 84A.1. Proof of Results Contained in Chapter 3.A.1.5 Proof of Lemma 3.4.8Intuitively, we motivate Lemma 3.4.8 by the following rationale: assume all sensor computationspossess the linear form (3.16). Then the average-consensus p¯i(θ(t)) can be obtained underCondition (1) if each sensor maintains, in addition to (3.16), two distinct estimates based onthe following sub-algorithms of (3.16):1. The estimate sˆk based on only the local tracking sub-algorithm of (3.16),s˜ik+1 = s˜ik + µ(Xik − s˜ik), s˜i(0) = Xi(0), i ∈ V (A.9)2. The estimate sik based only on the (static) Laplacian consensus sub-algorithm of (3.16),sk+1 = (I − µL)sk, s(0) = X(0) .As µ → 0, the local tracking estimates sˆik converge weakly to pii(θ(t)) for each i ∈ V. Incomparison, it is well-known [18] that for any graph Laplacian 0 4 L with left-right eigenvectorsω`, ωr corresponding to the zero eigenvalue and satisfying ω′`ωr = 1, ω` > 0, the estimate sikwill then approach a convex-consensus s∞ = ωrω′`s(0) = ωrω′`X(0) for all i ∈ V. Regardless,the linear combination denotedsi(t)(1) =1α(si(t)− sˆi(t)) + s˜i(t) , (A.10)has a steady-state p¯i(θ(t)) for each i ∈ V. Under Condition (2) the linear combination,si(t)(2) =1nβi(sˆi(t) + (nβi − 1)s˜i(t)) , (A.11)has a steady-state p¯i(θ(t)) for each i ∈ V, as follows from (A.13).Exact Solutions to (3.18)). We first derives(t) = e−Ht(s(0)− Λpi(θ(t))) + Λpi(θ(t)), (A.12)as an exact solution to (3.18) conditional on the state θ(t) of θ ∈ M. The equilibrium matrixΛ is defined by the following function of {Gv,Go},Λ ={(L+Do)−1Wo if 0 ≺ (L+Do),(I − ωrω′`) if 0 4 (L+Do) , and Wo = L+Do.(A.13)85A.1. Proof of Results Contained in Chapter 3.If 0 ≺ H then the above results follow immediately by standard techniques of solving systemsof first order linear ODEs. On the other hand if 0 4 H, then (A.13) is derived based on theconservation property,ddtω′`s(t) = ω′`ds(t)dt= −ω′`Hs(t) + ω′`Wopi(θ(t)) = 0, (A.14)which follows under the assumption Wo = H, since ω′`H = 0 by definition. From (A.14) it isclear Λ must satisfy ω′`Λ = 0, whereas by solvingds(t)dt = 0 given (A.12) we have HΛ = Wo.These two constraints then uniquely determine the second steady-state in (A.13).For any ntrcs graph Gv, if and only if L is balanced will ωrω′` =1n11′ [18]. On the otherhand, if 0 4 H then ωr = c1 for any non-zero c ∈ R. Since ω′`ωr = 1 it is clear that ifωrω′` 6=1n11′ then the average-consensus cannot be obtained by any linear combination of s(t),sˆ(t) or s(t). Furthermore, when 0 4 H and Wo 6= H, then (3.16), although stable in the limitas µ vanishes, is asymptotically unbounded as the number of b1/µc iterations increase. To seethis assume t ≤ 0 and 0 < µ 1 are such that (t/µ) ∈ N. The (t/µ) iteration of (3.16) yieldsthe sensor estimates,st/µ = (I − µL − µDo)t/µs(0) +t/µ−1∑l=0(I − µL − µDo)lµWoXt/µ−1−l. (A.15)For arbitrary weight matrices {Wv,Wo} the limit of (A.15) when µ approaches zero yields thesame expression for the coefficient of s(0) as the exact solution (A.12),limµ→0(I − µL − µDo)t/µ = e−(L+Do)t.Similarly in the limit as µ vanishes, by (A.3) above we replace Xt/µ−1−l with the stationarymeasure pi(θ(t)), the summed coefficients of which yield the following limiting values,limµ→0t/µ−1∑l=0(I − µL − µDo)lµWo ={H−1(I − e−Ht)Wo , 0 ≺ Hα(I − e−Ht) , 0 4 H , Wo = αH, α ∈ R.(A.16)Without loss of generality set α = 1. As t increases the above limits become those expressedin (A.13). If, however, 0 4 H and Wo 6= H, the above iteration results in an unboundedsteady-state in the limit t increases, as can be seen by re-expressing (A.16) in terms of the86A.1. Proof of Results Contained in Chapter 3.eigen-decompositions H = UJU−1 and Wo = ARA−1,t/µ−1∑l=0(I − µL − µDo)lµWo = Ut/µ−1∑l=0(I − µJ)lµU−1ARA−1. (A.17)In the limit as µ vanishes this expression becomes UJ∗U−1ARA−1, where the zero eigenvalueof J is replaced by t in the matrix J∗ and all others replaced by 1−e−λitλi, for convenience welet the first element of J denote this zero eigenvalue, J11 = 0. As t increases it is clear J∗approaches J−1 except in its first element, which grows linearly with t. For (A.17) to remainbounded then as t increases, the multiplications UJ∗U−1ARA−1 must eliminate the presenceof t in J∗11. It is straight-forward to see this occurs if and only ifWo = H, in which case A = U ,R = J , and the zero eigenvalue in R eliminates the presence of t in J∗ regardless the value ofµ. In this case the denominator of J∗ is also eliminated by right multiplication of ARA−1 andwe have the second steady-state of (A.13).Since we have considered all possible solutions to (3.18) with bounded steady-states, thereexist no other methods by which to achieve average-consensus through (3.16) and its impliedsub-algorithms. Our rationale has been to place conditions on {Gv,Go} such that the sensorsteady-states under (3.16) can be adjusted by their local tracking estimates sˆ(t) to obtain theaverage-consensus estimate p¯i(θ(t)). This requires the equilibrium matrix Λ have each of itsrows i = 1, . . . , n comprised of a non-zero scalar βi, with diagonals defined (1 − (n − 1)βi), aswe have expressed by (3.30). The diagonals are defined as such due to the property Λ1 = 1which holds only when 0 ≺ H, as can be seen by (A.13) and the fact(L+Do)−1Wo1 = 1 ⇒ Do1 = (L+Do)1 (A.18)which holds since L1 = 0 by definition. Assuming 0 ≺ H, the necessity of Λ1 = 1 is evidentsimply from the contradiction entailed by (A.18) when Λ1 6= 1.The equality among non-diagonal elements within each row i implies that the sensor estimatesi(t) approaches a linear average with uniform weights (βi) assigned to the currently observedstationary distributions of all other sensors, and a disproportionate weight (1− (n−1)βi) givento its own locally observed stationary distribution. If the weights βi are known then by thecorrection with s˜ik and re-scaling, as stated in (A.11), we obtain the steady-state p¯i(θ(t)). We make two remarks here.1. The average-consensus estimate s(t)(1) in (A.10) assumes L is balanced and that eachsensor knows the scale factor α between (L + Do) and Wo. In Condition (2), the state-valueexchange graph Laplacian L need not be balanced, but each sensor i ∈ V must know the totalnetwork size n, as well as the ith diagonal element of the matrix β in (3.30). Computation87A.2. Proof of Results Contained in Chapter 4.of any diagonal element in β will by (3.29) require near complete knowledge of the exchangegraphs {Go,Gv}, and so under Condition (2) we must presume each sensor has this knowledge.2. Under condition (2), it follows by (A.11) that s(t)2 = sˆ(t) if and only if β = 1nI. Inthis case the extra local tracking estimate s˜(t) is not required. However, when β = 1nI then(3.29) cannot hold unless Go is complete, that is Eo contains every ordered pair of (i, j) ∈ V.Equivalently this implies every sensor observes all Markov chains Xi, i ∈ V, in which caseconsensus may be formed trivially.A.1.6 Proof of Lemma 3.4.9The proof follows by showing (3.29) is infeasible when 0 ≺ H and Eo = Ev. Since 0 ≺ H we canrearrange (3.29) such thatWo = HΛ. Denote the ith row of H by the row vector wi. Since eachrow of Wv has at least one zero non-diagonal element (since we are assuming a decentralizedcommunication graph), thus there exists an element ji such that wji = 0 for each row i ∈ V.Since Eo = Ev we then have by (3.29) and (3.30) that 1′βw′i = 0 for each row i. This impliesthe rows of H are linearly dependent and thus H has an eigenvalue at 0, which contradicts ourassumption 0 ≺ H. Thus (3.29) is infeasible when Eo = Ev and 0 ≺ H. On the other hand, ifEo = Ev then we may setWo = α(L+Do) and Condition (1) holds if L is balanced. A.1.7 Proof of Lemma 3.4.10By our assumption of a decentralized network, there exists for each row i an element joi suchthat (i, joi ) /∈ Eo and an element jvi such that (i, jvi ) /∈ Ev. For any row i, if joi = jvi then wehave 1′βw′i = 0. By Lemma (3.4.9) if 1′βw′i = 0 holds for all i ∈ V then average-consensus isinfeasible when Eo 6= Ev, thus there must be at least one row i∗ such that 1′βw′i∗ 6= 0. Thisimplies (i∗, jvi∗) ∈ Eoi∗ for at least one jvi∗ such that (i∗, jvi∗) /∈ Evi∗ , which implies (i∗, jvi∗) ∈ Eoi∗ forall jvi∗ such that (i∗, jvi∗) /∈ Evi∗ . Thus Eoi∗ ⊇ E¯vi∗ . On the other hand, Condition (1) of Lemma(3.4.8) cannot be satisfied if Eo 6= Ev, thus in this case we require Condition (2). A.1.8 Proof of Corollary 3.4.11The result follows immediately from Lemma 3.4.10. A.2 Proof of Results Contained in Chapter 4.This section details the proof of Thm.4.4.9 and Thm.4.4.10. Before proceeding, we clarifythe following notations. An arbitrary signal Sij(t0, t1) is now specified as Sij(tij0 , tij1 ), and an88A.2. Proof of Results Contained in Chapter 4.arbitrary communication sequence Ct0,t1 is now denoted C[t0,t1]. We define the squared “error”of the normal consensus estimate vˆi(t) as,E2(vˆi(t)) = ||vˆi(t)−1n1n||2 . (A.19)The total reduction in normal consensus squared error resulting from the sequence C[t0,t1] isdefined as,E˜2(C[t0,t1]) = limt→t1∑Sij(tij0 ,tij1 )∈C[t0,t]E˜2(Sij(tij0 , tij1 )), (A.20)where E˜2(Sij(tij0 , tij1 ))is defined using the normal consensus error (A.19) as follows,E˜2(Sij(tij0 , tij1 ))= E2(vˆi(tij1 ))− E2(vˆi(tij1 + 1)). (A.21)Let vˆi`(t) denote the `th element of the vector vˆi(t). Lastly, it will be convenient to use thefollowing definition.Definition A.2.1 Under (A4′), the average-consensus problem (4.2.2) is solved at time t1 ∈(0,∞] if the “network normal consensus error”∑ni=1E2(vˆi(t))approaches zero as t approachest1, that is,limt→t1n∑i=1E2(vˆi(t))= 0 . (A.22)A.2.1 Proof of Theorem 4.4.9Outline of Proof : The proof of Theorem 4.4.9 follows from four observations: (i) each nor-mal consensus estimate satisfies the normalization property ||vˆi(t)||2 = (1/n)vˆi(t)′1n (LemmaA.2.3), (ii) each normal consensus estimate satisfies a “zero local error” property vˆii(t) = 1/n(Lemma A.2.7), (iii) the normal consensus reduction in error resulting from any signal willeventually vanish (Lemma A.2.11), (iv) any IVSC sequence contains an infinite number of con-tiguous SVSC sub-sequences, cf. (4.4.2). By combining these four features we can show thateach vˆi(t) must asymptotically approach 1n1n in the L2-norm.To help clarify our proof, we note that Lemmas A.2.2-A.2.13 deal with the properties of theDA normal consensus estimates, whereas Lemmas A.2.14-A.2.17 deal with the convergence ofthe DA normal consensus estimates in the L2-norm. Lemma A.2.14 shows the receiving estimate(prior to update) converges to the transmitted estimate. Lemma A.2.15 shows the receivingestimate (prior to update) converges to its value after update. Lemma A.2.16 combines theprevious two lemmas to show the receiving estimate (after update) converges to the transmittedestimate. By then combining properties (i),(ii), and (iv) above, we obtain the desired result.89A.2. Proof of Results Contained in Chapter 4.Before each lemma we describe in words its intended purpose 4.Lemma A.2.2 (DA Consensus Estimate Updates) Applying (4.11)−(4.12) to (4.4)−(4.5)yields the DA consensus estimate updates (4.13)− (4.14). Proof. Under (4.11)− (4.12) we have for any knowledge set Ki(tij1 ) and signal Sij(tij0 , tij1 ),Ki(tij1 )⋃Sij(tij0 , tij1 ) = {i, n, si(0), sˆi(tij1 ), vˆi(tij1 ), sˆj(tij0 ), vˆj(tij0 )} ,where si(0) = Sei , sˆi(tij1 ) = Svˆi(tij1 ) , sˆj(tij0 ) = Svˆj(tij0 ).It follows that Vi = [ei, vˆi(tij1 ), vˆj(tij0 )] and Si = [si(0), sˆi(tij1 ), sˆj(tij0 )]. We can thus re-write(4.4) as,vˆi(tij1 + 1) = arg minv˜ ∈ span{vˆi(tij1 ), vˆj(tij0 ), ei}||v˜ −1n1n||2. (A.23)The update (4.13) follows from (A.87) by standard results in least-squares optimization, andthe update (4.14) then follows from (4.5). Lemma A.2.3 (DA Consensus Estimate Normalization) Every normal consensus esti-mate vˆi(t) satisfies||vˆi(t)||2 =1nvˆi(t)′1n , ∀ i ∈ V , ∀ t ≥ 0 . (A.24)Proof. From (4.15) we have the initial condition vˆi(0) = 1nei, which satisfies (A.80) foreach i ∈ V. Next observe that under (A7) the estimate vˆi(t) will not change unless a signalSij(tij0 , tij1 ) is received at node i. If a signal is received then by Lemma A.2.2 the estimate vˆi(t)is updated to the unique solution of (A.87). Thus to finish the proof it suffices to show that ifa vector v ∈ Rn×1 does not satisfy (A.80) then the vector v is not the solution to (A.87). Toprove this we show that if (A.80) does not hold then the vector w defined as,w = v(v′1n)/(n||v||2),will satisfy the inequality||w −1n1n||2 < ||v −1n1n||2 . (A.25)Since w is contained in span(v), the inequality (A.25) implies that v is not the solution to(A.87). Next observe that if a vector v does not satisfy (A.80) then,(||v||2 −1nv′1n)2> 0 . (A.26)4For brevity we do not show that (A4′), (4.4), and (4.5) together imply the DA initialization (4.15). In lieu ofa formal proof, note that under (A4′) the initialization (4.15) can simply be assumed, regardless of (4.4)− (4.5).90A.2. Proof of Results Contained in Chapter 4.Expanding (A.26) yields,(||v||2)2 − 21nv′1n||v||2 +( 1nv′1n)2> 0 . (A.27)Re-arranging (A.27) then implies (A.25),(||v||2)2 − 2 1nv′1n||v||2 > −(1nv′1n)2,||v||2 − 2 1nv′1n + 1n >(v′1nn||v||2)2||v||2 − 2 (v′1n)2n2||v||2 +1n ,||v − 1n1n||2 > ||v v′1nn||v||2 −1n1n||2 = ||w − 1n1n||2 .Lemma A.2.4 (DA Non-Decreasing Normal Consensus Estimate Magnitude) Eachmagnitude ||vˆi(t)||2 is a non-decreasing function of t ≥ 0 for all i ∈ V. Proof. Under (A7) the estimate vˆi(t) will not change unless a signal is received at nodei. If a signal Sij(tij0 , tij1 ) is received then the DA update problem (A.87) implies the updatevˆi(tij1 + 1) must satisfy,||vˆi(tij1 + 1)−1n1n||2 ≤ ||w − 1n1n||2 ,∀ w ∈ span {vˆi(tij1 ), vˆj(tij0 ), ei}.(A.28)Since {vˆi(tij1 ), vˆj(tij0 )} ∈ span {vˆi(tij1 ), vˆj(tij0 ), ei} the inequality (A.28) implies,||vˆi(tij1 + 1)−1n1n||2 ≤ min{||vˆi(tij1 )−1n1n||2 ,||vˆj(tij0 )−1n1n||2}.(A.29)Next observe that if a vector v ∈ Rn×1 satisfies (A.80) then,||v −1n1n||2 = ||v||2 − 21nv′1n +1n=1n− ||v||2 . (A.30)Due to Lemma A.2.3, all normal consensus estimates satisfy (A.80), thus we can apply (A.30)to (A.29) and obtain,1n− ||vˆi(tij1 + 1)||2 ≤ min{ 1n − ||vˆi(tij1 )||2 , 1n − ||vˆj(tij0 )||2}. (A.31)Subtracting both sides of (A.31) from 1n then yields,||vˆi(tij1 + 1)||2 ≥ max{||vˆi(tij1 )||2 , ||vˆj(tij0 )||2} ≥ ||vˆi(tij1 )||2 , (A.32)91A.2. Proof of Results Contained in Chapter 4.thus each magnitude ||vˆi(t)||2 is a non-decreasing function of t ≥ 0 for all i ∈ V. Lemma A.2.5 (Equality of Normalized Linearly Dependent Vectors) Any two non-zero vectors that satisfy (A.80) are linearly dependent iff they are identical. Proof. The proof is omitted for brevity. See [87] for details. Remark A.2.6 The initialization (4.15) combined with Lemma A.2.4 implies that every DAnormal consensus estimate will be non-zero for all t ≥ 0. Due then to Lemma A.2.3, the resultstated in Lemma A.2.5 is applicable to all DA normal consensus estimates.Lemma A.2.7 (DA Zero Local Error Property) Every normal consensus estimate vˆi(t)satisfiesvˆii(t) =1n, ∀ i ∈ V , ∀ t ≥ 0 . (A.33)Proof. The initial condition (4.15) implies vˆii(0) = 1n for each i ∈ V. Next observe thatunder (A7) the estimate vˆi(t) will not change unless a signal Sij(tij0 , tij1 ) is received at node i.If a signal is received then vˆi(t) is updated to the solution of (A.87). To finish the proof, wenow consider the three possibilities: (i) {vˆi(tij1 ), vˆj(tij0 ), ei} contain three linearly dependent vec-tors, (ii) {vˆi(tij1 ), vˆj(tij0 ), ei} contain two linearly dependent vectors, and (iii) {vˆi(tij1 ), vˆj(tij0 ), ei}contain no linearly dependent vectors.Part (i). If the set of vectors {vˆi(tij1 ), vˆj(tij0 ), ei} are a linearly dependent set, then from(A.87) we have vˆi(tij1 + 1) ∈ span{ei} which implies from Lemma A.2.3 that vˆi(tij1 + 1) =1nei,thus satisfying (A.83).Part (ii). Next assume that only two vectors in the set {vˆi(tij1 ), vˆj(tij0 ), ei} are linearlydependent. In this case, without loss of generality, (A.87) reduces tovˆi(tij1 + 1) = arg minv˜ ∈ span{ei, v} ||v˜ −1n1n||2,= arg minv˜ ∈ {a 1nei + bv : (a, b) ∈ R2} ||v˜ −1n1n||2 ,(A.34)where v ∈ {vˆi(tij1 ), vˆj(tij0 )} is linearly independent of ei. The objective function in (A.34) isf(a, b) = ||a 1nei + bv −1n1n||2 ,= a2|| 1nei||2 + b2||v||2 + 2ab 1ne′iv− 2an2 e′i1n − 2bv′ 1n1n +1n .(A.35)The Lemma A.2.3 implies that v satisfies (A.80). Note also that 1nei satisfies (A.80), thus theobjective function (A.35) can be simplified,f(a, b) = (a2 − 2a)||1nei||2 + (b2 − 2b)||v||2 + 2ab1ne′iv +1n.92A.2. Proof of Results Contained in Chapter 4.The first-order partial derivatives of f(a, b) are,∂f(a,b)∂a = 2(a− 1)||1nei||2 + 2b 1ne′iv ,∂f(a,b)∂b = 2(b− 1)||v||2 + 2a 1ne′iv .(A.36)The second-order partial derivatives of f(a, b) are,∂2f(a,b)∂a2 = 2||1nei||2 ,∂2f(a,b)∂b2 = 2||v||2 ,∂2f(a,b)∂a∂b =∂2f(a,b)∂b∂a = 21ne′iv .The determinant of the Hessian matrix of f(a, b) is thus,det[H(f(a, b))]= 2(||1nei||2||v||2 − (1ne′iv)2) . (A.37)Since 1nei and v are linearly independent, the determinant (A.37) is strictly positive by theCauchy-Schwartz inequality. This implies the Hessian matrix H(f(a, b))is positive-definite,thus setting the right-hand side (RHS) of (A.36) to zero and solving for (a, b) yields the uniqueoptimal values (aˆ, bˆ),aˆ =||v||2 1ne′i(1nei − v)|| 1nei||2||v||2 − ( 1ne′iv)2, bˆ =|| 1nei||2v′(v − 1nei)|| 1nei||2||v||2 − ( 1ne′iv)2. (A.38)From (A.38) the unique solution to (A.34) is thus obtained,vˆi(tij1 + 1) =||v||2 1n e′i(1n ei−v)|| 1n ei||2||v||2−( 1n e′iv)21nei+|| 1n ei||2v′(v− 1n ei)|| 1n ei||2||v||2−( 1n e′iv)2 v .(A.39)Denoting the ith element of v as vi, we can express the element vˆii(tij1 + 1) based on (A.39) asfollows,vˆii(tij1 + 1) =||v||2 1n e′i(1n ei−v)|| 1n ei||2||v||2−( 1n e′iv)21n+|| 1n ei||2v′(v− 1n ei)|| 1n ei||2||v||2−( 1n e′iv)2 vi ,=||v||2( 1n2− 1nvi)1n+1n2(||v||2− 1nvi)vi1n2||v||2− 1n2v2i,=1n ||v||2−vi||v||2+vi||v||2−1nv2i||v||2−v2i= 1n .(A.40)The last equality in (A.40) follows since Lemma A.2.5 implies ||v||2 6= v2i under the assumptionthat v satisfies (A.80) and is linearly independent from 1nei.93A.2. Proof of Results Contained in Chapter 4.Part (iii). Lastly, we assume that {vˆi(tij1 ), vˆj(tij0 ), ei} is a linearly independent set of vectors.In this case the update (4.13) can be expressed,vˆi(tij1 + 1) = V(DA)(V′(DA)V(DA))−1V ′(DA)1n1n,V(DA) = [vˆi(tij1 ), vˆj(tij0 ),1nei] .For notational convenience we denote vˆi = vˆi(tij1 ) and vˆj = vˆj(tij0 ). The product (V′(DA)V(DA))has the inverse (A.41) below,(V ′(DA)V(DA))−1 =||vˆi||2 vˆ′ivˆj1n vˆiivˆ′ivˆj ||vˆj ||2 1n vˆji1n vˆii1n vˆji1n2−1,= 1|V ′(DA)V(DA)|1n2(||vˆj ||2 − vˆ2ji)1n2(vˆiivˆji − vˆ′ivˆj)1n(vˆjivˆ′ivˆj − ||vˆj ||2vˆii)1n2(vˆiivˆji − vˆ′ivˆj)1n2(||vˆi||2 − vˆ2ii)1n(vˆiivˆ′ivˆj − ||vˆi||2vˆji)1n(vˆjivˆ′ivˆj − ||vˆj ||2vˆii)1n(vˆiivˆ′ivˆj − ||vˆi||2vˆji) (||vˆi||2||vˆj ||2 − (vˆ′ivˆj)2)(A.41)where the determinant det[V ′(DA)V(DA)]can be computed as,det[V ′(DA)V(DA)]= 1n2 (||vˆi||2||vˆj ||2 − (vˆ′ivˆj)2 + 2vˆiivˆjivˆ′ivˆj−vˆ2ii||vˆj ||2 − ||vˆi||2vˆ2ji) .Right-multiplying (A.41) by V ′(DA)1n1n and then left-multiplying by the `th row of V(DA) yieldsthe following closed-form expression for vˆi`(tij1 + 1),vˆi`(tij1 + 1) =1det[V ′(DA)V(DA)] 1n2(vˆi`(||vˆi||2(||vˆj ||2 − vˆ2ji)+||vˆj ||2(vˆiivˆji − vˆ′ivˆj) +1n(vˆjivˆ′ivˆj − vˆii||vˆj ||2))+vˆj`(||vˆi||2(vˆiivˆji − vˆ′ivˆj) + ||vˆj ||2(||vˆi||2 − vˆ2ii)+ 1n(vˆiivˆ′ivˆj − ||vˆi||2vˆji))).(A.42)Taking ` = i in (A.42) then yields (A.83), see [87] for details. Lemma A.2.8 (DA Lower Bound On Signal Reduction in Error) Upon reception ofany signal Sij(tij0 , tij1 ), the decrease in the updated normal consensus squared error has thefollowing lower bound:E2(vˆi(tij1 ))− E2(vˆi(tij1 + 1))≥max{||vˆj(tij0 )||2 − ||vˆi(tij1 )||2 ,n(vˆj(tij0 )′(vˆj(tij0 )− vˆi(tij1 )))2}.(A.43)94A.2. Proof of Results Contained in Chapter 4.Proof. By Lemma A.2.3 we can apply (A.30) to the left-hand side of (A.43),E2(vˆi(tij1 ))− E2(vˆi(tij1 + 1))= ||vˆi(tij1 + 1)||2 − ||vˆi(tij1 )||2. (A.44)Applying the first line of (A.32) to the RHS of (A.44) then yields,E2(vˆi(tij1 ))− E2(vˆi(tij1 + 1))≥ max{||vˆi(tij1 )||2 , ||vˆj(tij0 )||2} − ||vˆi(tij1 )||2 ,= max{0 , ||vˆj(tij0 )||2 − ||vˆi(tij1 )||2} ,≥ ||vˆj(tij0 )||2 − ||vˆi(tij1 )||2 .(A.45)Next observe that for any two vectors vˆi, vˆj in Rn×1,span {vˆi, vˆj} ⊆ span {vˆi, vˆj , ei} .It thus follows that,E2(arg minv˜ ∈ span{vˆi, vˆj , ei} ||v˜ −1n1n||2)≤ E2(arg minv˜ ∈ span {vˆi, vˆj} ||v˜ −1n1n||2).(A.46)Let vˆi = vˆi(tij1 ), vˆj = vˆj(tij0 ) and vˆi(tij1 + 1) = vˆi(+) for notational convenience. From (A.46)we have,E2(vˆi)− E2(vˆi(+))≥ || arg minv˜ ∈ span{vˆi, vˆj} ||v˜ −1n1n||2||2 − ||vˆi||2 ,= ||wˆ||2 − ||vˆi||2 = 1n(wˆ′1n − vˆ′i1n),(A.47)where the last equality holds due to Lemma A.2.3, and wˆ is defined,wˆ = arg minv˜ ∈ span{vˆi, vˆj} ||v˜ −1n1n||2 ,= arg minv˜ ∈ {avˆi + bvˆj : (a, b) ∈ R2 }||v˜ − 1n1n||2 .(A.48)Note that Lemma A.2.4 together with the initialization (4.15) implies vˆi and vˆj are non-zero.Next assume that vˆi and vˆj are linearly dependent. In this case (A.48) reduces to,wˆ = arg minv˜ ∈ span{vˆi} ||v˜ −1n1n||2 ,= V (V ′V )−1V ′ 1n1n , V = [vˆi] ,= vˆivˆ′i1nn||vˆi||2= vˆi ,(A.49)95A.2. Proof of Results Contained in Chapter 4.where the last equality follows due to Lemma A.2.3. Applying (A.49) to (A.47) implies,E2(vˆi)− E2(vˆi(+)) ≥ ||wˆ||2 − ||vˆi||2= ||vˆi||2 − ||vˆi||2 = 0 = n(||vˆj ||2 − vˆ′ivˆj)2 (A.50)where the last equality follows by Lemma A.2.5 since Lemma A.2.3 implies both vˆi and vˆjsatisfy (A.80), and we are assuming vˆi and vˆj are linearly dependent.Next assume vˆi and vˆj are linearly independent. In this case (A.48) can be solved analogousto (A.39) based on the optimization problem (A.34),wˆ = arg minv˜ ∈ {avˆi + bvˆj : (a, b) ∈ R2}||v˜ − 1n1n||2 ,= ||vˆj ||2vˆ′i(vˆi−vˆj)||vˆi||2||vˆj ||2−(vˆ′ivˆj)2 vˆi +||vˆi||2vˆ′j(vˆj−vˆi)||vˆi||2||vˆj ||2−(vˆ′ivˆj)2 vˆj .(A.51)Substituting the second line of (A.51) for wˆ in the third line of (A.47) then yields,1n(wˆ′1n − vˆ′i1n)= 1n( ||vˆj ||2vˆ′i(vˆi−vˆj)vˆ′i1n+||vˆi||2vˆ′j(vˆj−vˆi)vˆ′j1n||vˆi||2||vˆj ||2−(vˆ′ivˆj)2 − vˆ′i1n)=||vˆj ||2||vˆi||2vˆ′i(vˆi−vˆj)+||vˆi||2||vˆj ||2vˆ′j(vˆj−vˆi)−(||vˆi||2||vˆj ||2−(vˆ′ivˆj)2)||vˆi||2||vˆi||2||vˆj ||2−(vˆ′ivˆj)2= ||vˆi||2||vˆi||2||vˆj ||2−(vˆ′ivˆj)2(||vˆj ||2(||vˆi||2 − vˆ′ivˆj) + ||vˆj ||2(||vˆj ||2 − vˆ′ivˆj)−||vˆi||2||vˆj ||2 + (vˆ′ivˆj)2),≥((||vˆj ||2)2 − 2vˆ′ivˆj ||vˆj ||2 + (vˆ′ivˆj)2) ||vˆi||2||vˆi||2||vˆj ||2=(||vˆj ||2 − vˆ′ivˆj)2 1||vˆj ||2≥ n(||vˆj ||2 − vˆ′ivˆj)2,(A.52)where the last inequality follows since E2(vˆj) ≥ 0 implies ||vˆj ||2 ≤ 1n . Combining (A.47) and(A.52) implies,E2(vˆi)− E2(vˆi(+)) ≥ n(||vˆj ||2 − vˆ′ivˆj)2. (A.53)Together (A.45), (A.50) and (A.53) imply (A.43). Lemma A.2.9 (DA Non-Decreasing Magnitude for any Communication Time-RespectingPath) Any time-respecting communication path Cij[t0(ij),t1(ij)] implies,||vˆi(t1(ij) + 1)||2 ≥ ||vˆj(t0(ij))||2 .Proof. This result follows directly from (A.87), Lemma A.2.4, and the definition of a time-respecting communication path (A.4.3). See [87] for details. Lemma A.2.10 (Error Expression for C[0,∞] satisfying (4.4.3)) For any communica-tion sequence C[0,∞] satisfying (4.4.3) the total reduction in normal consensus squared error96A.2. Proof of Results Contained in Chapter 4.E˜2(C[0,∞]) defined in (A.20) is,E˜2(C[0,∞]) = limt→∞∑ni=1(E2(vˆi(0))− E2(vˆi(t)))= n−1n − limt→∞∑ni=1E2(vˆi(t))=∑`∈N E˜2(C[t`0,t`1]) , C[t`0,t`1] ∈ SVSC , t`1 < t`+10≤ n−1n .(A.54)Proof. The first line follows from (A.20)−(A.21). The second line in (A.54) is due to the ini-tialization (4.15) and (A.19). The third line in (A.54) follows since any communication sequenceC0,∞ satisfying (4.4.3) can be partitioned into an infinite number of disjoint sequences C[t`0,t`1] ∈SVSC. The last line in (A.54) follows since the minimum error of any normal consensus estimateis 0. Lemma A.2.11 (Vanishing Reduction in Error for C[0,∞] satisfying (4.4.3)) For anycommunication sequence C[0,∞] satisfying (4.4.3) there exists an integer `ε such that,E˜2(C[t`0,t`1]) ≤ ε , ∀ ` ≥ `ε , (A.55)for any ε > 0, where E˜2(C[t`0,t`1]) is defined by (A.20). Proof. The third line of (A.54) implies that for any C[0,∞] satisfying (4.4.3) the quantityE˜2(C[0,∞]) is the sum of an infinite number of non-negative terms E˜2(C[t`0,t`1]), the fourth line in(A.54) implies E˜2(C[0,∞]) is bounded above, thus (A.55) follows by the monotone convergencetheorem for sequences of real numbers. Lemma A.2.12 (DA Lower Bound on Reduction in Error for any C[0,t1] satisfying(4.4.2)) Any communication sequence C[t0,t1] satisfying (4.4.2) implies,E˜2(C[t0,t1]) ≥ n(mini∈V{||vˆi(t1 + 1)||2}−maxi∈V{||vˆi(t0)||2})≥ 0 .(A.56)Proof. The first inequality in (A.56) holds for any communication sequence C[t0,t1],E˜2(C[t0,t1]) =∑ni=1(E2(vˆi(t0))− E2(vˆi(t1 + 1)))=∑ni=1(||vˆi(t1 + 1)||2 − ||vˆi(t0)||2)≥ n(mini∈V{||vˆi(t1 + 1)||2} −maxi∈V{||vˆi(t0)||2}).97A.2. Proof of Results Contained in Chapter 4.To prove the second inequality in (A.56) it is required that C[t0,t1] satisfies (4.4.2). Let us define,` = arg mini∈V{||vˆi(t1 + 1)||2} ,` = arg maxi∈V{||vˆi(t0)||2} .Since C[t0,t1] satisfies (4.4.2) there exists a time-respecting communication path between anytwo nodes in V, in particular we have C``[t0(``),t1(``)]∈ C[t0,t1]. The Lemma A.2.9 then implies,||vˆ`(t1(``) + 1)||2 ≥ ||vˆ`(t0(``))||2 .The second inequality in (A.56) then follows because,mini∈V{||vˆi(t1 + 1)||2} = ||vˆ`(t1 + 1)||2≥ ||vˆ`(t1(``) + 1)||2 ≥ ||vˆ`(t0(``))||2≥ ||vˆ`(t0)||2 = maxi∈V{||vˆi(t0)||2} ,where the first and third inequality hold due to Lemma A.2.4 because t1 ≥ t1(``) and t0(``) ≥ t0respectively. Lemma A.2.13 (DA Properties of the Normal Consensus Update) Upon reception ofa signal Sij(tij0 , tij1 ), the normal consensus estimate vˆi(tij1 +1) that results from the update (A.87)will satisfy,vˆi(tij1 )′(vˆi(tij1 )− vˆi(tij1 + 1))≤ 0 . (A.57)Proof. Let us define w˜,w˜ = arg minv˜ ∈ span{vˆi(tij1 + 1), vˆi(tij1 ), ei}||v˜ −1n1n||2 , (A.58)where vˆi(tij1 + 1) is given by (A.87). Observe that vˆi(tij1 + 1) ∈ span{vˆi(tij1 ), vˆj(tij0 ), ei} implies,span{vˆi(tij1 + 1), vˆi(tij1 ), ei} ⊆ span{vˆi(tij1 ), vˆj(tij0 ), ei} . (A.59)From (A.59) we have,E2(w˜) ≥ E2(vˆi(tij1 + 1)), (A.60)and thus combining (A.60) and (A.30) implies,||w˜||2 ≤ ||vˆi(tij1 + 1)||2 . (A.61)98A.2. Proof of Results Contained in Chapter 4.Next observe that since w˜ is defined by (A.58), if (A.61) is applied to the hypothetical (and by(A.87) necessarily redundant) signal Sii(tij1 , tij1 + 1) then,E˜2(Sii(tij1 , tij1 + 1))≡ E2(vˆi(tij1 + 1))− E2(w˜) ,= ||w˜||2 − ||vˆi(tij1 + 1)||2 ≤ 0 .(A.62)Applying Lemma A.2.8 to the signal Sii(tij1 , tij1 + 1) then implies,0 ≥ E˜2(Sii(tij1 , tij1 + 1))≥ max{||vˆi(tij1 )||2 − ||vˆi(tij1 + 1)||2 ,n(vˆi(tij1 )′(vˆi(tij1 )− vˆi(tij1 + 1)))2}(A.63)where the first line follows from (A.62) and the last line implies (A.57). Lemma A.2.14 (DA Local Convergence of Normal Consensus Estimates) For anycommunication sequence C[0,∞] satisfying (4.4.3) there exists an integer `χ such that for all` ≥ `χ,||vˆi(tij1 )− vˆj(tij0 )||2 ≤ χ , ∀ Sij(tij0 , tij1 ) ∈ Ct`0,t`1 , (A.64)for any χ > 0. Proof. For any communication sequence C[0,∞] satisfying (4.4.3) the Lemma A.2.11 impliesthere exists an integer `ε such that (A.55) holds for any ε > 0. For all ` ≥ `ε we thus have forany signal Sij(tij0 , tij1 ) ∈ C[t`0,t`1],||vˆi(tij1 )||2 − ||vˆj(tij0 )||2 ≤ ||vˆi(t`1 + 1)||2 − ||vˆj(t`0)||2 ,≤∑nr=1(||vˆr(t`1 + 1)||2 − ||vˆr(t`0)||2),= E˜2(C[t`0,t`1]) ≤ ε .(A.65)The first inequality in (A.65) holds by Lemma A.2.4 since tij0 ≥ t`0 and tij1 ≤ t`1 for anySij(tij0 , tij1 ) ∈ C[t`0,t`1]. The second inequality in (A.65) holds since,∑nr=1(||vˆr(t`1 + 1)||2 − ||vˆr(t`0)||2)=(||vˆi(t`1 + 1)||2 − ||vˆj(t`0)||2)+(||vˆj(t`1 + 1)||2 − ||vˆi(t`0)||2)+∑r∈V\{i,j}(||vˆr(t`1 + 1)||2 − ||vˆr(t`0)||2),(A.66)where∑r∈V\{i,j}(||vˆr(t`1 + 1)||2 − ||vˆr(t`0)||2) ≥ 0 (A.67)99A.2. Proof of Results Contained in Chapter 4.holds due to Lemma A.2.4, and similarly ||vˆj(t`1 + 1)||2 − ||vˆi(t`0)||2 ≥ 0 because,||vˆj(t`1 + 1)||2 − ||vˆi(t`0)||2 ≥ minr∈V{||vˆr(t`1 + 1)||2}−maxr∈V{||vˆr(t`0)||2} ≥ 0 .(A.68)The second inequality in (A.68) holds due to Lemma A.2.12 since C[t`0,t`1] ∈ SVSC. Together(A.66), (A.67) and (A.68) imply the second inequality in (A.65). Applying Lemma A.2.8 toLemma A.2.11 implies that for all Sij(tij0 , tij1 ) ∈ C[t`0,t`1] and ` ≥ `ε,ε ≥ E˜2(C[t`0,t`1]) ≥ E˜2(Sij(tij0 , tij1 )),= E2(vˆi(tij1 ))− E2(vˆi(tij1 + 1)),≥ max{||vˆj(tij0 )||2 − ||vˆi(tij1 )||2 ,n(vˆj(tij0 )′(vˆj(tij0 )− vˆi(tij1 )))2} ,(A.69)for any ε > 0. For notational convenience denote vˆi = vˆi(tij1 ) and vˆj = vˆj(tij0 ). Combining(A.65) and (A.69) implies that for any ε > 0 there exist an integer `ε such that,||vˆi||2 − ||vˆj ||2 ≤ ε , vˆ′j(vˆj − vˆi) ≤√ε/n , (A.70)for any Sij(tij0 , tij1 ) ∈ C[t`0,t`1] and ` ≥ `ε. To obtain (A.64) we observe that (A.70) implies,||vˆi − vˆj ||2 = ||vˆi||2 − 2vˆ′ivˆj + ||vˆj ||2 = ||vˆi||2 − F ||vˆj ||2 + 2vˆ′j(vˆj − vˆi) ,≤ ε+ 2√ε/n ≤√ε(1 + 2/√n) , ∀ ∈ (0, 1].(A.71)We thus define ε(χ),ε(χ) =( χ1 + 2/√n)2. (A.72)For any χ ∈ (0, 1] and ε ∈ (0, ε(χ)] the result (A.64) then follows from (A.71). Lemma A.2.15 (DA Vanishing Change in Normal Consensus Update) For any com-munication sequence C[0,∞] satisfying (4.4.3) there exists an integer `ε such that for all ` ≥ `ε,||vˆi(tij1 + 1)− vˆi(tij1 )||2 ≤ ε , ∀ Sij(tij0 , tij1 ) ∈ C[t`0,t`1] , (A.73)for any ε > 0. Proof. Recall that Lemma A.2.11 implies that for any communication sequence C[0,∞]satisfying (4.4.3) there exists an integer `ε such that (A.55) holds for any ε > 0, we thus100A.2. Proof of Results Contained in Chapter 4.observe for ` ≥ `ε,ε ≥ E˜2(C[t`0,t`1])≥ E˜2(Sij(tij0 , tij1 )) , ∀ Sij(tij0 , tij1 ) ∈ C[t`0,t`1]= ||vˆi(tij1 + 1)||2 − ||vˆi(tij1 )||2 .For all ` ≥ `ε and signals Sij(tij0 , tij1 ) ∈ C[t`0,t`1] we then have,||vˆi(tij1 + 1)− vˆi(tij1 )||2 = ||vˆi(tij1 + 1)||2 − ||vˆi(tij1 )||2+2vˆi(tij1 )′(vˆi(tij1 )− vˆi(tij1 + 1))≤ E˜2(Sij(tij0 , tij1 ))≤ ε ,where the first inequality follows from Lemma A.2.13 and the second inequality from LemmaA.2.15. Lemma A.2.16 (DA Vanishing Change between Normal Consensus Update andSignal) For any communication sequence C[0,∞] satisfying (4.4.3), there exists an integer `γsuch that,||vˆi(tij1 + 1)− vˆj(tij0 )||2 ≤ γ , ∀ Sij(tij0 , tij1 ) ∈ C[t`0,t`1], (A.74)for all ` ≥ `γ and any γ > 0. Proof. For any communication sequence C[0,∞] satisfying (4.4.3), Lemma A.2.14 impliesthat (A.64) holds for any χ ∈ (0, 1) and ε ∈ (0, ε(χ)], where ε(χ) is given by (A.72). LemmaA.2.15 implies that (A.73) holds for any ε > 0, thus for any ` ≥ `ε(χ) and Sij(tij0 , tij1 ) ∈ C[t`0,t`1]the triangle inequality then implies,||vˆi(tij1 + 1)− vˆj(tij0 )|| ≤ ||vˆi(tij1 + 1)− vˆi(tij1 )||+||vˆi(tij1 )− vˆj(tij0 )||≤√ε(χ) +√χ ≤ 2√χ , ∀ χ ∈ (0, 1].Any χ ∈ (0, γ/4] and ε ∈ (0, ε(χ)] thus yields (A.74) for any γ ∈ (0, 4]. Lemma A.2.17 (DA Network Convergence to Average-Consensus) For any commu-nication sequence C[0,∞] satisfying (4.4.3) there exists an integer `ξ such that for all ` ≥ `ξ,n∑i=1||vˆi(t`1 + 1)−1n1n||2 ≤ ξ , ∀ C[t`0,t`1] ∈ C[0,∞], (A.75)for any ξ > 0. 101A.2. Proof of Results Contained in Chapter 4.Proof. Since C[0,∞] satisfies (4.4.3) we have C[t`0,t`1] ∈ SVSC for each ` ∈ N, thus there existsa time-respecting communication path Cij[t0(ij),t1(ij)] ∈ C[t`0,t`1] for any i ∈ V, j ∈ V−i, and ` ∈ N.For any i ∈ V, j ∈ V−i and ` ∈ N the triangle inequality then implies,||vˆij(t`1 + 1)− vˆjj(t`1j0 )||≤∑Srp(trp0 ,trp1 )∈Cij[t0(ij),t1(ij)]||vˆrj(trp1 + 1)− vˆpj(trp0 )||+∑k(ij)+1q=1∑Srp(trp0 ,trp1 )∈Q`q(ij)||vˆrj(trp1 + 1)− vˆrj(trp1 )||(A.76)where we define C¯[t`0,t`1] = C[t`0,t`1] \ Cij[t0(ij),t1(ij)]and,Q`q(ij) = {S`qm(t`qm0 , t`qm1 ) ∈ C¯[t`0,t`1] :t`qm1 ∈ (t`q`q−11 , t`q+1`q0 )},for q = 1, . . . , k(ij), where `0 = j, `k(ij)+1 = i,Q`k(ij)+1(ij) = {Sim(tim0 , tim1 ) ∈ C[t`0,t`1] : tim1 > t1(ij)} .(A.77)We clarify that the RHS of (A.76) includes the differences between the received normal consensusestimate vˆ`q−1(t`q`q−10 ) and the updated normal consensus estimate vˆ`q(t`q`q−11 + 1) that resultsfrom each signal contained in the time-respecting communication path (trcp) Cij[t0(ij),t1(ij)] ∈C[t`0,t`1]. Each set Q`q(ij) defined in (A.77) contains the signals received at each node after therespective signal in the trcp was received, but before the respective signal in the trcp was sent,as is required for an application of the triangle inequality. The set Q`k(ij)+1(ij) contains thesignals received at node i after the last signal in the trcp Cij[t0(ij),t1(ij)] was received, but nolater then the end of the communication sequence C[t`0,t`1], again this is required for applicationof the triangle inequality.For any communication sequence C[0,∞] satisfying (4.4.3) the Lemma A.2.15 implies thereexists an integer `ε such that (A.73) holds for any ε > 0. Likewise, for any communicationsequence C[0,∞] satisfying (4.4.3) the Lemma A.2.16 implies there exists an integer `γ such that(A.74) holds for any γ > 0. Thus for any γ ∈ (0, 4] if we let χ ∈ (0, γ/4] and ε ∈ (0, ε(χ)] thenfor any ` ≥ `ε(χ),||vˆij(t`1 + 1)− vˆjj(t`1j0 )||≤ |Cij[t0(ij),t1(ij)]|√γ+∑k(ij)+1q=1 |Q`q(ij)|√ε(χ) ,≤(|Cij[t0(ij),t1(ij)]|+∑k(ij)+1q=1 |Q`q(ij)|)√γ ,≤ |C[t`0,t`1]|√γ ,(A.78)where the second inequality holds since 0 < ε(χ) < γ, and the last inequality holds since every102A.2. Proof of Results Contained in Chapter 4.signal contained in C[t`0,t`1] is represented by at most one term on the RHS of (A.76), that is,|C[t`0,t`1]| ≥ |Cij[t0(ij),t1(ij)]|+k(ij)+1∑q=1|Q`q(ij)| .Due to (A.78), any ξ > 0 and γ ∈ (0, ξ/(n|C[t`0,t`1]|)2] then implies,(vˆij(t`1 + 1)− vˆjj(t`1j0 ))2=(vˆij(t`1 + 1)−1n)2≤ ξ/n2 , (A.79)for any (i, j) ∈ V2, where the first equality holds due to Lemma A.2.7. The result (A.75) thenfollows from (A.79) since,∑ni=1 ||vˆi(t`1 + 1)−1n ,1n||2=∑ni=1∑nj=1(vˆij(t`1 + 1)− vˆjj(t`1j0 ))2≤ n2(ξ/n2),where the first equality holds again due to Lemma A.2.7. A.2.2 Proof of Theorem 4.4.10Outline of Proof : The proof of Theorem 4.4.10 can be divided into the following three mainareas: (A) properties of the DDA normal consensus estimates, (B) finite time until the reductionin DDA normal consensus error vanishes, and (C) DDA normal consensus estimate propertiesgiven no reduction in error.The Lemmas A.2.18-A.2.24 cover area (A). The key results are:• Lemma A.2.19 proves that each DDA normal consensus estimate vˆi(t) satisfies the nor-malization property ||vˆi(t)||2 = (1/n)vˆi(t)′1n.• Lemma A.2.20 proves that each DDA normal consensus estimate vˆi(t) must satisfy the“zero local error” property vˆii(t) = 1/n.• Lemma A.2.22 shows that the error of each normal consensus estimate is non-decreasingwith time.• Lemma A.2.23 proves that the normal consensus estimate will not change unless thechange reduces its error.The Lemmas A.2.25-A.2.28 cover area (B). The key result is Lemma A.2.28, which shows thatany IVCC communication sequence implies there will exist a finite time after which there willbe no further reduction in the total normal consensus error. The Lemmas A.2.29−A.2.30 cover103A.2. Proof of Results Contained in Chapter 4.area (C), after which we present the proof of Thm. 4.4.10. Preceding the statement of eachlemma we describe in words its intended purpose 5.Lemma A.2.18 (DDA Normal Consensus Estimate Discretization) Every normal con-sensus estimate vˆi(t) satisfies vˆi(t) ∈ {0, 1n}n for all i ∈ V and t ≥ 0. Proof. The initialization (4.21) implies vˆi(0) ∈ {0, 1n}n. The optimization problem (4.4)requires that any solution vˆi(t+ 1) satisfies vˆi(t+ 1) ∈ {0, 1n}n. Under the DDA algorithm, theassumption (A7) implies every normal consensus estimate remains fixed unless it is updatedvia (4.4), thus it follows that vˆi(t) ∈ {0, 1n}n for all i ∈ V and t ≥ 0. Lemma A.2.19 (DDA Consensus Estimate Normalization) Every normal consensusestimate vˆi(t) satisfies,||vˆi(t)||2 =1nvˆi(t)′1n , ∀ i ∈ V , ∀ t ≥ 0 , (A.80)||vˆ−ii (t)||2 =1nvˆ−ii (t)′1n−1 , ∀ i ∈ V , ∀ t ≥ 0 . (A.81)Proof. The Lemma A.2.18 implies (A.80) as follows,1n vˆi(t)′1n = 1n∑n`=1 vˆi`(t) ,= 1n∑`∈V : vˆi`(t)=1n(1n),=∑`∈V : vˆi`(t)=1n(1n2),=∑n`=1 vˆi`(t)2 = ||vˆi(t)||2 .(A.82)If vˆi(t) ∈ {0, 1n}n then vˆ−ii (t) ∈ {0,1n}n−1, thus a similar argument to (A.82) implies (A.81). Lemma A.2.20 (DDA Local Zero Error Property) Every normal consensus estimate vˆi(t)satisfiesvˆii(t) =1n, ∀ i ∈ V , ∀ t ≥ 0 . (A.83)Proof. The initialization (4.21) implies vˆii(0) = 1n for each i ∈ V. Next observe that under(A7) the estimate vˆi(t) will not change unless a signal Sij(tij0 , tij1 ) is received at node i. If a5For brevity we do not show that (A4′), (4.4), and (4.5) together imply the DDA initialization (4.21). In lieuof a formal proof, note that under (A4′) the initialization (4.21) can simply be assumed, regardless of (4.4)−(4.5).Similarly, it is shown in [87] that (4.4)−(4.5) imply (4.18) under the DDA knowledge set update (4.17) and signalspecification (4.16). Under (4.16) − (4.17) one can simply assume the update (4.18) regardless of (4.4) − (4.5),and in fact (4.18) is a non-unique solution to (4.4)− (4.5) given (4.16)− (4.17).104A.2. Proof of Results Contained in Chapter 4.signal is received then vˆi(t) is updated by (4.18)− (4.19), this update can be re-written as,vˆi(tij1 +1) =vˆi(tij1 ) + vˆj(tij0 )− vˆji(tij0 )ei , if vˆ−ii (tij1 )′vˆ−ij (tij0 ) = 0vˆj(tij0 ) + ei(1n − vˆji(tij0 )), if vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 , ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2vˆi(tij1 ) , if vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 , ||vˆ−ii (tij1 )||2 ≥ ||vˆ−ij (tij0 )||2.(A.84)To finish the proof, it suffices to show that the solution vˆi(tij1 + 1) specified by (A.84) implies(A.83) under the assumption that vˆi(tij1 ) satisfies (A.83). If vˆ−ii (tij1 )′vˆ−ij (tij0 ) = 0 then vˆi(tij1 +1) = vˆi(tij1 ) + vˆj(tij0 )− vˆji(tij0 )ei and thus,vˆii(tij1 + 1) = vˆii(tij1 ) + vˆji(tij0 )− vˆji(tij0 ) =1n.If vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 and ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2 then vˆi(tij1 + 1) = vˆj(tij0 ) + ei(1n − vˆji(tij0 ))and thus,vˆii(tij1 + 1) = vˆji(tij0 ) +1n− vˆji(tij0 ) =1n.Finally, if vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 and ||vˆ−ii (tij1 )||2 ≥ ||vˆ−ij (tij0 )||2 then vˆi(tij1 + 1) = vˆi(tij1 ) and thus(A.83) follows by the hypothesis. Lemma A.2.21 (DDA Normal Consensus Estimate Magnitude Equivalence to Er-ror) For any two normal consensus estimates vˆi(t) and vˆj(t),||vˆi(t)||2 ≥ ||vˆj(t)||2 ⇔ E2(vˆi(t))≤ E2(vˆj(t)). (A.85)Proof. For any normal consensus estimate vˆi(t),E2(vˆi(t))= ||vˆi(t)− 1n1n||2 ,= ||vˆi(t)||2 − 2 1n vˆi(t)′1n + 1n =1n − ||vˆi(t)||2(A.86)where the last equality holds due to Lemma A.2.19. The equivalence (A.85) then follows directlyfrom (A.86). Lemma A.2.22 (DDA Normal Consensus Estimate Non-Decreasing Error) The er-ror of every normal consensus estimate vˆi(t) is a non-decreasing function of t ≥ 0 for all i ∈V. Proof. Under (A7) the normal consensus estimate vˆi(tij1 ) will only change if a signal isreceived at node i at time tij1 . If a signal is received at node i, then the DDA protocol updatesthe normal consensus estimate vˆi(tij1 ) by the optimization problem (4.4). Under (4.16)− (4.17)105A.2. Proof of Results Contained in Chapter 4.we have for any knowledge set Ki(tij1 ) and signal Sij(tij0 , tij1 ),Ki(tij1 )⋃Sij(tij0 , tij1 ) = {i, n, si(0), sˆi(tij1 ), vˆi(tij1 ), sˆj(tij0 ), vˆj(tij0 )} ,where si(0) = Sei , sˆi(tij1 ) = Svˆi(tij1 ) , sˆj(tij0 ) = Svˆj(tij0 ).It follows that Vi = [ei, vˆi(tij1 ), vˆj(tij0 )] and Si = [si(0), sˆi(tij1 ), sˆj(tij0 )]. We can thus re-write(4.4) as,vˆi(tij1 + 1) = arg minv˜ ∈ span{vˆi(tij1 ), vˆj(tij0 ), ei} ∩ {0,1n}n||v˜ −1n1n||2. (A.87)The Lemma A.2.18 implies vˆi(tij1 ) ∈ {0,1n}n, thus (A.87) implies that any candidate solutionvˆ(1)(tij1 +1) that does not satisfy E2(vˆ(1)(tij1 +1))≤ E2(vˆ(tij1 ))cannot be a solution to (A.87). Lemma A.2.23 (DDA Fixed Normal Consensus Estimate Given No Reduction inError) Under the DDA algorithm, for any signal Sij(tij0 , tij1 ) we have,E2(vˆi(tij1 + 1))= E2(vˆi(tij1 ))⇔ vˆi(tij1 + 1) = vˆi(tij1 ). (A.88)Proof. It is clear from (A.19) that vˆi(tij1 + 1) = vˆi(tij1 ) implies E2(vˆi(tij1 + 1))= E2(vˆi(tij1 )).Next observe that (A.86) implies,E2(vˆi(tij1 + 1))= E2(vˆi(tij1 ))⇔ ||vi(tij1 + 1)||2 = ||vˆi(tij1 )||2 .To finish the proof we thus have only to show,vˆi(tij1 + 1) 6= vˆi(tij1 )⇒ ||vˆi(tij1 + 1)||2 6= ||vˆi(tij1 )||2 . (A.89)Under the update (A.84), to prove (A.89) it suffices to show that either of the two cases,vˆi(tij1 + 1) = vˆi(tij1 ) + vˆj(tij0 )− vˆji(tij0 )ei ,if vˆ−ii (tij1 )′vˆ−ij (tij0 ) = 0,vˆi(tij1 + 1) = vˆj(tij0 ) + ei(1n − vˆji(tij0 )),if vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 , ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2(A.90)will imply ||vˆi(tij1 + 1)||2 > ||vˆi(tij1 )||2. If vˆ−ii (tij1 )′vˆ−ij (tij0 ) = 0, then (A.90) together with (A.80)106A.2. Proof of Results Contained in Chapter 4.imply,||vˆi(tij1 + 1)||2 = 1n vˆi(tij1 + 1)′1n ,= 1n(vˆi(tij1 ) + vˆj(tij0 )− vˆji(tij0 )ei)′1n ,= 1n(vˆi(tij1 )′1n + vˆj(tij0 )′1n − vˆji(tij0 )),= ||vˆi(tij1 )||2 + 1n vˆ−ij (tij0 )′1n−1 > ||vˆi(tij1 )||2 ,where the last inequality holds because Lemma A.2.20 implies vˆ−ij (t)′1n−1 ≥ 1n for all j ∈ V−iand t ≥ 0. If vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 and ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2, then from (A.90) and (A.80)it follows that,||vˆi(tij1 + 1)||2 = 1n vˆi(tij1 + 1)′1n ,= 1n(vˆj(tij0 ) + ei(1n − vˆji(tij0 )))′1n ,= 1n(vˆj(tij0 )′1n + 1n − vˆji(tij0 )),= 1n(vˆ−ij (tij0 )′1n−1 + 1n)> 1n(vˆ−ii (tij1 )′1n−1 + 1n),= 1n vˆi(tij1 )′1n = ||vˆi(tij1 )||2 ,where the second last equality holds due to Lemma A.2.20. Lemma A.2.24 (DDA Lower Bound on Increase in Normal Consensus Magnitude)Under the DDA algorithm, for any signal Sij(tij0 , tij1 ) we have,||vˆ−ii (tij1 + 1)||2 ≥ max{||vˆ−ii (tij1 )||2 , ||vˆ−ij (tij0 )||2} .Proof. If vˆ−ii (tij1 )′vˆ−ij (tij0 ) = 0 then vˆi(tij1 + 1) = vˆi(tij1 ) + vˆj(tij0 ) − vˆji(tij0 )ei. In this case weapply (A.81) to obtain,||vˆ−ii (tij1 + 1)||2 = 1n vˆ−ii (tij1 + 1)′1n−1 ,= 1n(vˆ−ii (tij1 )′1n−1 + vˆ−ij (tij0 )′1n−1),≥ max{||vˆ−ii (tij1 )||2 , ||vˆ−ij (tij0 )||2} .If vˆ−ii (tij1 )′vˆ−ij (tij0 ) > 0 and ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2, then vˆi(tij1 +1) = vˆj(tij0 )+ei(1n − vˆji(tij0 )).In this case we have,||vˆ−ii (tij1 + 1)||2 = 1n(vˆ−ij (tij0 ) + e−ii(1n − vˆji(tij0 )))′1n−1= 1n vˆ−ij (tij0 )′1n−1 ≥ max{||vˆ−ii (tij1 )||2 , ||vˆ−ij (tij0 )||2}where the last inequality holds by the assumption ||vˆ−ii (tij1 )||2 < ||vˆ−ij (tij0 )||2. Finally, if vˆ−ii (tij1 )′vˆ−ij (tij0 ) >107A.2. Proof of Results Contained in Chapter 4.0 and ||vˆ−ii (tij1 )||2 ≥ ||vˆ−ij (tij0 )||2 then vˆi(tij1 + 1) = vˆi(tij1 ), thus||vˆ−ii (tij1 + 1)||2 = 1n vˆ−ii (tij1 + 1)′1n−1 ,= 1n vˆ−ii (tij1 )′1n−1 ≥ max{||vˆ−ii (tij1 )||2 , ||vˆ−ij (tij0 )||2} ,where the last inequality holds by the assumption ||vˆ−ii (tij1 )||2 ≥ ||vˆ−ij (tij0 )||2. Lemma A.2.25 (Error Expression for C[0,∞] ∈ IVCC) For any communication sequenceC[0,∞] ∈ IVCC the total reduction in normal consensus squared error E˜2(C[0,∞]) defined in(A.20) is,E˜2(C[0,∞]) = limt→∞∑ni=1(E2(vˆi(0))− E2(vˆi(t)))= n−1n − limt→∞∑ni=1E2(vˆi(t))=∑`∈N E˜2(C[t`0,t`1]) , C[t`0,t`1] ∈ SVCC≤ n−1n .(A.91)Proof. The first line follows from the definitions (A.20)− (A.21). The second line in (A.91)is due to the initialization (4.21) and (A.19). The third line in (A.91) holds since any commu-nication sequence C[0,∞] ∈ IVCC can be partitioned into an infinite number of contiguous sub-sequences C[t`0,t`1] ∈ SVCC. The last line in (A.91) follows since the minimum error of any normalconsensus estimate is 0. Lemma A.2.26 (DDA Vanishing Reduction in Error for C[0,∞] ∈ IVCC) For any com-munication sequence C[0,∞] ∈ IVCC there exists an integer `ε such that,E˜2(C[t`0,t`1]) ≤ ε , ∀ ` ≥ `ε , ∀ ε > 0 , C[t`0,t`1] ∈ SVCC.Proof. The third line of (A.91) implies that for any C[0,∞] ∈ IVCC the quantity E˜2(C[0,∞]) isthe sum of an infinite number of non-negative terms E˜2(C[t`0,t`1]), where C[t`0,t`1] ∈ SVCC for each` ∈ N. The fourth line in (A.91) implies E˜2(C[0,∞]) is bounded above, thus the result follows bythe monotone convergence theorem for sequences of real numbers. Lemma A.2.27 (DDA Lower Bound on Non-Zero Reduction in Error) For any com-munication sequence C[t0,t1] we have,E˜2(C[t0,t1]) > 0 ⇒ E˜2(C[t0,t1]) ≥1n2. (A.92)Proof. Applying Lemma A.2.21 to the left-hand side of (A.92)implies there exists a signalSij(tij0 , tij1 ) ∈ C[t0,t1] such that,||vˆi(tij1 + 1)||2 > ||vˆi(tij1 )||2 . (A.93)108A.2. Proof of Results Contained in Chapter 4.We now show that (A.93) implies ||vˆi(tij1 +1)||2 ≥ ||vˆi(tij1 )||2 + 1n2 , the result (A.92) then followsdirectly from Lemma A.2.21 and Lemma A.2.22. The Lemma A.2.18 implies vˆi(t) ∈ {0, 1n}nfor all i ∈ V and t ≥ 0, thus under (A.93) it follows that,||vˆi(tij1 + 1)||2 = 1n vˆi(tij1 + 1)′1n ,= 1n2 |{` : vˆi`(tij1 + 1) =1n}| ,> ||vˆi(tij1 )||2 = 1n vˆi(tij1 )′1n ,= 1n2 |{` : vˆi`(tij1 ) =1n}| .(A.94)From the second and fourth lines in (A.94) we have,|{` : vˆi`(tij1 + 1) =1n}| > |{` : vˆi`(tij1 ) =1n}| ⇒|{` : vˆi`(tij1 + 1) =1n}| − 1 ≥ |{` : vˆi`(tij1 ) =1n}| .(A.95)Under the constraint vˆi(t) ∈ {0, 1n}n, the last inequality in (A.95) implies ||vˆi(tij1 + 1)||2 ≥||vˆi(tij1 )||2+ 1n2 . Lemma A.2.28 (DDA Existence of a Time for Zero Reduction in Error) For anycommunication sequence C[0,∞] ∈ IVCC there exists an integer `n−2 such that,E˜2(C[t`0,t`1]) = 0 , ∀ ` ≥ `n−2 , C[t`0,t`1] ∈ SVCC. (A.96)Proof. The result follows immediately by applying Lemmas A.2.22 and A.2.27 to LemmaA.2.26. Lemma A.2.29 (DDA Signal Update Property Given Zero Reduction in Error) As-sume ||vˆi(tij1 )||2 = ||vˆj(tij0 )||2, then vˆji(tij0 ) =1n if the signal Sij(tij0 , tij1 ) results in no reduction inerror. Proof. LemmaA.2.20 together with ||vˆi(tij1 )||2 = ||vˆj(tij0 )||2 implies ||vˆ−ii (tij1 )||2 = ||vˆ−jj (tij0 )||2.From Lemma A.2.24 we thus have,||vˆ−ii (tij1 )||2 ≥ ||vˆ−ij (tij0 )||2 ≥ ||vˆ−jj (tij0 )||2 = ||vˆ−ii (tij1 )||2 . (A.97)From (A.97) we have ||vˆ−ii (tij1 )||2 = ||vˆ−ij (tij0 )||2. Note that Lemma A.2.20 together with||vˆi(tij1 )||2 = ||vˆj(tij0 )||2 implies ||vˆ−ii (tij1 )||2 + 1n2 = ||vˆ−ij (tij0 )||2 + vˆji(tij0 )2, and thus the result fol-lows since ||vˆ−ii (tij1 )||2 = ||vˆ−ij (tij0 )||2. 109A.3. Data Dissemination ResultsLemma A.2.30 (DDA Local Average-Consensus Property) If the magnitude of any nor-mal consensus vector vˆi(t) is 1n , then vˆi(t) =1n1n. Proof. If ||vˆi(t)||2 = 1n , then from Lemma A.2.19 we have1n vˆi(t)′1n = 1n . From Lemma A.2.18each element in vˆi(t) belongs to {0, 1n}, thus if any element in vˆi(t) did not equal1n thenvˆi(t)′1n < 1, so the result follows by contradiction. Theorem 4.4.10: (DDA Network Convergence to Average-Consensus) The DDAprotocol (4.16)− (4.21) solves ACP2 as t→∞ for any UCP C0,t ∈ IVCC (cf. Def.4.4.5). Proof: For any communication sequence C[0,t1] ∈ IVCC the Lemma A.2.28 implies thereexists an integer `n−2 such that (A.96) holds. To finish the proof it suffices to show that thecondition,E2(C[t`0,t`1]) = 0 , C[t`0,t`1] ∈ SVCC (A.98)implies (A.22) at time t = t`0, and hence by Def. A.2.1 an average-consensus is obtained at t`0.Applying Lemmas A.2.21 and A.2.22 to (A.98) implies that for all t ∈ [t`0, t`1],||vˆi(t)||2 ≥ ||vˆj(t)||2||vˆj(t)||2 ≥ ||vˆi(t)||2, ∀ (i, j) ∈ V2 (A.99)thus ||vˆi(t)||2 = ||vˆj(t)||2 for all t ∈ [t`0, t`1].Since C[t`0,t`1] ∈ SVCC, it follows from Lemmas A.2.20, A.2.23 and A.2.29 that there existsa node i satisfying vˆij(t) = 1n for all j ∈ V, hence ||vˆi(t)||2 = 1n . From Lemma A.2.30 and(A.99) we then have vˆj(t) = 1n1n for all j ∈ V. Thus by Lemma A.2.23 and Def. A.2.1 anaverage-consensus is obtained at the finite time t`0. A.3 Data Dissemination ResultsThis section provides a proof of an upper bound on the time until a communication sequenceC0,t is “time-respecting strongly connected” (trcs), given an upper bound on the time untilC0,t is “non-time-respecting strongly connected” (ntrcs).In Sec.A.4 we state the definitions regarding the communication sequence that will be re-quired in the results subsequently proven in Sec.A.5−A.8. In Sec.A.5 we prove an (n−1) upperbound on the completion time of the flooding protocol (cf. Def.A.4.12) given the relatively strictcommunication condition of C0,n−2 ∈ RGFSC (cf. Def.A.4.6 and Def.A.4.8). The subsequentcorollary, Cor.A.6.1, proves a super-exponentially greater upper bound((n − 2)(n!) + 1), butallows for the weaker (albeit much lengthier) communication condition of C0,(n−2)(n!) ∈ RGSC(cf. Def.A.4.7 and Def.A.4.6). We then combine the best features of the preceding theoremand corollary in our main result (Conj.A.7.1), which states an (n− 1) upper bound given only110A.4. Preliminary Definitionsthat C0,n−2 ∈ RGSC . Finally, we generalize these results for a time-delayed and asynchronouscommunication sequence in Sec.A.8.A.4 Preliminary DefinitionsThis section restates some of the communication assumptions made through-out this thesis. Wethen also define specific communication sequences that will be required for the results provenin Sec.A.5−A.8.Assume there exists a fixed set of n ∈ N nodes denoted V = {1, 2, . . . , n}. We let t ∈ Ndenote discrete time and V−i = V \ {i} for any node i ∈ V.Definition A.4.1 (Signal) For any ordered pair (t0, t1) ∈ N2, a “signal” transmitted from nodej ∈ V at time t0 and received at node i ∈ V at time t1 will be denoted Sij(t0, t1). Due to causality, we assume that any signal Sij(t0, t1) cannot be received prior to its trans-mission time, that is t0 ≤ t1.Definition A.4.2 (Communication Sequence) For any 0 ≤ t0 ≤ t1, a “communication se-quence” Ct0,t1 is the set of all signals transmitted no earlier than time t0 and received no laterthan time t1, that is,Ct0,t1 = {Si1j1 , Si2j2 , Si3j3 , . . .}where we have omitted the time indices but it is understood that the transmission time t0 andreception time t1 of each signal Si`j` belong to the interval [t0, t1]. We next define the notion of a “time-respecting communication path”, which will be abbre-viated as “trcp”.Definition A.4.3 A communication sequence Ct0,t1 contains a “time-respecting communica-tion path” (trcp) from node j to node i if Ct0,t1 contains a sub-sequence Cijt0(ij),t1(ij) with thefollowing connectivity property,Cijt0(ij),t1(ij) = {S`1j , S`2`1 , S`3`2 , . . . , S`k(ij)`k(ij)−1 , Si`k(ij)}where we have omitted the time indices but it is understood that the transmission time t0 of eachsignal S`q+1`q occurs after the reception time t1 of the preceding signal S`q`q−1. Note that the sub-sequence Cijt0(ij),t1(ij) in Def.A.4.3 has a finite cardinality |Cijt0(ij),t1(ij)| =k(ij) ≥ 1.Similar to the above definition, we now define the notion of a “non-time-respecting commu-nication path”, which will be abbreviated as “ntrcp”.111A.4. Preliminary DefinitionsDefinition A.4.4 A communication sequence Ct0,t1 contains a “non-time-respecting commu-nication path” (ntrcp) from node j to node i if Ct0,t1 contains a sub-sequence Cijt0(ij),t1(ij) withthe following connectivity property,Cijt0(ij),t1(ij) = {S`1j , S`2`1 , S`3`2 , . . . , S`k(ij)`k(ij)−1 , Si`k(ij)}where we have omitted the time indices but (in contrast to a trcp) it is not required that thetransmission time t0 of each signal S`q+1`q occurs after the reception time t1 of the precedingsignal S`q`q−1. In full analogy to the above two definitions, we can define the notions of a “time-respecting”and “non-time-respecting” strongly-connected communication sequence.Definition A.4.5 A communication sequence Ct0,t1 is “time-respecting strongly-connected”(trcs) if Ct0,t1 contains a trcp from each node i ∈ V to every node j ∈ V−i. The following definition is identical to the preceding definition, with the exception that anascending time-order in signals is not required.Definition A.4.6 A communication sequence Ct0,t1 is “non-time-respecting strongly-connected”(ntrcs) if Ct0,t1 contains a ntrcp from each node i ∈ V to every node j ∈ V−i. The following definition restricts the class of the communication sequences that will beassumed in our main conjecture (Conj.A.7.1).Definition A.4.7 (Ring Graph) At any time instant t ∈ N, a communication sequence Ct,t is a“ring graph” (RG) if every node in V sends and receives exactly one signal. Let Ct,t ∈ RG denote that a communication sequence Ct,t satisfies Def.A.4.7. Further letCt0,t1 ∈ RG denote a sequence of ring graphs over the interval [t0, t1], that is Ct0,t1 ∈ RGimplies Ct0,t1 = {C`,` ∈ RG : ∀ ` ∈ {t0, t0 + 1, . . . , t1}}.The first theorem in this chapter will require the following class of communication sequences.Definition A.4.8 A communication sequence Ct0,t1 is a “fixed ring graph” (RGF) over thetime interval [t0, t1] if Ct0,t1 = {Ct0,t0 ∈ RG , Ct0,t0 = C`,` , ∀ ` ∈ {t0, t0 + 1, . . . , t1}}. Let Ct0,t1 ∈ RGF denote that the communication sequence Ct0,t1 is RGF (cf. Def.A.4.8).Also let Ct,t ∈ RGSC imply that the communication sequence Ct,t is both RG (cf. Def.A.4.7)and ntrcs (cf. Def.A.4.6). Similarly, let Ct0,t1 ∈ RGSC denote that Ct0,t1 = {C`,` ∈ RGSC :∀ ` ∈ {t0, t0 + 1, . . . , t1}}, and Ct0,t1 ∈ RGFSC denote that Ct0,t1 = {Ct0,t0 ∈ RGSC , Ct0,t0 =C`,` , ∀ ` ∈ {t0, t0 + 1, . . . , t1}}.112A.4. Preliminary DefinitionsDefinition A.4.9 A communication sequence Ct0,t1 contains a “k-cycle” if there exists a subsetV(k) ⊆ V such that Ct0,t1 contains a ntrcp from each node i ∈ V(k) to every node j ∈ V(k)−i,and furthermore |V(k)| = k. We let (k)ntrcs ⊆ Ct0,t1 denote that the communication sequence Ct0,t1 contains a “k-cycle”.Definition A.4.10 (Signal Notation) Assume C0,t ∈ RG and let a0,t = {t1, t2, . . . , tk} ∈P(0, t), where P(0, t) denotes the power set of {0, 1, . . . , t}. Next define i(∅) = i, and let i(tk)denote the node which at time tk sends a signal to node i. Similarly, let i(tk−1, tk) denote thenode which at time tk−1 sends a signal to node i(tk), that is i(tk)(tk−1) = i(tk−1, tk). Proceedingin this way, let i(a0,t) denote the node which at time t1 sends a signal to the node which at timet2 sends a signal to the node which at time t3 (and so on) until node i receives at time tk a signalfrom node i(tk). The Lemma A.6.2 in Sec.A.6 proves that for any C0,t ∈ RG there exists a unique node i(a0,t)for any a0,t ∈ P(0, t), thus the above definition is not ambiguous. The diagram Fig.A.1 illus-trates the notation presented in Def.A.4.10. Note that by repeating the iteration i(tk)(tk−1) =i(tk−1, tk) as stated in the above definition, we obtain the formula i({t`, t`+1, . . . , tk})(t`−1)= i({t`−1, t`, t`+1, . . . , tk}) for any t`−1 that renders {t`−1, t`, . . . , tk} ∈ P(0, t).( t = 2 ) i(2)i(1,2)i( O )i(0,1,2)( t = 1 )( t = 0 )i(0,1)i(0)i(1)( t = 0 ) i(0,2)( t = 0 )( t = 0 )( t = 1 )Figure A.1: A diagram of the Signal Notation in Def.A.4.10.113A.4. Preliminary DefinitionsDefinition A.4.11 (Input Set) For each t ∈ N and node i ∈ V the “input set” V˜i(t) is a setof 2t possibly repeated elements. Assume the initialization V˜i(0) = {i(∅)} for all i ∈ V, andassume that when a signal from node j = i(t) is received at time t at node i, the input set V˜i(t)is updated as follows,V˜i(t+ 1) = V˜i(t) ∪ V˜i(t)(t) . (A.100)If we let V˜ `i (t) denote the `th element in V˜i(t), the update (A.100) then allows us to definethe `th element in V˜i(t + 1) as i(P`(0, t)), where P`(0, t) denotes the `th set in P(0, t). Tosummarize, the update (A.100) implies V˜ `i (t) = i(P`(0, t− 1)) ∀ ` ∈ {1, . . . , 2t}.We next define the time at which the update (A.100) is “complete”. Recall that V˜ `i (t+1) =i(P`(0, t)) and P`(0, t) denotes the `th set in P(0, t).Definition A.4.12 The update (A.100) is “complete” at time t ∈ N if,⋃`∈{1,...,2t}V˜ `i (t) ⊇ V , ∀ i ∈ V . (A.101)We note that for any communication sequence C0,t, the time of completion of (A.100) isidentical to the time at which any node i ∈ V can flood the network V with the initial consensusvalue si(0). In other words, if i ∈ V˜j(t) for all j ∈ V−i, then the data initially held at node i willbe known at every node j ∈ V−i and thus the “flooding algorithm” (as described in e.g. [16])will no longer increase the knowledge set of any node after (A.100) is complete.Lastly, we define the “reduced input set” V¯i(t) as follows.Definition A.4.13 (Reduced Input Set) For each t ∈ N and each node i ∈ V, the “reducedinput set” V¯i(t) is an unordered set that is obtained from the input set V˜i(t) (cf. Def.A.4.11) asfollows,i(P`(0, t− 1)) ∈ V¯i(t) ⇔ ∃ ` s.t. i(P`(0, t− 1)) ∈ V˜i(t) ,|V¯i(t)| = |⋃a0,t−1∈P(0,t−1) i(a0,t−1)| .(A.102)Unlike the input set V˜i(t), the reduced input set V¯i(t) contains no repeated node values (inparticular, V¯i(t) contains no repeated node values within the set {i(a0,t−1) ∈ V : a0,t−1 ∈P(0, t − 1)}). Furthermore, by the signal notation (cf. Def.A.4.10) it follows that (A.102)implies j ∈ V¯i(t) if and only if (iff) node j has a trcp (cf. Def.A.4.3) to node i over the timeinterval [0, t − 1]. Also, for any communication sequence it follows from (A.100) and (A.102)that the reduced input set V¯i(t) is updated as,V¯i(t+ 1) = V¯i(t) ∪ V¯i(t)(t) . (A.103)114A.5. Fixed Ring Graph TheoremLastly, the Def.A.4.12 states that (A.101) is satisfied and thus from (A.102) the update (A.100)is completed at time t iff V¯i(t) = V for each node i ∈ V. That is, the update (A.103) will be“completed” at the same time as the update (A.100).With the above definitions, the first theorem in this chapter (Thm.A.5.1) will prove an(n− 1) upper bound on the time it takes the update (A.100) to be completed (cf. Def.A.4.12)given the relatively strict communication condition of C0,n−2 ∈ RGFSC (cf. Def.A.4.6 andDef.A.4.8). The subsequent corollary, Cor.A.6.1, proves a super-exponentially greater upperbound((n − 2)(n!) + 1), but allows for the weaker (albeit much lengthier) communicationcondition of C0,(n−2)(n!) ∈ RGSC (cf. Def.A.4.7 and Def.A.4.6). We then combine the bestfeatures of both results in Conj.A.7.1, which proves an (n − 1) upper bound given only thatC0,n−2 ∈ RGSC .A.5 Fixed Ring Graph TheoremIn this section we prove that for any fixed ring graph C0,n−2 ∈ RGFSC the update (A.100) willbe completed at time (n− 1).A.5.1 Statement of Theorem A.5.1.Theorem A.5.1 (RGF-Thm.) Assume that C0,n−2 ∈ RGF (cf. Def.A.4.8). Given thisparameterization of C0,n−2, it is both necessary and sufficient that C0,n−2 ∈ RGFSC for theupdate (A.100) to be completed at time (n−1). A.5.2 Proof of Theorem A.5.1.Proof of Thm.A.5.1. (Sufficiency)We can show sufficiency follows by induction. Note that in the statement of Thm.A.5.1 itis assumed that C0,n−2 ∈ RGFSC , thus by lemma A.5.3 we can WLG consider C`,` to satisfy(A.108) for each ` ∈ {0, n − 2}. This standardized form will be denoted as C0,n−2 ∈ RGFSCfor convenience.Base Case: At the initial time t = 0 the reduced input set V¯i(t) (cf. Def.A.4.13) of eachnode i ∈ V satisfies,V¯i(t) = {i, i+ 1, . . . ,min{n, t+ i}}⋃{1, 2, . . . , t+ i−min{n, t+ i}} . (A.104)The base case holds because Def.(A.4.11) states the initial condition V˜i(0) = {i(∅)} for each115A.5. Fixed Ring Graph Theoremi ∈ V, where we recall the convention i(∅) = i. From (A.102) it follows that V¯i(0) = {i} for alli ∈ V. The latter implies (A.104) when t = 0 since min{n, t+i} = i and t+i−min{n, t+i} = 0.Inductive Assumption: At any time t ∈ {0, 1, . . . , n − 2} the reduced input set V¯i(t) ofeach node i ∈ V satisfies (A.104).Inductive Step: From Lemma A.5.3 we can WLG assume C0,n−2 ∈ RGFSC , thus from(A.108) and the update (A.103) it follows that,V¯i(t+ 1) ={V¯i(t)⋃V¯i+1(t) if i ∈ V−nV¯i(t)⋃V¯1(t) otherwise.(A.105)We consider the set of i ∈ V−n first. Note that for all i ∈ V−n we have min{n, t + i} ≥min{n, t+ i+ 1}− 1, and thus t+ i−min{n, t+ i} ≤ t+ i+ 1−min{n, t+ i+ 1}. Next observethat min{n, t+ i} ≤ min{n, t+ i+ 1}. Combining the latter two inequalities we then have by(A.105),V¯i(t+ 1) = V¯i(t)⋃V¯i+1(t)= {{i, i+ 1, . . . ,min{n, t+ i}} ∪ {1, 2, . . . , t+ i−min{n, t+ i}}}⋃{{i+ 1, . . . ,min{n, t+ i+ 1}} ∪ {1, 2, . . . , t+ i+ 1−min{n, t+ i+ 1}}}= {i, i+ 1, . . . ,min{n, t+ 1 + i}} ∪ {1, 2, . . . , t+ 1 + i−min{n, t+ 1 + i}} .(A.106)The last line of (A.106) implies the inductive step for all i ∈ V−n.Next we consider the i = n case. If i = n then min{n, t + i} = n for all t ≥ 0. Also notethat when i = 1 we have min{n, t + i} = t + 1 for all t ≤ n − 2. Utilizing (A.105), the abovetwo observations lead to the following expression for V¯n(t+ 1) for any t ≤ n− 2,V¯n(t+ 1) = V¯n(t)⋃V¯1(t)= {{n} ∪ {1, . . . , t}}⋃{{1, . . . , t+ 1} ∪ {∅}}= {n} ∪ {1, . . . , t+ 1} .(A.107)The last line of (A.107) implies the inductive step for i = n. The condition (A.104) thus holds forall i ∈ V and t ∈ {0, 1, . . . , n−1}. At t = n−1 we have min{n, t+ i} = n for all i ∈ V, it followsfrom (A.104) that |V¯i(n− 1)| = n and thus the update (A.101) is complete (cf. Def.A.4.12) att = (n− 1). Proof of Thm.A.5.1. (Necessity)The proof of necessity follows by contradiction. Assume that C0,n−2 ∈ RGF and C0,n−2 /∈RGFSC . If C0,n−2 /∈ RGFSC then by Def.A.4.6 and Def.A.4.9 we have (k)ntrcs ⊆ C`,` for116A.5. Fixed Ring Graph Theoremeach ` ∈ {0, 1, . . . , n − 2} and some k < n. From the proof of sufficiency as given above itfollows that |V¯i(n − 1)| ≤ k < n and thus the update (A.101) is not complete (cf. Def.A.4.12)at t = (n− 1) .A.5.3 Supporting Lemmas for Theorem A.5.1.Lemma A.5.2 For any communication sequence Ct,t ∈ RGSC (cf. Def.A.4.6 − A.4.7), itfollows that Ct,t does not contain any k-cycles (cf. Def. A.4.9) for all k ∈ {1, . . . , n− 1}. Proof of Lemma A.5.2.The result follows from a proof by contradiction. Assume Ct,t ∈ RGSC and that Ct,t containsa k-cycle (cf. Def. A.4.9) for some k ∈ {1, . . . , n − 1}. Next observe that by Def.A.4.6 thatCt,t ∈ RGSC implies that Ct,t contains a ntrcp from each node i ∈ V to every node j ∈ V−i.If there existed a subset V(k) ⊂ V such that Ct,t contains a ntrcp from each node i ∈ V(k) toevery node j ∈ V(k)−i, then there exists a subset of nodes V \ V(k) to which there could exista ntrcp from some node i ∈ V(k) only if node i transmitted two (or more) signals (one signalbeing transmitted to the receiving node contained in V(k), and one signal being transmitted tothe node contained in V \ V(k)). The Ct,t ∈ RGSC condition (cf. Def.A.4.7) requires that eachnode transmits one and only one signal, thus we find a contradiction for any Ct,t ∈ RGSC thatcontains a k-cycle for any k ∈ {1, . . . , n−1}. Lemma A.5.3 For any communication sequence C0,t ∈ RGFSC we can without loss of gener-ality (WLG) assume that C`,` is expressed as follows for each ` ∈ {0, 1, . . . , t},C`,` = {S1 2, S2 3, S3 4, . . . , S(n−1) n, Sn 1} (A.108)where we have omitted the time indices but it is understood each signal is sent and received attime `. Proof of Lemma A.5.3.For any C0,t ∈ RGFSC each C`,` ∈ RGSC does not vary over the time interval ` ∈{0, 1, . . . , t}. By Def.A.4.6−A.4.7, the following condition is sufficient for C`,` ∈ RGFSC ,C`,` = {Si1 i2 , Si2 i3 , Si3 i4 , . . . , Sin−1 in , Sin i1} , ik ∈ V \ {∪k−1r=1ir} , ∀ k ∈ V (A.109)117A.6. Corollary A.6.1.where we have omitted the time indices but it is understood each signal is sent and receivedat time `. The condition (A.109) can also be proven necessary for any C`,` ∈ RGFSC byobserving that if any signal is added or deleted then C`,` /∈ RG and thus Def.A.4.7 is notsatisfied. Similarly, if the receiving node j(i) of any node i and transmitting node r(`) of anynode ` ∈ V−j(i) are interchanged, then there exists a k-cycle (cf. Def.A.4.9) for some k < n, andthus by lemma A.5.2 the communication sequence C`,` is not ntrcs (cf. Def. A.4.6). Conversely,if the receiving node j(i) of any node i and transmitting node r(`) of the node ` = j(i) areinterchanged, then there is no change in the condition (A.109) because the actual values of eachik ∈ V are arbitrary under the constraint ik ∈ V \ {∪k−1r=1ir} , ∀ k ∈ V, as is stated in (A.109).We have shown that (A.109) is both necessary and sufficient for C`,` ∈ RGFSC . Besides theconstraint ik ∈ V\{∪k−1r=1ir} , ∀ k ∈ V as stated in (A.109), the values of each ik ∈ V are arbitraryand thus we can assume WLG that (A.108) holds for all C`,` ∈ RGFSC . A.6 Corollary A.6.1.In this section we prove a corollary to Thm.A.5.1. That is for any sequence of ring graphsC0,(n−2)(n!) ∈ RGSC the update (A.100) will be completed at time((n− 2)(n!) + 1).A.6.1 Statement of Corollary A.6.1.Corollary A.6.1 (Extended RGF-Thm.) For any C0,(n−2)(n!) ∈ RGSC the update (A.100)will be completed at time((n−2)(n!)+1). A.6.2 Proof of Corollary A.6.1.Proof of Cor.A.6.1.The Thm.A.5.1 implies that for any C0,0 ∈ RGSC and k ∈ N we have |V¯i(1)| ≥ 2 for alli ∈ V. Next observe that lemma A.6.3 implies any communication sequence C1,k(n!) ∈ RGSC willnecessarily contain at least k identical ring graphs C`,` ∈ RGSC among the set {C`,` ∈ C1,k(n!) :` ∈ {1, . . . , k(n!)}}, and thus from Thm.A.5.1 for any C0,k(n!) ∈ RGSC we have |V¯i(k(n!)+1)| ≥k + 2 for all i ∈ V and any k ∈ {1, . . . , n− 2}. It then follows that for any C0,(n−2)(n!) ∈ RGSCthe reduced input set of each node i ∈ V will satisfy |V¯i((n − 2)(n!) + 1)| = n and thus theupdate (A.100) is completed at time((n−2)(n!)+1). 118A.7. Conjecture A.7.1.A.6.3 Supporting Lemmas for Corollary A.6.1.Lemma A.6.2 For any communication sequence C0,t ∈ RG (cf. Def.A.4.7) there exists oneand only one node i(a0,t) for any i ∈ V and sequence a0,t ∈ P(0, t). Proof of Lemma A.6.2.By Def.A.4.7 each node receives exactly one signal at each time instant, thus if i(xk) existsthen it must be unique. Likewise if i(xk−1, xk) exists then it must be unique. Iterating thisargument we find that if i(a0,t) exists then it must be unique. Next observe that by Def.A.4.7each node in V sends exactly one signal at each time instant, thus i(a0,t) is contained in V andhence exists. Lemma A.6.3 For any communication sequence Ct,t ∈ RG (cf. Def.A.4.7) there exists (n!)possible communication sequences. Proof of Lemma A.6.3.For any Ct,t ∈ RG, the Lemma A.6.2 implies that at the time instant t each node i ∈ Vwill send exactly one signal, and this signal will be received by a unique node. For each nodei ∈ V let this unique node be denoted j(i). Let us first consider node 1, for which there exists npossible nodes j(1) ∈ V that may receive the signal from node 1. Next consider node 2, for which(since j(i) is unique) there exists n− 1 possible nodes j(2) ∈ V−j(1) that may receive the signalfrom node 2. Proceeding in this way, we find that for node k there exists n−k+1 possible nodesj(k) ∈ V \ {∪k−1r=1j(r)} that may receive the signal from node k. Besides the constraint j(k) ∈V\{∪k−1r=1j(r)}, each choice of the receiving node j(k) is independent from the previous choices ofthe receiving nodes, we can thus multiply the number of possible nodes that each node can senda signal to and thereby obtain (n!) as the total number of possible communication sequences forany Ct,t ∈ RG. A.7 Conjecture A.7.1.In this section we state a conjecture that combines the best features of Thm.A.5.1 and Cor.A.6.1.That is for any sequence of ring graphs C0,(n−2) ∈ RGSC the update (A.100) will be completedat time (n− 1).119A.8. Extensions to Time-Delayed and Asynchronous Communication SequencesA.7.1 Statement of Conjecture A.7.1.Conjecture A.7.1 (n-Thm.) For any communication sequence C0,n−2 ∈ RGSC , the update(A.100) will be completed at time (n−1). We note the above conjecture implies a super-exponential reduction in the completion timeof (A.100) as compared to Cor.A.6.1.A.8 Extensions to Time-Delayed and AsynchronousCommunication SequencesThis section generalizes Cor.A.6.1 and Conj.A.7.1 to time-delayed and asynchronous communi-cation sequences. The proof of both of these generalizations relies only on lemma A.8.3, whichis proven in Sec.A.8.4, subsequent to the statement of the generalizations in Sec.A.8.1.A.8.1 Statement of Extensions to Cor.A.6.1 and Conj.A.7.1For the following two results we assume there exists an upper bound denoted SCt on the timethat any sub-communication sequence Ct,t+SCt ⊆ C0,t is ntrcs (thus implying a “uniformly”ntrcs communication sequence). Given such an upper bound, the Cor.A.6.1 and Conj.A.7.1each imply an upper bound ¯SCt on time until any sub-communication sequence Ct,t+ ¯SCt ⊆ C0,tis trcs (thus implying a “uniformly” trcs communication sequence), as we now will state andthen prove in Sec.A.8.4.Theorem A.8.1 Assume Ct,t+SCt ∈ ntrcs for all t ∈ N and that C0,(n−2)n!(SCt+1)+SCt ∈RGSC the update (A.100) will be completed at time(((n−2)n!+1)(SCt+1)). The following conjecture implies a super-exponential reduction in the completion time of(A.100) as compared to Thm.A.8.1.Conjecture A.8.2 Assume Ct,t+SCt ∈ ntrcs for all t ∈ N, then the update (A.100) will be com-pleted at time((n−2)(SCt+1)+1). In Fig.A.2 below we illustrate the rationale behind Thm.A.8.1. The Fig.A.3 provides anillustration of the result stated in Conj.A.8.2.A.8.2 Proof of Theorem A.8.1.Proof of Thm.A.8.1.120A.8. Extensions to Time-Delayed and Asynchronous Communication Sequences{ { { { { {0 SCtSCt+1 2SCt+1... ... ((n-2)n!+1)(SCt+1)...|V |=1 |V |=2 |V |=2 |V |=n-1 |V |=n-1 |V |=ni i i i i in!(SCt+1)+SCtn!(SCt+1) ((n-3)n!+1)(SCt+1)(n-2)n!(SCt+1)...Figure A.2: A diagram of Thm.A.8.1.The updates (A.100) and (A.102) implies the reduced input set V¯ (t) is a non-decreasing func-tion of t. The lemma A.8.3 in Sec.A.8.4 proves that any trcp implies a ntrcp and thus assumingCor.A.6.1, the Thm.A.8.1 follows immediately using lemmaA.8.3. A.8.3 Proof of Conjecture A.8.2.Proof of Conj.A.8.2.The updates (A.100) and (A.102) implies the reduced input set V¯ (t) is a non-decreasing func-tion of t. The lemma A.8.3 in Sec.A.8.4 proves that any trcp implies a ntrcp and thus assumingConj.A.7.1, the Conj.A.8.2 follows immediately using lemmaA.8.3. . . . { |V |=3|V |=1 |V |=2 |V |=n-2 |V |=n. . . |V |=n-10 SCt+12SCt+12(SCt+1) (n-2)(SCt+1) (n-1)(SCt+1)(n-1)SCt+n-2(n-2)SCt+n-3. . . { { { {i i i i iSCtiFigure A.3: A diagram of Conj.A.8.2.A.8.4 Supporting Lemmas for Thm.A.8.1 and Conj.A.8.2Lemma A.8.3 Consider any communication sequence C0,t within which each signal is receivedat the same time as its transmission, that is,Sij(t0, t1) ∈ C0,t ⇔ t0 = t1 = ` ∈ [0, t] . (A.110)121A.8. Extensions to Time-Delayed and Asynchronous Communication SequencesNext consider the asynchronous and time-delayed communication sequence Cˆ(C0,t) which con-tains every signal in C0,t yet with possibly time-delayed and asynchronous signals, that is,Sij(t0, t1) ∈ C0,t ⇔(Sij(t0, t1) ∈ Cˆ(C0,t) , 0 ≤ t0 ≤ t1 ≤ t). (A.111)If C0,t contains a ntrcp from node j to node i then so does Cˆ(C0,t). Proof of Lemma A.8.3.We observe that any pair of asynchronous time-delayed signals Sji(t1, t2) and Sqj(t3, t4) willimply a trcs from node j to node q iff t2 ≤ t3. Since any trcp from node j to node i impliesa ntrcp from node j to node i, it follows that any communication sequence Cˆ(C0,t) as defined in(A.111) will imply a ntrcp from node j to node i if C0,t does. 122
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Average-consensus in a two-time-scale Markov system
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Average-consensus in a two-time-scale Markov system Topley, Kevin James 2014
pdf
Page Metadata
Item Metadata
Title | Average-consensus in a two-time-scale Markov system |
Creator |
Topley, Kevin James |
Publisher | University of British Columbia |
Date Issued | 2014 |
Description | In a spatially distributed network of sensors or mobile agents it is often required to compute the average of all local data collected by each member of the group. Obtaining the average of all local data is, for example, sufficient to conduct robust statistical inference; identify the group center of mass and direction of motion; or evenly assign a set of divisible tasks among processors. Due to the spatial distribution of the network, energy limitations and geographic barriers may render a data fusion center to be infeasible or highly inefficient in regard to averaging the local data. The problem of distributively computing the network average - also known as the average-consensus problem - has thus received significant attention in the signal processing and control research communities. Efforts in this direction propose and study distributed algorithms that allow every agent in the network to compute the global average via communication with only a subset of fellow agents. The thesis will present a framework in which to analyze distributed algorithms for both dynamic and static consensus formation. For dynamic consensus we consider a two-time-scale Markov system wherein each sensor node can observe the state of a local Markov chain. Assuming each Markov chain has a stationary distribution with slowly switching regime, we show that a local stochastic approximation algorithm in conjunction with linear distributed averaging can imply that each node estimate converges weakly to the current average of all stationary distributions. Each node can thus track the average of all stationary distributions, provided the regime switching is sufficiently slow. We then consider static consensus formation when the inter-node communication pattern is a priori unknown and signals possess arbitrarily long time-delays. In this setting, four distributed algorithms are proposed and shown to obtain average-consensus under a variety of general communication conditions. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2014-12-02 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0165567 |
URI | http://hdl.handle.net/2429/51262 |
Degree |
Doctor of Philosophy - PhD |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-02 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2015_february_topley_kevin.pdf [ 1.55MB ]
- Metadata
- JSON: 24-1.0165567.json
- JSON-LD: 24-1.0165567-ld.json
- RDF/XML (Pretty): 24-1.0165567-rdf.xml
- RDF/JSON: 24-1.0165567-rdf.json
- Turtle: 24-1.0165567-turtle.txt
- N-Triples: 24-1.0165567-rdf-ntriples.txt
- Original Record: 24-1.0165567-source.json
- Full Text
- 24-1.0165567-fulltext.txt
- Citation
- 24-1.0165567.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0165567/manifest