ACCURATE AND EFFICIENT NETWORK MONITORING ON MESH TOPOLOGIES VIA NETWORK CODING by JIAQI GUI B.E., Information Engineering, Shanghai Jiao Tong University, China, 2008 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) June 2010 © Jiaqi Gui, 2010 iiAbstractAccurate and eﬃcient measurement of network-internal characteristics is critical for man-agement and maintenance of large-scale networks. In this thesis, we propose a linearalgebraic network tomography (LANT) framework for active inference of link loss rateson mesh topologies via network coding. Probe packets are transmitted from the sourcesto the destinations along a set of paths. Intermediate nodes linearly combine the re-ceived probes and transmit the coded probes using pre-determined coding coeﬃcients.Although a smaller probe size can reduce the bandwidth usage of the network, the infer-ence framework is not valid if the probe size falls below a certain threshold. To this end,we establish a tight lower bound on probe size which is necessary for establishing themappings between the contents of the received probes and the losses on the diﬀerent setsof paths. Then, we develop algorithms to ﬁnd the coding coeﬃcients such that the lowerbound on probe size is achieved. Furthermore, we propose a linear algebraic approachto developing consistent estimators of link loss rates, which converge to the actual lossrates as the number of probes increases. We show that using the LANT framework, theidentiﬁability of a link, which only depends on the network topology, is a necessary andsuﬃcient condition for the consistent estimation of its loss rate. Simulation results showAbstract iiithat the LANT framework achieves better estimation accuracy than the belief propaga-tion (BP) algorithm for large number of probe packets.ivContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiContents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Network Coding Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.3 Motivations and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 131.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162 The LANT Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17Contents v2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.2 Probe Coding Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222.2.1 Lower Bound on Probe Size . . . . . . . . . . . . . . . . . . . . . 222.2.2 Algorithms for Finding a Valid Probe Coding Scheme . . . . . . . 25Constructing Auxiliary Trees . . . . . . . . . . . . . . . . . . . . 26Selecting Coding Coeﬃcients . . . . . . . . . . . . . . . . . . . . . 26Designing Probe Packets . . . . . . . . . . . . . . . . . . . . . . . 282.3 Linear Algebraic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 322.3.1 Least-squares Solutions . . . . . . . . . . . . . . . . . . . . . . . . 322.3.2 Method of Normal Equations . . . . . . . . . . . . . . . . . . . . 392.3.3 Method of Row Selection . . . . . . . . . . . . . . . . . . . . . . . 403 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 423.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 514.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54viList of Figures1.1 The Butterﬂy network. Sources s1 and s2 multicast information x1 and x2to receivers r1 and r2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52.1 A directed acyclic graph withV={s,r,1,2,3,4}andE ={e1,e2,...,e7}.The set of monitored end-to-end paths P = {P1,P2,P3}, where P1 ={e1,e2,e5,e7}, P2 = {e1,e2,e4,e6,e7}, and P3 = {e1,e3,e6,e7}. For linke2 = (1,2), we haveP(e2) ={P1,P2}. . . . . . . . . . . . . . . . . . . . . 182.2 A directed acyclic graph with two end links e6 and e7. . . . . . . . . . . . 252.3 Two auxiliary trees Te6 and Te7, corresponding to end links e6 and e7,respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313.1 Directed acyclic graphs with diﬀerent number of sources. (a) One source(node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) andone receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver(node 9). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433.2 The RMSE of LA using the methods of normal equations (LA-NE) androw selection (LA-RS), versus the number of probe batches n. . . . . . . 46List of Figures vii3.3 The RMSE of the BP algorithm and the LA-RS algorithm, versus thenumber of probe batches n, for diﬀerent average success rate αave. . . . . 473.4 The RMSE of the BP algorithm and the LA-RS algorithm, versus theaverage success rate αave (n = 500). . . . . . . . . . . . . . . . . . . . . . 473.5 The RMSE of the LA-RS algorithm, versus the number of probe batchesn, for diﬀerent number of sources (αave = 0.8). . . . . . . . . . . . . . . . 483.6 The RMSE of the LA-RS algorithm, versus the average success rate αave,for diﬀerent number of sources (n = 500). . . . . . . . . . . . . . . . . . . 493.7 The RMSE of the LA-RS algorithm, versus the number of probe batchesn, for networks of diﬀerent sizes (αave = 0.9). . . . . . . . . . . . . . . . . 50viiiList of AcronymsQoS Quality of ServiceMVWA Minimum Variance Weighted AverageEM Expectation-MaximizationLEND Least-Biased End-to-end Network DiagnosisMILS Minimal Identiﬁable Link SequenceLPRs Link Pass RatiosPPRs Path Pass RatiosLA Linear AlgebraLANT Linear Algebraic Network TomographyBP Belief PropagationML Maximum LikelihoodRMSE Root Mean Square ErrorList of Acronyms ixNE Normal EquationsRS Row SelectionxList of Symbols|·| Cardinality of a setn Number of probe batchesV Set of nodesS Set of source nodesR Set of receiver nodesE Set of linksEI Set of identiﬁable linksEV Set of virtual linksER Set of end linksP Set of monitored end-to-end pathsP(e) Set of end-to-end paths that include link eP Power set ofP,|P|= 2|P|List of Symbols xiG= (V,E) Directed acyclic graphGe Subgraph consisting of the links and nodes in setP(e)G Set of subgraphs with overlapping linksTe Auxiliary tree corresponding to end link e and subgraphGeLe Set of leaf nodes inTeui(v) The ith tree node that corresponds to the non-receiver nodev in the directed graphεi(e) The ith tree link that corresponds to link e in the directedgraphℓ Probe size, number of bits in each probe packetq = 2ℓ Alphabet size of a probe packetM = (mi,j)|P|×|E| Path-link matrixM = (mi,j)|P|×|EI∪EV | Type 1 modiﬁed path-link matrixfM = (emi,j)(|P|−1)×|EI∪EV | Type 2 modiﬁed path-link matrixM(|P|) Auxiliary matrix of type 2 modiﬁed path-link matrix withpath setPαj Actual link success rate of jth link inEI∪EVList of Symbols xiiβi Actual path success rate of ith path inPθi Actual path set success rate of ith path set in P\{∅}a = (aj)|EI∪EV |×1 Column vector, where aj = logαjb = (bi)|P|×1 Column vector, where bi = logβic = (ci)(|P|−1)×1 Column vector, where ci = logθiµ =|P|−1 Number of rows in fMν =|EI∪EV| Number of columns in fMfM1 Reduced ν×ν path-link matrix from fM after row selectionc1 Reduced ν×1 column vector from c after row selectionxiiiAcknowledgementsFirst, I would like to express my deepest gratitude to my supervisor, Dr. Vincent Wong,for his patient guidance, constant encouragement, and excellent advice throughout mygraduate study. Through our weekly meeting, we generate new ideas, stimulate creativityand dig into technology details. Without his invaluable help, this work would not bepossible.A special thanks goes to my colleague, Vahid Shah-Mansouri, who provided me withprecious assistance during the completion of this thesis. I am also thankful to all mycolleagues in Dr. Wong’s group: Amir Hamed Mohsenian-Rad, Vahid Shah-Mansouri,Man Hon Cheung, Keivan Ronasi, Ehsan Vahedi, Wei Bao, Xiaolei Hao, Wenbo Shi, andJinbiao Xu as well as other friends in the data communications group, for sharing theirexperiences and knowledge during the time of my study.Finally, I take this opportunity to express my profound gratitude to my belovedparents for their understanding, support, and endless love during my study in Canada.To them I dedicate this thesis.This research is supported by the Natural Sciences and Engineering Research Council(NSERC) of Canada under grant number STPSC 356767.1Chapter 1IntroductionAccurate and eﬃcient measurement of network-internal characteristics is critical for man-agement and maintenance of large-scale networks. With accurate and timely performanceestimates, more eﬃcient traﬃc control protocols and dynamic routing algorithms can bedesigned. Quality-of-service (QoS) guarantees can be achieved if the available band-width can be gauged; the resulting service level agreements can be veriﬁed. Detectinganomalous or malicious behavior becomes a more achievable task [1].Although network administrators can monitor local traﬃc conditions and detect con-gestion points in small-scale networks, most networks are not completely isolated. Theuser-perceived performance of a network thus depends heavily on the performance of theinternetwork. The traditional approach for characterizing network performance is basedon detailed queueing models at the individual router level, which requires access to a widerange of routers to obtain link-level statistics. However, the routers are operated by diﬀer-ent companies or service providers, which makes it diﬃcult to collect detailed informationat individual devices. Alternatively, we can make useful end-to-end measurements thatdo not require widely cooperation from internal network devices. Subsequently, based onthe end-to-end measurements, we can apply inference techniques to extract the hiddenChapter 1. Introduction 2information of interest.Broadly speaking, large-scale network inference involves estimating network perfor-mance parameters based on traﬃc measurements at a limited subset of the nodes. Vardiwas one of the ﬁrst researchers to rigorously study this type of problem and he coinedthe term network tomography [2] due to the similarity between the inference of networkcharacteristics and medical tomography. Three forms of network tomography have beenaddressed in the recent literature: (1) link-level parameter estimation based on end-to-end, path-level traﬃc measurements [3, 4], (2) sender-receiver path-level traﬃc intensityestimation based on link-level traﬃc measurements, and (3) the inference of networktopology [5, 6]. Characterizing these parameters is critical for detecting congestion,faults and other anomalous behavior, ensuring compliance of service-level agreementswith users, and management of overlay networks.In link-level parameter estimation, the end-to-end measurements usually consist of thenumber of probe or data packets transmitted and received between the source and thereceiver nodes or the delay between packet transmissions and receptions. The objective isto estimate the loss rate or the queuing delay of each link. Dropped packets on a link areusually due to overload of the ﬁnite output buﬀer of one of the routers encountered whentraversing the link, but may also be caused by equipment downtime due to maintenanceor power failures. The end-to-end delay is due to both propagation delay processing delay,queuing delay, and transmission delay. As assumed by most literature, occurrences ofdropped packets and queuing delay is inherently random.Chapter 1. Introduction 3In path-level traﬃc intensity estimation, the measurements consist of the number ofprobe or data packets that transmitted through nodes in the network. In privately ownednetworks, the collection of such measurements is relatively straightforward. Based onthese measurements, the goal is to estimate how much traﬃc originated from a speciﬁednode and was transmitted to a speciﬁed destination. The combination of the traﬃcintensities of all the origin-destination pairs forms the origin-destination traﬃc matrix.Both the node-level measurements and the parameter to be estimated are inherentlyrandom.In the inference of network topology, the measurements usually consist of the numberof probe or data packets transmitted and received between the source and the receivernodes or the delay between packet transmissions and receptions. Some proposals requireclock synchronization while other more practical ones do not. The physical networktopology can be represented as a directed graph, where each vertex represents a physicaldevice such as a router or a switch and the edges correspond to the communication linksbetween those devices. Based on the end-to-end measurements, the logical topology canbe determined.Network tomography can be performed either in an active or passive manner. Activenetwork tomography refers to the case where probe packets are sent from the sources tothe receivers located on the periphery of the network [7–9]. Using the end-to-end mea-surements generated by probe packets transmitted and received between the source andthe receiver nodes, more informative and reliable path-level measurements are providedChapter 1. Introduction 4at the cost of utilizing additional network resources such as bandwidth and energy.On the contrary, passive network tomography reveals information from the existingdata traﬃc, so that it is more attractive for networks (e.g., wireless sensor networks) withlimited power supply and bandwidth constraints [10–13]. There is also an acceleratingtrend toward network security that will create a highly uncooperative environment foractive tomography. For example, ﬁrewalls designed to protect information may not allowrequests for routing information, special packet handling and other network transportprotocols required by many active tomography techniques. This has prompted investi-gations into passive-based traﬃc monitoring techniques.1.1 Network Coding BasicsNetworked systems arise in various communication contexts such as telephone networks,the public Internet, peer-to-peer networks, ad-hoc wireless networks, and wireless sensornetworks. An inherent premise behind the operation of all communication networks liesin the way information is treated. Recently, with the advent of network coding, the sim-ple but important observation was made that in communication networks, intermediatenodes are allowed not only forward but also process the incoming independent informa-tion ﬂows [14–16]. At the network layer, for example, intermediate nodes can performbinary addition of independent bitstreams, whereas, at the physical layer of optical net-works, intermediate nodes can superimpose incoming optical signals. In other words,data streams that are independently produced and consumed do not necessarily need toChapter 1. Introduction 5(a) Routing to r1 (b) Routing to r2 (c) Network CodingFigure 1.1: The Butterﬂy network. Sources s1 and s2 multicast information x1 and x2to receivers r1 and r2.be kept separate when they are transported throughout the network. Combining inde-pendent data streams can better tailor the information ﬂow to the network environmentand accommodate the demands of speciﬁc traﬃc patterns. This shift in paradigm isexpected to revolutionize the way we manage, operate, and understand the organizationin networks, as well as to have a deep impact on a wide range of areas such as reliabledelivery, resource sharing, eﬃcient ﬂow control, network monitoring, and security [17].One essential beneﬁt of network coding is in terms of throughput when multicasting.The following simple example from [14] illustrates the basic concepts in network codingand gives a preliminary idea of the expected beneﬁts and challenges.Example 1: Fig. 1.1 depicts a communication network represented as a directedgraph where vertices correspond to terminals and edges correspond to channels. ThisChapter 1. Introduction 6example is commonly known in the network coding literature as the butterﬂy network.Assume we can send one bit per time slot through each channel. We have two sourcess1 and s2, and two receivers r1 and r2. Each source produces one bit per time slot whichwe denote by x1 and x2, respectively. If receiver r1 uses all the network resources byitself, it could receive information from both sources. Indeed, we could route the bitx1 from source s1 along the path {AD} and the bit x2 from source s2 along the path{BC,CE,ED}, as depicted in Fig. 1.1(a). Similarly, if the second receiver r2 uses all thenetwork resources by itself, it could also receive information from both sources. Indeed,we could route the bit x1 from source s1 along the path{AC,CE,EF}, and the bit x2from source s2 along the path{BF}as depicted in Fig. 1.1(b).Now assume that both receivers want to simultaneously receive the information fromboth sources. We then have a contention for the use of edge CE, since we assume thateach channel can only transmit one bit per time slot. Traditionally, information ﬂow wastreated like ﬂuid through pipes, and independent information ﬂows were kept separate.The simple but important observation made in the work by Ahlswede et al. is that wecan allow intermediate nodes in the network to process their incoming information ﬂows,and not just forward them. In Fig. 1.1(c), node C can take bits x1 and x2 and XOR themto create a third bit x3 = x1 + x2 which it can then send through edge CE (the XORoperation corresponds to addition over the binary ﬁeld). r1 receives {x1,x1 + x2}, andcan solve this system of equations to retrieve x2 by XORing x1 and x1 +x2. Similarly, r2receives{x2,x1 + x2}, and can solve this system of equations to retrieve x1 by XORingChapter 1. Introduction 7x2 and x1 + x2. The previous example shows that if we allow intermediate node in the network tocombine information streams and extract the information at the receivers, we can increasethe throughput when multicasting.Network coding can also be used to infer the loss rates of links in an overlay network.For conventional active tomography, packets are usually multicast to several receivers.After a suﬃciently large number of probe packets, shared links and their loss rates canbe inferred with reasonable accuracy. In such a setting, network coding can provideadditional ﬂexibility since probe packets are not only duplicated at branching points ofthe multicast tree, but may also be merged. If multiple senders transmit packets to asingle receiver, and these packets are combined within the network, it allows to infernetwork parameters in much the same way as multicasting from one sender to multiplereceivers [18]. Furthermore, if the network coding coeﬃcients (i.e., the speciﬁc way inwhich packets are combined at the nodes) are known in advance, the received codedprobe packets can provide additional information about which packets were lost in whichpart of the tree. By exploiting these features, it is possible to signiﬁcantly reduce thenumber of active probes and the bandwidth usage generated by these probes, and thusincrease bandwidth eﬃciency.Chapter 1. Introduction 81.2 Related WorkIn the area of loss tomography, extensive studies have been given to multicast tree topolo-gies. In [7], Caceres et al. developed a maximum-likelihood estimator for loss rates oninternal links based on losses observed by multicast receivers. It exploits the inherentcorrelation between such observations to infer the performance of paths between branchpoints in the tree spanning a multicast source and its receivers. The proposed methodrelies on the iterative approximation to identify the parameters that requires a long exe-cution time. In addition, the parameters identiﬁed by this method may not be the truevalues of those parameters since the iterative procedure may trap into a local maximum.In [19], Zhu et al. proposed an estimate that is based on the correlation between alink and its sibling links to identify the loss rate of the link. The proposed method, in-stead of using an iterative approach to approximate the maximum, employs a bottom-upapproach to identify the loss rates of the links of a network.In contrast to multicast techniques, unicast inference based on multicast tree topolo-gies can easily be performed on most networks. In [20], Duﬃeld et al. designed exper-iments based on the notion of transmitting stripes of packets (with no delay betweentransmission of successive packets within a stripe) to two or more receivers. The pur-pose of these stripes is to ensure that the correlation in receiver observations matchesas closely as possible what would have been observed if the stripe had been replaced bya multicast probe that followed the same paths to the receivers. In [10], Tsang et al.designed a measurement procedure for network loss inference based on end-to-end packetChapter 1. Introduction 9pair measurements. Back-to-back packet pairs are the two packets that are sent one afterthe other by the source, possible destined for diﬀerent receivers, but sharing a commonset of links in their paths.In [21], Padmanabhan et al. investigated the problem of identifying lossy links inthe interior of the Internet by passively observing the end-to-end performance of existingtraﬃc between a server and its clients. The key advantage of a passive approach is that itdoes not introduce additional traﬃc which might perturb the object of inference, i.e., thelink loss rates. The techniques depend only on knowing the number of lost and successfulpackets sent to each client. While the accuracy of link loss rate inference may conse-quently suﬀer, the techniques can still pinpoint the trouble spots in the network (e.g.,highly lossy links). They developed and evaluated three techniques for passive networktomography: random sampling, linear optimization, and Bayesian inference using Gibbssampling.To extend the existing multicast and unicast tomography approaches to generaltopologies, in [22], Bu et al. proposed an approach using multiple trees to cover a meshtopology and combine the inferred loss rates. They further proposed two algorithms toperform the link-level inferences. One, the minimum variance weighted average (MVWA)algorithm treats the trees separately and then average the results. The second, based onexpectation-maximization (EM) merges all of the measurements into one computation.However, this approach may have low bandwidth eﬃciency, since those links that are partof multiple trees would be traversed by multiple probe packets in each time slot, and thusChapter 1. Introduction 10create additional traﬃc. In addition, it may incur high monitoring cost, since it requiresa large number of receivers to be deployed in each multicast tree.The pioneering work in [14] showed that for multicast networks, if intermediate nodescan perform network coding, that is to perform simple local XOR-operations on incomingpackets, one can achieve the min-cut throughput of the network to each receiver. Recentstudies show that applying network coding in loss tomography can increase bandwidtheﬃciency [17, 23]. In a network which is capable of performing network coding in addi-tion to multicast, the intermediate nodes linearly combine incoming probe packets andforward the coded probe packets to the outgoing links according to pre-determined cod-ing coeﬃcients. Results in [18] show that for active monitoring using network coding,appropriate selection of the number and location of sources and receivers can aﬀect theaccuracy of estimation in general tree topologies. The work in [24] established a frame-work for loss tomography on mesh topologies. An orientation algorithm is proposed toﬁnd a directed acyclic graph from an undirected graph with selected sources. An exampleis illustrated in [18] such that each link in a mesh topology can be traversed in each timeslot by exactly one probe.In contrast to network coding with pre-determined coding coeﬃcients, randomizednetwork coding changes the fundamental connection between path and link loss prob-abilities such that new inference algorithms need to be developed. In [13], Lin et al.studied the loss inference problem in sensor networks with randomized network coding.As end-to-end data are not suﬃcient to compute link loss rates precisely, they proposedChapter 1. Introduction 11inference algorithms based on Bayesian principles to discover the set of highly lossy linksin sensor networks. The algorithms achieve high detection and low false-positive ratesin extensive simulations. In [25], Yao et al. studied passive network tomography in thepresence of network failures, under the setting of random linear network coding. Sev-eral sets of algorithms for topology estimation and failure detection are proposed undervarious setting of adversarial random failures.Toreduce monitoring cost, a set of end-to-end paths on mesh topologies only requires alimited number of sources and receivers. In [26], Mao et al. proposed a brief propagation(BP) algorithm, which is combined with the use of network coding in [24]. The BPalgorithm is a low complexity algorithmic framework for link loss monitoring based onthe recent modeling and computational methodology of factor graphs [27]. It iterativelyupdates the estimates of link losses upon receiving (or detecting the loss of) recentlysent packets. The algorithm exhibits good performance and scalability, and can be easilyadapted to diﬀerent statistical models of networking scenarios. In particular, due to itslow complexity, the algorithm is particularly suitable as a long-term monitoring facility.In [28], Zhao et al. proposed a least-biased end-to-end network diagnosis (LEND)system for inferring link-level properties like loss rate. They deﬁned a minimal identiﬁablelink sequence (MILS) as a link sequence of minimal length whose properties can beuniquely identiﬁed from end-to-end measurements. They designed eﬃcient algorithms toﬁnd all the MILSs and infer their loss rates for diagnosis. The LEND system works forany network topology and for both directed and undirected topologies. It gives highlyChapter 1. Introduction 12accurate estimates of the loss rates of MILSs and such diagnosis can be achieved withﬁne granularity and in near real-time even for reasonably large overlay networks. TheLEND system can also supplement existing statistical inference approaches and providesmooth tradeoﬀ between diagnosis accuracy and granularity.In [29], Sun et al. focused on the problem of ﬁnding the link pass ratios (LPRs) whenthe path pass ratios (PPRs) of a set of paths are given. They proved the problem ofﬁnding the maximum-likelihood estimation of LPRs given PPRs is NP-hard, and thenproposed a simple algorithm based on divide-and-conquer. It ﬁrst estimates the numberof faulty links on a path, then uses the global information to estimate assign LPRs tothe links. It requires a priori probability distribution function on link loss rates and anassumption that the majority of links being lossless.In [30], Chen et al. focused on overlay network monitoring, which enables distributedInternet applications to detect and recover from path outages and periods of degradedperformance within seconds. For an overlay network with n hosts, existing systems eitherrequireO(n2) measurements, and thus lack scalability, or can only estimate the latencybut not congestion or failures. They proposed an algebraic approach that selectivelymonitors k linearly independent paths that can fully describe all theO(n2) paths. Theloss rates and latency of these k paths can be used to estimate the loss rates and latencyof all other paths.Chapter 1. Introduction 131.3 Motivations and ObjectiveIn this thesis, we consider the problem of link loss tomography on mesh topologies.Although there are extensive studies of link loss inference on multicast tree topologies[7, 19, 31, 32], loss tomography on mesh topologies is still a challenging problem. Existingapproaches have not exploited the inherent information in the end-to-end observations.As a result, the linear system of link loss rates and path loss rates usually has a coeﬃcientmatrix with deﬁcient column rank1, which makes it diﬃcult to accurately infer the linkloss rates [30].In general, most of the previously proposed loss tomography approaches on meshtopologies in the literature have one or more of the following performance bottlenecks:(1) low bandwidth eﬃciency, (2) high monitoring cost, (3) estimation not being alwaysaccurate, and (4) requiring additional assumptions. In this thesis, we propose a linearalgebraic network tomography (LANT) framework for active inference of link loss rateson mesh topologies. To increase bandwidth eﬃciency and reduce monitoring cost, wesend probe packets along a set of end-to-end paths rather than multicast trees and applynetwork coding. To increase the estimation accuracy, we exploit the inherent correlationbetween the losses on the links and those on the diﬀerent sets of paths, which is capturedthrough network coding. We refer to probe packets and network coding schemes jointly1The column rank of an m×n matrix is the maximum number of linearly independent columns ofthe matrix. If the matrix has rank n, then it has full column rank; otherwise, the matrix has de cientcolumn rank.Chapter 1. Introduction 14as probe coding schemes. In our LANT framework, a valid probe coding scheme enablesus to establish the mappings between the contents of received probe packets and thelosses on the diﬀerent sets of paths. Using valid probe coding schemes, we obtain validend-to-end observations, based on which we can distinguish which paths have successfullytransmitted a probe and which paths have not. We also deﬁne link identiﬁability, a linkproperty that only depends on the network topology. For identiﬁable links, we developconsistent estimators that converge to the actual loss rates as the number of probesincreases. Since the number of all path sets can grow exponentially as the number oftotal paths increases, we selectively monitor a subset of all path sets (the method of rowselection), which are suﬃcient to infer the loss rates of all identiﬁable links.1.4 ContributionsThe main contributions of this thesis are as follows [33, 34]:• We establish a tight lower bound on probe size, which is necessary for valid probecoding schemes when network coding is applied. Then, we develop algorithmsto ﬁnd a valid probe coding scheme such that the lower bound on probe size isachieved.• We propose a linear algebraic (LA) approach to developing consistent estimatorsof link loss rates, which converge to the actual loss rates as the number of probesincreases. We combine the methods of normal equations and row selection with theChapter 1. Introduction 15LA approach, and analyze the computational complexity.• We prove that the identiﬁability of a link, which only depends on the networktopology, is a necessary and suﬃcient condition for the consistent estimation of itsloss rate, using the LANT framework.• Simulation results show that the LA approach using the method of row selectioncan eﬀectively decrease computational complexity without reducing estimation ac-curacy. Besides, the LA approach achieves better estimation accuracy than theBP algorithm, when the estimators converge. Although the eﬀect of the numberand location of sources on the accuracy can be negligible with relatively large suc-cess rates or suﬃcient probe batches, diﬀerent number and location of sources mayresult in diﬀerent number of identiﬁable links.The framework we present in this thesis is unique when compared to the prior workdone in the area of loss tomography using network coding. In terms of bandwidth eﬃ-ciency, the work in [24] establishes a loose lower bound on probe size for valid probe codingschemes, while the problem of ﬁnding coding coeﬃcients remains unexplored. Withouteﬃcient algorithms to ﬁnd coding coeﬃcients, the inference framework is incomplete andcannot be applied. We establish a tight lower bound and also develop algorithms to ﬁnda valid probe coding scheme such that the lower bound on probe size is achieved. Interms of inference approaches, the BP algorithm [26] only uses the information of thelosses on diﬀerent paths such that the estimation may not be accurate for networks withChapter 1. Introduction 16relatively high link loss rates. The work in [29] requires additional assumptions such asa priori probability distribution function and the majority of links being lossless. Thework in [28] only ﬁnd the loss rate of a minimal identiﬁable link sequence. In contrast,our LA approach does not need extra assumptions while it can still obtain additionaluseful information, the losses on the diﬀerent sets of paths. This information can only beobtained via probe coding schemes and cannot be achieved by routing probes in general.As a result, we obtain better estimation accuracy and obtain more identiﬁable links.1.5 Structure of the ThesisThe rest of this thesis is organized as follows. In Chapter 2, we present the LANTframework, including system model, probe coding schemes, and an LA approach forthe consistent estimation of link loss rates. Chapter 3 presents performance evaluation.Conclusions are given in Chapter 4.17Chapter 2The LANT FrameworkIn this chapter, we present we present the LANT framework, including system model,probe coding schemes, and an LA approach for the consistent estimation of link lossrates.2.1 System ModelWe model the network as a directed acyclic graphG= (V,E), consisting of a set of nodesVand a set of linksE. The node setVincludes routers and periphery devices where probepackets are sent and received. A link e = (v,v′)∈E denotes a directed communicationlink from node v to node v′. LetS andRdenote the set of source nodes and the set ofreceiver nodes, respectively. The set of monitored end-to-end paths is denoted byP. Apath P ∈P is a set of directed links from a source to a receiver. Let P(e) denote theset of paths that include link e. We deﬁne a path-link matrix M = (mi,j)|P|×|E|, whose|P|rows correspond to the|P|paths and the|E|columns correspond to the|E|links, asfollows: The element mi,j is equal to 1 if the ith path in setP includes the jth link in setE, and is equal to 0 otherwise. As an example, the directed acyclic graph in Fig. 2.1 hasChapter 2. The LANT Framework 18Figure 2.1: A directed acyclic graph withV ={s,r,1,2,3,4}andE ={e1,e2,...,e7}.The set of monitored end-to-end paths P = {P1,P2,P3}, where P1 ={e1,e2,e5,e7}, P2 = {e1,e2,e4,e6,e7}, and P3 = {e1,e3,e6,e7}. For linke2 = (1,2), we haveP(e2) ={P1,P2}.three paths from source s to receiver r. Its path-link matrix is a 3 by 7 binary matrix asshown below:M =e1 e2 e3 e4 e5 e6 e7P1 1 1 0 0 1 0 1P2 1 1 0 1 0 1 1P3 1 0 1 0 0 1 1. (2.1)Given a directed acyclic graphG= (V,E) and a set of monitored end-to-end pathsP,a link e∈Eis called identiﬁable, if for each link pair (e,e′) where e′∈E\{e}, there existsat least one path inP that includes only one of the two links, i.e.,P(e)̸=P(e′). As inFig. 2.1, links e2,e3,...,e6 are identiﬁable links, while links e1 and e7 are non-identiﬁablelinks since P(e1) = P(e7). We notice that the identiﬁability of a link depends only onthe network topology.The following proposition shows that the identiﬁability of a link is a necessary condi-Chapter 2. The LANT Framework 19tion for the estimation of its loss rate.Proposition 1 The loss rate of a link can be estimated only if the link is an identiﬁablelink.Proof: We prove it by contradiction. Suppose there exists a link pair (e,e′), wheree,e′∈E, such that all paths inP include either both or none of them, i.e.,P(e) =P(e′).If a probe packet is being dropped on either link e or e′, the same end-to-end observation(probe packets with the same contents) will be obtained in either case. Therefore, wecannot diagnose on which link the loss of probe packet occurs, and it is not possible toestimate the loss rate of these links. We divide the set of non-identiﬁable links into several groups, where each groupcontains a set of links that are included in the same set of paths. We refer to each groupas a virtual link. As in Fig. 2.1, sinceP(e1) =P(e7), we refer to e1 and e7 as one virtuallink ev1. Let EI and EV denote the set of identiﬁable links and the set of virtual links,respectively. We haveEI ={ev1}andEV ={e2,e3,...,e6}. Note thatEI∩EV = ∅.Thus, for each link e∈EI ∪EV , we haveP(e)̸=P(e′) for all e′ ∈EI ∪EV \{e}. Weﬁx the order of elements inEI∪EV . Accordingly, we deﬁne a modiﬁed path-link matrixM = (mi,j)|P|×|EI∪EV | as follows: The element mi,j is equal to 1 if the ith path in setPincludes the jth link in setEI ∪EV , and is equal to 0 otherwise. We refer to M as type1 modiﬁed path-link matrix. The type 1 modiﬁed path-link matrix for the graph in Fig.Chapter 2. The LANT Framework 202.1 is shown below:M =ev1 e2 e3 e4 e5 e6P1 1 1 0 0 1 0P2 1 1 0 1 0 1P3 1 0 1 0 0 1. (2.2)We model the loss of packets on diﬀerent links by a set of mutually independentBernoulli processes. Losses are therefore spatial and temporal independent. This modelis commonly used in the literature [7–9, 19, 31, 32] for network tomography. We deﬁneαj ∈(0,1] as the link success rate of the jth link in setEI∪EV , which is the probabilitythat a packet can be successfully transmitted on the jth link. Thus, 1−αj denotesthe loss rate of the jth link in set EI ∪EV . Moreover, we deﬁne βi ∈ (0,1] as the pathsuccess rate of the ith path in setP, which is the probability that a probe packet can besuccessfully transmitted on the ith path in setP.Unlike data packets, probe packets would not be retransmitted if being dropped.Thus, we have|EI∪EV |∏j=1(αj)mi;j = βi, i = 1,...,|P|. (2.3)Taking logarithm on both sides of (2.3), we can reformulate it as linear equations:|EI∪EV |∑j=1mi,j logαj = logβi, i = 1,...,|P|, (2.4)where logαj and logβi are the variables of linear equations. Setting aj = logαj andbi = logβi, we have|EI∪EV |∑j=1mi,jaj = bi, i = 1,...,|P|. (2.5)Chapter 2. The LANT Framework 21We deﬁne two column vectors a = (aj)|EI∪EV |×1, and b = (bi)|P|×1. The system canbe represented in the matrix form asMa = b. (2.6)Equation (2.6) shows the relation between the path and link success rates. The objectiveof loss tomography is to infer the link loss rates using end-to-end observations (i.e., thenumber and the contents of the received probe packets). Let ˆa and ˆb denote the estimatorof a and b, respectively. By measuring the path success rates, we can estimate ˆb while ˆaremains unknown. Thus, equation (2.6) becomes a system of|P|equations with|EI∪EV|unknowns as:Mˆa = ˆb. (2.7)In most cases, the number of identiﬁable and virtual links is greater than the number ofpaths. That is,|EI∪EV|>|P|. Thus, (2.7) is under-determined. We propose the LANTframework to obtain additional useful information and determine ˆa.The LANT framework is composed of two phases. In the ﬁrst phase, we apply networkcoding and perform end-to-end measurements on the set of pathsP. n batches of probepackets are sent from the sources in a synchronized manner. In each time slot, theintermediate nodes linearly combine the incoming probes according to speciﬁc codingcoeﬃcients. The key objective in this phase is to ﬁnd the minimum probe packet sizethat can establish the mappings between the contents of the received probe packets andthe losses on the diﬀerent sets of paths. In the second phase, we inspect the contents ofthe received probe packets. We show that it can provide us with more information thanChapter 2. The LANT Framework 22path success rates. We establish a linear system whose coeﬃcient matrix has full columnrank, and use computational eﬃcient algorithms to develop consistent estimators of linkloss rates. In the next two sections, we describe these two phases in details.2.2 Probe Coding SchemesIn Subsection 2.2.1, we establish a tight lower bound on probe size (i.e., number of bitsin each probe packet), which is necessary for valid probe coding schemes. Then, wepropose algorithms to ﬁnd a valid probe coding scheme with the minimum probe size inSubsection 2.2.2.2.2.1 Lower Bound on Probe SizeWe refer to probe packets and network coding schemes jointly as probe coding schemes.A probe coding scheme is valid if we can determine which paths have successfully trans-mitted a probe and which paths have not from the end-to-end observations. We adoptlinear network coding schemes [15] that are suﬃcient for our task.A probe packet is a binary vector (·)2 of length ℓ, which can be interpreted as anelement in a ﬁnite ﬁeld Fq with an alphabet of size q (q = 2ℓ). A coding coeﬃcientcan also be interpreted as an element in the ﬁnite ﬁeld Fq. Within valid probe codingschemes, the probe size ℓ is desired to be as small as possible, since it is directly relatedto bandwidth eﬃciency. Although a smaller probe size can reduce the bandwidth usageof the network, the inference framework is not valid if the probe size falls below a certainChapter 2. The LANT Framework 23threshold. For example, in Fig. 2.1, receiver r receives coded packets that are combinedfrom packets on three diﬀerent paths. Using one-bit probe packets, we are not able todistinguish which of these three paths have successfully transmitted a probe packet. Inthis case, we need probe packets with at least three bits for valid probe coding schemeswhile smaller probe sizes cannot constitute valid probe coding schemes.Before we ﬁnd a lower bound on probe size, which is necessary for valid probe codingschemes, we present the notations used in our approach. In a directed acyclic graphG = (V,E), a link is an end link if it is adjacent to a receiver r ∈R. The set of allend links is denoted by ER. For an end link e ∈ER, let Ge denote a subgraph of Gconsisting of the links and nodes involved in set P(e). We notice that if receiver r hasmultiple end links, it would know from which link a packet is received. Let qe and ℓedenote the alphabet size and the length of the probe packets transmitted on subgraphGe, respectively. The following theorem presents a loose lower bound on probe packetsize for valid probe coding schemes.Theorem 1 For the probes transmitted on subgraph Ge, where e ∈ER, the probe sizeshould satisfy ℓe≥|P(e)|(i.e., qe≥2|P(e)|), in order to obtain valid end-to-end observa-tions.Proof: For each end link e∈ER, letP(e) ={P1,P2,...,P|P(e)|}. As for valid probecoding schemes, based on the content of the received probe, a receiver should distinguishwhich paths have successfully transmitted a probe and which paths have not. Withoutloss of generality, we start from path P1. Since a zero binary vector will introduceChapter 2. The LANT Framework 24ambiguity, (1)2 is the smallest binary vector we can use to denote the case where onlypath P1 has successfully transmitted a probe. (10)2 is the smallest binary vector we canuse to denote the case where only path P2 has successfully transmitted a probe. Since(11)2 denotes the case where both paths P1 and P2 have successfully transmitted a probe,(100)2 is the smallest binary vector we can use to denote the case where only path P3 hassuccessfully transmitted a probe. By induction, we can show that (10···0)2 of length|P(e)|is the smallest binary vector we can use to denote the case where only path P|P(e)|has successfully transmitted a probe. We modify the above binary vectors to vectors oflength|P(e)|with zeros added to the left-hand side. Thus, for the probes transmitted insubgraphGe, we have ℓe≥|P(e)|, and qe≥2|P(e)|. Although Theorem 1 provides lower bounds on probe size for the probes transmittedon diﬀerent subgraphs separately, some lower bounds may not be achieved. For example,in Fig. 2.2, we have ℓe6 ≥2 and ℓe7 ≥4 for subgraphsGe6 andGe7, respectively. However,there are overlapping links inGe6 andGe7 such as links e1, e2 and e3. In this case, a validprobe size should be 4. Let G denote a set of subgraphs with overlapping links. Theprobes transmitted on these subgraphs should have the same size. Let ℓG denote the sizeof such probes. Correspondingly, the set of end links in the subgraph set G is denotedbyER(G). The following proposition presents an improved lower bound on probe size forvalid probe coding schemes.Proposition 2 For the probes transmitted on subgraph setG, the probe size should satisfyℓG ≥maxe∈ER(G)|P(e)|, in order to obtain valid end-to-end observations.Chapter 2. The LANT Framework 25Figure 2.2: A directed acyclic graph with two end links e6 and e7.Proof: According to Theorem 1, for the probes transmitted in each subgraph Gewhere e ∈ ER(G), there exists a lower bound ℓe ≥ |P(e)|. Since network coding isapplied, the probes transmitted on one link should have the same size. Similarly, theprobes transmitted on one subgraph should also have the same size. Thus, for theprobes transmitted in the subgraphs with overlapping links, the lower bound on probesize is the maximum value of the lower bounds obtained from Theorem 1. That is,ℓG ≥maxe∈ER(G)|P(e)|. 2.2.2 Algorithms for Finding a Valid Probe Coding SchemeWe propose an approach to ﬁnd a valid probe coding scheme, such that the improvedlower bound obtained from Proposition 2 is achieved. The approach is divided into threeprocesses, described in this section.Chapter 2. The LANT Framework 26Constructing Auxiliary TreesFor each end link e = (h,r) ∈ER, we introduce an auxiliary tree topology Te. Thereis a one-to-many mapping from the nodes and the links in the original graph G to thenodes and the links in the auxiliary treeTe. We use u0(r) to denote the tree node thatcorresponds the root node r in graphG. We use ui(v) to denote the ith tree node thatcorresponds to the non-receiver node v in graphG. Similarly, we use εi(e) to denote theith tree link that corresponds to link e in graphG.Algorithm 1 shows how to construct the auxiliary trees corresponding to each endlink in set ER. For each end link e = (h,r)∈ER, nodes u0(r), u1(h) and link ε1(e) areﬁrst added intoTe (Step 2). We deﬁne a leaf node as a node only with outgoing links inTe. Node u0(r) is the destination node. The set of leaf nodesLe initially includes nodeu1(h) (Step 3). If there exists tree node uk(v)∈Le, where k∈{1,2,...,i}and v is nota source node inG, then along the incoming links of node v while ignoring the outgoinglinks, we ﬁnd a set of nodes. Corresponding to these nodes and the incoming links, newtree nodes and tree links are deﬁned and added intoTe (Step 7). The set of leaf nodesLeis updated in Steps 8 and 11. The counter i for the number of tree links inTe is updatedin Step 9.Selecting Coding CoeﬃcientsThe coding coeﬃcients are readily obtained based on the auxiliary trees. The non-receivernodes with multiple incoming links inGare the nodes that perform network coding. TheChapter 2. The LANT Framework 27Algorithm 1 Algorithm for constructing auxiliary trees. Assume graph G = (V,E) isgiven.1: for each end link e = (h,r)∈ER do2: Add nodes u0(r), u1(h) and link ε1(e) into auxiliary treeTe3: Initialize the set of leaf nodesLe←{u1(h)}4: i←25: while∃node uk(v)∈Le,k∈{1,2,...,i}and v /∈S do6: for each v′ :∃(v′,v)∈E do7: Add node ui(v′) and link εi(v′,v) intoTe8: UpdateLe←Le∪{ui(v′)}9: i←i +110: end for11: UpdateLe←Le\{uk(v)}12: end while13: end forcorresponding tree nodes would also have multiple incoming tree links. The remainingnodes inG perform either forwarding or multicasting.Algorithm2showshowtoselectcodingcoeﬃcients. Foreachnodeuk(v)inTe, supposeit has a set of t(uk(v)) incoming links {ε1in(e1in), ε2in(e2in), ..., εt(uk(v))in (et(uk(v))in )} and oneoutgoing link εout(eout). Then, node v has coding coeﬃcients [δ(e1in,eout), δ(e2in,eout), ...,δ(et(uk(v))in ,eout)]. Suppose tree links ε1in(e1in), ε2in(e2in), ..., εt(uk(v))in (et(uk(v))in ) have n1, n2,Chapter 2. The LANT Framework 28Algorithm 2 Algorithm for selecting coding coeﬃcients. Assume the auxiliary trees aregiven.1: for each auxiliary treeTe do2: for each node uk(v) inTe with outgoing link εout(eout) do3: n0←04: for each incoming link εiin(eiin) of node uk(v), i = 1,2,...,t(uk(v)) do5: Find its corresponding leaf-node setL(εiin(eiin))⊆Le6: ni←|L(εiin(eiin))|7: δ(eiin,eiout)←2n0+n1+···+ni 18: end for9: end for10: end for..., nt(uk(v)) corresponding leaf nodes, respectively. Then, we choose the values of codingcoeﬃcients as [20, 2n1, 2n1+n2, ..., 2n1+n2+···+nt(uk(v)) 1].Designing Probe PacketsThe path-link matrix M can be easily obtained based on the paths from the leaf nodesto the destination node in each auxiliary tree according to Algorithm 3. Now, we showhow to ﬁnd the sets of subgraphs with overlapping links. Each subgraph Ge originallyconstitutes a subgraph set {Ge}. We check each column of the path-link matrix M. Ifa column has multiple 1s and it also represents that diﬀerent subgraph sets include thesame link, we combine these subgraph sets as one subgraph set. Then, for each subgraphChapter 2. The LANT Framework 29setG with its end-link setER(G), according to Proposition 2, we calculate the tight lowerbound on probe size ℓG = maxe∈ER(G)|Le|. Thus, probes as (0···01)2 of length ℓG aresent from the sources to the outgoing links in G.Finally, for each path Pi ∈P, multiplying (0···01)2 of its corresponding length bythe coding coeﬃcients along path Pi, we can obtain the content of a received probe thatdenotes the case where only path Pi has successfully transmitted a probe. In this way,we can establish the mappings between the contents of received probes and the losseson the diﬀerent combinations of paths for each subgraph, and thus obtain a valid probecoding scheme.Example 2: We consider a directed acyclic graph G = (V,E) depicted in Fig. 2.2.We use Algorithms 1-2 to obtain a valid probe coding scheme with the minimum probesize. First, we construct two auxiliary treesTe6 andTe7, as depicted in Fig. 2.3. Second,we choose nodes 1 and 2 as the nodes that perform network coding, since they are thenon-receiver nodes with multiple incoming links. Based on the auxiliary treeTe6, we have|L(ε3(e1))|= 1 (Algorithm 2, Steps 5-6). Thus, some of the coding coeﬃcients of node 1are obtained as [δ(e1,e3),δ(e2,e3)] = [1,2] (Algorithm 2, Step 7). Similarly, based onTe7,we obtain the coding coeﬃcients of node 1 as [δ(e1,e4),δ(e2,e4)] = [1,2]. We also obtainthe coding coeﬃcients of node 2 as [δ(e4,e7),δ(e5,e7)] = [1,4], since|L(ε2(e4))|= 2.Chapter 2. The LANT Framework 30Algorithm 3 Algorithm for constructing path-link matrix. Assume the auxiliary treesare given.1: Initialize an empty path-link matrix M2: for each auxiliary treeTe do3: i←14: for each tree path Pi inTe do5: for each mi,j, j = 1,2,...,|E|do6: If∃εk(ej)∈Pi, mi,j = 1; otherwise, mi,j = 0.7: end for8: Row vector ω←(mi,j)1×|E|9: Update M←Mω10: i←i + 111: end for12: end forThird, the path-link matrix M can be obtained as follows:M =e1 e2 e3 e4 e5 e6 e71 0 1 0 0 1 00 1 1 0 0 1 01 0 0 1 0 0 10 1 0 1 0 0 11 0 1 0 1 0 10 1 1 0 1 0 1. (2.8)Chapter 2. The LANT Framework 31(a) Auxiliary treeTe6 (b) Auxiliary treeTe7Figure 2.3: Two auxiliary trees Te6 and Te7, corresponding to end links e6 and e7, re-spectively.The top two rows represents the two paths in subgraph Ge6, and the bottom four rowsrepresents the four paths in subgraph Ge7. Since link e1 is involved in both {Ge6} and{Ge7} (checking the ﬁrst column of M), we combine the two subgraph sets and obtainone subgraph set G = {Ge6,Ge7}. Counting the number of leaf nodes in each auxiliarytree, we obtain |P(e6)|= 2 and |P(e7)|= 4. Thus, ℓG = max{2,4}= 4 and probes as(0001)2 are sent from sources s1 and s2 to outgoing links e1 and e2, respectively. Chapter 2. The LANT Framework 322.3 Linear Algebraic ApproachAs described previously, the system in (2.7) has|P|equations with|EI∪EV|unknowns.However, |P| may be less than |EI ∪EV|, such as the topologies in Figs. 2.1 and 2.2,and thus ˆa in (2.7) cannot be uniquely determined. Even when|P|≥|EI ∪EV|, it doesnot ensure that ˆa can be determined. In this section, we propose a linear algebraic(LA) approach using the observations from coding operations. We show that ˆa can bedetermined using the method of least-squares [35]. Then, we combine the methods ofnormal equations and row selection with the LA approach and analyze the computationalcomplexity.2.3.1 Least-squares SolutionsBy inspecting the contents of the received coded probe packets at the destinations, wecan estimate not only the success rate of a single path, but also the success rate of aset of paths. As the main consequence of valid probe coding schemes, it enables us todistinguish between the paths that have contributed to a coded probe packet and thepaths that have not. This is unique to the networks which use probe coding schemes andcannot be achieved by routing probes in general.We denote the power set1 of P by P. Thus, |P| = 2|P|. Each element of P is asubset ofP, which can be used to represent a unique combination of paths. Let θi denote1The power set of a set is the set of all subsets of that set. For example, the power set of {a;b} is{∅;{a};{b};{a;b}}.Chapter 2. The LANT Framework 33the path set success rate of the ith path set in P\{∅}, which is the probability that abatch of probes can be successfully transmitted on all the paths in the ith path set. Wedeﬁne a path set success rate except for ∅∈P, because we require the probability thatat least one path can successfully transmit a probe to obtain an equation of link successrates and a path set success rate.Accordingly, we deﬁne a modiﬁed path-link matrix fM = (emi,j)(|P|−1)×|EI∪EV | as fol-lows: The element emi,j is equal to 1 if there exists a path in the ith path set in setP\{∅}which includes the jth link in setEI∪EV , and is equal to 0 otherwise. We referto fM as type 2 modiﬁed path-link matrix.The type 2 modiﬁed path-link matrix for the graph in Fig. 2.1 is shown below:fM =1 1 0 0 1 01 1 0 1 0 11 0 1 0 0 11 1 0 1 1 11 1 1 0 1 11 1 1 1 0 11 1 1 1 1 1. (2.9)Each of the top three rows represents the path set that includes only one path.We deﬁne a column vector, c = (ci)(|P|−1)×1, where ci = logθi. The column vector ais deﬁned as in Section 2.1. Thus, we have a linear system|EI∪EV |∑j=1emi,jaj = ci, i = 1,...,|P|−1 (2.10)Chapter 2. The LANT Framework 34or in the matrix form asfMa = c. (2.11)For each path set in P\{∅}, the n probe batches sent from the sources can beconsidered as a binomial experiment consisting of n trials. The associated binomialrandom variable Xi is deﬁned as the number of received coded probes (or probe batchesfor more than one incoming links) whose contents represent that all the paths in the ithpath set have successfully transmitted a probe packet.The sample proportion ˆθi = Xi/n is a maximum likelihood (ML) estimator of θi[36] (or an ML estimate resulting from end-to-end measurement xi substituted in theplace of Xi). Accordingly, we can obtain ˆc, the estimator of c. The column vector ˆaremains unknown. Thus, we extend (2.7) to a system of|P|−1 equations with|EI∪EV|unknowns, as follows:fMˆa = ˆc. (2.12)The linear system in (2.12) has more equations than unknowns, i.e.,|P|−1≥|EI∪EV|. (2.13)This is because for every pair of links e,e′ ∈EI ∪EV ,P(e) is diﬀerent fromP(e′), while|P|paths can have at most 2|P|−1 diﬀerent combinations of paths. The inequality (2.13)is a necessary condition for ˆa to be determined.To show that ˆa in (2.12) can be determined by the least-squares, we introduce a(|P|−1)×(|P|−1) auxiliary matrix M(|P|) of a type 2 modiﬁed path-link matrixChapter 2. The LANT Framework 35fM with an end-to-end path set P. Compared to fM, M(|P|) has additional columnvectors and has|P|−1 column vectors in total. The|P|−1 column vectors in the top|P|×(|P|−1) submatrix can represent all non-zero vectors in the vector space F|P|2 . Thebottom part of the additional columns are obtained according to the relation betweenthe paths and path sets. An example ofM(3) for fM in (2.9) is as follows:M(3) =1 1 0 0 1 0 11 1 0 1 0 1 01 0 1 0 0 1 11 1 0 1 1 1 11 1 1 0 1 1 11 1 1 1 0 1 11 1 1 1 1 1 1. (2.14)Lemma 1 Let M(|P|) be an auxiliary matrix of a type 2 modiﬁed path-link matrix fMwith an end-to-end path set P. Then, rank(M(|P|)) = 2|P|−1, i.e., all 2|P|−1 columnvectors inM(|P|) are linearly independent.Proof: We prove it by induction. We mention that the matrix M(|P|) has binaryentries and the column vectors are deﬁned in a vector space over a ﬁnite ﬁeld F2. It canbe veriﬁed thatM(2) has full rank. Assume that matrixM(k) also has full rank. Thatis, all 2k−1 columns inM(k) are linearly independent. Thus, the modulo 2 summationof any m columns of this matrix, for m = 2,...,2k−1, has at least one non-zero entry.Now, consider matrixM(k+1). This matrix can be represented as follows after row andChapter 2. The LANT Framework 36column permutations:M(k + 1) =0 ··· 0 1 1 ··· 10M(k) ... M(k)01 1 ··· 1M(k) ... ... ... ...1 1 ··· 1Permutations would not change its rank. The top row represents the newly added path,followed by two submatricesM1 = [M(k) 0M(k)] andM2 = [M(k) 1], where 0 and 1are columns of 0 and 1, respectively. We note that the path sets of M1 (rows in M1)do not include the new added path, while those ofM2 all include it. Now, we show thatthe matrixM(k+1) has full rank. To do so, we show that the summation of all possiblecombinations of these 2k+1−1 columns inM(k+1) is a non-zero vector (i.e., there existsat least one non-zero entry in the summation vector).First, the middle column [1 01]T is included in the combination of the columns thatwe choose. Since the entries of the last row in M(k) are all ones, in the summation ofthe chosen vectors, at least one entry would be non-zero. This entry corresponds to thelast row inM1 or inM2. From now on, we exclude the middle column from our choices.Second, we choose the columns from either the 2k−1 columns on the left-hand sideor the 2k−1 columns on the right-hand side (but not both at the same time). In thisChapter 2. The LANT Framework 37case, at least one entry in the summation vector would be non-zero corresponding to therows inM1. It is because of the linear independency of the columns inM(k).Third, we choose the columns from both the 2k−1 columns on the left-hand side andon the right-hand side. In this case, if an odd number of columns is chosen from the theright-hand side, the entry of the summation vector corresponding to the top row wouldbe non-zero. However, if an even number of columns is chosen from the right-hand side,at least one entry of the summation vector corresponding to the rows in M2 would benon-zero, because of the linear independency of the columns in M(k). To this end, wehave considered the modulo 2 summation for all possible combinations of the columnsin matrix M(k + 1), and there is always at least one non-zero entry in the summationvector. Therefore, all these 2k+1−1 column vectors inM(k+1) are linearly independent. With Lemma 1, the following theorem gives the rank of a type 2 modiﬁed path-linkmatrix.Theorem 2 Let a directed acyclic graph G = (V,E) be given with a system of linearequations in matrix form fMˆa = ˆc. Then, rank(fM) =|EI∪EV|.Proof: LetM(|P|) be an auxiliary matrix of fM. The|EI∪EV|column vectors in fMare among the 2|P|−1 column vectors in M(|P|). From Lemma 1, all 2|P|−1 columnvectors inM(|P|) are linearly independent. Thus, these|EI ∪EV|column vectors in fMare also linearly independent. As a result, rank(fM) =|EI∪EV|. Corollary 1 Let fMˆa = ˆc be given. Then, ˆa can be determined by least-squares.Chapter 2. The LANT Framework 38Proof: When the number of equations is equal to the number of unknowns, i.e.,|P|−1 = |EI ∪EV|, fM is a square matrix. Theorem 2 ensures that fM is invertible.Thus, ˆa can be determined asˆa = fM−1ˆc. (2.15)When the number of equations is greater than the number of unknowns, i.e.,|P|−1 >|EI ∪EV|, the system is over-determined. We can apply least-squares [35] to obtain anapproximate solution which minimizes the residual error∥ˆc−fMˆa∥. Theorem 2 ensuresthat fMTfM is invertible. Thus, ˆa can be determined asˆa = (fMTfM)−1fMTˆc. (2.16)We note that (2.15) is a special case of (2.16). Corollary1givesananalyticalsolutionof ˆausingleast-squares. Thefollowingtheoremdemonstrates the consistency of the corresponding estimators.Theorem 3 1−ˆαj is a consistent estimator of 1−αj.Proof: For each ej ∈EI ∪EV , 1− ˆαj is a function of ˆαj, while ˆαj is a function ofˆθ1, ˆθ2,..., ˆθ|P|−1. Since ˆθi p−→θi, the continuous mapping theorem and Slutsky’s theorem[36] yield that 1−ˆαj p−→1−αj, where p−→denotes convergence in probability. Proposition 3 The loss rate of a link can be consistently estimated if and only if thelink is an identiﬁable link.Proof: Theorem 3 shows that the loss rates of all identiﬁable links can be consistentlyestimated by the estimators. In addition, Proposition 1 shows that if the loss rate of aChapter 2. The LANT Framework 39link can be estimated, then the link is identiﬁable. These together prove this proposition. Although the loss rate of the links which are not identiﬁable cannot be consistentlyestimated, we at least can obtain an upper bound on the loss rate of them, which is theloss rate of the corresponding virtual link.2.3.2 Method of Normal EquationsAlgorithm 4 Method of Normal Equations [37]1: Calculate the symmetric matrix fMTfM2: Calculate Cholesky decomposition fMTfM = LLT3: Calculate d←fMTˆc4: Use forward substitution to solve Ly = d for y5: Use back substitution to solve LTˆa = y for ˆaThe most common technique to solve a full rank least-squares problem is the methodof normal equations [37]. We deﬁne µ = |P|−1 and ν = |EI ∪EV|, so that fM isa µ×ν matrix. The ﬁrst step in the method of normal equations is to calculate thesymmetric matrix (i.e., fMTfM). This requires µν2 ﬂops2. The second step is to calculatethe Cholesky decomposition fMTfM = LLT requiring ν3/3 ﬂops. The third step is tocalculate fMTˆc requiring 2µν ﬂops. The fourth and ﬁfth steps are to solve Ly = fMTˆc for2A op is a oating point operation. Flop count is useful as a rough estimate of complexity andpredictor of computational time on modern computers.Chapter 2. The LANT Framework 40y using forward substitution and to solve LTˆa = fMTˆc for ˆa using back substitution, eachof which requires ν2 ﬂops. Considering µ≥ν by (2.13), the complexity of this methodisO(µν2).Although the ﬁrst step in the method of normal equations includes the dominantterm of the complexity, it only needs to be executed once for initial setup as long as thenetwork topology remains unchanged. We need to obtain ˆc before we can calculate fMTˆcand perform forward/back substitutions (Steps 3-5). The complexity in calculating ˆc isO(µn), where n is the number of probe batches. This step and Steps 3-5 can be repeatedκ times in a monitoring period. Thus, the LA approach using the method of normalequations has a complexity ofO(µν2 + µnκ + µνκ) in practice.2.3.3 Method of Row SelectionSince µ = 2|P|−1, the number of path sets µ grows exponentially as|P|increases. As aresult, the method of least-squares would lack scalability and thus have high complexity.Nonetheless, according to Theorem 2, rank(fM) = ν. This means there exist ν linearlyindependent path sets out of µ path sets which are suﬃcient to determine ˆa.To select ν linearly independent path sets, we modify the row selection algorithmproposed in [30], and obtain a reduced linear system as below:fM1ˆa = ˆc1, (2.17)where fM1∈{0,1}ν×ν and ˆc1∈Rν consists of ν rows of fM and ˆc, respectively. Algorithm5 shows the modiﬁed row (path set) selection algorithm. This algorithm incrementallyChapter 2. The LANT Framework 41Algorithm 5 Modiﬁed Row (Path Set) Selection Algorithm1: Initialize fM1←the ﬁrst row in fM2: Initialize R by calculating the thin QR factorization of fMT13: while fM1 is not a square matrix do4: ω←next row in fM5: ˆR12←R−TfM1ωT6: ˆR22←∥ω∥2−∥ˆR12∥27: if ˆR22̸= 0 then8: Update R←R ˆR120 ˆR229: Update fM1←fM1ω10: end if11: end whilebuilds a QR factorization fMT1 = QR, where Q ∈ Rν×ν is an orthogonal matrix andR ∈ Rν×ν is an upper triangular matrix. It only needs to be executed once for initialsetup with a complexity ofO(µν2).The complexity of calculating ˆc1 is reduced toO(νn). Then, we calculate ˆa = fMT1 z,where z = R−1(R−1)Tˆc1, whose complexity is O(ν2). We repeat the above steps for κtimes in a monitoring period. Thus, the LA approach using the method of row selectionhas a lower complexity ofO(µν2 + νnκ + ν2κ) in practice.42Chapter 3Performance EvaluationIn this chapter, we access the performance of the LANT framework by simulations.3.1 Simulation SetupFor the network topology, we ﬁrst consider the Internet2 network map [38], which is ahigh-performance backbone network created by the Internet2 community. The topologyis modiﬁed as the one used in [24], consisting of 10 nodes and 15 links. We apply theorientation algorithm [24] that converts the modiﬁed topology with selected sources tothree directed acyclic graphs with diﬀerent number of sources in Fig. 3.1, where all linksare identiﬁable. Destinations are determined by the orientation algorithm.To consider larger networks, we use BRITE [39] to generate three router-level undi-rected network topologies with Waxman model. BRITE is a universal topology generaterwhich improves the state of the art and is based on design principles which include repre-sentativeness, inclusiveness, and inter-operability. Representativeness leads to synthetictopologies that accurately reﬂect many aspects of the actual Internet topology (e.g. hier-archical structure, degree distribution, etc.). Inclusiveness combines the strengths of asChapter 3. Performance Evaluation 43(a) One source.(b) Two sources.(c) Three sources.Figure 3.1: Directed acyclic graphs with diﬀerent number of sources. (a) One source(node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) andone receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver(node 9).Chapter 3. Performance Evaluation 44many generation models as possible in a single generation tool. Inter-operability providesinterfaces to widely-used simulation applications such as ns, SSF and OmNet++ as wellas visualization applications. In the three large network topologies generated by BRITE,the number of nodes are chosen as 20, 100, and 500. The number of links is twice thenumber of nodes in each topology.A random link loss rate 1−αj is assigned to the jth link ej ∈EI ∪EV , where thelink success rate αj is uniformly distributed within [αave−0.05,αave + 0.05]. The valueof αave is chosen as 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95, to adjust the average success rateacross all links. After assigning each link a loss rate, we send n batches of probe packets.Each probe traversing a link is dropped at a ﬁxed probability as the link loss rate.In each simulation, we obtain an estimate 1−ˆαj of the actual link loss rate 1−αj forthe jth link in setEI∪EV . The root mean square error (RMSE) is used to determine theestimation accuracy across all identiﬁable links and virtual links. The RMSE is computedasRMSE =|EI∪EV |∑j=1|αj−ˆαj|2|EI∪EV|1/2. (3.1)We brieﬂy summarize the belief propagation (BP) algorithm [26] and compare ourLANT framework with the BP algorithm for loss rate inference through simulations inSection 3.2. Following the approach in [26], the ﬁrst step is to create the factor graphfrom the original graph. The factor graph is a bipartite graph: on one side there are thelinks (variable nodes), whose loss rates we aim to estimate; on the other side there arethe paths (function nodes) that are observed by each received probe. An edge exists inChapter 3. Performance Evaluation 45the factor graph between a link and a path if the link belongs to this path in the originalgraph. We note that, unlike tree topologies considered in [26], in general topologies theremight exist multiple paths for every source-receiver pair. The second step is to performbelief propagation. Each received probe triggers message passing in the factor graph andresults in an estimate of link loss probabilities. Then, the estimates from diﬀerent probesare combined, using standard methods [26], to obtain an estimate 1−ˆαj of the actuallink loss rate 1−αj for the jth link ej ∈EI∪EV .The results are averaged over 100 simulations to eliminate possible random eﬀects,where each simulation has new loss rate assignments and new loss processes.3.2 Simulation ResultsFirst, we investigate the inﬂuence of diﬀerent methods adopted by LA approach onthe estimation accuracy, based on the graph with one source in Fig. 3.1(a). The type 2modiﬁed path-link matrixfM has 127 rows, and we apply the method of normal equations.This case is denoted by LA-NE. Alternatively, we use Algorithm 5 to build a squarematrix fM1, where each row (path set) includes 1 or 2 paths. This case is denoted byLA-RS. Fig. 3.2 shows the RMSE of LA approach using the two methods, as a functionof the number of probe batches. We observe that the RMSE of the LA-NE algorithmis lower than that of the LA-RS algorithm when n = 50. Such behavior is reasonablesince the LA-NE algorithm uses more equations to obtain link loss rates than the LA-RSalgorithm. However, the diﬀerence of the RMSE is less than 2% and it vanishes as theChapter 3. Performance Evaluation 46102 103 10400.020.040.060.080.10.120.140.16nRMSELA−NE, αave = 0.8LA−RS, αave = 0.8LA−NE, αave = 0.85LA−RS, αave = 0.85LA−NE, αave = 0.9LA−RS, αave = 0.9Figure 3.2: The RMSE of LA using the methods of normal equations (LA-NE) and rowselection (LA-RS), versus the number of probe batches n.number of probes increases. Therefore, for large number of probes, the performance ofthe LA-RS algorithm and the LA-NE algorithm is similar while the LA-RS algorithmoutperforms the LA-NE algorithm in terms of the computational complexity.Second, we compare the estimation accuracy of the BP algorithm and the LA-RSalgorithm, based on the graph with one source in Fig. 3.1(a). Fig. 3.3 shows the RMSEas a function of the number of probe batches, for diﬀerent average link success rates.We observe that the BP algorithm has better accuracy when n < 400, and the LA-RS algorithm achieves better accuracy, after sending reasonably suﬃcient probe batches(n > 400). This is because the LA-RS algorithm exploits the losses on the diﬀerentcombinations of paths, while the BP algorithm only utilizes the losses on paths. Fig. 3.4Chapter 3. Performance Evaluation 47102 103 10400.020.040.060.080.10.120.140.16nRMSE BP, αave = 0.8 BP, αave = 0.85 BP, αave = 0.9LA−RS, αave = 0.8LA−RS, αave = 0.85LA−RS, αave = 0.9Figure 3.3: The RMSE of the BP algorithm and the LA-RS algorithm, versus the num-ber of probe batches n, for diﬀerent average success rate αave.0.7 0.75 0.8 0.85 0.9 0.9500.020.040.060.080.10.12αaveRMSE BPLA−RSFigure 3.4: The RMSE of the BP algorithm and the LA-RS algorithm, versus the av-erage success rate αave (n = 500).Chapter 3. Performance Evaluation 48102 103 10400.020.040.060.080.10.120.140.16nRMSEOne sourceTwo sourcesThree sourcesFigure 3.5: The RMSE of the LA-RS algorithm, versus the number of probe batches n,for diﬀerent number of sources (αave = 0.8).shows the RMSE as a function of the average link success rate with 500 probe batches.The RMSE decreases as the average link success rate increases, which is consistent withthe relative position of the curves in Fig. 3.3. Based on these two graphs, we can predictthat when average loss rates are lower than 0.8, the BP algorithm would perform worse(RMSE > 6%, n = 20,000), while the LA-RS algorithm would still achieve satisfactoryaccuracy (RMSE < 1%, n = 20,000).Third, for the networks with diﬀerent number of sources in Fig. 3.1, Fig. 3.5 showsthe RMSE as a function of the number of probe batches, and Fig. 3.6 shows the RMSEas a function of the average success rate. We compare the relative position of the threecurves in Figs. 3.5 and 3.6, and obtain the following observation: The graph with moreChapter 3. Performance Evaluation 490.7 0.75 0.8 0.85 0.9 0.950.010.020.030.040.050.060.070.08αaveRMSEOne sourceTwo sourcesThree sourcesFigure 3.6: The RMSE of the LA-RS algorithm, versus the average success rate αave,for diﬀerent number of sources (n = 500).sources achieves better estimation accuracy. However, the improvement of estimationaccuracy is negligible with relatively large success rates or suﬃcient probe batches. Inthis case, we can use a small number of sources and ﬂexibly choose their locations.Finally, we investigate the performance of the LA-RS algorithm in three larger net-works generated by BRITE. We randomly pick a part of nodes as source nodes in eachnetwork for 10 times. We pick 4 source nodes for the 20-node network, and 20 sourcenodes for the 100-node and the 500-node network. The orientation algorithm is appliedto obtain directed acyclic graphs. There are 68.25% identiﬁable links on average in thedirected graph for the 20-node network (40 links), 63.7% identiﬁable links on average inthe the directed graph for 100-node network (200 links), and 26.17% identiﬁable linksChapter 3. Performance Evaluation 50102 103 10400.020.040.060.080.10.120.14nRMSE500 nodes100 nodes 20 nodesFigure 3.7: The RMSE of the LA-RS algorithm, versus the number of probe batches n,for networks of diﬀerent sizes (αave = 0.9).on average in the directed graph for the 500-node network (1,000 links). Although theeﬀect of the number and location of sources on the accuracy can be negligible with rel-atively large success rates or suﬃcient probe batches, diﬀerent number and location ofsources may result in diﬀerent number of identiﬁable and virtual links. Fig. 3.7 shows theRMSE as a function of the number of probe batches for the networks of diﬀerent sizes.As expected, the LA-RS algorithm still achieves satisfactory accuracy (RMSE < 1%,n = 20,000), while more probe batches are needed to achieve the same accuracy in largernetworks.51Chapter 4Conclusions and Future Work4.1 ConclusionsIn this thesis, we studied the problem of link loss tomography on mesh topologies usingnetwork coding. We ﬁrst provided an overview of the area of network tomography incommunication networks. Accurate and eﬃcient measurement of network-internal char-acteristics is critical for management and maintenance of large-scale networks. Recentlythere have been attempts to apply network coding in loss tomography in order to increasebandwidth eﬃciency. We introduced diﬀerent inference techniques in the related worksand explained the various performance bottlenecks such as (1) low bandwidth eﬃciency,(2) high monitoring cost, (3) estimation not being always accurate, and (4) requiringadditional assumptions.Then, we proposed a linear algebraic network tomography (LANT) framework for ac-tive inference of link loss rates. We ﬁrst established a tight lower bound on the probe sizefor valid end-to-end observations when network coding is applied. Then, we developedalgorithms to ﬁnd a valid probe coding scheme, such that the lower bound on probe size isalways achieved. Furthermore, we proposed a linear algebraic (LA) approach and devel-Chapter 4. Conclusions and Future Work 52oped consistent estimators of link loss rates. We also demonstrated that the complexityof LA using the method of row selection is lower than that using the method of normalequations. Using our LANT framework, the identiﬁability of a link is the necessary andsuﬃcient condition for its consistent loss estimation.We investigated the performance of the LANT framework under diﬀerent simulationscenarios. Results showed that, for large number of probes, the performance of the LA-RSalgorithm and the LA-NE algorithm is similar while the LA-RS algorithm outperformsthe LA-NE algorithm in terms of the computational complexity. Moreover, the LA-RSalgorithm achieves better estimation accuracy than the belief propagation (BP) algo-rithm when the estimators converge. Although the eﬀect of the number and location ofsources on the accuracy can be negligible with relatively large success rates or suﬃcientprobe batches, diﬀerent number and location of sources may result in diﬀerent numberof identiﬁable and virtual links.4.2 Future WorkOne possible extension of this work is to minimize the number of nodes performingnetwork coding. In the current work, we choose all non-receiver nodes with multipleincoming links as the nodes that perform network coding. In this way, we can easilyobtain a valid coding scheme. One possible solution is to choose the nodes next to thesource nodes, since these nodes performing network coding can suﬃciently distinguishdiﬀerent information ﬂows, such that the intermediate nodes do not need to performChapter 4. Conclusions and Future Work 53network coding.Another extension is design an algorithm to choose the number and location of thesource and receiver nodes, such that all the links in the given network are identiﬁable.In the current work, we use the orientation algorithm [24] with selected sources to ﬁnd adirected acyclic graph. Using this algorithm, we cannot control the number and locationof the receiver nodes. To solve this problem, a new algorithm to ﬁnd a directed acyclicgraph with selected sources and receivers is necessary.54Bibliography[1] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu, “Network tomography:Recent developments,” Statistical Science, vol. 19, no. 3, pp. 499–517, Aug. 2004.[2] Y. Yardi, “Network tomography: Estimating source-destination traﬃc intensitiesfrom link data,” J. Amer. Statist. Assoc., vol. 91, no. 433, pp. 365–377, March 1996.[3] Y. Tsang, M. Coates, and R. Nowak, “Network delay tomography,” IEEE Trans.Acoust., Speech, Signal Processing, vol. 51, no. 8, pp. 2125–2136, Aug. 2003.[4] A. Chen, J. Cao, and T. Bu, “Network tomography: Identiﬁability and Fourierdomain estimation,” in Proc. of IEEE INFOCOM, Anchorage, AK, May 2007.[5] G. Sharma, S. Jaggi, and B. Dey, “Network tomography via network coding,” inProc. of Information Theory and Applications Workshop, San Diego, CA, Jan. 2008.[6] P. Sattari, A. Markopoulou, and C. Fragouli, “Multiple source multiple destinationtopology inference using network coding,” in Proc. of NetCod Workshop, Lausanne,Switzerland, June 2009.Bibliography 55[7] R. Caceres, N. Duﬃeld, J. Horowitz, and D. Towsley, “Multicast-based inference ofnetwork-internal loss characteristics,” IEEE Trans. Inform. Theory, vol. 45, no. 7,pp. 2462–2480, Nov. 1999.[8] N. Duﬃeld, J. Horowitz, F. LoPresti, and D. Towsley, “Multicast topology inferencefrom measured end-to-end loss,” IEEE Trans. Inform. Theory, vol. 48, no. 1, pp.26–45, Jan. 2002.[9] N. Duﬃeld, “Network tomography of binary network performance characteristics,”IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5373–5388, Dec. 2006.[10] Y. Tsang, M. Coates, and R. Nowak, “Passive unicast network tomography usingEM algorithm,” in Proc. of IEEE Int’l Conf. Acoust., Speech, Signal Processing, SaltLake City, UT, May 2001.[11] J. Zhao, R. Govindan, and D. Estrin, “Computing aggregates for monitoring wirelesssensor networks,” in Proc. of IEEE Int’l Workshop on Sensor Network Protocols andApplications, Anchorage, AK, May 2003.[12] H. Nguyen and P. Thiran, “Using end-to-end data to infer lossy links in sensornetworks,” in Proc. of IEEE INFOCOM, Barcelona, Spain, Apr. 2006.[13] Y. Lin, B. Liang, and B. Li, “Passive loss inference in wireless sensor networks basedon network coding,” in Proc. of IEEE INFOCOM, Rio de Janeiro, Brazil, Apr. 2009.Bibliography 56[14] R. Ahlswede, N. Cai, S. Li, and R. Yeung, “Network information ﬂow,” IEEE Trans.Inform. Theory, vol. 46, no. 4, pp. 1204–1216, July 2000.[15] R. Koetter and M. Medard, “Beyond routing: An algebraic approach to networkcoding,” in Proc. of IEEE INFOCOM, New York, NY, Nov. 2002.[16] T. Ho and D. Lun, Network Coding: An Introduction. Cambridge University Press,2008.[17] C. Fragouli, J. LeBoudec, and J. Widmer, “Network coding: An instant primer,”ACM SIGCOMM Computer Communication Review, vol. 36, no. 1, pp. 63–68, Jan.2006.[18] C. Fragouli, A. Markopoulou, R. Srinivasan, and S. Diggavi, “Network monitoring:It depends on your points of view,” in Proc. of Information Theory and ApplicationsWorkshop, San Diego, CA, Jan. 2007.[19] W. Zhu and Z. Geng, “A bottom-up inference of loss rate,” Computer Communica-tions, vol. 28, no. 4, pp. 351–365, Mar. 2005.[20] N. Duﬃeld, F. LoPresti, V. Paxson, and D. Towsley, “Inferring link loss using stripedunicast probes,” in Proc. of IEEE INFOCOM, Anchorage, AK, Apr. 2001.[21] V. Padmanabhan, L. Qiu, and H. Wang, “Passive network tomography usingbayesian inference,” in Proc. of the 2nd ACM SIGCOMM Workshop on InternetMeasurement, Marseille, France, Nov. 2002.Bibliography 57[22] T. Bu, N. Duﬃeld, F. LoPresti, and D. Towsley, “Network tomography on generaltopologies,” in Proc. of ACM SIGMETRICS, Marina Del Rey, CA, June 2002.[23] C. Fragouli and A. Markopoulou, “A network coding approach to network tomog-raphy,” in Proc. of 43rd Allerton Conf., Monticello, IL, Sept. 2005.[24] M. Gjoka, C. Fragouli, P. Sattari, and A. Markopoulou, “Loss tomography in generaltopologies with network coding,” in Proc. of IEEE GLOBECOM, Washington, DC,Nov. 2007.[25] H. Yao, S. Jaggi, and M. Chen, “Network coding tomography for network failures,”in Proc. of IEEE Infocom (Mini-Conference), San Diego, CA, March 2010.[26] Y. Mao, F. Kschischang, B. Li, and S. Pasupathy, “A factor graph approach tolink loss monitoring in wireless sensor networks,” IEEE J. Select. Areas Commun.,vol. 23, no. 4, pp. 820–829, Apr. 2005.[27] F. Kschischang, B. Frey, and H. Loeliger, “Factor graphs and the sum-product al-gorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001.[28] Y. Zhao, Y. Chen, and D. Bindel, “Towards unbiased end-to-end network diagnosis,”IEEE/ACM Trans. Networking, vol. 17, no. 5, pp. 1724–1737, Dec. 2009.[29] B. Sun and Z. Zhang, “Probabilistic diagnosis of link loss using end-to-end pathmeasurements and maximum likelihood estimation,” in Proc. of IEEE ICC, Dresden,Germany, June 2009.Bibliography 58[30] Y. Chen, D. Bindel, H. Song, and R. Katz, “Algebra-based scalable overlay net-work monitoring: Algorithms, evaluation, and applications,” IEEE/ACM Trans.Networking, vol. 15, no. 5, pp. 1084–1097, Oct. 2007.[31] M. Coates and R. Nowak, “Network loss inference using unicast end-to-end mea-surement,” in Proc. of ITC Seminar on IP Traﬃc, Measurement and Modelling,Monterey, CA, Sept. 2000.[32] H. Su, W. Chen, S. Lin, D. Jin, and L. Zeng, “The inference of link loss rates withinternal monitors,” in Proc. of IEEE GLOBECOM, New Orleans, LA, Dec. 2008.[33] J. Gui, V. Shah-Mansouri, and V. Wong, “A linear algebraic approach for losstomography in mesh topologies using network coding,” in Proc. of IEEE ICC, CapeTown, South Africa, May. 2010.[34] ——, “Accurate and eﬃcient network tomography via network coding,” submittedto IEEE Trans. on Vehicular Technology, 2010.[35] R. Myers, D. Montgomery, and G. Vining, Generalized Linear Models: With Appli-cations in Engineering and the Sciences. John Wiley & Sons, Inc., 2001.[36] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. OxfordUniversity Press, 2001.[37] G. Golub and C. Loan, Matrix Computations, 3rd ed. The Johns Hopkins UniversityPress, 1996.Bibliography 59[38] “Internet2 network,” http://www.internet2.edu/network/.[39] “BRITE,” www.cs.bu.edu/brite/.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Accurate and efficient network monitoring on mesh topologies...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Accurate and efficient network monitoring on mesh topologies via network coding Gui, Jiaqi 2010-06-21
pdf
Page Metadata
Item Metadata
Title | Accurate and efficient network monitoring on mesh topologies via network coding |
Creator |
Gui, Jiaqi |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Accurate and efficient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies via network coding. Probe packets are transmitted from the sources to the destinations along a set of paths. Intermediate nodes linearly combine the received probes and transmit the coded probes using pre-determined coding coefficients. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain threshold. To this end, we establish a tight lower bound on probe size which is necessary for establishing the mappings between the contents of the received probes and the losses on the different sets of paths. Then, we develop algorithms to find the coding coefficients such that the lower bound on probe size is achieved. Furthermore, we propose a linear algebraic approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We show that using the LANT framework, the identifiability of a link, which only depends on the network topology, is a necessary and sufficient condition for the consistent estimation of its loss rate. Simulation results show that the LANT framework achieves better estimation accuracy than the belief propagation (BP) algorithm for large number of probe packets. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-21 |
Provider | Vancouver : University of British Columbia Library |
DOI | 10.14288/1.0071010 |
URI | http://hdl.handle.net/2429/25924 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2010_fall_gui_jiaqi.pdf [ 553.48kB ]
- Metadata
- JSON: 24-1.0071010.json
- JSON-LD: 24-1.0071010-ld.json
- RDF/XML (Pretty): 24-1.0071010-rdf.xml
- RDF/JSON: 24-1.0071010-rdf.json
- Turtle: 24-1.0071010-turtle.txt
- N-Triples: 24-1.0071010-rdf-ntriples.txt
- Original Record: 24-1.0071010-source.json
- Full Text
- 24-1.0071010-fulltext.txt
- Citation
- 24-1.0071010.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0071010/manifest