ACCURATE AND EFFICIENT NETWORK MONITORING ON MESH TOPOLOGIES VIA NETWORK CODING by JIAQI GUI B.E., Information Engineering, Shanghai Jiao Tong University, China, 2008 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) June 2010 © Jiaqi Gui, 2010 ii Abstract Accurate and eﬃcient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies via network coding. Probe packets are transmitted from the sources to the destinations along a set of paths. Intermediate nodes linearly combine the received probes and transmit the coded probes using pre-determined coding coeﬃcients. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain threshold. To this end, we establish a tight lower bound on probe size which is necessary for establishing the mappings between the contents of the received probes and the losses on the diﬀerent sets of paths. Then, we develop algorithms to ﬁnd the coding coeﬃcients such that the lower bound on probe size is achieved. Furthermore, we propose a linear algebraic approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We show that using the LANT framework, the identiﬁability of a link, which only depends on the network topology, is a necessary and suﬃcient condition for the consistent estimation of its loss rate. Simulation results show Abstract iii that the LANT framework achieves better estimation accuracy than the belief propagation (BP) algorithm for large number of probe packets. iv Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Network Coding Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3 Motivations and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2 The LANT Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Contents v 2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Probe Coding Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Lower Bound on Probe Size . . . . . . . . . . . . . . . . . . . . . 22 2.2.2 Algorithms for Finding a Valid Probe Coding Scheme . . . . . . . 25 Constructing Auxiliary Trees . . . . . . . . . . . . . . . . . . . . 26 Selecting Coding Coeﬃcients . . . . . . . . . . . . . . . . . . . . . 26 Designing Probe Packets . . . . . . . . . . . . . . . . . . . . . . . 28 2.3 Linear Algebraic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3.1 Least-squares Solutions . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3.2 Method of Normal Equations . . . . . . . . . . . . . . . . . . . . 39 2.3.3 Method of Row Selection . . . . . . . . . . . . . . . . . . . . . . . 40 3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 vi List of Figures 1.1 The Butterﬂy network. Sources s1 and s2 multicast information x1 and x2 to receivers r1 and r2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 A directed acyclic graph with V = {s, r, 1, 2, 3, 4} and E = {e1 , e2 , . . . , e7 }. The set of monitored end-to-end paths P = {P1 , P2 , P3 }, where P1 = {e1 , e2 , e5 , e7 }, P2 = {e1 , e2 , e4 , e6 , e7 }, and P3 = {e1 , e3 , e6 , e7 }. For link e2 = (1, 2), we have P(e2 ) = {P1 , P2 }. . . . . . . . . . . . . . . . . . . . . 18 2.2 A directed acyclic graph with two end links e6 and e7 . . . . . . . . . . . . 25 2.3 Two auxiliary trees Te6 and Te7 , corresponding to end links e6 and e7 , respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1 Directed acyclic graphs with diﬀerent number of sources. (a) One source (node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) and one receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver (node 9). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2 The RMSE of LA using the methods of normal equations (LA-NE) and row selection (LA-RS), versus the number of probe batches n. . . . . . . 46 List of Figures vii 3.3 The RMSE of the BP algorithm and the LA-RS algorithm, versus the number of probe batches n, for diﬀerent average success rate αave . . . . . 47 3.4 The RMSE of the BP algorithm and the LA-RS algorithm, versus the average success rate αave (n = 500). . . . . . . . . . . . . . . . . . . . . . 47 3.5 The RMSE of the LA-RS algorithm, versus the number of probe batches n, for diﬀerent number of sources (αave = 0.8). . . . . . . . . . . . . . . . 48 3.6 The RMSE of the LA-RS algorithm, versus the average success rate αave , for diﬀerent number of sources (n = 500). . . . . . . . . . . . . . . . . . . 49 3.7 The RMSE of the LA-RS algorithm, versus the number of probe batches n, for networks of diﬀerent sizes (αave = 0.9). . . . . . . . . . . . . . . . . 50 viii List of Acronyms QoS Quality of Service MVWA Minimum Variance Weighted Average EM Expectation-Maximization LEND Least-Biased End-to-end Network Diagnosis MILS Minimal Identiﬁable Link Sequence LPRs Link Pass Ratios PPRs Path Pass Ratios LA Linear Algebra LANT Linear Algebraic Network Tomography BP Belief Propagation ML Maximum Likelihood RMSE Root Mean Square Error List of Acronyms NE Normal Equations RS Row Selection ix x List of Symbols |·| Cardinality of a set n Number of probe batches V Set of nodes S Set of source nodes R Set of receiver nodes E Set of links EI Set of identiﬁable links EV Set of virtual links ER Set of end links P Set of monitored end-to-end paths P(e) Set of end-to-end paths that include link e P Power set of P, |P| = 2|P| List of Symbols G = (V, E) Directed acyclic graph Ge Subgraph consisting of the links and nodes in set P(e) G Set of subgraphs with overlapping links Te Auxiliary tree corresponding to end link e and subgraph Ge Le Set of leaf nodes in Te ui (v) The ith tree node that corresponds to the non-receiver node v in the directed graph εi (e) The ith tree link that corresponds to link e in the directed graph ℓ Probe size, number of bits in each probe packet q = 2ℓ Alphabet size of a probe packet M = (mi,j )|P|×|E| Path-link matrix M = (mi,j )|P|×|EI ∪EV | Type 1 modiﬁed path-link matrix M = (mi,j )(|P|−1)×|EI ∪EV | M(|P|) Type 2 modiﬁed path-link matrix Auxiliary matrix of type 2 modiﬁed path-link matrix with path set P αj Actual link success rate of jth link in EI ∪ EV xi List of Symbols βi Actual path success rate of ith path in P θi Actual path set success rate of ith path set in P \ {∅} a = (aj )|EI ∪EV |×1 b = (bi )|P|×1 c = (ci )(|P|−1)×1 Column vector, where aj = log αj Column vector, where bi = log βi Column vector, where ci = log θi µ = |P| − 1 Number of rows in M ν = |EI ∪ EV | Number of columns in M M1 Reduced ν × ν path-link matrix from M after row selection c1 Reduced ν × 1 column vector from c after row selection xii xiii Acknowledgements First, I would like to express my deepest gratitude to my supervisor, Dr. Vincent Wong, for his patient guidance, constant encouragement, and excellent advice throughout my graduate study. Through our weekly meeting, we generate new ideas, stimulate creativity and dig into technology details. Without his invaluable help, this work would not be possible. A special thanks goes to my colleague, Vahid Shah-Mansouri, who provided me with precious assistance during the completion of this thesis. I am also thankful to all my colleagues in Dr. Wong’s group: Amir Hamed Mohsenian-Rad, Vahid Shah-Mansouri, Man Hon Cheung, Keivan Ronasi, Ehsan Vahedi, Wei Bao, Xiaolei Hao, Wenbo Shi, and Jinbiao Xu as well as other friends in the data communications group, for sharing their experiences and knowledge during the time of my study. Finally, I take this opportunity to express my profound gratitude to my beloved parents for their understanding, support, and endless love during my study in Canada. To them I dedicate this thesis. This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant number STPSC 356767. 1 Chapter 1 Introduction Accurate and eﬃcient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. With accurate and timely performance estimates, more eﬃcient traﬃc control protocols and dynamic routing algorithms can be designed. Quality-of-service (QoS) guarantees can be achieved if the available bandwidth can be gauged; the resulting service level agreements can be veriﬁed. Detecting anomalous or malicious behavior becomes a more achievable task [1]. Although network administrators can monitor local traﬃc conditions and detect congestion points in small-scale networks, most networks are not completely isolated. The user-perceived performance of a network thus depends heavily on the performance of the internetwork. The traditional approach for characterizing network performance is based on detailed queueing models at the individual router level, which requires access to a wide range of routers to obtain link-level statistics. However, the routers are operated by diﬀerent companies or service providers, which makes it diﬃcult to collect detailed information at individual devices. Alternatively, we can make useful end-to-end measurements that do not require widely cooperation from internal network devices. Subsequently, based on the end-to-end measurements, we can apply inference techniques to extract the hidden Chapter 1. Introduction 2 information of interest. Broadly speaking, large-scale network inference involves estimating network performance parameters based on traﬃc measurements at a limited subset of the nodes. Vardi was one of the ﬁrst researchers to rigorously study this type of problem and he coined the term network tomography [2] due to the similarity between the inference of network characteristics and medical tomography. Three forms of network tomography have been addressed in the recent literature: (1) link-level parameter estimation based on end-toend, path-level traﬃc measurements [3, 4], (2) sender-receiver path-level traﬃc intensity estimation based on link-level traﬃc measurements, and (3) the inference of network topology [5, 6]. Characterizing these parameters is critical for detecting congestion, faults and other anomalous behavior, ensuring compliance of service-level agreements with users, and management of overlay networks. In link-level parameter estimation, the end-to-end measurements usually consist of the number of probe or data packets transmitted and received between the source and the receiver nodes or the delay between packet transmissions and receptions. The objective is to estimate the loss rate or the queuing delay of each link. Dropped packets on a link are usually due to overload of the ﬁnite output buﬀer of one of the routers encountered when traversing the link, but may also be caused by equipment downtime due to maintenance or power failures. The end-to-end delay is due to both propagation delay processing delay, queuing delay, and transmission delay. As assumed by most literature, occurrences of dropped packets and queuing delay is inherently random. Chapter 1. Introduction 3 In path-level traﬃc intensity estimation, the measurements consist of the number of probe or data packets that transmitted through nodes in the network. In privately owned networks, the collection of such measurements is relatively straightforward. Based on these measurements, the goal is to estimate how much traﬃc originated from a speciﬁed node and was transmitted to a speciﬁed destination. The combination of the traﬃc intensities of all the origin-destination pairs forms the origin-destination traﬃc matrix. Both the node-level measurements and the parameter to be estimated are inherently random. In the inference of network topology, the measurements usually consist of the number of probe or data packets transmitted and received between the source and the receiver nodes or the delay between packet transmissions and receptions. Some proposals require clock synchronization while other more practical ones do not. The physical network topology can be represented as a directed graph, where each vertex represents a physical device such as a router or a switch and the edges correspond to the communication links between those devices. Based on the end-to-end measurements, the logical topology can be determined. Network tomography can be performed either in an active or passive manner. Active network tomography refers to the case where probe packets are sent from the sources to the receivers located on the periphery of the network [7–9]. Using the end-to-end measurements generated by probe packets transmitted and received between the source and the receiver nodes, more informative and reliable path-level measurements are provided Chapter 1. Introduction 4 at the cost of utilizing additional network resources such as bandwidth and energy. On the contrary, passive network tomography reveals information from the existing data traﬃc, so that it is more attractive for networks (e.g., wireless sensor networks) with limited power supply and bandwidth constraints [10–13]. There is also an accelerating trend toward network security that will create a highly uncooperative environment for active tomography. For example, ﬁrewalls designed to protect information may not allow requests for routing information, special packet handling and other network transport protocols required by many active tomography techniques. This has prompted investigations into passive-based traﬃc monitoring techniques. 1.1 Network Coding Basics Networked systems arise in various communication contexts such as telephone networks, the public Internet, peer-to-peer networks, ad-hoc wireless networks, and wireless sensor networks. An inherent premise behind the operation of all communication networks lies in the way information is treated. Recently, with the advent of network coding, the simple but important observation was made that in communication networks, intermediate nodes are allowed not only forward but also process the incoming independent information ﬂows [14–16]. At the network layer, for example, intermediate nodes can perform binary addition of independent bitstreams, whereas, at the physical layer of optical networks, intermediate nodes can superimpose incoming optical signals. In other words, data streams that are independently produced and consumed do not necessarily need to Chapter 1. Introduction (a) Routing to r1 (b) Routing to r2 5 (c) Network Coding Figure 1.1: The Butterﬂy network. Sources s1 and s2 multicast information x1 and x2 to receivers r1 and r2 . be kept separate when they are transported throughout the network. Combining independent data streams can better tailor the information ﬂow to the network environment and accommodate the demands of speciﬁc traﬃc patterns. This shift in paradigm is expected to revolutionize the way we manage, operate, and understand the organization in networks, as well as to have a deep impact on a wide range of areas such as reliable delivery, resource sharing, eﬃcient ﬂow control, network monitoring, and security [17]. One essential beneﬁt of network coding is in terms of throughput when multicasting. The following simple example from [14] illustrates the basic concepts in network coding and gives a preliminary idea of the expected beneﬁts and challenges. Example 1: Fig. 1.1 depicts a communication network represented as a directed graph where vertices correspond to terminals and edges correspond to channels. This Chapter 1. Introduction 6 example is commonly known in the network coding literature as the butterﬂy network. Assume we can send one bit per time slot through each channel. We have two sources s1 and s2 , and two receivers r1 and r2 . Each source produces one bit per time slot which we denote by x1 and x2 , respectively. If receiver r1 uses all the network resources by itself, it could receive information from both sources. Indeed, we could route the bit x1 from source s1 along the path {AD} and the bit x2 from source s2 along the path {BC, CE, ED}, as depicted in Fig. 1.1(a). Similarly, if the second receiver r2 uses all the network resources by itself, it could also receive information from both sources. Indeed, we could route the bit x1 from source s1 along the path {AC, CE, EF }, and the bit x2 from source s2 along the path {BF } as depicted in Fig. 1.1(b). Now assume that both receivers want to simultaneously receive the information from both sources. We then have a contention for the use of edge CE, since we assume that each channel can only transmit one bit per time slot. Traditionally, information ﬂow was treated like ﬂuid through pipes, and independent information ﬂows were kept separate. The simple but important observation made in the work by Ahlswede et al. is that we can allow intermediate nodes in the network to process their incoming information ﬂows, and not just forward them. In Fig. 1.1(c), node C can take bits x1 and x2 and XOR them to create a third bit x3 = x1 + x2 which it can then send through edge CE (the XOR operation corresponds to addition over the binary ﬁeld). r1 receives {x1 , x1 + x2 }, and can solve this system of equations to retrieve x2 by XORing x1 and x1 + x2 . Similarly, r2 receives {x2 , x1 + x2 }, and can solve this system of equations to retrieve x1 by XORing Chapter 1. Introduction 7 x2 and x1 + x2 . The previous example shows that if we allow intermediate node in the network to combine information streams and extract the information at the receivers, we can increase the throughput when multicasting. Network coding can also be used to infer the loss rates of links in an overlay network. For conventional active tomography, packets are usually multicast to several receivers. After a suﬃciently large number of probe packets, shared links and their loss rates can be inferred with reasonable accuracy. In such a setting, network coding can provide additional ﬂexibility since probe packets are not only duplicated at branching points of the multicast tree, but may also be merged. If multiple senders transmit packets to a single receiver, and these packets are combined within the network, it allows to infer network parameters in much the same way as multicasting from one sender to multiple receivers [18]. Furthermore, if the network coding coeﬃcients (i.e., the speciﬁc way in which packets are combined at the nodes) are known in advance, the received coded probe packets can provide additional information about which packets were lost in which part of the tree. By exploiting these features, it is possible to signiﬁcantly reduce the number of active probes and the bandwidth usage generated by these probes, and thus increase bandwidth eﬃciency. Chapter 1. Introduction 1.2 8 Related Work In the area of loss tomography, extensive studies have been given to multicast tree topologies. In [7], Caceres et al. developed a maximum-likelihood estimator for loss rates on internal links based on losses observed by multicast receivers. It exploits the inherent correlation between such observations to infer the performance of paths between branch points in the tree spanning a multicast source and its receivers. The proposed method relies on the iterative approximation to identify the parameters that requires a long execution time. In addition, the parameters identiﬁed by this method may not be the true values of those parameters since the iterative procedure may trap into a local maximum. In [19], Zhu et al. proposed an estimate that is based on the correlation between a link and its sibling links to identify the loss rate of the link. The proposed method, instead of using an iterative approach to approximate the maximum, employs a bottom-up approach to identify the loss rates of the links of a network. In contrast to multicast techniques, unicast inference based on multicast tree topologies can easily be performed on most networks. In [20], Duﬃeld et al. designed experiments based on the notion of transmitting stripes of packets (with no delay between transmission of successive packets within a stripe) to two or more receivers. The purpose of these stripes is to ensure that the correlation in receiver observations matches as closely as possible what would have been observed if the stripe had been replaced by a multicast probe that followed the same paths to the receivers. In [10], Tsang et al. designed a measurement procedure for network loss inference based on end-to-end packet Chapter 1. Introduction 9 pair measurements. Back-to-back packet pairs are the two packets that are sent one after the other by the source, possible destined for diﬀerent receivers, but sharing a common set of links in their paths. In [21], Padmanabhan et al. investigated the problem of identifying lossy links in the interior of the Internet by passively observing the end-to-end performance of existing traﬃc between a server and its clients. The key advantage of a passive approach is that it does not introduce additional traﬃc which might perturb the object of inference, i.e., the link loss rates. The techniques depend only on knowing the number of lost and successful packets sent to each client. While the accuracy of link loss rate inference may consequently suﬀer, the techniques can still pinpoint the trouble spots in the network (e.g., highly lossy links). They developed and evaluated three techniques for passive network tomography: random sampling, linear optimization, and Bayesian inference using Gibbs sampling. To extend the existing multicast and unicast tomography approaches to general topologies, in [22], Bu et al. proposed an approach using multiple trees to cover a mesh topology and combine the inferred loss rates. They further proposed two algorithms to perform the link-level inferences. One, the minimum variance weighted average (MVWA) algorithm treats the trees separately and then average the results. The second, based on expectation-maximization (EM) merges all of the measurements into one computation. However, this approach may have low bandwidth eﬃciency, since those links that are part of multiple trees would be traversed by multiple probe packets in each time slot, and thus Chapter 1. Introduction 10 create additional traﬃc. In addition, it may incur high monitoring cost, since it requires a large number of receivers to be deployed in each multicast tree. The pioneering work in [14] showed that for multicast networks, if intermediate nodes can perform network coding, that is to perform simple local XOR-operations on incoming packets, one can achieve the min-cut throughput of the network to each receiver. Recent studies show that applying network coding in loss tomography can increase bandwidth eﬃciency [17, 23]. In a network which is capable of performing network coding in addition to multicast, the intermediate nodes linearly combine incoming probe packets and forward the coded probe packets to the outgoing links according to pre-determined coding coeﬃcients. Results in [18] show that for active monitoring using network coding, appropriate selection of the number and location of sources and receivers can aﬀect the accuracy of estimation in general tree topologies. The work in [24] established a framework for loss tomography on mesh topologies. An orientation algorithm is proposed to ﬁnd a directed acyclic graph from an undirected graph with selected sources. An example is illustrated in [18] such that each link in a mesh topology can be traversed in each time slot by exactly one probe. In contrast to network coding with pre-determined coding coeﬃcients, randomized network coding changes the fundamental connection between path and link loss probabilities such that new inference algorithms need to be developed. In [13], Lin et al. studied the loss inference problem in sensor networks with randomized network coding. As end-to-end data are not suﬃcient to compute link loss rates precisely, they proposed Chapter 1. Introduction 11 inference algorithms based on Bayesian principles to discover the set of highly lossy links in sensor networks. The algorithms achieve high detection and low false-positive rates in extensive simulations. In [25], Yao et al. studied passive network tomography in the presence of network failures, under the setting of random linear network coding. Several sets of algorithms for topology estimation and failure detection are proposed under various setting of adversarial random failures. To reduce monitoring cost, a set of end-to-end paths on mesh topologies only requires a limited number of sources and receivers. In [26], Mao et al. proposed a brief propagation (BP) algorithm, which is combined with the use of network coding in [24]. The BP algorithm is a low complexity algorithmic framework for link loss monitoring based on the recent modeling and computational methodology of factor graphs [27]. It iteratively updates the estimates of link losses upon receiving (or detecting the loss of) recently sent packets. The algorithm exhibits good performance and scalability, and can be easily adapted to diﬀerent statistical models of networking scenarios. In particular, due to its low complexity, the algorithm is particularly suitable as a long-term monitoring facility. In [28], Zhao et al. proposed a least-biased end-to-end network diagnosis (LEND) system for inferring link-level properties like loss rate. They deﬁned a minimal identiﬁable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identiﬁed from end-to-end measurements. They designed eﬃcient algorithms to ﬁnd all the MILSs and infer their loss rates for diagnosis. The LEND system works for any network topology and for both directed and undirected topologies. It gives highly Chapter 1. Introduction 12 accurate estimates of the loss rates of MILSs and such diagnosis can be achieved with ﬁne granularity and in near real-time even for reasonably large overlay networks. The LEND system can also supplement existing statistical inference approaches and provide smooth tradeoﬀ between diagnosis accuracy and granularity. In [29], Sun et al. focused on the problem of ﬁnding the link pass ratios (LPRs) when the path pass ratios (PPRs) of a set of paths are given. They proved the problem of ﬁnding the maximum-likelihood estimation of LPRs given PPRs is NP-hard, and then proposed a simple algorithm based on divide-and-conquer. It ﬁrst estimates the number of faulty links on a path, then uses the global information to estimate assign LPRs to the links. It requires a priori probability distribution function on link loss rates and an assumption that the majority of links being lossless. In [30], Chen et al. focused on overlay network monitoring, which enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with n hosts, existing systems either require O(n2 ) measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. They proposed an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n2 ) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Chapter 1. Introduction 1.3 13 Motivations and Objective In this thesis, we consider the problem of link loss tomography on mesh topologies. Although there are extensive studies of link loss inference on multicast tree topologies [7, 19, 31, 32], loss tomography on mesh topologies is still a challenging problem. Existing approaches have not exploited the inherent information in the end-to-end observations. As a result, the linear system of link loss rates and path loss rates usually has a coeﬃcient matrix with deﬁcient column rank1 , which makes it diﬃcult to accurately infer the link loss rates [30]. In general, most of the previously proposed loss tomography approaches on mesh topologies in the literature have one or more of the following performance bottlenecks: (1) low bandwidth eﬃciency, (2) high monitoring cost, (3) estimation not being always accurate, and (4) requiring additional assumptions. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies. To increase bandwidth eﬃciency and reduce monitoring cost, we send probe packets along a set of end-to-end paths rather than multicast trees and apply network coding. To increase the estimation accuracy, we exploit the inherent correlation between the losses on the links and those on the diﬀerent sets of paths, which is captured through network coding. We refer to probe packets and network coding schemes jointly 1 The column rank of an m × n matrix is the maximum number of linearly independent columns of the matrix. If the matrix has rank n, then it has full column rank; otherwise, the matrix has deficient column rank. Chapter 1. Introduction 14 as probe coding schemes. In our LANT framework, a valid probe coding scheme enables us to establish the mappings between the contents of received probe packets and the losses on the diﬀerent sets of paths. Using valid probe coding schemes, we obtain valid end-to-end observations, based on which we can distinguish which paths have successfully transmitted a probe and which paths have not. We also deﬁne link identiﬁability, a link property that only depends on the network topology. For identiﬁable links, we develop consistent estimators that converge to the actual loss rates as the number of probes increases. Since the number of all path sets can grow exponentially as the number of total paths increases, we selectively monitor a subset of all path sets (the method of row selection), which are suﬃcient to infer the loss rates of all identiﬁable links. 1.4 Contributions The main contributions of this thesis are as follows [33, 34]: • We establish a tight lower bound on probe size, which is necessary for valid probe coding schemes when network coding is applied. Then, we develop algorithms to ﬁnd a valid probe coding scheme such that the lower bound on probe size is achieved. • We propose a linear algebraic (LA) approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We combine the methods of normal equations and row selection with the Chapter 1. Introduction 15 LA approach, and analyze the computational complexity. • We prove that the identiﬁability of a link, which only depends on the network topology, is a necessary and suﬃcient condition for the consistent estimation of its loss rate, using the LANT framework. • Simulation results show that the LA approach using the method of row selection can eﬀectively decrease computational complexity without reducing estimation accuracy. Besides, the LA approach achieves better estimation accuracy than the BP algorithm, when the estimators converge. Although the eﬀect of the number and location of sources on the accuracy can be negligible with relatively large success rates or suﬃcient probe batches, diﬀerent number and location of sources may result in diﬀerent number of identiﬁable links. The framework we present in this thesis is unique when compared to the prior work done in the area of loss tomography using network coding. In terms of bandwidth eﬃciency, the work in [24] establishes a loose lower bound on probe size for valid probe coding schemes, while the problem of ﬁnding coding coeﬃcients remains unexplored. Without eﬃcient algorithms to ﬁnd coding coeﬃcients, the inference framework is incomplete and cannot be applied. We establish a tight lower bound and also develop algorithms to ﬁnd a valid probe coding scheme such that the lower bound on probe size is achieved. In terms of inference approaches, the BP algorithm [26] only uses the information of the losses on diﬀerent paths such that the estimation may not be accurate for networks with Chapter 1. Introduction 16 relatively high link loss rates. The work in [29] requires additional assumptions such as a priori probability distribution function and the majority of links being lossless. The work in [28] only ﬁnd the loss rate of a minimal identiﬁable link sequence. In contrast, our LA approach does not need extra assumptions while it can still obtain additional useful information, the losses on the diﬀerent sets of paths. This information can only be obtained via probe coding schemes and cannot be achieved by routing probes in general. As a result, we obtain better estimation accuracy and obtain more identiﬁable links. 1.5 Structure of the Thesis The rest of this thesis is organized as follows. In Chapter 2, we present the LANT framework, including system model, probe coding schemes, and an LA approach for the consistent estimation of link loss rates. Chapter 3 presents performance evaluation. Conclusions are given in Chapter 4. 17 Chapter 2 The LANT Framework In this chapter, we present we present the LANT framework, including system model, probe coding schemes, and an LA approach for the consistent estimation of link loss rates. 2.1 System Model We model the network as a directed acyclic graph G = (V, E), consisting of a set of nodes V and a set of links E. The node set V includes routers and periphery devices where probe packets are sent and received. A link e = (v, v ′ ) ∈ E denotes a directed communication link from node v to node v ′ . Let S and R denote the set of source nodes and the set of receiver nodes, respectively. The set of monitored end-to-end paths is denoted by P. A path P ∈ P is a set of directed links from a source to a receiver. Let P(e) denote the set of paths that include link e. We deﬁne a path-link matrix M = (mi,j )|P|×|E| , whose |P| rows correspond to the |P| paths and the |E| columns correspond to the |E| links, as follows: The element mi,j is equal to 1 if the ith path in set P includes the jth link in set E, and is equal to 0 otherwise. As an example, the directed acyclic graph in Fig. 2.1 has Chapter 2. The LANT Framework 18 Figure 2.1: A directed acyclic graph with V = {s, r, 1, 2, 3, 4} and E = {e1 , e2 , . . . , e7 }. The set of monitored end-to-end paths P = {P1 , P2 , P3 }, where P1 = {e1 , e2 , e5 , e7 }, P2 = {e1 , e2 , e4 , e6 , e7 }, and P3 = {e1 , e3 , e6 , e7 }. For link e2 = (1, 2), we have P(e2 ) = {P1 , P2 }. three paths from source s to receiver r. Its path-link matrix is a 3 by 7 binary matrix as shown below: e1 e2 e3 e4 e5 e6 e7 P1 1 M = P2 1 P3 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 1 1 . 1 (2.1) Given a directed acyclic graph G = (V, E) and a set of monitored end-to-end paths P, a link e ∈ E is called identiﬁable, if for each link pair (e, e′ ) where e′ ∈ E \{e}, there exists at least one path in P that includes only one of the two links, i.e., P(e) ̸= P(e′ ). As in Fig. 2.1, links e2 , e3 , . . . , e6 are identiﬁable links, while links e1 and e7 are non-identiﬁable links since P(e1 ) = P(e7 ). We notice that the identiﬁability of a link depends only on the network topology. The following proposition shows that the identiﬁability of a link is a necessary condi- Chapter 2. The LANT Framework 19 tion for the estimation of its loss rate. Proposition 1 The loss rate of a link can be estimated only if the link is an identiﬁable link. Proof : We prove it by contradiction. Suppose there exists a link pair (e, e′ ), where e, e′ ∈ E, such that all paths in P include either both or none of them, i.e., P(e) = P(e′ ). If a probe packet is being dropped on either link e or e′ , the same end-to-end observation (probe packets with the same contents) will be obtained in either case. Therefore, we cannot diagnose on which link the loss of probe packet occurs, and it is not possible to estimate the loss rate of these links. We divide the set of non-identiﬁable links into several groups, where each group contains a set of links that are included in the same set of paths. We refer to each group as a virtual link. As in Fig. 2.1, since P(e1 ) = P(e7 ), we refer to e1 and e7 as one virtual link ev1 . Let EI and EV denote the set of identiﬁable links and the set of virtual links, respectively. We have EI = {ev1 } and EV = {e2 , e3 , . . . , e6 }. Note that EI ∩ EV = ∅. Thus, for each link e ∈ EI ∪ EV , we have P(e) ̸= P(e′ ) for all e′ ∈ EI ∪ EV \ {e}. We ﬁx the order of elements in EI ∪ EV . Accordingly, we deﬁne a modiﬁed path-link matrix M = (mi,j )|P|×|EI ∪EV | as follows: The element mi,j is equal to 1 if the ith path in set P includes the jth link in set EI ∪ EV , and is equal to 0 otherwise. We refer to M as type 1 modiﬁed path-link matrix. The type 1 modiﬁed path-link matrix for the graph in Fig. Chapter 2. The LANT Framework 20 2.1 is shown below: ev1 e2 e3 e4 e5 e6 P1 1 M = P2 1 P3 1 1 0 0 1 1 0 1 0 0 1 0 0 0 1 . 1 (2.2) We model the loss of packets on diﬀerent links by a set of mutually independent Bernoulli processes. Losses are therefore spatial and temporal independent. This model is commonly used in the literature [7–9, 19, 31, 32] for network tomography. We deﬁne αj ∈ (0, 1] as the link success rate of the jth link in set EI ∪ EV , which is the probability that a packet can be successfully transmitted on the jth link. Thus, 1 − αj denotes the loss rate of the jth link in set EI ∪ EV . Moreover, we deﬁne βi ∈ (0, 1] as the path success rate of the ith path in set P, which is the probability that a probe packet can be successfully transmitted on the ith path in set P. Unlike data packets, probe packets would not be retransmitted if being dropped. Thus, we have |EI ∪EV | ∏ (αj )mi,j = βi , i = 1, . . . , |P|. (2.3) j=1 Taking logarithm on both sides of (2.3), we can reformulate it as linear equations: |EI ∪EV | ∑ mi,j log αj = log βi , i = 1, . . . , |P|, (2.4) j=1 where log αj and log βi are the variables of linear equations. Setting aj = log αj and bi = log βi , we have |EI ∪EV | ∑ j=1 mi,j aj = bi , i = 1, . . . , |P|. (2.5) Chapter 2. The LANT Framework 21 We deﬁne two column vectors a = (aj )|EI ∪EV |×1 , and b = (bi )|P|×1 . The system can be represented in the matrix form as Ma = b. (2.6) Equation (2.6) shows the relation between the path and link success rates. The objective of loss tomography is to infer the link loss rates using end-to-end observations (i.e., the ˆ denote the estimator ˆ and b number and the contents of the received probe packets). Let a ˆ while a ˆ of a and b, respectively. By measuring the path success rates, we can estimate b remains unknown. Thus, equation (2.6) becomes a system of |P| equations with |EI ∪ EV | unknowns as: ˆ Mˆ a = b. (2.7) In most cases, the number of identiﬁable and virtual links is greater than the number of paths. That is, |EI ∪ EV | > |P|. Thus, (2.7) is under-determined. We propose the LANT ˆ. framework to obtain additional useful information and determine a The LANT framework is composed of two phases. In the ﬁrst phase, we apply network coding and perform end-to-end measurements on the set of paths P. n batches of probe packets are sent from the sources in a synchronized manner. In each time slot, the intermediate nodes linearly combine the incoming probes according to speciﬁc coding coeﬃcients. The key objective in this phase is to ﬁnd the minimum probe packet size that can establish the mappings between the contents of the received probe packets and the losses on the diﬀerent sets of paths. In the second phase, we inspect the contents of the received probe packets. We show that it can provide us with more information than Chapter 2. The LANT Framework 22 path success rates. We establish a linear system whose coeﬃcient matrix has full column rank, and use computational eﬃcient algorithms to develop consistent estimators of link loss rates. In the next two sections, we describe these two phases in details. 2.2 Probe Coding Schemes In Subsection 2.2.1, we establish a tight lower bound on probe size (i.e., number of bits in each probe packet), which is necessary for valid probe coding schemes. Then, we propose algorithms to ﬁnd a valid probe coding scheme with the minimum probe size in Subsection 2.2.2. 2.2.1 Lower Bound on Probe Size We refer to probe packets and network coding schemes jointly as probe coding schemes. A probe coding scheme is valid if we can determine which paths have successfully transmitted a probe and which paths have not from the end-to-end observations. We adopt linear network coding schemes [15] that are suﬃcient for our task. A probe packet is a binary vector (·)2 of length ℓ, which can be interpreted as an element in a ﬁnite ﬁeld Fq with an alphabet of size q (q = 2ℓ ). A coding coeﬃcient can also be interpreted as an element in the ﬁnite ﬁeld Fq . Within valid probe coding schemes, the probe size ℓ is desired to be as small as possible, since it is directly related to bandwidth eﬃciency. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain Chapter 2. The LANT Framework 23 threshold. For example, in Fig. 2.1, receiver r receives coded packets that are combined from packets on three diﬀerent paths. Using one-bit probe packets, we are not able to distinguish which of these three paths have successfully transmitted a probe packet. In this case, we need probe packets with at least three bits for valid probe coding schemes while smaller probe sizes cannot constitute valid probe coding schemes. Before we ﬁnd a lower bound on probe size, which is necessary for valid probe coding schemes, we present the notations used in our approach. In a directed acyclic graph G = (V, E), a link is an end link if it is adjacent to a receiver r ∈ R. The set of all end links is denoted by ER . For an end link e ∈ ER , let Ge denote a subgraph of G consisting of the links and nodes involved in set P(e). We notice that if receiver r has multiple end links, it would know from which link a packet is received. Let qe and ℓe denote the alphabet size and the length of the probe packets transmitted on subgraph Ge , respectively. The following theorem presents a loose lower bound on probe packet size for valid probe coding schemes. Theorem 1 For the probes transmitted on subgraph Ge , where e ∈ ER , the probe size should satisfy ℓe ≥ |P(e)| (i.e., qe ≥ 2|P(e)| ), in order to obtain valid end-to-end observations. Proof : For each end link e ∈ ER , let P(e) = {P1 , P2 , . . . , P|P(e)| }. As for valid probe coding schemes, based on the content of the received probe, a receiver should distinguish which paths have successfully transmitted a probe and which paths have not. Without loss of generality, we start from path P1 . Since a zero binary vector will introduce Chapter 2. The LANT Framework 24 ambiguity, (1)2 is the smallest binary vector we can use to denote the case where only path P1 has successfully transmitted a probe. (10)2 is the smallest binary vector we can use to denote the case where only path P2 has successfully transmitted a probe. Since (11)2 denotes the case where both paths P1 and P2 have successfully transmitted a probe, (100)2 is the smallest binary vector we can use to denote the case where only path P3 has successfully transmitted a probe. By induction, we can show that (10 · · · 0)2 of length |P(e)| is the smallest binary vector we can use to denote the case where only path P|P(e)| has successfully transmitted a probe. We modify the above binary vectors to vectors of length |P(e)| with zeros added to the left-hand side. Thus, for the probes transmitted in subgraph Ge , we have ℓe ≥ |P(e)|, and qe ≥ 2|P(e)| . Although Theorem 1 provides lower bounds on probe size for the probes transmitted on diﬀerent subgraphs separately, some lower bounds may not be achieved. For example, in Fig. 2.2, we have ℓe6 ≥ 2 and ℓe7 ≥ 4 for subgraphs Ge6 and Ge7 , respectively. However, there are overlapping links in Ge6 and Ge7 such as links e1 , e2 and e3 . In this case, a valid probe size should be 4. Let G denote a set of subgraphs with overlapping links. The probes transmitted on these subgraphs should have the same size. Let ℓG denote the size of such probes. Correspondingly, the set of end links in the subgraph set G is denoted by ER (G ). The following proposition presents an improved lower bound on probe size for valid probe coding schemes. Proposition 2 For the probes transmitted on subgraph set G , the probe size should satisfy ℓG ≥ maxe∈ER (G ) |P(e)|, in order to obtain valid end-to-end observations. Chapter 2. The LANT Framework 25 Figure 2.2: A directed acyclic graph with two end links e6 and e7 . Proof : According to Theorem 1, for the probes transmitted in each subgraph Ge where e ∈ ER (G ), there exists a lower bound ℓe ≥ |P(e)|. Since network coding is applied, the probes transmitted on one link should have the same size. Similarly, the probes transmitted on one subgraph should also have the same size. Thus, for the probes transmitted in the subgraphs with overlapping links, the lower bound on probe size is the maximum value of the lower bounds obtained from Theorem 1. That is, ℓG ≥ maxe∈ER (G ) |P(e)|. 2.2.2 Algorithms for Finding a Valid Probe Coding Scheme We propose an approach to ﬁnd a valid probe coding scheme, such that the improved lower bound obtained from Proposition 2 is achieved. The approach is divided into three processes, described in this section. Chapter 2. The LANT Framework 26 Constructing Auxiliary Trees For each end link e = (h, r) ∈ ER , we introduce an auxiliary tree topology Te . There is a one-to-many mapping from the nodes and the links in the original graph G to the nodes and the links in the auxiliary tree Te . We use u0 (r) to denote the tree node that corresponds the root node r in graph G. We use ui (v) to denote the ith tree node that corresponds to the non-receiver node v in graph G. Similarly, we use εi (e) to denote the ith tree link that corresponds to link e in graph G. Algorithm 1 shows how to construct the auxiliary trees corresponding to each end link in set ER . For each end link e = (h, r) ∈ ER , nodes u0 (r), u1 (h) and link ε1 (e) are ﬁrst added into Te (Step 2). We deﬁne a leaf node as a node only with outgoing links in Te . Node u0 (r) is the destination node. The set of leaf nodes Le initially includes node u1 (h) (Step 3). If there exists tree node uk (v) ∈ Le , where k ∈ {1, 2, . . . , i} and v is not a source node in G, then along the incoming links of node v while ignoring the outgoing links, we ﬁnd a set of nodes. Corresponding to these nodes and the incoming links, new tree nodes and tree links are deﬁned and added into Te (Step 7). The set of leaf nodes Le is updated in Steps 8 and 11. The counter i for the number of tree links in Te is updated in Step 9. Selecting Coding Coeﬃcients The coding coeﬃcients are readily obtained based on the auxiliary trees. The non-receiver nodes with multiple incoming links in G are the nodes that perform network coding. The Chapter 2. The LANT Framework 27 Algorithm 1 Algorithm for constructing auxiliary trees. Assume graph G = (V, E) is given. 1: for each end link e = (h, r) ∈ ER do 2: Add nodes u0 (r), u1 (h) and link ε1 (e) into auxiliary tree Te 3: Initialize the set of leaf nodes Le ← {u1 (h)} 4: i←2 5: while ∃ node uk (v) ∈ Le , k ∈ {1, 2, . . . , i} and v ∈ / S do for each v ′ : ∃(v ′ , v) ∈ E do 6: 7: Add node ui (v ′ ) and link εi (v ′ , v) into Te 8: Update Le ← Le 9: i←i+1 ∪ {ui (v ′ )} 10: end for 11: Update Le ← Le \ {uk (v)} end while 12: 13: end for corresponding tree nodes would also have multiple incoming tree links. The remaining nodes in G perform either forwarding or multicasting. Algorithm 2 shows how to select coding coeﬃcients. For each node uk (v) in Te , suppose t(uk (v)) it has a set of t(uk (v)) incoming links {ε1in (e1in ), ε2in (e2in ), . . ., εin t(uk (v)) (ein )} and one outgoing link εout (eout ). Then, node v has coding coeﬃcients [δ(e1in , eout ), δ(e2in , eout ), . . ., t(uk (v)) δ(ein t(uk (v)) , eout )]. Suppose tree links ε1in (e1in ), ε2in (e2in ), . . ., εin t(uk (v)) (ein ) have n1 , n2 , Chapter 2. The LANT Framework 28 Algorithm 2 Algorithm for selecting coding coeﬃcients. Assume the auxiliary trees are given. 1: for each auxiliary tree Te do 2: for each node uk (v) in Te with outgoing link εout (eout ) do 3: n0 ← 0 4: for each incoming link εiin (eiin ) of node uk (v), i = 1, 2, . . . , t(uk (v)) do 5: Find its corresponding leaf-node set L(εiin (eiin )) ⊆ Le 6: ni ← |L(εiin (eiin ))| 7: δ(eiin , eiout ) ← 2n0 +n1 +···+ni−1 8: 9: end for end for 10: end for . . ., nt(uk (v)) corresponding leaf nodes, respectively. Then, we choose the values of coding coeﬃcients as [20 , 2n1 , 2n1 +n2 , . . ., 2n1 +n2 +···+nt(uk (v))−1 ]. Designing Probe Packets The path-link matrix M can be easily obtained based on the paths from the leaf nodes to the destination node in each auxiliary tree according to Algorithm 3. Now, we show how to ﬁnd the sets of subgraphs with overlapping links. Each subgraph Ge originally constitutes a subgraph set {Ge }. We check each column of the path-link matrix M. If a column has multiple 1s and it also represents that diﬀerent subgraph sets include the same link, we combine these subgraph sets as one subgraph set. Then, for each subgraph Chapter 2. The LANT Framework 29 set G with its end-link set ER (G ), according to Proposition 2, we calculate the tight lower bound on probe size ℓG = maxe∈ER (G ) |Le |. Thus, probes as (0 · · · 01)2 of length ℓG are sent from the sources to the outgoing links in G . Finally, for each path Pi ∈ P, multiplying (0 · · · 01)2 of its corresponding length by the coding coeﬃcients along path Pi , we can obtain the content of a received probe that denotes the case where only path Pi has successfully transmitted a probe. In this way, we can establish the mappings between the contents of received probes and the losses on the diﬀerent combinations of paths for each subgraph, and thus obtain a valid probe coding scheme. Example 2: We consider a directed acyclic graph G = (V, E) depicted in Fig. 2.2. We use Algorithms 1-2 to obtain a valid probe coding scheme with the minimum probe size. First, we construct two auxiliary trees Te6 and Te7 , as depicted in Fig. 2.3. Second, we choose nodes 1 and 2 as the nodes that perform network coding, since they are the non-receiver nodes with multiple incoming links. Based on the auxiliary tree Te6 , we have |L(ε3 (e1 ))| = 1 (Algorithm 2, Steps 5-6). Thus, some of the coding coeﬃcients of node 1 are obtained as [δ(e1 , e3 ), δ(e2 , e3 )] = [1, 2] (Algorithm 2, Step 7). Similarly, based on Te7 , we obtain the coding coeﬃcients of node 1 as [δ(e1 , e4 ), δ(e2 , e4 )] = [1, 2]. We also obtain the coding coeﬃcients of node 2 as [δ(e4 , e7 ), δ(e5 , e7 )] = [1, 4], since |L(ε2 (e4 ))| = 2. Chapter 2. The LANT Framework 30 Algorithm 3 Algorithm for constructing path-link matrix. Assume the auxiliary trees are given. 1: Initialize an empty path-link matrix M 2: for each auxiliary tree Te do 3: i←1 4: for each tree path Pi in Te do 5: 6: for each mi,j , j = 1, 2, . . . , |E| do If ∃εk (ej ) ∈ Pi , mi,j = 1; otherwise, mi,j = 0. 7: end for 8: Row vector ω ← (m i,j )1×|E| 9: 10: 11: M Update M ← ω i←i+1 end for 12: end for Third, the path-link matrix M can be obtained as follows: M= e1 e2 e3 e4 e5 e6 e7 1 0 1 0 0 1 0 1 1 0 0 1 1 0 0 1 0 0 0 0 1 . 1 1 1 0 1 0 1 0 0 1 0 1 0 1 0 0 1 1 0 1 0 (2.8) Chapter 2. The LANT Framework (a) Auxiliary tree Te6 31 (b) Auxiliary tree Te7 Figure 2.3: Two auxiliary trees Te6 and Te7 , corresponding to end links e6 and e7 , respectively. The top two rows represents the two paths in subgraph Ge6 , and the bottom four rows represents the four paths in subgraph Ge7 . Since link e1 is involved in both {Ge6 } and {Ge7 } (checking the ﬁrst column of M), we combine the two subgraph sets and obtain one subgraph set G = {Ge6 , Ge7 }. Counting the number of leaf nodes in each auxiliary tree, we obtain |P(e6 )| = 2 and |P(e7 )| = 4. Thus, ℓG = max {2, 4} = 4 and probes as (0001)2 are sent from sources s1 and s2 to outgoing links e1 and e2 , respectively. Chapter 2. The LANT Framework 2.3 32 Linear Algebraic Approach As described previously, the system in (2.7) has |P| equations with |EI ∪ EV | unknowns. However, |P| may be less than |EI ∪ EV |, such as the topologies in Figs. 2.1 and 2.2, ˆ in (2.7) cannot be uniquely determined. Even when |P| ≥ |EI ∪ EV |, it does and thus a ˆ can be determined. In this section, we propose a linear algebraic not ensure that a ˆ can be (LA) approach using the observations from coding operations. We show that a determined using the method of least-squares [35]. Then, we combine the methods of normal equations and row selection with the LA approach and analyze the computational complexity. 2.3.1 Least-squares Solutions By inspecting the contents of the received coded probe packets at the destinations, we can estimate not only the success rate of a single path, but also the success rate of a set of paths. As the main consequence of valid probe coding schemes, it enables us to distinguish between the paths that have contributed to a coded probe packet and the paths that have not. This is unique to the networks which use probe coding schemes and cannot be achieved by routing probes in general. We denote the power set1 of P by P. Thus, |P| = 2|P| . Each element of P is a subset of P, which can be used to represent a unique combination of paths. Let θi denote 1 The power set of a set is the set of all subsets of that set. For example, the power set of {a, b} is {∅, {a}, {b}, {a, b}}. Chapter 2. The LANT Framework 33 the path set success rate of the ith path set in P \ {∅}, which is the probability that a batch of probes can be successfully transmitted on all the paths in the ith path set. We deﬁne a path set success rate except for ∅ ∈ P, because we require the probability that at least one path can successfully transmit a probe to obtain an equation of link success rates and a path set success rate. Accordingly, we deﬁne a modiﬁed path-link matrix M = (mi,j )(|P|−1)×|EI ∪EV | as follows: The element mi,j is equal to 1 if there exists a path in the ith path set in set P \ {∅} which includes the jth link in set EI ∪ EV , and is equal to 0 otherwise. We refer to M as type 2 modiﬁed path-link matrix. The type 2 modiﬁed path-link matrix for the graph in Fig. 2.1 is shown below: M= 1 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1 1 1 0 1 1 1 . 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 (2.9) Each of the top three rows represents the path set that includes only one path. We deﬁne a column vector, c = (ci )(|P|−1)×1 , where ci = log θi . The column vector a is deﬁned as in Section 2.1. Thus, we have a linear system |EI ∪EV | ∑ j=1 mi,j aj = ci , i = 1, . . . , |P| − 1 (2.10) Chapter 2. The LANT Framework 34 or in the matrix form as Ma = c. (2.11) For each path set in P \ {∅}, the n probe batches sent from the sources can be considered as a binomial experiment consisting of n trials. The associated binomial random variable Xi is deﬁned as the number of received coded probes (or probe batches for more than one incoming links) whose contents represent that all the paths in the ith path set have successfully transmitted a probe packet. The sample proportion θˆi = Xi /n is a maximum likelihood (ML) estimator of θi [36] (or an ML estimate resulting from end-to-end measurement xi substituted in the ˆ place of Xi ). Accordingly, we can obtain cˆ, the estimator of c. The column vector a remains unknown. Thus, we extend (2.7) to a system of |P| − 1 equations with |EI ∪ EV | unknowns, as follows: Mˆ a = cˆ. (2.12) The linear system in (2.12) has more equations than unknowns, i.e., |P| − 1 ≥ |EI ∪ EV |. (2.13) This is because for every pair of links e, e′ ∈ EI ∪ EV , P(e) is diﬀerent from P(e′ ), while |P| paths can have at most 2|P| − 1 diﬀerent combinations of paths. The inequality (2.13) ˆ to be determined. is a necessary condition for a ˆ in (2.12) can be determined by the least-squares, we introduce a To show that a (|P| − 1) × (|P| − 1) auxiliary matrix M(|P|) of a type 2 modiﬁed path-link matrix Chapter 2. The LANT Framework 35 M with an end-to-end path set P. Compared to M, M(|P|) has additional column vectors and has |P| − 1 column vectors in total. The |P| − 1 column vectors in the top |P| |P| × (|P| − 1) submatrix can represent all non-zero vectors in the vector space F2 . The bottom part of the additional columns are obtained according to the relation between the paths and path sets. An example of M(3) for M in (2.9) is as follows: M(3) = 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 . 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 (2.14) Lemma 1 Let M(|P|) be an auxiliary matrix of a type 2 modiﬁed path-link matrix M with an end-to-end path set P. Then, rank(M(|P|)) = 2|P| − 1, i.e., all 2|P| − 1 column vectors in M(|P|) are linearly independent. Proof : We prove it by induction. We mention that the matrix M(|P|) has binary entries and the column vectors are deﬁned in a vector space over a ﬁnite ﬁeld F2 . It can be veriﬁed that M(2) has full rank. Assume that matrix M(k) also has full rank. That is, all 2k − 1 columns in M(k) are linearly independent. Thus, the modulo 2 summation of any m columns of this matrix, for m = 2, . . . , 2k − 1, has at least one non-zero entry. Now, consider matrix M(k + 1). This matrix can be represented as follows after row and Chapter 2. The LANT Framework 36 column permutations: 0 1 1 ··· 1 0 ··· 0 .. M(k) . M(k) M(k + 1) = 0 1 1 ··· 1 . . . .. .. .. .. . M(k) 1 1 ··· 1 Permutations would not change its rank. The top row represents the newly added path, followed by two submatrices M1 = [M(k) 0 M(k)] and M2 = [M(k) 1], where 0 and 1 are columns of 0 and 1, respectively. We note that the path sets of M1 (rows in M1 ) do not include the new added path, while those of M2 all include it. Now, we show that the matrix M(k + 1) has full rank. To do so, we show that the summation of all possible combinations of these 2k+1 − 1 columns in M(k + 1) is a non-zero vector (i.e., there exists at least one non-zero entry in the summation vector). First, the middle column [1 0 1]T is included in the combination of the columns that we choose. Since the entries of the last row in M(k) are all ones, in the summation of the chosen vectors, at least one entry would be non-zero. This entry corresponds to the last row in M1 or in M2 . From now on, we exclude the middle column from our choices. Second, we choose the columns from either the 2k − 1 columns on the left-hand side or the 2k − 1 columns on the right-hand side (but not both at the same time). In this Chapter 2. The LANT Framework 37 case, at least one entry in the summation vector would be non-zero corresponding to the rows in M1 . It is because of the linear independency of the columns in M(k). Third, we choose the columns from both the 2k − 1 columns on the left-hand side and on the right-hand side. In this case, if an odd number of columns is chosen from the the right-hand side, the entry of the summation vector corresponding to the top row would be non-zero. However, if an even number of columns is chosen from the right-hand side, at least one entry of the summation vector corresponding to the rows in M2 would be non-zero, because of the linear independency of the columns in M(k). To this end, we have considered the modulo 2 summation for all possible combinations of the columns in matrix M(k + 1), and there is always at least one non-zero entry in the summation vector. Therefore, all these 2k+1 −1 column vectors in M(k +1) are linearly independent. With Lemma 1, the following theorem gives the rank of a type 2 modiﬁed path-link matrix. Theorem 2 Let a directed acyclic graph G = (V, E) be given with a system of linear equations in matrix form Mˆ a = cˆ. Then, rank(M) = |EI ∪ EV |. Proof : Let M(|P|) be an auxiliary matrix of M. The |EI ∪ EV | column vectors in M are among the 2|P| − 1 column vectors in M(|P|). From Lemma 1, all 2|P| − 1 column vectors in M(|P|) are linearly independent. Thus, these |EI ∪ EV | column vectors in M are also linearly independent. As a result, rank(M) = |EI ∪ EV |. ˆ can be determined by least-squares. Corollary 1 Let Mˆ a = cˆ be given. Then, a Chapter 2. The LANT Framework 38 Proof : When the number of equations is equal to the number of unknowns, i.e., |P| − 1 = |EI ∪ EV |, M is a square matrix. Theorem 2 ensures that M is invertible. ˆ can be determined as Thus, a ˆ = M−1 cˆ. a (2.15) When the number of equations is greater than the number of unknowns, i.e., |P|−1 > |EI ∪ EV |, the system is over-determined. We can apply least-squares [35] to obtain an approximate solution which minimizes the residual error ∥ˆc − Mˆ a∥. Theorem 2 ensures ˆ can be determined as that MT M is invertible. Thus, a ˆ = (MT M)−1 MT cˆ. a (2.16) We note that (2.15) is a special case of (2.16). ˆ using least-squares. The following theorem Corollary 1 gives an analytical solution of a demonstrates the consistency of the corresponding estimators. Theorem 3 1 − α ˆ j is a consistent estimator of 1 − αj . Proof : For each ej ∈ EI ∪ EV , 1 − α ˆ j is a function of α ˆ j , while α ˆ j is a function of p θˆ1 , θˆ2 , . . . , θˆ|P|−1 . Since θˆi − → θi , the continuous mapping theorem and Slutsky’s theorem p p [36] yield that 1 − α ˆj − → 1 − αj , where − → denotes convergence in probability. Proposition 3 The loss rate of a link can be consistently estimated if and only if the link is an identiﬁable link. Proof : Theorem 3 shows that the loss rates of all identiﬁable links can be consistently estimated by the estimators. In addition, Proposition 1 shows that if the loss rate of a Chapter 2. The LANT Framework 39 link can be estimated, then the link is identiﬁable. These together prove this proposition. Although the loss rate of the links which are not identiﬁable cannot be consistently estimated, we at least can obtain an upper bound on the loss rate of them, which is the loss rate of the corresponding virtual link. 2.3.2 Method of Normal Equations Algorithm 4 Method of Normal Equations [37] 1: Calculate the symmetric matrix MT M 2: Calculate Cholesky decomposition MT M = LLT ˆ 3: Calculate d ← MT c 4: Use forward substitution to solve Ly = d for y ˆ = y for a ˆ 5: Use back substitution to solve LT a The most common technique to solve a full rank least-squares problem is the method of normal equations [37]. We deﬁne µ = |P| − 1 and ν = |EI ∪ EV |, so that M is a µ × ν matrix. The ﬁrst step in the method of normal equations is to calculate the symmetric matrix (i.e., MT M). This requires µν 2 ﬂops2 . The second step is to calculate the Cholesky decomposition MT M = LLT requiring ν 3 /3 ﬂops. The third step is to calculate MT cˆ requiring 2µν ﬂops. The fourth and ﬁfth steps are to solve Ly = MT cˆ for 2 A flop is a floating point operation. Flop count is useful as a rough estimate of complexity and predictor of computational time on modern computers. Chapter 2. The LANT Framework 40 ˆ = MT cˆ for a ˆ using back substitution, each y using forward substitution and to solve LT a of which requires ν 2 ﬂops. Considering µ ≥ ν by (2.13), the complexity of this method is O(µν 2 ). Although the ﬁrst step in the method of normal equations includes the dominant term of the complexity, it only needs to be executed once for initial setup as long as the network topology remains unchanged. We need to obtain cˆ before we can calculate MT cˆ and perform forward/back substitutions (Steps 3-5). The complexity in calculating cˆ is O(µn), where n is the number of probe batches. This step and Steps 3-5 can be repeated κ times in a monitoring period. Thus, the LA approach using the method of normal equations has a complexity of O(µν 2 + µnκ + µνκ) in practice. 2.3.3 Method of Row Selection Since µ = 2|P| − 1, the number of path sets µ grows exponentially as |P| increases. As a result, the method of least-squares would lack scalability and thus have high complexity. Nonetheless, according to Theorem 2, rank(M) = ν. This means there exist ν linearly ˆ. independent path sets out of µ path sets which are suﬃcient to determine a To select ν linearly independent path sets, we modify the row selection algorithm proposed in [30], and obtain a reduced linear system as below: ˆ = cˆ1 , M1 a (2.17) where M1 ∈ {0, 1}ν×ν and cˆ1 ∈ Rν consists of ν rows of M and cˆ, respectively. Algorithm 5 shows the modiﬁed row (path set) selection algorithm. This algorithm incrementally Chapter 2. The LANT Framework 41 Algorithm 5 Modiﬁed Row (Path Set) Selection Algorithm 1: Initialize M1 ← the ﬁrst row in M 2: Initialize R by calculating the thin QR factorization of MT 1 3: while M1 is not a square matrix do 4: ω ← next row in M 5: ˆ 12 ← R−T M1 ω T R 6: ˆ 22 ← ∥ω∥2 − ∥R ˆ 12 ∥2 R 7: ˆ 22 ̸= 0 then if R 8: 9: 10: ˆ 12 R R Update R ← ˆ 22 0 R M1 Update M1 ← ω end if 11: end while ν×ν builds a QR factorization MT is an orthogonal matrix and 1 = QR, where Q ∈ R R ∈ Rν×ν is an upper triangular matrix. It only needs to be executed once for initial setup with a complexity of O(µν 2 ). ˆ = MT The complexity of calculating cˆ1 is reduced to O(νn). Then, we calculate a 1 z, where z = R−1 (R−1 )T cˆ1 , whose complexity is O(ν 2 ). We repeat the above steps for κ times in a monitoring period. Thus, the LA approach using the method of row selection has a lower complexity of O(µν 2 + νnκ + ν 2 κ) in practice. 42 Chapter 3 Performance Evaluation In this chapter, we access the performance of the LANT framework by simulations. 3.1 Simulation Setup For the network topology, we ﬁrst consider the Internet2 network map [38], which is a high-performance backbone network created by the Internet2 community. The topology is modiﬁed as the one used in [24], consisting of 10 nodes and 15 links. We apply the orientation algorithm [24] that converts the modiﬁed topology with selected sources to three directed acyclic graphs with diﬀerent number of sources in Fig. 3.1, where all links are identiﬁable. Destinations are determined by the orientation algorithm. To consider larger networks, we use BRITE [39] to generate three router-level undirected network topologies with Waxman model. BRITE is a universal topology generater which improves the state of the art and is based on design principles which include representativeness, inclusiveness, and inter-operability. Representativeness leads to synthetic topologies that accurately reﬂect many aspects of the actual Internet topology (e.g. hierarchical structure, degree distribution, etc.). Inclusiveness combines the strengths of as Chapter 3. Performance Evaluation 43 (a) One source. (b) Two sources. (c) Three sources. Figure 3.1: Directed acyclic graphs with diﬀerent number of sources. (a) One source (node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) and one receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver (node 9). Chapter 3. Performance Evaluation 44 many generation models as possible in a single generation tool. Inter-operability provides interfaces to widely-used simulation applications such as ns, SSF and OmNet++ as well as visualization applications. In the three large network topologies generated by BRITE, the number of nodes are chosen as 20, 100, and 500. The number of links is twice the number of nodes in each topology. A random link loss rate 1 − αj is assigned to the jth link ej ∈ EI ∪ EV , where the link success rate αj is uniformly distributed within [αave − 0.05, αave + 0.05]. The value of αave is chosen as 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95, to adjust the average success rate across all links. After assigning each link a loss rate, we send n batches of probe packets. Each probe traversing a link is dropped at a ﬁxed probability as the link loss rate. In each simulation, we obtain an estimate 1 − α ˆ j of the actual link loss rate 1 − αj for the jth link in set EI ∪ EV . The root mean square error (RMSE) is used to determine the estimation accuracy across all identiﬁable links and virtual links. The RMSE is computed as 1/2 2 ∑ |αj − α ˆj | RMSE = . |E ∪ E | I V j=1 |EI ∪EV | (3.1) We brieﬂy summarize the belief propagation (BP) algorithm [26] and compare our LANT framework with the BP algorithm for loss rate inference through simulations in Section 3.2. Following the approach in [26], the ﬁrst step is to create the factor graph from the original graph. The factor graph is a bipartite graph: on one side there are the links (variable nodes), whose loss rates we aim to estimate; on the other side there are the paths (function nodes) that are observed by each received probe. An edge exists in Chapter 3. Performance Evaluation 45 the factor graph between a link and a path if the link belongs to this path in the original graph. We note that, unlike tree topologies considered in [26], in general topologies there might exist multiple paths for every source-receiver pair. The second step is to perform belief propagation. Each received probe triggers message passing in the factor graph and results in an estimate of link loss probabilities. Then, the estimates from diﬀerent probes are combined, using standard methods [26], to obtain an estimate 1 − α ˆ j of the actual link loss rate 1 − αj for the jth link ej ∈ EI ∪ EV . The results are averaged over 100 simulations to eliminate possible random eﬀects, where each simulation has new loss rate assignments and new loss processes. 3.2 Simulation Results First, we investigate the inﬂuence of diﬀerent methods adopted by LA approach on the estimation accuracy, based on the graph with one source in Fig. 3.1(a). The type 2 modiﬁed path-link matrix M has 127 rows, and we apply the method of normal equations. This case is denoted by LA-NE. Alternatively, we use Algorithm 5 to build a square matrix M1 , where each row (path set) includes 1 or 2 paths. This case is denoted by LA-RS. Fig. 3.2 shows the RMSE of LA approach using the two methods, as a function of the number of probe batches. We observe that the RMSE of the LA-NE algorithm is lower than that of the LA-RS algorithm when n = 50. Such behavior is reasonable since the LA-NE algorithm uses more equations to obtain link loss rates than the LA-RS algorithm. However, the diﬀerence of the RMSE is less than 2% and it vanishes as the Chapter 3. Performance Evaluation 46 0.16 LA−NE, α ave 0.14 LA−NE, α = 0.85 LA−RS, α = 0.85 ave 0.12 ave LA−NE, αave = 0.9 0.1 RMSE = 0.8 LA−RS, αave = 0.8 LA−RS, α ave = 0.9 0.08 0.06 0.04 0.02 0 2 10 3 10 n 4 10 Figure 3.2: The RMSE of LA using the methods of normal equations (LA-NE) and row selection (LA-RS), versus the number of probe batches n. number of probes increases. Therefore, for large number of probes, the performance of the LA-RS algorithm and the LA-NE algorithm is similar while the LA-RS algorithm outperforms the LA-NE algorithm in terms of the computational complexity. Second, we compare the estimation accuracy of the BP algorithm and the LA-RS algorithm, based on the graph with one source in Fig. 3.1(a). Fig. 3.3 shows the RMSE as a function of the number of probe batches, for diﬀerent average link success rates. We observe that the BP algorithm has better accuracy when n < 400, and the LARS algorithm achieves better accuracy, after sending reasonably suﬃcient probe batches (n > 400). This is because the LA-RS algorithm exploits the losses on the diﬀerent combinations of paths, while the BP algorithm only utilizes the losses on paths. Fig. 3.4 Chapter 3. Performance Evaluation 47 0.16 BP, α ave 0.14 = 0.8 BP, αave = 0.85 BP, α ave 0.12 = 0.9 LA−RS, α ave LA−RS, αave = 0.85 0.1 RMSE = 0.8 LA−RS, α ave = 0.9 0.08 0.06 0.04 0.02 0 2 3 10 4 10 n 10 Figure 3.3: The RMSE of the BP algorithm and the LA-RS algorithm, versus the number of probe batches n, for diﬀerent average success rate αave . 0.12 BP LA−RS 0.1 RMSE 0.08 0.06 0.04 0.02 0 0.7 0.75 0.8 α 0.85 0.9 0.95 ave Figure 3.4: The RMSE of the BP algorithm and the LA-RS algorithm, versus the average success rate αave (n = 500). Chapter 3. Performance Evaluation 48 0.16 0.14 One source Two sources Three sources 0.12 RMSE 0.1 0.08 0.06 0.04 0.02 0 2 10 3 10 n 4 10 Figure 3.5: The RMSE of the LA-RS algorithm, versus the number of probe batches n, for diﬀerent number of sources (αave = 0.8). shows the RMSE as a function of the average link success rate with 500 probe batches. The RMSE decreases as the average link success rate increases, which is consistent with the relative position of the curves in Fig. 3.3. Based on these two graphs, we can predict that when average loss rates are lower than 0.8, the BP algorithm would perform worse (RMSE > 6%, n = 20, 000), while the LA-RS algorithm would still achieve satisfactory accuracy (RMSE < 1%, n = 20, 000). Third, for the networks with diﬀerent number of sources in Fig. 3.1, Fig. 3.5 shows the RMSE as a function of the number of probe batches, and Fig. 3.6 shows the RMSE as a function of the average success rate. We compare the relative position of the three curves in Figs. 3.5 and 3.6, and obtain the following observation: The graph with more Chapter 3. Performance Evaluation 49 0.08 One source Two sources Three sources 0.07 RMSE 0.06 0.05 0.04 0.03 0.02 0.01 0.7 0.75 0.8 αave 0.85 0.9 0.95 Figure 3.6: The RMSE of the LA-RS algorithm, versus the average success rate αave , for diﬀerent number of sources (n = 500). sources achieves better estimation accuracy. However, the improvement of estimation accuracy is negligible with relatively large success rates or suﬃcient probe batches. In this case, we can use a small number of sources and ﬂexibly choose their locations. Finally, we investigate the performance of the LA-RS algorithm in three larger networks generated by BRITE. We randomly pick a part of nodes as source nodes in each network for 10 times. We pick 4 source nodes for the 20-node network, and 20 source nodes for the 100-node and the 500-node network. The orientation algorithm is applied to obtain directed acyclic graphs. There are 68.25% identiﬁable links on average in the directed graph for the 20-node network (40 links), 63.7% identiﬁable links on average in the the directed graph for 100-node network (200 links), and 26.17% identiﬁable links Chapter 3. Performance Evaluation 50 0.14 500 nodes 100 nodes 20 nodes 0.12 RMSE 0.1 0.08 0.06 0.04 0.02 0 2 10 3 10 n 4 10 Figure 3.7: The RMSE of the LA-RS algorithm, versus the number of probe batches n, for networks of diﬀerent sizes (αave = 0.9). on average in the directed graph for the 500-node network (1, 000 links). Although the eﬀect of the number and location of sources on the accuracy can be negligible with relatively large success rates or suﬃcient probe batches, diﬀerent number and location of sources may result in diﬀerent number of identiﬁable and virtual links. Fig. 3.7 shows the RMSE as a function of the number of probe batches for the networks of diﬀerent sizes. As expected, the LA-RS algorithm still achieves satisfactory accuracy (RMSE < 1%, n = 20, 000), while more probe batches are needed to achieve the same accuracy in larger networks. 51 Chapter 4 Conclusions and Future Work 4.1 Conclusions In this thesis, we studied the problem of link loss tomography on mesh topologies using network coding. We ﬁrst provided an overview of the area of network tomography in communication networks. Accurate and eﬃcient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. Recently there have been attempts to apply network coding in loss tomography in order to increase bandwidth eﬃciency. We introduced diﬀerent inference techniques in the related works and explained the various performance bottlenecks such as (1) low bandwidth eﬃciency, (2) high monitoring cost, (3) estimation not being always accurate, and (4) requiring additional assumptions. Then, we proposed a linear algebraic network tomography (LANT) framework for active inference of link loss rates. We ﬁrst established a tight lower bound on the probe size for valid end-to-end observations when network coding is applied. Then, we developed algorithms to ﬁnd a valid probe coding scheme, such that the lower bound on probe size is always achieved. Furthermore, we proposed a linear algebraic (LA) approach and devel- Chapter 4. Conclusions and Future Work 52 oped consistent estimators of link loss rates. We also demonstrated that the complexity of LA using the method of row selection is lower than that using the method of normal equations. Using our LANT framework, the identiﬁability of a link is the necessary and suﬃcient condition for its consistent loss estimation. We investigated the performance of the LANT framework under diﬀerent simulation scenarios. Results showed that, for large number of probes, the performance of the LA-RS algorithm and the LA-NE algorithm is similar while the LA-RS algorithm outperforms the LA-NE algorithm in terms of the computational complexity. Moreover, the LA-RS algorithm achieves better estimation accuracy than the belief propagation (BP) algorithm when the estimators converge. Although the eﬀect of the number and location of sources on the accuracy can be negligible with relatively large success rates or suﬃcient probe batches, diﬀerent number and location of sources may result in diﬀerent number of identiﬁable and virtual links. 4.2 Future Work One possible extension of this work is to minimize the number of nodes performing network coding. In the current work, we choose all non-receiver nodes with multiple incoming links as the nodes that perform network coding. In this way, we can easily obtain a valid coding scheme. One possible solution is to choose the nodes next to the source nodes, since these nodes performing network coding can suﬃciently distinguish diﬀerent information ﬂows, such that the intermediate nodes do not need to perform Chapter 4. Conclusions and Future Work 53 network coding. Another extension is design an algorithm to choose the number and location of the source and receiver nodes, such that all the links in the given network are identiﬁable. In the current work, we use the orientation algorithm [24] with selected sources to ﬁnd a directed acyclic graph. Using this algorithm, we cannot control the number and location of the receiver nodes. To solve this problem, a new algorithm to ﬁnd a directed acyclic graph with selected sources and receivers is necessary. 54 Bibliography [1] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu, “Network tomography: Recent developments,” Statistical Science, vol. 19, no. 3, pp. 499–517, Aug. 2004. [2] Y. Yardi, “Network tomography: Estimating source-destination traﬃc intensities from link data,” J. Amer. Statist. Assoc., vol. 91, no. 433, pp. 365–377, March 1996. [3] Y. Tsang, M. Coates, and R. Nowak, “Network delay tomography,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 51, no. 8, pp. 2125–2136, Aug. 2003. [4] A. Chen, J. Cao, and T. Bu, “Network tomography: Identiﬁability and Fourier domain estimation,” in Proc. of IEEE INFOCOM, Anchorage, AK, May 2007. [5] G. Sharma, S. Jaggi, and B. Dey, “Network tomography via network coding,” in Proc. of Information Theory and Applications Workshop, San Diego, CA, Jan. 2008. [6] P. Sattari, A. Markopoulou, and C. Fragouli, “Multiple source multiple destination topology inference using network coding,” in Proc. of NetCod Workshop, Lausanne, Switzerland, June 2009. Bibliography 55 [7] R. Caceres, N. Duﬃeld, J. Horowitz, and D. Towsley, “Multicast-based inference of network-internal loss characteristics,” IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2462–2480, Nov. 1999. [8] N. Duﬃeld, J. Horowitz, F. LoPresti, and D. Towsley, “Multicast topology inference from measured end-to-end loss,” IEEE Trans. Inform. Theory, vol. 48, no. 1, pp. 26–45, Jan. 2002. [9] N. Duﬃeld, “Network tomography of binary network performance characteristics,” IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5373–5388, Dec. 2006. [10] Y. Tsang, M. Coates, and R. Nowak, “Passive unicast network tomography using EM algorithm,” in Proc. of IEEE Int’l Conf. Acoust., Speech, Signal Processing, Salt Lake City, UT, May 2001. [11] J. Zhao, R. Govindan, and D. Estrin, “Computing aggregates for monitoring wireless sensor networks,” in Proc. of IEEE Int’l Workshop on Sensor Network Protocols and Applications, Anchorage, AK, May 2003. [12] H. Nguyen and P. Thiran, “Using end-to-end data to infer lossy links in sensor networks,” in Proc. of IEEE INFOCOM, Barcelona, Spain, Apr. 2006. [13] Y. Lin, B. Liang, and B. Li, “Passive loss inference in wireless sensor networks based on network coding,” in Proc. of IEEE INFOCOM, Rio de Janeiro, Brazil, Apr. 2009. Bibliography 56 [14] R. Ahlswede, N. Cai, S. Li, and R. Yeung, “Network information ﬂow,” IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204–1216, July 2000. [15] R. Koetter and M. Medard, “Beyond routing: An algebraic approach to network coding,” in Proc. of IEEE INFOCOM, New York, NY, Nov. 2002. [16] T. Ho and D. Lun, Network Coding: An Introduction. Cambridge University Press, 2008. [17] C. Fragouli, J. LeBoudec, and J. Widmer, “Network coding: An instant primer,” ACM SIGCOMM Computer Communication Review, vol. 36, no. 1, pp. 63–68, Jan. 2006. [18] C. Fragouli, A. Markopoulou, R. Srinivasan, and S. Diggavi, “Network monitoring: It depends on your points of view,” in Proc. of Information Theory and Applications Workshop, San Diego, CA, Jan. 2007. [19] W. Zhu and Z. Geng, “A bottom-up inference of loss rate,” Computer Communications, vol. 28, no. 4, pp. 351–365, Mar. 2005. [20] N. Duﬃeld, F. LoPresti, V. Paxson, and D. Towsley, “Inferring link loss using striped unicast probes,” in Proc. of IEEE INFOCOM, Anchorage, AK, Apr. 2001. [21] V. Padmanabhan, L. Qiu, and H. Wang, “Passive network tomography using bayesian inference,” in Proc. of the 2nd ACM SIGCOMM Workshop on Internet Measurement, Marseille, France, Nov. 2002. Bibliography 57 [22] T. Bu, N. Duﬃeld, F. LoPresti, and D. Towsley, “Network tomography on general topologies,” in Proc. of ACM SIGMETRICS, Marina Del Rey, CA, June 2002. [23] C. Fragouli and A. Markopoulou, “A network coding approach to network tomography,” in Proc. of 43rd Allerton Conf., Monticello, IL, Sept. 2005. [24] M. Gjoka, C. Fragouli, P. Sattari, and A. Markopoulou, “Loss tomography in general topologies with network coding,” in Proc. of IEEE GLOBECOM, Washington, DC, Nov. 2007. [25] H. Yao, S. Jaggi, and M. Chen, “Network coding tomography for network failures,” in Proc. of IEEE Infocom (Mini-Conference), San Diego, CA, March 2010. [26] Y. Mao, F. Kschischang, B. Li, and S. Pasupathy, “A factor graph approach to link loss monitoring in wireless sensor networks,” IEEE J. Select. Areas Commun., vol. 23, no. 4, pp. 820–829, Apr. 2005. [27] F. Kschischang, B. Frey, and H. Loeliger, “Factor graphs and the sum-product algorithm,” IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498–519, Feb. 2001. [28] Y. Zhao, Y. Chen, and D. Bindel, “Towards unbiased end-to-end network diagnosis,” IEEE/ACM Trans. Networking, vol. 17, no. 5, pp. 1724–1737, Dec. 2009. [29] B. Sun and Z. Zhang, “Probabilistic diagnosis of link loss using end-to-end path measurements and maximum likelihood estimation,” in Proc. of IEEE ICC, Dresden, Germany, June 2009. Bibliography 58 [30] Y. Chen, D. Bindel, H. Song, and R. Katz, “Algebra-based scalable overlay network monitoring: Algorithms, evaluation, and applications,” IEEE/ACM Trans. Networking, vol. 15, no. 5, pp. 1084–1097, Oct. 2007. [31] M. Coates and R. Nowak, “Network loss inference using unicast end-to-end measurement,” in Proc. of ITC Seminar on IP Traﬃc, Measurement and Modelling, Monterey, CA, Sept. 2000. [32] H. Su, W. Chen, S. Lin, D. Jin, and L. Zeng, “The inference of link loss rates with internal monitors,” in Proc. of IEEE GLOBECOM, New Orleans, LA, Dec. 2008. [33] J. Gui, V. Shah-Mansouri, and V. Wong, “A linear algebraic approach for loss tomography in mesh topologies using network coding,” in Proc. of IEEE ICC, Cape Town, South Africa, May. 2010. [34] ——, “Accurate and eﬃcient network tomography via network coding,” submitted to IEEE Trans. on Vehicular Technology, 2010. [35] R. Myers, D. Montgomery, and G. Vining, Generalized Linear Models: With Applications in Engineering and the Sciences. John Wiley & Sons, Inc., 2001. [36] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, 2001. [37] G. Golub and C. Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996. Bibliography [38] “Internet2 network,” http://www.internet2.edu/network/. [39] “BRITE,” www.cs.bu.edu/brite/. 59
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Accurate and efficient network monitoring on mesh topologies...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Accurate and efficient network monitoring on mesh topologies via network coding Gui, Jiaqi 2010
pdf
Page Metadata
Item Metadata
Title | Accurate and efficient network monitoring on mesh topologies via network coding |
Creator |
Gui, Jiaqi |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Accurate and efficient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies via network coding. Probe packets are transmitted from the sources to the destinations along a set of paths. Intermediate nodes linearly combine the received probes and transmit the coded probes using pre-determined coding coefficients. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain threshold. To this end, we establish a tight lower bound on probe size which is necessary for establishing the mappings between the contents of the received probes and the losses on the different sets of paths. Then, we develop algorithms to find the coding coefficients such that the lower bound on probe size is achieved. Furthermore, we propose a linear algebraic approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We show that using the LANT framework, the identifiability of a link, which only depends on the network topology, is a necessary and sufficient condition for the consistent estimation of its loss rate. Simulation results show that the LANT framework achieves better estimation accuracy than the belief propagation (BP) algorithm for large number of probe packets. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0071010 |
URI | http://hdl.handle.net/2429/25924 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 24-ubc_2010_fall_gui_jiaqi.pdf [ 553.48kB ]
- Metadata
- JSON: 24-1.0071010.json
- JSON-LD: 24-1.0071010-ld.json
- RDF/XML (Pretty): 24-1.0071010-rdf.xml
- RDF/JSON: 24-1.0071010-rdf.json
- Turtle: 24-1.0071010-turtle.txt
- N-Triples: 24-1.0071010-rdf-ntriples.txt
- Original Record: 24-1.0071010-source.json
- Full Text
- 24-1.0071010-fulltext.txt
- Citation
- 24-1.0071010.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.24.1-0071010/manifest