ACCURATE AND EFFICIENT NETWORK MONITORING ON MESH TOPOLOGIES VIA NETWORK CODING by JIAQI GUI B.E., Information Engineering, Shanghai Jiao Tong University, China, 2008 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) June 2010 © Jiaqi Gui, 2010 ii Abstract Accurate and ecient measurement of network-internal characteristics is critical for man- agement and maintenance of large-scale networks. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies via network coding. Probe packets are transmitted from the sources to the destinations along a set of paths. Intermediate nodes linearly combine the re- ceived probes and transmit the coded probes using pre-determined coding coecients. Although a smaller probe size can reduce the bandwidth usage of the network, the infer- ence framework is not valid if the probe size falls below a certain threshold. To this end, we establish a tight lower bound on probe size which is necessary for establishing the mappings between the contents of the received probes and the losses on the dierent sets of paths. Then, we develop algorithms to nd the coding coecients such that the lower bound on probe size is achieved. Furthermore, we propose a linear algebraic approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We show that using the LANT framework, the identiability of a link, which only depends on the network topology, is a necessary and sucient condition for the consistent estimation of its loss rate. Simulation results show Abstract iii that the LANT framework achieves better estimation accuracy than the belief propaga- tion (BP) algorithm for large number of probe packets. iv Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viii List of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1 Network Coding Basics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.3 Motivations and Objective . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.5 Structure of the Thesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2 The LANT Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Contents v 2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2.2 Probe Coding Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.2.1 Lower Bound on Probe Size . . . . . . . . . . . . . . . . . . . . . 22 2.2.2 Algorithms for Finding a Valid Probe Coding Scheme . . . . . . . 25 Constructing Auxiliary Trees . . . . . . . . . . . . . . . . . . . . 26 Selecting Coding Coecients . . . . . . . . . . . . . . . . . . . . . 26 Designing Probe Packets . . . . . . . . . . . . . . . . . . . . . . . 28 2.3 Linear Algebraic Approach . . . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3.1 Least-squares Solutions . . . . . . . . . . . . . . . . . . . . . . . . 32 2.3.2 Method of Normal Equations . . . . . . . . . . . . . . . . . . . . 39 2.3.3 Method of Row Selection . . . . . . . . . . . . . . . . . . . . . . . 40 3 Performance Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 3.2 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4 Conclusions and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 vi List of Figures 1.1 The Butter y network. Sources s1 and s2 multicast information x1 and x2 to receivers r1 and r2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1 A directed acyclic graph with V = fs; r; 1; 2; 3; 4g and E = fe1; e2; : : : ; e7g. The set of monitored end-to-end paths P = fP1; P2; P3g, where P1 = fe1; e2; e5; e7g, P2 = fe1; e2; e4; e6; e7g, and P3 = fe1; e3; e6; e7g. For link e2 = (1; 2), we have P(e2) = fP1; P2g. . . . . . . . . . . . . . . . . . . . . 18 2.2 A directed acyclic graph with two end links e6 and e7. . . . . . . . . . . . 25 2.3 Two auxiliary trees Te6 and Te7 , corresponding to end links e6 and e7, respectively. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1 Directed acyclic graphs with dierent number of sources. (a) One source (node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) and one receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver (node 9). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 3.2 The RMSE of LA using the methods of normal equations (LA-NE) and row selection (LA-RS), versus the number of probe batches n. . . . . . . 46 List of Figures vii 3.3 The RMSE of the BP algorithm and the LA-RS algorithm, versus the number of probe batches n, for dierent average success rate ave. . . . . 47 3.4 The RMSE of the BP algorithm and the LA-RS algorithm, versus the average success rate ave (n = 500). . . . . . . . . . . . . . . . . . . . . . 47 3.5 The RMSE of the LA-RS algorithm, versus the number of probe batches n, for dierent number of sources (ave = 0:8). . . . . . . . . . . . . . . . 48 3.6 The RMSE of the LA-RS algorithm, versus the average success rate ave, for dierent number of sources (n = 500). . . . . . . . . . . . . . . . . . . 49 3.7 The RMSE of the LA-RS algorithm, versus the number of probe batches n, for networks of dierent sizes (ave = 0:9). . . . . . . . . . . . . . . . . 50 viii List of Acronyms QoS Quality of Service MVWA Minimum Variance Weighted Average EM Expectation-Maximization LEND Least-Biased End-to-end Network Diagnosis MILS Minimal Identiable Link Sequence LPRs Link Pass Ratios PPRs Path Pass Ratios LA Linear Algebra LANT Linear Algebraic Network Tomography BP Belief Propagation ML Maximum Likelihood RMSE Root Mean Square Error List of Acronyms ix NE Normal Equations RS Row Selection xList of Symbols j j Cardinality of a set n Number of probe batches V Set of nodes S Set of source nodes R Set of receiver nodes E Set of links EI Set of identiable links EV Set of virtual links ER Set of end links P Set of monitored end-to-end paths P(e) Set of end-to-end paths that include link e P Power set of P , jPj = 2jPj List of Symbols xi G = (V ; E) Directed acyclic graph Ge Subgraph consisting of the links and nodes in set P(e) G Set of subgraphs with overlapping links Te Auxiliary tree corresponding to end link e and subgraph Ge Le Set of leaf nodes in Te ui(v) The ith tree node that corresponds to the non-receiver node v in the directed graph "i(e) The ith tree link that corresponds to link e in the directed graph ` Probe size, number of bits in each probe packet q = 2` Alphabet size of a probe packet M = (mi;j)jPjjEj Path-link matrix M = (mi;j)jPjjEI[EV j Type 1 modied path-link matrix fM = (emi;j)(jPj1)jEI[EV j Type 2 modied path-link matrix M(jPj) Auxiliary matrix of type 2 modied path-link matrix with path set P j Actual link success rate of jth link in EI [ EV List of Symbols xii i Actual path success rate of ith path in P i Actual path set success rate of ith path set in P n f?g a = (aj)jEI[EV j1 Column vector, where aj = logj b = (bi)jPj1 Column vector, where bi = log i c = (ci)(jPj1)1 Column vector, where ci = log i = jPj 1 Number of rows in fM = jEI [ EV j Number of columns in fM fM1 Reduced path-link matrix from fM after row selection c1 Reduced 1 column vector from c after row selection xiii Acknowledgements First, I would like to express my deepest gratitude to my supervisor, Dr. Vincent Wong, for his patient guidance, constant encouragement, and excellent advice throughout my graduate study. Through our weekly meeting, we generate new ideas, stimulate creativity and dig into technology details. Without his invaluable help, this work would not be possible. A special thanks goes to my colleague, Vahid Shah-Mansouri, who provided me with precious assistance during the completion of this thesis. I am also thankful to all my colleagues in Dr. Wong's group: Amir Hamed Mohsenian-Rad, Vahid Shah-Mansouri, Man Hon Cheung, Keivan Ronasi, Ehsan Vahedi, Wei Bao, Xiaolei Hao, Wenbo Shi, and Jinbiao Xu as well as other friends in the data communications group, for sharing their experiences and knowledge during the time of my study. Finally, I take this opportunity to express my profound gratitude to my beloved parents for their understanding, support, and endless love during my study in Canada. To them I dedicate this thesis. This research is supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada under grant number STPSC 356767. 1Chapter 1 Introduction Accurate and ecient measurement of network-internal characteristics is critical for man- agement and maintenance of large-scale networks. With accurate and timely performance estimates, more ecient trac control protocols and dynamic routing algorithms can be designed. Quality-of-service (QoS) guarantees can be achieved if the available band- width can be gauged; the resulting service level agreements can be veried. Detecting anomalous or malicious behavior becomes a more achievable task [1]. Although network administrators can monitor local trac conditions and detect con- gestion points in small-scale networks, most networks are not completely isolated. The user-perceived performance of a network thus depends heavily on the performance of the internetwork. The traditional approach for characterizing network performance is based on detailed queueing models at the individual router level, which requires access to a wide range of routers to obtain link-level statistics. However, the routers are operated by dier- ent companies or service providers, which makes it dicult to collect detailed information at individual devices. Alternatively, we can make useful end-to-end measurements that do not require widely cooperation from internal network devices. Subsequently, based on the end-to-end measurements, we can apply inference techniques to extract the hidden Chapter 1. Introduction 2 information of interest. Broadly speaking, large-scale network inference involves estimating network perfor- mance parameters based on trac measurements at a limited subset of the nodes. Vardi was one of the rst researchers to rigorously study this type of problem and he coined the term network tomography [2] due to the similarity between the inference of network characteristics and medical tomography. Three forms of network tomography have been addressed in the recent literature: (1) link-level parameter estimation based on end-to- end, path-level trac measurements [3, 4], (2) sender-receiver path-level trac intensity estimation based on link-level trac measurements, and (3) the inference of network topology [5, 6]. Characterizing these parameters is critical for detecting congestion, faults and other anomalous behavior, ensuring compliance of service-level agreements with users, and management of overlay networks. In link-level parameter estimation, the end-to-end measurements usually consist of the number of probe or data packets transmitted and received between the source and the receiver nodes or the delay between packet transmissions and receptions. The objective is to estimate the loss rate or the queuing delay of each link. Dropped packets on a link are usually due to overload of the nite output buer of one of the routers encountered when traversing the link, but may also be caused by equipment downtime due to maintenance or power failures. The end-to-end delay is due to both propagation delay processing delay, queuing delay, and transmission delay. As assumed by most literature, occurrences of dropped packets and queuing delay is inherently random. Chapter 1. Introduction 3 In path-level trac intensity estimation, the measurements consist of the number of probe or data packets that transmitted through nodes in the network. In privately owned networks, the collection of such measurements is relatively straightforward. Based on these measurements, the goal is to estimate how much trac originated from a specied node and was transmitted to a specied destination. The combination of the trac intensities of all the origin-destination pairs forms the origin-destination trac matrix. Both the node-level measurements and the parameter to be estimated are inherently random. In the inference of network topology, the measurements usually consist of the number of probe or data packets transmitted and received between the source and the receiver nodes or the delay between packet transmissions and receptions. Some proposals require clock synchronization while other more practical ones do not. The physical network topology can be represented as a directed graph, where each vertex represents a physical device such as a router or a switch and the edges correspond to the communication links between those devices. Based on the end-to-end measurements, the logical topology can be determined. Network tomography can be performed either in an active or passive manner. Active network tomography refers to the case where probe packets are sent from the sources to the receivers located on the periphery of the network [7{9]. Using the end-to-end mea- surements generated by probe packets transmitted and received between the source and the receiver nodes, more informative and reliable path-level measurements are provided Chapter 1. Introduction 4 at the cost of utilizing additional network resources such as bandwidth and energy. On the contrary, passive network tomography reveals information from the existing data trac, so that it is more attractive for networks (e.g., wireless sensor networks) with limited power supply and bandwidth constraints [10{13]. There is also an accelerating trend toward network security that will create a highly uncooperative environment for active tomography. For example, rewalls designed to protect information may not allow requests for routing information, special packet handling and other network transport protocols required by many active tomography techniques. This has prompted investi- gations into passive-based trac monitoring techniques. 1.1 Network Coding Basics Networked systems arise in various communication contexts such as telephone networks, the public Internet, peer-to-peer networks, ad-hoc wireless networks, and wireless sensor networks. An inherent premise behind the operation of all communication networks lies in the way information is treated. Recently, with the advent of network coding, the sim- ple but important observation was made that in communication networks, intermediate nodes are allowed not only forward but also process the incoming independent informa- tion ows [14{16]. At the network layer, for example, intermediate nodes can perform binary addition of independent bitstreams, whereas, at the physical layer of optical net- works, intermediate nodes can superimpose incoming optical signals. In other words, data streams that are independently produced and consumed do not necessarily need to Chapter 1. Introduction 5 (a) Routing to r1 (b) Routing to r2 (c) Network Coding Figure 1.1: The Butter y network. Sources s1 and s2 multicast information x1 and x2 to receivers r1 and r2. be kept separate when they are transported throughout the network. Combining inde- pendent data streams can better tailor the information ow to the network environment and accommodate the demands of specic trac patterns. This shift in paradigm is expected to revolutionize the way we manage, operate, and understand the organization in networks, as well as to have a deep impact on a wide range of areas such as reliable delivery, resource sharing, ecient ow control, network monitoring, and security [17]. One essential benet of network coding is in terms of throughput when multicasting. The following simple example from [14] illustrates the basic concepts in network coding and gives a preliminary idea of the expected benets and challenges. Example 1: Fig. 1.1 depicts a communication network represented as a directed graph where vertices correspond to terminals and edges correspond to channels. This Chapter 1. Introduction 6 example is commonly known in the network coding literature as the butter y network. Assume we can send one bit per time slot through each channel. We have two sources s1 and s2, and two receivers r1 and r2. Each source produces one bit per time slot which we denote by x1 and x2, respectively. If receiver r1 uses all the network resources by itself, it could receive information from both sources. Indeed, we could route the bit x1 from source s1 along the path fADg and the bit x2 from source s2 along the path fBC;CE;EDg, as depicted in Fig. 1.1(a). Similarly, if the second receiver r2 uses all the network resources by itself, it could also receive information from both sources. Indeed, we could route the bit x1 from source s1 along the path fAC;CE;EFg, and the bit x2 from source s2 along the path fBFg as depicted in Fig. 1.1(b). Now assume that both receivers want to simultaneously receive the information from both sources. We then have a contention for the use of edge CE, since we assume that each channel can only transmit one bit per time slot. Traditionally, information ow was treated like uid through pipes, and independent information ows were kept separate. The simple but important observation made in the work by Ahlswede et al. is that we can allow intermediate nodes in the network to process their incoming information ows, and not just forward them. In Fig. 1.1(c), node C can take bits x1 and x2 and XOR them to create a third bit x3 = x1 + x2 which it can then send through edge CE (the XOR operation corresponds to addition over the binary eld). r1 receives fx1; x1 + x2g, and can solve this system of equations to retrieve x2 by XORing x1 and x1+x2. Similarly, r2 receives fx2; x1 + x2g, and can solve this system of equations to retrieve x1 by XORing Chapter 1. Introduction 7 x2 and x1 + x2. The previous example shows that if we allow intermediate node in the network to combine information streams and extract the information at the receivers, we can increase the throughput when multicasting. Network coding can also be used to infer the loss rates of links in an overlay network. For conventional active tomography, packets are usually multicast to several receivers. After a suciently large number of probe packets, shared links and their loss rates can be inferred with reasonable accuracy. In such a setting, network coding can provide additional exibility since probe packets are not only duplicated at branching points of the multicast tree, but may also be merged. If multiple senders transmit packets to a single receiver, and these packets are combined within the network, it allows to infer network parameters in much the same way as multicasting from one sender to multiple receivers [18]. Furthermore, if the network coding coecients (i.e., the specic way in which packets are combined at the nodes) are known in advance, the received coded probe packets can provide additional information about which packets were lost in which part of the tree. By exploiting these features, it is possible to signicantly reduce the number of active probes and the bandwidth usage generated by these probes, and thus increase bandwidth eciency. Chapter 1. Introduction 8 1.2 Related Work In the area of loss tomography, extensive studies have been given to multicast tree topolo- gies. In [7], Caceres et al. developed a maximum-likelihood estimator for loss rates on internal links based on losses observed by multicast receivers. It exploits the inherent correlation between such observations to infer the performance of paths between branch points in the tree spanning a multicast source and its receivers. The proposed method relies on the iterative approximation to identify the parameters that requires a long exe- cution time. In addition, the parameters identied by this method may not be the true values of those parameters since the iterative procedure may trap into a local maximum. In [19], Zhu et al. proposed an estimate that is based on the correlation between a link and its sibling links to identify the loss rate of the link. The proposed method, in- stead of using an iterative approach to approximate the maximum, employs a bottom-up approach to identify the loss rates of the links of a network. In contrast to multicast techniques, unicast inference based on multicast tree topolo- gies can easily be performed on most networks. In [20], Dueld et al. designed exper- iments based on the notion of transmitting stripes of packets (with no delay between transmission of successive packets within a stripe) to two or more receivers. The pur- pose of these stripes is to ensure that the correlation in receiver observations matches as closely as possible what would have been observed if the stripe had been replaced by a multicast probe that followed the same paths to the receivers. In [10], Tsang et al. designed a measurement procedure for network loss inference based on end-to-end packet Chapter 1. Introduction 9 pair measurements. Back-to-back packet pairs are the two packets that are sent one after the other by the source, possible destined for dierent receivers, but sharing a common set of links in their paths. In [21], Padmanabhan et al. investigated the problem of identifying lossy links in the interior of the Internet by passively observing the end-to-end performance of existing trac between a server and its clients. The key advantage of a passive approach is that it does not introduce additional trac which might perturb the object of inference, i.e., the link loss rates. The techniques depend only on knowing the number of lost and successful packets sent to each client. While the accuracy of link loss rate inference may conse- quently suer, the techniques can still pinpoint the trouble spots in the network (e.g., highly lossy links). They developed and evaluated three techniques for passive network tomography: random sampling, linear optimization, and Bayesian inference using Gibbs sampling. To extend the existing multicast and unicast tomography approaches to general topologies, in [22], Bu et al. proposed an approach using multiple trees to cover a mesh topology and combine the inferred loss rates. They further proposed two algorithms to perform the link-level inferences. One, the minimum variance weighted average (MVWA) algorithm treats the trees separately and then average the results. The second, based on expectation-maximization (EM) merges all of the measurements into one computation. However, this approach may have low bandwidth eciency, since those links that are part of multiple trees would be traversed by multiple probe packets in each time slot, and thus Chapter 1. Introduction 10 create additional trac. In addition, it may incur high monitoring cost, since it requires a large number of receivers to be deployed in each multicast tree. The pioneering work in [14] showed that for multicast networks, if intermediate nodes can perform network coding, that is to perform simple local XOR-operations on incoming packets, one can achieve the min-cut throughput of the network to each receiver. Recent studies show that applying network coding in loss tomography can increase bandwidth eciency [17, 23]. In a network which is capable of performing network coding in addi- tion to multicast, the intermediate nodes linearly combine incoming probe packets and forward the coded probe packets to the outgoing links according to pre-determined cod- ing coecients. Results in [18] show that for active monitoring using network coding, appropriate selection of the number and location of sources and receivers can aect the accuracy of estimation in general tree topologies. The work in [24] established a frame- work for loss tomography on mesh topologies. An orientation algorithm is proposed to nd a directed acyclic graph from an undirected graph with selected sources. An example is illustrated in [18] such that each link in a mesh topology can be traversed in each time slot by exactly one probe. In contrast to network coding with pre-determined coding coecients, randomized network coding changes the fundamental connection between path and link loss prob- abilities such that new inference algorithms need to be developed. In [13], Lin et al. studied the loss inference problem in sensor networks with randomized network coding. As end-to-end data are not sucient to compute link loss rates precisely, they proposed Chapter 1. Introduction 11 inference algorithms based on Bayesian principles to discover the set of highly lossy links in sensor networks. The algorithms achieve high detection and low false-positive rates in extensive simulations. In [25], Yao et al. studied passive network tomography in the presence of network failures, under the setting of random linear network coding. Sev- eral sets of algorithms for topology estimation and failure detection are proposed under various setting of adversarial random failures. To reduce monitoring cost, a set of end-to-end paths on mesh topologies only requires a limited number of sources and receivers. In [26], Mao et al. proposed a brief propagation (BP) algorithm, which is combined with the use of network coding in [24]. The BP algorithm is a low complexity algorithmic framework for link loss monitoring based on the recent modeling and computational methodology of factor graphs [27]. It iteratively updates the estimates of link losses upon receiving (or detecting the loss of) recently sent packets. The algorithm exhibits good performance and scalability, and can be easily adapted to dierent statistical models of networking scenarios. In particular, due to its low complexity, the algorithm is particularly suitable as a long-term monitoring facility. In [28], Zhao et al. proposed a least-biased end-to-end network diagnosis (LEND) system for inferring link-level properties like loss rate. They dened a minimal identiable link sequence (MILS) as a link sequence of minimal length whose properties can be uniquely identied from end-to-end measurements. They designed ecient algorithms to nd all the MILSs and infer their loss rates for diagnosis. The LEND system works for any network topology and for both directed and undirected topologies. It gives highly Chapter 1. Introduction 12 accurate estimates of the loss rates of MILSs and such diagnosis can be achieved with ne granularity and in near real-time even for reasonably large overlay networks. The LEND system can also supplement existing statistical inference approaches and provide smooth tradeo between diagnosis accuracy and granularity. In [29], Sun et al. focused on the problem of nding the link pass ratios (LPRs) when the path pass ratios (PPRs) of a set of paths are given. They proved the problem of nding the maximum-likelihood estimation of LPRs given PPRs is NP-hard, and then proposed a simple algorithm based on divide-and-conquer. It rst estimates the number of faulty links on a path, then uses the global information to estimate assign LPRs to the links. It requires a priori probability distribution function on link loss rates and an assumption that the majority of links being lossless. In [30], Chen et al. focused on overlay network monitoring, which enables distributed Internet applications to detect and recover from path outages and periods of degraded performance within seconds. For an overlay network with n hosts, existing systems either require O(n2) measurements, and thus lack scalability, or can only estimate the latency but not congestion or failures. They proposed an algebraic approach that selectively monitors k linearly independent paths that can fully describe all the O(n2) paths. The loss rates and latency of these k paths can be used to estimate the loss rates and latency of all other paths. Chapter 1. Introduction 13 1.3 Motivations and Objective In this thesis, we consider the problem of link loss tomography on mesh topologies. Although there are extensive studies of link loss inference on multicast tree topologies [7, 19, 31, 32], loss tomography on mesh topologies is still a challenging problem. Existing approaches have not exploited the inherent information in the end-to-end observations. As a result, the linear system of link loss rates and path loss rates usually has a coecient matrix with decient column rank1, which makes it dicult to accurately infer the link loss rates [30]. In general, most of the previously proposed loss tomography approaches on mesh topologies in the literature have one or more of the following performance bottlenecks: (1) low bandwidth eciency, (2) high monitoring cost, (3) estimation not being always accurate, and (4) requiring additional assumptions. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies. To increase bandwidth eciency and reduce monitoring cost, we send probe packets along a set of end-to-end paths rather than multicast trees and apply network coding. To increase the estimation accuracy, we exploit the inherent correlation between the losses on the links and those on the dierent sets of paths, which is captured through network coding. We refer to probe packets and network coding schemes jointly 1The column rank of an m n matrix is the maximum number of linearly independent columns of the matrix. If the matrix has rank n, then it has full column rank; otherwise, the matrix has decient column rank. Chapter 1. Introduction 14 as probe coding schemes. In our LANT framework, a valid probe coding scheme enables us to establish the mappings between the contents of received probe packets and the losses on the dierent sets of paths. Using valid probe coding schemes, we obtain valid end-to-end observations, based on which we can distinguish which paths have successfully transmitted a probe and which paths have not. We also dene link identiability, a link property that only depends on the network topology. For identiable links, we develop consistent estimators that converge to the actual loss rates as the number of probes increases. Since the number of all path sets can grow exponentially as the number of total paths increases, we selectively monitor a subset of all path sets (the method of row selection), which are sucient to infer the loss rates of all identiable links. 1.4 Contributions The main contributions of this thesis are as follows [33, 34]: We establish a tight lower bound on probe size, which is necessary for valid probe coding schemes when network coding is applied. Then, we develop algorithms to nd a valid probe coding scheme such that the lower bound on probe size is achieved. We propose a linear algebraic (LA) approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We combine the methods of normal equations and row selection with the Chapter 1. Introduction 15 LA approach, and analyze the computational complexity. We prove that the identiability of a link, which only depends on the network topology, is a necessary and sucient condition for the consistent estimation of its loss rate, using the LANT framework. Simulation results show that the LA approach using the method of row selection can eectively decrease computational complexity without reducing estimation ac- curacy. Besides, the LA approach achieves better estimation accuracy than the BP algorithm, when the estimators converge. Although the eect of the number and location of sources on the accuracy can be negligible with relatively large suc- cess rates or sucient probe batches, dierent number and location of sources may result in dierent number of identiable links. The framework we present in this thesis is unique when compared to the prior work done in the area of loss tomography using network coding. In terms of bandwidth e- ciency, the work in [24] establishes a loose lower bound on probe size for valid probe coding schemes, while the problem of nding coding coecients remains unexplored. Without ecient algorithms to nd coding coecients, the inference framework is incomplete and cannot be applied. We establish a tight lower bound and also develop algorithms to nd a valid probe coding scheme such that the lower bound on probe size is achieved. In terms of inference approaches, the BP algorithm [26] only uses the information of the losses on dierent paths such that the estimation may not be accurate for networks with Chapter 1. Introduction 16 relatively high link loss rates. The work in [29] requires additional assumptions such as a priori probability distribution function and the majority of links being lossless. The work in [28] only nd the loss rate of a minimal identiable link sequence. In contrast, our LA approach does not need extra assumptions while it can still obtain additional useful information, the losses on the dierent sets of paths. This information can only be obtained via probe coding schemes and cannot be achieved by routing probes in general. As a result, we obtain better estimation accuracy and obtain more identiable links. 1.5 Structure of the Thesis The rest of this thesis is organized as follows. In Chapter 2, we present the LANT framework, including system model, probe coding schemes, and an LA approach for the consistent estimation of link loss rates. Chapter 3 presents performance evaluation. Conclusions are given in Chapter 4. 17 Chapter 2 The LANT Framework In this chapter, we present we present the LANT framework, including system model, probe coding schemes, and an LA approach for the consistent estimation of link loss rates. 2.1 System Model We model the network as a directed acyclic graph G = (V ; E), consisting of a set of nodes V and a set of links E . The node set V includes routers and periphery devices where probe packets are sent and received. A link e = (v; v0) 2 E denotes a directed communication link from node v to node v0. Let S and R denote the set of source nodes and the set of receiver nodes, respectively. The set of monitored end-to-end paths is denoted by P . A path P 2 P is a set of directed links from a source to a receiver. Let P(e) denote the set of paths that include link e. We dene a path-link matrix M = (mi;j)jPjjEj, whose jPj rows correspond to the jPj paths and the jEj columns correspond to the jEj links, as follows: The element mi;j is equal to 1 if the ith path in set P includes the jth link in set E , and is equal to 0 otherwise. As an example, the directed acyclic graph in Fig. 2.1 has Chapter 2. The LANT Framework 18 Figure 2.1: A directed acyclic graph with V = fs; r; 1; 2; 3; 4g and E = fe1; e2; : : : ; e7g. The set of monitored end-to-end paths P = fP1; P2; P3g, where P1 = fe1; e2; e5; e7g, P2 = fe1; e2; e4; e6; e7g, and P3 = fe1; e3; e6; e7g. For link e2 = (1; 2), we have P(e2) = fP1; P2g. three paths from source s to receiver r. Its path-link matrix is a 3 by 7 binary matrix as shown below: M = 26666664 e1 e2 e3 e4 e5 e6 e7 P1 1 1 0 0 1 0 1 P2 1 1 0 1 0 1 1 P3 1 0 1 0 0 1 1 37777775: (2.1) Given a directed acyclic graph G = (V ; E) and a set of monitored end-to-end paths P , a link e 2 E is called identiable, if for each link pair (e; e0) where e0 2 E nfeg, there exists at least one path in P that includes only one of the two links, i.e., P(e) 6= P(e0). As in Fig. 2.1, links e2; e3; : : : ; e6 are identiable links, while links e1 and e7 are non-identiable links since P(e1) = P(e7). We notice that the identiability of a link depends only on the network topology. The following proposition shows that the identiability of a link is a necessary condi- Chapter 2. The LANT Framework 19 tion for the estimation of its loss rate. Proposition 1 The loss rate of a link can be estimated only if the link is an identiable link. Proof : We prove it by contradiction. Suppose there exists a link pair (e; e0), where e; e0 2 E , such that all paths in P include either both or none of them, i.e., P(e) = P(e0). If a probe packet is being dropped on either link e or e0, the same end-to-end observation (probe packets with the same contents) will be obtained in either case. Therefore, we cannot diagnose on which link the loss of probe packet occurs, and it is not possible to estimate the loss rate of these links. We divide the set of non-identiable links into several groups, where each group contains a set of links that are included in the same set of paths. We refer to each group as a virtual link. As in Fig. 2.1, since P(e1) = P(e7), we refer to e1 and e7 as one virtual link ev1 . Let EI and EV denote the set of identiable links and the set of virtual links, respectively. We have EI = fev1g and EV = fe2; e3; : : : ; e6g. Note that EI \ EV = ?. Thus, for each link e 2 EI [ EV , we have P(e) 6= P(e0) for all e0 2 EI [ EV n feg. We x the order of elements in EI [ EV . Accordingly, we dene a modied path-link matrix M = (mi;j)jPjjEI[EV j as follows: The element mi;j is equal to 1 if the ith path in set P includes the jth link in set EI [ EV , and is equal to 0 otherwise. We refer to M as type 1 modied path-link matrix. The type 1 modied path-link matrix for the graph in Fig. Chapter 2. The LANT Framework 20 2.1 is shown below: M = 26666664 ev1 e2 e3 e4 e5 e6 P1 1 1 0 0 1 0 P2 1 1 0 1 0 1 P3 1 0 1 0 0 1 37777775: (2.2) We model the loss of packets on dierent links by a set of mutually independent Bernoulli processes. Losses are therefore spatial and temporal independent. This model is commonly used in the literature [7{9, 19, 31, 32] for network tomography. We dene j 2 (0; 1] as the link success rate of the jth link in set EI [ EV , which is the probability that a packet can be successfully transmitted on the jth link. Thus, 1 j denotes the loss rate of the jth link in set EI [ EV . Moreover, we dene i 2 (0; 1] as the path success rate of the ith path in set P , which is the probability that a probe packet can be successfully transmitted on the ith path in set P . Unlike data packets, probe packets would not be retransmitted if being dropped. Thus, we have jEI[EV jY j=1 (j) mi;j = i; i = 1; : : : ; jPj: (2.3) Taking logarithm on both sides of (2.3), we can reformulate it as linear equations: jEI[EV jX j=1 mi;j logj = log i; i = 1; : : : ; jPj; (2.4) where logj and log i are the variables of linear equations. Setting aj = logj and bi = log i, we have jEI[EV jX j=1 mi;jaj = bi; i = 1; : : : ; jPj: (2.5) Chapter 2. The LANT Framework 21 We dene two column vectors a = (aj)jEI[EV j1, and b = (bi)jPj1. The system can be represented in the matrix form as Ma = b: (2.6) Equation (2.6) shows the relation between the path and link success rates. The objective of loss tomography is to infer the link loss rates using end-to-end observations (i.e., the number and the contents of the received probe packets). Let â and b̂ denote the estimator of a and b, respectively. By measuring the path success rates, we can estimate b̂ while â remains unknown. Thus, equation (2.6) becomes a system of jPj equations with jEI [EV j unknowns as: Mâ = b̂: (2.7) In most cases, the number of identiable and virtual links is greater than the number of paths. That is, jEI [EV j > jPj. Thus, (2.7) is under-determined. We propose the LANT framework to obtain additional useful information and determine â. The LANT framework is composed of two phases. In the rst phase, we apply network coding and perform end-to-end measurements on the set of paths P . n batches of probe packets are sent from the sources in a synchronized manner. In each time slot, the intermediate nodes linearly combine the incoming probes according to specic coding coecients. The key objective in this phase is to nd the minimum probe packet size that can establish the mappings between the contents of the received probe packets and the losses on the dierent sets of paths. In the second phase, we inspect the contents of the received probe packets. We show that it can provide us with more information than Chapter 2. The LANT Framework 22 path success rates. We establish a linear system whose coecient matrix has full column rank, and use computational ecient algorithms to develop consistent estimators of link loss rates. In the next two sections, we describe these two phases in details. 2.2 Probe Coding Schemes In Subsection 2.2.1, we establish a tight lower bound on probe size (i.e., number of bits in each probe packet), which is necessary for valid probe coding schemes. Then, we propose algorithms to nd a valid probe coding scheme with the minimum probe size in Subsection 2.2.2. 2.2.1 Lower Bound on Probe Size We refer to probe packets and network coding schemes jointly as probe coding schemes. A probe coding scheme is valid if we can determine which paths have successfully trans- mitted a probe and which paths have not from the end-to-end observations. We adopt linear network coding schemes [15] that are sucient for our task. A probe packet is a binary vector ()2 of length `, which can be interpreted as an element in a nite eld Fq with an alphabet of size q (q = 2`). A coding coecient can also be interpreted as an element in the nite eld Fq. Within valid probe coding schemes, the probe size ` is desired to be as small as possible, since it is directly related to bandwidth eciency. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain Chapter 2. The LANT Framework 23 threshold. For example, in Fig. 2.1, receiver r receives coded packets that are combined from packets on three dierent paths. Using one-bit probe packets, we are not able to distinguish which of these three paths have successfully transmitted a probe packet. In this case, we need probe packets with at least three bits for valid probe coding schemes while smaller probe sizes cannot constitute valid probe coding schemes. Before we nd a lower bound on probe size, which is necessary for valid probe coding schemes, we present the notations used in our approach. In a directed acyclic graph G = (V ; E), a link is an end link if it is adjacent to a receiver r 2 R. The set of all end links is denoted by ER. For an end link e 2 ER, let Ge denote a subgraph of G consisting of the links and nodes involved in set P(e). We notice that if receiver r has multiple end links, it would know from which link a packet is received. Let qe and `e denote the alphabet size and the length of the probe packets transmitted on subgraph Ge, respectively. The following theorem presents a loose lower bound on probe packet size for valid probe coding schemes. Theorem 1 For the probes transmitted on subgraph Ge, where e 2 ER, the probe size should satisfy `e jP(e)j (i.e., qe 2jP(e)j), in order to obtain valid end-to-end observa- tions. Proof : For each end link e 2 ER, let P(e) = fP1; P2; : : : ; PjP(e)jg. As for valid probe coding schemes, based on the content of the received probe, a receiver should distinguish which paths have successfully transmitted a probe and which paths have not. Without loss of generality, we start from path P1. Since a zero binary vector will introduce Chapter 2. The LANT Framework 24 ambiguity, (1)2 is the smallest binary vector we can use to denote the case where only path P1 has successfully transmitted a probe. (10)2 is the smallest binary vector we can use to denote the case where only path P2 has successfully transmitted a probe. Since (11)2 denotes the case where both paths P1 and P2 have successfully transmitted a probe, (100)2 is the smallest binary vector we can use to denote the case where only path P3 has successfully transmitted a probe. By induction, we can show that (10 0)2 of length jP(e)j is the smallest binary vector we can use to denote the case where only path PjP(e)j has successfully transmitted a probe. We modify the above binary vectors to vectors of length jP(e)j with zeros added to the left-hand side. Thus, for the probes transmitted in subgraph Ge, we have `e jP(e)j, and qe 2jP(e)j. Although Theorem 1 provides lower bounds on probe size for the probes transmitted on dierent subgraphs separately, some lower bounds may not be achieved. For example, in Fig. 2.2, we have `e6 2 and `e7 4 for subgraphs Ge6 and Ge7 , respectively. However, there are overlapping links in Ge6 and Ge7 such as links e1, e2 and e3. In this case, a valid probe size should be 4. Let G denote a set of subgraphs with overlapping links. The probes transmitted on these subgraphs should have the same size. Let `G denote the size of such probes. Correspondingly, the set of end links in the subgraph set G is denoted by ER(G ). The following proposition presents an improved lower bound on probe size for valid probe coding schemes. Proposition 2 For the probes transmitted on subgraph set G , the probe size should satisfy `G maxe2ER(G ) jP(e)j, in order to obtain valid end-to-end observations. Chapter 2. The LANT Framework 25 Figure 2.2: A directed acyclic graph with two end links e6 and e7. Proof : According to Theorem 1, for the probes transmitted in each subgraph Ge where e 2 ER(G ), there exists a lower bound `e jP(e)j. Since network coding is applied, the probes transmitted on one link should have the same size. Similarly, the probes transmitted on one subgraph should also have the same size. Thus, for the probes transmitted in the subgraphs with overlapping links, the lower bound on probe size is the maximum value of the lower bounds obtained from Theorem 1. That is, `G maxe2ER(G ) jP(e)j. 2.2.2 Algorithms for Finding a Valid Probe Coding Scheme We propose an approach to nd a valid probe coding scheme, such that the improved lower bound obtained from Proposition 2 is achieved. The approach is divided into three processes, described in this section. Chapter 2. The LANT Framework 26 Constructing Auxiliary Trees For each end link e = (h; r) 2 ER, we introduce an auxiliary tree topology Te. There is a one-to-many mapping from the nodes and the links in the original graph G to the nodes and the links in the auxiliary tree Te. We use u0(r) to denote the tree node that corresponds the root node r in graph G. We use ui(v) to denote the ith tree node that corresponds to the non-receiver node v in graph G. Similarly, we use "i(e) to denote the ith tree link that corresponds to link e in graph G. Algorithm 1 shows how to construct the auxiliary trees corresponding to each end link in set ER. For each end link e = (h; r) 2 ER, nodes u0(r), u1(h) and link "1(e) are rst added into Te (Step 2). We dene a leaf node as a node only with outgoing links in Te. Node u0(r) is the destination node. The set of leaf nodes Le initially includes node u1(h) (Step 3). If there exists tree node uk(v) 2 Le, where k 2 f1; 2; : : : ; ig and v is not a source node in G, then along the incoming links of node v while ignoring the outgoing links, we nd a set of nodes. Corresponding to these nodes and the incoming links, new tree nodes and tree links are dened and added into Te (Step 7). The set of leaf nodes Le is updated in Steps 8 and 11. The counter i for the number of tree links in Te is updated in Step 9. Selecting Coding Coecients The coding coecients are readily obtained based on the auxiliary trees. The non-receiver nodes with multiple incoming links in G are the nodes that perform network coding. The Chapter 2. The LANT Framework 27 Algorithm 1 Algorithm for constructing auxiliary trees. Assume graph G = (V ; E) is given. 1: for each end link e = (h; r) 2 ER do 2: Add nodes u0(r), u1(h) and link "1(e) into auxiliary tree Te 3: Initialize the set of leaf nodes Le fu1(h)g 4: i 2 5: while 9 node uk(v) 2 Le; k 2 f1; 2; : : : ; ig and v =2 S do 6: for each v0 : 9(v0; v) 2 E do 7: Add node ui(v 0) and link "i(v0; v) into Te 8: Update Le Le Sfui(v0)g 9: i i+ 1 10: end for 11: Update Le Le n fuk(v)g 12: end while 13: end for corresponding tree nodes would also have multiple incoming tree links. The remaining nodes in G perform either forwarding or multicasting. Algorithm 2 shows how to select coding coecients. For each node uk(v) in Te, suppose it has a set of t(uk(v)) incoming links f"1in(e1in), "2in(e2in), : : :, "t(uk(v))in (et(uk(v))in )g and one outgoing link "out(eout). Then, node v has coding coecients [(e 1 in; eout), (e 2 in; eout), : : :, (e t(uk(v)) in ; eout)]. Suppose tree links " 1 in(e 1 in), " 2 in(e 2 in), : : :, " t(uk(v)) in (e t(uk(v)) in ) have n1, n2, Chapter 2. The LANT Framework 28 Algorithm 2 Algorithm for selecting coding coecients. Assume the auxiliary trees are given. 1: for each auxiliary tree Te do 2: for each node uk(v) in Te with outgoing link "out(eout) do 3: n0 0 4: for each incoming link "iin(e i in) of node uk(v), i = 1; 2; : : : ; t(uk(v)) do 5: Find its corresponding leaf-node set L("iin(eiin)) Le 6: ni jL("iin(eiin))j 7: (eiin; e i out) 2n0+n1++ni1 8: end for 9: end for 10: end for : : :, nt(uk(v)) corresponding leaf nodes, respectively. Then, we choose the values of coding coecients as [20, 2n1 , 2n1+n2 , : : :, 2n1+n2++nt(uk(v))1 ]. Designing Probe Packets The path-link matrix M can be easily obtained based on the paths from the leaf nodes to the destination node in each auxiliary tree according to Algorithm 3. Now, we show how to nd the sets of subgraphs with overlapping links. Each subgraph Ge originally constitutes a subgraph set fGeg. We check each column of the path-link matrix M. If a column has multiple 1s and it also represents that dierent subgraph sets include the same link, we combine these subgraph sets as one subgraph set. Then, for each subgraph Chapter 2. The LANT Framework 29 set G with its end-link set ER(G ), according to Proposition 2, we calculate the tight lower bound on probe size `G = maxe2ER(G ) jLej. Thus, probes as (0 01)2 of length `G are sent from the sources to the outgoing links in G . Finally, for each path Pi 2 P , multiplying (0 01)2 of its corresponding length by the coding coecients along path Pi, we can obtain the content of a received probe that denotes the case where only path Pi has successfully transmitted a probe. In this way, we can establish the mappings between the contents of received probes and the losses on the dierent combinations of paths for each subgraph, and thus obtain a valid probe coding scheme. Example 2: We consider a directed acyclic graph G = (V ; E) depicted in Fig. 2.2. We use Algorithms 1-2 to obtain a valid probe coding scheme with the minimum probe size. First, we construct two auxiliary trees Te6 and Te7 , as depicted in Fig. 2.3. Second, we choose nodes 1 and 2 as the nodes that perform network coding, since they are the non-receiver nodes with multiple incoming links. Based on the auxiliary tree Te6 , we have jL("3(e1))j = 1 (Algorithm 2, Steps 5-6). Thus, some of the coding coecients of node 1 are obtained as [(e1; e3); (e2; e3)] = [1; 2] (Algorithm 2, Step 7). Similarly, based on Te7 , we obtain the coding coecients of node 1 as [(e1; e4); (e2; e4)] = [1; 2]. We also obtain the coding coecients of node 2 as [(e4; e7); (e5; e7)] = [1; 4], since jL("2(e4))j = 2. Chapter 2. The LANT Framework 30 Algorithm 3 Algorithm for constructing path-link matrix. Assume the auxiliary trees are given. 1: Initialize an empty path-link matrix M 2: for each auxiliary tree Te do 3: i 1 4: for each tree path Pi in Te do 5: for each mi;j, j = 1; 2; : : : ; jEj do 6: If 9"k(ej) 2 Pi, mi;j = 1; otherwise, mi;j = 0. 7: end for 8: Row vector ! (mi;j)1jEj 9: Update M 2664M ! 3775 10: i i+ 1 11: end for 12: end for Third, the path-link matrix M can be obtained as follows: M = 26666666666666666664 e1 e2 e3 e4 e5 e6 e7 1 0 1 0 0 1 0 0 1 1 0 0 1 0 1 0 0 1 0 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 0 1 1 0 1 0 1 37777777777777777775 : (2.8) Chapter 2. The LANT Framework 31 (a) Auxiliary tree Te6 (b) Auxiliary tree Te7 Figure 2.3: Two auxiliary trees Te6 and Te7 , corresponding to end links e6 and e7, re- spectively. The top two rows represents the two paths in subgraph Ge6 , and the bottom four rows represents the four paths in subgraph Ge7 . Since link e1 is involved in both fGe6g and fGe7g (checking the rst column of M), we combine the two subgraph sets and obtain one subgraph set G = fGe6 ;Ge7g. Counting the number of leaf nodes in each auxiliary tree, we obtain jP(e6)j = 2 and jP(e7)j = 4. Thus, `G = max f2; 4g = 4 and probes as (0001)2 are sent from sources s1 and s2 to outgoing links e1 and e2, respectively. Chapter 2. The LANT Framework 32 2.3 Linear Algebraic Approach As described previously, the system in (2.7) has jPj equations with jEI [ EV j unknowns. However, jPj may be less than jEI [ EV j, such as the topologies in Figs. 2.1 and 2.2, and thus â in (2.7) cannot be uniquely determined. Even when jPj jEI [ EV j, it does not ensure that â can be determined. In this section, we propose a linear algebraic (LA) approach using the observations from coding operations. We show that â can be determined using the method of least-squares [35]. Then, we combine the methods of normal equations and row selection with the LA approach and analyze the computational complexity. 2.3.1 Least-squares Solutions By inspecting the contents of the received coded probe packets at the destinations, we can estimate not only the success rate of a single path, but also the success rate of a set of paths. As the main consequence of valid probe coding schemes, it enables us to distinguish between the paths that have contributed to a coded probe packet and the paths that have not. This is unique to the networks which use probe coding schemes and cannot be achieved by routing probes in general. We denote the power set1 of P by P. Thus, jPj = 2jPj. Each element of P is a subset of P , which can be used to represent a unique combination of paths. Let i denote 1The power set of a set is the set of all subsets of that set. For example, the power set of fa; bg is f?; fag; fbg; fa; bgg. Chapter 2. The LANT Framework 33 the path set success rate of the ith path set in P n f?g, which is the probability that a batch of probes can be successfully transmitted on all the paths in the ith path set. We dene a path set success rate except for ? 2P, because we require the probability that at least one path can successfully transmit a probe to obtain an equation of link success rates and a path set success rate. Accordingly, we dene a modied path-link matrix fM = (emi;j)(jPj1)jEI[EV j as fol- lows: The element emi;j is equal to 1 if there exists a path in the ith path set in set P n f?g which includes the jth link in set EI [EV , and is equal to 0 otherwise. We refer to fM as type 2 modied path-link matrix. The type 2 modied path-link matrix for the graph in Fig. 2.1 is shown below: fM = 266666666666666666666664 1 1 0 0 1 0 1 1 0 1 0 1 1 0 1 0 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 377777777777777777777775 : (2.9) Each of the top three rows represents the path set that includes only one path. We dene a column vector, c = (ci)(jPj1)1, where ci = log i. The column vector a is dened as in Section 2.1. Thus, we have a linear system jEI[EV jX j=1 emi;jaj = ci; i = 1; : : : ; jPj 1 (2.10) Chapter 2. The LANT Framework 34 or in the matrix form as fMa = c: (2.11) For each path set in P n f?g, the n probe batches sent from the sources can be considered as a binomial experiment consisting of n trials. The associated binomial random variable Xi is dened as the number of received coded probes (or probe batches for more than one incoming links) whose contents represent that all the paths in the ith path set have successfully transmitted a probe packet. The sample proportion ̂i = Xi=n is a maximum likelihood (ML) estimator of i [36] (or an ML estimate resulting from end-to-end measurement xi substituted in the place of Xi). Accordingly, we can obtain ĉ, the estimator of c. The column vector â remains unknown. Thus, we extend (2.7) to a system of jPj 1 equations with jEI [EV j unknowns, as follows: fMâ = ĉ: (2.12) The linear system in (2.12) has more equations than unknowns, i.e., jPj 1 jEI [ EV j: (2.13) This is because for every pair of links e; e0 2 EI [ EV , P(e) is dierent from P(e0), while jPj paths can have at most 2jPj1 dierent combinations of paths. The inequality (2.13) is a necessary condition for â to be determined. To show that â in (2.12) can be determined by the least-squares, we introduce a (jPj 1) (jPj 1) auxiliary matrix M(jPj) of a type 2 modied path-link matrix Chapter 2. The LANT Framework 35 fM with an end-to-end path set P. Compared to fM, M(jPj) has additional column vectors and has jPj 1 column vectors in total. The jPj 1 column vectors in the top jPj(jPj1) submatrix can represent all non-zero vectors in the vector space FjPj2 . The bottom part of the additional columns are obtained according to the relation between the paths and path sets. An example of M(3) for fM in (2.9) is as follows: M(3) = 266666666666666666666664 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 0 1 0 0 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 377777777777777777777775 : (2.14) Lemma 1 Let M(jPj) be an auxiliary matrix of a type 2 modied path-link matrix fM with an end-to-end path set P. Then, rank(M(jPj)) = 2jPj 1, i.e., all 2jPj 1 column vectors in M(jPj) are linearly independent. Proof : We prove it by induction. We mention that the matrix M(jPj) has binary entries and the column vectors are dened in a vector space over a nite eld F2. It can be veried that M(2) has full rank. Assume that matrix M(k) also has full rank. That is, all 2k 1 columns in M(k) are linearly independent. Thus, the modulo 2 summation of any m columns of this matrix, for m = 2; : : : ; 2k 1, has at least one non-zero entry. Now, consider matrixM(k+1). This matrix can be represented as follows after row and Chapter 2. The LANT Framework 36 column permutations: M(k + 1) = 266666666666666666666664 0 0 1 1 1 0 M(k) ... M(k) 0 1 1 1 M(k) ... ... . . . ... 1 1 1 377777777777777777777775 Permutations would not change its rank. The top row represents the newly added path, followed by two submatrices M1 = [M(k) 0M(k)] and M2 = [M(k) 1], where 0 and 1 are columns of 0 and 1, respectively. We note that the path sets of M1 (rows in M1) do not include the new added path, while those ofM2 all include it. Now, we show that the matrixM(k+1) has full rank. To do so, we show that the summation of all possible combinations of these 2k+11 columns inM(k+1) is a non-zero vector (i.e., there exists at least one non-zero entry in the summation vector). First, the middle column [1 0 1]T is included in the combination of the columns that we choose. Since the entries of the last row in M(k) are all ones, in the summation of the chosen vectors, at least one entry would be non-zero. This entry corresponds to the last row inM1 or inM2. From now on, we exclude the middle column from our choices. Second, we choose the columns from either the 2k 1 columns on the left-hand side or the 2k 1 columns on the right-hand side (but not both at the same time). In this Chapter 2. The LANT Framework 37 case, at least one entry in the summation vector would be non-zero corresponding to the rows in M1. It is because of the linear independency of the columns in M(k). Third, we choose the columns from both the 2k1 columns on the left-hand side and on the right-hand side. In this case, if an odd number of columns is chosen from the the right-hand side, the entry of the summation vector corresponding to the top row would be non-zero. However, if an even number of columns is chosen from the right-hand side, at least one entry of the summation vector corresponding to the rows in M2 would be non-zero, because of the linear independency of the columns in M(k). To this end, we have considered the modulo 2 summation for all possible combinations of the columns in matrix M(k + 1), and there is always at least one non-zero entry in the summation vector. Therefore, all these 2k+11 column vectors inM(k+1) are linearly independent. With Lemma 1, the following theorem gives the rank of a type 2 modied path-link matrix. Theorem 2 Let a directed acyclic graph G = (V ; E) be given with a system of linear equations in matrix form fMâ = ĉ. Then, rank(fM) = jEI [ EV j. Proof : Let M(jPj) be an auxiliary matrix of fM. The jEI [ EV j column vectors in fM are among the 2jPj 1 column vectors in M(jPj). From Lemma 1, all 2jPj 1 column vectors in M(jPj) are linearly independent. Thus, these jEI [ EV j column vectors in fM are also linearly independent. As a result, rank(fM) = jEI [ EV j. Corollary 1 Let fMâ = ĉ be given. Then, â can be determined by least-squares. Chapter 2. The LANT Framework 38 Proof : When the number of equations is equal to the number of unknowns, i.e., jPj 1 = jEI [ EV j, fM is a square matrix. Theorem 2 ensures that fM is invertible. Thus, â can be determined as â = fM1ĉ: (2.15) When the number of equations is greater than the number of unknowns, i.e., jPj1 > jEI [ EV j, the system is over-determined. We can apply least-squares [35] to obtain an approximate solution which minimizes the residual error kĉfMâk. Theorem 2 ensures that fMTfM is invertible. Thus, â can be determined as â = (fMTfM)1fMTĉ: (2.16) We note that (2.15) is a special case of (2.16). Corollary 1 gives an analytical solution of â using least-squares. The following theorem demonstrates the consistency of the corresponding estimators. Theorem 3 1 ̂j is a consistent estimator of 1 j. Proof : For each ej 2 EI [ EV , 1 ̂j is a function of ̂j, while ̂j is a function of ̂1; ̂2; : : : ; ̂jPj1. Since ̂i p! i, the continuous mapping theorem and Slutsky's theorem [36] yield that 1 ̂j p! 1 j, where p! denotes convergence in probability. Proposition 3 The loss rate of a link can be consistently estimated if and only if the link is an identiable link. Proof : Theorem 3 shows that the loss rates of all identiable links can be consistently estimated by the estimators. In addition, Proposition 1 shows that if the loss rate of a Chapter 2. The LANT Framework 39 link can be estimated, then the link is identiable. These together prove this proposition. Although the loss rate of the links which are not identiable cannot be consistently estimated, we at least can obtain an upper bound on the loss rate of them, which is the loss rate of the corresponding virtual link. 2.3.2 Method of Normal Equations Algorithm 4 Method of Normal Equations [37] 1: Calculate the symmetric matrix fMTfM 2: Calculate Cholesky decomposition fMTfM = LLT 3: Calculate d fMTĉ 4: Use forward substitution to solve Ly = d for y 5: Use back substitution to solve LTâ = y for â The most common technique to solve a full rank least-squares problem is the method of normal equations [37]. We dene = jPj 1 and = jEI [ EV j, so that fM is a matrix. The rst step in the method of normal equations is to calculate the symmetric matrix (i.e., fMTfM). This requires 2 ops2. The second step is to calculate the Cholesky decomposition fMTfM = LLT requiring 3=3 ops. The third step is to calculate fMTĉ requiring 2 ops. The fourth and fth steps are to solve Ly = fMTĉ for 2A op is a oating point operation. Flop count is useful as a rough estimate of complexity and predictor of computational time on modern computers. Chapter 2. The LANT Framework 40 y using forward substitution and to solve LTâ = fMTĉ for â using back substitution, each of which requires 2 ops. Considering by (2.13), the complexity of this method is O(2). Although the rst step in the method of normal equations includes the dominant term of the complexity, it only needs to be executed once for initial setup as long as the network topology remains unchanged. We need to obtain ĉ before we can calculate fMTĉ and perform forward/back substitutions (Steps 3-5). The complexity in calculating ĉ is O(n), where n is the number of probe batches. This step and Steps 3-5 can be repeated times in a monitoring period. Thus, the LA approach using the method of normal equations has a complexity of O(2 + n+ ) in practice. 2.3.3 Method of Row Selection Since = 2jPj 1, the number of path sets grows exponentially as jPj increases. As a result, the method of least-squares would lack scalability and thus have high complexity. Nonetheless, according to Theorem 2, rank(fM) = . This means there exist linearly independent path sets out of path sets which are sucient to determine â. To select linearly independent path sets, we modify the row selection algorithm proposed in [30], and obtain a reduced linear system as below: fM1â = ĉ1; (2.17) wherefM1 2 f0; 1g and ĉ1 2 R consists of rows offM and ĉ, respectively. Algorithm 5 shows the modied row (path set) selection algorithm. This algorithm incrementally Chapter 2. The LANT Framework 41 Algorithm 5 Modied Row (Path Set) Selection Algorithm 1: Initialize fM1 the rst row in fM 2: Initialize R by calculating the thin QR factorization of fMT1 3: while fM1 is not a square matrix do 4: ! next row in fM 5: R̂12 RTfM1!T 6: R̂22 k!k2 kR̂12k2 7: if R̂22 6= 0 then 8: Update R 2664R R̂12 0 R̂22 3775 9: Update fM1 2664fM1 ! 3775 10: end if 11: end while builds a QR factorization fMT1 = QR, where Q 2 R is an orthogonal matrix and R 2 R is an upper triangular matrix. It only needs to be executed once for initial setup with a complexity of O(2). The complexity of calculating ĉ1 is reduced to O(n). Then, we calculate â = fMT1 z, where z = R1(R1)Tĉ1, whose complexity is O(2). We repeat the above steps for times in a monitoring period. Thus, the LA approach using the method of row selection has a lower complexity of O(2 + n+ 2) in practice. 42 Chapter 3 Performance Evaluation In this chapter, we access the performance of the LANT framework by simulations. 3.1 Simulation Setup For the network topology, we rst consider the Internet2 network map [38], which is a high-performance backbone network created by the Internet2 community. The topology is modied as the one used in [24], consisting of 10 nodes and 15 links. We apply the orientation algorithm [24] that converts the modied topology with selected sources to three directed acyclic graphs with dierent number of sources in Fig. 3.1, where all links are identiable. Destinations are determined by the orientation algorithm. To consider larger networks, we use BRITE [39] to generate three router-level undi- rected network topologies with Waxman model. BRITE is a universal topology generater which improves the state of the art and is based on design principles which include repre- sentativeness, inclusiveness, and inter-operability. Representativeness leads to synthetic topologies that accurately re ect many aspects of the actual Internet topology (e.g. hier- archical structure, degree distribution, etc.). Inclusiveness combines the strengths of as Chapter 3. Performance Evaluation 43 (a) One source. (b) Two sources. (c) Three sources. Figure 3.1: Directed acyclic graphs with dierent number of sources. (a) One source (node 1) and one receiver (node 9); (b) two sources (nodes 1 and 9) and one receiver (node 7); (c) three sources (nodes 1, 4 and 10) and one receiver (node 9). Chapter 3. Performance Evaluation 44 many generation models as possible in a single generation tool. Inter-operability provides interfaces to widely-used simulation applications such as ns, SSF and OmNet++ as well as visualization applications. In the three large network topologies generated by BRITE, the number of nodes are chosen as 20, 100, and 500. The number of links is twice the number of nodes in each topology. A random link loss rate 1 j is assigned to the jth link ej 2 EI [ EV , where the link success rate j is uniformly distributed within [ave 0:05; ave + 0:05]. The value of ave is chosen as 0:7, 0:75, 0:8, 0:85, 0:9, and 0:95, to adjust the average success rate across all links. After assigning each link a loss rate, we send n batches of probe packets. Each probe traversing a link is dropped at a xed probability as the link loss rate. In each simulation, we obtain an estimate 1 ̂j of the actual link loss rate 1j for the jth link in set EI [EV . The root mean square error (RMSE) is used to determine the estimation accuracy across all identiable links and virtual links. The RMSE is computed as RMSE = 0@jEI[EV jX j=1 jj ̂jj2 jEI [ EV j 1A1=2 : (3.1) We brie y summarize the belief propagation (BP) algorithm [26] and compare our LANT framework with the BP algorithm for loss rate inference through simulations in Section 3.2. Following the approach in [26], the rst step is to create the factor graph from the original graph. The factor graph is a bipartite graph: on one side there are the links (variable nodes), whose loss rates we aim to estimate; on the other side there are the paths (function nodes) that are observed by each received probe. An edge exists in Chapter 3. Performance Evaluation 45 the factor graph between a link and a path if the link belongs to this path in the original graph. We note that, unlike tree topologies considered in [26], in general topologies there might exist multiple paths for every source-receiver pair. The second step is to perform belief propagation. Each received probe triggers message passing in the factor graph and results in an estimate of link loss probabilities. Then, the estimates from dierent probes are combined, using standard methods [26], to obtain an estimate 1 ̂j of the actual link loss rate 1 j for the jth link ej 2 EI [ EV . The results are averaged over 100 simulations to eliminate possible random eects, where each simulation has new loss rate assignments and new loss processes. 3.2 Simulation Results First, we investigate the in uence of dierent methods adopted by LA approach on the estimation accuracy, based on the graph with one source in Fig. 3.1(a). The type 2 modied path-link matrixfM has 127 rows, and we apply the method of normal equations. This case is denoted by LA-NE. Alternatively, we use Algorithm 5 to build a square matrix fM1, where each row (path set) includes 1 or 2 paths. This case is denoted by LA-RS. Fig. 3.2 shows the RMSE of LA approach using the two methods, as a function of the number of probe batches. We observe that the RMSE of the LA-NE algorithm is lower than that of the LA-RS algorithm when n = 50. Such behavior is reasonable since the LA-NE algorithm uses more equations to obtain link loss rates than the LA-RS algorithm. However, the dierence of the RMSE is less than 2% and it vanishes as the Chapter 3. Performance Evaluation 46 102 103 104 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 n R M SE LA−NE, α ave = 0.8 LA−RS, α ave = 0.8 LA−NE, α ave = 0.85 LA−RS, α ave = 0.85 LA−NE, α ave = 0.9 LA−RS, α ave = 0.9 Figure 3.2: The RMSE of LA using the methods of normal equations (LA-NE) and row selection (LA-RS), versus the number of probe batches n. number of probes increases. Therefore, for large number of probes, the performance of the LA-RS algorithm and the LA-NE algorithm is similar while the LA-RS algorithm outperforms the LA-NE algorithm in terms of the computational complexity. Second, we compare the estimation accuracy of the BP algorithm and the LA-RS algorithm, based on the graph with one source in Fig. 3.1(a). Fig. 3.3 shows the RMSE as a function of the number of probe batches, for dierent average link success rates. We observe that the BP algorithm has better accuracy when n < 400, and the LA- RS algorithm achieves better accuracy, after sending reasonably sucient probe batches (n > 400). This is because the LA-RS algorithm exploits the losses on the dierent combinations of paths, while the BP algorithm only utilizes the losses on paths. Fig. 3.4 Chapter 3. Performance Evaluation 47 102 103 104 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 n R M SE BP, α ave = 0.8 BP, α ave = 0.85 BP, α ave = 0.9 LA−RS, α ave = 0.8 LA−RS, α ave = 0.85 LA−RS, α ave = 0.9 Figure 3.3: The RMSE of the BP algorithm and the LA-RS algorithm, versus the num- ber of probe batches n, for dierent average success rate ave. 0.7 0.75 0.8 0.85 0.9 0.95 0 0.02 0.04 0.06 0.08 0.1 0.12 α ave R M SE BP LA−RS Figure 3.4: The RMSE of the BP algorithm and the LA-RS algorithm, versus the av- erage success rate ave (n = 500). Chapter 3. Performance Evaluation 48 102 103 104 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 n R M SE One source Two sources Three sources Figure 3.5: The RMSE of the LA-RS algorithm, versus the number of probe batches n, for dierent number of sources (ave = 0:8). shows the RMSE as a function of the average link success rate with 500 probe batches. The RMSE decreases as the average link success rate increases, which is consistent with the relative position of the curves in Fig. 3.3. Based on these two graphs, we can predict that when average loss rates are lower than 0:8, the BP algorithm would perform worse (RMSE > 6%, n = 20; 000), while the LA-RS algorithm would still achieve satisfactory accuracy (RMSE < 1%, n = 20; 000). Third, for the networks with dierent number of sources in Fig. 3.1, Fig. 3.5 shows the RMSE as a function of the number of probe batches, and Fig. 3.6 shows the RMSE as a function of the average success rate. We compare the relative position of the three curves in Figs. 3.5 and 3.6, and obtain the following observation: The graph with more Chapter 3. Performance Evaluation 49 0.7 0.75 0.8 0.85 0.9 0.95 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 α ave R M SE One source Two sources Three sources Figure 3.6: The RMSE of the LA-RS algorithm, versus the average success rate ave, for dierent number of sources (n = 500). sources achieves better estimation accuracy. However, the improvement of estimation accuracy is negligible with relatively large success rates or sucient probe batches. In this case, we can use a small number of sources and exibly choose their locations. Finally, we investigate the performance of the LA-RS algorithm in three larger net- works generated by BRITE. We randomly pick a part of nodes as source nodes in each network for 10 times. We pick 4 source nodes for the 20-node network, and 20 source nodes for the 100-node and the 500-node network. The orientation algorithm is applied to obtain directed acyclic graphs. There are 68:25% identiable links on average in the directed graph for the 20-node network (40 links), 63:7% identiable links on average in the the directed graph for 100-node network (200 links), and 26:17% identiable links Chapter 3. Performance Evaluation 50 102 103 104 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 n R M SE 500 nodes 100 nodes 20 nodes Figure 3.7: The RMSE of the LA-RS algorithm, versus the number of probe batches n, for networks of dierent sizes (ave = 0:9). on average in the directed graph for the 500-node network (1; 000 links). Although the eect of the number and location of sources on the accuracy can be negligible with rel- atively large success rates or sucient probe batches, dierent number and location of sources may result in dierent number of identiable and virtual links. Fig. 3.7 shows the RMSE as a function of the number of probe batches for the networks of dierent sizes. As expected, the LA-RS algorithm still achieves satisfactory accuracy (RMSE < 1%, n = 20; 000), while more probe batches are needed to achieve the same accuracy in larger networks. 51 Chapter 4 Conclusions and Future Work 4.1 Conclusions In this thesis, we studied the problem of link loss tomography on mesh topologies using network coding. We rst provided an overview of the area of network tomography in communication networks. Accurate and ecient measurement of network-internal char- acteristics is critical for management and maintenance of large-scale networks. Recently there have been attempts to apply network coding in loss tomography in order to increase bandwidth eciency. We introduced dierent inference techniques in the related works and explained the various performance bottlenecks such as (1) low bandwidth eciency, (2) high monitoring cost, (3) estimation not being always accurate, and (4) requiring additional assumptions. Then, we proposed a linear algebraic network tomography (LANT) framework for ac- tive inference of link loss rates. We rst established a tight lower bound on the probe size for valid end-to-end observations when network coding is applied. Then, we developed algorithms to nd a valid probe coding scheme, such that the lower bound on probe size is always achieved. Furthermore, we proposed a linear algebraic (LA) approach and devel- Chapter 4. Conclusions and Future Work 52 oped consistent estimators of link loss rates. We also demonstrated that the complexity of LA using the method of row selection is lower than that using the method of normal equations. Using our LANT framework, the identiability of a link is the necessary and sucient condition for its consistent loss estimation. We investigated the performance of the LANT framework under dierent simulation scenarios. Results showed that, for large number of probes, the performance of the LA-RS algorithm and the LA-NE algorithm is similar while the LA-RS algorithm outperforms the LA-NE algorithm in terms of the computational complexity. Moreover, the LA-RS algorithm achieves better estimation accuracy than the belief propagation (BP) algo- rithm when the estimators converge. Although the eect of the number and location of sources on the accuracy can be negligible with relatively large success rates or sucient probe batches, dierent number and location of sources may result in dierent number of identiable and virtual links. 4.2 Future Work One possible extension of this work is to minimize the number of nodes performing network coding. In the current work, we choose all non-receiver nodes with multiple incoming links as the nodes that perform network coding. In this way, we can easily obtain a valid coding scheme. One possible solution is to choose the nodes next to the source nodes, since these nodes performing network coding can suciently distinguish dierent information ows, such that the intermediate nodes do not need to perform Chapter 4. Conclusions and Future Work 53 network coding. Another extension is design an algorithm to choose the number and location of the source and receiver nodes, such that all the links in the given network are identiable. In the current work, we use the orientation algorithm [24] with selected sources to nd a directed acyclic graph. Using this algorithm, we cannot control the number and location of the receiver nodes. To solve this problem, a new algorithm to nd a directed acyclic graph with selected sources and receivers is necessary. 54 Bibliography [1] R. Castro, M. Coates, G. Liang, R. Nowak, and B. Yu, \Network tomography: Recent developments," Statistical Science, vol. 19, no. 3, pp. 499{517, Aug. 2004. [2] Y. Yardi, \Network tomography: Estimating source-destination trac intensities from link data," J. Amer. Statist. Assoc., vol. 91, no. 433, pp. 365{377, March 1996. [3] Y. Tsang, M. Coates, and R. Nowak, \Network delay tomography," IEEE Trans. Acoust., Speech, Signal Processing, vol. 51, no. 8, pp. 2125{2136, Aug. 2003. [4] A. Chen, J. Cao, and T. Bu, \Network tomography: Identiability and Fourier domain estimation," in Proc. of IEEE INFOCOM, Anchorage, AK, May 2007. [5] G. Sharma, S. Jaggi, and B. Dey, \Network tomography via network coding," in Proc. of Information Theory and Applications Workshop, San Diego, CA, Jan. 2008. [6] P. Sattari, A. Markopoulou, and C. Fragouli, \Multiple source multiple destination topology inference using network coding," in Proc. of NetCod Workshop, Lausanne, Switzerland, June 2009. Bibliography 55 [7] R. Caceres, N. Dueld, J. Horowitz, and D. Towsley, \Multicast-based inference of network-internal loss characteristics," IEEE Trans. Inform. Theory, vol. 45, no. 7, pp. 2462{2480, Nov. 1999. [8] N. Dueld, J. Horowitz, F. LoPresti, and D. Towsley, \Multicast topology inference from measured end-to-end loss," IEEE Trans. Inform. Theory, vol. 48, no. 1, pp. 26{45, Jan. 2002. [9] N. Dueld, \Network tomography of binary network performance characteristics," IEEE Trans. Inform. Theory, vol. 52, no. 12, pp. 5373{5388, Dec. 2006. [10] Y. Tsang, M. Coates, and R. Nowak, \Passive unicast network tomography using EM algorithm," in Proc. of IEEE Int'l Conf. Acoust., Speech, Signal Processing, Salt Lake City, UT, May 2001. [11] J. Zhao, R. Govindan, and D. Estrin, \Computing aggregates for monitoring wireless sensor networks," in Proc. of IEEE Int'l Workshop on Sensor Network Protocols and Applications, Anchorage, AK, May 2003. [12] H. Nguyen and P. Thiran, \Using end-to-end data to infer lossy links in sensor networks," in Proc. of IEEE INFOCOM, Barcelona, Spain, Apr. 2006. [13] Y. Lin, B. Liang, and B. Li, \Passive loss inference in wireless sensor networks based on network coding," in Proc. of IEEE INFOCOM, Rio de Janeiro, Brazil, Apr. 2009. Bibliography 56 [14] R. Ahlswede, N. Cai, S. Li, and R. Yeung, \Network information ow," IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204{1216, July 2000. [15] R. Koetter and M. Medard, \Beyond routing: An algebraic approach to network coding," in Proc. of IEEE INFOCOM, New York, NY, Nov. 2002. [16] T. Ho and D. Lun, Network Coding: An Introduction. Cambridge University Press, 2008. [17] C. Fragouli, J. LeBoudec, and J. Widmer, \Network coding: An instant primer," ACM SIGCOMM Computer Communication Review, vol. 36, no. 1, pp. 63{68, Jan. 2006. [18] C. Fragouli, A. Markopoulou, R. Srinivasan, and S. Diggavi, \Network monitoring: It depends on your points of view," in Proc. of Information Theory and Applications Workshop, San Diego, CA, Jan. 2007. [19] W. Zhu and Z. Geng, \A bottom-up inference of loss rate," Computer Communica- tions, vol. 28, no. 4, pp. 351{365, Mar. 2005. [20] N. Dueld, F. LoPresti, V. Paxson, and D. Towsley, \Inferring link loss using striped unicast probes," in Proc. of IEEE INFOCOM, Anchorage, AK, Apr. 2001. [21] V. Padmanabhan, L. Qiu, and H. Wang, \Passive network tomography using bayesian inference," in Proc. of the 2nd ACM SIGCOMM Workshop on Internet Measurement, Marseille, France, Nov. 2002. Bibliography 57 [22] T. Bu, N. Dueld, F. LoPresti, and D. Towsley, \Network tomography on general topologies," in Proc. of ACM SIGMETRICS, Marina Del Rey, CA, June 2002. [23] C. Fragouli and A. Markopoulou, \A network coding approach to network tomog- raphy," in Proc. of 43rd Allerton Conf., Monticello, IL, Sept. 2005. [24] M. Gjoka, C. Fragouli, P. Sattari, and A. Markopoulou, \Loss tomography in general topologies with network coding," in Proc. of IEEE GLOBECOM, Washington, DC, Nov. 2007. [25] H. Yao, S. Jaggi, and M. Chen, \Network coding tomography for network failures," in Proc. of IEEE Infocom (Mini-Conference), San Diego, CA, March 2010. [26] Y. Mao, F. Kschischang, B. Li, and S. Pasupathy, \A factor graph approach to link loss monitoring in wireless sensor networks," IEEE J. Select. Areas Commun., vol. 23, no. 4, pp. 820{829, Apr. 2005. [27] F. Kschischang, B. Frey, and H. Loeliger, \Factor graphs and the sum-product al- gorithm," IEEE Trans. Inform. Theory, vol. 47, no. 2, pp. 498{519, Feb. 2001. [28] Y. Zhao, Y. Chen, and D. Bindel, \Towards unbiased end-to-end network diagnosis," IEEE/ACM Trans. Networking, vol. 17, no. 5, pp. 1724{1737, Dec. 2009. [29] B. Sun and Z. Zhang, \Probabilistic diagnosis of link loss using end-to-end path measurements and maximum likelihood estimation," in Proc. of IEEE ICC, Dresden, Germany, June 2009. Bibliography 58 [30] Y. Chen, D. Bindel, H. Song, and R. Katz, \Algebra-based scalable overlay net- work monitoring: Algorithms, evaluation, and applications," IEEE/ACM Trans. Networking, vol. 15, no. 5, pp. 1084{1097, Oct. 2007. [31] M. Coates and R. Nowak, \Network loss inference using unicast end-to-end mea- surement," in Proc. of ITC Seminar on IP Trac, Measurement and Modelling, Monterey, CA, Sept. 2000. [32] H. Su, W. Chen, S. Lin, D. Jin, and L. Zeng, \The inference of link loss rates with internal monitors," in Proc. of IEEE GLOBECOM, New Orleans, LA, Dec. 2008. [33] J. Gui, V. Shah-Mansouri, and V. Wong, \A linear algebraic approach for loss tomography in mesh topologies using network coding," in Proc. of IEEE ICC, Cape Town, South Africa, May. 2010. [34] ||, \Accurate and ecient network tomography via network coding," submitted to IEEE Trans. on Vehicular Technology, 2010. [35] R. Myers, D. Montgomery, and G. Vining, Generalized Linear Models: With Appli- cations in Engineering and the Sciences. John Wiley & Sons, Inc., 2001. [36] G. Grimmett and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford University Press, 2001. [37] G. Golub and C. Loan,Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996. Bibliography 59 [38] \Internet2 network," http://www.internet2.edu/network/. [39] \BRITE," www.cs.bu.edu/brite/.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Accurate and efficient network monitoring on mesh topologies...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Accurate and efficient network monitoring on mesh topologies via network coding Gui, Jiaqi 2010
pdf
Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.
Page Metadata
Item Metadata
Title | Accurate and efficient network monitoring on mesh topologies via network coding |
Creator |
Gui, Jiaqi |
Publisher | University of British Columbia |
Date Issued | 2010 |
Description | Accurate and efficient measurement of network-internal characteristics is critical for management and maintenance of large-scale networks. In this thesis, we propose a linear algebraic network tomography (LANT) framework for active inference of link loss rates on mesh topologies via network coding. Probe packets are transmitted from the sources to the destinations along a set of paths. Intermediate nodes linearly combine the received probes and transmit the coded probes using pre-determined coding coefficients. Although a smaller probe size can reduce the bandwidth usage of the network, the inference framework is not valid if the probe size falls below a certain threshold. To this end, we establish a tight lower bound on probe size which is necessary for establishing the mappings between the contents of the received probes and the losses on the different sets of paths. Then, we develop algorithms to find the coding coefficients such that the lower bound on probe size is achieved. Furthermore, we propose a linear algebraic approach to developing consistent estimators of link loss rates, which converge to the actual loss rates as the number of probes increases. We show that using the LANT framework, the identifiability of a link, which only depends on the network topology, is a necessary and sufficient condition for the consistent estimation of its loss rate. Simulation results show that the LANT framework achieves better estimation accuracy than the belief propagation (BP) algorithm for large number of probe packets. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-06-21 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
DOI | 10.14288/1.0071010 |
URI | http://hdl.handle.net/2429/25924 |
Degree |
Master of Applied Science - MASc |
Program |
Electrical and Computer Engineering |
Affiliation |
Applied Science, Faculty of Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
GraduationDate | 2010-11 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2010_fall_gui_jiaqi.pdf [ 553.48kB ]
- Metadata
- JSON: 24-1.0071010.json
- JSON-LD: 24-1.0071010-ld.json
- RDF/XML (Pretty): 24-1.0071010-rdf.xml
- RDF/JSON: 24-1.0071010-rdf.json
- Turtle: 24-1.0071010-turtle.txt
- N-Triples: 24-1.0071010-rdf-ntriples.txt
- Original Record: 24-1.0071010-source.json
- Full Text
- 24-1.0071010-fulltext.txt
- Citation
- 24-1.0071010.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
data-media="{[{embed.selectedMedia}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0071010/manifest