"Science, Faculty of"@en . "Computer Science, Department of"@en . "DSpace"@en . "UBCV"@en . "Tayarani Najaran, Mahdi"@en . "2009-08-26T17:39:49Z"@en . "2009"@en . "Master of Science - MSc"@en . "University of British Columbia"@en . "Multimedia applications are known to have low end-to-end latency requirements for interactivity, suggesting UDP to be a suitable transport protocol. Traditionally, the bandwidth requirements were set to be well below the network capacity. Modern applications could evolve if more bandwidth were available without giving up interactivity. However, the lack of a suitable transport providing such high bandwidth with low latency has prevented the existence of the next generation of multimedia applications. A transport built on top of UDP can no longer traverse the network without being hit by major packet drops or severely damaging other traffic. TCP, having built-in congestion control, is safe to use for high bandwidth requirements, but has been associated with unexpected high latencies. \nIn this thesis, we go against conventional wisdom and design Paceline, a transport layer on top of TCP to support high-bandwidth low-latency applications. Paceline improves latency through application level rate control. It also detects when TCP connections exponentially back-off and automatically replaces them with standby connections. Through evaluation we show Paceline improves TCP's shortcomings in terms of end-to-end latency in extremely congested networks, without sacrificing TCP's effectiveness in utilization or bandwidth sharing.\nWe design two different applications on top of Paceline for streaming live video. We present a preliminary evaluation of the application's performance when using Paceline in congested networks and show significant improvements compared to TCP."@en . "https://circle.library.ubc.ca/rest/handle/2429/12570?expand=metadata"@en . "1245406 bytes"@en . "application/pdf"@en . "Paceline: Toward High-Bandwidth Interactive Continous Media Applications Over TCP by Mahdi Tayarani Najaran B.Sc., Ferdowsi University of Mashhad, 2005 M.Sc., Sharif University of Technology, 2007 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Computer Science) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2009 c\u00C2\u00A9 Mahdi Tayarani Najaran 2009 Abstract Multimedia applications are known to have low end-to-end latency require- ments for interactivity, suggesting UDP to be a suitable transport protocol. Traditionally, the bandwidth requirements were set to be well below the net- work capacity. Modern applications could evolve if more bandwidth were available without giving up interactivity. However, the lack of a suitable transport providing such high bandwidth with low latency has prevented the existence of the next generation of multimedia applications. A trans- port built on top of UDP can no longer traverse the network without being hit by major packet drops or severely damaging other traffic. TCP, having built-in congestion control, is safe to use for high bandwidth requirements, but has been associated with unexpected high latencies. In this thesis, we go against conventional wisdom and design Paceline, a transport layer on top of TCP to support high-bandwidth low-latency applications. Paceline improves latency through application level rate con- trol. It also detects when TCP connections exponentially back-off and auto- matically replaces them with standby connections. 
Through evaluation we show Paceline improves TCP's shortcomings in terms of end-to-end latency in extremely congested networks, without sacrificing TCP's effectiveness in utilization or bandwidth sharing.

We design two different applications on top of Paceline for streaming live video. We present a preliminary evaluation of the applications' performance when using Paceline in congested networks and show significant improvements compared to TCP.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
2 Related Work
3 Transport: Latency Controller
  3.1 Methodology
  3.2 Application Level Measurements
    3.2.1 Latency Measurements
    3.2.2 Bandwidth Measurements
  3.3 Congestion Window Approximation
  3.4 Evaluation
    3.4.1 Experimental Setup
    3.4.2 Latency
    3.4.3 Fairness
  3.5 Incremental Deployment
  3.6 Summary
4 Transport: Failover
  4.1 Motivation
  4.2 Connection Management
  4.3 Implementation
    4.3.1 Mechanism
    4.3.2 Controller
  4.4 Evaluation
  4.5 Incremental Deployment
  4.6 Summary
5 Application: Live-In
  5.1 QStream and Live-In
  5.2 Application Level Evaluation
  5.3 Summary
6 Application: QPS
  6.1 Description
  6.2 Preferential Subscriptions
7 Conclusion and Future Work
Bibliography

List of Tables

3.1 Average CWND size in the experiments based on the number of flows
3.2 Oldest un-acked measurements for TCP
3.3 Oldest un-acked measurements for SOCK and ADAPT latency controllers vs. TCP
3.4 Jain fairness index measurements for SOCK and ADAPT latency controllers vs. TCP
3.5 Oldest un-acked measurements for ADAPT and mixed TCP
4.1 Worst case oldest un-acked measurements for SOCK and ADAPT latency controllers vs. TCP
4.2 Median and 99.9 percentile oldest un-acked measurements for the ADAPT latency controller with failover
4.3 Worst case oldest un-acked measurements for the ADAPT latency controller with failover
4.4 Jain fairness index measurements for ADAPT latency controller with failover vs. TCP
4.5 Oldest un-acked measurements for ADAPT with failover and mixed TCP

List of Figures

1.1 General structure of QStream
3.1 rtt_ewma vs. rtt_ewma− vs. TCP rtt measurements
3.2 Different rate measurement signals vs. b̂w
3.3 Long-term vs. long-term+ vs. long-term− smoothing of bandwidth measurements
3.4 ĈWND from Equation 3.4 compared to CWND
3.5 ĈWND obtained from Equation 3.6 vs. CWND
3.6 Emulab setup used for evaluation
3.7 Oldest un-acked and fairness of ADAPT compared to TCP with 8, 16, 24 and 32 flows
3.8 Oldest un-acked of ADAPT with mixed number of TCP flows
3.9 Fairness of ADAPT with mixed number of TCP flows
3.10 Fairness of ADAPT flows vs. TCP with 16 flows and 50% mix
4.1 Message port session
4.2 Oldest un-acked and fairness of ADAPT with mixed number of TCP flows
5.1 Qvid structure
5.2 Live timeline
5.3 Live-In output for local streaming
5.4 Xmit window size versus application goodput. The end-to-end latency is xmit window size plus a constant 3 frames for setup and decode (1 frame = 33.3ms)
5.5 Average frames per second with an 8 frame xmit window size
6.1 Peer-to-peer conferencing vs. QPS conferencing
6.2 Qps session and viewers
6.3 Qps from a subscriber's view
6.4 Live TV coverage of an event

Acknowledgements

I would like to take the time to thank my supervisor Charles "Buck" Krasic for the inspiration, motivation and guidance throughout this work. I'd also like to thank my dear wife Sara, for all her support.

Chapter 1

Introduction

Multimedia applications have time sensitive data, requiring a transport protocol that can deliver data within strict time bounds over the Internet. For a long time, transports built on top of UDP have been the choice for time sensitive applications, in contrast to TCP, which is believed to be suitable for most types of traffic, including bulk data transfer.
Modern multimedia applications are increasing their bandwidth requirements, but the lack of a suitable transport in the Internet has prevented new generations of such applications from dominating the Internet. New applications such as multi-party high definition video conferencing or on-line multiplayer games with an extremely high number of players demand both high bandwidth and low latency from the network. Based on traditional wisdom, application designers either have to choose UDP for low latency and try consuming more bandwidth, use TCP for high bandwidth with the hope that the network load is low enough for TCP to have acceptable performance, or design a new transport protocol for the Internet. None of these solutions is convenient. Without congestion control, it is not responsible to use UDP for high bandwidth applications. Moreover, TCP's latency starts degrading drastically as network usage approaches the network capacity. Finally, TCP has certain features that have been fine-tuned over the past decades: it is extremely efficient at consuming available bandwidth and at keeping network links highly utilized, its flows share bandwidth in a rather fair manner, and it provides an easy-to-use service model to the application layer. Generally speaking, TCP has become sophisticated enough to dominate new protocols. This makes TCP, by far, the dominant transport protocol in the Internet, and the process of designing a successful new transport protocol overly complicated and, to some extent, impossible.

Instead of trying to replace TCP, we decided to improve the performance of TCP using application level techniques. We call the transport we designed Paceline. We evaluated Paceline in the context of video streaming using QStream [18]. QStream is an event based video player that can adapt to disk speed, network speed, available CPU and available bandwidth. Figure 1.1 displays the relationship between three different parts of QStream: the application layer referred to as Qvid, the transport layer referred to as Paceline, and the event loop referred to as QSF.

Figure 1.1: General structure of QStream.

Message Port fragments application data units (ADUs) handed down from higher levels into messages. Each message is assigned a unique sequence number, and the messages are ordered by priority. Messages are sent over a TCP connection, re-assembled into ADUs in the receiving side Message Port and handed up to higher levels. At any point the higher levels can cancel ADUs they had previously written to Message Port, which is the basis of the adaptation to network bandwidth.
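To make the service model concrete, the sketch below shows, in C, the kind of send-side bookkeeping this description implies: messages carry a sequence number, a priority and the identity of the ADU they came from, so that a later cancel can drop everything belonging to that ADU that has not yet been written to TCP. The types, sizes and names are illustrative stand-ins, not QStream's actual interface.

#include <stdint.h>
#include <stddef.h>

struct pl_message {
    uint64_t seq;        /* unique, assigned when the message is created          */
    int      priority;   /* inherited from the ADU it was fragmented from         */
    uint32_t adu_id;     /* identifies the ADU, so a cancel can find its messages */
    size_t   len;
    uint8_t *data;
};

/* Priority-ordered queue of messages not yet written to the TCP socket. */
struct pl_sendq {
    struct pl_message *msgs[1024];
    int count;
};

/* Cancel an ADU: discard every queued message that belongs to it.
 * Messages already written to TCP are beyond recall and are simply
 * delivered; freeing of the dropped messages is elided here. */
static void pl_cancel_adu(struct pl_sendq *q, uint32_t adu_id)
{
    int kept = 0;
    for (int i = 0; i < q->count; i++) {
        if (q->msgs[i]->adu_id != adu_id)
            q->msgs[kept++] = q->msgs[i];
    }
    q->count = kept;
}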
We present a brief overview of related work in Chapter 2. The main contributions of our work consist of the next two chapters. We present our latency controller in Chapter 3. The latency controller adjusts the sending rate to minimize the delay experienced by messages. We evaluate our latency controller in various heavy network conditions and compare its performance against normal TCP. The results show decreases in median end-to-end latency over TCP by factors close to 3. Chapter 4 describes the failover mechanism used to improve the worst case results of Chapter 3. Paceline detects when the TCP connection gets stuck in exponential back-offs and replaces it with a new standby TCP connection. We present two different applications, Live-In and Qps (QStream Pub-Sub), in Chapters 5 and 6 respectively. Live-In was designed with live video conferencing in mind. We evaluate how Paceline can assist such applications and show clear improvements over TCP. Qps was created to assist Live-In with multi-party video conferencing, and thereafter, support for real-time video mixing was added. Evaluation of Qps is left to future work. Chapter 7 concludes and discusses what we presented, and provides guidelines for future work.

Chapter 2

Related Work

Support for high-bandwidth low-latency streams has been a core subject of a broad array of multimedia networking research. The main influences on our work have been work on congestion control and alternative transport protocols. TCP Vegas was a seminal work in congestion control that proposed the use of delay measurements to proactively respond to congestion [3]; Paceline's latency controller is part of the long line of work that has since employed similar techniques. Later work on slowly responsive TCP-friendly congestion control aimed to better suit the needs of multimedia applications [4]. Paceline reflects the general trend in multimedia toward user-level implementations that make do with TCP.

Aside from congestion control, alternative transport service models have also appeared, such as SCTP [13] and DCCP [9], and more recently Structured Streams (SST) [5]. Like SCTP, Paceline identifies head of line blocking as a major issue at the transport level. Paceline's use of multiple transport connections has some similarity to SCTP's support of multi-homing, but SCTP multi-homing is meant for use with redundant physical paths, while Paceline's failover is employed for connections on the same path. Paceline shares the datagram (message) orientation of DCCP, and like DCCP, Paceline was designed with multimedia applications such as video streaming as the main target. However, Paceline has major differences in design and implementation due to the fact that it works above TCP, rather than providing a complete replacement. SST, currently implemented over UDP, provides cheap connection setup and tear down for web data.

A major inspiration for Paceline was the work of Bhandarkar [1], which proposed to overcome obstacles facing active queue management (AQM) by emulating it at end hosts. Paceline is similar in that its latency controller can be viewed as a user-level emulation of TCP Vegas-like rate-based congestion control. Even though exponential back-off was long considered essential to TCP's stability, Mondal et al. [12] have shown that removing it from TCP is safe in today's Internet. Paceline's failover mechanism is an application level technique to detect and eliminate TCP exponential back-off.

RD network services [16] share our motivation of providing high bandwidth and low delay communication. However, applications have to choose either low delay or high bandwidth but not both. In addition, RD proposes changing routers to implement separate queues for different traffic types, while we employ a rate control algorithm above TCP to reduce queuing delays for all traffic types.

Recent work on fast paced large scale games, Donnybrook [2], explains the high bandwidth requirements for emerging network games. Emerging games do not adhere to the old wisdom of network games having thin communication streams [15].
Interestingly, Donnybrook's main contribution is reducing bandwidth requirements by defining interest sets, which is a type of adaptation that could be improved using Paceline's adaptation primitives. Priorities can better capture the range of players' interests than two discrete types of updates (important and less frequent). Moreover, cancellation of expired updates can enable rate adaptation based on network conditions without using a complex reservation scheme.

The work done in [17] presents a potential solution to improve how fairly TCP balances bandwidth among competing flows. They propose using a variable maximum segment size in TCP.

Minbuff [7] shows that a great portion of the latency experienced by TCP packets is incurred inside the kernel. Hence, they suggest the application limit its sending rate based on information provided by the kernel. They explore different criteria for limiting the rate, and find a balance between utilization and low latency. Paceline employs a similar technique at the application level to prevent data from spending excessive amounts of time in kernel buffers.

Chapter 3

Transport: Latency Controller

TCP provides in-order and reliable delivery of data, performs congestion control and is extremely aggressive in utilizing network bandwidth. Such properties, which have been fine-tuned over the past decades, make designing a new transport protocol able to compete with TCP in all aspects extremely challenging. TCP is currently, by far, the dominant transport in the Internet [6].

However, TCP provides no end-to-end timing guarantees, leaving time sensitive applications that use TCP to deal with such issues themselves. The work done in [7] shows that a noticeable portion of the delay experienced by data communicated via TCP is spent in kernel buffers. An adaptive application that can limit its sending rate can therefore take advantage of TCP's properties while operating in a low latency zone.

In this chapter we describe how rate limiting is done by the application. We describe the logic behind the latency controller, along with a description of how the measurements needed by the controller are obtained. We then present an evaluation of our latency controller using emulation in an Emulab setup.

3.1 Methodology

An application limiting the socket fill level to CWND + 3 × MSS should expect to experience the least amount of sender side buffering inside the kernel, without sacrificing utilization or claiming less bandwidth than its fair share, i.e. the amount of bandwidth it would claim without rate limiting. Inspired by work done in the kernel [7], where the sending rate of TCP sockets is limited inside the kernel, we designed a similar latency controller in the application layer, referred to here as SOCK. Using information provided directly by the kernel through the socket API, we have access to the instantaneous value of the congestion window size, CWND, of the underlying TCP connection. The application queries the socket fill level using the SIOCOUTQ ioctl, and writes just enough data to the socket to keep the fill level close to the aforementioned limit. We implemented and tested the SOCK latency controller inside Paceline (see Figure 1.1).
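A minimal sketch of this rate limiting logic on Linux is shown below. The helper name and structure are illustrative and error handling is trimmed, but the two kernel interfaces it relies on, the TCP_INFO socket option and the SIOCOUTQ ioctl, are the ones referred to above.

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>     /* TCP_INFO, struct tcp_info */
#include <linux/sockios.h>   /* SIOCOUTQ */
#include <stdint.h>

/* How many more bytes the application may write right now while keeping
 * the socket fill level near CWND + 3*MSS.  Returns -1 on error. */
static int64_t sock_write_budget(int fd)
{
    struct tcp_info ti;
    socklen_t len = sizeof(ti);
    int queued = 0;                     /* bytes currently in the send buffer */

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) < 0)
        return -1;
    if (ioctl(fd, SIOCOUTQ, &queued) < 0)
        return -1;

    /* tcpi_snd_cwnd is reported in segments; convert to bytes. */
    int64_t limit = ((int64_t)ti.tcpi_snd_cwnd + 3) * ti.tcpi_snd_mss;
    int64_t budget = limit - queued;
    return budget > 0 ? budget : 0;
}

The sender calls this before each write and never puts more than the returned budget into the socket; the remaining messages stay in the application's own priority queue, where they are still cancellable.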
Even though the performance of this controller was extremely satisfying, it has its own limitations, two of which are of great importance. First, the SOCK controller may not behave as expected when the end-to-end connection is split over multiple TCP connections. For example, when using a local proxy with SSH tunneling, the first TCP connection is to the local proxy and the second TCP connection is over the Internet. This means information provided by the local TCP connection's socket does not reflect the true behaviour of the network. In such a scenario SOCK may successfully prevent its own socket buffer from filling, even though the downstream SSH socket buffer has become full, thus paralysing our latency controller. Second, the SOCK latency controller relies on information from the socket API that is not provided by all implementations on different operating systems, limiting the SOCK latency controller to specific platforms.

To overcome the portability limitations mentioned above, we decided to design a portable latency controller that behaves very similarly to SOCK. The new controller, which we call the ADAPT latency controller and which is the main subject of this chapter, does not rely on any information provided by the kernel, to ensure portability.

An application can keep count of the number of bytes it writes to TCP sockets, and hence can keep track of each socket's fill level. This eliminates the need to query socket fill levels through the kernel. If a latency controller is able to estimate the value of TCP's CWND at any time without the help of the kernel, it can act upon the same logic the SOCK controller does in limiting its sending rate. We hereafter call this estimate ĈWND, and explain the process through which we achieved very accurate estimates. First, we explain how to measure the data required to estimate CWND.

3.2 Application Level Measurements

Considering the reliable and in-order service provided by TCP, implementing application level acknowledgements becomes very straightforward. Every message that is transmitted from the sender side is tagged with a unique sequence number, and the sender keeps a list of written sequence numbers. On the other side, when messages arrive, the receiver generates and sends a P-ACK (Paceline ACK) of the sequence number of the message that has arrived, or of the last sequence number if multiple messages arrive at the same time. Upon receipt of the P-ACK by the sender, it is safe to consider all sequence numbers in the list before and equal to the P-ACKed sequence number as acknowledged. Application level acknowledgements can be used to measure two important properties of the network: latency and bandwidth.
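The following C sketch shows the sender-side bookkeeping this implies, assuming the list of written messages is kept in write order with the oldest at the head; the names are illustrative, not Paceline's actual API. Each P-ACK cumulatively acknowledges every message written before (and including) the one it names, and each acknowledged message yields one round trip time sample for the measurements described next.

#include <stdint.h>
#include <stdlib.h>

struct sent_msg {
    uint64_t seq;             /* unique sequence number of the message       */
    double   write_time;      /* time the message was written to the socket  */
    struct sent_msg *next;    /* next message in write order                 */
};

static struct sent_msg *unacked_head;   /* oldest written, not yet acknowledged */

/* Called when a P-ACK naming `acked_seq` arrives at time `now`.
 * Pops messages in write order up to and including the named one,
 * recording the smallest and largest rtt sample in the batch.
 * Returns the number of messages acknowledged. */
static int on_pack(uint64_t acked_seq, double now,
                   double *rtt_min, double *rtt_max)
{
    int n = 0;
    while (unacked_head) {
        struct sent_msg *m = unacked_head;
        double rtt = now - m->write_time;
        if (n == 0 || rtt < *rtt_min) *rtt_min = rtt;
        if (n == 0 || rtt > *rtt_max) *rtt_max = rtt;
        unacked_head = m->next;
        n++;
        int done = (m->seq == acked_seq);
        free(m);
        if (done)
            break;
    }
    return n;
}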
However, such noise can only increase round trip time measurements, meaning the minimum time in the set of rtt calculations represents the best sample of round trip time possible in the application level. We use exponentially weighted moving averaging (EWMA) to keep track of the smoothed value of the minimum round trip time, namely rttewma: ewma(avg, sample, \u00CE\u00B1) { ave = (1\u00E2\u0088\u0092 \u00CE\u00B1)\u00C3\u0097 sample+ \u00CE\u00B1\u00C3\u0097 avg (3.1) } TCP events, such as retransmits, are concealed from the application\u00E2\u0080\u0099s view. It is not feasible to infer accurately about real network characteristics using only the rttewma signal. However, there are occasions in which some messages go through the network with the least possible delay. In order to value such measurements, we use asymmetric EWMA, namely ewma\u00E2\u0088\u0092, to obtain rttewma\u00E2\u0088\u0092: ewma\u00E2\u0088\u0092(avg, sample, \u00CE\u00B1) { if(sample \u00E2\u0089\u00A4 avg) avg = sample (3.2) 1Paceline ACK 9 3.2. Application Level Measurements else avg = ewma(avg, sample, \u00CE\u00B1) } rttewma\u00E2\u0088\u0092 responds immediately to drops in round trip time measure- ments but smoothes increases. If the smoothing is done heavily, i.e. smooth- ing factor \u00CE\u00B1 is high, rttewma\u00E2\u0088\u0092 can be representative of the real network round trip time. Figure 3.1 is a sample plot of rttewma and rttewma\u00E2\u0088\u0092 compared to TCP\u00E2\u0080\u0099s measurements of round trip time, and shows how rttewma\u00E2\u0088\u0092 is closer to TCP\u00E2\u0080\u0099s measurements of rtt than rttewma. 0.07 0.08 0.09 0.1 0.11 0.12 0.13 0.14 0.15 0.16 220 225 230 235 240 Tim e (S ) Time (S) TCP RTT RTT EWMA RTT EWMA- Figure 3.1: rttewma versus rttewma\u00E2\u0088\u0092 versus TCP round trip time measure- ments. Notice how for a significant amount of time rttewma\u00E2\u0088\u0092 is less than TCP round trip measurements. Finally, the highest round trip time measurement in the set illustrates how data flows through this TCP connection. We find the maximum of the current highest rtt measurement and the oldest message written to socket and not yet acknowledged. We call this signal oldest unacked, and keep a profile of this signal as a metric of performance of the latency controller (see Section 3.4.2 for results). Higher values of oldest unacked imply that the application on top of this connection experiences a \u00E2\u0080\u009Cdead zone\u00E2\u0080\u009D and has to wait longer to send and receive data, an issue referred to as head of line blocking. We emphasize that rtt represents timing of data sent over the network, whereas oldest unacked represents time periods where nothing is sent. 10 3.3. Congestion Window Approximation 3.2.2 Bandwidth Measurements An incoming flow of P-ACKs can be used to measure network bandwidth as well. Each incoming P-ACK acknowledges one or more messages. Pe- riodically counting the number of bytes acked in a constant time slot, e.g. every 10ms, divided by the size of the time slot will give an approximation b\u00CC\u0082w of the network bandwidth. We apply three different filters to b\u00CC\u0082w: ewma, ewma\u00E2\u0088\u0092 and ewma+ (which is exactly the opposite of ewma\u00E2\u0088\u0092 and is defined by Equation 3.3) to obtain three smoothed signals bwewma, bwewma\u00E2\u0088\u0092 and bwewma+ respectively. 
Figure 3.2 shows a sample plot of the three different bandwidth measurement signals compared to b̂w, and Figure 3.3 illustrates how they relate to each other. As we will show, the value of calculating bw_ewma− (bw_ewma+) lies in the fact that it catches the bases (peaks) that bw_ewma misses due to smoothing, and tends to under-estimate (over-estimate) bw_ewma. As we will show shortly, our strategy will be to combine these two signals in a way that matches the average, bw_ewma, but retains the ability to track peaks and bases. In the next section we explain how to use the measurements obtained so far to design the ADAPT latency controller.

Figure 3.2: Different rate measurement signals vs. b̂w: (a) bw_ewma vs. b̂w, (b) bw_ewma− vs. b̂w, (c) bw_ewma+ vs. b̂w.

Figure 3.3: bw_ewma versus bw_ewma− versus bw_ewma+. The invariant bw_ewma+ ≥ bw_ewma ≥ bw_ewma− holds at all times.

3.3 Congestion Window Approximation

TCP uses CWND as a limit on the number of bytes in flight in the network. Intuitively, having reasonably accurate measurements of bandwidth and latency, we expect the product of bandwidth and latency to approximate CWND. We used Equation 3.4 in the initial design of ADAPT.

    ĈWND = rtt_ewma × bw_ewma        (3.4)

Figure 3.4: ĈWND from Equation 3.4 compared to CWND: (a) steady state, where ĈWND follows the trend of CWND but does not reset when CWND does; (b) start-up, where it takes around 110 seconds for ĈWND to reach CWND.

We tested this controller and discovered it does not perform like SOCK (Figure 3.4). The main reason we smooth the signals is to reduce the effects of measurement noise, and we found a latency controller based on non-smoothed measurements to be unstable. However, using smoothed measurements results in a very slowly responsive controller. Figure 3.4(a) displays ĈWND compared to the CWND of the TCP connection, each plus 3 × MSS. It is worth noting that the latency controller only works if it can approximate CWND. We can see how this controller is slow in following changes in CWND. This slow responsiveness impairs performance in two respects. First, when TCP drops its rate by resetting CWND to its initial value (i.e. when a TCP retransmission timeout occurs), the controller does not respond immediately, causing the socket to fill and latency to increase.
Second, when bandwidth is available in the network, the controller is very weak in claiming it, causing the controller to take much longer to reach steady state than a normal TCP connection would. Notice how in the first 140 seconds of the same run, plotted in Figure 3.4(b), ĈWND is nowhere close to CWND.

To increase responsiveness, we first need an indicator of the network condition. Based on the continuous rate of incoming acknowledgments, we use alterations in the bandwidth measurements to infer the state of the network. If packets are dropped due to congestion, there is a sudden decrease in b̂w. On the contrary, if bandwidth is available in the network, b̂w will start rising. However, in both cases, bw_ewma gradually follows the changes and is almost constant while these changes occur. Thus the term pressure, defined by the ratio of b̂w to bw_ewma, can be used as the network condition indicator. When bandwidth is available, pressure climbs close to 1, and when congestion happens, pressure becomes very close to 0.

We can re-write Equation 3.4 in the form of Equation 3.5, where bw_ewma has been replaced with the average of bw_ewma− and bw_ewma+, i.e. the average of an under-estimate and an over-estimate of the available bandwidth.

    ĈWND = rtt_ewma × (0.5 × bw_ewma− + 0.5 × bw_ewma+)        (3.5)

Furthermore, using the network condition indicator, by replacing the constant factors 0.5 with pressure we get Equation 3.6, where CLAMP(x, min, max) clamps the value of x to the range [min, max].

    pressure = CLAMP(b̂w / (2 × bw_ewma), 0, 1)
    ĈWND = rtt_ewma × (1 − pressure) × bw_ewma−
         + rtt_ewma × pressure × bw_ewma+        (3.6)

What we achieve by this form of the equation is the ability to decide which bandwidth estimate to use as the dominant factor of our estimation. When bandwidth is available and pressure > 0.5, bw_ewma+ becomes the dominant factor, which is the highest rate recently observed in the network. This helps the underlying TCP connection grasp any available bandwidth and reach steady state like a normal TCP connection. On the other hand, when congestion happens in the network and pressure < 0.5, bw_ewma− has more weight in the equation, which is the lowest bandwidth recently observed. This acts in much the same way TCP does when decreasing or resetting its CWND. In steady state conditions, where pressure ≈ 0.5, Equation 3.6 becomes a reasonable estimate of CWND. This is confirmed in Figure 3.5, which depicts ĈWND compared to CWND using Equation 3.6.

As we will show in the next section, the accuracy of the estimate is high enough for the ADAPT latency controller to achieve results similar to the SOCK latency controller. However, under very heavy load the controller misbehaves. These are the conditions in which relatively small adjustments to rate can have large effects on queuing delay. This sometimes causes a positive feedback loop, where a small overestimate of bandwidth leads to an increase in rtt_ewma, which results in further over-estimation, manifesting as a persistently rising rtt_ewma. These are the conditions in which rtt_ewma rises without any significant increase in bw_ewma.
Figure 3.5: CWND obtained from the kernel versus ĈWND estimated by Equation 3.6.

To detect this situation, we introduce a new stability check in the controller, which uses the ratio of rtt_ewma to rtt_ewma− as a conservative sign that a correction is necessary. This ratio is always greater than 1, but under normal operating conditions should not be much greater. The ratio is greater than 1 because we expect some packets to experience retransmissions on a regular basis. However, we expect at least some packets not to undergo retransmission. For these packets we expect the ratio to be between 1 and 2. This implies that if rtt_ewma / rtt_ewma− > 2, we may take it as a sign that queueing delay in socket buffers, not retransmission, is causing the delay.

In the case where the controller becomes unstable, rtt_ewma is no longer a reliable signal, since it contains a spurious amount of kernel buffering delay. To compensate for the over-estimation caused by this undesirable delay, we replace the first rtt_ewma in Equation 3.6 with rtt_ewma−. This results in a temporary under-estimation of CWND (recall Figure 3.1), until the controller re-stabilizes (Equation 3.7).

    if (rtt_ewma / rtt_ewma− ≤ 2)
        rtt_signal = rtt_ewma
    else
        rtt_signal = rtt_ewma−        (3.7)

    ĈWND = rtt_signal × (1 − pressure) × bw_ewma−
         + rtt_ewma × pressure × bw_ewma+

Equation 3.7 is the best latency controller we have designed, and is what we refer to as ADAPT. In the next section, we present results of evaluating the ADAPT latency controller under various conditions, comparing it to SOCK and to TCP with no latency controller. We collected the same results for Equations 3.4, 3.5 and 3.6; we do not present them as they are strictly worse than the results obtained from Equation 3.7.
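Putting the pieces together, the ADAPT estimator of Equation 3.7 can be sketched in C as follows. The inputs are the application-level signals maintained in Section 3.2; variable names are illustrative.

static double clamp(double x, double lo, double hi)
{
    return x < lo ? lo : (x > hi ? hi : x);
}

/* Estimate of the congestion window in bytes (Equation 3.7). */
static double adapt_cwnd_estimate(double rtt_ewma, double rtt_ewma_minus,
                                  double bw_hat,        /* latest b̂w sample */
                                  double bw_ewma,
                                  double bw_ewma_minus,
                                  double bw_ewma_plus)
{
    /* Network condition indicator: near 1 when bandwidth is opening up,
     * near 0 under congestion, about 0.5 in steady state. */
    double pressure = clamp(bw_hat / (2.0 * bw_ewma), 0.0, 1.0);

    /* Stability check: a large gap between rtt_ewma and rtt_ewma− signals
     * that socket-buffer queueing, not retransmission, is inflating the
     * rtt, so fall back to the more conservative signal. */
    double rtt_signal = (rtt_ewma / rtt_ewma_minus <= 2.0) ? rtt_ewma
                                                           : rtt_ewma_minus;

    return rtt_signal * (1.0 - pressure) * bw_ewma_minus
         + rtt_ewma   * pressure         * bw_ewma_plus;
}

The sender then limits the socket fill level to this estimate plus 3 × MSS, exactly as the SOCK controller does with the kernel-reported CWND.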
3.4 Evaluation

We evaluate the performance of our latency controller in terms of:

• Oldest un-acked
• Fairness
• Utilization

Oldest un-acked is a representation of how responsive the transport is from the application's point of view. Our goal in designing the latency controllers was to improve this characteristic of TCP. However, TCP flows are very efficient at sharing bandwidth amongst themselves in a fair and distributed manner, and it is very important that we do not sacrifice this fairness to achieve good latency. In the following subsections, we compare the ADAPT latency controller to normal TCP without a latency controller to see how a controller can improve the quality of service provided by TCP. However, since ADAPT was meant to imitate SOCK, we also include SOCK in the evaluations as a metric of the optimum performance. We conduct our evaluation within an Emulab network testbed.

3.4.1 Experimental Setup

Our network setup uses the common dumbbell topology of Figure 3.6, where a set of clients on a LAN connect through a single bandwidth-delay constrained link, emulating a wide-area path, to a set of servers on a remote LAN. In the experiments, we vary the number of flows sharing the path. For the WAN delay, we use a 30ms round trip time, and the WAN bottleneck has a bandwidth of 16Mbps using drop-tail queueing with a queue size of twice the bandwidth-delay product of the WAN path. In congested conditions, this effectively adds two round trip times (60ms), and the sum of propagation and queueing delay yields a total network round trip time of 90ms.

We configure the number of clients and servers to ensure that the WAN path is the bottleneck, not some other client or server resource. Each experiment run lasts more than 6 minutes (380 seconds). To eliminate experimental startup and shutdown effects, our measurements come from the middle 300 seconds of each run. The node in the middle, node0 in Figure 3.6, does traffic shaping to emulate the bottleneck link and collects tcpdump data if required. We use a set of scripts (included in the QStream main repository) to automate the whole process.

Figure 3.6: Emulab setup used for evaluation.

The nodes each run a separate instance of QStream. The client nodes request a video file from a server on the other side of the bottleneck link. Each node runs a local instance of Qmon, QStream's data monitoring application, to collect data. We set up various check-points between experiment runs, using emulab-sync, to ensure the nodes execute the experiment in a synchronized fashion. Except where noted, we ensure that the bottleneck link is fully saturated.

Based on our choice of bandwidth and delay, Table 3.1 shows the average CWND size of TCP flows, in units of MSS, based on the number of flows sharing the link. The average CWND size is computed as total bandwidth (bps) × rtt (s) / (1500 × 8 (bits) × number of flows). We believe 32 flows represent a highly congested network.

    Number of flows           2    4    8    16    24   32
    Average CWND size (MSS)   60   30   15   7.5   5    3.75

Table 3.1: Average CWND size in the experiments based on the number of flows.

3.4.2 Latency

As mentioned in Section 3.2, when a P-ACK arrives we calculate the worst round trip time experienced by the messages that were acknowledged and refer to it as oldest un-acked. Every 100ms, we find the maximum oldest un-acked in the current time slot and store time traces of this signal for every flow. We then calculate the overall cumulative distribution function (CDF) over all the flows participating in the experiment as the oldest un-acked metric.

We expect our latency controller to be useful in network conditions where normal TCP performance is unacceptable for time sensitive applications. In order to find such conditions in our experimental setup, we ran experiments with the number of flows increasing from 2 to 8. Table 3.2 displays the results for 2, 4 and 8 flows when using normal TCP.

    Number of flows   Median (ms)   99.9 percentile (ms)   Worst case (ms)
    2                 51.9          65.8                   65.8
    4                 452.5         879.8                  960.2
    8                 591.1         2588.5                 3245.0

Table 3.2: Oldest un-acked measurements for 2, 4 and 8 TCP flows.
The experiments have been set up so that the bandwidth requirement of each flow is more than 5Mbps. Two flows is the only case in which the network bandwidth exceeds the total requirement of the flows, leaving the bottleneck link unsaturated. Hence, in this zone, TCP has adequate latency. However, when the number of flows doubles to four and saturates the link, the median oldest un-acked increases by an order of magnitude, into a zone where normal TCP is not usable for time sensitive applications. We chose eight flows as the minimum network load for the rest of the experiments, where we thoroughly test our latency controllers in zones where TCP latency is unacceptable.

The latency results can be found in Table 3.3 and Figure 3.7(a) (we will discuss the 99.9 percentile further and consider worst cases in Chapter 4). The SOCK and ADAPT controllers have close responses in terms of median oldest un-acked, and are well below that of TCP. Even in extremely congested network conditions with 32 flows, the median oldest un-acked is well below that of TCP with 8 flows.

                      Median (ms)               99.9 percentile (ms)
    Number of flows   TCP     SOCK    ADAPT     TCP       SOCK     ADAPT
    8                 591.1   140.1   140.0     2588.5    319.5    584.9
    16                652.4   164.2   182.5     4580.1    868.1    1165.0
    24                707.6   180.6   217.2     4640.0    1212.4   1470.9
    32                748.6   211.7   264.9     10256.1   1402.1   2064.5

Table 3.3: Oldest un-acked measurements for SOCK and ADAPT latency controllers versus normal TCP without a latency controller. Notice how the latency controllers decrease the median latency.

3.4.3 Fairness

In order to investigate the effects of the latency controllers on bandwidth allocation, we use two measures to quantify bandwidth fairness. The first is a metric known as the Jain fairness index [8], defined by Equation 3.8, where n is the number of flows and x_i is the bandwidth allocated to flow i in a given time slot, e.g. every 500ms. The overall Jain index is the average of the index over all time slots. This index ranges from 1/n (worst case) to 1 (best case). The bandwidth allocation of a specific flow is calculated from time traces of b̂w every 100ms (see Section 3.2).

    fairness = (Σ x_i)² / (n × Σ x_i²)        (3.8)

Table 3.4 shows the Jain fairness index measurements for ADAPT and SOCK compared to TCP, with 500ms time slots.

    Number of flows   TCP    SOCK   ADAPT
    8                 0.84   0.86   0.84
    16                0.78   0.85   0.86
    24                0.73   0.82   0.80
    32                0.70   0.79   0.76

Table 3.4: Jain fairness index measurements for SOCK and ADAPT latency controllers versus normal TCP without a latency controller.
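For a single time slot, Equation 3.8 translates directly into code; a sketch is given below. The overall index reported in our tables is the average of this value over all 500ms slots of a run.

/* Jain fairness index for one time slot, given the per-flow bandwidth
 * samples x[0..n-1] measured in that slot. */
static double jain_index(const double *x, int n)
{
    double sum = 0.0, sum_sq = 0.0;
    for (int i = 0; i < n; i++) {
        sum += x[i];
        sum_sq += x[i] * x[i];
    }
    if (sum_sq == 0.0)
        return 1.0;    /* idle slot: no allocation to be unfair about */
    return (sum * sum) / (n * sum_sq);
}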
To understand fairness in greater detail than allowed by the Jain index, we also convert bandwidth measurements from the same experiments into a form of CDF. We subdivide the timeline into uniform time slots (e.g. every 500ms). For each time slot, we compute the ratio of each flow's bandwidth to the fair bandwidth share (i.e. total bandwidth / number of flows). We then plot the CDF of the bandwidth shares of all the flows. We prefer these CDFs to a single metric because their shapes convey unfairness both in terms of per-flow bandwidth (x position) and the fraction of affected flows (y position). In the case of perfect fairness, the CDF would appear as a step function, i.e. all flows get 1 unit of fair share bandwidth in every time slot. As fairness decreases, so does the slope of the CDF line. Also, if the left y-intercept is non-zero, it indicates that some of the flows experienced periods of total starvation.

Figure 3.7(b) plots the CDFs of ADAPT and TCP as the number of flows varies between 8 and 32. The vertical line around 1 represents the ideal sharing of bandwidth. Observe that TCP with 8 flows and ADAPT with 8 and 16 flows are the fairest, and TCP and ADAPT with 32 flows are the least fair. However, none of the flows experience starvation, and the rest of the runs fall within the range of fairness between TCP with 8 flows and 32 flows. This means ADAPT flows are at least as fair to each other as TCP flows. We believe this level of fairness to be acceptable. We also calculate the application level utilization of the bottleneck link to confirm we are not under-utilizing available bandwidth in favor of lower round trip times.

Figure 3.7: Oldest un-acked and fairness of ADAPT compared to TCP with 8, 16, 24 and 32 flows: (a) oldest un-acked, (b) fairness with 500ms time slots.

3.5 Incremental Deployment

In the results presented so far, all the flows participating in an experiment run were of the same type, i.e. they were either all running the same latency controller, or were all simply TCP without a latency controller. We would like to investigate how our latency controller performs when there are other flows in the network competing for bandwidth. This would be a very common situation if we were to incrementally deploy our latency controller in popular Internet applications.

We conduct a series of experiments with 16 flows but with a mixture of TCP and ADAPT flows. We vary the fraction of flows using TCP from 0 to 75 percent, while the rest use ADAPT. Figure 3.8 shows the latency results, and Table 3.5 shows the median, 99.9 percentile and worst case latency for the flows using ADAPT, for each mixture of flows. Figure 3.8(b) is a magnification of the last 0.5% of Figure 3.8(a) on a larger time scale. As the fraction of TCP flows in the mix increases, the median latency worsens, but only slightly. Our latency controller still operates in an acceptable range.

    TCP flow %   Median (ms)   99.9 percentile (ms)   Worst case (ms)
    0            178.8         1159.8                 1990.6
    25           200.9         1288.1                 1744.7
    50           214.9         1470.2                 2732.4
    75           219.9         1264.1                 1835.6

Table 3.5: Oldest un-acked measurements for the ADAPT latency controller and mixed TCP with 16 flows. Increasing the percentage of TCP flows has little effect on the median.

Figure 3.8: Oldest un-acked of ADAPT with mixed number of TCP flows: (a) full CDF, (b) the 99.5% to 100% range.

The overall fairness plot of Figure 3.9 also shows a fair sharing of bandwidth between the flows. This plot was generated from bandwidth measurements of all flows, not only the flows using ADAPT. We would also prefer that our flows do not harm existing TCP traffic unduly.

Figure 3.9: Fairness of ADAPT with mixed number of TCP flows.
To verify the effect our flows have on TCP flows, we chose the case where 50% of the flows are TCP and plotted two separate CDFs of fairness in Figure 3.10, one for the bandwidth allocated to ADAPT flows and one for TCP flows. We notice that the pure TCP flows receive slightly more bandwidth than their fair share, leaving the ADAPT flows with less bandwidth. The fact that ADAPT flows do not consume more bandwidth than they should is a highly desirable property.

Figure 3.10: Fairness of ADAPT flows versus normal TCP flows. The total number of flows is 16, where 50% are ADAPT. Notice how TCP flows receive slightly more bandwidth than their fair share.

3.6 Summary

In this chapter we designed a latency controller for TCP connections. The controller is based on application level measurements of network round trip time and bandwidth. Using time traces of the oldest un-acked messages and of bandwidth, we quantify how the controller behaves in terms of time response and bandwidth sharing. Through evaluating our controller under severe network conditions, we discovered we can achieve good time responses without disrupting TCP's properties of bandwidth sharing and utilization. We also explored the effect flows using our latency controller may have on normal TCP flows in the network. We observed our flows experience a small increase in median latency, whereas TCP flows are allocated more bandwidth than their fair share. This confirms our latency controller is safe to deploy incrementally in the Internet, while maintaining its performance.

Chapter 4

Transport: Failover

In this chapter we present a technique used by Paceline to automatically detect when the TCP connection enters a period of inactivity, even when there is data ready to be sent, and replace it with a standby TCP connection. In the following, we first explain why we need this technique, then we explain how connection management is done in QStream. Finally, we explain how we implemented it and evaluate its usefulness.

4.1 Motivation

The results of the latency controller are extremely pleasing when compared to normal TCP. However, in the extreme cases tested in the previous chapter, there are worst case situations in which even using the latency controller may not be satisfactory for time sensitive applications. Table 4.1 lists the worst case oldest un-acked measurements for the ADAPT and SOCK latency controllers along with TCP. Notice how the worst case increases as the number of flows increases. Even though the cases between the 99.9 percentile and the worst case happen at most 0.1% of the time, such delays could be unsatisfactory. For example, for a 100ms sampling period, 0.1% translates to one quality impairment every 100 seconds. We would prefer to minimize the magnitude of those impairments, if possible.

                      Worst case (ms)                  Worst case / network RTT
    Number of flows   TCP       SOCK     ADAPT         TCP     SOCK   ADAPT
    8                 3245.0    525.0    1085.3        36      5.8    12.1
    16                5192.6    1191.8   2228.9        57.7    13.2   24.8
    24                6536.7    1694.9   2311.0        72.6    18.8   25.7
    32                18587.6   2239.9   3180.5        206.5   24.9   35.3

Table 4.1: Worst case oldest un-acked measurements for SOCK and ADAPT latency controllers versus normal TCP without a latency controller.

Under heavy congestion, TCP can experience back-to-back packet losses, leading to one or more retransmission timeouts.
We observed that such timeouts and exponential back-offs are correlated with many of the worst case round trip times. At some point, TCP spends a long time, on the order of hundreds of milliseconds or even seconds, waiting. Considering that the load on the network is constant, we believe these timeouts are spurious events, and there is no need for exponential back-off. We could effectively reduce some of the worst cases if, in such a state, we closed the current TCP connection and switched to a fresh new connection, which is safe and will not disrupt the stability of the Internet [12]. This is similar to a web user clicking the stop button in a browser followed by reload when loading a web page takes an unexpectedly long time. We call this mechanism failover, as we fail over from one connection to another. There are two components to failover: 1) connection management and 2) control. Our main contribution is in the control of failover, but first we overview connection management.

4.2 Connection Management

The connection management component implements the control plane functions associated with initiating and establishing the transport instances used by Message Port sessions (Figure 1.1). A session is a full duplex redundant channel. For each session, the connection manager may employ several TCP sockets. The precise number of sockets and their roles depend on configuration settings, such as the number of standbys used for failover, as depicted in Figure 4.1.

Figure 4.1: Message port session.

Prior to the flow of application data, the server and client execute a mini-protocol to establish the initial set of session sockets. This protocol has two phases. The first phase exchanges a configuration greeting and response between the client process and server process, using the first socket. The client greeting sets the session-wide configuration parameters, such as a globally unique identifier (UUID) for the session, and the number of standby connections. The second phase of the protocol creates the remaining sockets, and exchanges a bind greeting and response on each of them. The bind greeting contains the UUID of the session, allowing the server to associate the socket with the correct session state. It also contains information necessary to synchronize the role of each socket between the client and server. We define four socket roles: client to server data, client to server acknowledgement channel, server to client data, and server to client acknowledgement channel. A data and an acknowledgement socket form a half duplex communication channel (from client to server or from server to client). After the initial set of channels is established, the connection manager may also remove and replace failed channels, as part of the failover functionality described next.

4.3 Implementation

4.3.1 Mechanism

The implementation of failover is fully transparent to the application. The failover mechanism works by maintaining a pool of channels between the pair of communicating applications. Each channel contains two TCP sockets. We only use one of these sockets (in each direction) for data delivery at a time. As the sender writes a message to the socket, it starts a timer set to failover_thresh for that message, indicating the maximum time frame allowed for this message to be acknowledged by the receiver. If a timer for a message fires, the sender migrates data delivery away from the current active channel to one of the available standby channels.
All outstanding data has to be retransmitted over the new channel, since we have no information about how much of it successfully traveled through the network. Additionally, the receiver side needs some logic to suppress duplicate messages that may result from retransmission.

Previously, the receiver could simply assemble all incoming messages and forward them to higher levels. Since with failover some messages may be duplicates of others, the transport needs to keep track of all messages that have been received since the beginning of the connection. Considering that messages are numbered when they are created on the sender side and get re-ordered before they are written to the socket, received sequence numbers are not necessarily in ascending order. Keeping a list of all sequence numbers is also not feasible. To address this problem, we use an illegal window of sequence numbers. The window is calculated in such a manner that only sequence numbers inside the illegal window may be duplicates. Hence, if the receiver keeps a list of only the sequence numbers inside the illegal window, it can easily detect duplicates. The sender notifies the receiver to slide the illegal window periodically.

Concurrently with activation of the new active channel, the connection manager terminates the old one, and initiates a new replacement channel that, once established, enters the pool of available standbys. As mentioned in the previous section, client to server and server to client communication are handled by separate channels, which allows failover to be managed independently in each direction as well.

When replacing a failed channel, the connection mini-protocol sends a failover message from the sender to the receiver on the newly selected active channel sockets to ensure both sides are synchronized. This message includes a counter value, maintained by the sender side, incremented each time the sender fails the current active channel. If very severe congestion causes more than one failover to occur back to back, it is possible that standby channels will be established in an order different from how they were initiated. The counter is used to ensure that the sender and receiver remain correctly synchronized on which channel is the newly active one.

4.3.2 Controller

If we manage to set failover_thresh to be greater than a normal message transmission time over TCP, but less than TCP's exponential back-off timeout, we are able to fail over at the right time. We calculate failover_thresh in a similar fashion to how TCP calculates its retransmission timeout (RTO) [14]. We keep track of the EWMA smoothed value of the maximum variance in round trip times experienced by messages, namely rtt_var, updated when acknowledgements arrive. We then calculate the threshold using Equation 4.1, where threshold_min is set to 225ms, which is the minimum RTO threshold in TCP's calculations.

    failover_thresh = MAX(threshold_min, rtt_ewma− + 5 × rtt_var)        (4.1)
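A sketch of the resulting controller logic is shown below. The threshold is exactly Equation 4.1; for brevity it is expressed as a check against the age of the oldest un-acked message rather than as the per-message timer described in Section 4.3.1, and the names are illustrative.

#include <stdbool.h>

struct failover_state {
    double rtt_ewma_minus;   /* smoothed minimum rtt from Section 3.2 (seconds) */
    double rtt_var;          /* smoothed variance of the rtt samples (seconds)  */
};

/* Equation 4.1: threshold_min is 225ms, the same floor TCP uses for its RTO. */
static double failover_threshold(const struct failover_state *fs)
{
    const double threshold_min = 0.225;
    double t = fs->rtt_ewma_minus + 5.0 * fs->rtt_var;
    return t > threshold_min ? t : threshold_min;
}

/* When the oldest un-acked message has waited longer than the threshold,
 * the sender migrates delivery to a standby channel. */
static bool should_fail_over(const struct failover_state *fs,
                             double oldest_unacked_age)
{
    return oldest_unacked_age > failover_threshold(fs);
}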
Number of flows    Worst case (ms)
                   ADAPT     ADAPT + FO
8                  1085.3    611.8
16                 2228.9    892.3
24                 2311.0    1052.4
32                 3180.5    1281.9

Table 4.3: Worst case oldest un-acked measurements for the ADAPT latency controller with failover compared to ADAPT without failover. Notice how failover helps decrease the worst case latency.

Note that TCP uses four times the round trip time variance; having tested both four and five, we use five. Failing over to a new connection is a costly operation, since we retransmit some data, and in the case of a false positive (i.e. the failover timer firing incorrectly) we swap a TCP connection with a large CWND for a fresh TCP connection with CWND = 1 in slow start. Because our measurements are taken at the application level and are subject to noise, we add a safety margin of an extra rtt_var to decrease false positives. Our results confirm that a factor of five produces fewer false positives than four.

4.4 Evaluation

We implemented failover as a complement to our latency controllers. Using failover with normal TCP, however, does not seem like a reasonable solution, since a huge portion of the delay experienced by messages is inside the kernel buffers, not in the network, which causes our failover threshold calculation to misbehave.

We performed the same experiments we did with the ADAPT latency controller in Section 3.4, using the same experimental setup, this time armed with failover. Table 4.2 presents the median and 99.9 percentile of ADAPT with and without failover, and Table 4.3 has the worst case. We notice how failover has almost no effect on the median, but effectively decreases the worst case to less than half.

Table 4.4 shows the Jain fairness index for ADAPT with failover compared with TCP. The fact that the fairness indices do not change noticeably with failover is a positive indication that our failover threshold calculation is not misbehaving.

Number of flows    Jain fairness index
                   TCP     ADAPT
8                  0.84    0.86
16                 0.78    0.85
24                 0.73    0.81
32                 0.70    0.76

Table 4.4: Jain fairness index measurements for the ADAPT latency controller with failover vs. TCP.

4.5 Incremental Deployment

Once again we evaluate our latency controller while competing with normal TCP flows, this time aided with failover. Figure 4.2 shows the latency results, and Table 4.5 presents the median, 99.9 percentile and worst case latencies for a mixture of 0%-75% TCP flows. We notice the worst case has improved compared to the results of Section 3.5, without major effects on the median. We do not present the fairness plots for this evaluation since they are very similar to those of Section 3.5. This confirms the correctness of the failover mechanism.

Figure 4.2: Oldest un-acked and fairness of ADAPT with a mixed number of TCP flows. (a) Oldest un-acked (fraction vs. latency in ms, for 0%, 25%, 50% and 75% TCP flows); (b) oldest un-acked, 99.5% to 100%.

TCP flow %    Median (ms)    99.9 percentile (ms)    Worst case (ms)
0             179.4          679.9                   892.5
25            188.5          699.6                   1069.8
50            211.5          732.6                   1012.3
75            208.3          827.3                   1178.5

Table 4.5: Oldest un-acked measurements for the ADAPT latency controller with failover and mixed TCP, with 16 flows. An increase in the percentage of TCP flows has little effect on the median.

4.6 Summary

In this chapter we presented failover, a technique supplementary to the latency controller designed in Chapter 3. Failover is designed to detect when TCP suffers from exponential back-off, and to close the currently stuck TCP connection and move data transfer to an already open standby TCP connection. Results show how failover can effectively decrease the worst case oldest un-acked message latency to less than half. We also confirmed that our latency controller still behaves in an acceptable manner when competing with normal TCP flows, and observed that we do not hurt other TCP flows severely. This can be considered a green light towards incremental deployment of our latency controller in popular Internet applications.

Chapter 5

Application: Live-In

In this chapter we focus on an application layer built on top of Paceline. Referring back to Figure 1.1, Live-In resides in QVid. Live-In was designed to provide QStream with the ability to stream, in real time, live video captured and encoded directly from a camera, e.g. from a webcam as done in video conferencing. In the rest of this chapter, we first explain the basic functionality of QStream and Live-In, then evaluate the application level performance of streaming live video over Paceline compared to using normal TCP.

5.1 QStream and Live-In

Figure 5.1 provides a detailed view of QVid. QVid is an adaptive video player and consists of various components. QVid can run either in server mode, i.e. QServer, or in client mode, i.e. QClient. The two communicate with each other using Paceline via Pps Server/Pps Client, which simply handle serialization/de-serialization of application data units. QServer is responsible for locating and streaming video data to QClient, which in turn provides a means for decoding and displaying the video data using the QVid player, Decoder and Video-Out.

Figure 5.1: QVid structure.

Upon connection, QClient requests one or more videos from QServer. QServer initializes either File-In or Live-In based on the requested filename. A filename of "live" indicates that live data has been requested by the client; otherwise the filename is looked up on disk. Once the source of data has been found, QServer transmits the video frames to QClient through Pps Server. Upon notification of receipt of video data by Pps Client, QClient forwards the data to the player, which in turn decodes the data and prepares it for display via Video-Out.

Live-In lies between QServer and Video-In. Video-In initializes the capturing device and starts pumping out raw captured images at an approximately regulated rate specified by Live-In. If the rate at which images are captured from the device is slower (faster) than the requested frame rate, Video-In duplicates (drops) some frames to maintain a constant frame rate. Live-In is notified each time a frame is output from Video-In through the on capture callback.

Raw frames are fed into an SPEG encoder as soon as they become available, and Live-In is notified when the frame has been encoded through the encoder produce callback, which supplies a bitstream of the encoded frame. The bitstream contains a header with meta data about the frame, a frame base layer, which is required for reconstructing the frame, and at most seven spatial enhancement layers that provide extra detail for the base frame. Video-In tries to regulate the rate at which frames are output.
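The duplicate-or-drop regulation that Video-In performs can be sketched as follows. This is an illustrative reconstruction under the assumption of a simple deadline-driven loop; the names and structure are not taken from QStream's source.

    class FrameRegulator:
        """Emit frames at a constant target rate by duplicating or dropping
        captured frames (sketch of the Video-In behaviour described above)."""

        def __init__(self, frame_rate):
            self.frame_period = 1.0 / frame_rate
            self.next_deadline = None
            self.last_frame = None

        def on_capture(self, frame, now, emit):
            if self.next_deadline is None:
                self.next_deadline = now
            self.last_frame = frame
            # A slow camera makes this loop run more than once, duplicating
            # the most recent frame; a fast camera makes it run zero times,
            # effectively dropping the extra captures.
            while now >= self.next_deadline:
                emit(self.last_frame)
                self.next_deadline += self.frame_period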
However, the frames that are output are not always evenly spaced in time. To ensure frames are transmitted at a constant rate, Live-In does not transmit frames as soon as they are encoded. Instead, once the bitstream has been parsed, two events are scheduled for the frame:

• Output event
• Expire event

When the output event fires, the frame can be forwarded to Pps Server to be transmitted over the network. The output event schedule time is obtained from: video start time + frame number / frame rate.

QStream is based on adaptation. We expect our video applications to adapt to the rate at which data can be sent. This requires the application to be able to cancel frames when there is no longer any use in sending them. The expire event is scheduled for such occasions, and we cancel a frame when it fires. The expire event is scheduled using the live timeline.

QStream uses an adaptive timeline for playing stored video [10]. The adaptive window size grows up to 30 seconds for pre-fetching data from storage. However, the timing requirements for live video are slightly different. Figure 5.2 shows the timeline used by Live-In. Setup refers to when the image is captured, and xmit is the time when the frame is encoded and can be forwarded to Pps Server for transmission. Decode is the time when decoding of the frame should begin by the decoder. At this point, the decoder starts decoding with whatever number of layers it has received for that frame so far, unless either the frame meta data or the base layer is missing; in such cases the entire frame is dropped. Decode is also the time when it is too late to transmit any data for the current frame, which is why we refer to decode as xmit expire. Finally, output is the time when we expect the frame to be displayed, which is the time by which the decoder should finish decoding.

Figure 5.2: Live timeline.

Two of the times between these different events, which are measured in numbers of frames, are set to constant values for a given session. We chose one frame time for setup to xmit and two frames for decode to output. The xmit window size, the time between xmit and xmit expire, is a user configurable parameter. We will get back to the xmit window size in Section 5.2.

In normal operation QStream does not assume clock synchronization. The server and client independently keep track of the timeline. This may cause the timelines to drift out of phase or out of rate: out of phase because of network delay, and out of rate because of inaccurate hardware clocks. For the purpose of evaluation we synchronize the timelines with NTP, but in normal operation there is a feedback loop inside the application to synchronize the timelines.

The frames output from the encoder are not always I-frames. The bitrate of the video can be greatly reduced if a larger group of pictures (gop) is used. Using a gop size greater than one introduces inter-frame dependencies, which must be handled with care. If the expire events of all frames are set to xmit expire in the timeline and an I-frame is not transmitted by its deadline, it will be dropped by the sender. This means all subsequent P-frames in the gop, even if they make it through to the other side, are useless to the decoder. The same can happen for P-frames dependent upon earlier dropped P-frames. To handle this, we postpone expiring all frames in a gop to the xmit expire of the last frame.
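The event times implied by this timeline can be sketched as follows. The arithmetic and names below are illustrative assumptions; the exact offsets QStream uses may differ from this frame-indexed reconstruction.

    def schedule_frame(video_start, frame_rate, frame_number,
                       xmit_window, gop_last_frame):
        """Return (output_time, expire_time) for one frame, in seconds.

        output_time is when the frame may be handed to Pps Server;
        expire_time is when it is cancelled if still untransmitted.
        """
        frame_time = 1.0 / frame_rate
        # Output event: video start time + frame number / frame rate.
        output_time = video_start + frame_number * frame_time
        # Expire event: the xmit expire point, postponed to the xmit expire
        # of the last frame in the gop so that a late I-frame does not
        # strand the P-frames that depend on it.
        expire_time = video_start + (gop_last_frame + xmit_window) * frame_time
        return output_time, expire_time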
To ensure the frames are not re-ordered in Message Port (recall that Message Port re-orders data based on utility), the utility of each frame in the gop decreases as the frame approaches the end of the gop. One minor problem with this approach is that the gop output from the encoder is not always equal to the requested gop size. The encoder may have to end the current gop and start a new one if the captured video changes suddenly. A shorter gop will cause its frames to expire later than they should, and to be transmitted if bandwidth becomes available. Fortunately, this is a rare case, and we leave implementing a more sophisticated solution for future work.

QServer creates a new instance of all the resources of Figure 5.1 for every client. Unlike other resources in the system that can be shared between different processes, only one instance of Video-In can exist at any time. Hence, for Live-In to be able to support a multi-party video conference, different instances of Live-In should be able to share the same instance of Video-In. The first instance of Live-In initializes Video-In and encodes the captured frame. Once the encoded frame is ready, it is duplicated for each instance of Live-In, if more than one instance exists. Each Live-In re-numbers its own copy of the frame's data structures to match the current state of the session and forwards it to its own Pps Server upon the output event.

Live-In was implemented to be compatible with the QVid player. This implies Live-In should support commands such as pause, play and seek, even though they may not have a reasonable meaning for live video. In QServer, the response to a seek is usually followed by the first video frame. It is important to ensure the first frame sent to the other side is an I-frame, so Live-In refrains from sending a seek result until an I-frame is produced by the encoder. A large gop size in the encoder can cause a noticeable delay in such cases. A specific client pausing the video corresponds to the respective instance of Live-In dropping all of its duplicated data. In the case where all viewers pause their video, the data is no longer encoded and is dropped right after being captured. The reason we do not stop capturing frames is that initializing the capture device takes a noticeably long time. So as long as there are viewers of the video, even paused viewers, we continue capturing and locally dropping captured video frames.

Figure 5.3 shows an image captured from the output of Live-In when streaming a 30 frame per second video locally. The clock on the bottom is system time. The Local Webcam window is the image displayed right after being captured, without going through the encoder. Notice how it is 100ms behind the local clock, even though we allowed one frame of encoding time. This is due to the delay the camera has in capturing and outputting the image, along with the delay of displaying the image. The Live window is the video that has gone through the process of encoding, transmission and decoding with an xmit window size of 3 frames. The 200ms delay between the two windows is due to a six frame timeline, one frame for setup, 3 for xmit and 2 for decode, at a 33ms frame time.

Figure 5.3: Live-In output for local streaming.

In the next section we investigate the application level performance when running over Paceline.
5.2 Application Level Evaluation

In this section we evaluate the performance of streaming live video over a congested network with and without a latency controller. We use the same network setup as in Section 3.4 for our experiments. However, since capturing directly from a camera was not possible in our Emulab setup, we instead stream a special stored video with the live timeline, which we believe closely resembles a live video. The video has an average bitrate of 5Mbps at 30 frames per second, representing a high definition video conference.

In the first experiment, we varied the xmit window size. When the client and server timelines are static and synchronized, the timeline configuration almost exactly determines the end-to-end delay, and frames that are displayed on the screen have a maximum end-to-end delay specified by the timeline (i.e. setup + xmit window + decode). Therefore, we define goodput to be the percentage of bandwidth which contributed to data used and displayed by the application, i.e. data that arrived on time for decoding. This quantifies the overhead of cancelling messages during transfer. In this experiment we increased the xmit window size to examine the trade-off between low latency interaction and goodput.

Figure 5.4: Xmit window size versus application goodput (goodput as a percentage of acked data vs. xmit window size in frames, for TCP and ADAPT). The end-to-end latency is the xmit window size plus a constant 3 frames for setup and decode (1 frame = 33.3ms).

In Figure 5.4, we see that for 8 video flows with a total bitrate of 40Mbps going through a 16Mbps link, the goodput generally increases as the xmit window size increases. Notice the end-to-end latency is equal to the xmit window size plus 3 frames for setup and decode, and each frame is 33.3ms. For normal TCP, we noticed that goodput is significantly lower than ADAPT with failover in the low latency range below 1 second (30 frames). We also see that allowing applications more than 1 second of xmit window increases goodput only very slowly, with negligible improvement. This means that in a congested network where a high definition video conference is running with a 5 frame xmit window size (166ms) and more than 60% goodput, the same application would have to allow more than 60 frames of delay (2 seconds) to achieve the same goodput on normal TCP. Recall from Table 4.2 that the median latency for 8 ADAPT flows with failover is 140ms.

Figure 5.5: Results with an 8 frame xmit window size, for TCP and ADAPT, as the number of videos varies. (a) Average frames per second; (b) average spatial layers.

Goodput represents how well the application is able to make use of bandwidth, but considering that only a few spatial layers are enough to reconstruct a frame, an application with low goodput may still be able to present a reasonable frame rate to the user. This is because our application allows configuration of a policy to balance the importance of spatial and temporal quality. These experiments were run with the default policy of preserving temporal quality over spatial quality. This policy is based on the subjective observation that a decrease in frames per second is more annoying than a decrease in spatial fidelity.
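For clarity, the goodput metric used in these experiments can be computed per flow roughly as follows. This is an illustrative sketch; the actual byte accounting in QStream may differ.

    def goodput_percent(messages):
        """messages: iterable of (acked_bytes, arrived_on_time) pairs, where
        arrived_on_time means the data arrived before its decode deadline."""
        total = sum(size for size, _ in messages)
        useful = sum(size for size, on_time in messages if on_time)
        return 100.0 * useful / total if total else 0.0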
To investigate how Paceline can affect the frame rate, using an xmit window size of 8 frames (267ms), we run a variable number of videos, from 3 (the lowest number of videos able to saturate the bottleneck link) to 16, using ADAPT with failover or TCP. We then calculate the average number of frames per second in Figure 5.5. Figure 5.5(a) shows the average frames per second and Figure 5.5(b) shows the average number of spatial layers used to reconstruct frames.

When the number of flows increases, as we expect, the average number of spatial layers used in decoding frames decreases for both ADAPT and TCP. However, with 5 videos TCP's average frame rate drops below 5fps, whereas ADAPT is able to maintain an average of 25fps with 5 videos, dropping to 8fps with 16 videos. Using ADAPT, we are able to stream a high definition live video in a congested network with a negligible decrease in frame rate, or equivalently, allow 3 times more users to use the network and experience the same quality of service they previously would when using TCP. Additionally, with ADAPT the frame rate decreases gracefully as the network undergoes higher loads.

5.3 Summary

In this chapter we presented an application layer inside QStream called Live-In. The main purpose of Live-In is to enable QStream applications to stream live video captured from a webcam over the Internet in real time. We evaluated Live-In over Paceline in terms of goodput and average frames per second in different network conditions. Our comparison with Live-In over TCP shows significant improvements. Further evaluation of Live-In is ongoing work.

Chapter 6

Application: QPS

In this chapter we present another application, called QStream Publish Subscribe (QPS). Our experience with Live-In over Paceline proved QStream to be very desirable for video conferencing: we can stream high definition video (5Mbps bitrate) over a congested network with an end-to-end latency of 200ms and very good quality, and our application also supports multi-party conferencing. However, for normal Internet users there is one valuable resource that tends to become the bottleneck when video conferencing: upload bandwidth. Most ISPs provide much more download bandwidth than upload bandwidth, sometimes at a ratio of up to 10.

The peer-to-peer conferencing of Figure 6.1(a) (currently supported by Live-In) requires N times the bandwidth of one video stream, for both uploading and downloading, where N is the number of peers. To make our video applications more scalable, we decided to design a publish-subscribe application that could act as the broker in Figure 6.1(b), where each peer uploads at most one video. The download bandwidth requirements for the peers are still the same as before, and Qps requires N² times the bandwidth of one video stream.

Figure 6.1: Peer-to-peer conferencing vs. QPS conferencing. (a) Video conference with peer-to-peer connections; (b) video conference using QPS.

The peer-to-peer approach might be acceptable if even the best last mile bandwidth were not the limit. However, this seems unlikely in the foreseeable future, so even though the second architecture has a potential increase in end-to-end latency, it is currently a reasonable solution, especially for cloud computing. Amazon charges at most $0.17/GB for any data transfer. In a video conference with 5 peers with 1Mbps videos, Qps will use 30Mbps of bandwidth.
This is equal to a total cost of $0.306/minute for video conferencing, or equivalently $0.061/minute for each peer. This rate is comparable to the rates of VoIP services provided by companies such as Skype.

6.1 Description

Qps was designed to be a general purpose pub-sub library, able to support any form of real-time traffic, and is built on top of Paceline. Handling the specific data messages, which depend on the type of application using Qps (video data, game data, web data, etc.), is done by the application itself via various callbacks. The application created to support video data is QpsDemo.

Qps creates a global listener that waits for publishers and subscribers. All Qps connections are initiated through this listener. Every connecting client registers itself with Qps with a unique name, which is used for all future identification purposes, and the type of service it provides, i.e. whether it is a publisher or a subscriber. If the newly connected client is a subscriber, it receives the list of all current publishers from Qps. If the new client is a publisher, its name is added to the list of publishers. Each time a publisher is added to or removed from the list, all subscribers are notified immediately.

A subscriber may subscribe to any publisher available in the list. The pseudo code Qps uses for handling incoming subscribe messages can be found in Program 6.1. A subscriber sends a sub message to Qps, along with the name of the desired publisher. Qps looks for an active session from the requested publisher in a hash table of sessions. If no such session is found, Qps creates a new Pps Client and connects to the publisher. Once data is received from the publisher, a listener is created for the new session and the port number of the listener is forwarded to the subscriber. If an active session is found in the hash table, the port number is simply forwarded to the subscriber, since there is no need to re-create the listener. The subscriber then connects to the listener via its own Pps Client, as it normally would when directly connecting to QServer. Since QpsDemo was designed for live video only, a filename of "live" is used in all connections.

Previously, Live-In contained the logic for handling multiple clients of a live stream. It managed data duplication, re-numbering and forwarding via separate Pps Servers, or dropping data in the case where the video had been paused by the client. It also handled cancelling data, via the expire event, once it was too late to send it. Similar logic is required inside Qps for every session, and it is handled by what we call a Qps viewer. Figure 6.2 shows the relation between a session and a viewer. A session includes one or more viewers, each one serving a single QClient via its own Pps Server. As before, the video flow is only paused when all viewers pause the video.

Figure 6.2: Qps session and viewers.

Qps uses the same live timeline of Figure 5.2, with the difference that there is no need for the setup part of the timeline, since incoming data is already captured and encoded. One difference between Qps viewers and Live-In lies in the fact that in Live-In an encoded frame becomes available all at once when the encoder finishes encoding, whereas in Qps different parts of a frame arrive from the network with some delay, and some do not make it at all. For this reason, we currently duplicate and forward data for a session as it arrives from the publisher.
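The session/viewer fan-out just described can be sketched as follows; the class and method names are illustrative assumptions rather than identifiers from the Qps source.

    class Viewer:
        """Serves a single QClient through its own Pps Server."""

        def __init__(self, pps_server):
            self.pps_server = pps_server
            self.paused = False

    class Session:
        """One publisher's live stream, fanned out to its viewers."""

        def __init__(self, pps_client):
            self.pps_client = pps_client   # connection to the publisher
            self.viewers = []

        def on_publisher_data(self, frame):
            # Data is duplicated and forwarded to each viewer as it arrives
            # from the publisher; paused viewers simply drop their copy.
            for viewer in self.viewers:
                if not viewer.paused:
                    viewer.pps_server.send(frame.duplicate())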
When a subscriber un-subscribes from a session, its corresponding viewer is deleted, but as long as there are active viewers in the session, the session is kept alive. Once all viewers un-subscribe, the listener is deleted and the session is terminated. If a publisher leaves the session, all viewers of the session are notified; since we do not know the state of data in flight, we wait for the subscribers to un-subscribe before we terminate the session. Figure 6.3 shows a client's view of Qps in a simple setting where it has subscribed to two video streams provided by two different publishers.

Program 6.1 Pseudo code for handling incoming subscribe messages. Qps receives a subscribe message from subscriber sub requesting subscription to publisher pub.

    function recv_subscribe(sub, pub):
        session = lookup_session(pub.name)
        if session == NULL:
            session = new_session()
            store_session(session)
            session.pps_client = create_pps_client()
            session.pps_client.connect(pub)
            session.pps_client.recv_data(pub)
            session.listener = new_listener()
        end
        sub.send(session.listener.port)
    end

Figure 6.3: Qps from a subscriber's view.

6.2 Preferential Subscriptions

We have provided the ability for users of Qps to subscribe to data they choose, but that is not all: what if Qps could make the decision for the users? The work done in the context of preference-aware pub/sub [11] was the main motivation behind creating Qps. In traditional pub/sub systems, users subscribe to the events in which they are interested and are notified when such events occur; the duties of the pub/sub system amount to forwarding events to the proper handlers. However, in some systems the average number of subscriptions may be high enough to overwhelm the subscribers. In such systems, the user provides the pub/sub system with preferential criteria to rank events in order of importance. The pub/sub system can then forward only a constant number of the most important events to any subscriber, based on its preference.

We added support for preferential subscriptions to Qps. Qps maintains a list of subscribers, and for each element in the list keeps a list of viewers, i.e. it keeps track of all the sessions a specific subscriber is a member of. Performing any kind of mix on the viewers then requires only minor changes to data structures, such as swapping data pointers. As a demonstration, the special option in Figure 6.3 enables preferential subscriptions in Qps. This causes Qps to flip the incoming data for the first two viewers of this subscriber every two seconds. Such real-time video mixing can be useful on many occasions.

Consider the live TV coverage of a sporting event in Figure 6.4. Many cameras cover such events, each one following a specific player or recording from a specific angle. All videos are fed into the mixing station, where they are multiplexed manually into a single video stream, which is then broadcast. However, the interests of the spectators are not always the same as the TV director's, and modern cameras provide information about the direction in which they are pointing. A more desirable solution would be to substitute the TV mixing station with a real-time video mixer like Qps. Spectators could specify their order of preference based on the area of coverage and location of each camera, and watch a video edited specifically for them.
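As an illustration of the viewer-flipping demonstration described above, the swap could be implemented along these lines; all names here, including the periodic scheduler, are assumptions for the sketch and not the actual Qps code.

    SWAP_PERIOD = 2.0  # seconds between flips, as in the demonstration

    def flip_first_two_viewers(subscriber):
        """Swap the sessions feeding the subscriber's first two viewers.

        Because a viewer only forwards the data it is pointed at, the mix
        is a matter of swapping data pointers.
        """
        v = subscriber.viewers
        if len(v) >= 2:
            v[0].source, v[1].source = v[1].source, v[0].source

    # Invoked from a periodic timer, e.g.
    #   schedule_every(SWAP_PERIOD, flip_first_two_viewers, subscriber)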
Another direction in which we plan to take Qps is the realm of online multi-player games. Donnybrook [2] is an online multi-player game based on Quake 3 with the ability to support up to 900 players. Since the game state grows as the number of players increases, not all clients have the ability to forward their data to every player, similar to the problem we had with peer-to-peer conferencing. Qps can be used to forward some of the game state data in such games.

Figure 6.4: Live TV coverage of an event.

Evaluation of Qps is ongoing work and is not in the scope of this thesis. In this chapter we presented Qps as an aid for video conferencing. Qps follows some of the same logic inside Live-In for handling multiple clients. We also added a simple form of preferential subscriptions to Qps, as an introduction to our future work.

Chapter 7

Conclusion and Future Work

A transport that can provide high bandwidth with low latency can make way for the next generation of multimedia applications. High definition video conferencing and massive multiplayer online games are two such applications. However, the lack of a suitable solution for such a transport has been a major issue for many years. In this thesis our main contribution was to move closer to bringing such a transport to life.

We designed an application layer latency controller on top of TCP. The latency controller uses measurements of the network's bandwidth and latency to calculate the amount of data it allows to be passed down to TCP. We also aided our latency controller with a failover mechanism that automatically detects when the TCP connection becomes "stuck", i.e. suffers back to back packet drops and moves into the exponential back-off state. It helps improve latency by replacing the currently stuck TCP connection with a new standby connection.

We evaluated Paceline using an Emulab network setup, comparing it to TCP. We observed that it can noticeably improve the latency performance of TCP, while leaving its utilization and fairness properties almost intact. We also confirmed that flows using Paceline do not harm other TCP traffic inside the network.

On top of Paceline, we designed two different applications for live video streaming. We evaluated the performance of these applications when using Paceline, compared to using normal TCP. Our evaluation showed great improvements over TCP.

The work presented here may be continued in myriad possible directions. In our evaluation we tested a normal droptail FIFO queue in the bottleneck link, which represents the majority of the Internet. However, it may be interesting to investigate how our results vary when the network provides active queue management for our flows.

We use SOCK as a reference to an optimal solution. Our results show that ADAPT's performance seems to be better than SOCK under lower load on the network, and shifts to becoming worse under heavy load. Perhaps we could better tune our latency controller to at least be in the same ballpark as SOCK under heavy load. Additionally, SOCK may in turn not be the best possible solution under extremely heavy load. Such zones of operation may also be of interest.

Finally, even with failover enabled, we still observe some high worst case latencies. Examining the worst cases in greater detail with the help of packet level analysis, e.g.
using tools like tcpdump and wireshark, may point to further improvements.

Qps can still undergo further improvements, one of which stands out in terms of importance. The preferential subscription criteria are currently programmed in the source code. We would prefer the criteria to be more flexible, giving the user the ability to define and change their criteria using a high level language.

Bibliography

[1] Sumitha Bhandarkar, A. L. Narasimha Reddy, Yueping Zhang, and Dimitri Loguinov. Emulating AQM from end hosts. In SIGCOMM '07: Proceedings of the 2007 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pages 349–360, New York, NY, USA, 2007. ACM.

[2] Ashwin Bharambe, John R. Douceur, Jacob R. Lorch, Thomas Moscibroda, Jeffrey Pang, Srinivasan Seshan, and Xinyu Zhuang. Donnybrook: Enabling large-scale, high-speed, peer-to-peer games. SIGCOMM Comput. Commun. Rev., 38(4):389–400, 2008.

[3] Lawrence S. Brakmo and Larry L. Peterson. TCP Vegas: End to end congestion avoidance on a global Internet. IEEE Journal on Selected Areas in Communications, 13(8):1465–1480, 1995.

[4] Deepak Bansal, Hari Balakrishnan, Sally Floyd, and Scott Shenker. Dynamic behavior of slowly-responsive congestion control algorithms. In Proceedings of SIGCOMM 2001, San Diego, CA, August 2001.

[5] Bryan Ford. Structured streams: A new transport abstraction. SIGCOMM Comput. Commun. Rev., 37(4):361–372, 2007.

[6] Chuck Fraleigh, Sue Moon, Bryan Lyles, Chase Cotton, Mujahid Khan, Deb Moll, Rob Rockell, Ted Seely, and Christophe Diot. Packet-level traffic measurements from the Sprint IP backbone. IEEE Network, 17:6–16, 2003.

[7] Ashvin Goel, Charles Krasic, and Jonathan Walpole. Low-latency adaptive streaming over TCP. ACM Transactions on Multimedia Computing, Communications, and Applications, 4(3):1–20, 2008.

[8] R. Jain, D. Chiu, and W. Hawe. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Technical Report TR-301, DEC Research, September 1984.

[9] Eddie Kohler, Mark Handley, and Sally Floyd. Designing DCCP: Congestion control without reliability. SIGCOMM Comput. Commun. Rev., 36(4):27–38, 2006.

[10] Charles Krasic and Jean-Sébastien Légaré. Interactivity and scalability enhancements for quality-adaptive streaming. In MM '08: Proceedings of the 16th ACM International Conference on Multimedia, pages 753–756, New York, NY, USA, 2008. ACM.

[11] Marina Drosou, Kostas Stefanidis, and Evaggelia Pitoura. Preference-aware publish/subscribe delivery with diversity. In DEBS '09: Proceedings of the 3rd ACM International Conference on Distributed Event-Based Systems. ACM, 2009.

[12] Amit Mondal and Aleksandar Kuzmanovic. Removing exponential backoff from TCP. SIGCOMM Comput. Commun. Rev., 38(5):17–28, 2008.

[13] L. Ong (Ciena Corporation) and J. Yoakum (Nortel Networks). RFC 3286 - An introduction to the Stream Control Transmission Protocol (SCTP). RFC 3286, May 2002.

[14] V. Paxson and M. Allman. RFC 2988 - Computing TCP's retransmission timer. RFC 2988, November 2000.

[15] Andreas Petlund, Kristian Evensen, Pål Halvorsen, and Carsten Griwodz.
Improving application layer latency for reliable thin-stream game traffic. In NetGames '08: Proceedings of the 7th ACM SIGCOMM Workshop on Network and System Support for Games, pages 91–96, New York, NY, USA, 2008. ACM.

[16] Maxim Podlesny and Sergey Gorinsky. RD network services: Differentiation through performance incentives. SIGCOMM Comput. Commun. Rev., 38(4):255–266, 2008.

[17] Steven Wilson. An Adaptive Packet Size Approach to TCP Congestion Control. Master's thesis, University of British Columbia, Canada, October 2008.

[18] www.qstream.org.