UBC Theses and Dissertations


Improving transport layer performance over multi-hop wireless networks by machine learning Arianpoo, Nasim 2017


Full Text

Improving Transport Layer Performance over Multi-hop Wireless Networks by Machine Learning

by

Nasim Arianpoo

B.Sc., University of Tehran, Tehran, Iran, 2005
M.Sc., The University of British Columbia, Vancouver, Canada, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2017

© Nasim Arianpoo, 2017

Abstract

The emerging use of multi-homed wireless devices along with simultaneous multi-path data transfer offers tremendous potential to improve the capacity of multi-hop wireless networks. Concurrent Multi-path Transfer (CMT) over the Stream Control Transmission Protocol (SCTP) is a reliable multi-path transport layer protocol with unique features that resonate with the multi-path nature of multi-hop wireless networks. The present thesis identifies and addresses some of the challenges of CMT-SCTP over wireless multi-hop networks.

One main challenge that multi-hop wireless networks raise for CMT-SCTP is out-of-order packet arrival. SCTP uses packet sequence numbers to ensure delivery. As such, out-of-order packet arrival triggers receive buffer blocking in CMT-SCTP, which causes throughput degradation. Another challenge in using CMT-SCTP over multi-hop wireless networks is the unfair resource allocation towards flows coming from farther-away hops.

The first part of this thesis focuses on integrating machine learning and network coding into CMT-SCTP to resolve the receive buffer blocking problem. Our state-of-the-art scheme uses Q-learning, a form of Reinforcement Learning (RL), to enable the network coding module to adjust to network dynamics. We confirm the validity of our proposal with a queuing-theory-based mathematical model.
Moreover, the effectiveness of the proposed scheme is demonstrated through simulations and testbed experiments.

In the second part of the thesis, we investigate the fairness behavior of CMT-SCTP towards other CMT or non-CMT flows coming from farther-away hops on a multi-hop wireless network. We introduce a distributed Q-learning mechanism to enhance fairness in CMT-SCTP. The proposed method uses Q-learning to acquire knowledge about the dynamics of the network. The acquired knowledge is then used to choose the best action to improve the fairness index of the network. We evaluate our proposal against standard CMT-SCTP and Resource Pool CMT-SCTP (CMT/RP-SCTP).

In the third part of this thesis, we apply our findings from the second part to TCP to demonstrate that the benefits of our fairness mechanism can be extended to other transport layer protocols. The findings of this thesis bring us closer to realizing the vast potential of multi-path data transfer over multi-hop wireless networks.

Preface

I hereby declare that the present thesis is my own original work. The work in Chapters 2-5 is published. The corresponding papers are co-authored by Dr. Victor C.M. Leung, who supervised me through this research. The papers corresponding to Chapter 3 are co-authored by Dr. Aydin, who contributed the QualNet module for CMT-SCTP simulations. Moreover, the papers corresponding to Chapter 5 are co-authored by Dr. Jokar, who contributed to the protocol stack design of the proposal.

Journal Papers, Published

• Nasim Arianpoo, Paria Jokar, Victor C.M. Leung, "Applications of network coding to improve TCP performance over wireless mesh networks: a survey", Wireless Communications and Mobile Computing, 2014. (Chapter 1)
• N. Arianpoo, I. Aydin, V. Leung, "Network Coding As a Performance Booster for Concurrent Multi-path Transfer of Data In Multi-hop Wireless Networks", IEEE Transactions on Mobile Computing, vol. PP, no. 99, pp. 1-1. (Chapter 3)
• Nasim Arianpoo, Victor C.M. Leung, "A smart fairness mechanism for concurrent multipath transfer in SCTP over wireless multi-hop networks", Ad Hoc Networks, Volume 55, February 2017, Pages 40-49. (Chapter 4)
• Nasim Arianpoo, Victor C.M. Leung, "How network monitoring and reinforcement learning can improve TCP fairness in wireless multi-hop networks", EURASIP Journal on Wireless Communications and Networking, 2016. (Chapter 5)

Conference Papers, Published

• Nasim Arianpoo, Paria Jokar, Victor C.M. Leung, "Enhancing TCP performance in wireless mesh networks by cross layer design", International Conference on Computing, Networking and Communications (ICNC), 2012. (Chapter 5)
• Nasim Arianpoo, Ilknur Aydin, Victor C.M. Leung, "Network coding: A remedy for receive buffer blocking in the concurrent multipath transfer of data over multi-hop wireless networks", IEEE International Conference on Communications (ICC 2014), 2014. (Chapter 3)

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgments
Dedication
1 Introduction
  1.1 Overview of CMT-SCTP
  1.2 Challenges of CMT-SCTP over Wireless Multi-hop Networks
    1.2.1 Receive buffer blocking
    1.2.2 Fairness
  1.3 Summary of Results and Contributions
  1.4 Thesis Organization
2 Preliminaries
  2.1 Network Coding
    2.1.1 Why network coding?
    2.1.2 What is network coding?
    2.1.3 How to estimate the number of redundant packets?
  2.2 Markov Decision Process
    2.2.1 What is MDP?
    2.2.2 Why is MDP a good fit for the multi-hop wireless environment?
    2.2.3 Q-learning
  2.3 Testbed Setup
    2.3.1 CMT-SCTP testbed setup
    2.3.2 TCP testbed setup
3 Receive Buffer in CMT-SCTP
  3.1 Introduction
  3.2 Coded CMT-SCTP
    3.2.1 Sender side
    3.2.2 Receiver side
  3.3 A Queuing Theory Based Proof of the Performance Gain of Coded CMT-SCTP
    3.3.1 Virtual queue in the coded multi-path transport layer
    3.3.2 Re-sequencing queue in the standard multi-path transport layer
  3.4 Testbed Experiments for Accuracy Analysis of the Q-learning Algorithm
  3.5 Simulations for the Performance of Coded CMT-SCTP
  3.6 Testbed Experiments for the Performance of Coded CMT-SCTP
  3.7 Discussion
  3.8 Conclusion
4 Distributed Fairness for CMT-SCTP
  4.1 Introduction
  4.2 Study of CMT-SCTP Fairness
  4.3 Q-learning CMT-SCTP
    4.3.1 Features and states
    4.3.2 Actions
    4.3.3 Reward function
    4.3.4 Architecture
  4.4 Evaluation
  4.5 Discussion and Comparison
  4.6 Conclusion
5 TCP Fairness over Wireless Multi-hop Networks
  5.1 Q-learning TCP Architecture
    5.1.1 States
    5.1.2 Action set
    5.1.3 Transition probabilities
    5.1.4 Reward function
  5.2 Integration of Q-learning Agent and TCP
  5.3 Performance Evaluations
    5.3.1 Chain topology
    5.3.2 Larger scale WMN
    5.3.3 Testbed
  5.4 Discussion and Comparison
  5.5 Conclusion
6 Conclusions and Future Work
  6.1 Research Contributions
  6.2 Suggestions for Future Work
Bibliography

List of Tables

3.1 Performance comparison between naive Bayes and logistic regression classifiers.
4.1 Throughput comparison of different scenarios. In scenario 1 of Figure 4.1, nodes A and B send data using TCP. In scenario 2, all sources except node B use TCP; node B uses SCTP. Scenario 3 is similar to scenario 2; however, node B uses CMT-SCTP. In scenario 4, all nodes use SCTP. In scenario 5, all nodes use SCTP and node B uses CMT-SCTP. All numbers are measured in kbps.
4.2 Evaluation scenarios.
4.3 Brief comparison of different flavors of CMT-SCTP.
5.1 Network metric parameters for different TCP variations of Figure 5.1.
5.2 Network metric parameters for different TCP variations of scenario 5.7.
5.3 A comparison between Q-learning TCP and TCP Reno.
5.4 A comparison between Q-learning TCP and other well-known TCP solutions. The throughput and fairness enhancement of each TCP flavor is compared against standard TCP. As such, any decrease/increase mentioned is relative to standard TCP.

List of Figures

2.1 Testbed setup for CMT-SCTP evaluations.
2.2 IEEE 802.11 channels.
2.3 Testbed setup for TCP evaluations.
3.1 Markov Decision Process of coded CMT-SCTP. When the transition probabilities P1, P2, ..., P6 are unknown, Q-learning is used to learn the reward of each state-action pair.
3.2 Markov chain for the length of the virtual queue in coded CMT-SCTP.
3.3 Queuing schematic for the re-sequencing process in standard multi-path data transfer; server 1 models the channel behavior while server 2 models the re-sequencing behavior of the receive buffer.
3.4 Markov chain for the re-sequencing behavior of the receive buffer.
3.5 Testbed topology, showing multiple data paths for CMT-SCTP.
3.6 Simulation topology. Actual data flow is over chain A, while interference traffic is over chain B.
3.7 Goodput of standard CMT-SCTP vs. coded CMT-SCTP (proposed in this chapter) vs. coded CMT-SCTP ver. 1 [1] with different receive buffer (rBuf) sizes, in simulation.
3.8 Percentage of packets in a CMT-SCTP association that experience receive buffer blocking, for standard CMT-SCTP vs. coded CMT-SCTP.
3.9 Goodput of standard CMT-SCTP vs. CMT-SCTP with buffer splitting [2] vs. coded CMT-SCTP with different rBuf sizes, over the testbed.
4.1 Testbed topology.
4.2 Fairness comparison of different scenarios. In scenario 1, nodes A and B in Figure 4.1 send data using TCP. In scenario 2, all sources except node B use TCP; node B uses SCTP. Scenario 3 is similar to scenario 2; however, node B uses CMT-SCTP. In scenario 4, all nodes use SCTP. Scenario 5 is similar to 4 except that node B uses CMT-SCTP.
4.3 Comparison of the average congestion window size of node B in scenarios 2 and 3 of Figure 4.1. The solid line shows the average window size of node B when all sources use TCP and node B uses CMT-SCTP; the dashed line shows the average window size of node B when all source nodes use TCP and node B uses SCTP.
4.4 The MDP of the CMT-SCTP Q-learning agent. The learning agent can choose one of the actions in each state based on the Q-learning rule.
4.5 Jain's fairness index of Q-learning SCTP and CMT-SCTP in comparison with standard SCTP and CMT-SCTP in the different scenarios of Table 4.2.
4.6 Average throughput (kbps) in the different scenarios of Table 4.2. Scenarios 1, 2, 3, and 4 are standard SCTP and CMT-SCTP, while scenarios 5, 6, 7, and 8 are Q-learning SCTP and CMT-SCTP. Scenario 9 shows the measurement results of CMT/RP-SCTP.
4.7 Average throughput of flows with different levels of QoS using the Q-learning fairness mechanism.
5.1 Chain topology: 3 nodes, 2 flows.
5.2 Throughput changes of the flows in the chain topology of Figure 5.1. Each cycle is equivalent to 40 seconds.
5.3 Learning rate (proof of convergence) in the topology of Figure 5.1. Each learning cycle consists of 40 seconds.
5.4 Network utilization factor (kbytes/sec) for the topology shown in Figure 5.1. Each learning cycle consists of 40 seconds.
5.5 Jain's fairness index in the chain topology of Figure 5.1. Each cycle is equivalent to 40 seconds.
5.6 Performance comparison between legacy TCP, Q-learning TCP, TCP-AP, and TCP ex Machina.
5.7 Random network topology: 10 nodes.
5.8 TCP flow throughput and network utility in kbytes/sec for the scenario of Figure 5.7.
5.9 Learning rate of nodes 3, 5, and 7 in the scenario of Figure 5.7.
5.10 Testbed topology.

List of Abbreviations

ACK      Acknowledgment
CMT      Concurrent Multi-path Transfer
ECN      Explicit Congestion Notification
IETF     Internet Engineering Task Force
MAC      Medium Access Control
MDP      Markov Decision Process
MHHP     More Hops Higher Priority
NRED     Neighborhood RED
OSI      Open Systems Interconnection
PSTN     Public Switched Telephone Network
QoS      Quality of Service
RED      Random Early Detection
RLC      Random Linear Network Coding
RP-SCTP  Resource Pool SCTP
RSSI     Received Signal Strength Indication
RTS/CTS  Request-to-Send/Clear-to-Send
RTT      Round Trip Time
SACK     Selective Acknowledgment
SCTP     Stream Control Transmission Protocol
TCP      Transmission Control Protocol
TSN      Transmission Sequence Number

Acknowledgments

I would like to express my deepest gratitude to my supervisor, Dr. Victor C.M. Leung, without whom this research would not be possible.
I want to thank him for his continuous support, patience, and technical guidance.

I am also thankful to my best friend and companion, Ali Tajsekandar, for his unconditional love and support, both technical and non-technical, without whom this journey would not be possible. I am always grateful for his support and understanding throughout.

I feel indebted to my great family, my father, my mother, and my beloved sister, Nastaran, for their everlasting love, support, and protectiveness, not only during my PhD years but throughout my entire life. Without their love and support, I would not be where I am today.

This work was supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Dedication

To Mom & Dad

Chapter 1

Introduction

Multi-hop wireless networks have gained attention in recent years, as they can extend network coverage by providing non-line-of-sight connectivity. Moreover, their low deployment cost, as compared to single-hop technologies such as cellular, in which the coverage area depends on the number of base stations, makes multi-hop wireless networks an attractive solution for last-mile Internet access [3]. However, multi-hop wireless networks also have certain disadvantages. The present thesis aims to mitigate some of these disadvantages, making multi-hop wireless networks a more attractive solution for last-mile access.

Multi-hop wireless networks operate on the idea of cooperation among nodes for packet forwarding. They exist in different sizes and shapes: a multi-hop network can be a small residential network or a network in a large commercial complex. When the destination is out of the transmission range of the sender, intermediate nodes act as relays and forward the packet to the destination. Therefore, a packet may pass via multiple hops before reaching its final destination.
However, the multi-hop nature of these networks makes them vulnerable to link failures and variable link quality, which may result in dynamic and frequent data path changes. This drastically degrades transport layer protocol performance [4]. The objective of the transport layer protocol is to provide an end-to-end connection between peers and to relieve the upper layers from the concerns and issues of the lower layers. The Transmission Control Protocol (TCP) is the oldest reliable transport layer protocol in wide use. TCP was designed for, and does well in, a wired environment. TCP relies on smooth changes in the Round Trip Time (RTT) of packets, but the dynamic nature of paths in a multi-hop wireless environment can cause substantial and sudden changes in RTT. Moreover, the lack of knowledge about the available resources of the shared medium, i.e., the wireless channel, and the dynamic network topology violate the design principles of TCP. Hence, TCP fails to perform well in multi-hop wireless settings. The Stream Control Transmission Protocol (SCTP), another connection-oriented transport layer protocol, shares similarities with TCP; yet SCTP addresses some of the shortcomings of TCP for newer applications. The unique built-in features of SCTP, such as multi-homing and multi-streaming, make it an attractive choice as a reliable multi-path transport layer protocol in data networks. Multi-homing is the ability of an endpoint to support more than one IP address. Multi-streaming is the ability of the transport layer protocol to segment and transmit data in parallel streams. Multi-homing and multi-streaming position SCTP as a promising candidate for newer applications over multi-hop wireless networks. Newer applications use parallel streams to transmit data to maximize network utilization. This feature resonates with the multi-path nature of multi-hop wireless networks.
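The decoupling that multi-streaming provides can be sketched with a toy delivery model. This is an illustration only; the `deliverable` function and its dict-of-sets interface are inventions for this sketch, not SCTP's actual chunk bookkeeping:

```python
# Toy multi-streaming delivery: each stream keeps its own sequence space, so a
# loss on one stream does not hold back in-order chunks on the other streams.

def deliverable(received):
    """received: dict mapping stream_id -> set of stream sequence numbers
    that have arrived. Returns, per stream, how many chunks can be delivered
    to the application in order (the length of the in-order prefix)."""
    out = {}
    for sid, seqs in received.items():
        n = 0
        while n in seqs:
            n += 1
        out[sid] = n
    return out

# Stream 0 lost chunk 1; streams 1 and 2 are unaffected and deliver fully.
arrived = {0: {0, 2, 3}, 1: {0, 1, 2}, 2: {0, 1}}
print(deliverable(arrived))  # {0: 1, 1: 3, 2: 2}
```

With a single shared sequence space, the loss of one chunk would have stalled delivery of everything after it; per-stream sequencing confines the stall to stream 0.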
However, without a proper transport layer protocol that supports multi-path transfer, the great potential of newer applications cannot be fully realized over multi-hop wireless networks.

At present, two main approaches account for reliable multi-path transport layer protocols: the Multi-Path TCP (MPTCP) extension to the Transmission Control Protocol (TCP) [5], and the Concurrent Multi-path Transfer (CMT) extension to the Stream Control Transmission Protocol (SCTP) [6]. CMT-SCTP is the more attractive approach for multi-path data transfer, since SCTP already supports multi-homing and the work required to address the shortcomings of CMT-SCTP in a multi-hop wireless environment is minimal compared to MPTCP. CMT-SCTP [7][8] is one of the first proposals to incorporate concurrent transfer of data over multiple disjoint sub-flows into the original SCTP. The main objective of CMT-SCTP is to increase network utilization and resource usage via parallel data transfer over different paths between the sender and receiver.

The multi-path data transfer in CMT-SCTP offers great potential to multi-hop wireless networks in terms of resource usage and reliability. However, there are some limitations to using CMT-SCTP over multi-hop wireless settings:

• One of the key challenges in designing a multi-path transport layer protocol over wireless multi-hop networks is the receive buffer blocking caused by out-of-order packet arrivals [8][9][10][11]. The receive buffer stores out-of-order packets and delivers them to the application layer only after all the missing packets have been received. Due to path discrepancies in multi-path data transfer, the receive buffer can be flooded by out-of-order packets. As a result, the data sender can be throttled, which causes the overall throughput of the connection to be reduced, or even to become zero at times. This phenomenon is called receive buffer blocking.
Receive buffer blocking results in performance degradation at the transport layer. It is exacerbated over wireless multi-hop networks, as the unreliability of the paths creates additional complexity for the transport layer. The existing literature addresses receive buffer blocking over simplified scenarios, and none of the proposed methods has been tested over multi-hop wireless networks [10][11][12][13][8][9].

• The fairness behavior of a multi-path transport protocol is another major challenge when it is deployed in wireless multi-hop networks. It can harm not only the protocol itself but also the performance of other existing protocols, such as TCP flows or other single-homed SCTP flows [14][15][16][17]. The fairness of CMT-SCTP has been partially investigated over simple bottleneck scenarios [18][19]; however, studies that focus on the fairness behavior of CMT-SCTP over wireless multi-hop networks against non-CMT flows coming from farther-away hops are limited.

A significant portion of the proposed algorithms focus on wired settings, and the resulting insights cannot be applied to wireless multi-hop networks [20][21][22][23][24][25][9][10][26][8][27]. The wireless multi-hop setting introduces both severe path dissimilarities and severe unfairness to the picture for CMT-SCTP. Therefore, novel proposals are required to accommodate the intrinsic characteristics of the wireless multi-hop network within CMT-SCTP.
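The receive buffer blocking described in the first challenge can be made concrete with a small sketch. This is a toy simulation under simplifying assumptions (a set-based buffer, and the one gap-filling packet is always admitted even when the buffer is full), not the queuing model developed later in the thesis; `simulate` and its parameters are hypothetical:

```python
# Toy model of receive buffer blocking: out-of-order TSNs occupy a bounded
# receive buffer; arrivals are refused while the buffer is full and the gap
# at the head of the sequence remains unfilled.

def deliverable_prefix(buffer, next_tsn):
    """Length of the in-order run starting at next_tsn held in the buffer."""
    n = 0
    while next_tsn + n in buffer:
        n += 1
    return n

def simulate(arrivals, rbuf_size):
    """arrivals: TSNs in arrival order. Returns how many arrivals were refused
    because the bounded receive buffer was already full (buffer blocking)."""
    buffer, next_tsn, blocked = set(), 0, 0
    for tsn in arrivals:
        if tsn < next_tsn or tsn in buffer:
            continue                      # duplicate of something already held
        if len(buffer) >= rbuf_size and tsn != next_tsn:
            blocked += 1                  # out-of-order packet refused: blocking
            continue                      # (the gap-filler itself is admitted)
        buffer.add(tsn)
        run = deliverable_prefix(buffer, next_tsn)
        for k in range(run):              # flush the in-order prefix upward
            buffer.discard(next_tsn + k)
        next_tsn += run
    return blocked

# Two paths: the fast path delivers TSNs 1..9 first; the slow path delivers
# TSN 0 last. With a 4-packet buffer, later packets are refused until the
# gap at TSN 0 is finally filled.
out_of_order = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
print(simulate(out_of_order, rbuf_size=4))          # 5 arrivals blocked
print(simulate(sorted(out_of_order), rbuf_size=4))  # in order: 0 blocked
```

The same arrivals cause no blocking when delivered in order, which is exactly why path dissimilarity, rather than loss alone, is the culprit.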
We focus our interest on the performance of CMT-SCTP over wireless multi-hop settings to address these shortcomings of the available literature.

The main questions of this thesis are (1) how to address the fairness issue of multi-path transport layer protocols over multi-hop wireless networks, and (2) how to enhance the performance of the multi-path transport layer over multi-hop wireless settings by alleviating receive buffer blocking.

The present thesis is divided into two main parts: the first part focuses on algorithms to alleviate receive buffer blocking in CMT-SCTP over wireless multi-hop networks, whereas the second part focuses on improving CMT-SCTP fairness against non-CMT flows coming from farther-away hops within the wireless multi-hop setting. The results provide insight into the behavior of CMT-SCTP over wireless multi-hop networks, which can be used to improve the performance of CMT-SCTP over existing infrastructure. The proposals in the present thesis are a step towards a swift transition from legacy transport protocols to multi-path transport protocols. In addition, the requirements of newer applications are addressed.

The remainder of this chapter is organized as follows. In Section 1.1 we highlight the details of CMT-SCTP and its known issues. Subsequently, we identify the challenges of using CMT-SCTP over wireless multi-hop networks and explain how addressing these challenges contributes to the overall goal of this thesis; this is the focus of Section 1.2. In Section 1.3, we describe the mechanisms proposed in this thesis and explain the contribution and novelty of the solutions. In Section 1.4, a detailed map of the thesis is presented.

1.1 Overview of CMT-SCTP

Except for some minor differences, SCTP and TCP share the same congestion control mechanism. TCP uses Acknowledgment (ACK) packets to inform the transmitter about missing packets using the packets' sequence numbers.
However, SCTP uses Selective Acknowledgment (SACK) to acknowledge both the received data and the gaps in the received data. Moreover, TCP is byte oriented, whereas SCTP is message (chunk) oriented. Aside from the minor differences in the congestion control mechanisms of SCTP and TCP, multi-homing and multi-streaming are unique to SCTP. In multi-homing, an SCTP socket holds more than one IP address for each host, creating redundancy at the transport layer. Therefore, should any of the interfaces fail, the SCTP association continues to transmit data on the other interface. In multi-streaming, the SCTP association divides the data into multiple streams and transmits each stream independently. The SCTP association uses the Transmission Sequence Number (TSN) of each packet to keep track of packet losses within each stream. The association uses the stream identifier and stream sequence number to track any gaps in data delivery to the application layer. When there is a missing packet in one of the streams of a multi-stream association, data chunks from unaffected streams can still be delivered to the upper layer. The main objective of multi-streaming in SCTP is to increase redundancy and resilience against network failure. Multi-streaming in its original form does not support simultaneous data transmission over all streams. Therefore, Concurrent Multi-path Transfer of data was proposed as an add-on suite to SCTP [8]. CMT-SCTP uses multi-homing in an SCTP association to transmit data concurrently over all available sub-flows within the association. The main objective of CMT-SCTP is to increase network utilization and resource usage.

1.2 Challenges of CMT-SCTP over Wireless Multi-hop Networks

The wireless multi-hop environment induces unexpected characteristics that are not addressed in the original design of CMT-SCTP. Therefore, CMT-SCTP fails to enhance network utilization over multi-hop wireless networks. These unaddressed challenges create interesting research topics.

1.2.1 Receive buffer blocking

Considering the drawbacks of receive buffer blocking on the performance of multi-path transport layer protocols, the topic has been investigated by many researchers. One common practice is to set aside enough space in the receive buffer [12]. To decrease the possibility of receive buffer blocking during the lifetime of a multi-path transport connection, Barre et al. [13] suggested reserving a space of twice the bandwidth-delay product of the path. Although the mechanism proposed in [13] mitigates receive buffer blocking, reserving space in the receive buffer decreases the receiver's tolerance of changes in path delays. To solve the receive buffer blocking issue in CMT-SCTP, Iyengar et al. [8] proposed different retransmission policies, based on the packet loss rate and the congestion window size of each path [8]. Although Iyengar's approach of using a proper retransmission policy alleviates the receive buffer blocking problem, in the presence of non-negligible and unavoidable path dissimilarities none of the proposed retransmission policies eliminates it. To improve the performance of CMT-SCTP over dissimilar paths, Dreibholz et al. [2] proposed a resource pooling algorithm that divides the receive buffer among all paths, preventing out-of-order packets from one path from flooding the receive buffer and blocking the other paths from sending data. Dreibholz's resource pooling partially alleviates buffer blocking, reducing the problem from the entire buffer to a portion of the buffer. However, in the presence of severe path dissimilarities, the local blocking infiltrates the rest of the buffer.
In Chapter 3, wepropose an adaptive network coding mechanism to desensitize the receiver against packetreordering and consequently eliminate the receive buffer blocking problem. Our mechanismintegrates network coding [28][29] into CMT-SCTP to resolve receive buffer blocking. Ourstate-of-the-art network coding scheme uses a combination of Q-learning [30] and logisticregression for rare data events to control the number of redundant packets based on thenetwork dynamics.1.2.2 FairnessFairness measures have different definitions based on the perspective and can be categorizedinto three groups [18]:• Link-centric sub-flow fairness only focuses on how the resources of a bottleneckis divided among the passing by sub-flows without taking into account of the totalresource allocation to each flow. For example, if there are n sub-flows belonging tom flows passing through the bottleneck link l with a capacity of C(l), assigning abandwidth of C(l)nto each sub-flow is considered link-centric sub-flow fair.• Link-centric flow fairness focuses on resource allocation to flows sharing a bot-tleneck regardless of the number of sub-flows. Assuming n sub-flows belonging to mflows passing through the bottleneck link l with a capacity of C(l), assigning a band-width of C(l)mto each flow is considered link-centric flow fair. As such, in link-centric7Chapter 1. Introductionflow fairness, each flow divides the allocated resource evenly among its sub-flows.• Network-centric fairness divides the network resources evenly among all flows inthe network regardless of the bottlenecks. (the present thesis perspective for fairnessmeasure)Whenever fairness is mentioned throughout this thesis, we are referring to network-centric fairness.A substantial portion of the available literature of transport layer fairness over wirelessmulti-path networks is focused on link-centric fairness resolutions for TCP [4, 31–43].Studies that focus on the fairness of SCTP or CMT-SCTP are limited. 
As TCP and SCTP share a similar congestion control mechanism, studying the available fairness solutions for TCP over wireless multi-hop settings provides insights into the fairness behavior of both SCTP and CMT-SCTP. Therefore, we focused on studying the available fairness mechanisms for TCP over wireless multi-hop networks. Subsequently, we developed a distributed fairness mechanism for SCTP and CMT-SCTP using a reinforcement learning approach. Eventually, we adapted the reinforcement learning fairness mechanism, with additional changes, to TCP [44][45]. Our methodology both offers backward compatibility with TCP and enhances CMT-SCTP fairness over the multi-hop setting.

Fairness of TCP

The existing literature on TCP fairness solutions over wireless multi-hop networks can be categorized into cross-layer designs, e.g., [44, 46–54], and layered proposals, e.g., [55–68]. While layered proposals aim to keep the end-to-end semantics of TCP intact, cross-layer designs use information from different layers to adjust TCP parameters. Random Early Detection (RED) is among the first proposals to enhance TCP fairness over wired connections. Neighborhood RED (NRED) is a cross-layer adaptation of RED for wireless multi-hop settings [48]. NRED is implemented in the Medium Access Control (MAC) layer and uses channel utilization from the physical layer to calculate the packet dropping probability. In [49], the gateway triggers Explicit Congestion Notification (ECN) messages for nearby TCP sources with a higher probability, to favor the farther away flows and allow them to use more network resources. The fact that the gateway controls the ECN mechanism is one of the key limitations of the cross-layer approach of [49]. Raniwala et al. [50] proposed a cross-layer solution that uses the topology of the network to create a connection graph and calculate the flow rates.
The heavy overhead of the feedback messages that [50] requires to create the global graph of the network is undesirable. The cross-layer method of [51], called TCP-AP, attempts to eliminate the reliance of TCP fairness solutions on feedback messages. The approach of [51] relies on information from the physical layer to infer each node's fair share of the network resources. The main drawback of TCP-AP is its reliance on the received signal strength indication (RSSI), which is not an accurate estimate of the receiver power level; therefore, TCP-AP relies on feedback from the receiver to function effectively. XCP, another example of a cross-layer design [53], uses feedback from routers to adjust the congestion control window size. XCP suffers from excessive overhead in the presence of heavy traffic. CoDel [69] is another cross-layer example that applies active queue management on selected bottleneck links, i.e., links with large queuing delay. CoDel uses spacing in transmission times as the queue management method over bottleneck links. D2TCP is a recent variation of TCP for online data-intensive applications that uses a varying rate based on a deadline set by the application and the congestion condition of the network reported by ECN [54]. D2TCP is a cross-layer approach that needs compatibility in both the application layer and the routing protocol to perform effectively, which is a major disadvantage. In [44], a cross-layer algorithm, More Hops Higher Priority (MHHP), is proposed that assigns higher priority to flows traversing a larger number of hops. Although MHHP improves TCP fairness in a multi-hop environment, the design does not fully capture the dynamic nature of the wireless environment.

Unlike the cross-layer approaches, the layered solutions preserve the end-to-end semantics of the Open Systems Interconnection (OSI) model.
According to Gambiroza et al. [55], the binary back-off mechanism, the Request-to-Send/Clear-to-Send (RTS/CTS) signaling mechanism, and the end-to-end congestion mechanism play key roles in TCP unfairness. Gambiroza et al. [55] proposed a MAC layer solution in which each access point calculates the fair share of incoming and outgoing flows in terms of flow rates and broadcasts the rates to the other nodes. Reference [57] is another MAC layer approach in which nodes negotiate to set up a fair transmission schedule throughout the network. The proposed methods in [55] and [57] rely on feedback messages, which significantly increase the network overhead. In [59], a layered approach is proposed that uses the TCP advertised window and delayed ACK to control the rates of different flows. The algorithm in [59] only works in scenarios where all flows are destined to the gateway. In [60], the authors proposed a mechanism that prioritizes TCP ACK packets on the reverse path to enhance TCP fairness. TCP BIC (Binary Increase Congestion control) is an end-to-end layered approach that uses a combination of additive linear increase and binary search of the congestion window to ensure fairness among the flows [64]. When the congestion window is small, TCP BIC uses the binary search algorithm to find the correct increase to the congestion window so that the flow utilizes its network share. TCP CUBIC is an enhanced version of TCP BIC that uses the cube of the elapsed time since the last window reduction to control the congestion window size [65]. The TCP CUBIC approach makes the congestion control window independent of the Round Trip Time (RTT); therefore, TCP CUBIC increases fairness in favor of flows with longer RTTs. TCP Veno, another instance of layered solutions to TCP [66], gained a lot of attention in recent years due to its better performance over wireless settings. Veno is
basically an integration of the congestion detection idea of TCP Vegas into TCP Reno, and it does not significantly contribute to fairness. TCP Jersey [70] is another flavor of TCP that focuses on distinguishing the random losses of the wireless environment from congestion losses. Among the end-to-end solutions to enhance TCP performance, only a few use machine learning as an interactive tool to observe network dynamics and change TCP parameters based on a prediction of the network behavior. Sprout is a good example of an interactive transport layer protocol [67]. Sprout uses the arrival times of packets to predict how many bytes the sender can successfully transmit without creating congestion in the network while maximizing network utilization. The main disadvantage of Sprout is that it needs to run over CoDel to outperform other TCP variants; thus, it requires changes to the infrastructure to enable active queue management. TCP ex Machina, also known as Remy, is another end-to-end interactive solution that uses machine learning to create a congestion control algorithm, which controls the packet transmission rate based on the network model perceived by the learning algorithm [68]. The main disadvantage of TCP ex Machina is its resource-intensive nature, which results in lengthy learning times: generating a congestion control algorithm tailored to a specific network can take a single Linux machine an impractically long time.

Fairness of CMT-SCTP

Although TCP fairness over both wired and wireless environments has been extensively studied, SCTP fairness, and specifically CMT-SCTP fairness, over wireless environments is a less explored field. One of the initial studies on CMT-SCTP performance over wireless multi-hop networks [71] focused on the throughput of CMT-SCTP over a multi-hop network in QualNet without considering fairness issues. In another study [19], Ilknur Aydin et al.
investigated CMT-SCTP fairness against TCP and SCTP in a wired setting and concluded that CMT-SCTP is indeed fair in wired bottleneck scenarios. However, the superior loss-recovery mechanism of CMT-SCTP results in better bandwidth utilization as compared to TCP. The fairness study in [19] did not include any wireless scenarios. In a previous study that focused on CMT-SCTP fairness alone [72], it was shown that CMT-SCTP is not fair to non-CMT flows when it handles the congestion control of each path independently. Consequently, the authors of [72] proposed Resource Pool CMT (CMT/RP-SCTP), which integrates resource pooling into the congestion control mechanism of CMT-SCTP to improve fairness towards non-CMT flows. Based on [72], a CMT/RP-SCTP-based transport layer has three main objectives:

• CMT/RP-SCTP should have a throughput gain over SCTP and non-CMT flows, as it uses extra paths for concurrent data transfer.

• CMT/RP-SCTP must be fair to non-CMT flows over bottlenecks.

• CMT/RP-SCTP must apply congestion control as a whole to balance the load on all the paths.

Although the above goals are satisfied by CMT/RP-SCTP over wired settings, not a single study reports on the use of CMT/RP-SCTP over a multi-hop wireless environment. Following [72], the same group of authors proposed a change in the congestion control mechanism of CMT-SCTP to balance the congestion on each path while keeping it fair to single-path transport protocols. The proposal in [18] increases the congestion window size of each path proportionally to the average RTT of all paths. While the fairness mechanism in [18] makes the multi-path transport layer fair to single-path transport layers, it does not offer Quality of Service (QoS) to customers paying for higher bandwidths, and thus the proposal is not financially attractive to service providers. In [73], the authors proposed a TCP-friendly congestion control mechanism.
The method in [73] applies a Vegas-like congestion control mechanism based on the RTT variations of the paths to balance the load of each path within the CMT-SCTP association. The algorithm in [73] is evaluated in ns-2 using a very abstract model; a more detailed investigation is required to study the fairness of the proposal towards competing non-CMT flows within the network.

To the best of our knowledge, a major portion of the available literature on CMT-SCTP focuses on improving the performance of CMT-SCTP as compared to SCTP [74], [75], [8], [76], [77], [78], [79]. In publications such as [80] and [81], the performance of CMT-SCTP over wireless networks was investigated without any fairness evaluations; the methods proposed there reduce packet loss and improve the quality of service of multimedia without any fairness considerations.

The importance of fairness of CMT-SCTP towards other non-CMT flows over wireless multi-hop networks cannot be overemphasized. In recognition of this importance, and due to the lack of extensive literature on this topic, Chapter 4 investigates the fairness of CMT-SCTP over a wireless multi-hop testbed and proposes a dynamic algorithm to improve the fairness of CMT-SCTP towards non-CMT flows. Our proposal is an attempt to integrate fairness into the current CMT-SCTP design to create a more coherent network body.

In Chapter 5, we adapted our distributed fairness mechanism to TCP. In our approach, we deployed Q-learning, a reinforcement learning mechanism, to monitor the dynamics of the network. The Q-learning agent creates a state map of the network based on MAC parameters and takes actions to enhance TCP fairness and the throughput of the starved flows in the network. The proposal creates a distributed cooperative mechanism in which each node hosting a TCP source uses a local fairness index and an aggressiveness index of the flow to adjust its TCP parameters.
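The per-node decision logic just described can be sketched in a few lines. The snippet below is only an illustrative sketch, not the thesis implementation: it assumes Jain's index as the local fairness measure and replaces the learned Q-learning policy with a fixed, made-up threshold rule.

```python
def jains_index(throughputs):
    """Jain's fairness index: 1.0 means perfectly fair; values near
    1/n mean one flow monopolizes the medium."""
    n = len(throughputs)
    s = sum(throughputs)
    sq = sum(x * x for x in throughputs)
    return (s * s) / (n * sq) if sq else 1.0

def adjust_cwnd_limit(own_rate, observed_rates, cwnd_limit,
                      step=2, floor=4):
    """Shrink the local TCP maximum congestion window when this
    node's flow is aggressive (high share, low local fairness),
    grow it back otherwise. The 0.8 threshold is illustrative."""
    fairness = jains_index(observed_rates + [own_rate])
    aggressive = own_rate > max(observed_rates, default=0)
    if fairness < 0.8 and aggressive:
        return max(floor, cwnd_limit - step)
    return cwnd_limit + step

# A node sending at 10 Mbps while overheard neighbours get 1-2 Mbps
# backs its window limit off from 20 to 18.
print(adjust_cwnd_limit(10.0, [1.0, 2.0], cwnd_limit=20))  # 18
```

The design point this sketch captures is that each node acts only on locally observable rates, so no feedback messages or central controller are needed.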
1.3 Summary of Results and Contributions

This thesis contains new proposals to accommodate the intrinsic characteristics of the wireless multi-hop network within CMT-SCTP in terms of fairness and performance. The contributions of the chapters are:

• In Chapter 3, we address the issue of receive buffer blocking in CMT-SCTP. We propose a network coding scheme for CMT-SCTP that creates cooperation among the multiple sub-flows of a CMT-SCTP association to eliminate the receive buffer blocking problem. We use Random Linear Network Coding (RLC) to create cooperation among the sub-flows of a CMT-SCTP association. The cooperation among the sub-flows evenly distributes the loss rate probability over all sub-flows; thus, the discrepancy among the sub-flows diminishes. Our contributions are as follows:

– We integrated network coding into CMT-SCTP (called coded CMT-SCTP).

– We proposed an adaptive network coding scheme to control the number of redundant packets depending on the network dynamics.

– We proved that our scheme increases the frequency of receive buffer unloading using a queuing theory approach.

– Our coded CMT-SCTP scheme outperforms the original CMT-SCTP by 22%-62% depending on the path dissimilarities and receive buffer size.

• In Chapter 4, we investigated the effect of utilizing more than one path on the fairness of CMT in competing with other SCTP or TCP flows coming from farther away sources. The results of our investigation motivated us to design a dynamic mechanism using reinforcement learning to alleviate the aggressive behavior of CMT-SCTP by fine-tuning CMT-SCTP parameters. Note that our investigation is focused
on multi-hop networks with minimal to zero mobility, such as wireless mesh networks. The result of this study is one step forward in the practical exploitation of CMT-SCTP in multi-hop home or office area networks, or even in larger-scale wireless networks. The contributions are as follows:

– We evaluated the fairness of CMT-SCTP towards non-CMT flows coming from farther away hops in a real multi-hop wireless environment.

– We used our findings to develop a dynamic distributed fairness mechanism to alleviate the aggressive behavior of CMT-SCTP towards non-CMT flows coming from farther away hops.

– We compared the results of our findings to standard CMT-SCTP and CMT/RP-SCTP. Our proposal outperforms the standard CMT-SCTP with a high margin.

• In Chapter 5, we adapted the idea of the distributed fairness mechanism to TCP. In our proposed algorithm, each node uses a reinforcement learning approach to model the dynamics of the network and fine-tune TCP parameters based on the perceived characteristics of the environment. Our algorithm preserves the autonomy of each node in the decision-making process and does not require a central control mechanism or control message exchange among nodes. Unlike the existing machine learning solutions, our proposal is compatible with the computational capacity of the current infrastructure. We have named our approach Q-learning TCP. The contributions of Chapter 5 can be summarized as:

– We proposed a dynamic distributed fairness mechanism for TCP over wireless multi-hop networks that does not require any feedback messages or a central control.
– We evaluated our proposal against standard TCP, TCP ex Machina, and other TCP flavors over a real testbed.

– Our proposal enhances TCP fairness by 10% to 20% without any feedback messaging or any extra overhead to the medium.

1.4 Thesis Organization

The present thesis is organized as follows. In Chapter 2, we present a brief summary of the tools we used to design our proposed algorithms, along with the details of our testbed setup. In Chapter 3, we propose coded CMT-SCTP to address receive buffer blocking in CMT-SCTP over a multi-hop environment; the proposed mechanism is evaluated against standard CMT-SCTP and proved to be effective. In Chapter 4, we propose our distributed fairness mechanism for CMT-SCTP over multi-hop wireless networks and evaluate it against other available fairness mechanisms for CMT-SCTP over a multi-hop wireless setting. We use our findings from Chapter 4 to enhance TCP fairness over multi-hop wireless networks in Chapter 5. The mechanisms proposed in Chapters 4 and 5 use Q-learning to monitor the dynamics of the network and fine-tune the parameters of the transport layer to enhance fairness towards flows coming from farther away hops. We used real testbeds to evaluate the performance of our proposals in both Chapters 4 and 5. Finally, the thesis is concluded in Chapter 6, where directions for future work are discussed. The findings of each chapter have been published in a journal paper.

Chapter 2
Preliminaries

In this chapter, we go over the tools we used to address the issues of the transport layer protocol over multi-hop wireless networks. In addition, we explain the details of our testbed setup. The testbed is used with different numbers of nodes and different topologies; the topology used in each chapter is described in that chapter's testbed section.

Section 2.1 provides a brief summary of network coding. Sections 2.2 and 2.2.3 present an introduction to the Markov Decision Process (MDP) and Q-learning.
We go over our testbed setup in Section 2.3.

2.1 Network Coding

We use network coding in Chapter 3 to address receive buffer blocking in CMT-SCTP. But why is network coding an effective solution for addressing receive buffer blocking?

2.1.1 Why network coding?

SCTP uses the transmission sequence number of the packets as a means to guarantee the reliable delivery of the data to the receiver. In case of a packet loss, the packets with higher transmission sequence numbers are stored in the receive buffer until the arrival of the missed packet. The conventional transport protocols are designed to recover from congestion losses by reducing congestion, and are inefficient when random losses are encountered. Network coding is one way to introduce redundancy in the transmitted data stream so that the receiver need not wait for the retransmission of a dropped packet if it can be recovered from coded versions of the packet.

2.1.2 What is network coding?

Network coding is a simple but effective method in communication networks in which packets are combined together at the routers instead of simply being forwarded. The coded packets are processed at the receiver to recover the original packets. Coded packets fall into two categories: redundant and innovative. An innovative packet is a coded combination that is linearly independent of the previously received coded packets, while a redundant packet is a coded combination that is linearly dependent on them. The idea was first demonstrated by Ahlswede et al. [82] in a butterfly network. Following Ahlswede's work, extensive literature on practical and theoretical analyses of network coding demonstrates that network coding results in substantial improvements in both the throughput and the reliability of a network [83][84][28][85][86][87].

The pioneering work of Sundararajan et al. [86][87] on adding a network coding layer between the IP and TCP layers demonstrates that RLC can be adopted at higher layers to enhance throughput. Sundararajan et al.
used the concept of degrees of freedom at the receiver side to decide whether an ACK should be sent out. If a recently arrived coded packet carries a new linear combination of the original packets (an innovative packet), the receiver increases the number of packets seen so far and sends an ACK confirming a new packet arrival. The number of seen packets represents the number of independent linear equations received so far. Every time a new packet is seen at the receiver, the decoding process is triggered and an original packet is recovered with high probability. As such, network coding does not disturb the timing of the TCP acknowledgment mechanism. One drawback of the Sundararajan et al. approach is that the method has to be tailored to the byte-oriented nature of TCP and hence requires extra packet pre-processing; the message-oriented nature of SCTP eliminates the need for such pre-processing. In addition, the coding algorithm proposed in [86] is not attuned to a dynamic environment. Instead, it assumes a constant packet loss rate and transmits redundant packets just to cover the pre-calculated loss regardless of any changes in the environment, which makes the proposed method less suitable for dynamic and unreliable environments such as multi-hop wireless networks.

2.1.3 How to estimate the number of redundant packets?

To recover all the original packets involved in the linear network coding process, we need enough linear combinations at the receiver. Thus, we need to add enough redundant packets to mask the random losses of the network. When using network coding in a dynamic environment such as a multi-hop wireless network, the packet loss rate is unknown. Therefore, we need a way to estimate the number of redundant packets required to mask the random losses of the network.
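Classifying an arriving coded packet as innovative or redundant reduces to an incremental rank test on its coding coefficient vector. A minimal sketch, assuming a toy code over GF(2) with coefficient vectors stored as integer bitmasks (real RLC implementations typically work over GF(2^8)):

```python
class DecoderState:
    """Tracks received coefficient vectors over GF(2).

    Each vector is an int bitmask: bit i set means original packet
    i participates in the coded combination.
    """
    def __init__(self):
        self.basis = {}  # pivot bit position -> reduced vector

    def receive(self, coeff_vec):
        """Return True if the packet is innovative (a new packet is
        'seen', so an ACK goes out) or False if it is redundant
        (linearly dependent on earlier arrivals)."""
        v = coeff_vec
        while v:
            pivot = v.bit_length() - 1
            if pivot not in self.basis:
                self.basis[pivot] = v  # rank grows by one
                return True
            v ^= self.basis[pivot]  # eliminate the leading term
        return False

state = DecoderState()
print(state.receive(0b011))  # p0+p1: innovative -> True
print(state.receive(0b110))  # p1+p2: innovative -> True
print(state.receive(0b101))  # p0+p2 = sum of the two above -> False
```

The number of stored basis rows is exactly the "seen packets" count from Sundararajan's scheme: it grows by one per innovative arrival and never for a redundant one.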
We need a sequential decision-making process to decide when to send an innovative and when to send a redundant packet based on the dynamics of the network. The Markov Decision Process (MDP) is a powerful technique for sequential decision-making problems in dynamic environments. We elaborate on MDPs in Section 2.2.

In Chapter 3, we adapt Sundararajan's degrees-of-freedom approach to CMT-SCTP to eliminate the reliance of the congestion control algorithm on the packet's transmission sequence number. First, our approach creates cooperation among the multiple sub-flows of data within a CMT-SCTP association by combining packets from different sub-flows together; this way, the different characteristics of the paths are less disruptive to packet processing at the receiver side. Second, the machine learning aspect of our scheme addresses the inflexible behavior of [86] by monitoring the ever-changing nature of the underlying network and adjusting the number of redundant packets accordingly.

2.2 Markov Decision Process

In Chapters 3, 4, and 5, we use MDPs [88] to model the multi-hop wireless network.

2.2.1 What is MDP?

Each MDP has a finite number of states, and a limited number of actions is available to the learner module in each state. The learner module monitors the changes of the environment via the feedback of a reward function. The transition probability matrix of the MDP is not known at the beginning of the learning process; one popular way to learn the transition probabilities of the states of an MDP is Q-learning [89].

The MDP is a powerful tool to model a multi-hop wireless network. These networks have a stochastic environment that changes frequently; each node within the network is a decision maker, or agent, that interacts with the environment by taking actions. The actions can be complex, such as deciding whether to send a redundant or an innovative packet, or as simple as changing the maximum congestion window size of TCP or CMT-SCTP.
The objective of the network can be translated into a reward function that gives feedback to the node. The environment can then be modeled as an MDP to optimize a network objective such as fairness, network utilization, or any other goal. Moreover, MDPs are very versatile and can be defined with different parameters and states to serve the design requirements of each network. For instance, in Chapter 3 we use an MDP to optimize the network coding process, while in Chapters 4 and 5 we use MDPs to optimize the fairness aspect of the network, with a unique state definition in each case.

2.2.2 Why is MDP a good fit for the multi-hop wireless environment?

In a dynamic environment such as a multi-hop wireless network, static decision commands can lead to inefficient resource allocation; thus, the dynamic set of policies produced by an MDP is a logical choice. Moreover, an MDP can support a balanced utility function with different objectives.

When the transition probabilities of the MDP are not known, a reinforcement learning mechanism such as Q-learning is used to infer the transition probabilities via learning. More details on Q-learning are presented in Section 2.2.3.

2.2.3 Q-learning

Q-learning is a class of Reinforcement Learning (RL) mechanisms first introduced by Watkins et al. [89]. RL algorithms model the world as a Markov Decision Process (MDP) and are based on the basic idea of learning via interaction. The learner module is called an agent, and the interactive object which is the subject of learning is the environment. During the learning process, time is divided into decision epochs. Each Q-learning agent has four elements: a set of states, a set of actions, a state-action dependent reward function, and a state-action dependent transition probability matrix.

The learning agent uses the reward function to receive feedback on the consequences of taking an action. The interaction between the agent and the environment helps the agent construct a mapping of the possible state-action pairs.
This mapping is called the policy and shows how the agent changes its decisions based on the different responses from the environment. As time passes, the mapping gets more inclusive; the agent learns almost all the possible (state, action) pairs and their associated reward/penalty values, and can cope with any changes in the environment. The memory of the agent is kept in a matrix called Q. The rows of Q represent the current state of the system and the columns represent the possible actions leading to the next state. At the beginning of the learning process, the learning agent knows neither the transition probabilities of the MDP nor the reward associated with each transition. After each decision epoch, Q is updated as in equation (2.1):

Q_{t+1}(s_t, a_t) = (1 − α) Q_t(s_t, a_t) + α [ r(s_t, a_t) + γ max_a Q_t(s_{t+1}, a) ]    (2.1)

where the first term is the old value, r(s_t, a_t) is the reward observed after performing a_t in s_t, and γ max_a Q_t(s_{t+1}, a) is the estimated future value. Equation (2.1) is called the Q-learning or Ballard rule [88]. The objective of the Q-learning rule is to inform the agent of the possible future rewards along with the immediate reward of the latest action. α is the learning rate of the agent and determines the importance of the newly acquired information; it varies between 0 and 1. A factor of 0 causes the agent not to learn from the latest (state, action) pair at all, while a factor of 1 makes the agent discard its previously learned value and rely only on the most recent estimate. γ is the discount factor and determines the importance of future rewards; it also varies between 0 and 1. A discount factor of 0 makes the agent consider only immediate rewards, while a factor of 1 makes the agent weigh future rewards as strongly as immediate ones. The goal of the learning agent is to maximize its total reward by taking the suitable action in each state.

Choosing α and γ has a direct effect on the convergence of the Q-learning mechanism [88].
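One decision epoch of update rule (2.1) is a one-line computation once Q is stored as a table; the states, actions, and numbers below are illustrative only, not the state definitions used in later chapters.

```python
def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """One application of equation (2.1) to the table Q, stored as
    a dict mapping (state, action) -> value."""
    # Estimated future value: max over the actions available in s_next.
    future = max((Q.get((s_next, b), 0.0) for b in ACTIONS),
                 default=0.0)
    old = Q.get((s, a), 0.0)
    Q[(s, a)] = (1 - alpha) * old + alpha * (reward + gamma * future)
    return Q[(s, a)]

ACTIONS = ["send_innovative", "send_redundant"]
Q = {("lossy", "send_redundant"): 1.0}
# Taking "send_redundant" in state "lossy", observing reward 2.0 and
# landing back in "lossy":
print(q_update(Q, "lossy", "send_redundant", 2.0, "lossy"))  # 1.95
```

With alpha = 0.5 and gamma = 0.9, the new entry blends the old value 1.0 with the learned value 2.0 + 0.9 * 1.0 = 2.9, giving 1.95, which matches equation (2.1) term by term.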
Based on the recommendations of [89][90], choosing γ = 0.9 and α = 1/(1+t)^ω can decrease the convergence time to a polynomial degree. The authors of [90] showed that ω = 0.77 can be an optimal value for the learning rate exponent.

The reward function plays a key role in the Q-learning process [90][91][92]. The learning agent uses the reward function to determine the consequence of the latest action, and the function encourages the learning agent to stay in, or approach, the goal state. The reward function serves as a gradient function for the learning agent: the correct direction to maximize the learning rule is conveyed through immediate rewards or penalties after each action. Choosing a Gaussian reward function complies with the suggestions of [93] for a good reward function. The Gaussian function is almost uniform in states far from the goal state, and has an increasing gradient in a belt around the goal state. As such, the Gaussian reward function can provide efficient direction to the agent on how to approach the goal state from any other state.

2.3 Testbed Setup

To evaluate the performance of our proposals, we set up a testbed in an actual office environment (Figure 2.1). We used Raspberry Pis as the wireless nodes in our testbed. The Raspberry Pi [94] is a tiny, affordable computer that can host multiple wired/wireless interfaces. In our testbed, A and B are the client (data sender) nodes, D, E, and F are data forwarder/router nodes, and G is the server (data receiver) node. We used FreeBSD [95] on the client and server nodes and Debian 7 [96], another Unix-like operating system, on the nodes acting as routers. While FreeBSD provides a very extensive SCTP suite, including the CMT feature, it only supports Wi-Fi adapters in client mode, not in host or forwarder mode. As such, we used Debian 7 for the nodes in the middle that have to play a role in forwarding/routing packets.
We used TP-LINK nano USB adapters as the 802.11g Wi-Fi interfaces [97], as they are cheap and work well with FreeBSD and Debian 7.

In terms of interference, note that there are additional sources of noise from the 40 laptop users, 5 Cisco routers, and other wireless devices (such as the cell phones of the staff) present in the office area where we set up our testbed. The data collections took place at different hours of the day to make sure that all possible outcomes are covered in our data set. We used Wireshark [98] for data collection in our measurements.

2.3.1 CMT-SCTP testbed setup

To evaluate the performance of our CMT-SCTP related proposals, we set up the testbed with the following characteristics. One wireless interface of nodes B and C communicates with node D, which forwards data to one wireless interface of node G. The other wireless interface of nodes B and C communicates with node F, which forwards data to node E (using static routing). In turn, node E forwards the data from node F to the other wireless interface of node G. Node A can communicate with either node D or node B depending on the testing scenario (node D for the evaluations of Chapter 3 and node B for the evaluations of Chapter 4), which forwards the data to one wireless interface of node G.

[Figure 2.1: Testbed setup for CMT-SCTP evaluations. Raspberry Pi nodes A-G with wireless interfaces on channels 1, 6, and 11, and wired interfaces for static routing.]

As depicted in Figure 2.1, nodes A, D, E, and F have only one wireless interface, while nodes B, C, and G have two wireless interfaces. In addition, nodes D, E, and F have wired interfaces to allow static routing; note that D has two wired interfaces. Each wireless interface is set up on a different channel (see Figure 2.2) to guarantee an independent data path for the sub-flows.
The only wireless interface of node A, one wireless interface of nodes B and C (forwarding data to node D), the only wireless interface of node D, and one wireless interface of node G (receiving the data forwarded by the wireless interface of node D) are tuned to channel 1. The other wireless interface of nodes B and C (forwarding data to node F) and the only wireless interface of node F are tuned to channel 6. The only wireless interface of node E and the other wireless interface of node G (receiving the data forwarded from the wireless interface of node E) are tuned to channel 11.

[Figure 2.2: IEEE 802.11 channels. Channels 1 to 11 and their central frequencies in the 2.4 GHz band.]

2.3.2 TCP testbed setup

To evaluate the performance of our proposal in a real-world setting, we set up the topology as follows. The blue nodes in Figure 2.3 are the employees with their laptops (MacBook Pro or MacBook Air), which connect to the Internet via router R1 (extender) or R2. Routers R1 and R2 connect to the Internet through the gateway (purple node). We added node B, which is both a source and a forwarder node. Node B can forward to R1 and R2; moreover, we set up a static routing table inside node B that forwards packets from node A to node C without forwarding them to R1 or R2.

[Figure 2.3: Testbed setup for TCP evaluations. Gateway, routers R1 and R2, wireless nodes, and forwarder/wireless nodes A, B, and C.]

Chapter 3
Network Coding As a Performance Booster for Concurrent Multi-path Transfer of Data In Multi-hop Wireless Networks

3.1 Introduction

In this chapter, we propose a network coding scheme for CMT-SCTP that creates cooperation among the multiple sub-flows of a CMT-SCTP association to eliminate the receive buffer blocking problem. The main cause of receive buffer blocking in reliable transport layer protocols is the dependency of the congestion control mechanism on the transmission sequence numbers of packets.
The proposed scheme eliminates the reliance of the transport layer protocol on the TSN of packets by using RLC. RLC is a technique where all participating nodes send out random linear combinations of the packets in hand to utilize the lossy path capacity [29][28]. RLC uses redundant packets to mask the random losses of the path and to guarantee that the receiver is capable of recovering the original packets. Our proposed network coding scheme uses machine learning to control the number of redundant packets. In particular, we use a Q-learning mechanism [89] (see Section 2.2.3) along with a classifier inside the Q-learning mechanism to determine when the sender has to transmit redundant packets. Our contributions are as follows:

• We integrated network coding into the multi-path transport protocol CMT-SCTP (called coded CMT-SCTP).
• We proposed an adaptive network coding scheme based on a Q-learning algorithm [89] to control the number of redundant packets depending on the network dynamics.
• Using a queuing theory approach, we proved that our coded CMT-SCTP scheme reduces the possibility of receive buffer blocking.
• Our coded CMT-SCTP scheme outperforms the standard CMT-SCTP by a high margin (up to 62%, depending on the path dissimilarities and receive buffer size).
• We also set up a real testbed and confirmed our theoretical and simulation results.

This chapter is organized as follows. Section 3.2 presents the details of our coded CMT-SCTP scheme. Our claim of eliminating receive buffer blocking using coded CMT-SCTP is proved using queuing theory in Section 3.3. The accuracy of the machine learning mechanism is evaluated on a testbed in Section 3.4. Sections 3.5 and 3.6 discuss the simulation and testbed results on the performance of coded CMT-SCTP, respectively. Section 3.7 discusses how to adapt coded CMT-SCTP to more general network settings rather than just multi-hop wireless networks.
Section 3.8 concludes this chapter.

3.2 Coded CMT-SCTP

Coded CMT-SCTP is designed as an extension to the multi-path transport protocol CMT-SCTP [6]. Coded CMT-SCTP has a sender side and a receiver side. Most of the changes to the standard CMT-SCTP scheme happen at the sender side. Coded CMT-SCTP at the sender side consists of three parts: the CMT-SCTP module, a coding module, and a group of learning agents, one for each sub-flow within the CMT-SCTP association. The coding module manages the network coding process in coordination with the learning agents on each sub-flow. The learning agent is designed to monitor and track the dynamics of the path on each sub-flow to ensure the recovery of the coded packets at the receiver side.

One of the main challenges for the coding module in managing the coding process over an unreliable environment such as a multi-hop wireless setting is the recovery of the original packets at the receiver despite the random losses of the network. Using redundant packets to cover the random losses of the network and to assure the recovery of all the packets involved in the coding at the receiver side is one of the known techniques in network coding. However, one of the main issues of the available network coding proposals is that they simply assume a prior knowledge of the path loss rate and inject redundant packets based on the assumed loss rate [86][87].
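To make the redundancy idea concrete, the toy sketch below illustrates random linear coding over a small prime field: four original packets are combined into five coded packets (one redundant), one coded packet is dropped to simulate a random loss on the path, and the originals are still recovered by Gauss-Jordan elimination. This is an illustrative toy (field GF(257), tiny two-symbol packets), not the thesis implementation.

```python
import random

P = 257  # prime field size for the toy example

def encode(packets, n_coded, rng):
    """Return n_coded random linear combinations of the original packets."""
    coded = []
    for _ in range(n_coded):
        coeffs = [rng.randrange(1, P) for _ in packets]
        payload = [sum(c * pkt[j] for c, pkt in zip(coeffs, packets)) % P
                   for j in range(len(packets[0]))]
        coded.append((coeffs, payload))
    return coded

def decode(coded, k):
    """Try to recover k original packets via Gauss-Jordan elimination mod P.
    Returns None if the received equations are not full rank."""
    rows = [list(c) + list(p) for c, p in coded]
    for col in range(k):
        pivot = next((r for r in range(col, len(rows)) if rows[r][col]), None)
        if pivot is None:
            return None
        rows[col], rows[pivot] = rows[pivot], rows[col]
        inv = pow(rows[col][col], P - 2, P)  # Fermat inverse in GF(P)
        rows[col] = [x * inv % P for x in rows[col]]
        for r in range(len(rows)):
            if r != col and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % P for a, b in zip(rows[r], rows[col])]
    return [row[k:] for row in rows[:k]]

rng = random.Random(7)
originals = [[10, 20], [30, 40], [50, 60], [70, 80]]  # 4 tiny "packets"
recovered = None
while recovered is None:              # a singular random draw is possible but rare
    coded = encode(originals, n_coded=5, rng=rng)  # 1 redundant packet
    coded.pop(2)                      # simulate one random loss on the path
    recovered = decode(coded, k=4)
print(recovered == originals)         # True: the loss was masked
```

With one redundant combination, any single loss among the five coded packets still leaves four (almost surely independent) equations, so the receiver can decode; the question the Q-learning agent answers is how many such redundant packets to send without knowing the loss rate in advance.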
Our proposal addresses this issue by using a Q-learning agent to estimate the number of redundant packets needed to mask the random losses of the environment. We go over the details of coded CMT-SCTP at both the sender and receiver sides in Sections 3.2.1 and 3.2.2, respectively.

3.2.1 Sender side

The sender side of coded CMT-SCTP is composed of the standard CMT-SCTP suite, the coding module, and a group of Q-learning agents (one per sub-flow of the association).

Coding module

The coding module is responsible for generating the coded packets from the original packets and for keeping track of the original packets within the sender buffer. The coding module creates a coded packet pk in the form of pk = Σ_i α_i p_i, where α_i is a randomly picked coefficient and p_i is the i-th original packet in the send buffer. The coded packet can be either redundant or innovative. The redundant packets carry linear equations with the same unknown set as the previous coded packet but with different random coefficients. When a coded packet is not redundant, i.e., it carries a different set of unknowns compared to the previous coded packet, it is called an innovative packet. The details of our coding algorithm at the sender side are presented in Algorithm 3.1. In Algorithm 3.1, destination_i is the destination address bound with sub-flow_i, CWND_i is the congestion window of sub-flow_i, and rBuff is the receive buffer size for the entire association.

Algorithm 3.1 Coding Module at Sender Side
1: numSacked = 0
2: for each sub-flow_i do
3:   numSacked_i = 0, numInnovative_i = 0
4:   numRedundant_i = 0, numSent_i = 0
5: end for
6: if packet from application layer then
7:   if destination_i is the next destination then
8:     while CWND_i has room do
9:       if rBuff has room then
10:        numSent_i = numInnovative_i + numRedundant_i
11:        action = Qlearn_i(numSeen, numSeen_i, numInnovative_i, CWND_i, numRedundant_i, RTT_i, numSent_i)
12:        if action is send an innovative packet then
13:          numInnovative_i++
14:          Create an innovative packet p = Σ α_i p_i
15:        else
16:          numRedundant_i++
17:          Create a redundant packet p = Σ α_i p_i
18:        end if
19:        Add packet p to send buffer
20:        Send packet p to IP layer
21:      end if
22:    end while
23:  end if
24: else if SACK arrives from the receiver then
25:   Update numSacked as numSeen
26:   Update each numSacked_i as numSeen_i
27:   Dequeue send buffer up to numSacked
28:   if duplicate SACK arrives then
29:     Retransmit the missing packets
30:   end if
31: end if

Q-learning agent

Each sub-flow within the CMT-SCTP association is equipped with a stand-alone Q-learning agent that monitors path features such as the RTT, the number of seen packets so far, the number of sent packets so far, and the congestion window size of the path. The learning agent considers the network as an MDP with two states: decodable and non-decodable (Figure 3.1).

Figure 3.1: Markov Decision Process of coded CMT-SCTP (states: decodable and non-decodable; transitions labeled innovative/redundant with probabilities P1-P6).
When the transition probabilities P1, P2, ..., P6 are unknown, Q-learning is used to learn the reward on each state-action pair.

In the decodable state, there are enough coded packets (independent linear equations) for the receiver to recover the original packets, and the receiver is able to decode the coded packets. In the non-decodable state, due to the random losses of the medium, there are not enough coded packets (independent linear equations) for the receiver to recover the original packets. The proposed model is an MDP because (i) the transition probabilities only depend on the previous state of the system and (ii) the Markovian rule holds for the above model. We use learning agents in our coding algorithm to control the number of redundant packets per sub-flow and to make sure that the receiver is able to decode the packets and recover all the original packets sent. Moreover, the Q-learning agent uses a classifier to determine the state of the system as either decodable or non-decodable. Each Q-learning agent uses a logistic regression classifier [99] to determine the state of the system. Then, the state of the system, along with the Q-learning rule in equation (2.1), determines the action to be taken. The Q-learning agent takes actions by informing the coding module to send either a redundant packet or an innovative one.

When the sender plans to transmit a packet, it makes an inquiry to the Q-learning agent to determine whether a redundant or an innovative packet has to be sent. Upon arrival of an inquiry, the Q-learning agent determines the state of the network based on the observed variables from the network using the logistic regression classifier. The set of observed variables includes the number of seen packets, the number of sent packets so far, the round trip time, the congestion window size, the number of innovative and redundant packets per sub-flow, and the number of seen packets for the whole association.
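The per-sub-flow agent described above can be sketched as a tiny tabular Q-learner over the two classifier-labelled states. The classifier is stubbed out here (the state name is passed in directly); the state and action names and the α, γ schedules follow the values quoted in this chapter, while the Gaussian width sigma is an assumed illustration parameter. A minimal sketch, not the thesis code.

```python
import math

STATES = ("decodable", "non-decodable")
ACTIONS = ("innovative", "redundant")

class SubflowAgent:
    """Tabular Q-learning agent for one CMT-SCTP sub-flow (toy sketch)."""

    def __init__(self, gamma=0.9, sigma=2.0):
        self.q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
        self.gamma = gamma
        self.sigma = sigma  # width of the Gaussian reward (assumed value)
        self.t = 0

    def reward(self, ds):
        # Gaussian reward on ds = innovative - seen (equation 3.1):
        # largest when the gap is closed, decaying as the gap grows.
        return math.exp(-ds * ds / (2 * self.sigma ** 2))

    def choose(self, state):
        # Greedy action from the current Q-table.
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, s, a, ds, s_next):
        # Decaying learning rate alpha = 1 / (1 + t)^0.77, gamma = 0.9.
        alpha = 1.0 / (1 + self.t) ** 0.77
        self.t += 1
        target = self.reward(ds) + self.gamma * max(
            self.q[(s_next, b)] for b in ACTIONS)
        self.q[(s, a)] = (1 - alpha) * self.q[(s, a)] + alpha * target

agent = SubflowAgent()
# One decision epoch: the classifier labelled the state "non-decodable",
# the agent sent a redundant packet, the innovative-vs-seen gap ds shrank
# to 1, and the new classified state is "decodable".
agent.update("non-decodable", "redundant", ds=1, s_next="decodable")
print(agent.choose("non-decodable"))  # "redundant"
```

After the rewarded transition, the greedy policy in the non-decodable state prefers sending redundancy, which is exactly the behavior the coding module needs when decodability is at risk.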
The observed variables are accessible by SCTP and the learning agent to classify the network into either the decodable or the non-decodable state. The learning agent determines the next action, that is, creating an innovative packet or a redundant packet (Figure 3.1), based on the Q-learning rule of equation (2.1) and informs the coding module. Accordingly, the coding module creates a (redundant or innovative) coded packet by adding coding coefficients to the original packets involved in the coding and then sends the coded packet to the IP layer. To guarantee the orderly arrival of the packets at the receiver, we buffer the packets at the sender in First-In First-Out (FIFO) fashion prior to the coding process. When the sender plans to transmit another packet, the Q-learning agent determines the new state of the system, calculates the reward of the last action, and updates the Q-matrix.

We define the reward function as a Gaussian with the parameter ds in such a way as to follow the difference between the number of innovative packets and the number of seen packets in each state and to reward/penalize the agent for any changes in the gap size, as in equation (3.1). This way, the agent gets rewarded or penalized as the difference in equation (3.1) gets smaller or bigger, respectively.

ds = number of innovative packets − number of seen packets     (3.1)

Since the learning agent is located at the sender module in the transport layer, it can stay alive even after the SCTP association is closed. The Q-matrix of the learning agent works as a form of memory; therefore, if a new association starts, the memory of the learning agent already estimates the number of redundant packets precisely. Algorithm 3.2 presents the learning agent process. In Algorithm 3.2, in line 11, s is the current state, a is the action taken in state s, and s′ is the state that the agent ends up in after taking action a. Qt(s′, a′) is the speculation on future rewards.
α and γ are the learning rate and discount factor, respectively. We choose α = 1/(1+t)^0.77 to increase the convergence rate to a satisfactory level according to [90]. We choose γ = 0.9 according to [90][89] to stay near the optimal zone while encouraging the agent to explore more states.

In Algorithm 3.2, the number of redundant or innovative packets is not determined by the actions; the action only tells CMT-SCTP to either "send a redundant packet" or "send an innovative packet". However, the classifier uses the number of redundant or innovative packets so far as a feature to determine the decodable or non-decodable state of the system. Hence, the classifier does not change the definition or features of any state. As an analogy, an agent might be used to guide a robot to a goal state using right or left commands as actions; the MDP might then use the number of right and left commands so far as a feature to determine whether the robot is moving towards the right state.

3.2.2 Receiver side

Coded CMT-SCTP at the receiver side consists of a decoding module and a slightly changed CMT-SCTP.
Algorithm 3.2 Learning Agent of sub-flow_i
{Initialization}
1: γ = 0.9, α = 1/(1+t)^0.77
2: Set matrix Q_i to zero
{Execution of the Learning Agent module}
3: if first decision epoch then
4:   Determine the state s, using the logistic regression classifier [99]
5:   Choose an action that maximizes Q_{t+1}(s′, a′)
6:   Notify CMT-SCTP of the new action, i.e., redundant or innovative
7: end if
8: for each decision epoch except the first one do
9:   Determine the state s, using the logistic regression classifier [99]
10:  Calculate the reward based on s and the action a
11:  Update via the Q-learning rule [89]:

       Q_{t+1}(s, a) = (1 − α) Q_t(s, a) + α [r(s, a) + γ max_{a′} Q_t(s′, a′)]     (3.2)

12:  Calculate all possible Q_{t+1}(s′, a′), where s′ is the next possible state and a′ ranges over the set of available actions in that state
13:  Choose an action that maximizes Q_{t+1}(s′, a′)
14:  Notify CMT-SCTP of the new action, i.e., redundant or innovative
15: end for

Decoding module

At the receiver side, upon arrival of a new packet from any of the paths, if the new linear equation carried by the packet is independent of the existing equations (that is, if an innovative packet arrives) and results in "see"-ing the next packet, the packet is buffered and the coding coefficients are retrieved from the packet header and added to the decoding matrix as a new row. The term "seen" was first introduced by Sundararajan et al. [87]. Based on Sundararajan's definition, packet p_i is seen when the receiver admits a packet with a linear equation in the form of p_i + Σ_j α_j p_j in which j > i. After adding each new row, a Gauss-Jordan elimination process is applied to the decoding matrix to retrieve any original packet and deliver the decoded packet to the upper layer. The decoding module then informs the CMT-SCTP protocol suite of the number of seen packets so far. Algorithm 3.3 summarizes the decoding module mechanism on the receiver side.
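Sundararajan's "seen" notion can be illustrated directly from the coefficient rows of the decoding matrix: after online row reduction, packet p_i is seen exactly when some received equation has its leading (pivot) coefficient in column i. The sketch below works over GF(2) for brevity and only illustrates the bookkeeping, not the thesis decoder.

```python
def seen_packets(equations, k):
    """Online GF(2) elimination: return the indices of 'seen' packets,
    i.e. the pivot columns of the received coefficient vectors."""
    pivots = {}  # leading column -> fully reduced row
    for row in (eq[:] for eq in equations):
        lead = next((j for j in range(k) if row[j]), None)
        while lead is not None and lead in pivots:
            # Cancel the known pivot; the leading index strictly increases.
            row = [a ^ b for a, b in zip(row, pivots[lead])]
            lead = next((j for j in range(k) if row[j]), None)
        if lead is not None:          # innovative equation: new pivot found
            pivots[lead] = row
    return sorted(pivots)

# Receiver holds equations p0+p1 and p1+p2 over packets p0..p3:
# p0 and p1 are seen (pivots in columns 0 and 1), p2 and p3 are not.
print(seen_packets([[1, 1, 0, 0], [0, 1, 1, 0]], k=4))  # [0, 1]
```

Note the match with the definition: p0 is seen because the receiver holds p0 plus a combination of later packets (p0+p1), and p1 is seen via p1+p2; a dependent equation reduces to all zeros and adds no pivot.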
Algorithm 3.3 Receiver Side
{Initialization}
1: numSeen = 0
2: for each sub-flow_i do
3:   numSeen_i = 0
4: end for
{Handling Events}
5: if data packet from sender then
6:   if innovative packet arrives then
7:     numSeen++
8:     numSeen_i++ {where i is the sub-flow the innovative packet arrives from}
9:     Add the packet to the receive buffer
10:    Retrieve the coding header
11:    Add the linear combination to the decoding matrix and perform Gauss-Jordan elimination
12:    Pass any newly decoded packet to the upper layer
13:    Inform the CMT-SCTP suite of numSeen and each numSeen_i
14:  else if redundant packet arrives then
15:    Drop the packet
16:  end if
17: end if

Changes to standard CMT-SCTP

Coded CMT-SCTP does not change the selective acknowledgment mechanism of standard CMT-SCTP. However, when the receiver sends a SACK, instead of inserting the expected TSN number, the receiver inserts the expected number of seen packets in the acknowledgment field (based on Sundararajan's degree-of-freedom concept) [86][87].

3.3 A Queuing Theory Based Proof on the Performance Gain of Coded CMT-SCTP

We use a queuing theory approach to prove that network coding can effectively solve the receive buffer blocking issue. To provide a fair comparison, we use a concept called the virtual queue at the receiver side and compare the average number of packets in the virtual queue in both standard CMT-SCTP and coded CMT-SCTP. In network coding, the virtual queue is defined as the difference between the number of original packets involved in the coding process and the number of seen packets so far. The virtual queue shows how many packets are required to decode a group of original packets. Decoding at the receiver happens every time the length of the virtual queue becomes zero. Then, the receiver empties its receive buffer by delivering all the decoded packets to the application layer (called unloading of the receive buffer).
Therefore, the average length of the virtual queue in a coded multi-path transport layer is a reasonable metric to monitor the unloading of the receive buffer. We adapted the virtual queue concept to the standard CMT-SCTP (where we call it the re-sequencing queue) to make the comparison with coded CMT-SCTP possible. We define the re-sequencing queue as the difference between the string sent by the sender side and the one received at the receiver side. Imagine that the sender has packets with TSNs 1, 2, 3, 4, 5, 6 in its buffer. The sender transmits the packets and the receiver gets the string 1, 2, 4, 5, 6. Although the receiver has all the packets except 3, the transport layer can only deliver packets 1 and 2 to the application layer, and packets 4, 5, 6 have to stay in the receive buffer. Comparing the sender string and the receiver string, there is a difference of 1 between the two strings (packet 3 is missing), and this difference causes the receiver to stall the other packets in the buffer. When the difference between the sender string and the receiver string is 0, i.e., there is no difference between the two strings, the receive buffer gets emptied.

3.3.1 Virtual queue in the coded multi-path transport layer

We use the findings of [100] to calculate the virtual queue characteristics. We assume that packets arrive at the sender according to a Bernoulli process at rate λ. This is a safe assumption considering that most network queuing theory models assume a Poisson arrival process, which is the continuous-time counterpart of the Bernoulli process. In concurrent multi-path data transfer, we have multiple transmission paths between the sender and receiver; if we assume the i-th transmission path delivers the data with probability μ_i in each time slot (exponential distribution by definition), then all transmission paths together hold an exponential distribution with mean service rate μ = Σ μ_i in coded CMT-SCTP.
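The string-difference example above can be expressed as a small helper that, given the receiver's buffered TSNs, computes which packets can be delivered in order and which ones stall behind a gap. This is a hypothetical helper for illustration only.

```python
def deliverable_prefix(received, next_expected=1):
    """Split buffered TSNs into the in-order prefix that can be delivered
    to the application and the packets stalled behind a gap."""
    have = set(received)
    deliver = []
    while next_expected in have:
        deliver.append(next_expected)
        next_expected += 1
    stalled = sorted(have - set(deliver))
    return deliver, stalled

# The example from the text: the sender sent 1..6, the receiver holds 1,2,4,5,6.
print(deliverable_prefix([1, 2, 4, 5, 6]))  # ([1, 2], [4, 5, 6])
```

A single missing TSN (here, 3) keeps three already-received packets pinned in the receive buffer, which is precisely the blocking effect the re-sequencing queue measures.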
The Markov chain for the length of the virtual queue is presented in Figure 3.2.

Figure 3.2: Markov chain for the length of the virtual queue in coded CMT-SCTP (states 0, 1, 2, ...; transition probabilities formed from λ, μ and their complements λ̄ = 1 − λ, μ̄ = 1 − μ).

If ρ = λ/μ < 1, then the chain is positive recurrent and the steady state probability of the virtual queue is π_i = (1 − α)α^i [101], where α = λ(1 − μ) / (μ(1 − λ)). Consequently, the steady state expected length of the virtual queue, E{qv(t)}, is calculated as in equation (3.3):

E{qv(t)} = Σ_{i=0}^{∞} i · π_i = (1 − μ) ρ / (1 − ρ)     (3.3)

Then, the average time it takes for the receive buffer to unload is proportional to the steady state expected length of the virtual queue in equation (3.3).

3.3.2 Re-sequencing queue in the standard multi-path transport layer

Based on [102], the orderly transmission of the packets over a channel can be depicted as a tandem network as in Figure 3.3. The first server represents the channel behavior and the second server represents the re-sequencing behavior of the receive buffer. The first queue, associated with the first server, basically represents all the different paths between the sender and the receiver. Because we assume an exponential distribution for the i-th transmission path erasure behavior, the multiple queues can be substituted with one queue with exponential behavior. We know the arrival rate of the first queue, but not the arrival rate of the second queue, which is equal to the departure rate of the first queue. Burke's theorem [102] offers a great solution for sequential queues. According to Burke's theorem, for an M/M/1, M/M/c, or M/M/∞ queue in steady state, the departure rate is the same as the arrival rate [102]. Using Burke's theorem [102], the arrival rate of the second queue is λ; that is, it is the same as the departure rate of the first queue and is independent of the service rate of the first queue.
As such, we can analyze the behavior of the two queues separately.

Figure 3.3: Queuing schematic for the re-sequencing process in standard multi-path data transfer; server 1 (service rate μ) models the channel behavior, while server 2 (service rate ξ) models the re-sequencing behavior of the receive buffer.

The Markov chain of the first queue in Figure 3.3 is exactly the same as that in Figure 3.2. If we assume that the service rate of the second queue is ξ, i.e., in-order packet arrivals have probability ξ and out-of-order packet arrivals have probability 1 − ξ, then the Markov chain for the re-sequencing behavior of the receive buffer (i.e., the second queue in Figure 3.3) looks like that in Figure 3.4.

Figure 3.4: Markov chain for the re-sequencing behavior of the receive buffer (states 0, 1, 2, ...; transition probabilities formed from λ, ξ and their complements).

If ρ1 = λ/μ is associated with the first queue and ρ2 = λ/ξ with the second queue, then based on [101], the steady state probabilities of the two queues can be calculated as in equations (3.4) and (3.5), respectively:

π1_i = (1 − α1)α1^i,  α1 = λ(1 − μ) / (μ(1 − λ))     (3.4)

π2_i = (1 − α2)α2^i,  α2 = λ(1 − ξ) / (ξ(1 − λ))     (3.5)

Because the behaviors of the two queues are independent of each other, the total number of re-sequenced packets, E{qr(t)}, can be calculated according to equation (3.6):

E{qr(t)} = E{size of queue 1} + E{size of queue 2}
         = Σ_{i=0}^{∞} i · π1_i + Σ_{j=0}^{∞} j · π2_j
         = (1 − μ) ρ1 / (1 − ρ1) + (1 − ξ) ρ2 / (1 − ρ2)     (3.6)

E{qr(t)} shows the average amount of difference between the actual sequence transmitted by the sender and the one that arrives at the receiver. A higher E{qr(t)} results in a higher receive buffer occupancy rate and a longer waiting time for the packets to be transferred to the upper layer. Comparing equations (3.3) and (3.6), one can easily conclude that E{qr(t)} > E{qv(t)}. That is, on average, the receive buffer of the coded transport layer unloads more frequently and the wait times of the packets are smaller compared to those of standard CMT-SCTP.
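Plugging sample values into the two closed forms makes the comparison concrete; the λ, μ, ξ values below are arbitrary illustration rates chosen to satisfy the stability conditions ρ, ρ1, ρ2 < 1.

```python
def expected_virtual_queue(lam, mu):
    # Equation (3.3): E{qv} = (1 - mu) * rho / (1 - rho), with rho = lam / mu.
    rho = lam / mu
    return (1 - mu) * rho / (1 - rho)

def expected_resequencing_queue(lam, mu, xi):
    # Equation (3.6): the two tandem queues contribute independently (Burke),
    # and each term has the same form as equation (3.3).
    return expected_virtual_queue(lam, mu) + expected_virtual_queue(lam, xi)

lam, mu, xi = 0.3, 0.6, 0.7   # arbitrary per-slot rates with rho < 1
qv = expected_virtual_queue(lam, mu)
qr = expected_resequencing_queue(lam, mu, xi)
print(qv < qr)  # True: the coded scheme keeps a shorter queue on average
```

Since the re-sequencing expectation (3.6) equals the virtual-queue expectation (3.3) plus a strictly positive second term whenever ξ < 1, the inequality E{qr(t)} > E{qv(t)} holds for any stable choice of rates, not just this sample.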
Therefore, we conclude that the probability of the receive buffer blocking issue is smaller in coded CMT-SCTP than in standard CMT-SCTP.

3.4 Testbed Experiments For Accuracy Analysis Of The Q-learning Algorithm

In our earlier work [1], we used a naive Bayes classifier with the Q-learning agent to determine the state of the network. However, the performance of the naive Bayes classifier can be compromised because of the unrealistic assumption that all features are equally important and independent given the value of the class. To increase the accuracy of each Q-learning agent in determining the state of the system, the naive Bayes classifier is substituted by logistic regression [99]. Although logistic regression is not as biased as naive Bayes towards the features, it suffers from over-fitting. We propose using a logistic regression classifier with some correction to the prior estimation and to the data sampling to prevent the over-fitting behavior. In this section, a comparison between the performance of the naive Bayes classifier and the logistic regression classifier is studied via testbed experiments.

Figure 3.5: Testbed topology, showing multiple data paths for CMT-SCTP (nodes A-G are Raspberry Pis with wireless interfaces on channels 1, 6, and 11 and wired interfaces; a laptop is used for pinging).

For evaluation purposes, we used the testbed setup described in Section 2.3 and Figure 3.5. In terms of the data traffic, we created two multi-homed CMT-SCTP associations between nodes B-G and C-G. In addition, we created a single-homed CMT-SCTP association between nodes A-G. Nodes A, B, and C each send 10,000 SCTP packets to node G during the experiment. Each SCTP packet carries 1000 bytes of application data. The data paths
are as follows:

• node A: A-D-G
• node B: B-D-G and B-F-D-E-G
• node C: C-D-G and C-F-D-E-G

To guarantee path discrepancies, we created two types of interference traffic in the network. One interference traffic is generated by the two CMT-SCTP associations set up between nodes A-G and C-G to create path discrepancies between the sub-flows of the CMT-SCTP associations between nodes B-G and C-G. Nodes A and C send SCTP packets carrying 1000 bytes of application data to node G during the experiment. The other interference traffic is generated by a laptop with a wireless interface on channel 11. The laptop pings the wireless interface of node E at rates of 10, 20, 30, 40, and 50 ping packets per second. The ping packets contribute a controllable source of interference traffic to create path discrepancies for the sub-flows of the CMT-SCTP association between nodes B-G.

Because we investigated the accuracy of the classifier within the learning agent, which is in charge of determining the state of the packet transmission over the path as either lost or received, we collected the training data and labeled them based on the receiver's (node G's) output on the status of each packet.

The accuracy of a machine learning algorithm is defined as in equation (3.7). However, accuracy is not the only factor in choosing the right learning algorithm for a data set. Another important factor is the sensitivity of the machine learning mechanism to the positive class incidents, especially when dealing with data sets with a rare positive class. In a data set with a rare positive class population, the learning algorithm might have a high accuracy due to over-fitting and classifying all samples as the negative class, or simply due to the large number of negative class samples. As such, another factor has to be measured to show the sensitivity of the learning algorithm towards the positive class incidents.
The sensitivity of a learning mechanism is defined as in equation (3.8).

Accuracy = (number of correct predictions) / (sample size)     (3.7)

Sensitivity = (correct predictions of the positive class) / (total number of predictions of the positive class)     (3.8)

Although naive Bayes holds a reasonable accuracy, the sensitivity of the learner to the rare positive incidents is very low. Logistic regression, on the other hand, when used over an endogenous data set, holds a very high accuracy and sensitivity measure. The strategy for endogenous selection is to collect the instances of the positive class and then choose randomly from within the negative class instances. Reference [99] suggests choosing as many samples from the negative class as there are positive class instances to create a balanced training set. The logistic regression classifier with prior correction uses pre-knowledge about the positive class to adjust its prediction. Both prior correction and endogenous sampling are methods to increase the sensitivity of the logistic regression classifier.

For our Q-learning mechanism, which is responsible for the timely transmission of the redundant packets to guarantee the solvability of the coding process at the receiver side, the sensitivity of the classifier is crucial. For a coding process, not only is it important to send enough redundant packets to cover all the random losses, but it is also crucial to send the redundant packets in a timely manner, to avoid any unwanted delay for decoding at the receiver.

To show the different levels of accuracy, we ran naive Bayes, standard logistic regression, and logistic regression with prior correction (our proposal in this chapter) over a random-sampling training set and over an endogenous training set.

All data sets collected from the testbed in Figure 3.5 are presented in Table 3.1, which summarizes the different accuracy and sensitivity levels of naive Bayes, standard logistic regression, and logistic regression with prior estimation correction.

Table 3.1: Performance comparison between naive Bayes and logistic regression classifiers.

Classifier                                 | Training set type   | Accuracy | Sensitivity
Naive Bayes                                | random sampling     | 60%      | 1%
Naive Bayes                                | endogenous sampling | 69%      | 45%
Standard logistic regression               | random sampling     | 99%      | 13%
Standard logistic regression               | endogenous sampling | 95%      | 70%
Logistic regression with prior correction  | random sampling     | 99%      | 13%
Logistic regression with prior correction  | endogenous sampling | 95%      | 75%

As shown in Table 3.1, logistic regression with prior correction is the logical choice that can satisfy both sensitivity and accuracy needs. Thus, we used logistic regression with prior correction in the Q-learning agents' classifiers for the performance studies in Sections 3.5 and 3.6.

3.5 Simulations For The Performance Of Coded CMT-SCTP

We used QualNet 7.1 [103] for our simulations. The standard CMT-SCTP module in QualNet was developed by Ilknar Aydin [74]. We used this CMT-SCTP module to develop our coded CMT-SCTP module in QualNet as well.

Figure 3.6: Simulation topology. Actual data flow is over chain A (nodes 1-5, sub-flow paths on channels 1 and 6), while interference traffic is over chain B (nodes 1-5, channel 1).

We set up a multi-hop wireless network of 5 multi-homed nodes located in a chain topology (Figure 3.6, chain A). Each multi-homed node in the chain is equipped with two IEEE 802.11b wireless network interfaces tuned to two different channels (1 and 6) to avoid any interference among the CMT-SCTP sub-flows. The data rate for IEEE 802.11b is 2 Mbps and the RTS/CTS mechanism is on.
The transmission range is around 370 meters, the interference range is around 812 meters, and the carrier sensing range is around 520 meters for both of the wireless interfaces. Neighboring nodes in the chain are 300 meters apart. A data source over CMT-SCTP is located at node 1 of chain A and continuously transmits data to node 5 of chain A. CMT-SCTP uses RTX-CWND as the retransmission policy. Each SCTP data chunk carries 1000 bytes of application data.

As we are interested in the performance of coded vs. standard CMT-SCTP in the presence of severe path discrepancies between the CMT-SCTP sub-flows, we set up a second chain of nodes in the network (Figure 3.6, chain B). The number of nodes in chain B is the same as the number of nodes in chain A. However, each node in chain B has only one IEEE 802.11b wireless interface, operating on channel 1 and with the same wireless properties as the first wireless interface of the nodes in chain A. Chain B is located 450 meters away from chain A. To create background traffic (i.e., interference and hence loss) for the CMT-SCTP sub-flow running on path 1 of chain A, we set up CBR (Constant Bit Rate) traffic on chain B for the entire simulation time. Each CBR packet carries 1000 bytes of data. We vary the CBR traffic rate as 0, 4, 8, 16, 24, 50, 60, 70, and 80 packets per second to create different levels of background traffic on path 1 of chain A.

Each simulation configuration is run 30 times and average values are calculated with 95% confidence intervals. The simulation time is 7 minutes and the measurements are done from the 2nd to the 6th minute to ensure stable results. To show that coded CMT-SCTP effectively handles receive buffer limitations, we tested our scheme with different receive buffer sizes. Our goal is to show that, unlike for standard CMT-SCTP, the receive buffer limitation does not have a substantial effect on the performance of coded CMT-SCTP. Based on the findings of Aydin et al.
[71], a small receive buffer size (less than 128 KB) causes performance degradation in standard CMT-SCTP. Therefore, we ran simulations for an unlimited buffer size as well as for 16 KB and 8 KB receive buffer sizes.

Note that our simulation results over the topology of Figure 3.6 can be generalized to real-life multi-hop wireless networks. In real-life scenarios, a different number of hops or more complicated network topologies would only contribute to the path discrepancies of the sub-flows of CMT-SCTP; therefore, the simulation results based on our simulation setup provide strong evidence for the validity of our coding scheme over more general scenarios.

We measured the number of bytes delivered to the receiving application over time (goodput) of standard CMT-SCTP vs. coded CMT-SCTP to show the effectiveness of our network coding algorithms in a multi-hop wireless setting.

Figures 3.7(a), 3.7(b), and 3.7(c) compare the performance of coded CMT-SCTP vs. standard CMT-SCTP when the receive buffer size is unlimited, rBuf = 16K, and rBuf = 8K, respectively. The rule of thumb for choosing the receive buffer size is that the size has to be at least the bandwidth-delay product of the path (round trip time multiplied by the link capacity). For our simulation setup, the link has a data rate of 2 Mbps; with an RTT of 10 to 20 ms, the bandwidth-delay product of the network is roughly 2.5K to 5K bytes. Thus the receive buffer sizes of 8K and 16K are large enough, yet under 128K, to expose the receive buffer blocking issue.

Figure 3.7: Goodput of standard CMT-SCTP vs. coded CMT-SCTP (proposed in this chapter) vs. coded CMT-SCTP ver. 1 [1] with different receive buffer (rBuf) sizes, in simulation: (a) rBuf = maximum in Linux, (b) rBuf = 16K, (c) rBuf = 8K. Each panel plots goodput (Kbps) against the CBR interference rate (packets/sec).

As depicted in Figure 3.7, both coded CMT-SCTP and standard CMT-SCTP perform similarly with an unlimited receive buffer. However, when the receive buffer size is decreased to 16 KB, the performance of standard CMT-SCTP degrades drastically while coded CMT-SCTP performs as if the receive buffer size were unlimited. Our results show that coded CMT-SCTP outperforms standard CMT-SCTP by a factor of 20% to 47%. When the receive buffer size is decreased to 8 KB, the performance of both standard and coded CMT-SCTP suffers; however, coded CMT-SCTP still outperforms the standard CMT-SCTP by a factor of 22% to 62%. Note that the performance of coded CMT-SCTP stays close to the ideal performance (when the buffer size is unlimited) as the receive buffer size gets smaller, showing the effectiveness of coded CMT-SCTP in solving the receive buffer blocking problem of standard CMT-SCTP.

We also measured the performance of the older version 1 of our coded CMT-SCTP proposal in [1] for comparison and included the results in Figure 3.7. The older coded CMT-SCTP uses a single Q-learning agent for the entire CMT-SCTP association, while our current coded CMT-SCTP proposal in this chapter has a group of learning agents (one agent per sub-flow of the CMT-SCTP association). Expectedly, the performances of the older and current versions of coded CMT-SCTP are similar when the sub-flow paths have similar loss rates (both paths have low interference). However, as the discrepancy in the loss rates of the two sub-flow paths increases, the performance of the older version of coded CMT-SCTP gets worse.
In the presence of severe path discrepancies, the older version of coded CMT-SCTP stalls because it over-estimates the required number of coded packets and hence floods the network with redundant packets. In contrast, the current coded CMT-SCTP proposal estimates the number of coded packets independently, and more accurately, for each sub-flow path and does not send as many unnecessary redundant packets into the network.

3.6 Testbed Experiments For Performance Of Coded CMT-SCTP

In this section, we investigate the performance of our coded CMT-SCTP using the wireless testbed in Figure 3.5. The experiment configuration and the data and interference traffic setup are the same as described in Section 3.4. The objective of the experiments is to compare the performance of our coded CMT-SCTP with the standard CMT-SCTP and the CMT-SCTP with buffer splitting [2] scheme.

Our main claim is that coded CMT-SCTP enhances performance by eliminating receive buffer blocking in the presence of path discrepancies. As such, a testbed with dissimilar independent paths for each sub-flow within a CMT-SCTP association is a perfect setup for a proof of concept. We measured the space left in the receive buffer of node G, which is the data receiver of the SCTP association between nodes B and G. Each experiment is repeated 10 times with the same set of parameters to make sure that the results are accurate. Figure 3.8 shows the percentage of packets that experience receive buffer blocking during the data transmission versus the ping message rate to node E.
Note that the ping messages are used to create interference, and hence loss, in one of the sub-flows of the B-G association.

Figure 3.8: Percentage of packets in a CMT-SCTP association that experience receive buffer blocking for standard CMT-SCTP vs. coded CMT-SCTP (x-axis: ping rate in packets/sec; curves for rBuf = max in Linux, 16K, and 8K for each scheme).

As depicted in Figure 3.8, as the receive buffer (rBuf) size decreases, the percentage of packets experiencing rBuf blocking increases for the standard CMT-SCTP. Note that the percentage of rBuf blocking incidents increases by 6 times for the rBuf size of 8 KB as compared to the unlimited rBuf size for the standard CMT-SCTP. On the other hand, coded CMT-SCTP packets experienced no rBuf blocking during the experiment (see the coded CMT-SCTP curves lying at zero on the x-axis in Figure 3.8).

We also measured the goodput (amount of bytes delivered to the receiving application over the experiment time) of the CMT-SCTP associations (see Figure 3.9).

Figure 3.9: Goodput of the standard CMT-SCTP vs. CMT-SCTP with buffer splitting [2] vs. coded CMT-SCTP with different rBuf sizes, over the testbed. (a) rBuf = maximum in Linux; (b) rBuf = 16K; (c) rBuf = 8K.

Coded CMT-SCTP effectively increases the goodput by a factor of 50% as compared to the standard CMT-SCTP and CMT-SCTP with buffer splitting [2] when the receive buffer size is set to the maximum size in Linux. The goodput of coded CMT-SCTP becomes 40% better than the standard CMT-SCTP and CMT-SCTP with buffer splitting when the buffer size is reduced to 16 KB.
These results show the effectiveness of coded CMT-SCTP. However, when the receive buffer size becomes 8 KB, coded CMT-SCTP seems to lose its effectiveness as compared to the other two schemes. The reason is that the receive buffer eventually becomes too small to handle the overhead caused by the redundant packets. When using CMT-SCTP, out-of-order packet arrivals not only trigger receive buffer blocking but also cause unnecessary retransmission of packets. CMT-SCTP with buffer splitting seems to handle the packet retransmissions better for very small receive buffer sizes. Comparing Figure 3.9(c) with its simulation counterpart, coded CMT-SCTP seems to perform worse than the standard CMT-SCTP in the testbed experiment, while it performs better than the standard CMT-SCTP in the simulation results for the rBuf size of 8 KB. The reason behind this behavior in the testbed is the CPU capacity of the Raspberry Pi. The Raspberry Pi has a "limited CPU cycle" [104]; as such, in computationally demanding tasks, the buffer becomes a bottleneck for the overall performance as the buffer size gets smaller [104]. According to the Raspberry Pi Foundation, the overall performance of the Pi's processor is comparable with a PC using an Intel Pentium 2 processor clocked at 300 MHz, which introduces some limitations for computationally demanding tasks.

3.7 Discussion

In this section, we discuss the delay overhead of coded CMT-SCTP and also how it can be integrated into more general networking scenarios, particularly heterogeneous networks. Network coding increases the complexity of the mechanism by combining packets instead of simply forwarding them. The decoding delay at the receiver depends greatly on the coding method at the sender. Because we used random linear network coding, the decoding delay is in the order of $O(\frac{1}{1-\rho})$, where $\rho = \frac{\lambda}{\mu}$ is the network load.
Here $\lambda$ is the arrival rate of the traffic and $\mu$ is the service rate of the channel [105–107].

In terms of integrating coded CMT-SCTP into legacy networks, a transport protocol's end points negotiate the options of the connection during the handshake and can load the necessary end-to-end modules to support wireless links. Similarly, the coded CMT-SCTP option can be negotiated during the handshake to work in harmony with the other legacy protocols in the network. In terms of the use of coded CMT-SCTP in heterogeneous networks, there are two main approaches to designing a transport protocol that supports wireless links in the network. The first approach is a transport connection established over an end-to-end path that may include wireless links [108]. The second approach is to deploy a gateway that separates the wireless and wired portions of the network [109]. Our coded CMT-SCTP can be adapted to work with both of these approaches. For the first approach, the proof of concept is already provided in this chapter with the experiments using the heterogeneous testbed in Figure 3.5. However, a more in-depth investigation is needed to fine-tune the Q-learning algorithm to choose features that fully represent the mixed network characteristics. The second approach is more practical, as the wireless part of the network can benefit from the better performance of the network while being shielded from the wired part. The proxy gateway basically creates a shield between the two parts and performs the necessary translation from one protocol to another when a packet passes through the gateway.
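The decoding-delay scaling $O(\frac{1}{1-\rho})$ discussed above can be made concrete with a small numeric sketch. This is only an illustration of how the delay factor grows with load; the load values below are arbitrary examples, not measurements from our testbed.

```python
def decoding_delay_factor(arrival_rate: float, service_rate: float) -> float:
    """Relative decoding delay O(1/(1 - rho)) for random linear network
    coding, where rho = lambda/mu is the network load (must be < 1)."""
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("network load rho must be below 1 for a stable queue")
    return 1.0 / (1.0 - rho)

# Illustrative loads: the delay factor grows sharply as rho approaches 1.
for lam in (0.2, 0.5, 0.8, 0.95):
    print(f"rho={lam:.2f} -> delay factor {decoding_delay_factor(lam, 1.0):.1f}")
```

The take-away matches the discussion: decoding overhead is modest at light load but becomes the dominant cost as the channel saturates.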
We proposed an adaptive network coding scheme for CMT-SCTP, called coded CMT-SCTP, and used an adaptive Q-learning algorithm to control the number of redundant packets so as to effectively recover the lost packets. Our coded CMT-SCTP is highly effective in alleviating receive buffer blocking. It outperforms the standard CMT-SCTP in terms of throughput by up to 62%, depending on the receive buffer size and path dissimilarities. The testbed findings support the simulation and theoretical results.

Chapter 4

A Smart Fairness Mechanism for Concurrent Multipath Transfer in SCTP over Wireless Multi-hop Networks

4.1 Introduction

In this chapter, we studied the fairness behavior of CMT-SCTP on a multi-hop wireless testbed. We introduced a Q-learning distributed mechanism to enhance fairness in an SCTP association. The proposed method uses an RL mechanism to acquire knowledge about the dynamics of the network. Consequently, the acquired knowledge is used to choose the best action to improve the fairness index of the network. We evaluated our proposal against the standard CMT-SCTP and resource pool CMT-SCTP (CMT/RP-SCTP). Our proposal outperforms the available fairness mechanisms for CMT-SCTP by a significant margin. Fairness of CMT-SCTP is still an open issue and well under discussion, as there exist many unanswered questions on this topic. Our investigation contributes to the field in three ways:

(a) It is a known fact that window-based transport protocols are not designed for the imperfections of the wireless multi-hop network and tend to act unfairly against flows coming from nodes farther away from the gateway [110], [48], [55]. SCTP is not an exception either; however, when using CMT-SCTP, the goal is not to compromise the fairness of the base protocol to get higher bandwidth utilization or throughput. As such, fairness considerations have to be integrated within any add-on algorithm in multipath transport protocols.
The above unaddressed concern motivated us to evaluate the fairness of CMT-SCTP against non-CMT flows coming from farther-away hops in a multi-hop wireless environment. Our investigation reveals that CMT-SCTP is indeed unfair towards flows coming from farther-away hops.

(b) We used our findings to develop a dynamic distributed fairness mechanism that alleviates the aggressive behavior of CMT-SCTP towards non-CMT flows coming from farther-away hops.

(c) Our state-of-the-art fairness mechanism, Q-learning CMT-SCTP, enhances the fairness behavior of CMT-SCTP tremendously. We compare the performance of our proposal with the standard CMT-SCTP and CMT/RP-SCTP; the comparison confirms that Q-learning CMT-SCTP significantly outperforms the available fairness mechanisms. Moreover, Q-learning CMT-SCTP is capable of supporting different levels of quality of service (QoS) based on user demand.

The present chapter is divided into the following parts: Section 4.2 presents the results of the investigation and measurements of CMT-SCTP in different scenarios (part (a) of the contribution). Section 4.3 explains the proposed dynamic distributed fairness mechanism in detail. The evaluation of the proposed algorithm and a brief discussion of comparable available mechanisms are presented in Section 4.4. Section 4.6 concludes this chapter.

4.2 Study of CMT-SCTP Fairness

To investigate the fairness of CMT-SCTP and compare it with SCTP, we set up a multi-hop wireless testbed and measured the average throughput of flows in different scenarios (refer to Section 2.3 for testbed setup details). To create a test bench for comparison, the average throughput of node B is measured when sending data over SCTP. Then, the average throughput of node B is measured when sending data over CMT-SCTP.
Comparing the two values gives us good insight into how aggressive CMT-SCTP is compared to its base protocol, SCTP.

Figure 4.1: Testbed topology (Raspberry Pi nodes A-G; Flow 1: ABDG; Flow 2: BDG, BEDFG; Flow 3: CDG, CEDFG).

Figure 4.1 shows the testbed topology. Node A has only one wireless interface, while each of nodes B, C, and G has two wireless interfaces. Node D is a forwarder node with one wireless interface. Each of nodes E and F has one wireless interface. The testbed is set up so that the first interface of nodes B, C, and G communicates with node D. The second interface of nodes B and C communicates with node E, while the second interface of node G communicates with node F. The data collection took place at different hours of the day to make sure that all possible outcomes are covered in our data set.

We measured network statistics in 5 different scenarios:

• scenario 1: all source nodes use TCP
• scenario 2: all source nodes use TCP, node B uses single-homed SCTP
• scenario 3: all source nodes use TCP, node B uses CMT-SCTP with two interfaces
• scenario 4: all source nodes use single-homed SCTP
• scenario 5: all source nodes use single-homed SCTP, node B uses CMT-SCTP

Scenario 1 is designed to act as a benchmark to get better insight into the fairness behavior of CMT-SCTP as compared to SCTP and TCP. Scenario 2 is designed to reveal the difference between TCP and SCTP performance over a wireless multi-hop setting. Scenario 3 is designed to reveal any unfairness of CMT-SCTP in comparison to its base protocol, SCTP, in scenario 2. In each of these scenarios, we only changed one parameter on one node to make sure that any change in the result is caused by the transport layer behavior. Scenarios 4 and 5 are designed to monitor the fairness behavior of CMT-SCTP against single-homed SCTP. The measurement results from the above experiments provide us with enough evidence on CMT-SCTP fairness.
All scenarios are run 50 times to obtain reliable results with 95% confidence intervals.

To compare the performance of the different scenarios, we measured the throughput of each flow in kbps and Jain's fairness index in each scenario. Jain's fairness index is defined as [111]:

J(x_1, x_2, \ldots, x_n) = \frac{(\sum_{i=1}^{n} x_i)^2}{n \sum_{i=1}^{n} x_i^2}    (4.1)

where x_i is the data rate of the i-th flow. The measurement results of the above scenarios are presented in Table 4.1.

source node | scenario 1 | scenario 2 | scenario 3 | scenario 4 | scenario 5
A           | 1050.54    | 822.55     | 225.94     | 396.68     | 120.33
B           | 4304.12    | 2892.51    | 3370.45    | 2962.86    | 2398.18

Table 4.1: Throughput comparison of different scenarios. In scenario 1 of Figure 4.1, nodes A and B send data using TCP. In scenario 2, all sources except node B use TCP; node B uses SCTP. Scenario 3 is similar to scenario 2; however, node B uses CMT-SCTP. In scenario 4, all nodes use SCTP. In scenario 5, all nodes use SCTP, and node B uses CMT-SCTP. All the above numbers are measured in kbps.

Table 4.1 presents the average throughput of the competing flows in the above scenarios. Looking at the first column of Table 4.1, when both nodes A and B use TCP to transfer data, node B, which is located at a closer hop-count to node G (i.e., the destination), has a higher throughput. We are not going to elaborate on why TCP is not fair to flows with different numbers of hops to the destination, as there exists sufficient literature on the reasons behind this behavior. The measurements in scenarios 1 and 2 show that TCP and single-homed SCTP have similar fairness behavior when it comes to flows coming from farther-away nodes. SCTP and TCP share a similar congestion control mechanism; as such, similar results are expected in scenarios 1 and 2. However, as stated in Table 4.1, SCTP on node B has lower throughput than TCP in scenario 1.
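Jain's fairness index in Eq. (4.1) is straightforward to compute. The sketch below illustrates it; the example rates are the scenario 1 throughputs of nodes A and B from Table 4.1.

```python
def jains_index(rates):
    """Jain's fairness index: (sum x_i)^2 / (n * sum x_i^2).
    Equals 1.0 when all flows get equal rates and approaches 1/n
    as a single flow dominates the others."""
    n = len(rates)
    total = sum(rates)
    return total * total / (n * sum(x * x for x in rates))

# Scenario 1 throughputs (kbps) of nodes A and B from Table 4.1.
print(round(jains_index([1050.54, 4304.12]), 2))
```

With only two flows the index ranges from 0.5 (one flow starved) to 1.0 (perfect fairness), which is why the scenario comparisons in Figure 4.2 cluster in that band.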
The reason behind the drop in throughput of SCTP is the processing power of the Raspberry Pi. SCTP uses more CPU cycles per transfer than TCP [112]; therefore, the SCTP throughput takes a plunge in scenario 2. Our measurements showed that when there are no other flows in the testbed, the TCP source on node A can achieve an average throughput of 1443 kbps, which is 30% more than the throughput of node A in scenario 1.

Measurements of scenario 2 show that there is not a huge difference between the performance of single-homed SCTP and TCP in terms of fairness towards flows coming from nodes farther away from the destination. The measurement results in Table 4.1 confirm that CMT-SCTP (scenarios 3 and 5) starves flow A more than single-homed SCTP does (scenarios 2 and 4). The third column in Table 4.1 shows that the average throughput of CMT-SCTP on node B is 16% higher than the average throughput of SCTP on node B in scenario 2. Having more than one interface provides node B with more transmission opportunities and eventually a higher share of network resources, which causes an 80% drop in the average throughput of node A as compared to scenario 1, and a 70% drop when compared to scenario 2. It is fair to conclude that CMT-SCTP has a more aggressive congestion control mechanism than SCTP when it interacts with non-CMT flows. In scenario 4, all source nodes use single-homed SCTP to send data. The measurement results in Table 4.1 for scenario 4 show that the SCTP congestion control mechanism is more aggressive towards other SCTP flows coming from farther-away hops than TCP is. Scenario 5 shows that CMT-SCTP is even more aggressive towards other non-CMT flows than SCTP and TCP.

Figure 4.2: Fairness comparison (Jain's fairness index) of the different scenarios. In scenario 1, nodes A and B in Figure 4.1 send data using TCP. In scenario 2, all sources except node B use TCP; node B uses SCTP.
Scenario 3 is similar to scenario 2; however, node B uses CMT-SCTP. In scenario 4, all nodes use SCTP. Scenario 5 is similar to scenario 4, except that node B uses CMT-SCTP.

Figure 4.2 depicts Jain's fairness index for the different scenarios; the index does not change drastically from scenario 1 to scenario 2. Scenario 3 reveals the impact of CMT-SCTP in comparison with SCTP.

To take a closer look at the factors that contribute to the unfair behavior of CMT-SCTP, we monitored some of the transport layer parameters and realized that the size of the congestion control window plays a crucial role in the outcome, as shown in Figure 4.3. The congestion window size of the SCTP association on node B in scenarios 2, 3, 4, and 5 is monitored in all experiments via sampling. The average congestion window size is then calculated for the experiments in scenarios 2, 3, 4, and 5 and graphed for each experiment (we only include 40 experiments on the x axis). The solid lines in Figure 4.3 show the average congestion window size of node B when using CMT-SCTP in scenarios 3 and 5, while the dashed lines represent the average congestion window size of node B when using SCTP in scenarios 2 and 4. The difference between the solid and dashed lines shows the aggressive behavior of CMT-SCTP as compared to SCTP in the presence of non-CMT flows.

Figure 4.3: Comparison of the average congestion window size of node B in scenarios 2 and 3 of Figure 4.1 (curves: SCTP on node B with TCP on node A; CMT-SCTP on node B with TCP on node A; SCTP on node B with SCTP on node A; CMT-SCTP on node B with CMT-SCTP on node A).
The solid line shows the average window size of node B when all sources use TCP and node B uses CMT-SCTP, while the dashed line shows the average window size of node B when all source nodes use TCP and node B uses SCTP.

It is expected that utilizing multiple paths in CMT-SCTP provides the user with higher throughput; however, the aggregate throughput over all interfaces has to be comparable with that of single-homed SCTP to maintain fairness among the flows. Based on our findings in this section, CMT-SCTP's aggressive congestion window behavior in maximizing the aggregate bandwidth causes other non-CMT flows coming from farther-away hops to suffer and struggle for bandwidth. The results of our investigation motivated us to propose a dynamic mechanism that controls the maximum congestion window size as an effective tool to alleviate the aggressive behavior of CMT-SCTP against other transport layer protocols.

4.3 Q-learning CMT-SCTP

The investigation in Section 4.2 shows that CMT-SCTP is unfair to non-CMT flows in a multi-hop wireless environment, especially to those farther away from the destination. Considering that CMT-SCTP on each node deals with an unpredictable environment over the wireless multi-hop network, a dynamic fairness solution is required to address the unfair behavior of CMT-SCTP. Reinforcement learning methods are the best candidate for an interactive solution capable of adjusting to changes in the environment as the system runs. Among the various reinforcement learning methods, Q-learning fits our needs best, as it is a model-free technique that can be used to find an optimal action-selection policy for any given finite-state MDP.

In this section, we present our dynamic distributed fairness mechanism, Q-learning CMT-SCTP, which uses machine learning and network statistics to determine the fairness of CMT-SCTP.
Our distributed fairness mechanism controls the aggressive behavior of CMT-SCTP by placing a dynamic damp on the SCTP maximum congestion window.

4.3.1 Features and states

At the beginning of each decision epoch, the learning agent in the CMT-SCTP source node receives transport layer statistics including round-trip time (RTT_i), congestion control window size (cwnd_i), retransmission timeout (RTO_i), and flight size (FL_i) on all interfaces. RTT_i, RTO_i, cwnd_i, and FL_i are the statistics of the i-th interface that are fed to the learning agent. Subsequently, the agent uses a Bayes classifier to determine the state of the network as the source node sees it. The MDP inside each CMT-SCTP source node has the following four states, as shown in Figure 4.4:

• state 1: the network condition is classified as (fair, unfair) in decision epochs (t−1, t).
• state 2: the network condition is classified as (unfair, fair) in decision epochs (t−1, t).
• state 3: the network condition is classified as (unfair, unfair) in decision epochs (t−1, t).
• state 4: the network condition is classified as (fair, fair) in decision epochs (t−1, t).

Figure 4.4: The MDP of the CMT-SCTP Q-learning agent. The learning agent can choose one of the actions (dec_win(), inc_win(), stay()) in each state based on the Q-learning rule.

At the beginning of the learning process, the learning agent does not know the transition probabilities of the MDP or the reward associated with each transition. To find the transition probabilities of the MDP, we use the Q-learning algorithm [88].

4.3.2 Actions

After the agent determines the state of the network in each decision epoch, the CMT-SCTP learning agent chooses an action from the action set A = {decWin(), incWin(), stay()} and informs CMT-SCTP to adjust the maximum congestion window size.
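The four-state encoding above can be sketched as a small helper. The fair/unfair labels are assumed to come from the Bayes classifier described later; here they are simply boolean inputs.

```python
def mdp_state(prev_fair: bool, curr_fair: bool) -> int:
    """Map the (t-1, t) fair/unfair classifications onto the four MDP states:
    1 = (fair, unfair), 2 = (unfair, fair),
    3 = (unfair, unfair), 4 = (fair, fair)."""
    if prev_fair and not curr_fair:
        return 1
    if not prev_fair and curr_fair:
        return 2
    if not prev_fair and not curr_fair:
        return 3
    return 4
```

Tracking two consecutive classifications, rather than only the current one, lets the agent distinguish a transient dip in fairness (state 1) from persistent unfairness (state 3) when selecting an action.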
The functions in the action set A are the tools for communicating the learning agent's policies to the transport layer. decWin() decreases the maximum congestion window size of the CMT-SCTP association to half of its current size. incWin() increases the maximum congestion window size of the CMT-SCTP association to 1.5 times its current size. stay() maintains the maximum congestion window size at its current value.

4.3.3 Reward function

The learning agent uses a reward function to receive feedback on the consequences of taking an action. In the next decision epoch, the environment responds with the new state. Based on the new state of the MDP, the learning agent inside CMT-SCTP receives a reward. The reward function has the form of Eq. (4.2):

r_t = e^{\frac{tp_t - T}{\sigma}}    (4.2)

where tp_t is the throughput of the CMT-SCTP association during decision epoch t, and T is the estimated threshold for the fair bandwidth share of the CMT-SCTP association.

The reward function plays an important role in the convergence rate of Q-learning, and studies suggest that Gaussian reward functions can effectively increase the convergence rate of the learning while directing the agent toward the goal state [91]. As such, we choose the reward function in the form of Eq. (4.2). Moreover, the action set chosen in Section 4.3.2 provides the agent with enough tools to control the transport layer dynamically based on the characteristics of the network. Choosing the right action set is very critical in the design of the underlying MDP. Providing the Q-learning agent with a large set of actions in each state results in a prolonged convergence time. As such, we chose to have three actions in each state to shorten the optimization period (i.e., increase the convergence rate) while finding the right maximum congestion window size for CMT-SCTP to act fairly toward other flows.
The learning process continues as long as the network is up and running; it basically works as a memory that can be adjusted according to network changes. Algorithm 4.1 is the pseudo-code of the distributed learning mechanism.

Algorithm 4.1 CMT-SCTP learning algorithm
1: action set = {decWin(), incWin(), stay()}
2: inputs = {RTT_i, cwnd_i, RTO_i, FL_i}
3: output = {a_i ∈ action set}
4: while have packets to send at each decision epoch do
5:   get RTT_i, cwnd_i, RTO_i, FL_i for all interfaces
6:   cwnd = Σ_i cwnd_i
7:   RTT = (Σ_{i=1}^{n} RTT_i) / n
8:   FL = Σ_i FL_i
9:   RTO = (Σ_{i=1}^{n} RTO_i) / n
10:  determine the state of the MDP, s_t, using the Bayes classifier
11:  calculate the reward r_t
12:  update the Q matrix using Eq. (5.2)
13:  choose the action with the maximum Q value
14:  inform the transport layer to change the maximum congestion window size
15: end while

As time passes, the learning agent develops a memory of all the events and creates a map of actions; therefore, any changes in the environment can be handled instantaneously by the agent.

4.3.4 Architecture

Each CMT-SCTP source is equipped with a Q-learning agent. The Q-learning agent sits in the transport layer. At each decision epoch, the transport layer provides the Q-learning agent with its statistics. The agent uses the Bayes classifier to determine whether the source node has been fair within the decision epoch. Based on the state of the system, the Q-learning agent chooses an action and informs the transport layer about the action. In the next decision epoch, the Q-learning agent receives the new statistics, which reflect the effectiveness of the taken action. The Q-learning agent uses the outcome to bestow a reward or penalty on the action and stores the state-action reward in its memory for future use.
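A minimal sketch of the per-source agent described above is given below. It assumes the standard Q-learning update Q(s,a) += α(r + γ max_a' Q(s',a') − Q(s,a)), since the exact update equation (Eq. (5.2)) is defined elsewhere in the thesis; the ε-greedy exploration, the reward parameters T and σ, and the transport-layer hooks are illustrative placeholders, not the actual kernel interface.

```python
import math
import random

ACTIONS = ("decWin", "incWin", "stay")

class FairnessAgent:
    """Illustrative per-source Q-learning agent following Algorithm 4.1,
    with the reward of Eq. (4.2) and the polynomial learning rate
    alpha = 1/(1+t)^0.77 discussed in Section 4.5 (assumed details)."""

    def __init__(self, T, sigma, gamma=0.9, epsilon=0.1):
        self.q = {(s, a): 0.0 for s in (1, 2, 3, 4) for a in ACTIONS}
        self.T, self.sigma = T, sigma          # fair-share threshold and spread
        self.gamma, self.epsilon = gamma, epsilon
        self.t = 0                             # decision-epoch counter

    def reward(self, throughput):
        # Eq. (4.2): r_t = exp((tp_t - T) / sigma)
        return math.exp((throughput - self.T) / self.sigma)

    def choose(self, state):
        # epsilon-greedy selection over the three window actions.
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, throughput, next_state):
        # One decision epoch of Algorithm 4.1 (lines 10-14).
        alpha = 1.0 / (1.0 + self.t) ** 0.77
        r = self.reward(throughput)
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        self.q[(state, action)] += alpha * (
            r + self.gamma * best_next - self.q[(state, action)])
        self.t += 1
```

In the real system, `choose` would map to calling decWin()/incWin()/stay() on the socket, and `update` would run once per decision epoch with the measured throughput and the classifier's new state.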
The cycle continues until the Q-learning agent has estimated all state-action transition probabilities.

To provide the Bayes classifier with training data, the agent inside each source node starts to collect data as soon as the node starts transmitting. Because all flows within the wireless domain pass through the gateway to access the rest of the network, the gateway has almost complete knowledge of the flow rates within the wireless domain; therefore, it can determine whether a node is being fair to other nodes based on its share of the network bandwidth. At the beginning of the learning process, the gateway monitors the average throughput of incoming flows, labels the flows as fair or unfair using Jain's fairness index in Eq. (4.1), and sets a bit in the shutdown message of the transport layer. The shutdown message in a CMT-SCTP association is used to notify the sender that the destination is closing the association and does not accept any new packets. The source node, on the other hand, monitors the average congestion window size, RTT, RTO, and the flight size on all interfaces and labels the collected data using the bit set in the shutdown message. After collecting n data points, the Bayes classifier uses the data set for training, and there is no further need for the gateway's shutdown message feedback. The Q-learning agent starts its learning after the Bayes classifier has enough data points to classify new observations. The above design does not add any overhead to the network while providing each node with an accurate correlation between its local parameters and the global fairness index. We use a Bayes classifier because it is highly scalable, and empirical results show that it performs surprisingly well in many domains, even ones containing clear attribute dependences [113].

4.4 Evaluation

To evaluate the performance of Q-learning CMT-SCTP, we implemented the Q-learning mechanism as a Python suite, based on Algorithm 4.1.
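The fair/unfair labeling step can be sketched with a minimal Gaussian naive Bayes classifier. The thesis does not pin down the exact Bayes variant, so a Gaussian model over the per-interface features (cwnd, RTT, RTO, flight size) is an assumption here; the training samples below are synthetic, not testbed data.

```python
import math
from collections import defaultdict

class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for the fair/unfair labeling task.
    Each sample is a feature vector such as (cwnd, RTT, RTO, flight size),
    labeled by the gateway's fairness bit during the training phase."""

    def fit(self, samples, labels):
        groups = defaultdict(list)
        for x, y in zip(samples, labels):
            groups[y].append(x)
        self.stats = {}
        total = len(samples)
        for y, rows in groups.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                         for col, m in zip(zip(*rows), means)]
            # store log prior plus per-feature Gaussian parameters
            self.stats[y] = (math.log(n / total), means, variances)
        return self

    def predict(self, x):
        def log_posterior(y):
            prior, means, variances = self.stats[y]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
                for v, m, var in zip(x, means, variances))
        return max(self.stats, key=log_posterior)
```

Once trained on the n bootstrap points labeled via the shutdown-message bit, each new epoch's feature vector is classified locally, so the gateway feedback is no longer needed, matching the design above.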
The Python suite communicates with the transport layer congestion mechanism via the action functions (Section 4.3.2) to tune the maximum congestion window damp based on the network dynamics in real time. The current and new maximum congestion window sizes are accessible and tunable via socket functions. In our approach, each CMT-SCTP source is equipped with a learning agent (the Python suite). The learning agent monitors the transport layer dynamics in real time and determines the state of the MDP in each decision epoch. Based on the current state of the system, the learning agent decides to take an action by changing the CMT-SCTP maximum window size. To find the optimum maximum window size for CMT-SCTP, the learning agent uses a reward function, which is also implemented in the Python suite. As such, the whole agent sits on top of the kernel and communicates with the kernel regarding the necessary actions. We chose to implement the Q-learning agent on top of the kernel to facilitate the integration of Q-learning SCTP into standard SCTP on any machine. The current design does not require any changes to the kernel, which normally ships with the manufacturer's settings; instead, it can be added on top of the manufacturer's settings on any machine.

We used the same scenarios as in Section 4.2 to evaluate the performance of our learning mechanism. We collected 1000 data points as our training set to train the Bayes classifier and a test set of 1000 points to check the sanity of our classifier. The prediction accuracy of the Bayes classifier is 79%.

To evaluate the performance of our distributed fairness mechanism, we collected data over the scenarios presented in Table 4.2. Moreover, we compared the performance against the standard CMT-SCTP and CMT/RP-SCTP in scenario 9.

scenario | description
1        | TCP on node A, standard SCTP on node B
2        | TCP on node A, standard CMT-SCTP on node B
3        | SCTP on node A, standard SCTP on node B
4        | SCTP on node A, standard CMT-SCTP on node B
5        | TCP on node A, Q-learning SCTP on node B
6        | TCP on node A, Q-learning CMT-SCTP on node B
7        | SCTP on node A, Q-learning SCTP on node B
8        | SCTP on node A, Q-learning CMT-SCTP on node B
9        | SCTP on node A, CMT/RP-SCTP on node B

Table 4.2: Evaluation scenarios

To investigate the effectiveness of our proposal on the standard SCTP, we implemented the Q-learning fairness mechanism in single-homed SCTP as well and measured the performance of Q-learning SCTP against standard SCTP and TCP in scenarios 5 and 7. The measurement results from scenarios 5 and 7 demonstrate that our proposed fairness mechanism can be adapted to other transport layer protocols.

Figure 4.5: Jain's fairness index of Q-learning SCTP and CMT-SCTP in comparison with standard SCTP and CMT-SCTP in the different scenarios of Table 4.2.

Figure 4.5 shows that the Q-learning mechanism boosts the fairness index drastically; i.e., the Jain's fairness index of the network rises very close to 1 in the scenarios in which Q-learning CMT-SCTP or Q-learning SCTP is used on node B (the gray bars, scenarios 5, 6, 7, and 8). The dynamic damp on the congestion window size does not interfere with the congestion control mechanism of SCTP; instead, it creates a virtual limit on the available bandwidth of the source node. Therefore, the dynamic damp controls the number of packets poured into the channel by slowing down the transport layer and provides other nodes with more opportunities for transmission.
Following a decrease in the number of packets poured into the channel by an aggressive node, nodes located farther away from the gateway start to utilize the channel more efficiently.

Figure 4.6: Average throughput (kbps) in the different scenarios of Table 4.2. Scenarios 1, 2, 3, and 4 are the standard SCTP and CMT-SCTP, while scenarios 5, 6, 7, and 8 are the Q-learning SCTP and CMT-SCTP. Scenario 9 shows the measurement results of CMT/RP-SCTP.

Figure 4.6 depicts the throughput of the flow coming from node A (bars with the white background) as compared to the flow coming from node B (bars with the gray background) in the different scenarios. The powerful effect of Q-learning CMT-SCTP in suppressing the aggressive behavior of the standard CMT-SCTP is obvious when comparing scenario 2 with 6 in Figure 4.6. Although the average throughput of CMT-SCTP on node B decreases in scenario 6, the throughput of node A increases drastically (5 times as compared to scenario 2). The same trend can be seen for scenario 4 vs. 8, where node B compromises throughput to contribute to network fairness. Both scenarios 6 and 8 reveal the effectiveness of our algorithm in a real-life wireless multi-hop setting in the presence of interference and other sources of noise.

We used our algorithm with single-homed SCTP as well to investigate the effect of our fairness mechanism on SCTP. The measurements in Figure 4.6 show that the distributed fairness mechanism is in fact effective even in single-homed SCTP (scenario 1 vs. scenario 5 and scenario 3 vs. scenario 7).

4.5 Discussion and Comparison

To propose an effective fairness solution for CMT-SCTP over wireless multi-hop networks, one has to consider the dynamic nature of the environment as an important design factor. That is, a dynamic solution that changes strategy based on the network condition is required.
Therefore, an effective solution requires two characteristics: (a) monitoring/learning the network conditions, and (b) choosing the correct strategy based on the perceived condition. Reinforcement learning methods meet the design characteristics in (a) and (b). Among the various reinforcement learning methods, Q-learning fits our needs the best, as it is a model-free technique. Q-learning can be used to find an optimal strategy-selection policy for any given finite-state Markov decision process. The above reasoning justifies our choice of the fairness solution for CMT-SCTP over wireless multi-hop settings.

The main purpose of Q-learning in our proposed mechanism is to decide which action to take in the next time slot, i.e., increase, decrease, or maintain the maximum congestion window size. Note that the best application of Q-learning is to learn action-reward functions for stationary settings, for which convergence can be proved. It is true that Q-learning can still get results in a non-stationary environment, such as wireless settings, but the Q-learning agent will take more time to become aware of the changes. Because the network conditions vary over time, sometimes rapidly, the stationarity assumption cannot always hold, which can make Q-learning less suitable for wireless networks. However, there are ways to make sure that the convergence rate stays within an acceptable range for dynamic environments. Using a learning rate of α = 1/(1+t)^0.77 brings the convergence time down to a polynomial in 1/(1−γ), where γ is the discount factor [90]. We use a learning rate of α = 1/(1+t)^0.77 to ensure that our proposal complies with the dynamic nature of the environment.

Another consideration while using Q-learning for any scenario is the computational overhead.
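As a quick illustration, the polynomial learning-rate schedule α = 1/(1+t)^0.77 discussed above can be sketched in Python (the function name is ours; 0.77 is the exponent used in the text):

```python
def learning_rate(t: int, omega: float = 0.77) -> float:
    """Polynomial learning-rate schedule: alpha_t = 1 / (1 + t)^omega.

    With omega in (0.5, 1], the sequence satisfies the standard stochastic
    approximation conditions (sum of alpha_t diverges, sum of alpha_t^2
    converges), so Q-learning still converges while decaying more slowly
    than a 1/t schedule, which keeps the agent responsive to recent
    observations in a time-varying wireless environment.
    """
    return 1.0 / (1.0 + t) ** omega

# The schedule starts at 1 and decays monotonically.
alphas = [learning_rate(t) for t in range(5)]
```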
Assuming that in a specific scenario action a_j alleviates the aggressive behavior of a specific flow while keeping the throughput at its maximum possible, and τ iterations are needed before the algorithm converges, then the overhead of the actions to the learning node is:

Overhead = (1/2) Σ_{t=1}^{τ} Σ_{i=1, i≠j}^{3} P_i(t) O(a_i)     (4.3)

where P_i(t) is the probability of choosing action i at iteration t and depends on the values in the Q matrix and the reward function, O(a_i) is the overhead of performing action a_i, and τ is the convergence time. Based on [90], in a Q-learning scenario with a polynomial learning rate, the convergence time depends on the covering time L. The covering time indicates the number of iterations needed to visit all action-state pairs with probability of at least 0.5 starting from any pair. The convergence time τ depends on the covering time with the order of Ω(L^(2+1/ω) + L^(1/(1−ω))), with the smallest amount at ω = 0.77. In our experiments, the covering time is tractable as we have only 4 states and in each state we have 3 actions. Therefore, the action-state pair space is as large as 81, which leads to a small covering time and a small convergence time τ. A small convergence time τ decreases the accumulation of terms in equation (4.3) and thus leads to a small overhead.

One of the great advantages of our proposal is the ability to support different levels of service. Our proposal can offer flexibility to service providers for setting a minimum or a maximum for the dynamic damp, and to create different levels of QoS for customers based on their demand.

Figure 4.7: Average throughput of flows with different levels of QoS using the Q-learning fairness mechanism. (a) throughputs (kbps) of flows A and B under standard CMT-SCTP, Q-learning CMT-SCTP with no bias, and Q-learning CMT-SCTP with bias; (b) the corresponding Jain's fairness index.
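Equation (4.3) above is straightforward to evaluate numerically; a hedged sketch with purely illustrative probabilities and per-action costs (function and variable names are ours):

```python
def expected_overhead(P, O, tau, j):
    """Numerically evaluate equation (4.3):
        Overhead = 1/2 * sum_{t=1..tau} sum_{i != j} P_i(t) * O(a_i)

    P:   function P(t, i) -> probability of choosing action i at iteration t
    O:   dict mapping action index -> cost O(a_i) of performing that action
    j:   index of the desired (aggressiveness-relieving) action, excluded
         from the inner sum since taking it is not counted as overhead
    tau: number of iterations until convergence
    """
    total = 0.0
    for t in range(1, tau + 1):
        for i in O:
            if i != j:
                total += P(t, i) * O[i]
    return 0.5 * total

# Toy numbers: three actions chosen uniformly, unit cost per non-optimal
# action, convergence after tau = 10 iterations.
cost = {1: 1.0, 2: 1.0, 3: 1.0}
uniform = lambda t, i: 1.0 / 3.0
overhead = expected_overhead(uniform, cost, tau=10, j=3)  # 0.5 * 10 * 2 * (1/3)
```

A small convergence time τ truncates the outer sum early, which is why the tractable covering time noted in the text translates directly into low overhead.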
To show the flexibility of our Q-learning fairness mechanism in offering different levels of QoS to customers who are willing to pay more to gain a bigger share of the bandwidth, we set the maximum congestion window size of node B in favor of node A and evaluate the performance of our proposed mechanism by measuring the average throughput. The result confirms our claim that the Q-learning mechanism is capable of providing different levels of QoS and can be biased towards different flows. In Figure 4.7, the middle bar shows the result of applying the Q-learning mechanism without any bias towards any node, the first bar from the left shows the measurement results of using no fairness mechanism, and the third bar from the left reports the effect of a biased Q-learning mechanism. Tuning the coefficients in the reward function and also assigning limits to the dynamic damp in the congestion control mechanism give the provider the flexibility of offering different amounts of bandwidth share. Figure 4.7 shows that even though the Q-learning agent suppresses source B in favor of node A, Jain's fairness index remains in the desirable range.

A brief comparison of the available CMT-SCTP flavors is provided in Table 4.3. Network-centric fairness is measured using the Jain's fairness index in Eq. (5.1), and link-centric fairness is a local measure of fairness over a bottleneck link. Table 4.3 summarizes the comparison of Q-learning CMT-SCTP with the other available mechanisms.

Table 4.3: Brief comparison of different flavors of CMT-SCTP
  CMT-SCTP flavor   | link-centric fairness                 | network-centric fairness
  CMT-QA [75]       | not fair in loaded wireless scenarios | not explored
  Q-learning CMT    | yes                                   | yes
  CMT/RP-SCTP [18]  | yes                                   | not explored
Although CMT/RP-SCTP increases the fairness of CMT-SCTP over a shared bottleneck, the results of the study in [18] were collected in a highly controlled simulated environment and over a very simple topology, offering no insight into the behavior of CMT-SCTP in more realistic settings or wireless networks. Besides, the algorithm gives the service provider no say on bandwidth allocation, which is crucial when dealing with scenarios in which the customer pays more to access a higher share of bandwidth. As stated in Table 4.3, Q-learning CMT-SCTP offers more in terms of both fairness and throughput. Q-learning CMT-SCTP only needs to be implemented at the sender side and does not require any changes in the infrastructure. Further, it gives the provider the ability to offer different levels of QoS to customers based on their demand, making it a lucrative candidate for the transport layer of multi-homed devices.

4.6 Conclusion

We investigated the fairness behavior of CMT-SCTP against non-CMT flows (both TCP and SCTP) coming from nodes farther away from the gateway over a testbed in an office-area network. Our measurements showed that CMT-SCTP is highly aggressive towards single-homed flows coming from farther away hops. We proposed a distributed fairness mechanism that uses Q-learning to tune the maximum congestion window size of the CMT-SCTP association based on transport layer statistics to alleviate the aggressive behavior. We implemented our mechanism in FreeBSD on Raspberry Pi and tested its effectiveness in a testbed of Raspberry Pis. Our measurements showed that the proposed Q-learning fairness mechanism improves the Jain fairness index by up to 30%, which corresponds to a significant increase in average throughput for a starving flow.
Moreover, we used our Q-learning mechanism on single-homed SCTP to investigate whether the fairness mechanism has any effect on the fairness of single-homed SCTP against other flows coming from farther away hops. Our measurements showed that the Q-learning mechanism effectively increases the fairness of single-homed SCTP towards farther away nodes. The proposed mechanism needs to be implemented only at the sender side and does not require any changes in the infrastructure. Moreover, it gives the provider the ability to offer different levels of QoS to customers based on their demand, which makes it a lucrative candidate.

Chapter 5

How Network Monitoring and Reinforcement Learning Can Improve TCP Fairness in Wireless Multi-Hop Networks

In this chapter, we aim to adapt the fairness mechanism designed in Chapter 4 to TCP. Our proposal uses a distributed mechanism to monitor network anomalies in resource allocation and tune TCP parameters accordingly. Each TCP source models the state of the system as an MDP and uses Q-learning to learn the transition probabilities of the proposed MDP based on the observed variables. To maximize TCP fairness, each node hosting a TCP source takes actions according to the recommendations of the Q-learning algorithm and adjusts TCP parameters autonomously. Our algorithm preserves the autonomy of each node in the decision-making process and does not require a central control mechanism or control message exchange among nodes. Unlike existing machine learning solutions, i.e., TCP ex Machina, our proposal is compatible with the computational capacity of the current infrastructure. We call our approach Q-learning TCP. The contributions of this chapter can be summarized as:

• Modeling the multi-hop network as an MDP in each TCP source and using the Q-learning algorithm to monitor and learn the dynamics of the network and the proposed MDP.

• Finding a cross-layer, distributed, and scalable solution for TCP fairness over multi-hop networks with no extra overhead.
Our proposal enhances TCP fairness over multi-hop networks in favor of flows traversing a larger number of hops with negligible impact on flows with a smaller number of hops via changing TCP parameters cooperatively based on the recommendation of the Q-learning algorithm.

• Enhancing TCP fairness by a factor of 10% to 20% without any feedback messaging and no incurred overhead to the medium.

The rest of this chapter is organized as follows: Section 5.1 is a detailed description of our algorithm, followed by the implementation specifics in Section 5.2. Performance evaluation of our proposed algorithm is presented in Section 5.3 via extensive simulation and testbed experimentation. A discussion on the implications of Q-learning TCP along with a comparison with available fairness techniques is presented in Section 5.4. Section 5.5 concludes this chapter.

5.1 Q-learning TCP Architecture

In our approach, each TCP source is equipped with a Q-learning agent that sees the world as an MDP. In each decision epoch, the agent receives network statistics in the form of state space variables. The agent uses the received information to determine the state of the MDP; then, the agent takes an action via fine-tuning TCP parameters. In the next decision epoch, the environment responds with the new state. The learning agent uses a reward function to receive feedback on the consequences of the taken action on TCP fairness. The learning process continues as long as the network is up and running; it basically works as a memory that can be adjusted according to network changes.

In the following sub-sections, we present a detailed overview of the key factors of Q-learning TCP, including states, action set, reward function, and transition probabilities.

5.1.1 States

The state space of our proposed Q-learning algorithm in each TCP source is of the form S = (fairness index, aggressiveness index).
To measure the fairness index in each decision interval, the agent uses Jain's fairness index as in equation (5.1) [111]:

J_k^t(x_1, x_2, ..., x_n) = (Σ_{i=1}^{n} x_i)^2 / (n × Σ_{i=1}^{n} x_i^2),     (5.1)

where x_i is the data rate of flow i, n is the number of flows that are originated from or forwarded by node k, and J_k^t is the Jain's fairness index at node k at decision epoch t. The Jain's fairness index is a continuous number that varies between 0 and 1, with 0 the worst fairness index and 1 the absolute best fairness condition. To tailor the Jain's fairness index to a discrete state space, we divide the [0, 1] interval into p sub-intervals [0, f_1], (f_1, f_2], ..., (f_{p−1}, 1]. Instead of using a continuous fairness index, we quantize it to have a manageable number of states. The number of states is important for the convergence of the learning algorithm.

The aggressiveness of each TCP source in each decision epoch is measured as in equation (5.2):

G(i) = (number of packets originated from node i) / (total number of packets forwarded by node i).     (5.2)

The aggressiveness index is a continuous quantity that varies between 0 and 1. To tailor the aggressiveness index to a discrete state space, we divide the [0, 1] interval into q sub-intervals [0, g_1], (g_1, g_2], ..., (g_{q−1}, 1]. As such, the state space of the MDP is of the form of equation (5.3), with size p × q.

S = {(f_t, g_t) | f_t ∈ {0, f_1, ..., f_p} and g_t ∈ {0, g_1, ..., g_q}}     (5.3)

Choosing a suitable value for p and q is a critical task. A small p or q shrinks the state space and positively affects the convergence rate; however, larger quantization intervals hide the changes in the system from the reward function. The reward function uses the state of the system as a decision criterion to reward or penalize the latest (state, action) pair.
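A minimal sketch of the state computation described above (the quantization thresholds follow the values used later in the evaluation; function names are ours):

```python
from bisect import bisect_right

def jain_index(rates):
    """Jain's fairness index, equation (5.1): (sum x_i)^2 / (n * sum x_i^2)."""
    n = len(rates)
    return sum(rates) ** 2 / (n * sum(x * x for x in rates))

def aggressiveness(originated, forwarded_total):
    """Equation (5.2): the node's own packets as a share of all packets it sends."""
    return originated / forwarded_total

def quantize(value, thresholds):
    """Map a value in [0, 1] onto a discrete level; thresholds are the
    left-closed interval boundaries, e.g. (0.5, 0.8, 0.95) yields the levels
    [0, 0.5), [0.5, 0.8), [0.8, 0.95), [0.95, 1]."""
    return bisect_right(thresholds, value)

# Example state: four equal-rate flows give a fairness index of 1, and a node
# originating 40 of the 100 packets it forwards has aggressiveness 0.4.
f = quantize(jain_index([100, 100, 100, 100]), (0.5, 0.8, 0.95))
g = quantize(aggressiveness(40, 100), (0.5, 0.8))
state = (f, g)
```

Both indices are computed purely from per-node packet counters, which is what lets each node form its state without any feedback from its neighbors.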
Our extensive simulations and testbed experimentation show that choosing 3 ≤ q ≤ 4 and 3 ≤ p ≤ 4 provides Q-learning TCP with enough states to significantly increase the fairness index of the network while maintaining the convergence rate. The fairness index is obviously a good indicator of how well Q-learning TCP is doing in terms of enhancing fairness. However, the fairness index alone is not enough to make a decision on the fairness of the TCP source. Therefore, we define the aggressiveness index to indicate whether a TCP source is fair to other TCP sources or not. The aggressiveness index calculates the share of the TCP source located on a specific node in the outgoing throughput of the node. A high aggressiveness index along with a low fairness index in a TCP source triggers the learning agent to make changes to the TCP parameters and force the TCP source to be more hospitable towards other flows. The desirable state for the learning agent is the state with a high fairness index and an aggressiveness index of T_fair. T_fair is the fairness threshold of the node. The fairness threshold depends on the number of flows originating from and passing through the node and the priority of each flow. As an example, if 3 flows are passing via a node and 2 other flows are originating from the node, assuming the same priority for all 5 flows, the fair share of the node from the network resources, and thus the fairness threshold, is 2/5. Any aggressiveness index above the fairness threshold is an indication of unfair resource allocation by the TCP source on the node.

Both the fairness index and the aggressiveness index are calculated based on the number of packets received or transmitted in each decision epoch in the TCP source. As such, both variables are accessible in each node and there is no need to get feedback from other nodes. Let us emphasize that the objective of our MDP is to enhance TCP fairness cooperatively and to accomplish this objective via moving towards the goal state; therefore, choosing the fairness index and aggressiveness index as the state variables of our MDP is justified.

5.1.2 Action set

The Q-learning agent uses the action set to constrain TCP's aggressive behavior. The findings of [114] suggest that the maximum TCP congestion window size plays a crucial role in the aggressive behavior of TCP. However, putting a static clamp on the TCP congestion window size might cause under-utilization of the network resources. Therefore, to avoid any under/over-utilization of network resources, we use a dynamic clamp on the TCP congestion window size. The learning agent dynamically changes the maximum congestion window size of TCP, without interfering with the congestion control mechanism, via the action set. As such, TCP uses the standard congestion control mechanism along with a dynamic clamp on the congestion window to limit any aggressive window increase. The agent uses the action functions in Algorithm 5.1 to change TCP parameters in each decision epoch.

Algorithm 5.1 Q-learning action set
1: if action = decrease then
2:   δ = 50%
3:   decrease the TCP maximum window size by δ
4: else if action = increase then
5:   increase the TCP maximum window size by δ
6: else
7:   no change to maximum congestion window size
8: end if

Q-learning TCP does not interfere with the congestion control mechanism of TCP; it only changes the maximum TCP window size, which can be decreased down to the slow start threshold. In our algorithm, we use δ as 50% of the latest increase/decrease of the current TCP maximum window size. The above action set, which resembles a binary search, serves perfectly in a dynamic environment. At the beginning of the learning process, the maximum congestion window size is set either to 65536 bytes [115] or the amount allowed by the system.
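The binary-search-style clamp adjustment of Algorithm 5.1 can be sketched as follows. This is a simplified model: the class and attribute names are ours, and the actual implementation tunes the kernel's TCP maximum window via socket/system calls rather than a plain attribute:

```python
class WindowClamp:
    """Dynamic clamp on the TCP maximum congestion window (bytes)."""

    def __init__(self, max_window=65536, ssthresh=8192):
        self.max_window = float(max_window)  # initial clamp (65536 bytes [115])
        self.ssthresh = float(ssthresh)      # clamp never drops below slow start threshold
        self.last_step = max_window / 2.0    # first step halves the window

    def apply(self, action):
        if action == "decrease":
            self.max_window = max(self.max_window - self.last_step, self.ssthresh)
            self.last_step *= 0.5            # delta = 50% of the latest change
        elif action == "increase":
            self.max_window += self.last_step
            self.last_step *= 0.5
        # "stay": leave the clamp unchanged

clamp = WindowClamp()
clamp.apply("decrease")  # halves the search space, like a binary search step
```

Because each step is half the previous one, the agent rapidly narrows in on a clamp value while the reward function steers the direction of each step.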
The learning agent starts searching for the optimized maximum TCP window size by halving the current maximum TCP window size. As such, the search space decreases by half. The agent chooses either half randomly due to the random nature of the search algorithm at the beginning of the learning process. The agent then sweeps the search space using the decrease( ), increase( ), and stay( ) functions and uses the reward function as the guide to the optimum state. During the learning process, the agent develops a memory and, after convergence, uses this memory as a series of policies to handle any changes in the dynamics of the system. As a result, in any state, the learning agent knows how to find its way to the optimum clamp. The Q-learning agent converges when all the available action series and their associated rewards are discovered.

5.1.3 Transition probabilities

In our proposal, the state space is of the form S = (fairness index, aggressiveness index). Both the fairness and aggressiveness indices depend on the number of packets transmitted or received in the most recent decision epoch. Therefore, any state transition only depends on the latest state of the system. As such, all states are Markovian. We use Q-learning to obtain the transition probabilities of the proposed MDP.

5.1.4 Reward function

According to [93], an efficient reward function should have the following conditions:

• the reward function has a uniform distribution for states far from the goal state;

• the reward function points in the direction of the greatest rate of increase of reward in a zone around the goal state.

The reward function that we use for our model is of the form of equation (5.4), which is a summation of a fairness reward function and a network utilization reward function:

R(s, a) = β_u e^(−d(u_s, u_s*)^2 / (2σ_u^2)) + β_f e^(−d(f_s, f_s*)^2 / (2σ_f^2))     (5.4)

u_s and u_s* are the network utilization factors in states s and s*.
The network utilization factor is the accumulated throughput of all the incoming and outgoing flows in a node. f_s and f_s* are the Jain's fairness indices in states s and s*. s* is the goal state.

There is always a trade-off between fairness and the throughput/efficiency of the network. In a highly effective network, the utility function is focused on maximizing the aggregated throughput of the network, which might not be optimal in terms of fairness. In our scheme, we optimize TCP performance based on both fairness and throughput. To create a balance between the two, we use the aggressiveness index (throughput control factor) and the fairness index (fairness control factor). The aggressive nodes have to compromise throughput in favor of starved nodes to increase the fairness index of the network. In our mechanism, the reward function includes both fairness and throughput. One has to keep in mind that optimizing TCP performance based on both fairness and throughput results in some compromise on throughput to increase fairness. Moreover, quality of service (QoS) can be added to our scheme via the reward function. Putting different weights on either throughput or fairness, by changing the coefficients in the reward function (β_u, β_f, σ_u, σ_f), can result in different levels of QoS.

5.2 Integration of Q-learning Agent and TCP

As depicted in Algorithm 5.2, the MAC layer collects the number of sent packets of each flow passing through the node and sends them to the learning agent, which is located in the TCP source, every t seconds (t is the length of the decision epoch). The learning agent uses the latest information to calculate the aggressiveness and fairness indices and determines the current state of the system. The reward function module uses the current state of the system, the previous state of the system, and the latest action to calculate the immediate reward of the agent based on the latest (state, action) pair.
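The Gaussian reward of equation (5.4) is straightforward to implement; a hedged sketch follows (the coefficient values are illustrative defaults, not the tuning used in the experiments):

```python
import math

def reward(u_s, f_s, u_goal, f_goal,
           beta_u=1.0, beta_f=1.0, sigma_u=0.2, sigma_f=0.2):
    """Equation (5.4): the sum of two Gaussian terms peaking at the goal state.

    u_s, f_s: current network utilization factor and Jain's fairness index.
    u_goal, f_goal: their values in the goal state s*.
    beta_u, beta_f, sigma_u, sigma_f: the QoS weighting coefficients.
    """
    r_u = beta_u * math.exp(-((u_s - u_goal) ** 2) / (2.0 * sigma_u ** 2))
    r_f = beta_f * math.exp(-((f_s - f_goal) ** 2) / (2.0 * sigma_f ** 2))
    return r_u + r_f

# The reward is maximal (beta_u + beta_f) exactly at the goal state and decays
# smoothly away from it, giving the agent a gradient toward s*.
```

Scaling β_u relative to β_f shifts the optimization toward throughput or fairness, which is precisely the QoS knob described above.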
When the agent obtains the immediate reward, it updates the Q-matrix based on the Bellman equation (see Section 2.2.3). Finally, the agent chooses an action based on the recent Q-matrix and informs TCP to adjust its maximum window size based on the chosen action. We use a delayed reward system because the agent has to wait for the system to settle down in a specific state to figure out the instant reward.

Algorithm 5.2 Q-learning TCP algorithm of node i
1: action set = {decrease, increase, stay}
2: absorbing state = (f_goal, g_goal)
3: while not in goal state do
4:   for every t seconds do
5:     get the number of sent packets by node i
6:     for each flow being forwarded by node i do
7:       get the number of sent packets
8:     end for
9:     calculate the fairness index f_t, based on (5.1)
10:    calculate the aggressiveness index g_t, based on (5.2)
11:    determine the state according to (5.3)
12:    calculate the reward r_t based on (5.4)
13:    update the Q matrix according to (2.1)
14:    choose the action with maximum Q value
15:    take the action (inform TCP)
16:  end for
17: end while

5.3 Performance Evaluations

Q-learning TCP is specifically aimed at the wireless mesh network environment; as such, all the simulations and the testbed are designed to present the interaction of TCP with the unique characteristics of the wireless mesh setting. In this section, we present the numerical results of our proposed method and demonstrate the effectiveness of our fairness mechanism as compared to TCP, TCP-AP, and TCP ex Machina.

We first evaluate the performance of Q-learning TCP in a multi-hop setting in which all the nodes are located in the same wireless domain and participate in the optimization process in Section 5.3.1.
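The per-epoch Q-matrix update in line 13 of Algorithm 5.2 is the standard Q-learning (Bellman) update; a minimal sketch, with state/action encodings and parameter values of our own choosing:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha, gamma=0.9):
    """One Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

# 12 states (p * q = 4 * 3) and 3 actions (decrease, increase, stay).
Q = np.zeros((12, 3))
q_update(Q, s=0, a=1, r=1.0, s_next=5, alpha=0.5)
# Greedy action selection, as in line 14 of Algorithm 5.2:
best_action = int(Q[0].argmax())
```

In the full mechanism, alpha would follow the polynomial schedule discussed in the text, and the reward r would come from equation (5.4).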
Section 5.3.3 presents the performance of Q-learning TCP over a testbed with real data in an office environment.

5.3.1 Chain topology

The chain topology is a good scenario to evaluate the effectiveness of Q-learning TCP over wireless mesh networks (WMNs), because it creates a competitive environment for flows with different numbers of hops, which is the main feature of every wireless mesh topology. We use Jain's fairness index and the flow throughput as the comparison metrics. We set up a multi-hop network of 3 nodes located in a chain topology with 150 meters spacing between neighboring nodes, as shown in Figure 5.1. Each node is equipped with an 802.11b network interface.

Figure 5.1: Chain topology - 3 nodes, 2 flows

Nodes 1 and 2 are equipped with an FTP source that transmits packets to node 3. The transmission range is around 250 meters, and the carrier sensing range is around 550 meters for each wireless interface. The data rate for IEEE 802.11b is 2 Mbps and the RTS/CTS mechanism is on. Each TCP packet carries 1024 bytes of application data.

For the Q-learning scheme, we had to determine the state space of the system and the reward function. As mentioned in Section 5.1, the fairness index and aggressiveness index have values between 0 and 1. For the fairness index, we divided the [0, 1] interval into four sub-intervals, f = {[0, 0.5), [0.5, 0.8), [0.8, 0.95), [0.95, 1]}. For the aggressiveness index, we split the [0, 1] interval into three sub-intervals, g = {[0, 0.5), [0.5, 0.8), [0.8, 1]}. We can split both the fairness index and the aggressiveness index into smaller sub-intervals and provide more control for the learning agent to tune TCP parameters for more desirable results.
Although having smaller intervals facilitates the learning agent's decision making, an increase in the number of intervals increases the state space size and slows down the convergence rate of the learning process. After the quantization, the state space reduces to (5.5):

s = {(f, g) | f = {0, 0.5, 0.8, 0.95}, g = {0, 0.5, 0.8}}     (5.5)

The above state space provides each node in the network with a realistic understanding of the resource allocation of the neighboring nodes. The immediate reward function that we used for our Q-learning TCP is of the form of equation (5.4). We ran each simulation 30 times to obtain 95% confidence intervals. The length of each simulation is 1000 seconds.

Figure 5.2: Throughput changes of the flows in the chain topology of Figure 5.1. Each cycle is equivalent to 40 seconds.

Figure 5.2 shows the throughput changes during the learning process. At the beginning of the learning process, the Q values are all zero; therefore, the agent starts a systematic search to determine the effect of each action on the state of the system. Because we chose a Gaussian reward function, the Q-learning agent gradually moves to states adjacent to the goal state. The systematic search behavior persists throughout the simulation, but the range of the search circle diminishes as the learning process converges to the goal state. As depicted in Figure 5.2, at the beginning of the learning process, the throughput of both flows fluctuates. As time goes by, the fluctuation of both flows dwindles to a negligible amount. As the learning process progresses, the agent visits each state a sufficient number of times to find the best policy (the best maximum TCP window size) to maximize its acquired rewards.
Eventually the learning agent in node 2 converges to s = (0.95, 0.5), where 0.95 is the fairness index and 0.5 is the aggressiveness index.

Figure 5.3: Learning rate (proof of convergence) in the topology of Figure 5.1. Each learning cycle consists of 40 seconds.

To investigate the convergence of the learning algorithm, we calculated the average learning rate, as shown in Figure 5.3. The average learning rate of the process is calculated as 1/E{n(s, a)}, where E{n(s, a)} is the average number of times each (state, action) pair is visited by the agent. According to [89], a decreasing average learning rate is an indication of the Q-learning convergence process. Figure 5.3 shows that the learning rate approaches 0, which guarantees the convergence of the learning algorithm.

Figure 5.4 shows the changes of the network utilization factor as the learning progresses. As depicted in Figure 5.4, the network utility factor fluctuates widely at the beginning of the learning process. However, the network utility factor settles to a value within the desirable range when the learning process converges.

Figure 5.4: Network utilization factor (kbytes/sec) for the topology shown in Figure 5.1. Each learning cycle consists of 40 seconds.

As the learning process converges to the desirable state, the Jain's fairness index of the network settles and the fluctuations become negligible, as demonstrated in Figure 5.5.

We compared our scheme with TCP-AP [51] and TCP ex Machina [68]. We implemented TCP-AP based on the algorithm in [51]. In TCP-AP, the sending rate is limited based on the changes in RTT.

Figure 5.6 graphs the performance of TCP-AP, TCP, Q-learning TCP, and TCP ex Machina. TCP-AP performs closely to our learning method in terms of the fairness index of the network.
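The convergence diagnostic used above, the average learning rate 1/E{n(s, a)}, only requires counting state-action visits; a small sketch, with names of our own choosing:

```python
from collections import Counter

def average_learning_rate(visits):
    """1 / E{n(s,a)}: reciprocal of the mean visit count over all
    (state, action) pairs observed so far. As each pair is revisited,
    the mean count grows and the rate decays toward 0, which is the
    convergence indicator plotted in Figures 5.3 and 5.9."""
    mean_visits = sum(visits.values()) / len(visits)
    return 1.0 / mean_visits

# Simulated visit log: each entry is a (state, action) pair chosen by the agent.
visits = Counter([(0, "stay"), (0, "decrease"), (1, "increase"), (0, "stay")])
rate = average_learning_rate(visits)  # 3 distinct pairs, 4 visits -> 1/(4/3) = 0.75
```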
However, the fair resource allocation in TCP-AP comes at the cost of network utility. The drop in the network utility factor in TCP-AP is caused by the frequent pauses in data transmission.

Figure 5.5: Jain's fairness index in the chain topology of Figure 5.1. Each cycle is equivalent to 40 seconds.

TCP ex Machina, on the other hand, performs similarly to standard TCP; the reason behind this behavior of TCP ex Machina over the multi-hop wireless setting is that the learning mechanism in TCP ex Machina is optimized for wired settings, and the re-learning process requires a great deal of computational resources, which is almost impossible to carry out on current wireless nodes within a reasonable amount of time. Table 5.1 summarizes the comparison of Q-learning TCP with TCP-AP and TCP ex Machina.

Table 5.1: Network metric parameters for different TCP variations of Figure 5.1
  TCP variation   | Jain's fairness index | Network utility
  Legacy TCP      | 89%                   | 509 Kbps
  TCP-AP          | 99%                   | 83 Kbps
  TCP ex Machina  | 84%                   | 694 Kbps
  Q-learning TCP  | 99%                   | 468 Kbps

Figure 5.6: Performance comparison between legacy TCP, Q-learning TCP, TCP-AP, and TCP ex Machina

5.3.2 Larger scale WMN

To evaluate our proposed algorithm further with a non-static traffic pattern and a random mesh topology, we generated a random topology in ns2, as illustrated in Figure 5.7. There exist 6 flows in the network; all flows are destined towards node 0 and originate from nodes 8, 3, 5, 7, 1, and 4. To study the performance of Q-learning TCP under non-static data traffic, we programmed all the sources to generate data intermittently and for intervals of random length.
Figure 5.7: Random network topology - 10 nodes.

As depicted in Figure 5.8, the resource allocation in legacy TCP is severely unfair: nodes 8, 1, and 4 are starved while nodes 3 and 7 aggressively consume the bandwidth. However, Q-learning TCP pushes node 3 to decrease its network share and provide more transmission opportunities for other nodes. The learning agent of node 5 also detects an unfair resource allocation and forces node 5 to slow down its data sending rate. Node 8, whose congestion window had been prevented from growing, starts to increase its sending rate as node 5 decreases its own. Nodes 3 and 5 thus cooperatively provide other nodes with more sending opportunities by decreasing their sending rates, which results in an increase in node 8's sending rate. On the other side of the network, node 7 consumes a bigger portion of the bandwidth than node 1; therefore, Q-learning TCP forces node 7 to decrease its sending rate and provide other nodes with more sending opportunities. A comparison of TCP, TCP-AP, TCP ex Machina, and Q-learning TCP is presented in Table 5.2.

Figure 5.8: TCP flow throughput and network utility in kbytes/sec for the scenario of Figure 5.7.

TCP-AP outperforms both TCP and Q-learning TCP in fair resource allocation in the scenario of Figure 5.7. However, the high fairness index of TCP-AP comes at the cost of a drastic decrease in network utilization, by a factor of 50%. The reasons behind this extreme decrease in the network utility factor of TCP-AP are the over-estimation of RTTs and the excessive overhead caused by feedback messages.
TCP ex Machina performs as poorly as legacy TCP and causes one of the flows to starve completely.

To investigate the convergence of the learning process, we graphed the learning rate for nodes 3, 5, and 7 in Figure 5.9. The agent learning rate for all three nodes converges to 0 as the learning process progresses. The learning rate of node 3 is higher than that of the two other nodes because node 3 has to make more changes to get to the optimum state. More changes translate into more state transitions and consequently a higher convergence rate.

Figure 5.9: Learning rate of nodes 3, 5, and 7 in the scenario of Figure 5.7.

Table 5.2: Network metrics for different TCP variations in the scenario of Figure 5.7

TCP variation   | Jain's fairness index | Network utility
Legacy TCP      | 75%                   | 430 Kbps
TCP-AP          | 97%                   | 240 Kbps
TCP ex Machina  | 71%                   | 436 Kbps
Q-learning TCP  | 83%                   | 403 Kbps

5.3.3 Testbed

To evaluate the performance of Q-learning TCP, we use the testbed setup of Sections 2.3 and 2.3.2 (Figure 5.10). The data paths are as follows:

• flow A: A → R1 → B → R2 → C
• flow B: B → R2 → C

We implemented the Q-learning mechanism as a Python suite based on Algorithm 4.1 in node B and node A. The Q-learning mechanism communicates with the transport layer via the action functions of Section 5.1.2 in order to tune the TCP maximum congestion window size, while all the users on the network continue their day-to-day activity.

Figure 5.10: Testbed topology.

To generate a real-world traffic profile over the testbed, we use the findings of Brownlee et al. [116]. According to the measurements of [116] over two real-life networks, Internet streams can be classified into two groups: dragonflies and tortoises.
Dragonflies are sessions with a lifetime of less than 2 seconds, while tortoises are sessions with long lifetimes, normally over 15 minutes. The authors of [116] showed that although the lifetime of sessions varies, the shape of the lifetime distribution is the same for both tortoises and dragonflies, and it does not experience rapid changes over time. The findings of [116] are critical to the design of Q-learning TCP and its interaction with the real world; the fact that the distribution of stream lifetimes does not change rapidly over time fits well with the characteristics of Q-learning. Based on [116], 45% of the streams have a lifetime of less than 2 seconds, 53% have a lifetime between 2 seconds and 15 minutes, and the rest have lifetimes of more than 15 minutes (usually on the order of hours). We use these findings to generate traffic on nodes A and B along with the other existing day-to-day traffic within the office network.

Table 5.3 shows the result of our measurements over the testbed of Figure 5.10. Q-learning TCP outperforms TCP Reno on node A by a big margin: an 85% increase in the average throughput. Node B, which acts as both a source and a forwarder node, has to compromise its throughput to enhance the fairness of the network.
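The dragonfly/tortoise traffic mix described above can be reproduced with a simple sampler; the sketch below follows the stated 45%/53%/2% split, while the within-class lifetime distributions are illustrative assumptions (only the class proportions are taken from [116]):

```python
import random

def sample_session_lifetime(rng):
    """Draw a session lifetime (seconds) following the stream mix of [116]:
    45% dragonflies (< 2 s), 53% mid-lived (2 s to 15 min), and the rest
    tortoises (> 15 min, on the order of hours). The uniform draws inside
    each class are illustrative choices, not taken from [116]."""
    u = rng.random()
    if u < 0.45:                                # dragonfly
        return rng.uniform(0.0, 2.0)
    if u < 0.98:                                # between 2 s and 15 min
        return rng.uniform(2.0, 900.0)
    return rng.uniform(900.0, 4 * 3600.0)       # tortoise, up to a few hours

rng = random.Random(1)
lifetimes = [sample_session_lifetime(rng) for _ in range(10000)]
frac_dragonfly = sum(t < 2.0 for t in lifetimes) / len(lifetimes)
assert abs(frac_dragonfly - 0.45) < 0.02  # class proportions match [116]
```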
The average throughput of node B decreases by 3% to accommodate an 85% boost in the throughput of node A, a drastic increase at a minimal compromise.

Table 5.3: A comparison between Q-learning TCP and TCP Reno

TCP variation   | Source | Throughput (Kbits/sec) | 95% Confidence interval (Kbits/sec)
TCP Reno        | node A | 883                    | less than 100
TCP Reno        | node B | 4003                   | less than 400
Q-learning TCP  | node A | 1549                   | less than 400
Q-learning TCP  | node B | 3877                   | less than 1000

The larger confidence interval in Q-learning TCP is caused by the changes of the maximum congestion window size according to the optimal policy of Q-learning during the discovery phase.

5.4 Discussion and Comparison

Compared with other well-known fairness methods [117][48][51], Q-learning TCP does not incur any overhead to the network, at the expense of extra computation at each node. The focus of LRED [117] is on enhancing TCP throughput over WMN, and fairness enhancement is not one of its design objectives. However, the pacing mechanism of LRED enhances fairness as a side-effect, at the cost of excessive transmission delay. The extra transmission delay in the LRED pacing mechanism alleviates the hidden terminal issue; however, the imposed delay is fixed in size and does not adjust to the dynamic nature of the WMN. NRED [48] uses a dropping mechanism to decrease the competition and provide more resources for the starved flows. However, the dropping probabilities are calculated and broadcasted constantly, thus incurring a heavy overhead on the shared medium. TCP-AP uses the received signal strength indicator (RSSI) to infer information regarding the hidden terminal issue; however, RSSI causes an over-estimation of transmission delay in its pacing mechanism and decreases TCP throughput drastically. In addition, TCP-AP still requires feedback messages from the neighboring nodes for hidden-terminal distance calculations.
TCP ex Machina, another comparable mechanism, requires excessive computational resources to optimize its congestion control mechanism, which is not compatible with current network infrastructure resources. Our method achieves the fairness enhancement of TCP at the cost of a reasonable amount of extra computation for the machine learning approach in each node. In a WMN, where the shared medium is extremely valuable, flooding the network with excessive feedback messages, or under-utilizing the links with excessive non-dynamic transmission delays, is not a cost-efficient way to enhance fairness. Q-learning TCP instead trades the computational simplicity of each node for TCP fairness. In a mesh setting, since the mobility of each node is minimal, increasing the computational capacity of the nodes is not very costly. A brief comparison of LRED, NRED, TCP-AP, TCP ex Machina, and Q-learning TCP is presented in Table 5.4.

Note that the complexity of Q-learning TCP is polynomial in the number of states, and the convergence rate is tractable with a suitable state-space size, as confirmed by the simulation and testbed experiments.

Our findings confirm that in a wireless multi-hop setting, each TCP source has to cooperate with others to ensure a fair share of network resources for all end-users. TCP allocates resources under the assumption of inclusive knowledge of the network topology and reliable, permanent access to the medium for all end-users. However, in a wireless mesh setting, the end-user's knowledge of the network topology is partial and access to the medium is intermittent. As such, TCP needs to collect information about other nodes to compensate for the shortcomings of the layers underneath. The learning agent provides the TCP source with insight into the existing competition for network resources from other nodes.
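To make the learning agent's role concrete, the sketch below shows the general shape of such a Q-learning loop over a damped maximum congestion window. The state discretization, action set, and reward here are simplified, hypothetical stand-ins for the fairness/aggressiveness state and reward function of Algorithm 4.1, not a faithful reimplementation:

```python
import random

class CwndDampAgent:
    """Epsilon-greedy Q-learning agent that raises or lowers a cap on the
    TCP maximum congestion window. State: coarse (fairness, aggressiveness)
    bins; reward: high when locally observed fairness is high."""

    ACTIONS = (-2, 0, 2)  # shrink / keep / grow the cwnd cap (in segments)

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}                  # Q-table: (state, action) -> value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.max_cwnd = 32           # current cap, in segments

    def _discretize(self, fairness, aggressiveness):
        return (round(fairness, 1), round(aggressiveness, 1))

    def choose(self, state):
        if random.random() < self.epsilon:       # explore
            return random.choice(self.ACTIONS)
        return max(self.ACTIONS,                 # exploit
                   key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, a), 0.0) for a in self.ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

    def step(self, fairness, aggressiveness):
        state = self._discretize(fairness, aggressiveness)
        action = self.choose(state)
        self.max_cwnd = max(2, self.max_cwnd + action)  # never below 2 segments
        # Illustrative reward: favor fair, non-aggressive operating points.
        reward = fairness - abs(aggressiveness - 0.5)
        self.update(state, action, reward, state)
        return self.max_cwnd

agent = CwndDampAgent()
for _ in range(100):
    cap = agent.step(fairness=0.9, aggressiveness=0.6)
assert cap >= 2
```

In the thesis's design the cap computed this way is handed to the TCP sender through the action functions, while the observations come from cross-layer MAC monitoring.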
The insight provided by the learning agent compensates for the unfair behavior of the MAC and TCP in the wireless multi-hop environment by suppressing the aggressive response of TCP. Note that Q-learning TCP inter-works well with any variation of TCP at the other end-point, because the changes to the TCP protocol stack are only on the sender side and the learning mechanism does not need any feedback from the receiver. Q-learning TCP relies only on the information collected by the learning agent.

5.5 Conclusion

We have proposed a cross-layer monitoring and learning mechanism for fair resource allocation of TCP over WMN that uses the information obtained from the MAC layer to optimize TCP parameters and enhance the end-to-end fairness within a network. The learning agent uses the local fairness index and the aggressiveness index of each node to decide if the node is starving or abusing the network resources. We have used a reward function to guide the learning agent in taking correct actions, which eventually allows us to solve the fairness problem in a distributed manner. We have compared our learning method with legacy TCP, TCP-AP, and TCP ex Machina via extensive ns2 simulations. Simulation results have demonstrated the superiority of our proposed method within wireless networks. Moreover, we have studied the performance of Q-learning TCP in a testbed. Testbed measurements have shown that Q-learning TCP can be a great candidate for a transport protocol over current wireless multi-hop networks, with minimal changes only at the TCP source.

Table 5.4: A comparison between Q-learning TCP and other well-known TCP solutions. The throughput and fairness enhancement of each TCP flavor is compared against the standard TCP.
As such, when a decrease/increase is mentioned, it is relative to standard TCP.

fairness solution    | fairness enhancement                                                   | throughput enhancement       | disadvantage
LRED [117]           | slight increase                                                        | 5% to 30% increase           | overhead caused by broadcast messages and fixed transmission delay
NRED [48]            | effective increase (Jain's fairness index of 99% in a chain topology)  | up to 12% increase           | excessive overhead caused by broadcast messages (over 60%)
TCP-AP [51]          | effective increase (Jain's fairness index of 99% in a chain topology)  | drastic decrease (up to 50%) | reliance on RSSI and excessive transmission delay
TCP ex Machina [68]  | decrease (Jain's fairness index of 84% in a chain topology)            | slight increase              | excessive learning time and computational resource requirements
Q-learning TCP       | effective increase (Jain's fairness index of 99% in a chain topology)  | slight decrease              | medium computational overhead

Chapter 6: Conclusions and Future Work

In this chapter, we summarize the results and highlight the contributions of this thesis. We also suggest several topics for future work.

6.1 Research Contributions

In the present thesis, we have identified two main challenges of CMT-SCTP over wireless multi-hop networks: receive buffer blocking and unfair resource allocation while competing with single-homed flows. We have proposed novel dynamic schemes using the RL algorithm to address both challenges in Chapters 3 and 4. We have applied our findings on fairness improvement to TCP to demonstrate the adaptability of our state-of-the-art proposal in Chapter 5.

• In Chapter 3, we proposed a dynamic network coding mechanism for CMT-SCTP to address receive buffer blocking in multipath transport data transfer in a wireless multi-hop environment. Network coding helps the transport layer eliminate the reliance of the congestion control algorithm on the packet's transmission sequence number by sending combinations of packets instead of simply forwarding them. The network coding mechanism uses redundant packets to mask the random losses of the network.
We used a Q-learning mechanism within the network coding algorithm to estimate the number of redundant packets based on the dynamics of the network. The Q-learning module in the design adjusts to the dynamic nature of the wireless environment and minimizes the number of redundant packets required to mask the random losses of the network. As such, the algorithm proves effective in alleviating receive buffer blocking and improving the throughput of CMT-SCTP. Our coded CMT-SCTP scheme outperforms the original CMT-SCTP by a high margin (up to 62%, depending on the path dissimilarities and the receive buffer size).

• In Chapter 4, we proposed a distributed fairness mechanism for CMT-SCTP over the wireless multi-hop environment to address the unfair behavior of CMT-SCTP against non-CMT flows coming from farther-away hops. We used Q-learning to model the environment as an MDP and to find the transition probabilities of the MDP, so as to take suitable actions and adjust the CMT-SCTP congestion window. Our proposed mechanism enhances the fairness behavior of CMT-SCTP over wireless multi-hop networks by constantly changing the maximum congestion window size on each sub-flow. The dynamic damp on the maximum congestion window size creates a cooperative resource share in CMT-SCTP and provides the starved flows with more transmission opportunities.

• In Chapter 5, we proposed a distributed fairness mechanism for TCP over the wireless multi-hop environment to address the unfair behavior of TCP against flows coming from farther-away hops. The algorithm uses reinforcement learning to monitor the dynamics of the network and fine-tunes TCP parameters to enhance the fairness of the network. Our mechanism uses a dynamic damp on the maximum congestion window size of TCP to alleviate the aggressive behavior of TCP against flows coming from farther-away hops. Our mechanism proves to enhance the fairness index of the network drastically.
The adaptation of our proposed CMT-SCTP fairness mechanism to TCP demonstrates the power of machine learning in dealing with ever-changing environments such as wireless multi-hop networks.

6.2 Suggestions For Future Work

In the following, we discuss several possibilities for extending the current work.

1. Network coding Q-learning mechanism for heterogeneous networks: In Chapter 3, Q-learning network coding is fully explored as a potential solution for receive buffer blocking over wireless mesh networks. A promising direction in which to extend the existing work is to explore the performance of the algorithm in a heterogeneous environment. Most of the time, a multi-homed device (e.g., a smart phone) is connected over two different networks, such as cellular and Wi-Fi; as such, one has to investigate the pros and cons of using our coded mechanism over heterogeneous environments to paint a comprehensive picture of the Q-learning mechanism.

2. Distributed fairness algorithm for the backbone network: In Chapters 4 and 5, we proposed a distributed fairness mechanism for small- to medium-size networks, applied before the traffic hits the backbone. Our mechanism can be deployed in the backbone with some modifications. One advantage of using our mechanism before traffic hits the backbone is that the algorithm has access to the source. If the algorithm were to be used in the backbone, however, it would have to be equipped with cross-layer functions. Moreover, the fairness mechanism would need to be re-designed with new action functions, as the backbone network does not have access to the traffic source and therefore cannot change the parameters of the transport layer. As such, new control parameters need to be found whose adjustment has a direct effect on the fairness performance of the backbone nodes.
Introducing feedback messages to keep the old action functions would be an easy fix to the problem; however, it defies the main advantage of the distributed mechanism, which is the "no feedback" policy. The proposed mechanism has great potential for commercialization.

3. Centralized learning vs. distributed learning: Another approach to using learning mechanisms to fine-tune the performance of the transport layer is to collect data at each node, send the collected data to a nearby central gateway/data center to do the learning off-line, and then load the result onto each node. However, to implement a centralized off-line approach, one has to weigh the delay and the extra messaging overhead induced by the approach against the gain of less computational overhead for each node within the network.

4. Distributed fairness algorithm over larger networks: The proposed mechanisms in Chapters 4 and 5 create computational overhead for the nodes using the algorithm. As such, when the network size and the number of flows involved in the optimization process grow, there might be a negative impact on the convergence time of the learning mechanism. One way to adjust our mechanism for larger-area networks is to break the network down into smaller clusters and use the mechanism within each cluster. Clustering the network into smaller groups decreases the convergence time while improving fairness effectively. However, the idea needs to be fully investigated to make sure all aspects of the design are delicately considered when using the mechanism along with clustering.

Bibliography

[1] N. Arianpoo, I. Aydin, and V. C. Leung, “Network coding: A remedy for receiver buffer blocking in the concurrent multipath transfer of data over multi-hop wireless networks,” in 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, pp. 3258–3263.

[2] T. Dreibholz, M. Becke, E. P. Rathgeb, and M.
Tüxen, “On the use of concurrent multipath transfer over asymmetric paths,” in 2010 IEEE Global Telecommunications Conference (GLOBECOM 2010). IEEE, 2010, pp. 1–6.

[3] R. Pabst, B. H. Walke, D. C. Schultz, P. Herhold, H. Yanikomeroglu, S. Mukherjee, H. Viswanathan, M. Lott, W. Zirwas, M. Dohler, H. Aghvami, D. D. Falconer, and G. P. Fettweis, “Relay-based deployment concepts for wireless and mobile broadband radio,” IEEE Communications Magazine, vol. 42, no. 9, pp. 80–89, Sept 2004.

[4] I. F. Akyildiz, X. Wang, and W. Wang, “Wireless mesh networks: a survey,” ELSEVIER Computer Networks, vol. 47, no. 4, pp. 445–487, Mar. 2005.

[5] M. Scharf and A. Ford, “Multipath TCP (MPTCP) application interface considerations,” Tech. Rep., 2013.

[6] E. R. Stewart, “Stream control transmission protocol,” Network Working Group at IETF, Tech. Rep., 2007.

[7] J. Iyengar, K. Shah, P. Amer, and R. Stewart, “Concurrent multipath transfer using SCTP multihoming,” presented at the International Symposium on Performance Evaluation of Computer and Telecommunication Systems (SPECTS ’04), San Jose, CA, July 2004.

[8] J. Iyengar, P. Amer, and R. Stewart, “Concurrent Multipath Transfer Using SCTP Multihoming Over Independent End-to-End Paths,” IEEE/ACM Transactions on Networking, vol. 14, no. 5, pp. 951–964, October 2006.

[9] T. Yang, L. Pan, L. Jian, H. Hongcheng, and W. Jun, “Reducing receive buffer blocking in CMT based on SCTP using retransmission policy,” in 2011 IEEE 3rd International Conference on Communication Software and Networks (ICCSN). IEEE, 2011, pp. 122–125.

[10] J. R. Iyengar, P. D. Amer, and R. Stewart, “Receive buffer blocking in concurrent multipath transfer,” in IEEE Global Telecommunications Conference (GLOBECOM ’05), 2005.

[11] J. R. Iyengar, P. D. Amer, and R. Stewart, “Performance implications of a bounded receive buffer in concurrent multipath transfer,” Computer Communications, vol. 30, no. 4, pp. 818–829, 2007.

[12] M. Fisk and W.-c. Feng, “Dynamic right-sizing in TCP,” http://lib-www.lanl.gov/la-pubs/00796247.pdf, p. 2, 2001.

[13] S. Barré, C. Paasch, and O. Bonaventure, “Multipath TCP: from theory to practice,” in International Conference on Research in Networking. Springer, 2011, pp. 444–457.

[14] D. Wischik, C. Raiciu, A. Greenhalgh, and M. Handley, “Design, implementation and evaluation of congestion control for multipath TCP,” in Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, 2011, pp. 99–112.

[15] I. A. Halepoto, F. C. M. Lau, and Z. Niu, “Scheduling over dissimilar paths using CMT-SCTP,” in Seventh International Conference on Ubiquitous and Future Networks, July 2015, pp. 535–540.

[16] C. Xu, Z. Li, J. Li, H. Zhang, and G.-M. Muntean, “Cross-Layer Fairness-Driven Concurrent Multipath Video Delivery Over Heterogeneous Wireless Networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, pp. 1175–1189, 2015.

[17] P. Amer, M. Becke, T. Dreibholz, N. Ekiz, J. Iyengar, P. Natarajan, R. Stewart, and M. Tüxen, “Load sharing for the stream control transmission protocol (SCTP),” IETF ID: draft-tuexen-tsvwg-sctp-multipath-06 (work in progress), 2013.

[18] M. Becke, T. Dreibholz, H. Adhari, and E. P. Rathgeb, “On the fairness of transport protocols in a multi-path environment,” in IEEE International Conference on Communications (ICC). IEEE, 2012, pp. 2666–2672.

[19] I. Aydin, J. Iyengar, P. Conrad, C. Shen, and P. Amer, “Evaluating TCP-friendliness in light of concurrent multipath transfer,” Computer Networks, vol. 56, no. 7, pp. 1876–1892, 2012.

[20] J. Iyengar, “Concurrent multipath transfer using SCTP multihoming,” Multihomed Communication with SCTP (Stream Control Transmission Protocol), p. 99, 2012.

[21] L. Cui, S. J. Koh, and W. J. Lee, “Fast selective ACK scheme for throughput enhancement of multi-homed SCTP hosts,” IEEE Communications Letters, vol. 14, no. 6, pp. 587–589, 2010.

[22] P. Natarajan, N. Ekiz, P. D. Amer, J. R. Iyengar, and R. Stewart, “Concurrent multipath transfer using SCTP multihoming: Introducing the potentially-failed destination state,” in International Conference on Research in Networking. Springer, 2008, pp. 727–734.

[23] T. Dreibholz, M. Becke, H. Adhari, and E. P. Rathgeb, “On the impact of congestion control for concurrent multipath transfer on the transport layer,” in Proceedings of the 11th IEEE International Conference on Telecommunications (ConTEL), 2011, pp. 397–404.

[24] T. Dreibholz, M. Becke, J. Pulinthanath, and E. P. Rathgeb, “Applying TCP-friendly congestion control to concurrent multipath transfer,” in 2010 24th IEEE International Conference on Advanced Information Networking and Applications. IEEE, 2010, pp. 312–319.

[25] C. Xu, Z. Li, J. Li, H. Zhang, and G.-M. Muntean, “Cross-layer fairness-driven concurrent multipath video delivery over heterogeneous wireless networks,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 25, no. 7, pp. 1175–1189, 2015.

[26] N. Kuhn, E. Lochin, A. Mifdaoui, G. Sarwar, O. Mehani, and R. Boreli, “DAPS: intelligent delay-aware packet scheduling for multipath transport,” in 2014 IEEE International Conference on Communications (ICC). IEEE, 2014, pp. 1222–1227.

[27] Y. Dong, D. Wang, N. Pissinou, and J. Wang, “Multi-path load balancing in transport layer,” in 3rd EuroNGI Conference on Next Generation Internet Networks. IEEE, 2007, pp. 135–142.

[28] D. S. Lun, M. Médard, R. Koetter, and M. Effros, “Further results on coding for reliable communication over packet networks,” in Proceedings, International Symposium on Information Theory (ISIT 2005). IEEE, 2005, pp. 1848–1852.

[29] ——, “On coding for reliable communication over packet networks,” Physical Communication, vol. 1, no. 1, pp. 3–20, 2008.

[30] C. J. Watkins and P. Dayan, “Q-learning,” Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.

[31] B. Francis, V. Narasimhan, A. Nayak, and I. Stojmenovic, “Techniques for enhancing TCP performance in wireless networks,” in 32nd International Conference on Distributed Computing Systems Workshops. IEEE, 2012, pp. 222–230.

[32] G. Jakllari, S. Eidenbenz, N. Hengartner, S. Krishnamurthy, and M. Faloutsos, “Link positions matter: A noncommutative routing metric for wireless mesh networks,” IEEE Transactions on Mobile Computing, vol. 11, no. 1, pp. 61–72, Jan 2012.

[33] D. J. Leith, Q. Cao, and V. G. Subramanian, “Max-min Fairness in 802.11 Mesh Networks,” IEEE/ACM Transactions on Networking, vol. 20, no. 3, pp. 756–769, Jun. 2012.

[34] X. M. Zhang, W. B. Zhu, N. N. Li, and D. K. Sung, “TCP congestion window adaptation through contention detection in ad hoc networks,” IEEE Transactions on Vehicular Technology, vol. 59, no. 9, pp. 4578–4588, 2010.

[35] J. Karlsson, A. Kassler, and A. Brunstrom, “Impact of packet aggregation on TCP performance in wireless mesh networks,” in IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks Workshops (WoWMoM 2009), June 2009, pp. 1–7.

[36] R. De Oliveira and T. Braun, “A smart TCP acknowledgment approach for multihop wireless networks,” IEEE Transactions on Mobile Computing, vol. 6, no. 2, pp. 192–205, 2007.

[37] Z. Fu, H. Luo, P. Zerfos, S. Lu, L. Zhang, and M. Gerla, “The impact of multihop wireless channel on TCP performance,” IEEE Transactions on Mobile Computing, vol. 4, no. 2, pp. 209–221, 2005.

[38] K. Nahm, A. Helmy, and C.-C. Jay Kuo, “TCP over multihop 802.11 networks: issues and performance enhancement,” in Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing, ser. MobiHoc ’05. ACM, 2005, pp. 277–287.

[39] Y. Su, P. Steenkiste, and T. Gross, “Performance of TCP in Multi-Hop Access Networks,” in 16th International Workshop on Quality of Service (IWQoS 2008), 2008, pp. 181–190.

[40] T. Kuang and C. Williamson, “A bidirectional multi-channel MAC protocol for improving TCP performance on multihop wireless ad hoc networks,” in Proceedings of the 7th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, ser. MSWiM ’04. New York, NY, USA: ACM, 2004, pp. 301–310.

[41] S. Xu and T. Saadawi, “Does the IEEE 802.11 MAC protocol work well in multihop wireless ad hoc networks?” IEEE Communications Magazine, vol. 39, no. 6, pp. 130–137, Jun 2001.

[42] E. Altman and T. Jiménez, “Novel delayed ACK techniques for improving TCP performance in multihop wireless networks,” in 8th International Conference on Personal Wireless Communications (PWC 2003), vol. 2775. Berlin, Heidelberg: Springer Berlin Heidelberg, 2003, pp. 237–250.

[43] K. Xu, S. Bae, S. Lee, and M. Gerla, “TCP behavior across multihop wireless networks and the wired internet,” in Proceedings of the 5th ACM International Workshop on Wireless Mobile Multimedia. ACM, 2002, pp. 41–48.

[44] N. Arianpoo, P. Jokar, and V. C. Leung, “Enhancing TCP performance in wireless mesh networks by cross layer design,” in International Conference on Computing, Networking and Communications (ICNC 2012), 2012.

[45] N. Arianpoo and V. C. Leung, “How network monitoring and reinforcement learning can improve TCP fairness in wireless multi-hop networks,” EURASIP Journal on Wireless Communications and Networking, vol. 2016, no. 1, p. 278, 2016.

[46] A. Al-Jubari, M. Othman, B. Mohd Ali, and N. Abdul Hamid, “An Adaptive Delayed Acknowledgment Strategy to Improve TCP Performance in Multi-hop Wireless Networks,” Wireless Personal Communications, vol. 69, no. 1, pp. 307–333, 2013.

[47] T. Yuki, T. Yamamoto, M. Sugano, M. Murata, H. Miyahara, and T. Hatauchi, “Performance improvement of TCP over an ad hoc network by combining of data and ACK packets,” The Institute of Electronics, Information and Communication Engineers (IEICE) Transactions on Communications, vol. 86, pp. 3559–3568, 2004.

[48] K. Xu, M. Gerla, L. Qi, and Y. Shu, “Enhancing TCP fairness in ad hoc wireless networks using neighborhood RED,” in Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2003). New York, NY, USA: ACM, 2003, pp. 16–28.

[49] J. Ye, J.-X. Wang, and J.-W. Huang, “A cross-layer TCP for providing fairness in wireless mesh networks,” International Journal of Communication Systems, vol. 24, no. 12, pp. 1611–1626, 2011.

[50] A. Raniwala, D. Pradipta, and S. Sharma, “End-to-End Flow Fairness Over IEEE 802.11-Based Wireless Mesh Networks,” in 26th IEEE International Conference on Computer Communications (INFOCOM 2007), May 2007, pp. 2361–2365.

[51] S. ElRakabawy and C. Lindemann, “A Practical Adaptive Pacing Scheme for TCP in Multihop Wireless Networks,” IEEE/ACM Transactions on Networking, vol. 19, no. 4, pp. 975–988, 2011.

[52] H. Xie, A. Boukerche, and A. Loureiro, “TCP-ETX: A cross layer path metric for TCP optimization in wireless networks,” in IEEE International Conference on Communications (ICC 2013), June 2013, pp. 3597–3601.

[53] D. Katabi, M. Handley, and C. Rohrs, “Congestion control for high bandwidth-delay product networks,” ACM SIGCOMM Computer Communication Review, vol. 32, no. 4, pp. 89–102, 2002.

[54] B. Vamanan, J. Hasan, and T. Vijaykumar, “Deadline-aware datacenter TCP (D2TCP),” ACM SIGCOMM Computer Communication Review, vol. 42, no. 4, 2012.

[55] V. Gambiroza, B. Sadeghi, and E. W. Knightly, “End-to-end performance and fairness in multihop wireless backhaul networks,” in Proceedings of the 10th Annual International Conference on Mobile Computing and Networking, ser. MobiCom ’04. New York, NY, USA: ACM, 2004, pp. 287–301.

[56] S. Shioda, H. Iijima, T. Nakamura, S. Sakata, Y. Hirano, and T. Murase, “ACK pushout to achieve TCP fairness under the existence of bandwidth asymmetry,” in Proceedings of the 5th ACM Workshop on Performance Monitoring and Measurement of Heterogeneous Wireless and Wired Networks, ser. PM2HW2N ’10, 2010, pp. 39–47.

[57] C. Cicconetti, I. Akyildiz, and L. Lenzini, “FEBA: A Bandwidth Allocation Algorithm for Service Differentiation in IEEE 802.16 Mesh Networks,” IEEE Transactions on Networking, vol. 17, no. 3, pp. 884–897, 2009.

[58] ——, “Bandwidth Balancing in Multi-Channel IEEE 802.16 Wireless Mesh Networks,” in Proceedings of the 26th IEEE International Conference on Computer Communications (ICC 2007), May 2007, pp. 2108–2116.

[59] T.-C. Hou, C.-W. Hsu, and C.-S. Wu, “A Delay-based Transport Layer Mechanism for Fair TCP Throughput over 802.11 Multihop Wireless Mesh Networks,” International Journal of Communication Systems, vol. 24, no. 8, pp. 1015–1032, August 2011.

[60] S. Fowler, M. Eberhard, and K. Blow, “Implementing an adaptive TCP fairness while exploiting 802.11e over wireless mesh networks,” International Journal of Pervasive Computing and Communications, vol. 5, pp. 272–294, 2009.

[61] T. Li, D. J. Leith, V. Badarla, D. Malone, and Q. Cao, “Achieving End-to-end Fairness in 802.11e Based Wireless Multi-Hop Mesh Networks Without Coordination,” Mobile Networks and Applications, vol. 16, no. 1, pp. 17–34, Feb. 2011.

[62] K. Xu and N. Ansari, “Stability and fairness of rate estimation-based AIAD congestion control in TCP,” IEEE Communications Letters, vol. 9, no. 4, pp. 378–380, April 2005.

[63] K. L. E. Law and W.-C. Hung, “Engineering TCP transmission and retransmission mechanisms for wireless networks,” Pervasive and Mobile Computing, vol. 7, no. 5, pp. 627–639, 2011.

[64] L. Xu, K. Harfoush, and I. Rhee, “Binary increase congestion control (BIC) for fast long-distance networks,” in 23rd IEEE Conference on Computer and Communications Societies (INFOCOM 2004), vol. 4, 2004, pp. 2514–2524.

[65] S. Ha, I. Rhee, and L. Xu, “CUBIC: a new TCP-friendly high-speed TCP variant,” ACM Special Interest Group on Operating Systems Review (ACM SIGOPS 2008), vol. 42, no. 5, pp. 64–74, 2008.

[66] C. P. Fu and S. Liew, “TCP Veno: TCP enhancement for transmission over wireless access networks,” IEEE Journal on Selected Areas in Communications, vol. 21, no. 2, pp. 216–228, February 2003.

[67] K. Winstein, A. Sivaraman, and H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), 2013, pp. 459–471.

[68] K. Winstein and H. Balakrishnan, “TCP ex machina: computer-generated congestion control,” in ACM Special Interest Group on Data Communication Conference 2013 (SIGCOMM 2013), 2013, pp. 123–134.

[69] K. Nichols and V. Jacobson, “Controlling queue delay,” Communications of the ACM, vol. 55, no. 7, pp. 42–50, 2012.

[70] K. Xu, Y. Tian, and N. Ansari, “TCP-Jersey for wireless IP communications,” IEEE Journal on Selected Areas in Communications, vol. 22, no. 4, pp. 747–756, May 2004.

[71] I. Aydin and C. Shen, “Performance Evaluation of Concurrent Multipath Transfer Using SCTP Multihoming in Multihop Wireless Networks,” in 8th IEEE International Symposium on Network Computing and Applications (NCA 2009), July 2009, pp. 234–241.

[72] T. Dreibholz, M. Becke, J. Pulinthanath, and E. P. Rathgeb, “Applying TCP-Friendly Congestion Control to Concurrent Multipath Transfer,” in 24th IEEE International Conference on Advanced Information Networking and Applications (AINA 2010), April 2010, pp. 312–319.

[73] Y. Cao, Ch. Xu, J. Guan, and H. Zhang, “TCP-friendly CMT-based multimedia distribution over multi-homed wireless networks,” in IEEE Wireless Communications and Networking Conference (WCNC 2014), April 2014, pp. 3028–3033.

[74] I. Aydin, “SCTP-based Concurrent Multipath Transfer in the Contexts of Multihop Wireless Networks and TCP-friendliness,” Ph.D. dissertation, University of Delaware, 2010.

[75] C. Xu, T. Liu, J. Guan, H. Zhang, and G.-M. Muntean, “CMT-QA: Quality-Aware Adaptive Concurrent Multipath Data Transfer in Heterogeneous Wireless Networks,” IEEE Transactions on Mobile Computing, vol. 12, pp. 2193–2205, 2013.

[76] Y. Cao, Ch. Xu, J. Guan, F. Song, and H. Zhang, “Environment-aware CMT for efficient video delivery in wireless multimedia sensor networks,” International Journal of Distributed Sensor Networks, pp. 4522–4527, 2012.

[77] Y. Cao, Ch. Xu, J. Guan, J. Zhao, and H. Zhang, “Cross-layer cognitive CMT for efficient multimedia distribution over multi-homed wireless networks,” IEEE Wireless Communications and Networking Conference (WCNC 2013), pp. 4522–4527, April 2013.

[78] P. Natarajan, N. Ekiz, P. D. Amer, J. Iyengar, and R. Stewart, “Concurrent multipath transfer using SCTP multihoming: Introducing the potentially-failed destination state,” in Proceedings of the 7th International IFIP-TC6 Networking Conference, Singapore. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, pp. 727–734.

[79] W. Zhuang, N. Mohammadizadeh, and X. Shen, “Multipath transmission for wireless Internet access–from an end-to-end transport layer perspective,” Journal of Internet Technology, vol. 13, pp. 1–18, 2012.

[80] M. Becke, T. Dreibholz, A. Bayer, M. Packeiser, and E. P. Rathgeb, “Alternative transmission strategies for multipath transport of multimedia streams over wireless networks,” in 12th International Conference on Telecommunications (ConTEL 2013), June 2013, pp. 147–154.

[81] Y. Yuan, Z. Zhang, J. Li, J. Shi, J. Zhou, G. Fang, and E. Dutkiewicz, “Extension of SCTP for Concurrent Multi-Path Transfer with Parallel Subflows,” in IEEE Wireless Communications and Networking Conference (WCNC 2010), April 2010, pp. 1–6.

[82] R. Ahlswede, N. Cai, S.-Y. Li, and R. W. Yeung, “Network information flow,” IEEE Transactions on Information Theory, vol. 46, no. 4, pp. 1204–1216, 2000.

[83] S. Chachulski, M. Jennings, S. Katti, and D.
Katabi, “Trading structure for random-ness in wireless opportunistic routing,” ACM SIGCOMM Computer CommunicationReview, vol. 37, pp. 169–180, October 2007.[84] S. Katti, H. Rahul, W. Hu, D. Katabi, M. Me´dard, and J. Crowcroft, “Xors in the air:practical wireless network coding,” in ACM SIGCOMM computer communicationreview, vol. 36, no. 4. ACM, 2006, pp. 243–254.[85] R. Koetter and M. Me´dard, “An algebraic approach to network coding,” IEEE/ACMTransactions on Networking (TON), vol. 11, no. 5, pp. 782–795, 2003.[86] J. Sundararajan, D. Shah, M. Medard, M. Mitzenmacher, and J. Barros, “NetworkCoding Meets TCP,” in Proc. IEEE INFOCOM, Toronto, Canada, April 2009, pp.280 –288.103Bibliography[87] J. K. Sundararajan, S. Jakubczak, M. Medard, M. Mitzenmacher, and J. Barros,“Network Coding Meets TCP: Theory and Implementation,” Proceedings of theIEEE, vol. 99, pp. 490 – 512, March 2011.[88] C. J.C.H. Watkins and P. Dayan, “Technical Note: Q-Learning,” Machine Learning,1992.[89] C. J. Watkins and P. Dayan, “Technical note: Q-learning,” Machine Learning, vol. 8,no. 3-4, pp. 279–292, 1992.[90] E. Even-Dar and Y. Mansour, “Learning rates for q-learning,” Journal of MachineLearning Research, vol. 5, pp. 1–25, Dec. 2004.[91] L. Matignon, G. J. Laurent, and N. Le Fort-Piat, “Reward function and initial values:better choices for accelerated goal-directed reinforcement learning,” in InternationalConference on Artificial Neural Networks. Springer, 2006, pp. 840–849.[92] M. Kearns and S. Singh, “Finite-sample convergence rates for q-learning and indirectalgorithms,” in In Neural Information Processing Systems 12. MIT Press, 1999, pp.1996 – 1002.[93] L. Matignon and G.J. Laurent and N. Le Fort-Piat, “Improving Reinforcement Learn-ing Speed for Robot Control,” Intelligent Robots and Systems, 2006 IEEE/RSJ In-ternational Conference on, pp. 3172–3177, 2006.[94] RaspberryPi, “Model B,” (Accessed on 2017-02-20). [Online]. 
Available:https://www.raspberrypi.org/[95] “The FreeBSD Project,” (Accessed on 2017-02-20). [Online]. Available:https://www.freebsd.org/[96] “Debian Wheezy,” (Accessed on 2017-02-20). [Online]. Available:https://www.debian.org/releases/wheezy/[97] TPLINK, “150mbps wireless n nano usb adapter,” (Accessed on 2017-02-20).[Online]. Available: http://www.tp-link.com/lk/products[98] Wireshark. (Accessed on 2017-02-20). [Online]. Available:https://www.wireshark.org[99] G. King and L. Zeng, “Logistic regression in rare events data,” Political Analysis,vol. 9, pp. 137–163, 2001.[100] J. K. Sundararajan, D. Shah, and M. Me´dard, “ARQ for network coding,” in 2008IEEE International Symposium on Information Theory. IEEE, 2008, pp. 1651–1655.104Bibliography[101] G. Bolch and S. Greiner and H. de Meer and K. S. Trivedi, Queueing Networksand Markov Chains: Modeling and Performance Evaluation with Computer ScienceApplications. Wiley-Interscience, 1998.[102] Han, Yijie and Makowski, Armand M, “Resequencing delays under multipathrouting–Asymptotics in a simple queueing model,” Institute for System Research,University of Maryland, Tech. Rep., 2005.[103] “QualNet,” (Accessed on 2017-02-20). [Online]. Available: http://web.scalable-networks.com/qualnet-network-simulator-software[104] B. Horan, Practical Raspberry Pi. Apress, 2013.[105] J. Sundararajan, D. Shah, M. Medard, and P. Sadeghi, “Feedback-based online net-work coding,” Nov. 29 2011, US Patent 8,068,426.[106] J. Sundararajan, D. Shah, and M. Medard, “Online network coding for optimalthroughput and delay - the three-receiver case,” in International Symposium on In-formation Theory and Its Applications (ISITA 2008), Dec 2008, pp. 1–6.[107] J. Barros, R. Costa, D. Munaretto, and J. Widmer, “Effective delay control in onlinenetwork coding,” in INFOCOM 2009, IEEE, April 2009, pp. 208–216.[108] H. Balakrishnan, V. Padmanabhan, S. Seshan, and R. 
Katz, “A comparison of mech-anisms for improving tcp performance over wireless links,” IEEE/ACM Transactionson Networking, 1997.[109] S. Kopparty, S. Krishnamurthy, M. Faloutsos, and S. Tripathi, “Split TCP for mobilead hoc networks,” in IEEE Global Telecommunications Conference (GLOBECOM’02), vol. 1, November 2002, pp. 138–142.[110] M. Gerla and K. Tang and R. Bagrodia, “TCP performance in wireless multi-hopnetworks,” in IEEE 2nd Workshop on Mobile Computing Systems and Applications(WMCSA 99) , 1999, pp. 41–50.[111] R. Jain, D.-M. Chiu, and W. R. Hawe, A quantitative measure of fairness and dis-crimination for resource allocation in shared computer system. Eastern ResearchLaboratory, Digital Equipment Corporation Hudson, MA, 1984, vol. 38.[112] N. Jani and K. Kant, “SCTP Performance in Data Center Environments,” in Proceed-ings of the 2005 International Symposium on Performance Evaluation of Computerand Telecommunication Systems (SPECTS’05), 2005.[113] P. Domingos and M. Pazzani, “On the optimality of the simple bayesian classifierunder zero-one loss,” Machine Learning, vol. 29, pp. 103–130, 1997.105Bibliography[114] M. Franceschinis, M. Mellia, M. Meo, and M. Munafo, “Measuring TCP over WiFi:A real case,” in 1st workshop on Wireless Network Measurements (Winmee), RivaDel Garda, Italy, 2005.[115] V. Jacobson and R. Braden and D. Borman and M. Satyanarayanan and JJ Kistlerand LB Mummert and MR Ebling, “RFC 1323: TCP extensions for high perfor-mance,” May 1999.[116] N. Brownlee and K. C. Claffy, “Understanding internet traffic streams: dragonfliesand tortoises,” IEEE Communications Magazine, vol. 40, no. 10, pp. 110–117, Oct2002.[117] Z. Fu, H. Luo, P. Zerfos, S. Lu, L. Zhang, and M. Gerla, “The impact of multihopwireless channel on TCP performance,” IEEE Transactions on Mobile Computing,vol. 4, no. 2, pp. 209–221, March 2005.106

