How network monitoring and reinforcement learning can improve TCP fairness in wireless multi-hop networks. Arianpoo, Nasim; Leung, Victor C. Dec 5, 2016.

Arianpoo and Leung, EURASIP Journal on Wireless Communications and Networking (2016) 2016:278
DOI 10.1186/s13638-016-0773-3

How network monitoring and reinforcement learning can improve TCP fairness in wireless multi-hop networks

Nasim Arianpoo* and Victor C.M. Leung
*Correspondence: nasima@ece.ubc.ca. Department of Electrical and Computer Engineering, University of British Columbia, 2356 Main Mall, Vancouver, Canada

Abstract
Wireless mesh network (WMN) is an emerging technology for last-mile Internet access. Despite extensive research and commercial implementations of WMNs, there are still serious fairness issues in the transport layer, where the transmission control protocol (TCP) favors flows with a smaller number of hops over flows with a larger number of hops. This unfair behavior of TCP is a known anomaly over WMN that has attracted much attention in recent years and is the focus of this paper. In this article, we propose a distributed network monitoring mechanism using a cross-layer approach that deploys reinforcement learning (RL) techniques to achieve fair resource allocation for nodes within the wireless mesh setting. In our approach, we deploy Q-learning, a reinforcement learning mechanism, to monitor the dynamics of the network. The Q-learning agent creates a state map of the network based on medium access control (MAC) parameters and takes actions to enhance TCP fairness and the throughput of the starved flows in the network. The proposal creates a distributed cooperative mechanism where each node hosting a TCP source monitors the network and adjusts its TCP parameters based on the network dynamics. Extensive simulation results and testbed analysis demonstrate that the proposed method significantly improves TCP fairness in a multi-hop wireless environment.

Keywords: TCP, Fairness, Wireless mesh networks, Reinforcement learning, Distributed

1 Introduction
Enhancing transmission control protocol (TCP) fairness over wireless mesh networks (WMN) is a significant research area that has attracted researchers' attention for the past decade. TCP was first designed for wired networks and performs well over wired infrastructure; as such, when wireless networks were introduced, TCP was adapted to the wireless environment. However, the fundamental differences between wireless and wired mediums result in substandard performance of TCP over wireless networks. Wireless multi-hop networks are especially affected by TCP unfairness, as TCP favors flows with a smaller number of hops in WMNs [1]. In a wired setting, the network topology is well defined and each node has comprehensive knowledge of the available network resources. Therefore, TCP is capable of assigning a fair share of network resources to each flow. However, in a wireless multi-hop network, each node has partial knowledge of the network topology, and creating a unified map of the links and collision domains using feedback messages is extremely costly in terms of overhead. As such, TCP is not capable of fair resource allocation over WMN [1–5].

TCP unfairness increases remarkably in uplink traffic, where nodes transmit data to the gateway [1]. The findings of [1] over a real WMN, and many similar studies, motivated us to propose a fair resource allocation mechanism in the transport layer over WMN despite an unfair MAC layer. Our proposal uses a distributed mechanism to monitor network anomalies in resource allocation and tune TCP parameters accordingly.
Each TCP source models the state of the system as a Markov decision process (MDP) and uses Q-learning to learn the transition probabilities of the proposed MDP based on the observed variables. To maximize TCP fairness, each node hosting a TCP source takes actions according to the recommendations of the Q-learning algorithm and adjusts TCP parameters autonomously. Our algorithm preserves the autonomy of each node in the decision-making process and does not require a central control mechanism or control message exchange among nodes. Unlike existing machine learning solutions, e.g., TCP ex Machina, our proposal is compatible with the computational capacity of the current infrastructure. We call our approach Q-learning TCP. The contributions of this paper can be summarized as:

• Modeling the multi-hop network as an MDP in each TCP source and using the Q-learning algorithm to monitor and learn the dynamics of the network and the proposed MDP.
• Finding a cross-layer, distributed, and scalable solution for TCP fairness over multi-hop networks with no extra overhead. Our proposal enhances TCP fairness over multi-hop networks in favor of flows traversing a larger number of hops, with negligible impact on flows with a smaller number of hops, by changing TCP parameters cooperatively based on the recommendations of the Q-learning algorithm.
• Enhancing TCP fairness by a factor of 10 to 20% without any feedback messaging and with no incurred overhead to the medium.

The rest of this paper is organized as follows: an overview of the available TCP solutions is given in Section 2. Section 3 is a detailed description of our algorithm, followed by the implementation specifics in Section 4. Performance evaluation of our proposed algorithm is presented in Section 5 via extensive simulation and testbed experimentation. A discussion on the implications of Q-learning TCP, along with a comparison with available fairness techniques, is presented in Section 6. Throughout this paper, we measure fairness using Jain's fairness index.

2 Related works
Different approaches have been proposed in the literature to address the problem of TCP unfairness over wireless mesh networks [2–11]. The existing literature on TCP fairness solutions over multi-hop wireless networks can be categorized into cross-layer designs, e.g., [12–21], and layered proposals, e.g., [22–34]. While layered proposals aim to keep the end-to-end semantics of TCP intact, cross-layer designs use information from different layers to adjust TCP parameters.

Random early detection (RED) is among the first proposals to enhance TCP fairness over wired connections. Neighborhood RED (NRED) is a cross-layer adaptation of RED for wireless multi-hop settings [15]. NRED is implemented in the MAC layer and uses channel utilization from the physical layer to calculate the probability of packet dropping. The main disadvantage of NRED is the use of broadcast messages to inform the neighboring nodes about packet dropping probabilities.
In [16], the gateway uses a centralized cross-layer explicit congestion notification (ECN) algorithm to notify nearby TCP sources that it favors the farther flows and allows them to use more network resources [16]. The fact that the gateway uses a centralized mechanism along with feedback messages makes ECN less favorable for WMN [16]. In [13], a cross-layer solution is proposed that uses network coding in each hop with a different rate to improve TCP throughput; the computational overhead of network coding, along with the hop-by-hop feedback and rate estimation, creates a bottleneck for the computational resources of the network. In another instance of the cross-layer approach [14], the authors used a hop-by-hop congestion control mechanism for fair resource allocation; however, the mechanism in [14] introduces heavy inter-node and intra-node control traffic. In [35], another hop-by-hop mechanism is discussed that collects information from the physical layer and the MAC layer to approximate channel utilization in each hop. The estimation of channel utilization is based on carrier sensing, which might not be very accurate in some scenarios.

CoDel is another cross-layer example that uses active queue management on selected bottleneck links, i.e., links with large queuing delay. CoDel uses spacing in transmission times as the queue management method over bottleneck links. CoDel has to run on all the hops/routers within the TCP flow path; hence, the implementation requires infrastructural change, which is not practical. D2TCP is a recent variation of TCP for online data-intensive applications that uses a varying rate based on a deadline set by the application and the congestion condition of the network reported by ECN [20]. D2TCP is a cross-layer approach that needs compatibility in both the application layer and the routing protocol to perform effectively, which is a significant disadvantage.

TCP-AP, another instance of the cross-layer method, attempts to eliminate the reliance of TCP fairness solutions on feedback messages [18]. The approach of [18] relies on information from the physical layer to infer the fair share of each node of the network resources. The main drawback of TCP-AP is its reliance on the received signal strength indication (RSSI), which is not an accurate estimate of the receiver power level. As such, TCP-AP still needs to get feedback from the receiver to function effectively. In [21], a cross-layer algorithm, MHHP, is proposed that assigns higher priority to flows traversing a larger number of hops. Although MHHP improves TCP fairness in a multi-hop environment, the design does not accurately reflect the dynamic nature of the wireless environment.

Unlike cross-layer approaches, layered solutions preserve the end-to-end semantics of the open systems interconnection (OSI) model. According to [22], the binary back-off mechanism, request-to-send/clear-to-send (RTS/CTS) signaling, and the end-to-end congestion mechanism play key roles in TCP unfairness. In [22], a MAC layer solution is proposed in which each access point calculates the fair share of incoming and outgoing flows in terms of flow rates and broadcasts the rates to other nodes. Reference [24] is another MAC layer approach in which nodes negotiate to set up a fair transmission schedule throughout the network. The proposed methods in [22] and [24] rely on feedback messages, which increases the network overhead significantly.
In [26], a layered approach is proposed that uses the TCP advertised window and delayed ACK to control the rates of different flows. The algorithm in [26] only works in scenarios where all flows are destined to the gateway. TCP Veno [32], another instance of a layered solution for TCP, has gained a lot of attention in recent years due to its better performance over wireless settings; Veno is basically an integration of TCP Vegas into TCP Reno and does not contribute to fairness significantly.

Among the end-to-end solutions to enhance TCP performance, there are a few that use machine learning as an interactive tool to observe network dynamics and change TCP parameters based on some prediction of the network behavior. Sprout is a good example of the interactive transport layer protocols [33]. Sprout uses the arrival time of packets to predict how many bytes the sender can successfully transmit without creating congestion in the network while maximizing network utilization. The main disadvantage of Sprout is the fact that it needs to run over CoDel to outperform other TCP variants; as such, it requires changes to the infrastructure to enable active queue management. TCP ex Machina, also known as Remy, is another end-to-end interactive solution that uses machine learning to create a congestion control algorithm which controls the packet transmission rate based on the network model perceived by the learning algorithm [34]. The main disadvantage of TCP ex Machina is its resource-intensive nature, which results in lengthy learning time. It takes almost forever for a single Linux machine to come up with a congestion algorithm suitable for a specific network using TCP ex Machina. Hence, TCP ex Machina requires a separate set of nodes with extensive computational ability to learn the network model, and then the model has to be loaded into the nodes within the network. Any change in the network dynamics requires TCP ex Machina to repeat the costly re-learning process.

Q-learning TCP uses a reinforcement learning (RL) algorithm to monitor and learn the characteristics of the network; however, it distinguishes itself from [12, 16–34] by covering the unique characteristics of WMN. First, it does not use any feedback messages, unlike the majority of the above fairness solutions, which is crucial for saving the valuable bandwidth in WMN. Second, it uses an RL algorithm to learn the characteristics of the network using minimal computational resources, contrary to TCP ex Machina or Sprout. Third, it only requires minimal changes in the source node, which is vital for the practicality and feasibility of the design. None of [12, 15–34] address all the unique characteristics of WMN stated above in one solution.

3 Q-learning TCP architecture
In our approach, each TCP source is equipped with a Q-learning agent that sees the world as an MDP. Q-learning is a class of reinforcement learning (RL) algorithms first introduced by Watkins et al. [36]. RL algorithms are based on the basic idea of learning via interaction. The learner module is called an agent, and the interactive object which is the subject of learning is the environment. During the learning process, time is divided into decision epochs. In each decision epoch, the agent receives network statistics in the form of state space variables. The agent uses the received information to determine the state of the MDP; then, the agent takes an action by fine-tuning TCP parameters. In the next decision epoch, the environment responds with the new state.
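To make this loop concrete, the minimal Python sketch below shows the shape of the interaction just described, in the spirit of the Python suite used later for the testbed; the agent object and the collect_mac_stats/apply_action hooks are illustrative placeholders, not the exact implementation.

import time

EPOCH_SECONDS = 40  # decision-epoch length; the evaluation section uses 40-s learning cycles

def run_agent(agent, collect_mac_stats, apply_action):
    """Generic decision-epoch loop: observe, determine the state, act, wait.

    `agent`, `collect_mac_stats`, and `apply_action` stand in for the learning
    agent, the MAC-layer statistics hook, and the TCP tuning hook.
    """
    prev_state, prev_action = None, None
    while True:
        stats = collect_mac_stats()                      # per-flow packet counts at this node
        state = agent.observe(stats)                     # map raw statistics to an MDP state
        if prev_state is not None:
            agent.learn(prev_state, prev_action, state)  # update the agent's memory from the observed transition
        action = agent.select_action(state)              # decrease / increase / stay
        apply_action(action)                             # adjust the clamp on the TCP maximum window
        prev_state, prev_action = state, action
        time.sleep(EPOCH_SECONDS)                        # wait for the next decision epoch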
The learning agent uses a reward function to receive feedback on the consequences of the taken action on TCP fairness.

The interaction between the agent and the environment helps the agent construct a mapping of the possible states and actions. This mapping is called the policy and shows how the agent changes its decisions based on the different responses from the environment. As time passes, the mapping becomes more inclusive; the agent learns almost all the possible (state, action) pairs and their associated reward/penalty values and can cope with any changes in the environment. The agent's memory of the possible rewards for the (state, action) pairs is kept in a matrix called Q. The rows of Q represent the current state of the system, and the columns represent the possible actions leading to the next state. At the beginning of the learning process, the learning agent does not know about the rules of the MDP other than the states and the possible actions; therefore, Q is initialized to 0. After each decision epoch, Q is updated as in Eq. (1):

Q_{t+1}(s_t, a_t) = (1 - \alpha) Q_t(s_t, a_t) + \alpha \left[ r(s_t, a_t) + \gamma \max_a Q_t(s_{t+1}, a) \right]    (1)

where (1 - \alpha) Q_t(s_t, a_t) is the old value, r(s_t, a_t) is the reward observed after performing a_t in s_t, and \gamma \max_a Q_t(s_{t+1}, a) is the estimated future value.

Equation (1) is called the Q-learning or Ballard rule [37]. The objective of the Q-learning rule is to inform the agent of the possible future rewards along with the immediate reward of the latest action. \alpha is the learning rate of the agent and determines the importance of the newly acquired information; \alpha varies between 0 and 1. A factor of 0 causes the agent not to learn from the latest (action, state) pair, while a factor of 1 makes the agent consider only the newly learned value and discard what it has learned so far. \gamma is the discount factor and determines the importance of future rewards; \gamma varies between 0 and 1. A discount factor of 0 prohibits the agent from acquiring future rewards, while a factor of 1 pushes the agent to consider only future rewards. We use a polynomial learning rate \alpha = 1/(1+t)^2, as it has a faster convergence rate [38]. A discount factor of \gamma = 0.9 is suggested by the literature to encourage the agent into a comprehensive discovery of the (action, state) map [36]. The learning process continues as long as the network is up and running; it basically works as a memory that can be adjusted according to network changes.

In the following sub-sections, we present a detailed overview of the key factors of Q-learning TCP, including the states, the action set, the reward function, and the transition probabilities.

3.1 States
The state space of our proposed Q-learning algorithm in each TCP source is of the form S = (fairness index, aggressiveness index). To measure the fairness index in each decision interval, the agent uses Jain's fairness index as in Eq. (2):

J_k^t(x_1, x_2, \ldots, x_n) = \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n \sum_{i=1}^{n} x_i^2},    (2)

where x_i is the data rate of flow i, n is the number of flows that are originated from or forwarded by node k, and J_k^t is the Jain's fairness index at node k at decision epoch t. The Jain's fairness index is a continuous number that varies between 0 and 1, with 0 the worst fairness index and 1 the absolute best fairness condition. To tailor the Jain's fairness index to a discrete state space, we divide the [0, 1] interval into p sub-intervals [0, f_1], (f_1, f_2], ..., (f_{p-1}, 1]. Instead of using a continuous fairness index, we quantize it to have a manageable number of states.
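As an illustration of Eq. (2) and this quantization step, the sketch below computes Jain's index from the per-flow rates observed at a node and maps it to the lower edge of its sub-interval; the bin boundaries shown are the four fairness sub-intervals used later in Section 5.1, and the rates are hypothetical.

def jain_index(rates):
    """Jain's fairness index, Eq. (2): (sum of x_i)^2 / (n * sum of x_i^2)."""
    n = len(rates)
    total = sum(rates)
    squares = sum(r * r for r in rates)
    return 1.0 if squares == 0 else (total * total) / (n * squares)

def quantize(value, boundaries):
    """Map a continuous index in [0, 1] to the lower edge of its sub-interval."""
    level = boundaries[0]
    for b in boundaries:
        if value >= b:
            level = b
    return level

# Example with the four fairness sub-intervals of Section 5.1.
FAIRNESS_BINS = [0.0, 0.5, 0.8, 0.95]
rates = [120.0, 80.0, 40.0]                        # hypothetical per-flow data rates at node k
f_t = quantize(jain_index(rates), FAIRNESS_BINS)   # about 0.86, which falls in the [0.8, 0.95) bin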
The number of states is important for the convergence of the learning algorithm. The aggressiveness of each TCP source in each decision epoch is measured as in Eq. (3):

G(i) = \frac{\text{number of packets originated from node } i}{\text{total number of packets forwarded by node } i}.    (3)

The aggressiveness index is a continuous quantity that varies between 0 and 1. To tailor the aggressiveness index to a discrete state space, we divide the [0, 1] interval into q sub-intervals [0, g_1], (g_1, g_2], ..., (g_{q-1}, 1]. As such, the state space of the MDP is of the form of Eq. (4), with size p × q:

S = \{(f_t, g_t) \mid f_t \in \{0, f_1, \ldots, f_p\} \text{ and } g_t \in \{0, g_1, \ldots, g_q\}\}    (4)

Choosing suitable values for p and q is a critical task. A small p or q shrinks the state space and positively affects the convergence rate; however, larger quantization intervals hide changes in the system from the reward function. The reward function uses the state of the system as a decision criterion to reward or penalize the latest (state, action) pair. Our extensive simulations and testbed experimentation show that choosing 3 ≤ q ≤ 4 and 3 ≤ p ≤ 4 provides Q-learning TCP with enough states to significantly increase the fairness index of the network and the convergence rate.

The fairness index is obviously a good indicator of how well Q-learning TCP is doing in terms of enhancing fairness. However, the fairness index alone is not enough to make a decision on the fairness of the TCP source. Therefore, we define the aggressiveness index to indicate whether a TCP source is fair to other TCP sources or not. The aggressiveness index calculates the share of the TCP source located on a specific node in the outgoing throughput of that node. A high aggressiveness index along with a low fairness index in a TCP source triggers the learning agent to make changes to the TCP parameters and to force the TCP source to be more hospitable towards other flows. The desirable state for the learning agent is the state with a high fairness index and an aggressiveness index of T_fair. T_fair is the fairness threshold of the node. The fairness threshold depends on the number of flows originating from and passing through the node and on the priority of each flow. As an example, if three flows are passing through a node and two other flows originate from the node, assuming the same priority for all five flows, the fair share of the node from the network resources, and hence the fairness threshold, is 2/5. Any aggressiveness index above the fairness threshold is an indication of unfair resource allocation by the TCP source on the node.

Both the fairness index and the aggressiveness index are calculated based on the number of packets received or transmitted in each decision epoch in the TCP source. As such, both variables are accessible in each node and there is no need to get feedback from other nodes. Let us emphasize that the objective of our MDP is to enhance TCP fairness cooperatively and to accomplish this objective by moving towards the goal state; therefore, choosing the fairness index and aggressiveness index as the state variables of our MDP is justified.

3.2 Action set
The Q-learning agent uses the action set to constrain TCP's aggressive behavior. The findings of [1] suggested that the maximum TCP congestion window size plays a crucial role in the aggressive behavior of TCP. However, putting a static clamp on the TCP congestion window size might cause an under-utilization of the network resources.
Therefore, to avoid any under- or over-utilization of network resources, we use a dynamic clamp on the TCP congestion window size. The learning agent dynamically changes the maximum congestion window size of TCP via the action set, without interfering with the congestion control mechanism. As such, TCP uses its standard congestion control mechanism along with a dynamic clamp on the congestion window to limit any aggressive window increase. The agent uses the action functions in Algorithm 1 to change TCP parameters in each decision epoch.

Algorithm 1 Q-learning action set
1: if action = decrease then
2:   δ = 50%
3:   decrease the TCP maximum window size by δ
4: else if action = increase then
5:   increase the TCP maximum window size by δ
6: else
7:   no change to the maximum congestion window size
8: end if

Q-learning TCP does not interfere with the congestion control mechanism of TCP; it only changes the maximum TCP window size, and the maximum window size can be decreased down to the slow-start threshold. In our algorithm, we use δ equal to 50% of the latest increase/decrease of the current TCP maximum window size. The above action set, which resembles a binary search, serves perfectly in a dynamic environment. At the beginning of the learning process, the maximum congestion window size is set either to 65536 bytes [39] or to the amount allowed by the system. The learning agent starts searching for the optimized maximum TCP window size by halving the current maximum TCP window size. As such, the search space decreases by half. The agent chooses either half randomly, due to the random nature of the search algorithm at the beginning of the learning process. The agent then sweeps the search space using the decrease(), increase(), and stay() functions and uses the reward function as the guide to the optimum state. During the learning process, the agent develops a memory and, after convergence, uses this memory as a series of policies to handle any changes in the dynamics of the system. As a result, in any state, the learning agent knows how to find its way to the optimum clamp. The Q-learning agent converges when all the available action series and their associated rewards have been discovered.

3.3 Transition probabilities
Another element of the MDP is the transition probabilities, which are unknown in our design. When the transition probabilities of an MDP are unknown, RL methods such as Q-learning are used, since they do not require an explicit model of these probabilities. P(s_t | s_{t-1}, a) is the transition probability from state s_{t-1} to state s_t when taking action a. In an MDP, the states are Markovian and Eq. (5) holds:

p(s_t \mid s_{t-1}, s_{t-2}, \ldots, s_0, a) = p(s_t \mid s_{t-1}, a)    (5)

In our proposed model, the state space is of the form S = (fairness index, aggressiveness index). Both the fairness and aggressiveness indices depend on the number of packets transmitted or received in the most recent decision epoch. Therefore, any state transition only depends on the latest state of the system. As such, all states are Markovian and Eq. (5) holds for our model.

3.4 Reward function
According to [40], an efficient reward function should satisfy the following conditions:

• the reward function must have a uniform distribution for states far from the goal state;
• the reward function must point in the direction of the greatest rate of increase of reward in a zone around the goal state.

Choosing a Gaussian reward function in the form of Eq. (6) complies with the above conditions.
The Gaussian function is almost uniform in states far from the goal state and has an increasing gradient in a belt around the goal state that directs the learning agent towards the desirable state:

R(s' \mid s, a) = \beta e^{-\frac{d(s', s^*)^2}{2\sigma^2}}    (6)

The reward function that we use for our model is of the form of Eq. (7), which is a summation of a fairness reward term and a network utilization reward term:

R(s, a) = \beta_u e^{-\frac{d(u_s, u_{s^*})^2}{2\sigma_u^2}} + \beta_f e^{-\frac{d(f_s, f_{s^*})^2}{2\sigma_f^2}}    (7)

u_s and u_{s^*} are the network utilization factors in states s and s^*. The network utilization factor is the accumulated throughput of all the incoming and outgoing flows in a node. f_s and f_{s^*} are the Jain's fairness indices in states s and s^*. s^* is the goal state.

There is always a trade-off between fairness and the throughput/efficiency of the network. In a highly effective network, the utility function is focused on maximizing the aggregated throughput of the network, which might not be optimal in terms of fairness. In our scheme, we optimize TCP performance based on both fairness and throughput. To create a balance between the two, we use the aggressiveness index (throughput control factor) and the fairness index (fairness control factor). The aggressive nodes have to compromise on throughput in favor of the starved nodes to increase the fairness index of the network. In our mechanism, the reward function includes both fairness and throughput. One has to keep in mind that optimizing TCP performance based on both fairness and throughput results in some compromise on throughput to increase the fairness. Moreover, quality of service (QoS) can be added to our scheme via the reward function. Putting different weights on either throughput or fairness by changing the coefficients in the reward function (\beta_u, \beta_f, \sigma_u, \sigma_f) can result in different levels of QoS.

4 Integration of the Q-learning agent and TCP
The Q-learning process interacts with TCP via the action set. The learning agent receives the statistics of the network in each decision epoch and, based on the memory of the learning process and the state of the system, selects an action. The interaction between the Q-learning agent and TCP is best described with an example. Assume that the learning agent chooses the action series in Fig. 1 and decreases the TCP maximum window size to a specific amount. As a result of the new clamp, in each decision epoch the learning agent receives a reward or penalty from the reward function. The learning agent tries all the possible combinations during the learning process and creates an inclusive map of the action series (policies) and their respective rewards. After the learning process converges, any changes in the network dynamics can be handled instantly using the memory map of the agent. As such, Q-learning TCP adapts the TCP parameters instantly based on the changes in traffic and network conditions.

Fig. 1 An example of the Q-learning agent interaction and the TCP maximum congestion window size
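The sketch below illustrates how one decision epoch might combine the Gaussian reward of Eq. (7) with the update rule of Eq. (1), ahead of the full procedure given next in Algorithm 2. It is a minimal sketch: the reward weights, the use of the quantized state components as the distance terms, and the set_max_cwnd hook are illustrative assumptions rather than the exact implementation.

import math
from collections import defaultdict

ACTIONS = ("decrease", "increase", "stay")
GAMMA = 0.9                                    # discount factor suggested in Section 3

# Q[state][action]; a state is a (fairness_bin, aggressiveness_bin) tuple
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def reward(state, goal, beta_u=1.0, beta_f=1.0, sigma_u=0.3, sigma_f=0.3):
    """Gaussian reward in the shape of Eq. (7); here both distance terms are
    taken from the quantized state components, an illustrative simplification."""
    du = state[1] - goal[1]                    # utilization/aggressiveness term
    df = state[0] - goal[0]                    # fairness term
    return (beta_u * math.exp(-du * du / (2 * sigma_u ** 2)) +
            beta_f * math.exp(-df * df / (2 * sigma_f ** 2)))

def epoch_update(prev_state, action, new_state, goal, t, set_max_cwnd):
    """One decision epoch: compute the reward, update Q as in Eq. (1),
    then apply the greedy action through a hypothetical TCP hook."""
    alpha = 1.0 / (1 + t) ** 2                 # polynomial learning rate from Section 3
    r = reward(new_state, goal)
    future = max(Q[new_state].values())
    Q[prev_state][action] = ((1 - alpha) * Q[prev_state][action] +
                             alpha * (r + GAMMA * future))
    next_action = max(Q[new_state], key=Q[new_state].get)
    set_max_cwnd(next_action)                  # e.g., halve or restore the clamp as in Algorithm 1
    return next_action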
Our proposed mechanism is presented in Algorithm 2.

Algorithm 2 Q-learning TCP algorithm of node i
1: action set = {decrease, increase, stay}
2: absorbing state = (f_goal, g_goal)
3: while not in goal state do
4:   for every t seconds do
5:     get the number of packets sent by node i
6:     for each flow being forwarded by node i do
7:       get the number of sent packets
8:     end for
9:     calculate the fairness index f_t based on (2)
10:    calculate the aggressiveness index g_t based on (3)
11:    determine the state according to (4)
12:    calculate the reward r_t based on (7)
13:    update the Q matrix according to (1)
14:    choose the action with the maximum Q value
15:    take the action (inform TCP)
16:  end for
17: end while

As depicted in Algorithm 2, the MAC layer collects the number of sent packets of each flow passing through the node and sends them to the learning agent, located in the TCP source, every t seconds (t is the length of a decision epoch). The learning agent uses the latest information to calculate the aggressiveness and fairness indices and determines the current state of the system. The reward function module uses the current state of the system, the previous state of the system, and the latest action to calculate the immediate reward of the agent based on the latest (state, action) pair. When the agent obtains the immediate reward, it updates the Q-matrix based on the Ballard equation, Eq. (1). Finally, the agent chooses an action based on the updated Q-matrix and informs TCP to adjust its maximum window size based on the chosen action. We use a delayed reward system because the agent has to wait for the system to settle down in a specific state to figure out the instant reward.

5 Performance evaluations
Q-learning TCP is specifically aimed at a wireless mesh network environment; as such, all the simulations and the testbed are designed to present the interaction of TCP with the unique characteristics of the wireless mesh setting. In this section, we present the numerical results of our proposed method and demonstrate the effectiveness of our fairness mechanism compared to TCP, TCP-AP, and TCP ex Machina. We first evaluate the performance of Q-learning TCP in a multi-hop setting in which all the nodes are located in the same wireless domain and participate in the optimization process in Section 5.1. Section 5.3 presents the performance of Q-learning TCP over a testbed with real data in an office environment.

5.1 Chain topology
The chain topology is a great scenario to evaluate the effectiveness of Q-learning TCP over WMN, because it creates a competitive environment for flows with different numbers of hops, which is the main feature of any wireless mesh topology. We use Jain's fairness index and the flow throughput as the comparison metrics. We set up a multi-hop network of three nodes located in a chain topology with 150 m spacing between neighboring nodes (Fig. 2). Each node is equipped with an 802.11b network interface. Nodes 1 and 2 are equipped with an FTP source that transmits packets to node 3. The transmission range is around 250 m and the carrier sensing range is around 550 m for each wireless interface. The data rate for IEEE 802.11b is 2 Mbps and the RTS/CTS mechanism is on. Each TCP packet carries 1024 bytes of application data.

For the Q-learning scheme, we have to determine the state space of the system and the reward function. As mentioned in Section 3, the fairness index and aggressiveness index are values between 0 and 1.
For the fairness index, we divided the [0, 1] interval into four sub-intervals, f = {[0, 0.5), [0.5, 0.8), [0.8, 0.95), [0.95, 1]}. For the aggressiveness index, we split the [0, 1] interval into three sub-intervals, g = {[0, 0.5), [0.5, 0.8), [0.8, 1]}. We could split both the fairness index and the aggressiveness index into smaller sub-intervals and provide more control for the learning agent to tune TCP parameters for more desirable results. Although having smaller intervals facilitates the learning agent's decision making, an increase in the number of intervals increases the state space size and slows down the convergence of the learning process. After the quantization, the state space reduces to (8):

s = \{(f, g) \mid f \in \{0, 0.5, 0.8, 0.95\},\ g \in \{0, 0.5, 0.8\}\}    (8)

The above state space provides each node in the network with a realistic understanding of the resource allocation of the neighboring nodes.

Fig. 2 Chain topology – three nodes, two flows

The immediate reward function that we use for our Q-learning TCP is of the form of Eq. (7). We run each simulation 30 times to obtain 95% confidence intervals. The length of each simulation is 1000 s.

Figure 3 shows the throughput changes during the learning process. At the beginning of the learning process, the Q values are all zero; therefore, the agent starts a systematic search to determine the effect of each action on the state of the system. Because we choose a Gaussian reward function, the Q-learning agent gradually moves to states adjacent to the goal state. The systematic search behavior persists throughout the simulation, but the range of the search circle diminishes as the learning process converges to the goal state. As depicted in Fig. 3, at the beginning of the learning process the throughput of both flows fluctuates. As time goes by, the fluctuation of both flows dwindles to a negligible amount. As the learning process progresses, the agent visits each state a sufficient number of times to find the best policy (the best maximum TCP window size) to maximize its acquired rewards. Eventually, the learning agent in node 2 converges to s = (0.95, 0.5), where 0.95 is the fairness index and 0.5 is the aggressiveness index of node 2.

To investigate the convergence of the learning algorithm, we calculated the average learning rate (Fig. 5). The average learning rate of the process is calculated as 1/E{n(s, a)}, where E{n(s, a)} is the average number of times each (state, action) pair is visited by the agent. According to [36], a decreasing average learning rate is an indication of convergence of the Q-learning process. Figure 5 shows that the learning rate approaches 0, which guarantees the convergence of the learning algorithm.

Figure 6 shows the changes of the network utilization factor as the learning progresses. As depicted in Fig. 6, the network utility factor fluctuates widely at the beginning of the learning process. However, the network utility factor settles to a value within the desirable range when the learning process converges. The same pattern can be seen in the Jain's fairness index of the network, as demonstrated in Fig. 4. The wide fluctuations of the fairness index are visible during the systematic search of the learning agent at the beginning of the learning process. As the learning process converges to the desirable state, the Jain's fairness index of the network settles and the fluctuations become negligible.

We compare our scheme with TCP-AP [18] and TCP ex Machina [34]. We implemented TCP-AP based on the algorithm in [18].
In TCP-AP, the sending rate is limited based on the changes in RTT.

Figure 7 graphs the performance of TCP-AP, TCP, Q-learning TCP, and TCP ex Machina. TCP-AP performs closely to our learning method in terms of the fairness index of the network. However, the fair resource allocation in TCP-AP comes at the cost of network utility. The drop in the network utility factor in TCP-AP is caused by the frequent pauses in data transmission. TCP ex Machina, on the other hand, performs similarly to standard TCP; the reason behind this behavior of TCP ex Machina over the multi-hop wireless setting is that the learning mechanism in TCP ex Machina is optimized for wired settings, and the re-learning process requires a great deal of computational resources, which is almost impossible to carry out on current wireless nodes within a reasonable amount of time. Table 1 presents the comparison of Q-learning TCP with TCP-AP and TCP ex Machina.

Fig. 3 Throughput changes of the flows in the chain topology of Fig. 2. Each cycle is equivalent to 40 s
Fig. 4 Jain's fairness index in the chain topology of Fig. 2. Each cycle is equivalent to 40 s
Fig. 5 Learning rate (proof of convergence) in the topology of Fig. 2. Each learning cycle consists of 40 s
Fig. 6 Network utilization factor (kbytes/s) in the topology of Fig. 2. Each learning cycle consists of 40 s
Fig. 7 Performance comparison between legacy TCP, Q-learning TCP, TCP-AP, and TCP ex Machina

5.2 Larger scale WMN
To evaluate our proposed algorithm further, with a non-static traffic pattern and a random mesh topology, we generate a random topology in ns2, as illustrated in Fig. 8. There are six flows in the network; all flows are destined towards node 0 and originate from nodes 8, 3, 5, 7, 1, and 4. To study the performance of Q-learning TCP under non-static data traffic, we programmed all the sources to generate data intermittently, over intervals of random length.

As depicted in Fig. 9, the resource allocation in legacy TCP is severely unfair, as nodes 8, 1, and 4 are starving while nodes 3 and 7 aggressively consume the bandwidth. However, Q-learning TCP pushes node 3 to decrease its network share and provide more transmission chances for other nodes. The learning agent of node 5 also indicates an unfair resource allocation and forces node 5 to slow down its data sending rate. On the other hand, node 8 experiences an undergrowth of the congestion window size and starts to increase its sending rate as node 5 decreases its sending rate. Node 3 and node 5 cooperatively provide other nodes with more sending opportunities by decreasing their sending rates, which results in an increase in node 8's sending rate. On the other side of the network, node 7 consumes a bigger portion of the bandwidth compared to node 1; therefore, Q-learning TCP forces node 7 to decrease its sending rate and provide other nodes with more sending opportunities. A comparison of TCP, TCP-AP, TCP ex Machina, and Q-learning TCP is presented in Table 2.

TCP-AP outperforms both TCP and Q-learning TCP in fair resource allocation in the scenario of Fig. 8.
However, the high fairness index of TCP-AP comes at the cost of a drastic decrease in network utilization, by a factor of 50%. The reason behind the extreme decrease in the network utility factor of TCP-AP is the over-estimation of RTTs and the excessive overhead caused by feedback messages. TCP ex Machina performs as poorly as legacy TCP and causes one of the flows to starve completely.

Table 1 Network metric parameters for different TCP variations in the topology of Fig. 2
TCP variation     Jain's fairness index   Network utility
Legacy TCP        89%                     509 Kbps
TCP-AP            99%                     83 Kbps
TCP ex Machina    84%                     694 Kbps
Q-learning TCP    99%                     468 Kbps

To investigate the convergence of the learning process, we graphed the learning rate for nodes 3, 5, and 7 in Fig. 10. The agent learning rate for all three nodes converges to 0 as the learning process progresses. The learning rate of node 3 is higher than that of the two other nodes; the reason for this behavior is that node 3 has to make more changes to get to the optimum state. More changes translate into more state transitions and consequently a higher convergence rate.

Table 2 Network metric parameters for different TCP variations in the scenario of Fig. 8
TCP variation     Jain's fairness index   Network utility
Legacy TCP        75%                     430 Kbps
TCP-AP            97%                     240 Kbps
TCP ex Machina    71%                     436 Kbps
Q-learning TCP    83%                     403 Kbps

Fig. 8 Random network topology – 10 nodes
Fig. 9 TCP flow throughput and network utility in kbytes/s for the scenario of Fig. 8
Fig. 10 Learning rate of nodes 3, 5, and 7 in the scenario of Fig. 8

5.3 Testbed
To evaluate the performance of Q-learning TCP in a real-world setting, we set up a testbed in a real office environment along with other network users in the office (Fig. 11). The blue nodes in Fig. 11 are the employees with their laptops (MacBook Pro or MacBook Air) who connect to the Internet via router R1 (extender) or R2. Routers R1 and R2 connect to the Internet through the gateway (purple node). We add node B, which is both a source and a forwarder node. Node B can forward to R1 and R2; moreover, we set up a static routing table inside node B that forwards packets from node A to node C without forwarding them to R1 or R2. We implement the Q-learning mechanism as a Python suite based on Algorithm 2 in node B and node A. The Q-learning mechanism communicates with the transport layer via the action functions of Section 3.2 in order to tune the TCP maximum congestion window size, while all the users on the network continue their day-to-day activity.

For nodes A, B, and C, we use Raspberry Pi devices as the wireless nodes in our testbed. The Raspberry Pi [41] is a tiny, affordable computer that can host multiple wired/wireless interfaces and can be used either as a client node, a server node, a router, or a forwarder node based on the available application. We use FreeBSD [42] on the client and server nodes (nodes A and C) and Wheezy [43], a Debian Linux release, on the node acting as a router (node B). FreeBSD only supports WiFi adapters in client mode and not in host or forwarder mode. As such, we use Wheezy for the nodes in the middle that have to play a role in forwarding/routing packets. We use TP-LINK nano USB adapters as the 802.11g WiFi interfaces [44], as they are cheap and they work well with FreeBSD and Wheezy.

To generate a real-world traffic profile over the testbed, we use the findings of Brownlee et al. in [45].
According to the measurements of [45] over two real-life networks, Internet streams can be classified into two groups: dragonflies and tortoises. Dragonflies are sessions with a lifetime of less than 2 s, while tortoises are sessions with long lifetimes, normally over 15 min. The authors of [45] showed that although the lifetime of sessions is variable, the shape of the lifetime distribution is the same for both tortoises and dragonflies and does not experience rapid changes over time. The findings of [45] are critical to the design of Q-learning TCP and its interaction with the real world; the fact that the distribution of stream lifetimes does not change rapidly over time fits well with the characteristics of Q-learning. Based on [45], 45% of the streams have a lifetime of less than 2 s, 53% have a lifetime between 2 s and 15 min, and the rest have lifetimes of more than 15 min (usually on the order of hours). We use these findings to generate traffic on nodes A and B along with the other existing day-to-day traffic within the office network.

Fig. 11 Testbed topology

Table 3 shows the results of our measurements over the testbed of Fig. 11. Q-learning TCP outperforms TCP Reno on node A by a big margin, an 85% increase in the average throughput. Node B, which acts as a source and a forwarder node, has to compromise its throughput to enhance the fairness of the network. The average throughput of node B decreases by 3% to accommodate an 85% boost in the throughput of node A, which is a drastic increase with a minimal compromise.

Table 3 A comparison between Q-learning TCP and TCP Reno
TCP variation     Source    Throughput (Kbits/s)    95% confidence interval (Kbits/s)
TCP Reno          node A    883                     less than 100
TCP Reno          node B    4003                    less than 400
Q-learning TCP    node A    1549                    less than 400
Q-learning TCP    node B    3877                    less than 1000

The larger confidence intervals for Q-learning TCP are caused by the changes of the maximum congestion window size according to the optimal policy of Q-learning during the discovery. As a side note, the initial convergence time for the testbed is a few hours; however, after convergence, adjustment to any change in the network is almost immediate, as the agent has already discovered all the possible states.

6 Discussion and comparison
To propose an effective fairness solution for TCP over wireless multi-hop networks, one has to consider the dynamic nature of the environment as an important design factor. This means that a dynamic solution that changes its strategy based on the network condition is required. Therefore, an effective solution requires two characteristics: (a) monitoring/learning the network conditions and (b) choosing the correct strategy based on the perceived condition. Reinforcement learning methods meet the design characteristics in (a) and (b). Among the various reinforcement learning methods, Q-learning fits our needs best, as it is a model-free technique, meaning that Q-learning can be used to find an optimal strategy-selection policy for any given finite-state Markov decision process. The above reasoning justifies our choice of fairness solution for TCP over wireless multi-hop settings.

The main purpose of Q-learning in this paper is to decide which kind of action to take in the next time slot, i.e., to increase, decrease, or maintain the maximum congestion window size. Note that the best application of Q-learning is to learn action-reward functions for stationary settings, for which it can be proved to converge.
It is true that Q-learning can still produce results in a non-stationary environment, such as a wireless setting, but the Q-learning agent will take more time to become aware of the changes. Due to the time-varying, sometimes rapidly changing, network conditions, the stationarity assumption cannot always hold, and this can make Q-learning less suitable for wireless networks. However, there are ways to make sure that the convergence rate stays within an acceptable range for dynamic environments. Using a learning rate of \alpha = 1/(1+t)^2 brings the convergence time down to a polynomial in 1/(1-\gamma), where \gamma is the discount factor [38]. We use a learning rate of \alpha = 1/(1+t)^2 to ensure that our proposal copes with the dynamic nature of the environment. We have to emphasize that once Q-learning TCP converges, the agent does not need to re-learn the environment, as it has already discovered all the state-action pairs and their associated rewards and can cope with any changes in the environment. In the event of a drastic change in the environment, a fast convergence rate helps Q-learning adjust its memory fairly quickly.

Another consideration when using Q-learning in any scenario is the computational overhead. Assume that in a specific scenario action a_i alleviates the aggressive behavior of a specific flow while keeping the throughput at its maximum possible value, and that \tau iterations are needed before the algorithm converges. Then the overhead of the action to the learning node is:

\text{Overhead} = \frac{1}{n-1} \sum_{t=1}^{\tau} \sum_{i=1, i \neq j}^{n} P_i(t)\, O(a_i)    (9)

where P_i(t) is the probability of choosing action i at iteration t, which depends on the values in the Q matrix and the reward function, O(a_i) is the overhead of performing action a_i, \tau is the convergence time, and n is the number of states in the underlying MDP. Based on [38], in a Q-learning scenario with a polynomial learning rate, the convergence time \tau depends on the covering time L. The covering time indicates the number of iterations needed to visit all action-state pairs with a probability of at least 0.5, starting from any pair. The convergence time \tau is on the order of \Theta\left(L^{2 + \frac{1}{\omega}} + L^{\frac{1}{1-\omega}}\right), with the smallest value at \omega = 0.77. In our fairness mechanism, we have a limited number of states and in each state the agent has three actions to choose from; as such, both the covering time and the convergence time are tractable in our mechanism.

Comparing Q-learning TCP with other well-known existing fairness methods [8, 46, 47], Q-learning TCP does not incur any overhead to the network, at the expense of extra computation at each node. The focus of LRED [8] is on enhancing TCP throughput over WMN, and fairness enhancement is not one of its design objectives. However, the pacing mechanism of LRED enhances fairness as a side effect, at the cost of excessive transmission delay. The extra transmission delay in the LRED pacing mechanism alleviates the hidden terminal issue; however, the imposed delay is fixed in size and is not adjustable to the dynamic nature of the WMN. NRED uses a dropping mechanism to decrease the competition and provide more resources for the starved flows. However, the dropping probabilities are calculated and broadcasted constantly, which incurs a heavy overhead on the shared medium. TCP-AP uses the received signal strength indicator (RSSI) to infer information regarding the hidden terminal issue; however, RSSI causes an over-estimation of the transmission delay in its pacing mechanism and decreases the TCP throughput drastically.
As such, TCP-AP still requires feedback messaging from the neighboring nodes for the hidden terminal distance calculations. TCP ex Machina, another comparable mechanism, requires excessive computational resources to optimize its congestion control mechanism, which is not compatible with the resources of the current network infrastructure. Our method achieves the fairness enhancement of TCP at the cost of a reasonable amount of extra computation for the machine learning approach in each node. In WMN, where the shared medium is extremely valuable, flooding the network with excessive feedback messages or under-utilizing the links with excessive non-dynamic transmission delays to enhance fairness is not very cost efficient. Q-learning TCP instead trades computational simplicity in each node for TCP fairness. In a mesh setting, since the mobility of each node is very minimal, increasing the computational capacity of the nodes is not very costly. A brief comparison of LRED, NRED, TCP-AP, TCP ex Machina, and Q-learning TCP is presented in Table 4.

It is noteworthy that the complexity of Q-learning TCP is polynomial in the number of states and that the convergence rate is tractable with a suitable state space size, as confirmed by the simulation and testbed experiments.

Our findings confirm that in a wireless multi-hop setting, each TCP source has to cooperate with others to ensure a fair share of network resources for all end-users. TCP allocates resources with the assumption of an inclusive knowledge of the network topology and a reliable, permanent access to the medium for all the end-users. However, in a wireless mesh setting, the end-user's knowledge of the network topology is partial and the access to the medium is intermittent. As such, TCP needs to collect information about other nodes to compensate for the shortcomings of the layers underneath. The learning agent provides the TCP source with insight into the existing competition for network resources from other nodes. The insight provided by the learning agent compensates for the unfair behavior of the MAC and TCP in the wireless multi-hop environment by suppressing the aggressive response of TCP. It is noteworthy that Q-learning TCP inter-works well with any variation of TCP on the other end-point, because the changes to the TCP protocol stack are only on the sender side and the learning mechanism does not need any feedback from the receiver. Q-learning TCP relies only on the information collected by the learning agent.

7 Conclusions
We have proposed a cross-layer monitoring and learning mechanism for fair resource allocation of TCP over WMN that uses the information obtained from the MAC layer to optimize TCP parameters and enhance the end-to-end fairness within a network. The learning agent uses the local fairness index and the aggressiveness index of each node to decide if the node is starving or abusing the network resources. We have used a reward function to guide the learning agent in taking correct actions, which eventually allows us to solve the fairness problem in a distributed manner. We have compared our learning method with legacy TCP, TCP-AP, and TCP ex Machina via extensive ns2 simulations. The simulation results have demonstrated the superiority of our proposed method within wireless networks. Moreover, we have studied the performance of Q-learning TCP in a testbed.
Testbed measurements have shown that Q-learning TCP can be a great candidate for a transport protocol over current wireless multi-hop networks, with minimal changes only in the TCP source.

Table 4 A comparison between Q-learning TCP and other well-known TCP solutions
Fairness solution     Fairness enhancement                                                     Throughput enhancement         Disadvantage
LRED [8]              Slight increase                                                          5 to 30% increase              Overhead caused by broadcast messages and fixed transmission delay
NRED [46]             Effective increase (Jain's fairness index of 99% in a chain topology)    Up to 12% increase             Excessive overhead caused by broadcast messages (over 60%)
TCP-AP [47]           Effective increase (Jain's fairness index of 99% in a chain topology)    Drastic decrease (up to 50%)   Reliance on RSSI and excessive transmission delay
TCP ex Machina [34]   Decrease (Jain's fairness index of 84% in a chain topology)              Slight increase                Excessive learning time and computational resource requirement
Q-learning TCP        Effective increase (Jain's fairness index of 99% in a chain topology)    Slight decrease                Medium computational overhead

Acknowledgements
This work was supported in part by the Canadian Natural Sciences and Engineering Research Council (Grants RGPIN-2014-06119 and RGPAS-462031-2014), and by the National Natural Science Foundation of China (Grant 61671088).

Received: 18 April 2016  Accepted: 11 November 2016

References
1. M Franceschinis, et al., Measuring TCP over WiFi: a real case, in 1st Workshop on Wireless Network Measurements (WiNMee), Riva Del Garda, Italy (2005)
2. B Francis, V Narasimhan, A Nayak, I Stojmenovic, Techniques for enhancing TCP performance in wireless networks, in 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW 2012) (2012), pp. 222–230
3. G Jakllari, S Eidenbenz, N Hengartner, SV Krishnamurthy, M Faloutsos, Link positions matter: a noncommutative routing metric for wireless mesh networks. IEEE Trans. Mob. Comput. 11(1), 61–72 (2012)
4. DJ Leith, Q Cao, VG Subramanian, Max-min fairness in 802.11 mesh networks. IEEE/ACM Trans. Networking 20(3), 756–769 (2012)
5. XM Zhang, WB Zhu, NN Li, DK Sung, TCP congestion window adaptation through contention detection in ad hoc networks. IEEE Trans. Veh. Technol. 59(9), 4578–4588 (2010)
6. J Karlsson, A Kassler, A Brunstrom, Impact of packet aggregation on TCP performance in wireless mesh networks, in IEEE International Symposium on a World of Wireless, Mobile and Multimedia Networks Workshops (WoWMoM 2009) (2009), pp. 1–7
7. R De Oliveira, T Braun, A smart TCP acknowledgment approach for multihop wireless networks. IEEE Trans. Mob. Comput. 6(2), 192–205 (2007)
8. Z Fu, H Luo, P Zerfos, S Lu, L Zhang, M Gerla, The impact of multihop wireless channel on TCP performance. IEEE Trans. Mob. Comput. 4(2), 209–221 (2005)
9. K Nahm, A Helmy, C-C Jay Kuo, TCP over multihop 802.11 networks: issues and performance enhancement, in Proceedings of the 6th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc '05) (ACM, New York, NY, USA, 2005), pp. 277–287
10. Y Su, P Steenkiste, T Gross, Performance of TCP in multi-hop access networks, in 16th International Workshop on Quality of Service (IWQoS 2008) (2008), pp. 181–190
11. IF Akyildiz, X Wang, W Wang, Wireless mesh networks: a survey. Elsevier Comput. Netw. 47(4), 445–487 (2005)
12. A Al-Jubari, M Othman, B Mohd Ali, N Abdul Hamid, An adaptive delayed acknowledgment strategy to improve TCP performance in multi-hop wireless networks. Wirel. Pers. Commun. 69(1), 307–333 (2013)
13. CY Huang, P Ramanathan, Network layer support for gigabit TCP flows in wireless mesh networks. IEEE Trans. Mob. Comput. 14(10), 2073–2085 (2015)
14. Y Cai, S Jiang, Q Guan, FR Yu, Decoupling congestion control from TCP (semi-TCP) for multi-hop wireless networks. EURASIP J. Wirel. Commun. Netw. 2013(1), 1–14 (2013)
15. K Xu, M Gerla, L Qi, Y Shu, Enhancing TCP fairness in ad hoc wireless networks using neighborhood RED, in Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (ACM MobiCom 2003) (ACM, New York, NY, USA, 2003), pp. 16–28
16. J Ye, J-X Wang, J-W Huang, A cross-layer TCP for providing fairness in wireless mesh networks. Int. J. Commun. Syst. 24(12), 1611–1626 (2011)
17. A Raniwala, D Pradipta, S Sharma, End-to-end flow fairness over IEEE 802.11-based wireless mesh networks, in 26th IEEE International Conference on Computer Communications (INFOCOM 2007) (2007), pp. 2361–2365
18. SM ElRakabawy, C Lindemann, A practical adaptive pacing scheme for TCP in multihop wireless networks. IEEE/ACM Trans. Networking 19(4), 975–988 (2011)
19. H Xie, A Boukerche, AAF Loureiro, TCP-ETX: a cross layer path metric for TCP optimization in wireless networks, in IEEE International Conference on Communications (ICC 2013) (2013), pp. 3597–3601
20. B Vamanan, J Hasan, T Vijaykumar, Deadline-aware datacenter TCP. ACM SIGCOMM Comput. Commun. Rev. 42(4) (2012)
21. N Arianpoo, P Jokar, VCM Leung, Enhancing TCP performance in wireless mesh networks by cross layer design, in International Conference on Computing, Networking and Communications (ICNC 2012) (2012), pp. 177–181
22. V Gambiroza, B Sadeghi, EW Knightly, End-to-end performance and fairness in multihop wireless backhaul networks, in Proceedings of the 10th Annual International Conference on Mobile Computing and Networking (MobiCom '04) (ACM, New York, NY, USA, 2004), pp. 287–301
23. S Shioda, H Iijima, T Nakamura, S Sakata, Y Hirano, T Murase, ACK pushout to achieve TCP fairness under the existence of bandwidth asymmetry, in Proceedings of the 5th ACM Workshop on Performance Monitoring and Measurement of Heterogeneous Wireless and Wired Networks (PM2HW2N '10) (2010), pp. 39–47
24. C Cicconetti, IF Akyildiz, L Lenzini, FEBA: a bandwidth allocation algorithm for service differentiation in IEEE 802.16 mesh networks. IEEE Trans. Netw. 17(3), 884–897 (2009)
25. C Cicconetti, IF Akyildiz, L Lenzini, Bandwidth balancing in multi-channel IEEE 802.16 wireless mesh networks, in Proceedings of the 26th IEEE International Conference on Computer Communications (ICC 2007) (2007), pp. 2108–2116
26. T-C Hou, C-W Hsu, C-S Wu, A delay-based transport layer mechanism for fair TCP throughput over 802.11 multihop wireless mesh networks. Int. J. Commun. Syst. 24(8), 1015–1032 (2011)
27. S Fowler, M Eberhard, K Blow, Implementing an adaptive TCP fairness while exploiting 802.11e over wireless mesh networks. Int. J. Pervasive Comput. Commun. 5, 272–294 (2009)
28. T Li, DJ Leith, V Badarla, D Malone, Q Cao, Achieving end-to-end fairness in 802.11e based wireless multi-hop mesh networks without coordination. Mob. Networks Appl. 16(1), 17–34 (2011)
29. K Xu, N Ansari, Stability and fairness of rate estimation-based AIAD congestion control in TCP. IEEE Commun. Lett. 9(4), 378–380 (2005)
30. KLE Law, W-C Hung, Engineering TCP transmission and retransmission mechanisms for wireless networks. Pervasive Mob. Comput. 7(5), 627–639 (2011)
31. S Ha, I Rhee, L Xu, CUBIC: a new TCP-friendly high-speed TCP variant. ACM SIGOPS Oper. Syst. Rev. 42(5), 64–74 (2008)
32. CP Fu, SC Liew, TCP Veno: TCP enhancement for transmission over wireless access networks. IEEE J. Sel. Areas Commun. 21(2), 216–228 (2003)
33. K Winstein, A Sivaraman, H Balakrishnan, Stochastic forecasts achieve high throughput and low delay over cellular networks, in 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13) (2013), pp. 459–471
34. K Winstein, H Balakrishnan, TCP ex Machina: computer-generated congestion control, in ACM SIGCOMM 2013 (2013), pp. 123–134
35. Q Jabeen, F Khan, S Khan, MA Jan, Performance improvement in multihop wireless mobile adhoc networks. J. Appl. Environ. Biol. Sci. (JAEBS) 6, 82–92 (2016)
36. CCH Watkins, P Dayan, Technical note: Q-learning. Mach. Learn. 8(3–4), 279–292 (1992)
37. SD Whitehead, DH Ballard, Learning to perceive and act by trial and error. Mach. Learn. 7(1), 45–83 (1991)
38. E Even-Dar, Y Mansour, Learning rates for Q-learning. J. Mach. Learn. Res. 5, 1–25 (2004)
39. V Jacobson, R Braden, D Borman, TCP extensions for high performance. RFC 1323 (1999). doi:10.17487/RFC1323. http://www.rfc-editor.org/info/rfc1323
40. L Matignon, GJ Laurent, N Le Fort-Piat, Improving reinforcement learning speed for robot control, in IEEE/RSJ International Conference on Intelligent Robots and Systems (2006), pp. 3172–3177
41. Raspberry Pi: Model B. https://www.raspberrypi.org/
42. The FreeBSD Project. https://www.freebsd.org/
43. Debian Wheezy. https://www.debian.org/releases/wheezy/
44. TP-LINK: 150Mbps Wireless N Nano USB adapter. http://www.tp-link.com/lk/products
45. N Brownlee, KC Claffy, Understanding Internet traffic streams: dragonflies and tortoises. IEEE Commun. Mag. 40(10), 110–117 (2002)
46. K Xu, M Gerla, L Qi, Y Shu, Enhancing TCP fairness in ad hoc wireless networks using neighborhood RED, in Proceedings of the 9th Annual International Conference on Mobile Computing and Networking (MobiCom '03) (ACM, New York, NY, USA, 2003), pp. 16–28
47. SM ElRakabawy, C Lindemann, A practical adaptive pacing scheme for TCP in multihop wireless networks. IEEE/ACM Trans. Networking 19(4), 975–988 (2011)

