Frame Allocation for Smart Phone Based Games Using Clouds

by

Vikramjit Singh Sandhu
B.Eng., The University of Victoria, Victoria, Canada, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF APPLIED SCIENCE
in
The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
May 2014
© Vikramjit Singh Sandhu 2014

Abstract

This thesis considers exploiting the channel dynamics and the cloud state information for optimal frame allocation of video frames between a cloud gaming server and a mobile user. Such scheduling offloads the computational workload of running graphically intensive applications from the smart-phone onto the cloud, thereby extending the battery life of the smart-phone. The objective is to achieve a trade-off between the number of high resolution and low resolution frames sent, subject to delivering a minimum total number of frames within a pre-defined deadline. To tackle this objective, the trade-off has been formulated as a Markov Decision Process. It is proved that the optimal frame allocation policy for transmitting video frames from the cloud is monotone in both the number of frames transmitted as well as the time taken to transmit those frames. Finally, a simulation based stochastic optimization algorithm is presented which exploits the monotone structure of the optimal policy; the algorithm estimates the optimal policy without knowledge of the transition probabilities.

Preface

This work is based on research performed at the Electrical and Computer Engineering department of the University of British Columbia by Vikramjit Singh Sandhu, under the supervision of Dr. Vikram Krishnamurthy. The author of this thesis was responsible for all aspects of the research performed in this thesis, including problem formulation, simulations, proofs, as well as writing the manuscript. Dr. Krishnamurthy provided advice and technical help in the proofs as well as correcting the mathematical notation involved.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Dedication
1 Introduction
  1.1 Modern Gaming on Smart Phones
  1.2 Cloud Based Gaming for Smart Phones
  1.3 Markov Decision Process
  1.4 Smart Phone to Cloud Gaming Model
  1.5 Main Results and Organization
    1.5.1 Related Work
2 Frame Allocation Policy
  2.1 State Space
  2.2 Action Space
  2.3 Transition Probabilities
  2.4 Rewards
    2.4.1 Terminal Reward
    2.4.2 Instantaneous Rewards
3 Optimality of Monotone Frame Allocation Policies
  3.1 Stochastic Dynamic Programming Formulation
  3.2 Monotone Structure of the Optimal Frame Allocation Policies
4 Implementation and Discussion
5 Reinforcement Learning Algorithm that Exploits Monotone Structure
6 Numerical Results
  6.1 Aggressive Policy Results
  6.2 Neutral Policy Results
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Work
Bibliography
Appendices
A Proof of Theorem 1 and Theorem 2 for Aggressive Rewards
  A.1 Proof of Theorem 1
  A.2 Proof of Theorem 2
B Proof of Theorem 1 and Theorem 2 for Neutral Rewards
  B.1 Proof of Theorem 1
  B.2 Proof of Theorem 2

List of Tables

1.1 Battery life comparison
1.2 AWS variability
1.3 Google app engine variability
6.1 Normalized performance loss

List of Figures

1.1 Sequential decision problem model
1.2 Decision epochs and periods
1.3 System model
6.1 Monotone policy structure
6.2 Monotone policy structure for different error probabilities
6.3 Average reward comparison: Bellman
6.4 Average reward comparison: Monte Carlo
6.5 Stochastic approximation algorithm
6.6 Stochastic approximation with perturbation
6.7 Neutral reward monotone policy structure
6.8 Neutral reward monotone policy structure for different error probabilities
6.9 Neutral reward average reward comparison: Bellman
6.10 Neutral reward stochastic approximation algorithm

Acknowledgements

I owe the deepest gratitude to my supervisor, Dr. Vikram Krishnamurthy, for his constant support and encouragement. This thesis would not be possible without his guidance and advice. I would like to thank my friends and colleagues in the lab who have always been happy to discuss concepts and ideas and for providing a friendly and productive environment. I would also like to thank NSERC for funding my research.

Dedication

To my wife, who has always been there for me.

Chapter 1
Introduction

In this chapter, Section 1.1 discusses the limitations of modern mobile devices such as smart phones with respect to playing graphically complex games. Section 1.2 explores how the power of cloud computing may be used to improve the gaming experience of smart-phone users. Section 1.3 provides a brief overview of Markov Decision Processes (MDPs). Section 1.4 describes the system model for cloud based smart phone gaming. Section 1.5 lists the main results and contributions of this thesis.

1.1 Modern Gaming on Smart Phones

Smart-phones have had a huge impact on modern society. Their ability to package tremendous processing power in a lightweight and mobile device has led them to be used for a large number of purposes apart from being used as phones. For example, market research conducted by Nielsen Mobile shows more than 40 million mobile subscribers in the US, and hundreds of millions more globally, access the web through their devices [1]. The popularity of mobile video games has grown tremendously over the last few years. According to a report from comScore [2], the number of mobile game downloads hit 8.5 million in November 2008 [1]. Modern smart phones have encouraged users to play graphically rich games, which in the past were limited to PCs [3]. Although present day phones have a wide variety of games available for gamers, their range is limited by the processing power and battery life of the phone. Graphically complex games drastically increase the digital workload of the mobile device. The overall digital workload of even a relatively old mobile phone (mid 1990s) is in the order of 100 giga operations per second, for which only 1 Watt of power is available after taking away the power requirements for other tasks like RF transceivers [4]. The very feature that has made the smart phone such an attractive device, i.e. its small size, limits the battery size. Modern games require tremendous computation power on the platform which is running them. This is usually in the form of sophisticated graphics cards, which are not present in modern smart-phones yet. Although the steady miniaturization of electronic devices will someday overcome the limitations of size, a more difficult problem to tackle is that of battery life. Battery life is a major limitation in graphically intensive applications such as 3-D video games. One option to minimize battery consumption on the hand-held device is to offload computation intensive tasks to a cloud and later transmit the results back to the device.
This is particularly useful because smart-phones consume far less battery energy streaming HD video compared to running modern video games. To illustrate this important point, Table 1.1 [5], [6] compares the battery life while playing computationally intensive games versus streaming video on modern smart phones/laptops.

Table 1.1: Battery life comparison [5], [6]

Device Model    Video Streaming Lifetime    Video Game Lifetime
MacBook Air     5 hrs 45 mins               1 hr 50 mins
Galaxy S4       5 hrs                       1 hr
HTC One         4 hrs 20 mins               52 mins

1.2 Cloud Based Gaming for Smart Phones

The variety and duration of games playable on mobile devices is affected by the limitations of such devices, namely battery life, which has led to alternative approaches being explored [3]. One such approach requires executing the appropriate gaming engines at a remote location, and streaming the resulting video to the client devices. This approach is being explored for PC based games [7], and a similar technique may be used for smart phone type devices [3].

Traditionally, games like World of Warcraft (WoW), a popular Role Playing Game (RPG), have been hosted on dedicated servers. Dedicated servers simulate game worlds without supporting direct input or output, except that required for their administration [8]. Players must connect to the server with separate client programs in order to see and interact with the game. However, dedicated servers cost money to run and moreover are inefficient for games which can only support low player counts [8]. Because of the cost of supporting on-line games and the unpredictable load on servers, companies are moving toward sharing infrastructure for game hosting. With the advent of commercial on-demand computing architecture like cloud computing, it is becoming possible to statistically multiplex server resources across a range of diverse applications, thus reducing the overall hardware costs to run them [9].

Cloud computing has rapidly gained popularity as a flexible computing model where users only pay for the services that they use. Cloud servers [10, 11] have been studied to provide different types of services. These services may be in the form of software or hardware. Cloud computing refers to both the applications delivered as services over the Internet and the hardware and systems software in the data centres that provide those services. The services themselves can be divided into the following [12, 13]:

1. Infrastructure as a Service (IaaS): provides the distributed multi-site physical components to support cloud computing.
2. Platform as a Service (PaaS): provides computational resources via applications and services that can be developed and hosted.
3. Software as a Service (SaaS): provides application services using a cloud platform, rather than providing cloud features themselves.

Cloud computing offers application providers the possibility to deploy a product as a SaaS and to scale it on demand, without building or provisioning a data center. With respect to this definition, cloud gaming can be viewed as offering games as applications through the cloud to gamers [13].

Hosting a game on a cloud offers various advantages over a dedicated server. The first is cost. Since the number of players varies with time for most games, it is more cost efficient to have a gaming environment that can scale both in cost and hardware with the number of players. This makes it far more economical for game providers to host their games on the cloud. Cloud servers can be scaled up and down with minimal fuss, using software only: RAM, hard drive space and even CPU power can all be increased or decreased dynamically using a software interface.
Dedicated servers require fussing with hardware to scale them up or down, which is not as easy as reconfiguring a cloud [14]. A cloud is also more reliable than a dedicated server because it offers far more redundancy: even if one of the connected servers goes down, the other servers will maintain hosting [15]. The main disadvantage of using a cloud over a dedicated server to host an application is the increased performance variability of the cloud. It is shown in [16] that the impact of this performance variability depends on the application and that performance variability can be an important factor in cloud provider selection. Table 1.2 shows the variability in the response time for Amazon Web Services (AWS). The Elastic Cloud 2 (EC2) deployment latency is the time it takes to start an instance of the service, from the time startup is initiated to the time the instance is available. The Amazon Simple Data Base (SDB) query response time is the time it takes to execute a "GetAttributes" function that returns 100 attributes. The update latency is the time it takes for the updates resulting from a "PutAttributes" operation to be available to a subsequent "GetAttributes" operation [16].

Table 1.2: Cloud variability for Amazon Web Services [16]

Service                        Min     Q1      Median  Q3      Max     Mean    SD
EC2 [s]  Deployment Latency    57.00   73.59   75.70   78.50   122.10  76.62   5.17
SDB [ms] Query Response Time   28.14   31.76   32.81   33.77   85.40   32.94   2.39
SDB [ms] Update Latency        297.54  342.52  361.97  376.95  538.37  359.81  26.71

Similarly, Table 1.3 shows the variability in the response time of Google App Engine's cloud services. The Python runtime is the time it takes to calculate the 27th Fibonacci number in the Python runtime environment. The "Get" time is the time it takes to get 1 MB of data from memcache. The "Put" time is the time it takes to put 1 MB of data in memcache. The "Response" time is the round trip time to request and receive 1 byte of data from cache; this is analogous to the "Get" time, but for a smaller chunk of data [16].

Table 1.3: Cloud variability for Google App Engine's cloud services [16]

Service                  Min    Q1      Median  Q3      Max     Mean    SD
Python Runtime [ms]      1.00   284.14  302.31  340.37  999.65  314.95  76.39
Memcache [ms]  Get       45.97  50.49   58.73   65.74   251.13  60.03   11.44
Memcache [ms]  Put       33.21  44.21   50.86   60.44   141.25  54.84   13.54
Memcache [ms]  Response  3.04   4.69    5.46    7.04    38.71   6.64    3.39

There are already some existing cloud based gaming models in the market like OnLive and
StreamMyGame [17]. Advanced Micro Devices (AMD) has been working on specialized cloud architecture to support online gaming [18]. Recently, Sony announced PlayStation Now, a brand name of Gaikai (the cloud gaming technology it purchased in June of 2012), a service that will bring streaming PlayStation games not only to PS4, but also PS3, PlayStation Vita, and even televisions, tablets, and smart phones [19].

However, executing the gaming engine at a remote location and then streaming the results as a video stream introduces a new set of challenges. As opposed to the conventional server-client gaming architecture, where the game is executed right on the client, the new approach introduces the possibility of significantly higher response time, from the time a gaming command is issued on a mobile device, to the time the video is streamed back to the device [3]. It becomes important to model the Quality of Experience (QoE) of the user when running the game engine on a cloud server.

Several approaches have been made to model the quality of experience [13, 20-25] for streaming video, which include the effect of wireless networks on video delivery [26, 27]. Conventional video quality metrics like PSNR cannot be directly applied to the user's gaming quality of experience, since gaming is a highly interactive application in which response time to the user's command is very important [1]. Similarly, factors affecting a PC user's gaming experience [9, 28-30] cannot accurately capture the gaming experience for a cloud server based approach. In summary, a QoE measure for cloud based gaming must include latency as a factor along with other parameters like video resolution, frame rate, etc. [1, 3, 13, 31]. This thesis also uses a criterion in which latency, along with the video resolution, plays an important part.

1.3 Markov Decision Process

Figure 1.1: Symbolic representation of a sequential decision problem [32].

In a probabilistic sequential decision making model, at a specified point in time, a decision maker, agent, or controller observes the state of the system and, based on this state, the decision maker chooses an action. The action produces two results: the decision maker receives an immediate reward (or incurs an immediate cost), and the system evolves to a new state at a subsequent point in time according to a probability distribution determined by the action choice. At this subsequent point in time, the decision maker faces a similar problem, but now the system may be in a different state and there may be a different set of actions to choose from [32]. The key ingredients [32] of a sequential decision model are the following:

1. A set of decision epochs
2. A set of system states
3. A set of available actions
4. A set of state and action dependent immediate rewards or costs
5. A set of state and action dependent transition probabilities

Figure 1.2: Decision epochs and periods [32].

Decisions are made at points of time referred to as decision epochs. We denote the set of decision epochs by T. Decision epochs may be discrete or continuous. In discrete time problems, time is divided into periods or stages. Models are formulated so that a decision epoch corresponds to the beginning of a period. For discrete time problems, the set of decision epochs is finite, i.e. T = {1, 2, ..., N} for some integer N < ∞. When N is finite, the decision problem is called a finite horizon problem. In finite horizon problems, decisions are not made at decision epoch N. Consequently, the last decision is made at decision epoch N − 1 [32]. At each decision epoch, the system occupies a state. The set of all possible states is denoted by S. If, at some decision epoch, the decision maker observes the system in state s ∈ S, he may choose action a from the set of allowable actions in state s, denoted A_s. We assume that S and A_s do not vary with the time t [32]. As a result of choosing action a ∈ A_s at decision epoch t,

1. the decision maker receives a reward r_t(s, a), and
2. the system state at the next decision epoch is determined by the probability distribution p_t(· | s, a).

The real valued function r_t(s, a), defined for s ∈ S and a ∈ A_s, denotes the value at time t of the reward received at time t. When positive, r_t(s, a) may be regarded as income, and when negative as cost.
When the reward depends on the state of the system at the next decision epoch, r_t(s, a, j) denotes the value at time t of the reward received when the state of the system at decision epoch t is s, action a ∈ A_s is selected, and the system occupies state j at decision epoch t + 1 [32]. The expected value of the reward at decision epoch t may be evaluated by computing

    r_t(s, a) = \sum_{j \in S} r_t(s, a, j) \, p_t(j | s, a)    (1.1)

In (1.1), the non-negative function p_t(j | s, a) denotes the probability that the system is in state j ∈ S at time t + 1 when the decision maker chooses action a ∈ A_s in state s at time t. The function p_t(j | s, a) is called the transition probability function. Many system transitions might occur in the time period between decision epoch t and decision epoch t + 1. The model is formulated so that transitions which occur between decision epochs do not influence the decision maker [32]. We assume

    \sum_{j \in S} p_t(j | s, a) = 1.    (1.2)

In a finite horizon Markov decision process, no decision is made at decision epoch N. Consequently, the reward at this time point is only a function of the state. It is denoted by r_N(s) and is also called the terminal reward. The collection of objects {T, S, A_s, p_t(j | s, a), r_t(s, a, j)} is referred to as a Markov decision process. The qualifier "Markov" is used because the transition probability and reward functions depend on the past only through the current state of the system and the action selected by the decision maker in that state [32].

This section concludes by giving the definitions of a decision rule and a policy. A decision rule prescribes a procedure for action selection in each state at a specified decision epoch. A deterministic Markovian decision rule is a function d_t : S → A_s which specifies the action choice when the system occupies state s at decision epoch t. For each s ∈ S, d_t(s) ∈ A_s [32]. This decision rule is said to be Markovian (memoryless) because it depends on previous system states and actions only through the current state of the system, and deterministic because it chooses an action with certainty. A policy specifies the decision rule to be used at all decision epochs. It provides the decision maker with a prescription for action selection under any possible future system state or history. A policy π is a sequence of decision rules, i.e., π = (d_1, d_2, ..., d_{N−1}) [32].
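As a concrete illustration of (1.1) and (1.2), the short Python sketch below computes the expected reward of a toy finite-horizon MDP with two states and two actions. The rewards and transition probabilities are arbitrary placeholder values, not quantities taken from this thesis.

```python
# A minimal sketch of equation (1.1) on a toy MDP; all numbers are placeholders.
import numpy as np

S, A, T = 2, 2, 3                       # toy numbers of states, actions, decision epochs
r = np.ones((T, S, A, S))               # r[t, s, a, j]: reward when moving from s to j under a
p = np.full((T, S, A, S), 0.5)          # p[t, s, a, j]: transition probability p_t(j | s, a)

def expected_reward(t, s, a):
    """Equation (1.1): r_t(s, a) = sum_j r_t(s, a, j) * p_t(j | s, a)."""
    assert np.isclose(p[t, s, a].sum(), 1.0)    # equation (1.2)
    return float(np.dot(r[t, s, a], p[t, s, a]))

print(expected_reward(t=0, s=0, a=1))   # -> 1.0 for this toy example
```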
1.4 Smart Phone to Cloud Gaming Model

In a cloud based gaming model, a user's request is sent to the cloud and the results are streamed back as High Definition (HD) video to the user's device [1]. Specifically, the following scenario occurs: the user sends a command via their mobile device to the cloud server, which computes the result of the user's action as a sequence of video frames. These video frames are then sent back to the user, where they are rendered by the device. However, this presents a new set of challenges. Video games are delay sensitive in nature. Users dislike a jerky video frame sequence, but prefer a high resolution video sequence over a lower resolution one as it is visually more appealing. These two objectives, transmitting only high resolution frames and at the same time ensuring that a minimum number of frames reach the user within a time constraint, are conflicting in nature. The more high resolution frames sent, the more time is required to send them and the greater the chance of receiving less than the minimum number of frames.

Figure 1.3: Smart Phone to Cloud Data Flow Model (mobile device, base station, cloud queue and virtual machine; uplink time Tu, downlink time Td, round trip time RTT = Tu + Td, cloud response time Tc, queue length L, queue delay Tq; control signals travel on the uplink and the video stream, containing error-free and errored packets, travels on the downlink through the frame allocation policy block).

Traditionally, games have been hosted on dedicated servers in the form of dedicated computers with a large computing resource. However, cloud based gaming is a new paradigm that differs from conventional server based gaming services. Unlike a traditional server, the cloud does not have resources dedicated to any one client. A dedicated server would be able to act upon a client's request the moment it receives it, whereas a cloud would only be able to act upon the request once all pending requests of other clients have been handled. If we think of a cloud based server as a queue, the time before the cloud server is able to respond to the user's request depends on the pending requests already in the cloud's queue. This delay further reduces the time during which the cloud server must respond in a meaningful manner to the user's request.

In this thesis, we consider optimizing the frame allocation policy between the cloud and the mobile device. The frame allocation policy maps the system state (number of frames transmitted and time elapsed) to a transmission action, namely transmit a low or high resolution frame. The goal is to compute the transmission policy that optimizes the user's gaming experience by sending the largest number of high resolution frames in the given time.

Fig. 1.3 shows the block diagram of the system model being considered. The mobile device sends a control signal via the base station to the cloud, where it enters a queue of pending requests. Once the request comes to the front of the queue, it is sent to a Virtual Machine (VM), which decompresses it and generates a sequence of frames after passing it through the frame allocation policy block. The result is then sent back to the user; the resulting video stream packets may be corrupted depending on the channel error probabilities.

1.5 Main Results and Organization

The main results in this thesis include the following:

1. Sec. II formulates the system model and the frame resolution optimization problem in a video transmission to the user playing a video game on a cloud based gaming server. Formulating the problem as an MDP allows us to take advantage of the uncertainty involved in the transmission of a video frame over an error prone network. Unlike a static optimization technique, which does not allow us to change the transmission strategy once formulated at the beginning of the transmission, a dynamic optimization technique like the one used in this thesis allows changing the transmission policy to maximize the reward within a given horizon.

2. Solving an MDP with S states and A actions involves, in general, O(S²A) computations. This can be intractable for large state spaces. By exploiting the dynamics of the problem setup, Sec. III proves, using supermodularity arguments, that the optimal frame allocation policy is monotonically decreasing in the number of transmitted frames and the time taken to transmit the frames.
In Section III, we also show how the monotone structure of the optimal policy can be exploited to estimate a policy that is close to the optimal policy when the transition probabilities are unknown and possibly changing slowly over time.

3. In Sec. III, numerical examples are presented to illustrate the formulation and algorithms proposed in this thesis in a realistic scenario. It is shown that the optimal transmission policy performs better than a myopic or greedy policy by up to 12% and better than a round robin policy by up to 31%.

1.5.1 Related Work

[13] presents an adaptation technique inspired by the level of detail (LoD) approach in 3D graphics. It is based on a cloud gaming paradigm for the minimization of the effect of poor network parameters (delay, loss, jitter) in order to enhance the game interactivity and improve the player quality of experience (QoE). A pilot experiment has been carried out to evaluate this approach through a game prototype. The results of this experiment show that the LoD based adaptation in cloud gaming provides a significant QoE enhancement on poor or congested networks. [24] introduces the concept of a Game Attention Model (GAM), which is basically a hybrid visual attention model, as a means for reducing the bit rate of the streaming video more efficiently. GAM estimates the importance of each macro-block in a game frame from the player's perspective and allows encoding less important macro-blocks with a lower bit-rate. It is shown that by integrating this model into a cloud gaming framework, it is possible to decrease the required bit rate by nearly 50 percent on average, while maintaining a relatively high user quality of experience. [25] categorizes the current approaches to cloud-based mobile gaming and proposes a new framework for next generation Mobile Cloud Gaming (MCG) systems. The proposed architecture consists of five components, namely Cloud Layer, Network Layer, Mobile Layer, QoSE and Security. [31] studies an optimization problem to maximize the cloud gaming provider's total profit while achieving just-good-enough Quality-of-Experience (QoE). It also studies a Virtual Machine (VM) placement problem and models the QoE of cloud gaming as a function of the sum of processing delay and network delay. [33] proposes a unified QoE-aware management framework, directly targeting cloud computing environments. The proposed management system suggests the optimization of cloud resource usage and offered services in terms of QoE, satisfying the different service and resource requirements of all the involved cloud entities. In addition, the proposed novel approach merges together various QoS aspects in a multidimensional framework, referred to as QoE4CLOUD, which considers the perceived quality as the key metric for the management and performance optimization of the cloud environment. In [17] the real-time performance of current cloud gaming systems is examined. It measures the latency of components of cloud gaming systems, namely OnLive and StreamMyGame (SMG), and deduces that OnLive uses a game genre based strategy to provide sufficiently short latency for real-time gaming, whereas SMG has twice the latency of OnLive. In [1] a cloud server based approach is studied, termed Cloud Mobile Gaming (CMG), where the burden of executing the gaming engines is put on cloud servers, and the mobile devices just communicate the user's gaming commands to the servers.
The authors propose a measure, called the Gamer's Mean Opinion Score (GMOS), to quantify the user's gaming experience while playing under this architecture. In [3], a set of application layer optimization techniques to ensure acceptable gaming response and video quality is proposed. In comparison, [34] proposes a rendering adaptation technique which can adapt the game rendering parameters to satisfy CMG communication and computation constraints, such that the overall mobile gaming user experience is maximized. In [35], an architecture which obtains information from passive network measurements to adapt the target bit-rate of the generated video-stream to match the available bandwidth in the network is used. In [36], a QoS-aware adaptation architecture is proposed that can be used in all real-time video services using unicast transmission.

The conditions required for the existence of a monotone policy have been studied before. In [37], the problem of transmitting data within a fixed horizon and a penalty constraint is studied. Conditions are derived for a two state threshold policy to exist both in the buffer occupancy and the residual transmission time. In [38], the problem of scheduling a limited number of identical replacements of a vital component is studied and the existence of a two way control limit policy in the state variables is proved. In [39], a transmission scheduling problem using an ARQ protocol with retransmissions given channel state information (CSI) and a correlated fading channel is studied. The authors in [39] provide sufficient conditions on the channel memory and transmission cost so that the optimal transmission scheduling policy is a monotonically increasing function of the buffer occupancy. In [40, 41], a simulation based optimization technique is used to show how the structure of the optimal policy may be exploited to directly estimate the optimal policy.

Chapter 2
Finite Horizon Frame Allocation Problem

This chapter formulates a frame allocation policy from the cloud gaming server to the smart-phone as an MDP. As mentioned in Sec. 1, formulating the problem as an MDP facilitates exploiting the dynamics of the channel and the cloud state information to optimize the frame allocation rate. We describe the factors which affect the communication between the cloud and the user and how they in turn can be used to model the components which make up the MDP. The cloud server transmits the results of the user's command in the form of video frames. At any time, the cloud may choose to send a high-resolution or a low-resolution frame. A high-resolution frame is comprised of H packets whereas a low-resolution frame is comprised of L packets, with H > L. Maximizing the quality of service of a remote user may be modeled as delivering the maximum number of high resolution frames while delivering at least a minimum number of frames, denoted by f_t, within a given horizon N. The MDP comprises the following components.

2.1 State Space

We define the state s = (x, n, i) ∈ S of the system to be composed of the number of frames x ∈ {0, 1, 2, ..., f_t} (where f_t is the minimum number of frames required to be transmitted in the horizon N; the value of f_t is based on the number of frames required to be delivered to the user so that the video sequence is smooth in nature), the time n ∈ {0, 1, ..., N} spent in transmitting x frames, and i ∈ {0, 1, ...,
C}, the number of requests in the cloud queue that need to be serviced before the next frame of the video sequence can be transmitted, where C is the maximum number of users at any one time on a cloud server. The minimum time required to transmit a packet is half the Round Trip Time (RTT). Hence, the minimum time required to transmit a frame of length l, denoted T_min, is l half-RTT time units, i.e.

    T_min = l (RTT / 2).    (2.1)

Each division on the time axis is one half the RTT. Since one packet takes one unit of time, or equivalently one division on the time axis, the length of a high or low resolution frame may also be expressed in units of time. The time taken to deliver a frame depends on the following factors:

1. Round Trip Time (RTT): The RTT, the time required for a packet to travel from the source to the destination and back again, determines the minimum time required for the cloud's response to reach the user. The RTT is dependent on a number of factors, including the data transfer rate of the source, the nature of the transmission medium, the distance between the source and the destination, the number of nodes between the source and the destination, the amount of traffic or bandwidth on the network that is being used, the number of other requests being handled by the receiver or nodes along the transmission path, the processing capabilities of the source, receiver, and nodes, and the presence of interference [42]. For this research, it is assumed that the RTT is only composed of the distance between the source and the destination and that half of the RTT contributes to the downlink. This quantity determines the maximum number of frames that can be sent per unit time. The minimum transmission time for a packet is assumed to be the same as half the RTT. The transition probability, described in the next section, encapsulates the delay due to the RTT.

2. Cloud response time: Once the cloud server receives the command from the remote user, it is responsible for decompressing the command, interpreting it, rendering the result of the command as a High Definition (HD) video stream, compressing the video stream and sending it back to the user. Advances in gaming hardware have resulted in clouds being able to perform these tasks within a few milliseconds. A single cloud server hosts a large but finite number of remote gamers. Due to these multiple users interacting with the cloud, a user's request may not be dealt with the moment it reaches the cloud. Instead it is put in a queue and is dealt with once the queue is cleared. Each user's request will result in a sequence of video frames. However, the cloud does not transmit the entire sequence for the user; instead it transmits one frame of the sequence, and then the user is put back in the queue. The next frame in the sequence is then transmitted when the queue is cleared again, and so on. This is done to ensure that each user is treated fairly and no one user monopolizes the resources of the cloud server. This implies that the service time of the cloud may be modeled as a random variable and contributes toward the time remaining in which the cloud must perform its task. For the purpose of this thesis, the cloud is modeled as an M/M/m queue, i.e. with a Poisson distributed arrival rate λ, a service rate of µ per server and m servers. The reason for assuming an infinite size queue for the cloud is that it would be unacceptable to block a customer's request once he has already started playing the game, which is what would happen in a finite size queue.
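To summarize the state description of this section in code form, the short sketch below represents the state tuple s = (x, n, i) and evaluates the minimum frame transmission time (2.1). The RTT and frame lengths are assumed illustrative values, not the parameters used later in the thesis.

```python
# A small illustrative sketch of the state s = (x, n, i) and of equation (2.1);
# the RTT and the packet counts H, L are assumed placeholder values.
from collections import namedtuple

State = namedtuple("State", ["x", "n", "i"])   # frames sent, time slots elapsed, queue length

RTT = 20.0          # round trip time in ms (assumed)
H, L = 9, 3         # packets per high / low resolution frame (assumed)

def t_min(l_packets, rtt=RTT):
    """Equation (2.1): minimum time, in ms, to transmit a frame of l packets."""
    return l_packets * (rtt / 2.0)

s = State(x=0, n=0, i=5)
print(s, t_min(H), t_min(L))    # -> 90.0 ms and 30.0 ms for these assumed values
```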
2.2 Action Space

At each decision epoch the cloud takes an action a ∈ A = {0, 1}, where 0 stands for transmitting a low resolution frame and 1 stands for transmitting a high resolution frame. For the purposes of this research, it is assumed that the cloud based server will always be transmitting either a low or a high resolution frame.

2.3 Transition Probabilities

The transition probabilities are dependent on the following factors (a short numerical sketch of the quantities defined in (2.2)-(2.4) follows this list):

1. Packet transmission with a channel-aware ARQ protocol: The transmission network conditions are responsible for determining the time within which a frame is successfully delivered from the cloud to the user. A non-zero error probability means that a packet will not necessarily reach the user the first time and hence may need to be retransmitted. A packet is transmitted in a transmission time slot and the result of the transmission (an ACK or NACK) will be received error free in the next transmission time slot. Failure to deliver a packet successfully implies that the cloud needs to retransmit the packet. In other words, we assume an instantaneous, error free feedback channel for the ARQ scheme, which is an assumption in most work that considers ARQ for real time data traffic in packet transmission systems. The channel has been modeled with the two state Gilbert-Elliot channel model [43]. In this channel model the channel is either in a good state or a bad state, with the two states experiencing different Bit Error Rates (BERs). The two state Markov model implies that when the channel is in the Bad state, the packets that are transmitted get damaged with a very high BER, and when the channel is in the Good state the packets get through relatively intact, with a resulting burstiness in the bit error distribution. This effect is not taken into account if we use the mean bit error rate, which smooths out the bursty behavior by distributing the errors throughout the transmission [43]. In order to try to preserve the effects of this burstiness, the probability of a whole packet being damaged with a specific bit error rate is calculated, and then the mean value is taken:

    p_e = \beta \, PepG + (1 - \beta) \, PepB    (2.2)

where β is the probability the channel is in a good state, 1 − β is the probability the channel is in a bad state, PepG is the probability of a damaged packet in the Good state and PepB is the probability of a damaged packet when the channel is in the Bad state [43]. If a packet fails to deliver the first time, it needs to be transmitted again. The probability of transmitting a frame of length l within some time may be modeled by a random variable X whose probability distribution function is a negative binomial distribution with parameters l and 1 − p_e.

2. Queue length: Let P_i denote the probability that i customers are in the queue when a request to transmit a frame comes in. If P_0 is the probability that no customers are in the queue and ρ = λ/µ, then

    P_i = \frac{\rho^i P_0}{m! \, m^{i-m}}    (2.3)

where

    P_0 = \left( \sum_{i=0}^{m-1} \frac{\rho^i}{i!} + \frac{\rho^m}{m!(1-\rho)} \right)^{-1}.    (2.4)

Here, λ is the arrival rate of the requests and µ is the service rate of the cloud based server, both being exponentially distributed. We always assume that the number of requests, i, in the queue is greater than the number of servers, m.
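The sketch below evaluates the probability ingredients just described: the mean packet error probability (2.2), the negative binomial law for the number of slots X needed to deliver an l-packet frame, and the queue-length probabilities (2.3)-(2.4) exactly as written above. The numerical parameter values are placeholders chosen for illustration.

```python
# A numerical sketch (assumed parameter values) of equations (2.2)-(2.4) and of the
# negative binomial frame delivery time used in the transition probabilities.
from math import comb, factorial

def mean_packet_error(beta, pep_good, pep_bad):
    """Equation (2.2): p_e = beta*PepG + (1 - beta)*PepB."""
    return beta * pep_good + (1.0 - beta) * pep_bad

def prob_frame_in_t_slots(l, t, p_e):
    """P(X = t): the l-th successful packet arrives in slot t (per-slot success 1 - p_e)."""
    return 0.0 if t < l else comb(t - 1, l - 1) * (1.0 - p_e) ** l * p_e ** (t - l)

def queue_probability(lam, mu, m, i):
    """Equations (2.3)-(2.4): probability P_i of i >= m customers in the M/M/m queue."""
    rho = lam / mu
    p0 = 1.0 / (sum(rho ** k / factorial(k) for k in range(m))
                + rho ** m / (factorial(m) * (1.0 - rho)))
    return rho ** i * p0 / (factorial(m) * m ** (i - m))

p_e = mean_packet_error(beta=0.99, pep_good=1e-6, pep_bad=1e-1)
print(p_e)                                        # roughly 1e-3 for these assumed values
print(prob_frame_in_t_slots(l=3, t=5, p_e=p_e))   # 3-packet frame completed in slot 5
print(queue_probability(lam=2.0, mu=1000.0, m=4, i=6))
```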
Before the frame for a user can be transmitted, the request needs to move to the front of the cloud queue. Also, once all the packets of a frame have been transmitted, all packets constituting the frame need to be decompressed, decoded and assembled [1]. This is defined as the decoding/client play-out delay. A frame with l packets will experience an (additional) delay of lµ_ip. Since most modern H.264 decoders are capable of decoding the HD stream in real time, the client playout delay will be much smaller than the transmission time (RTT) for a packet; e.g., the transmission time is usually a few milliseconds whereas the playback time is in microseconds. We define the quantities d_H = i/mµ + Hµ_ip and d_L = i/mµ + Lµ_ip, where i/mµ is the delay caused by the queue of the cloud and Lµ_ip (Hµ_ip) is the delay caused by the decoder playback decoding a low (high) resolution frame. This quantity affects the transition probabilities. Specifically, when a high resolution frame is chosen to be sent:

    p((x+1, k, j) | (x, n, i), a = 1) = P(X = t) P_j,
    0 \le x \le f_t, \; 0 \le n \le N, \; H - d_H \le t < \infty.    (2.5)

When a low resolution frame is chosen to be sent:

    p((x+1, k, j) | (x, n, i), a = 0) = P(X = t) P_j,
    0 \le x \le f_t, \; 0 \le n \le N, \; L - d_L \le t < \infty.    (2.6)

2.4 Rewards

2.4.1 Terminal Reward

The terminal reward can be interpreted as a penalty dependent on the number of frames that have been successfully transmitted at the end of the time horizon N. If the number of successfully transmitted frames is x, then the terminal reward is defined as

    r_{N,i}(x) = (R_H - R_L) \min(0, x - f_t)    (2.7)

where f_t is the minimum number of frames that need to be transmitted in order not to incur any penalty. If the cloud manages to deliver x > f_t frames, then no penalty is incurred. Note that the terminal reward is independent of n.

2.4.2 Instantaneous Rewards

We will consider two types of instantaneous rewards. The first type of reward will be called an "aggressive" reward whereas the second type will be called a "neutral" reward.

Aggressive Rewards

Every time the system makes a decision to transmit a frame, it will expect a reward, provided the frame is delivered to the user before the horizon of the Markov decision problem. Both the queueing delay and the client playout delay take away from the time available to transmit the frame within the horizon. The instantaneous average rewards are therefore defined by

    r_{n,i}(x, a) =
    \begin{cases}
      \sum_{t=L}^{N-n-d_L} R_L \, P(X = t) \, P_d, & \text{if } a = 0, \\
      \sum_{t=H}^{N-n-d_H} R_H \, P(X = t) \, P_d, & \text{if } a = 1.
    \end{cases}    (2.8)

Since a user prefers high resolution frames to low resolution frames, R_H > R_L. Again, d_H = i/mµ + Hµ_ip and d_L = i/mµ + Lµ_ip, where i/mµ is the delay caused by the queue of the cloud and Lµ_ip (Hµ_ip) is the delay caused by the decoder playback decoding a low (high) resolution frame. The reward structure chosen reflects the twin goals of maximizing the number of high resolution frames sent to the mobile phone of the user under the constraint that the user gets at least a minimum number of frames in a fixed time, i.e. the chosen horizon. The terminal reward given by (2.7) penalizes the cloud depending on the margin by which the system fails to deliver the minimum number of frames in the horizon, thus encouraging the cloud to send as many frames as possible. Although the reward for sending a high resolution frame is greater than the reward for a low resolution frame, the longer length of the high resolution frame also subjects it to greater transmission errors. As the instantaneous rewards given by (2.8) are an average value of sending a high resolution frame or a low resolution frame, the cloud is encouraged to send high resolution frames at the beginning of the video transmission. If the channel error probabilities are high, or as the horizon draws closer, the cloud may switch to sending low resolution frames in order to minimize the penalty given by the terminal reward.
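As a compact illustration, the sketch below evaluates the terminal penalty (2.7) and the "aggressive" expected reward (2.8) for a single decision. The reward values, frame lengths, horizon and error probability are assumed placeholders, and the queue-state probability (P_d in the text) is passed in directly.

```python
# A compact sketch of the reward structure (2.7)-(2.8); all numbers are placeholders.
from math import comb

R_H, R_L = 10.0, 4.0     # rewards for high / low resolution frames (assumed)
f_t, N = 9, 90           # frame threshold and horizon in slots (assumed)

def p_X(l, t, p_e):
    """Negative binomial P(X = t): an l-packet frame completed in exactly t slots."""
    return 0.0 if t < l else comb(t - 1, l - 1) * (1 - p_e) ** l * p_e ** (t - l)

def terminal_reward(x):
    """Equation (2.7): penalty when fewer than f_t frames were delivered."""
    return (R_H - R_L) * min(0, x - f_t)

def aggressive_reward(n, a, d_frame, frame_len, p_e, p_queue):
    """Equation (2.8): expected reward of sending one frame at time n under action a."""
    reward = R_H if a == 1 else R_L
    upper = int(N - n - d_frame)                 # last slot in which delivery still counts
    return sum(reward * p_X(frame_len, t, p_e) * p_queue
               for t in range(frame_len, upper + 1))

print(terminal_reward(6))                                                  # -> -18.0
print(aggressive_reward(n=10, a=1, d_frame=1, frame_len=9, p_e=1e-3, p_queue=0.9))
```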
Neutral Rewards

The neutral reward has the following structure:

    r_{n,i}(x, a) =
    \begin{cases}
      R_L, & \text{if } a = 0, \\
      R_H, & \text{if } a = 1.
    \end{cases}    (2.9)

We will explore the nature of the frame allocation policy for both the "aggressive" and the "neutral" rewards.

Alternate reward structures can also be chosen. For example, [1] proposes a reward based on a measure called the Gamer Mean Opinion Score (GMOS). The GMOS is formulated as GMOS = 1 + 0.035R + 7 × 10^{-6} R(R − 600)(100 − R), where the factor R ranges from 0 to 100 and is related to GMOS through a non-linear mapping. [1] then formulates the R-factor with individual impairment functions, I_C, I_P, I_L and I_D, which are determined experimentally. Here I_C includes the effect of the initial streaming video; I_P represents the impairment caused by the source streaming video quality (peak signal to noise ratio); I_D indicates the impairment caused by delay and I_L covers the impairment caused by packet loss.

The aim of the Markov decision problem will be to maximize the rewards collected over the decision epochs. This is described in detail in the next chapter.

Chapter 3
Optimality of Monotone Frame Allocation Policies

In this chapter, the structure of the optimal frame allocation policy is studied. Section 3.1 presents the formulation and Section 3.2 presents the structural results. At the end of the chapter, the significance of the structural results is discussed.

3.1 Stochastic Dynamic Programming Formulation

The optimal frame allocation policy is π* = (u*_n(·,·) : n = 1, 2, ..., N), where the decision rules u*_n(·,·), n = 1, 2, ..., N, are the solution of the following stochastic dynamic programming recursion, called Bellman's equation [32]:

    V_{n,i}(x) = \max_{a \in A} Q_{n,i}(x, a)    (3.1)

    u^*_{n,i}(x) = \arg\max_{a \in A} Q_{n,i}(x, a)    (3.2)

where

    Q_{n,i}(x, a) = \sum_{j} \Big\{ \Big( r_{n,i}(x, a) + \sum_{t=l}^{N-n-d_l} V_{n+t,j}(x+1) \Big) P(X = t) P_j + V_{N,j}(x) \sum_{t=N-n-d_l+1}^{\infty} P(X = t) P_j \Big\}    (3.3)

with V_{N,j}(x) = (R_H − R_L) min(0, x − f_t) and r_{n,i}(x, a) defined as in (2.8). For a problem with a horizon of length N, K states and L actions, the number of multiplications required to solve the Dynamic Program (DP) directly is (N − 1)LK². Since the DP solution's complexity increases linearly with the action set and quadratically with the state space, it becomes intractable for large scale problems.
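To illustrate how a recursion of the form (3.1)-(3.3) is evaluated numerically, the sketch below runs backward induction on a heavily simplified version of the model: it drops the cloud-queue index i and the delays d_H, d_L, models the frame completion time X with the negative binomial law of Chapter 2, and uses assumed parameter values. It illustrates the dynamic programming idea rather than reproducing the thesis experiments.

```python
# A simplified backward-induction sketch of (3.1)-(3.3); parameters are assumed.
from math import comb

N, f_t = 90, 9                  # horizon in slots and frame threshold (assumed)
H, L = 9, 3                     # packets per high / low resolution frame (assumed)
R_H, R_L = 10.0, 4.0            # per-frame rewards (assumed)
p_e = 1e-3                      # mean packet error probability (assumed)

def p_X(l, t):
    """Negative binomial P(X = t) for an l-packet frame with per-slot success 1 - p_e."""
    return 0.0 if t < l else comb(t - 1, l - 1) * (1 - p_e) ** l * p_e ** (t - l)

def terminal(x):
    """Terminal reward (2.7)."""
    return (R_H - R_L) * min(0, x - f_t)

V = [[0.0] * (f_t + 1) for _ in range(N + 1)]      # V[n][x]
policy = [[0] * (f_t + 1) for _ in range(N)]       # optimal action u*_n(x)
V[N] = [terminal(x) for x in range(f_t + 1)]

for n in range(N - 1, -1, -1):
    for x in range(f_t + 1):
        best_a, best_q = 0, float("-inf")
        for a, (l, R) in enumerate([(L, R_L), (H, R_H)]):
            q, done = 0.0, 0.0
            for t in range(l, N - n + 1):          # frame completed inside the horizon
                pr = p_X(l, t)
                done += pr
                q += pr * (R + V[n + t][min(x + 1, f_t)])
            q += (1.0 - done) * terminal(x)        # frame not delivered before the horizon
            if q > best_q:
                best_a, best_q = a, q
        V[n][x], policy[n][x] = best_q, best_a

print(V[0][0], policy[0][0])    # optimal value and first action from the empty state
```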
3.2 Monotone Structure of the Optimal Frame Allocation Policies

The dynamic programming solution suffers from the curse of dimensionality and is intractable for large state spaces and action spaces. E.g., for the problem of transmitting high or low resolution frames from the cloud to a smart phone, with a horizon of length N, state space K = N·f_t and action space L = 2, the time complexity of the DP solution is O(2N³). In this section we prove that the optimal frame allocation policy is monotone in the number of frames transmitted as well as the time taken to transmit the frames. A monotone policy is of the form (where u_n(x, i) denotes the optimal action to choose at time n for system state (x, i)):

    u_n(x, i) =
    \begin{cases}
      \text{Action 1}, & \text{if state } n < T_n, \\
      \text{Action 2}, & \text{otherwise.}
    \end{cases}    (3.4)

Thus if an MDP has a monotone policy, one only needs to compute the threshold (T_n in the equation above) to implement the optimal policy. The following structural results are proved in the appendix. Define

    q_{n,i}(k | x, a) = \sum_{k} p(k, n+t, j | x, n, i, a) = \sum_{k} p_{n,i}(k | x, a)    (3.5)

for all k ∈ [x, x + 1] and t ∈ [a, N − n]. Define

    q_{x,i}(k | n, a) = \sum_{k=n+a}^{\infty} p(y, k, j | x, n, i, a) = \sum_{t=k-n}^{\infty} p_{x,i}(k | n, a)    (3.6)

for all k ∈ [n + a, ∞) and y ∈ [x, x + 1].

Theorem 1. Suppose for n ∈ {0, 1, ..., N} that

A1. r_{n,i}(x, a) is non-decreasing in x for all a ∈ A
A2. q_{n,i}(k | x, a) is non-decreasing in x for all k ∈ S and a ∈ A
A3. r_{n,i}(x, a) is a super-additive function on S × A
A4. q_{n,i}(k | x, a) is a super-additive function on S × A for all k ∈ S
A5. r_{N,i}(x) is non-decreasing in x

Then the optimal frame allocation policy is monotone in the frames transmitted, i.e.

    u^*_n(x, i) =
    \begin{cases}
      1, & \text{if } x < x^*(n, i), \\
      0, & \text{otherwise,}
    \end{cases}    (3.7)

where x*(n, i) is the threshold number of frames transmitted in time n with i customers in the cloud queue.

Theorem 2. Suppose for n ∈ {0, 1, ..., N} that

A1. r_{x,i}(n, a) is non-decreasing in n for all a ∈ A
A2. q_{x,i}(k | n, a) is non-decreasing in n for all k ∈ S and a ∈ A
A3. r_{x,i}(n, a) is a sub-additive function on S × A
A4. q_{x,i}(k | n, a) is a super-additive function on S × A for all k ∈ S
A5. r_{x,i}(N) is non-decreasing in n

Then the optimal frame allocation policy is monotone in the residual transmission time, i.e.

    u^*_n(x, i) =
    \begin{cases}
      1, & \text{if } n < n^*(x, i), \\
      0, & \text{otherwise,}
    \end{cases}    (3.8)

where n*(x, i) is the threshold transmission time spent in transmitting x frames with i customers in the cloud queue.

In summary, Theorems 1 and 2 mean that the optimal frame allocation policy is a conservative policy, since it is optimal to transmit low resolution frames whenever the residual transmission time is low or when the number of transmitted frames is further away from the threshold. The conditions under which the monotone policy exists are examined in the next chapter.
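The practical payoff of Theorems 1 and 2 is that a policy of the form (3.7) can be stored as a single threshold per decision epoch rather than a full state-action table. The sketch below shows this, with placeholder threshold values rather than thresholds computed from the thesis model.

```python
# A small sketch of how a monotone policy of the form (3.7) is stored and evaluated:
# one threshold x*(n, i) per decision epoch instead of a full state-action table.
def make_threshold_policy(x_star):
    """x_star[n] holds the threshold x*(n, i) for epoch n (queue index i held fixed)."""
    def policy(x, n):
        # action 1 (high resolution) while x is below the threshold, as in (3.7)
        return 1 if x < x_star[n] else 0
    return policy

x_star = [9, 9, 8, 8, 7, 5, 3, 1, 0, 0]      # placeholder thresholds for a 10-slot horizon
u = make_threshold_policy(x_star)
print(u(x=2, n=4), u(x=8, n=4))              # -> 1 (send high resolution), 0 (send low)
```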
Chapter 4
Implementation and Discussion

In the theorems presented above, we have derived conditions under which the optimal frame allocation policy is monotone in two dimensions: the number of frames transmitted and the time taken to transmit the frames. The aim of this discussion is to provide more insight into the significance of these monotone results and how they can be exploited.

To guarantee the existence of the monotone policy described in Theorems 1 and 2 for an aggressive reward, a number of conditions need to be satisfied. Firstly, the horizon over which the frames are to be transmitted must be greater than the sum of the low and high resolution frame lengths, i.e. N > H + L. If the horizon does not satisfy this condition, then the transmission policy tells us to always transmit a low resolution frame. As explained in Sec. II, the RTT, which is a measure of the physical distance between the cloud based game server and the user, determines the minimum time that a packet requires to reach the user from the cloud. Consequently, the RTT also puts a cap on the maximum number of frames that can be transmitted within a given time horizon.

A monotone policy exists as long as the residual transmission time is greater than the sum of the low and high resolution frame lengths, i.e. N − n > H + L, where n units of time have been spent up till now and H and L are the number of packets in a high resolution and a low resolution frame respectively. The existence of a monotone policy also requires that the residual transmission time satisfies (N − n)·RTT/2 >> i/mµ + Hµ_ip, where 1/mµ is the mean service time per customer for the cloud based server, Hµ_ip is the mean decoding time for a high resolution frame and i is the number of customers in the cloud queue at that decision epoch. The numerical values of the rewards for the chosen structure must be such that R_H > R_L and R_L > f_t, where R_H is the reward associated with sending a high resolution frame, R_L is the reward associated with sending a low resolution frame and f_t is the minimum number of frames to be sent in the horizon N. Lastly, the channel error probabilities should be low enough so that a high resolution frame of length H may be transmitted with a high probability in exactly H·RTT/2 time units. This implies that the monotone policies will hold, with a high degree of probability, provided the channel is in a very good condition.

Thus we have five distinct requirements: a) cloud based game servers with very fast response times; b) a bound on the size of the cloud queue, which is dependent on the response time per customer, 1/mµ (the smaller the value of 1/mµ, the larger the size of the queue); c) very fast high definition decoders in the users' smart phones; d) reward values and a threshold number of frames related as described above; and e) a channel with a very low probability of error. Advances in technology show that the first requirement would not be difficult to achieve in the near future. The second requirement is being implemented in many current gaming scenarios, e.g. many Massively Multiplayer Online Role Playing Games (MMORPGs) put a new user onto a server which does not have too many users already logged in. The third requirement is already met by many current smart phones, which allow HD video to be streamed from websites like 'youtube.com', amongst many others. The last requirement, a channel with a very low probability of error, can easily be met by ensuring that the bit error rates stay low during both good and bad channel conditions. E.g., for a round trip time of 20 ms and a packet error probability of 10^{-2}%, a frame having a length of 30 ms may be transmitted within 30 ms with a probability of 99% and a frame with a length of 90 ms may be transmitted with a probability of 98% within 90 ms.

Chapter 5
Reinforcement Learning Algorithm that Exploits Monotone Structure

When the state space dimension is large and the transition probabilities are not known, there is strong motivation to use reinforcement learning algorithms for estimating the optimal policy, rather than solving the MDP using dynamic programming. Since we know that the optimal policy is monotone and are aware of the general shape of the monotone curve, we may approximate the curve by a sigmoidal curve, as shown in Fig. 6.5. The general form of the sigmoidal curve approximating the monotone policy u_n(x, i) is

    u_n(\theta) = (f_t/2 - 0.5) + (f_t/2 - 0.5) \tanh(\theta_1 + \theta_2 n)    (5.1)

where θ = [θ_1, θ_2], f_t is the threshold described in Sec. II and θ represents the two parameters of the sigmoidal curve that are to be estimated.
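The sketch below evaluates the sigmoidal threshold curve (5.1); the parameter values in theta are arbitrary placeholders rather than fitted values.

```python
# A short sketch of the sigmoidal threshold curve (5.1); parameters are placeholders.
import math

f_t = 9     # frame threshold from Chapter 2 (value assumed for illustration)

def sigmoid_threshold(theta, n):
    """Equation (5.1): u_n(theta) = (f_t/2 - 0.5) + (f_t/2 - 0.5)*tanh(theta1 + theta2*n)."""
    theta1, theta2 = theta
    return (f_t / 2 - 0.5) + (f_t / 2 - 0.5) * math.tanh(theta1 + theta2 * n)

theta = (2.0, -0.05)                          # placeholder parameter values
print([round(sigmoid_threshold(theta, n), 2) for n in range(0, 91, 15)])
```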
In order to estimate the values of θ_1 and θ_2, we will maximize the expected cost function J(θ) given by

    J(\theta) = \frac{1}{T} \sum_{k} \sum_{n=0}^{N} C_n(x, i)    (5.2)

where x is the number of frames transmitted by time n with i customers in the cloud queue, and

    C_n(x, i) =
    \begin{cases}
      R_H, & \text{if } x \ge u_n(\theta), \\
      R_L, & \text{otherwise.}
    \end{cases}    (5.3)

Here T is the number of times the function \sum_{n=0}^{N} C_n(x, i) is evaluated, so that its average value approaches the average value of the reward given by (3.1) and (3.2).

The expected cost function plays the same role as the "Frame Allocation Policy" block labelled in Fig. 1.3. In other words, it will deliver an optimal frame allocation policy to the cloud that maximizes the gaming experience of the user. The boundary of the curve represented by (5.1) represents the decision threshold vector: any points lying to the left of the curve imply transmitting a high resolution frame and points to the right of the curve imply transmitting a low resolution frame. The Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm helps us determine the parameters of the sigmoidal curve which best represent (3.2). Algorithm 1 below uses the SPSA algorithm [44] to generate a sequence of parameter estimates θ̂_z, z = 1, 2, ..., that converges to a local maximum of J(θ), i.e. to a sigmoidal monotone policy that approximates the optimal threshold policy u*_n(x, i).

Algorithm 1: Policy gradient algorithm for computing the optimal sigmoidal monotone policy

Require: an initial estimate θ̂_0 and monotone policy u_{θ_0}
Require: number of iterations z; 0.5 ≤ γ ≤ 1; ∆ > 0; 0.5 < α ≤ 1; ε > 0; A > 0
while z ≠ 0 do
    gradient step size: ∆_z ← ∆ / (z + 1)^γ
    gradient estimate: ∇̂_θ J_z(θ̂_z) ← [J_z(θ̂_z + ∆_z ω_z) − J_z(θ̂_z − ∆_z ω_z)] / (2 ∆_z ω_z),
        where ω_z = −1 with probability 0.5 and +1 with probability 0.5
    parameter update: θ̂_{z+1} ← θ̂_z − ε_z ∇̂_θ J_z(θ̂_z)
    step size update: ε_z ← ε / (z + 1 + A)^α
    z ← z − 1
end while
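For concreteness, the following is a runnable sketch of an SPSA iteration in the standard form of [44], with simultaneous Bernoulli ±1 perturbation of both parameters and decaying gain sequences. The objective J below is a synthetic stand-in for the simulated average reward of the sigmoidal policy (in the cloud setting J(θ) would be re-estimated by simulation at every iteration, which is what lets the scheme track slowly changing transition probabilities), so the recovered parameters are illustrative only.

```python
# A runnable sketch of SPSA with a synthetic stand-in objective; all gains are assumed.
import random

def J(theta):
    """Synthetic stand-in objective with a known maximum at theta = (1.0, -0.5)."""
    return -((theta[0] - 1.0) ** 2 + (theta[1] + 0.5) ** 2)

def spsa_maximize(theta, iters=2000, a=0.1, c=0.1, A=50.0, alpha=0.602, gamma=0.101):
    theta = list(theta)
    for k in range(iters):
        a_k = a / (k + 1 + A) ** alpha                 # step size (epsilon_z in Algorithm 1)
        c_k = c / (k + 1) ** gamma                     # perturbation size (Delta_z)
        delta = [random.choice((-1.0, 1.0)) for _ in theta]
        j_plus = J([t + c_k * d for t, d in zip(theta, delta)])
        j_minus = J([t - c_k * d for t, d in zip(theta, delta)])
        grad = [(j_plus - j_minus) / (2.0 * c_k * d) for d in delta]
        theta = [t + a_k * g for t, g in zip(theta, grad)]   # ascent step, since J is maximized
    return theta

print(spsa_maximize([0.0, 0.0]))    # approaches (1.0, -0.5) up to stochastic error
```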
Chapter 6

Numerical Results

Having proved that the optimal transmission policies are monotone in the transmission time and in the number of frames transmitted, we now illustrate these structural results via numerical examples that arise in a scenario involving transmission of video frames from a cloud based game server. Assume the following. A high resolution frame has 153600 pixels/frame; a low resolution frame is 176 × 220, with 38720 pixels/frame; 8 bits are used to represent a pixel. The frame threshold (the minimum number of frames required to be transmitted) is 9 and the error probability experienced by a packet is 10^−4. These 9 frames need to be delivered over a time period of 900 ms and, since RTT/2 = 10 ms, the number of slots is 90. The number of servers is m = 1000 and the service rate is µ = 1000. The probability of the channel being in a good state is 9.9 × 10^−1, the probability of the channel being in a bad state is 9.95 × 10^−3, the packet error rate in the good state is 10^−6, and the packet error rate in the bad state is 10^−1 [43]. The first section presents the results for the "aggressive" policy and the next section presents the rewards for the "neutral" policy.

6.1 Aggressive Policy Results

From Fig. 6.1, we can see that the optimal policy is monotone in both state variables: x, the number of frames transmitted, and n, the time elapsed. For any state (x, n, i), if the system transmits a high resolution frame, then it also transmits a high resolution frame in state (x + 1, n, i), verifying the threshold structure in the state variable x. Similarly, if in any state (x, n, i) the system decides to transmit a low resolution frame, then it also transmits a low resolution frame in any state (x, n + 1, i), verifying the monotone structure in the state variable n. Fig. 6.1 also shows that the optimal threshold policy is conservative in nature. If the system has failed to transmit at least a certain number of frames x*(n, i) by time n, then the optimal policy transmits low resolution frames in preference to high resolution frames at all later times m, where n < m ≤ N.

Figure 6.1 (time vs. frames transmitted): The two-way control limit structure of the monotone policy for the chosen parameters. The dark part of the figure is the region where only low resolution frames are sent, whereas the white region indicates high resolution frames being sent. As the number of frames sent increases, the time threshold at which the policy starts sending low resolution frames increases. The smaller the observed time, the more frames we need to have sent if we wish to send a high resolution frame.

This conservative nature of the optimal policy is further reinforced by Fig. 6.2, in which the thresholds are plotted for increasing channel error probabilities (10^−4, 2 × 10^−4 and 3 × 10^−4). As the error probability increases, the threshold number of frames x*(n, i) at any time n also increases. This makes intuitive sense, since an increase in the error probability implies a lower likelihood of a larger frame getting through in time compared to a smaller frame, where the size of a frame is measured by the number of packets required to make it up.

Figure 6.2 (time elapsed vs. frames transmitted): The optimal frame threshold for transmitting low resolution frames is increasing in the residual transmission time, i.e. it is optimal to transmit a high resolution frame if and only if at least x*(n, i) frames have been transmitted.

Fig. 6.3 compares the average rewards of the optimal policy, obtained by evaluating (3.1) and (3.2), with a myopic policy and a round-robin policy. A myopic policy is one which, at each decision epoch, decides to send the frame type (high or low) that delivers the higher immediate expected reward within the remaining time. For this simulation, with the parameters listed at the beginning of this chapter, the myopic policy always decides to send a low resolution frame. A round-robin policy simply alternates between sending high and low resolution frames at each decision epoch. From Fig. 6.3 we can see that initially the myopic and the optimal policy deliver nearly the same reward. As more time elapses, the difference between the average rewards of the myopic and optimal policies increases. This is because the myopic policy concerns itself only with immediate rewards and not with the effect of the current decision on future rewards, a fact taken into account by the optimal policy. The round-robin policy delivers a lower reward than the optimal policy at all times. Fig. 6.4 compares the average rewards of the three policies obtained through Monte Carlo simulations. The trend displayed by the average rewards is the same as in Fig. 6.3.
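For concreteness, the following deliberately simplified Monte Carlo sketch contrasts the three policy types just described. The per-slot delivery probabilities, reward values and toy threshold curve are invented for illustration only and do not reproduce the thesis's queueing and channel model.

import numpy as np

rng = np.random.default_rng(0)

R_H, R_L, F_T, N = 3.0, 1.0, 9, 90        # illustrative rewards, frame threshold and horizon
P_SUCCESS = {1: 0.25, 0: 0.9}             # toy per-slot delivery probabilities (not from the thesis)

def simulate(policy, runs=2000):
    # Average terminal reward of a policy under a simplified delivery model:
    # in each slot the chosen frame type is delivered with a fixed probability,
    # and a shortfall below F_T incurs the terminal penalty (R_H - R_L) * min(0, x - F_T).
    total = 0.0
    for _ in range(runs):
        x, reward = 0, 0.0
        for n in range(N):
            a = policy(x, n)
            if rng.random() < P_SUCCESS[a]:
                x += 1
                reward += R_H if a == 1 else R_L
        reward += (R_H - R_L) * min(0, x - F_T)
        total += reward
    return total / runs

round_robin = lambda x, n: n % 2          # alternate low/high resolution each epoch
myopic = lambda x, n: 0                   # low resolution maximises the immediate reward here
threshold = lambda x, n: 1 if x >= max(0, F_T - (N - n) // 10) else 0   # toy monotone threshold

for name, pol in [("round robin", round_robin), ("myopic", myopic), ("threshold", threshold)]:
    print(name, round(simulate(pol), 2))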
Figure 6.3 (average reward vs. horizon): Comparing the average reward defined by equation (3.1) for the optimal policy with a myopic and a round-robin policy. The MDP policy returns a higher average reward than the other two policies as the horizon increases. For the chosen horizon, the myopic policy suffers a performance loss of 12% compared to the optimal transmission policy, while the round-robin policy suffers a loss of 31%.

Figure 6.4 (average reward vs. horizon): Comparing the average rewards of the MDP, myopic and round-robin policies using Monte Carlo simulations.

Fig. 6.5 shows how the structure of the monotone policy can be exploited by estimating the policy with a sigmoidal curve obtained by the Simultaneous Perturbation Stochastic Approximation algorithm [44]. The goal is to exploit the structure of the policy and estimate it directly in policy space; with simulation based optimization, this can be done without knowledge of the transition matrix. In situations where the transition probabilities of the MDP are not known, and possibly change with time, Algorithm 1 approximates the optimal policy with a sigmoidal policy that returns a reward close to that of the optimal transmission policy. Fig. 6.6 shows how the SPSA algorithm is able to track the optimal policy even when the transition probabilities change by a small amount over time.

Figure 6.5 (time elapsed vs. frames transmitted; policies obtained using Bellman's equation and using SPSA): The monotone policy may be approximated by a sigmoidal function that returns an average reward close to the average reward of the MDP policy. The simulation based stochastic optimization algorithm suffers a normalized performance loss of only 5% compared to the optimal frame allocation policy.

Figure 6.6 (time elapsed vs. frames transmitted): Tracking of the frame allocation policy by SPSA under a small perturbation of the transition probabilities.

Table 6.1 compares the normalized performance loss of the optimal frame allocation policy versus the myopic, round-robin and stochastic approximation policies. From the table we can see that the simulation based stochastic approximation technique gives a reward that is closest to the optimal policy.

Table 6.1: Normalized performance loss

    Policy                            Normalized performance loss
    Myopic                            12%
    Round robin                       31%
    Simulation based approximation    5%
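As a small illustration of how the percentages in Table 6.1 can be computed from simulated average rewards, consider the sketch below; the average-reward values are placeholders chosen only to reproduce the reported percentages, and the definition of loss relative to the optimal policy's average reward is an assumption.

def normalized_loss(j_optimal, j_policy):
    # Performance loss of a policy relative to the optimal average reward, in percent.
    return 100.0 * (j_optimal - j_policy) / j_optimal

# Placeholder average rewards; the thesis reports only the resulting percentages.
j_opt = 25.0
for name, j in [("Myopic", 22.0), ("Round robin", 17.25), ("Simulation based approximation", 23.75)]:
    print(f"{name}: {normalized_loss(j_opt, j):.0f}% loss")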
6.2 Neutral Policy Results

The results for the neutral policy show a similar trend to those for the aggressive policy. From Fig. 6.7, we can see that the optimal policy is monotone in both state variables: x, the number of frames transmitted, and n, the time elapsed. For any state (x, n, i), if the system transmits a high resolution frame, then it also transmits a high resolution frame in state (x + 1, n, i), verifying the threshold structure in the state variable x. Similarly, if in any state (x, n, i) the system decides to transmit a low resolution frame, then it also transmits a low resolution frame in any state (x, n + 1, i), verifying the monotone structure in the state variable n. Fig. 6.7 also shows that the optimal threshold policy is conservative in nature. If the system has failed to transmit at least a certain number of frames x*(n, i) by time n, then the optimal policy transmits low resolution frames in preference to high resolution frames at all later times m, where n < m ≤ N. This conservative nature of the optimal policy is further reinforced by Fig. 6.8, in which the thresholds are plotted for increasing channel error probabilities (10^−4, 2 × 10^−4 and 3 × 10^−4). As the error probability increases, the threshold number of frames x*(n, i) at any time n also increases. This makes intuitive sense, since an increase in the error probability implies a lower likelihood of a larger frame getting through in time compared to a smaller frame, where the size of a frame is measured by the number of packets required to make it up.

Figure 6.7 (time vs. frames transmitted): The two-way control limit structure of the monotone policy (neutral reward) for the chosen parameters. The dark part of the figure is the region where only low resolution frames are sent, whereas the white region indicates high resolution frames being sent. As the number of frames sent increases, the time threshold at which the policy starts sending low resolution frames increases. The smaller the observed time, the more frames we need to have sent if we wish to send a high resolution frame.

Figure 6.8 (time elapsed vs. frames transmitted): The optimal frame threshold for transmitting low resolution frames (neutral reward) is increasing in the residual transmission time, i.e. it is optimal to transmit a high resolution frame if and only if at least x*(n, i) frames have been transmitted.

Figure 6.9 (average reward vs. horizon): Comparing the average reward defined by equation (3.1) with a myopic and a round-robin policy for a neutral reward. The MDP policy returns a higher average reward than the other two policies as the horizon increases.

Figure 6.10 (time elapsed vs. frames transmitted; policies obtained using Bellman's equations and using SPSA): The monotone policy may be approximated by a sigmoidal function that returns an average reward close to the average reward of the MDP policy (neutral reward).

Chapter 7

Conclusions and Future Work

7.1 Conclusions

In this thesis we considered the problem of transmitting video frames from the cloud to the user, and determined a policy which allows the cloud to optimally schedule high and low resolution video frames in order to maximize the user's gaming experience.
By exploiting the dynamics of the problem setup, it is shown that the policy is monotone both in the number of frames transmitted and in the transmission time. Such a monotone structure of the optimal policy allows a low computational cost algorithm for evaluating the average cost value function, because a monotone policy results in an action set that decreases with increasing state. In order to guarantee the existence of a monotone structure, it is required to have a fast response time for the cloud, a limit on the queue size of the cloud, as well as fast HD decoders in the user's smart-phone. Finally, we showed how the structure of the optimal policy may be exploited to estimate the policy directly. We used a simulation based optimization algorithm, namely Simultaneous Perturbation Stochastic Approximation (SPSA), to estimate the optimal policy in a scenario where the transition probabilities of the MDP are not known and possibly change with time.

7.2 Future Work

For the problem discussed in this thesis, the action set comprised only two actions, namely sending a high or a low resolution frame. The next stage of the problem would be to enlarge the action set so that the cloud can choose from a set of frames with more than just two resolutions to transmit to the user. This would result in much finer control and a better experience for the user of the mobile device.

Another point to be addressed is the use of different cost functions, similar to the one described in [1]. The choice of a different cost function may result in a frame allocation policy which returns a higher QoE for the user of the mobile device. A different cost function may also be increasing in the time taken to transmit x frames, in which case it can be shown that the optimal frame allocation policy is always monotone.

Bibliography

[1] S. Wang and S. Dey, "Modeling and characterizing user experience in a cloud server based mobile gaming approach," in Proceedings of the 2009 IEEE Global Telecommunications Conference, Honolulu, HI, 2009, pp. 1–7.

[2] comScore, "Smartphones provide extra mana for mobile games industry as audience for downloaded games grows 17 percent," Jan. 2009. [Online]. Available: http://www.comscore.com/press/release.asp?press=2711

[3] S. Wang and S. Dey, "Addressing response time and video quality in remote server based internet mobile gaming," in Proceedings of the 2010 IEEE Wireless Communications and Networking Conference, Sydney, NSW, 2010, pp. 1–6.

[4] C. van Berkel, "Multi-core for mobile phones," in Proceedings of the Conference on Design Automation and Test in Europe, Nice, France, 2009, pp. 1260–1265.

[5] N. Chan, "Testing: Apple 11-inch MacBook Air (2013)," Jun. 2013. [Online]. Available: http://www.tested.com/tech/mac-os/456567-testing-apple-macbook-air-11-2013

[6] Uoobe, "Phone and tablet battery life comparison tests," Nov. 2013. [Online]. Available: http://www.uoobe.com/phones-battery-life-comparison-tests/

[7] I. Nave, H. David, A. Shani, P. Laikari, P. Eisert, and P. Fechteler, "Games@Large graphics streaming architecture," in Proceedings of the 12th Annual IEEE International Symposium on Consumer Electronics, Vilamoura, 2008, pp. 1–4.

[8] Wikipedia, "Game server," Jun. 2012. [Online]. Available: http://en.wikipedia.org/wiki/Game_server#Types_of_game_servers
[9] T. Yasui, Y. Ishibashi, and T. Ikedo, "Influences of network latency and packet loss on consistency in networked racing games," in Proceedings of the 4th ACM SIGCOMM Workshop on Network and System Support for Games, Hawthorne, NY, 2005, pp. 1–8.

[10] B. Hayes, "Cloud computing," Communications of the ACM, vol. 51, no. 7, 2009.

[11] L. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, "A break in the clouds: towards a cloud definition," ACM SIGCOMM Computer Communication Review, vol. 39, no. 1, pp. 50–55, 2009.

[12] M. Armbrust et al., "A view of cloud computing," Communications of the ACM, vol. 53, no. 4, pp. 50–58, 2010.

[13] R. Ewelle, Y. Francillette, G. Mahdi, and A. Gouaich, "Level of detail based network adapted synchronization for cloud gaming," in Proceedings of the 18th International Conference on Computer Games: AI, Animation, Mobile, Interactive Multimedia, Educational and Serious Games, Louisville, KY, 2013, pp. 111–118.

[14] eSecureData, "Servers 101: cloud servers vs. dedicated servers," Jun. 2012. [Online]. Available: http://www.esecuredata.com/pricing/servers-101-cloud-servers-vs-dedicated-servers

[15] Steadfast, Nov. 2013. [Online]. Available: http://steadfast.net/blog/index.php/cloud/he-difference-between-cloud-hosting

[16] A. Iosup, N. Yigitbasi, and D. Epema, "On the performance variability of production cloud services," in Proceedings of the 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Newport Beach, CA, 2011, pp. 104–113.

[17] K. Chen, Y. Chang, P. Tseng, C. Huang, and C. Lei, "Measuring the latency of cloud gaming systems," in Proceedings of the 19th ACM International Conference on Multimedia, Scottsdale, AZ, 2011, pp. 1269–1272.

[18] P. Ross, "Cloud computing's killer app: gaming," Mar. 2009. [Online]. Available: http://spectrum.ieee.org/computing/networks/cloud-computings-killer-app-gaming

[19] H. Hollister, "Sony announces PlayStation Now, its cloud gaming service for TVs, consoles, and phones," Jan. 2014. [Online]. Available: http://www.theverge.com/2014/1/7/5284294/sony-announces-playstation-now-cloud-gaming

[20] D. Hands, "A basic multimedia quality model," IEEE Transactions on Multimedia, vol. 6, no. 6, pp. 806–816, 2004.

[21] S. Tasaka and Y. Watanabe, "Real-time estimation of user-level QoS in audio-video IP transmission by using temporal and spatial quality," in Proceedings of the 2007 IEEE Global Telecommunications Conference, Washington, DC, 2007, pp. 2661–2666.

[22] K. Yamagishi and T. Hayashi, "QRP08-1: Opinion model for estimating video quality of videophone services," in Proceedings of the 2006 IEEE Global Telecommunications Conference, San Francisco, CA, 2006, pp. 1–5.

[23] D. Wu et al., "Streaming video over the internet: approaches and directions," IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, no. 3, pp. 282–300, 2001.

[24] H. Ahmadi, S. Khoshnood, M. Hashemi, and S. Shirmohammadi, "Efficient bitrate reduction using a game attention model in cloud gaming," in Proceedings of the IEEE International Symposium on Haptic Audio Visual Environments and Games, Istanbul, 2013, pp. 103–108.

[25] W. Cai, V. Leung, and M. Chen, "Next generation mobile cloud gaming," in Proceedings of the IEEE 7th International Symposium on Service Oriented System Engineering, Redwood City, CA, 2013, pp. 551–560.
[26] S. Winkler and F. Dufaux, "Video quality evaluation for mobile streaming applications," in Proceedings of the SPIE Conference on Visual Communications and Image Processing, vol. 5150, Lugano, Switzerland, 2003, pp. 593–603.

[27] M. Ries, O. Nemethova, and M. Rupp, "Video quality estimation for mobile H.264/AVC video streaming," Journal of Communications, vol. 3, pp. 41–50, 2008.

[28] M. Claypool and K. Claypool, "Latency and player actions in online games," Communications of the ACM, vol. 49, no. 11, pp. 40–45, 2006.

[29] M. Dick, O. Wellnitz, and L. Wolf, "Analysis of factors affecting players' performance and perception in multiplayer games," in Proceedings of the 4th ACM SIGCOMM Workshop on Network and System Support for Games, Hawthorne, NY, 2005, pp. 1–7.

[30] W. Feng, F. Chang, and J. Walpole, "A traffic characterization of popular on-line games," IEEE/ACM Transactions on Networking, vol. 13, no. 3, pp. 488–500, 2005.

[31] H. Hong, D. Chen, C. Huang, K. Chen, and C. Hsu, "QoE-aware virtual machine placement for cloud games," Tech. Rep., Sep. 2013. [Online]. Available: http://nmsl.cs.nthu.edu.tw/dropbox/netgames13TR.pdf

[32] M. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Hoboken, New Jersey: John Wiley & Sons, 2005.

[33] E. Kafetzakis, H. Koumaras, M. A. Kourtis, and V. Koumaras, "QoE4CLOUD: A QoE-driven multidimensional framework for cloud environments," in Proceedings of the 2012 International Conference on Telecommunications and Multimedia, Chania, 2012, pp. 77–82.

[34] S. Wang and S. Dey, "Rendering adaptation to address communication and computation constraints in cloud mobile gaming," in Proceedings of the 2010 IEEE Global Telecommunications Conference, Miami, FL, 2010, pp. 1–6.

[35] J. Laulajainen, T. Sutinen, and S. Jarvinen, "Experiments with QoS-aware gaming-on-demand service," in Proceedings of the 20th International Conference on Advanced Information Networking and Applications, vol. 1, Vienna, 2006, pp. 805–810.

[36] S. Jarvinen, J. Laulajainen, T. Sutinen, and S. Sallinen, "QoS-aware real-time video encoding," in Proceedings of the 3rd IEEE Consumer Communications and Networking Conference, vol. 2, Budapest, Hungary, 2006, pp. 994–997.

[37] M. Ngo and V. Krishnamurthy, "Optimality of threshold policies for transmission scheduling in correlated fading channels," IEEE Transactions on Communications, vol. 57, no. 8, pp. 2474–2483, 2009.

[38] Z. Icten, S. Shechter, L. Maillart, and M. Nagarajan, "Optimal management of a limited number of replacements under Markovian deterioration," IIE Transactions, vol. 45, no. 2, pp. 206–214, 2013.

[39] M. Ngo and V. Krishnamurthy, "Monotonicity of constrained optimal transmission policies in correlated fading channels with ARQ," IEEE Transactions on Signal Processing, vol. 58, no. 1, pp. 438–451, 2010.

[40] V. Krishnamurthy, R. Bitmead, M. Gevers, and E. Miehling, "Sequential detection with mutual information stopping cost," IEEE Transactions on Signal Processing, vol. 60, no. 2, pp. 700–714, 2007.

[41] V. Krishnamurthy, "Bayesian sequential detection with phase-distributed change time and nonlinear penalty: A POMDP lattice programming approach," IEEE Transactions on Information Theory, vol. 57, no. 10, pp. 7096–7117, 2011.

[42] E. Reyouchi, K. Ghoumid, K. Ameziane, O. Mrabet, and S. Mekaoui, "Performance analysis of round trip delay time in practical wireless network for telemanagement," International Journal of Electrical, Electronic Science and Engineering, vol. 7, no. 11, pp. 14–19, 2013.

[43] P. Lettieri, C. Fragouli, and M.
Srivastava, “Low power error control for wireless links,” inProceedings of the 3rd annual ACM/IEEE international conference on Mobile computingand Networking, Budapest, Hungary, 1997, pp. 139–150.[44] J. Spall, Introduction to Stochastic Search and Optimization. Hoboken, New Jersey: JohnWiley Sons, 2005.43Appendix AProof of Theorem 1 and Theorem 2for Aggressive RewardsThe following notation is used frequently in the proofs of both Theorem 1 and Theorem 2. n−refers to any state where x frames have been transmitted in n units of time. n+ refers to astate where x frames have been transmitted in n− 1 units of time. x+ refers to a state wherex + 1 frames have been transmitted in n units of time and x− refers to x frames transmittedin n units of time. Finally, a+ refers to transmitting a high resolution frame i.e. a = 1 and a−refers to transmitting a low resolution frame i.e. a = 0.Definition: Let X and Y be partially ordered sets and g(x, y) a real ordered function onX × Y . Then g is supermodular if for x+ ≥ x− in X and y+ ≥ y− in Y ,g(x+, y+) + g(x−, y−) ≥ g(x+, y−) + g(x−, y+). (A.1)If the reverse inequality holds, g(x, y) is said to be submodular.In order to prove Theorem’s 1 and 2, we will prove the following lemmas which provideconditions under which there exist monotone optimal policies. The existence of Lemmas 1 -5 prove that the state value function in (3.3) is supermodular in x, i.e. there exist optimaldecision rules which are non-decreasing in x, the number of frames transmitted [32]. Theproperties of Lemmas 6 - 11 prove that state value function in (3.3) is submodular in n, i.e.there exist optimal decision rules which are non-increasing in n, the time elapsed in transmittingx frames [32].44A.1. Proof of Theorem 1A.1 Proof of Theorem 1The following lemmas are required to prove the existence of a monotone policy in x, the numberof frames transmitted:Lemma 1. The terminal reward, rN,i(x), is non-decreasing in xProof. The terminal reward is defined as rN,i(s) = (RH − RL) min(0, x − ft) where ft is thethreshold and s is the total number of complete frames sent. Clearly, as the state variable sincreases, rN (x) increases. Hence the terminal reward is non-decreasing in x.Lemma 2. The reward for sending a frame at any time n, rn,i(x, a) is non-decreasing in x forall a ∈ A.Proof. Let Rl be the reward associated with successfully transmitting a frame of length l, wherel = H for a = 1 and l = L for a = 0. Let dl = i/mµ+ lµip. Then,rn,i(x+ 1, a)− rn,i(x, a)=∑jt=N−n−dl∑t=lRlP(X = t) + (RH −RL)(x+ 1− ft)t=∞∑t=N−n−dl+1P(X = t)Pj−∑jt=N−n−dl∑t=lRlP(X = t) + (RH −RL)(x− ft)t=∞∑t=N−n−dl+1P(X = t)Pj=∑j(RH −RL)t=∞∑t=N−n−dl+1(t− 1l − 1)(1− pepe)lptePj ≥ 0. (A.2)From (A.2), the reward function is non-decreasing in x for all a ∈ A.Lemma 3. rn,i(x, a) is superadditive on S ×A.45A.1. Proof of Theorem 1Proof. Let dH = imµ +Hµip and dL =imµ + Lµip. Then,rn,i(x+, a+)− rn,i(x−, a+)=∑jt=N−n−dH∑t=HRHP(X = t) + (RH −RL)(x+ 1− ft)t=∞∑t=N−n−dH+1P(X = t)Pj−∑jt=N−n−dH∑t=HRHP(X = t) + (RH −RL)(x− ft)t=∞∑t=N−n−dH+1P(X = t)Pj=∑jt=∞∑t=N−n−dH+1(t− 1H − 1)(1− pepe)HptePj≥∑j(RH −RL)t=∞∑t=N−n−dL+1(t− 1L− 1)(1− pepE)LptePj=∑jt=N−n−dL∑t=LRLP(X = t) + (RH −RL)(x+ 1− ft)t=∞∑t=N−n−dL+1P(X = t)Pj−∑jt=N−n−dL∑t=LRLP(X = t) + (RH −RL)(x− ft)t=∞∑t=N−n−dL+1P(X = t)Pj= rn(x+, a−)− rn(x−, a−). (A.3)Eq. (A.3) is always true for t ≥ H + L and t >> imµ and t >> Hµip > Lµip. From (A.3), thesuperadditivity of rn(x, a) is established for playback delay and service rate small compared tothe RTT.Lemma 4. 
qn,i(k|x, a) is non-decreasing in x for all k ∈ S and a ∈ AProof. We need to prove that, for all a ∈ A,qn,i(k|x+ 1, a)− qn,i(k|x, a) =∑kpn,i(k|x+ 1, a)−∑kpn,i(k|x, a) ≥ 0. (A.4)46A.1. Proof of Theorem 1qn,i(x+ 1, a+)− qn,i(x, a+)=∑j(∑kpn,i(k|x+ 1, 1)−∑kpn,i(k|x, 1))Pj≥∑j(pn,i(x+ 1|x+ 1, 1) + pn,i(x+ 2|x+ 1, 1)− p(x|x, 1)− p(x+ 1|x, 1))Pj≥ 0. (A.5)qn,i(x+ 1, a−)− qn,i(x, a−)=∑j(∑kpn,i(k|x+ 1, 0)−∑kpn,i(k|x, 0))Pj≥∑j(pn,i(x+ 1|x+ 1, 0) + pn,i(x+ 2|x+ 1, 0)− p(x|x, 0)− p(x+ 1|x, 0))Pj≥ 0. (A.6)From (A.5) and (A.6), qn,i(x, a) is non-decreasing in x for all k ∈ S and a ∈ ALemma 5. For all k ∈ S, qn,i(k|x, a) is superadditive on S ×A.Proof.qn,i(k|x+, a+)− qn,i(k|x−, a+)=∑j(∑kpn,i(k|x+ 1, 1)−∑kpn,i(k|x, 1))Pj≥∑j(pn,i(x+ 1|x+ 1, 1) + pn,i(x+ 2|x+ 1, 1)− p(x|x, 1)− p(x+ 1|x, 1))Pj=∑j(t=N−n−dH∑t=HP(X = t))Pj (A.7)≥∑j(N−n−dL∑t=LP(X = t))Pj= qn,i(k|x+, a−)− qn,i(k|x−, a−). (A.8)From (A.13), the superadditivity of qn(s, a) is established.47A.2. Proof of Theorem 2Lemmas 1-5, prove the existence of a non-decreasing monotone policy in x [32].A.2 Proof of Theorem 2The following lemmas are required to prove the existence of a monotone policy in the timespent in transmitting x frames:Lemma 6. The terminal reward, rN,i(x) is non-decreasing in n.Proof. The terminal reward is defined as rN,i(x) = (RH − RL) min(0, x − ft) where ft is thethreshold and s is the total number of complete frames sent. Clearly,the terminal reward isindependent of n and is hence non-decreasing in n.Lemma 7. The transition probability matrix has the Increasing Failure Rate (IFR) property; i.e.k=∞∑k=n+apx,i(k|n) is non-decreasing in n (the time spent in transmitting x frames) for alla ∈ A and k = n+ l, n+ l+1, ...., N−n where l is the length of the type (high or low resolution)frame that has been transmitted.Proof. Let l be the length of the frame chosen to be transmitted. l = H for a = 1 and l = Lfor a = 0. Let dl = i/(mµ) + µip. Then,∑j∑kpx,i(k/n+ 1, a)Pj −∑j∑kpx,i(k/n, a)Pj =∑jt=∞∑t=k−dl−n−1P(X = t)−t=∞∑t=k−dl−nP(X = t)Pj =∑j(k − n− dl − 2l − 1)(1− pepe)lpk−n−dle Pj≥ 0 (A.9)Lemma 8. The reward for sending a frame when n units of time are remaining, and x frameshave already been transmitted, rx,i(n, a) is subadditive on S ×A.48A.2. Proof of Theorem 2Proof. Let dH = imµ +Hµip and let dL =imµ + Lµip. Then,rx,i(n−, a+)− rx,i(n+, a+) =∑jPj((RH − (RH −RL)(x− ft))(N − n− dH − 1H − 1)pN−n−dHe(1− pepe)H)≥∑jPj((RL − (RH −RL)(x− ft)(N − n− dL − 1L− 1)pN−n−dLe(1− pepe)L)= rx,i(n−, a−)− rx,i(n+, a−). (A.10)Eq. (A.10) is a result of (A.14) and is true when the residual transmission timel N − n ≤(H + L− 1), pe < 0.5 and when (N − n >> Hµip > Lµip) and (N − n >> imµ).Lemma 9. The quantity qx,i(k/n, a) is non-decreasing in n for all a ∈ A.Proof. This is a direct consequence of Lemma 7.Lemma 10. The quantity qx,i(k/s, a) =∑kpx,i(k|n, a) is a subadditive function on S × A fora subset k ∈ S.Proof. Let dH = imµ +Hµip and let dL =imµ + Lµip. Then,∑j∑kpx,i(k/n+ 1, 1)Pj −∑j∑kpx,i(k/n, 1)Pj =∑j(k − n− dH − 2H − 1)(1− pp)Hpk−n−dHPj (A.11)≥∑j(k − n− dL − 2L− 1)(1− pp)Lpk−n−dLPj = (A.12)∑j∑kps,i(k/n+ 1, 0)Pj −∑j∑kps,i(k/n, 0)Pj . (A.13)Eq. (A.11) and (A.13) are a result of (A.9) and the inequality holds for all k such thatL + 1 ≤ k ≤ H. Since k − n is the time within which a frame is tranmitted, a high resolu-tion frame of length H must be transmitted within 1 time units. This condition requires verylow values of the error probabilities. The inequality also requires that (k − n >> Lµip) and49A.2. Proof of Theorem 2(k − n >> imµ). 
The sub additivity of qx,i(k/s, a) is hence proved through (A.13).Lemma 11. The reward for sending a frame when n units of time are remaining, and x frameshave already been transmitted, rx,i(n, a) is non-increasing in n for all a ∈ A.Proof. Let Rl be the reward associated with successfully transmitting a frame of length l, wherel = H for a = 1 and l = L for a = 0. Let dl = imµ + lµip. Then,rx,i(n−, a)− rx,i(n+, a) =Rl∑j∑kpx,i(k/n, a)Pj −∑j∑kpx,i(k/n+ 1, a)Pj =∑jPjRlt=N−n−dl∑t=lP(X = t) + (RH −RL)(x− ft)t=∞∑t=N−n−dl+1P(X = t)−∑jPjRlt=N−n−dl−1∑t=lP(X = t) + (RH −RL)(x− ft)t=∞∑t=N−n−dlP(X = t) =∑jPj((Rl + (RH −RL)(x− ft))(N − n− dl − 1l − 1)pN−n−dl(1− pepe)l)≥ 0. (A.14)Eq (A.14) is true for all n whenever Rl + (RH − RL)(x − ft) > 0 for (a ∈ 0, 1) and shows thereward is non-increasing in n. However, in order to prove the existence of a threshold policy, werequire that the instantaneous rewards rx,i(n, a) is non-decreasing for all a ∈ A.So our originalconjecture that the instantaneous rewards are non-decreasing does not hold. However, for thechosen examples, the simulation results do show the existence of a threshold policy. This isbecause the conditions mentioned in theorem 2 are sufficient but not necessary [32].Although, the necessary conditions for a threshold policy are not met [32], the simulationsshow that the threshold results still hold.50Appendix BProof of Theorem 1 and Theorem 2for Neutral RewardsB.1 proof of Theorem 1Lemma 12. The terminal reward, rN,i(x), is non-decreasing in xProof. The proof is similar to Lemma 8.Lemma 13. The reward for sending a frame when n units of time are remaining, and x frameshave already been transmitted, rx,i(n, a) is subadditive on S ×A.Proof. Let Rl be the reward associated with successfully transmitting a frame of length l, wherel = H for a = 1 and l = L for a = 0. Then:rn,i(x+ 1, a)− rn,i(x, a)= Rl −Rl = 0. (B.1)From (B.1), the reward function is non-decreasing in x for all a ∈ A.Lemma 14. rn,i(x, a) is superadditive on S ×AProof.rn,i(x+, a+)− rn,i(x−, a+)= RH −RH≥ RL −RL= rn(x+, a−)− rn(x−, a−). (B.2)51B.2. proof of Theorem 2From (B.2) the superadditivity of rn(x, a) is established.Lemma 15. qn,i(k|x, a) is non-decreasing in x for all k ∈ S and a ∈ AProof. The proof is similar to Lemma 4Lemma 16. qn,i(k|x, a) is superadditive on S ×A and for all k ∈ SProof. The proof is similar to Lemma 5Lemmas 12 - 16 establish a monotone policy in x [32].B.2 proof of Theorem 2Lemma 17. The terminal reward, rN,i(x) is non-decreasing in nProof. The proof is similar to Lemma 6Lemma 18. The reward for sending a frame when n units of time are remaining, and x frameshave already been transmitted, rx,i(n, a) is subadditive on S ×A.Proof. The proof is similar to Lemma 14Lemma 19. The quantity qx,i(k/n, a) is non-decreasing in n for all a ∈ A.Proof. This is a direct consequence of Lemma 7.Lemma 20. The quantity qx,i(k/s, a) =∑kpx,i(k|n, a) is a subadditive function on S × A fora subset k ∈ S.Proof. The proof is similar to Lemma 10Lemma 21. The reward for sending a frame when n units of time are remaining, and x frameshave already been transmitted, rx,i(n, a) is non-increasing in n for all a ∈ AProof. The proof is similar to Lemma 16Lemmas 17 - 21 establish a monotone policy in n [32].52
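As an illustrative complement to the supermodularity arguments in Appendices A and B, the sketch below numerically checks condition (A.1) for a toy reward array and extracts the resulting monotone (threshold) policy; the reward values are invented for illustration and the check is not part of the thesis.

import numpy as np

def is_supermodular(g):
    # Check g(x+1, a+1) + g(x, a) >= g(x+1, a) + g(x, a+1) for all x, a,
    # i.e. the condition (A.1), for a 2-D array g indexed by (state, action).
    return np.all(g[1:, 1:] + g[:-1, :-1] >= g[1:, :-1] + g[:-1, 1:])

def greedy_policy(g):
    # Action maximising g in each state; supermodularity of g implies this
    # argmax is non-decreasing in the state, i.e. a threshold policy.
    return np.argmax(g, axis=1)

# Toy reward table: rows = frames transmitted, columns = action (0 = low, 1 = high).
g = np.array([[5.0, 2.0],
              [5.0, 4.0],
              [5.0, 6.0],
              [5.0, 9.0]])
print(is_supermodular(g))   # True for this example
print(greedy_policy(g))     # [0 0 1 1]: monotone threshold in the state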
