UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Stochastic resource allocation in wireless networks Farrokh, Arsalan 2007

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata

Download

Media
831-ubc_2007-317709.pdf [ 9.11MB ]
Metadata
JSON: 831-1.0302179.json
JSON-LD: 831-1.0302179-ld.json
RDF/XML (Pretty): 831-1.0302179-rdf.xml
RDF/JSON: 831-1.0302179-rdf.json
Turtle: 831-1.0302179-turtle.txt
N-Triples: 831-1.0302179-rdf-ntriples.txt
Original Record: 831-1.0302179-source.json
Full Text
831-1.0302179-fulltext.txt
Citation
831-1.0302179.ris

Full Text

Stochastic Resource Allocat ion Wireless Networks by Arsalan Farrokh M.Sc, Simon Fraser University, 2002 A THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF Doctor of Philosophy m The Faculty of Graduate Studies (Electrical and Computer Engineering) The University Of British Columbia October, 2007 © Arsalan Farrokh 2007 Abstract This thesis presents several efficient and adaptive resource allocation schemes in wire-less networks under the framework of Markov Decision Problem (MDP). In particular, we formulate meaningful trade-offs for three specific resource allocation problems as MDPs and show that their solutions exhibit certain special structures. In each case, by utilizing the underlying structure, we present a resource allocation solution that is computationally inexpensive and is scalable in terms of the system parameters. First, we present opportunistic algorithms in scheduling High Speed Downlink Packet Access (HSDPA) users that exploit channel and buffer variations to increase the probability of uninterrupted media play-out. We formulate a feasibility problem with stability and robustness Quality-of-Service (QoS) constraints. A methodology for obtaining a feasible solution is proposed by starting with a stable algorithm that satisfies the stability QoS constraints. Next, we present optimal adaptive modulation and coding policies that minimize the transmission latency and modulation/coding switching cost across finite-state Markovian fading channels. The optimal tradeoff between the transmission delay and the switching costs is formulated as a discounted cost infinite horizon MDP. We show that under certain sufficient conditions optimal modulation and coding selection policies are monotone in the state variables. Finally, we present an ARQ-based power and retransmission control policy that achieves an optimal tradeoff between transmission power, delay, and packet drop penalty costs. Under certain sufficient conditions, we show that the optimal power and retransmission control policies are monotone in the channel quality, the penalty cost, and the number of the retransmission slots left. ii Table of Contents Abstract ii Table of Contents iii List of Tables vii List of Figures viii Acknowledgements x Dedication xi 1 Introduction 1 1.1 Methodology and Motivation 1 1.1.1 Resource Allocation in MDP Framework 2 1.1.2 Numerical Methods for Solving MDPs 3 1.1.3 Structural Results 4 1.2 Objective and Overview >, 5 1.2.1 Objective 5 1.2.2 Unifying theme 5 1.2.3 Overview of Thesis 6 1.2.4 Structure of Thesis 9 1.2.5 General Literature Review 9 1.3 Main Contributions 13 1.3.1 User Scheduling for Streaming Media in HSDPA 13 1.3.2 Adaptive Modulation and Coding with Switching cost 14 1.3.3 Power and Retransmission Control in ARQ 14 1.4 Supporting Publications 15 1.5 Organization of Thesis 16 iii Table of Contents 2 Opportunistic Scheduling in HSDPA 18 2.1 Introduction 18 2.2 H S D P A Streaming Discrete Event System Model 21 2.3 Formulation of HSDPA Scheduling Problem with QoS Requirements . . . 26 2.3.1 Opportunistic Joint User Scheduling and M C M Assignment . . . 26 2.3.2 Formulation of QoS Constraints for H S D P A Streaming 27 2.3.3 H S D P A Scheduling Problem 29 2.4 Opportunistic Scheduling Algorithm For Streaming HSDPA Users . . . . 30 2.4.1 Summary of the Opportunistic Streaming Algorithm 31 2.4.2 Initialization 32 2.4.3 Estimating per-TTI Capacities 32 2.4.4 Computing Optimal M C M ' s 33 2.4.5 Joint M - L W W F User Scheduling and M C M Assignment 34 2.4.6 Updating Fairness Parameters by Stochastic Approximation . . . 35 2.4.7 Convergence of the Stochastic Approximation Algorithm 37 2.5 Numerical Results 39 2.5.1 Simulation Scenarios 40 2.6 Conclusions 43 3 Transmission Rate Control with Switching Costs 46 3.1 Introduction 46 3.2 System Model 48 3.3 Problem Formulation for Optimal Adaptive Modulation and Coding -M D P Framework . 50 3.3.1 Components of the Discounted Infinite Horizon Markov Decision Problem (MDP) 50 3.3.2 Formulation as a Discounted Infinite Horizon M D P 52 3.4 Solutions to the M C S Selection M D P - Structural Results 53 3.4.1 Monotone Optimal M C S Selection Policies 53 3.4.2 Properties of Optimal M C S Selection Policies with Constant Switch-ing Cost 56 3.5 Counter Examples 58 3.6 Numerical Results 59 3.7 Conclusions 61 3.8 Analytical Proofs 62 iv Table of Contents 3.8.1 Proofs of the Main Monotonicity Results 62 3.8.2 Proof of Optimal MCS Selection Properties 63 4 Optimal Power and Retransmission Control in A R Q 72 4.1 Introduction 72 4.2 System Model 74 4.3 Frame Retransmission Control - MDP Framework 75 4.3.1 MDP Formulation of Frame Retransmission Control 76 4.3.2 Optimal Monotone Frame Retransmission Control Policies . . . . 79 4.4 Penalty Cost Allocation for Frame Transmission Sessions 83 4.4.1 Formulation of the Penalty Cost Allocation Problem 83 4.4.2 Optimal Monotone Penalty Cost Allocation Policies 85 4.5 Numerical Results 86 4.6 Background and Analytical Proofs 87 4.6.1 Background: Supermodular Functions and General Monotonicity Results 87 4.6.2 Analytical Proofs 88 4.7 Conclusions 91 5 Conclusions and Future Work 99 5.1 Theme of Thesis 99 5.2 Summary of Work Accomplished 100 5.3 Future Work 101 Bibliography 103 Appendices A Optimal Scheduling of a Dedicated-Platform 110 A . l Introduction 110 A.2 System Model 112 A.3 Formulation as a Markovian Search Problem - A POMDP Framework . . 114 A.4 Optimal Policies for the Operation of the Dedicated-Platform 117 A.5 Numerical Examples 119 A.6 Conclusions 120 v Table of Contents B Proof of Theorem 5 124 B . l Main Proof 124 vi List of Tables 2.1 Modulation and Coding Schemes 40 2.2 Simulation Parameters 40 3.1 Simulation Parameters 65 3.2 SNR quantization levels in (dB) for Markovian channel model: {9\, 62, • • •, # 2 4 } 65 4.1 Simulation Parameters 92 4.2 SNR quantization levels (linear) for Markovian channel model: {81,62, • •., #24} 93 4.3 Portion of transition probabilities r/jj of the Makovian channel model: t = {1,2,. . . , 8}, j = {1,2,. . . , 11} 93 vii List of Figures 1.1 General work plan of thesis 6 1.2 Instants of monotonicity and submodularity 8 1.3 Optimal power allocation vs. the number of retransmissions 9 2.1 Wireless Streaming Media System Model 24 2.2 Comparison of maximum supported HSDPA users in Scenario 1 42 2.3 Comparison of maximum supported HSDPA users in Scenario 2 44 2.4 Comparison of maximum supported HSDPA users in Scenario 3 45 3.1 Delay costs of M-ary QAMs with 25 symbols per frame vs. channel SNR. 66 3.2 Optimal MCS selection policy /U*(s, j) for Case 1: Two modulation schemes QPSK and 16-QAM and constant switching cost Cs = 30 67 3.3 Optimal cost V(s,j) vs. channel SNR 68 3.4 Optimal MCS selection vs. channel state SNR for Case 2 and 3 with M — 2fc-ary QAMs, k = {1, 2 . . . . , 6} 69 3.5 Optimal Costs vs. channel state SNR for simulation Case 2 and Case 3. Each curve represents V(s,j) where j is a M-ary Q A M scheme shown in the legend 70 3.6 Performance-evaluation of optimal MCS selection in terms of the through-put and the switching rate for Case 2 and Case 3 71 4.1 Monotonicity structures of the optimal power allocation. 94 4.2 Optimal power allocation vs. channel states and retransmission slots with Cf = 10000 and D = 1 95 4.3 Optimal aborting decisions vs. retransmission slots with D = 500, Cf — 1000, and s = 2 96 4.4 Comparison of the optimal policy to fixed power policies with Cf — 10000 and£> = l 96 viii List of Figures 4.5 Comparison of the optimal policy to fixed power policies with Cf = 10000 and D = 500 97 4.6 Patterns of the optimal power consumption 98 5.1 Illustration of the work accomplished 100 A . l Average cost vs. Target-task transition memory: a — h: c\ = 4, c\ — 1, Pd = 0.7 12.1 A.2 Average cost vs. Target-task transition memory: a = h, c\ — 4, c\ = 1, Pd = 0.9. 122 A. 3 Average cost vs. Target-task transition memory: a = h, c\ — 2, c\ — 1, Pd = 0.7 123 B. l A general form of non-monotone optimal transmission mode selection poli-cies as described by (B.l) and (B.2): Blank and filled circles represent transmission mode-1 and transmission mode-2, respectively. Hatched cir-cles represent undetermined transmission modes 126 ix Acknowledgements I thank my supervisor Dr. Krishnamurthy for supporting this research, and giving me the opportunity to work in this interesting research area. His supports and visions have been invaluable and assisted me greatly during the last four years. I would also like to thank Dr. Schober for his meticulous collaboration and valuable assistance. Finally, I would like to thank my parents, my friends, and colleagues in the statistical signal processing lab. I am grateful for a scholarship from the National Science and Engineering Research Council of Canada as well as graduate fellowships, and stipends from University of British Columbia. Dedicated to a Better Tomorrow xi Chapter 1 Introduction Emerging packet-based multiple access wireless solutions aim at efficient Quality-of-Service (QoS) provisioning of different traffic tjqpes while maximizing the network capac-ity. Examples are Wireless Local Area Networks (WLANs) with Internet backbones, ad hoc and sensor networks, and emerging High-Speed Packet Access (HSPA) networks [1,2]. The design challenges of these networks are to achieve suitable trade-offs between the end user QoS constraints and the overall network utilization. The end user QoS is essentially the collection of performance metrics such as delay and error rate that determine the end user satisfaction [3]. Different traffic classes can be distinguished through their distinct QoS requirements. For instance, data services are in general more delay tolerant and more error sensitive than voice or multimedia services. An appropriate system design must consider these traffic characteristics with sufficient scalability. One important factor in achieving the required mixed-traffic QoS requirements is utilizing efficient wireless link adaptation by matching relevant transmission parameters and resources to the conditions on the radio link. Optimal resource allocation and scheduling are then imperative to fully utilize these link adaptation strategies. System resources such as time slots, radio spectrum, transmission power, etc., must be allocated optimally based on the available and relevant information. The focus of this thesis is to devise effective resource management and scheduling algorithms for several different link adaptation methods that jointly optimize QoS and network capacity for multi-service traffic. This introductory chapter gives an overview of these algorithms which includes the related works, methodology, motivation, and contributions. 1.1 Me thodo logy and M o t i v a t i o n The problem of optimal resource allocation in time-slotted (packet-based) wireless net-works falls naturally under the class of Markov Decision Problems (MDPs) [4,5]. In this section, a brief overview of MDP frameworks and their solution methodologies is presented. 1 Chapter 1. Introduction 1.1.1 Resource Allocation in M D P Framework Resource allocation decisions are dynamically made in each time-slot based on the avail-able and relevant system information (information state). The information state can be assumed, with broad generality, to evolve as a controlled Markov process where the outcome of each decision will influence the future information states of the system. The objective is then to select the sequence of the optimal resource allocation actions to minimize certain expected cost which reflects the underlying QoS and capacity trade-off. More formally, let xk denote the Markovian state that summarizes all the available information relevant to the link adaptation up to time k. For example Xk may convey information about the quality of the wireless fading channels at time A:. Also, let Uk denote the resource allocation action at time k which may represent the scheduled user number, allocated transmission power, etc. Action Uk is selected based on the available information summarized in state Xk and may be obtained by the mapping Uk = Mfc(^ fc) from the state Xk at time k. Here, the collection of mappings {/ifc}> k = 0,1, 2,... is often referred to as a non-randomized policy. If pk is invariant in k then this policy, denoted by fj,, is stationary. At any time k, an immediate cost Qk{xk, Uk) is considered that represents a trade-off between the scarce system resources and the involved performance measures. The objective is to choose a resource allocation policy {u-k} so that the expected cost over a given horizon N is minimized. Let J*(XQ) denote the optimal cost in initial state XQ, so we have (cf. [4,5]) J*(x0) := min E \ V gk(xk, Pk(xk)) > , { t l k } U=o J where E denote the expectation over all paths of random process Xk- Assuming station-arity, horizon N can be also equal to infinity in which case the dependence of functions gk(-) and pk(') to k may be dropped and appropriate discounting may be applied to formulate an infinite-horizon discounted-cost problem as min E <^  V ] akg(xk, u-(xk)) } , where 0 < a < 1 is the discount factor. These optimizations often lead to Bellman optirnality equation (or recursion in finite horizon case) in terms of J* that must be solved to obtain the optimal policy [4,5]. The first step in developing an efficient QoS-aware resource allocation algorithms is then to 2 Chapter 1. Introduction meaningfully formulate the involved trade-offs under the MDP framework and obtain the corresponding Bellman equation or recursion. In doing so, QoS constraints and capacity considerations must be expressed in terms of the measurable quantities such as packet loss, delay, power, etc. Also, the information state may include quantities such as channel quality, status of the user queues, etc. Depending on the underlined QoS class, the immediate cost gk(,Xk,Uk) represents a trade-off between involved system resources (e.g., power) and relevant performance measures (e.g., throughput, delay) at time k. Remark 1 The MDP formulation may be further generalized into two following main classes: Constrained M D P s : More general formulations of resource allocation problems can be obtained under constrained MDPs in which certain constraints on some new costs may be introduced [6]. Solutions to constrained MDPs are often obtained by reducing the constrained formulations to the unconstrained MDPs using Lagrange multipliers. In this case the optimal policy is randomized in general. Several literatures investigate special structures under constrained MDP formulations using multi-modularity concepts, cf. [7], [8]. These results are out of the scope of this thesis since we limit our focus to more tractable structural results in unconstrained MDPs. Partially Observable M D P s (POMDPs): If the system states can only be partially observed, resource allocation problems are more appropriately formulated in POMDP framework [4,9] . Solutions to POMDPs can be obtained by formulating the problem under new information states, denoted by sufficient statistics, that can be known per-fectly. These sufficient statistics are essentially certain conditional distributions and hence continuous in nature. In this case, due to the continuity of the information states, the investigation and proof of the structural results may be more prohibitive. In Ap-pendix A, we present our work in the on/off scheduling of a dedicated platform under POMDP framework that can be referred to for a general methodology of formulating and solving POMDPs. 1.1.2 Numerical Methods for Solving MDPs In general, closed-form solutions for the underlying Bellman equation (or recursion) may not exist. Dynamic programming methods are often applied to solve this equation numerically and to obtain globally optimal resource allocation algorithms. For the func-tional recursion in the finite horizon case, each recursion involves 0(AB2) computational steps, where A is the number of actions and B is the number of states [10]. Hence, the 3 Chapter 1. Introduction total of 0(NAB2) steps are required to numerically solve a finite horizon MDP with horizon A7. In the case of the infinite horizon MDP, there are two main methods to solve the underlying Bellman functional equation. First method is value iteration, where the optimal value function is successively approximated up to the desired error toler-ance. Each iteration again involves 0(AB2) computational steps. The value iteration method requires an infinite number of iterations to converge to the exact solution. Fur-thermore, in applying value iteration for discounted cost infinite horizon MDPs with a given error tolerance, the number of iterations required can grow exponentially in the discount factor. In particular, as the discount factor approaches 1, the correlation of the current decisions with the future outcomes increases and therefore, the convergence rate decreases accordingly. An alternative method to compute the optimal policy of an infinite horizon MDP is Policy Iteration which requires a finite number of iterations to obtain the exact so-lution assuming that action and state spaces are finite sets. The idea is to successively approximate the optimal policy and since for a finite number of actions and states the set of stationary policies is also finite, this method converges in a finite number of itera-tions. However, this increased accuracy comes with the increased computational steps of 0(AB2) + B3 per iteration. Different versions of policy iteration algorithm or so called modified policy iteration methods have been developed in the literature to somewhat relax this extra computational burden [4], [5]. 1.1.3 Structural Results As the number of actions and the number of states grow, the numerical dynamic pro-gramming methods become computationally more complex. In addition to this "curse of dimensionality", exact modeling the underlying problem into a tractable MDP frame-work may be a daunting task. In particular, the system parameters such as channel transition probabilities may not be known or identified exactly. Therefore, the numer-ical dynamic programming solutions generally deviate from the true optimal solutions due to the estimation errors of the parameters involved. Furthermore, these numerical methods provide little qualitative understanding of the optimal solutions in terms of the characteristics of the optimal polices or their sensitivity to the variations of the system parameters. In view of these arguments, there is a motivation to explore whether the op-timal policies of the underlying MDPs have any special structures such as monotonicity, etc. This mainly help us in the following aspects: 4 Chapter 1. Introduction • Gaining intuition into the behavior of the optimal policy and possibly developing rule-of- thumb simple and flexible suboptimal resource allocation policies. • Obtaining structures of the optimal resource allocation policies for a large class of the MDP parameters. These structural results can be exploited to develop efficient resource allocation policies without the exact knowledge of the system parameters. • Alleviate the complexity of the numerical dynamic programming algorithms by using their special structures to modify them into more efficient algorithms. As an example, if the the optimal policy is monotone, the policy iteration algorithm can be modified into a more efficient monotone policy iteration algorithm (cf. [5, p.259]. 1.2 Object ive and Overv iew In this section, we present the objective of this thesis along with a brief overview of the problems considered. 1.2.1 Objective The aim of this research is to develop channel-aware and computationally efficient re-source allocation algorithms by using special structures in the MDP framework in order to achieve suitable trade-offs between QoS and network capacity. Figure 1.1 shows the general work plan considered in this thesis. 1.2.2 Unifying theme In this thesis, we have considered three separate resource allocation problems under the MDP framework. The unifying theme is obtaining and utilizing certain structural results to alleviate the complexity of the numerical dynamic programming algorithms and to provide useful qualitative understanding of optimal solutions. 5 Chapter 1. Introduction Devise Efficient and Flexible Resource Allocation Algorithms Figure 1.1: General work plan of thesis. 1.2.3 Overview of Thesis In this thesis, we consider three specific scheduling and resource allocation problems and focus on meaningful formulation of these problems under the MDP framework. We show certain special structures in these problems that can be exploited to devise computationally efficient and real-time implementable scheduling and resource allocation algorithms. These problems and their special structures are summarized in below. Opportunistic (myopic) Policies in High Speed Downlink Packet Access (HS-DPA): The first problem deals with o p p o r t u n i s t i c scheduling for streaming media traffic in High Speed Downlink Packet Access (HSDPA) systems. The term opportunistic refers to the fact that at any time slot user channel and/or buffer information are exploited to schedule the service in a myopic manner to the r e l a t i v e l y better users in that time slot (by "better" we mean better in terms of certain performance measure such as throughput). More specifically, depending on the required QoS, a dynamic state-dependent reward 6 Chapter 1. Introduction metric is considered and at each stage, the users with the highest reward metrics are assigned the service [11]. The general structure of an opportunistic policy can be stated as p{xk) = argmax <f>i{xk) (1.1) where xk is the system or information state at time fc, i denotes the user number, p(-) gives the scheduled user number, 3 is the set of users and (pi(xk) is the metric associated with user i at time fc. One extreme case is to always assign the service to the user with the best instantaneous channel quality (greedy algorithm), i.e., </>i(xk) = Sj(fc), where Si(fc) denotes the channel quality of user i at time fc. The greedy algorithms generally yields the highest overall throughput but starves the users with relatively lower channel qualities. Since often we need to maintain certain fairness among the users, the greedy algorithm may not be feasible. In general, the scheduling problem is formulated as maximizing the total average throughput while certain individual average QoS constraints must be maintained [11]. In our case, we consider an special opportunistic algorithm that prioritize users based on both queue lengths and channel qualities with some induced degrees of freedom. It can be shown that under certain conditions on the dynamic of the channel and QoS constraints, the opportunistic scheduling policy guarantee the QoS constraints and/or achieves the optimal throughput [12,13]. Note that the opportunistic algorithm in the form of (1.1) is computationally efficient. It requires only |3| comparisons (| • | denotes cardinality). Optimal threshold-based adaptive modulation and coding with switching costs: The second problem considered in this thesis is optimal transmission rate control via adaptive modulation and coding where there is a switching cost involved for changing the transmission rates. After proper formulation of this problem under the MDP frame-work, we will show that the optimal assignment of the modulation schemes is indeed monotone in the state variables. Figure 1.2(a) shows an instance of a threshold-based optimal modulation selection. Here, there are two modulation schemes (two transmission rates) available with a constant switching cost. It can be seen that there is a threshold in the channel quality where switching from modulation mode 1 to modulation mode 2 is optimal. Other instances of optimal threshold policy is given in Chapter 3. Also, here we use certain submodularity (supermodularity) properties of the optimal cost (value function to prove the main monotonicity results [5, 10]. Submodularity (supermodu-larity) of a function f(x,y) simply means / has decreasing (increasing) variations, i.e., 7 Chapter 1. Introduction for x\ > x'2, f(xi,y) — f(x2,y) is decreasing (increasing) in y. More detail on sub-modularity definition along with a brief overview of the main optimization properties of submodular (supermodular) functions are given in Section 4.6.1. Figure 1.2(b) shows an instance of a submodularity value function. Other monotonicity results and properties of the underlying rate control problem are further investigated and rigorously proved in Chapter 3. c o D T3 O ro E —^. Q. o HSBQ8B8IIIIB:' -a—B— 0 5 10 15 Channel SNR (dB) 20 (a) Opt imal modulation selection vs. channel state. Optimal Cost Variaions vs. SNR > i csT > 30 20 10 0 -10 -20 -30 \ s a s s B i i m i a 0 5 10 15 20 Channel SNR (dB) 25 (b) Variation of the optimal cost vs. channel state. Figure 1.2: Instants of monotonicity and submodularity. Optimal power and retransmission control in Automatic Repeat Request ( A R Q ) with packet drop penalty cost: Finally, we consider a resource allocation problem of power and retransmission control in Automatic Repeat Request (ARQ) with 8 Chapter 1. Introduction packet drop penalty cost. By formulating the problem under the MDP framework, we establish a suitable trade-off between delay, power and packet drop costs. Certain monotonicity properties of this problem are shown and proved rigourously. For instance, Figure 1.3 shows monotonicity of the optimal transmission power in the number of retransmissions for some given channel states s = 5 to s = 10. These monotonicity properties lead to scalable and computationally efficient power and retransmission control policies. Optimal Power vs. Retransmission Slots 20 18 16 14 i I 12 s. E 10 6 4 2* '" ' ' ' ' 1 ' 1 2 3 4 5 6 7 8 n Figure 1.3: Optimal power allocation vs. the number of retransmissions. 1.2.4 Structure of Thesis This thesis is presented in four chapters including the first introductory chapter. Each chapter is self-contained and a separate literature review is presented in each chapter. A version of Chapter 2 has been published in [13]. A version of Chapter 3, co-authored by V. Krishnamurthy and R. Schober, has been accepted for publication in IEEE transactions on communications with minor revisions. A version of Chapter 4, co-authored by V. Krishnamurthy and R. Schober, is under preparation for submission to IEEE transactions on wireless communications. A version of Appendix A is presented in [14]. 1.2.5 General Literature Review Each chapter of this thesis provides a separate literature review relevant to the problem considered in that chapter. In this section, we provide a general literature review to put 9 Chapter 1. . Introduction our work in more perspective. 3G cellular systems and HSDPA: The effectiveness of suitably designed radio resource allocation schemes for 3G high speed data services is extensively studied in the literature. The concept of 3G wireless high data rates services along with adaptation techniques is discussed in [15] and [16]. These papers discuss the mechanism that used to improve spectral efficiency of a 3G wireless system. A description of 3GPP Release 6 updates along with a chapter on HSDPA is provided in [2]. A brief review of the main elements of the HSDPA standard as an evolution to WCDMA is given in [17]. Further short overviews of HSDPA basic operation and applications are provided in [18] and [19]. The performance aspects of WCDMA systems with HSDPA is studied in [20]. The study considers packet scheduling and the trade-off among user fairness and cell throughput. In [21] link and network layer performance aspects of 3G WCDMA-HSDPA systems in terms of improving peak data rates and blocking probabilities are studied. The paper presents methods to map link level performance of the HSDPA retransmission schemes into a low-complexity network level representation. [22] gives a simulation-based eval-uation of the HSDPA performance for data streaming applications. The performance sensitivity of HSDPA to call model, peak data rate and C/I feedback delay is evaluated by simulation in [23]. One major advance in HSDPA as compared with previous releases of the UMTS-WCDMA specifications is associated with scheduling in fading channels. The use of A M C and time-slotted CDMA structure allows multiplexing in both time and code domain and results in more dynamic resource allocation schemes. In HSDPA user channel information can be fed back to base station via a dedicated channel. Therefore, the user fading-channel variations can be exploited by the base station to achieve a more efficient and intelligent scheduling under given QoS constraints. There are few references for the problem of HSDPA scheduling. A comprehensive description of packet scheduling schemes in HSDPA for various QoS constraints is presented in [3]. In [24] an Internet Protocol (IP) packet scheduler for HSDPA is designed that considers end-to-end applica-tion layer performance. Link and system performance of HSDPA with proportional fair scheduling is examined and simulated in [25]. The result of the paper shows the capacity gain depends largely on the operating conditions. The throughput performance of three different packet scheduling algorithms in HSDPA is compared by simulations in [26]. A comparison of "opportunistic" scheduling algorithms for streaming media in HSDPA is recently presented in [27] (see also the list of accepted papers). The paper assumes per-fect channel estimation and presents a unified structure of the opportunistic scheduling policies which exploit channel and/or buffer variations for achieving the required QoS. 10 Chapter 1. Introduction The result shows that for streaming applications the best performance is obtained by using both channel and buffer information. In [28] an opportunistic scheduling algorithm for streaming users in HSDPA is presented (see also the list of accepted papers). The algorithm uses both channel and buffer variations to assign service to one user at a given time slot. A more rigorous treatment of opportunistic scheduling methods for streaming users in HSDPA is given in [14]. The paper considers two sets of probabilistic long term and short term QoS constraints and present a feasible solution to the problem (see the list of submitted papers). There are several references for scheduling wireless users in general time slotted systems. In [12], the problem of scheduling time-slotted CDMA users (one user per time slot) in downlink with variable channel conditions is considered. The paper assumes full knowledge of channel condition and proposes an algorithm that considers both channel state information and user buffer state information for scheduling decisions. It is proven mathematically in the paper that the algorithm is capable of keep-ing the user queues stable (if stability is feasible). The algorithm in general guarantees the QoS but may not achieve the system capacity. A general approach to support QoS of multiple real-time data users is proposed in [29]. The scheduling algorithm is similar to [12] and can be used to maximize the number of users that can be supported with the desired QoS. Assuming a linear relationship between the bit rate and signal-to-noise ratio, it is shown in [30] that scheduling one user at a time with full power gives superior throughput performance compared to scheduling simultaneous users. Fair scheduling of delay and rate-sensitive packets over a wireless channel is considered in [31]. The paper presents an ideal wireless fair scheduling algorithm base on adaptation of fluid fair queuing with perfect channel information. In [32] a comparison of time slot alloca-tion strategies for time slotted CDMA systems is presented. The paper considers slot allocation strategies with traffic asymmetry between uplink and downlink. In [11] oppor-tunistic scheduling methods with perfect channel information for a time-slotted wireless system are presented. The aim is to maximize the cell throughput while maintaining certain fairness constraints among the users, i.e., each user receives a pre-determined average service time. To achieve fairness, the algorithm virtually increases the channel quality of the poor users by additive constants (fairness parameters) and then selects the best channel at each time slot. One drawback of this method is that the users with poorer channel may degrade the overall throughput significantly. In [33], similar opportunistic scheduling problem is considered with minimum-throughput constraints, i.e. each user must (at least) receive a certain average throughput. This problem may not be feasible and the paper discusses that the feasibility region is convex. The fairness constants in 11 Chapter 1. Introduction this case are applied as product terms to the channel qualities. In [34], a joint schedul-ing and power-allocation scheme for time-slotted systems is presented. The objective of the paper is to minimize the total average transmission power while maintaining cer-tain throughput for each user. In [35] an opportunistic scheduling method for streaming video in wireless networks is presented. The algorithm resembles the one in [12] but it also assigns different priorities to different video frames. Assuming a linear relationship between the bit rate and signal-to-noise ratio, it is shown in [36] and [30] that scheduling one user at a time with full power gives a better throughput performance compared to scheduling simultaneous users. In [12], the problem of scheduling CDMA users (one user at a time) in downlink with variable channel conditions is considered. The authors have proposed an algorithm that considers both channel state information and user buffer state information for scheduling decisions. The algorithm in general guarantees the QoS but may not achieve the system capacity. Adaptive Modulation and Coding with Switching Costs: The effectiveness of adaptive modulation and coding techniques for improving energy efficiency and network through-put over fading channels is well studied in the literature. In [37] adaptive modulation techniques based on M-&ry Quadrature Amplitude Modulation (QAM) are shown to be significantly more power efficient than non-adaptive modulations in fading. In [38] and [39] a coded version of the adaptive M-ary Q A M system is shown to exhibit addi-tional gains in power and spectral efficiency. In [40] the problem of excessive switching between different MCS's is addressed by assigning switching costs for changing the active MCS. Power and Retransmission Control in ARQ: Adaptive techniques for energy conserva-tion and delay minimization over fading channels are extensively studied in the literature. In [41], an on/off file transfer ARQ control is presented that minimizes power and delay costs over a partially observed two-state Markovian channel (Gillbert-Elliot). In [42] an A4DP framework for optimal power and retransmission control for random access systems is considered. Optimal on/off transmission policies over a single wireless link for multiple users over two-state Markovian channels are studied in [43]. In [44] a joint power/rate control under an MDP framework for minimizing average delay under average power constraints is proposed. The effectiveness of power ramping in ARQ systems over a sin-gle path Rayleigh fading channel is experimentally confirmed for a WCDMA downlink in [45]. In the absence of the Markovian dynamics, [46, p.8-11] proves certain insightful monotonicity results for a sequential allocation problem subject to penalty costs. 12 Chapter 1. Introduction 1.3 M a i n Cont r ibu t ions The common theme in the problems considered in this thesis is exploring and exploiting special structures in stochastic resource allocation for wireless networks. These special structures provide qualitative understanding of the optimal resource allocation policies and facilitate the real-time implementation of the optimal policies with minimal com-plexities. Each chapter provides a separate detail list of contributions. In this section, we summarize these contributions as follows: 1.3.1 User Scheduling for Streaming Media in H S D P A In this work we devise an efficient joint scheduling and modulation/multi-code selection algorithm that enables a smooth play-out of media streams for HSDPA users. Here, the resources to be allocated are time, modulation schemes (bandwidth), and multi-codes. The following summarizes our contributions in this work. • A rigorous stochastic discrete event system model for HSDPA streaming problem is presented. Furthermore, the problem of smooth media play-out in HSDPA is formulated by two constraints: — Long term QoS constraint or the stability of users queues. - Robustness QoS constraint for containing the variations of the users' queues described by probabilistic conditions on the length of the queues. This con-straint is expressed by P(V* > < 5{, where P is the probability measure, Vi is the length of the queue.i, etai and delta^ are some threshold levels for user i. • A methodology for reaching feasible solutions are presented by customizing a so called Modified Largest-Weighted-(unfinished) Work-First (M-LWWF) algorithm. This way we present a parametric and computationally efficient joint opportunis-tic user-scheduling and M C M (Modulation/Coding and Multi-code) selection al-gorithm that satisfies the long term or stability QoS constraints. • Fairness parameters in the proposed stable algorithm are adjusted via stochastic-approximation-based adaptive algorithms to additionally satisfy the given robust-ness QoS constraints and to reach a feasible solution for HSDPA streaming problem. 13 Chapter 1. Introduction 1.3.2 Adaptive Modulation and Coding with Switching cost In this work we consider the problem of transmission rate control via adaptive modulation and coding where there are switching costs involved for changing the Modulation and Coding Schemes (MCSs). This can be considered as part of the framework introduced in the previous work of HSDPA streaming in which the adaptive modulation and coding is used for link adaptation. The following summarizes our contributions in this work. • The MDP formulation of MCS selection with switching cost in [40] is generalized for non-adjacent Markovian channel transitions so that fast time-variant channels can be modeled with improved accuracy. • Some insightful properties of optimal MCS selection solutions with constant switch-ing costs are outlined. • Optimality of monotone adaptive modulation and coding policies with switching costs under certain proposed sufficient conditions is rigorously proved. • In the case of two modulation and coding modes, optimality of threshold based policies with constant switching cost is proved under two different sets of conditions. One of these conditions results in the MDP formulation presented in [40]. We show that in this case the proof needs substantially more mathematical treatment than what was presented in [40]. • By giving necessary conditions for switching, a lower bound for threshold levels in terms of delay differences is obtained. • Counterexamples are constructed to demonstrate the significance of the proposed sufficient conditions by showing the existence of optimal non-monotone policies when these conditions are not satisfied. Furthermore, it is shown that straightfor-ward generalization of the underlying monotonicity results to more than two MCSs with constant switching cost is not possible. 1.3.3 Power and Retransmission Control in A R Q In this work we consider an ARQ-based retransmission system where there are delay, power, and packet drop costs involved. We establish a suitable trade-off of these costs to enable a power efficient and channel-aware retransmission control policy. The following summarizes our contributions in this work. 14 Chapter 1. Introduction An optimal trade-off between power, delay, and packet drop cost in an adaptive ARQ system is formulated as a finite horizon MDP. The monotonicity of optimal power and retransmission control polices in number of retransmissions, delay, and penalty cost is rigorously proved. This work provides a rigorous and elegant analytical explanation for the experimental results in [45] on the effectiveness of power ramping in ARQ systems. • The monotonicity of optimal power and retransmission control polices in channel state is proved under certain sufficient conditions on the Markovian channel. • The problem of penalty cost allocation for frame transmission sessions is formulated as a finite horizon MDP and the optimality of monotone penalty cost allocation is proved." Remark 2 In Appendix A, we also present our work regarding the scheduling of a dedicated platform under POMDP framework. This work has been done in a close relationship to the main resource allocation problems considered in this thesis. We show the optimality of threshold-based scheduling policies based on the works in [47]. However, due to a different application considered, we present it as an appendix that can be referred to for a general methodology of formulating and solving POMDPs. 1.4 Suppor t ing Publ ica t ions Part of this work has been published in conference proceeding and journals listed below. In particular, a version of Chapter 2 is published in IEEE transactions on multimedia and a version of Chapter 3 is accepted with minor revisions for publication in IEEE transactions on communications. A version of Chapter 4 will be shortly submitted to IEEE transactions on wireless communications. 1. A. Farrokh and V. Krishnamurthy, Opportunistic Scheduling for Streaming Multi-media Users in High-Speed Downlink Packet Access (HSDPA), IEEE Transactions on Multimedia, Volume 8, Number 4, August 2006, Pages: 844 -855.' 2. A. Farrokh, V. Krishnamurthy, and R. Schober, Optimal Adaptive Modulation and Coding with Switching Costs, IEEE International Conference on Communications (ICC 2007), Glasgow, 2007. 15 Chapter 1. Introduction 3. A. Farrokh and V. Krishnamurthy, Optimal Threshold Policies for Operation of a Dedicated-Platform with Imperfect State Information - A POMDP Framework, Symbolic and Springer-Verlag Lecture Notes in Computer Science, Quantitative Approaches to Reasoning with Uncertainty, Volume: 3571/2005, 2005, Pages: 198-208. 4. A. Farrokh, F. Bloemer, and V. Krishnamurthy, A Comparison of Opportunistic Scheduling Algorithms for Streaming Media in High-Speed Downlink Packet Ac-cess (HSDPA), Springer-Verlag Lecture Notes in Computer Science, Multimedia Interactive Protocols and Systems, Volume: 3311. 2004, Pages: 130 - 142. 5. A. Farrokh and V. Krishnamurthy, Optimal Threshold Policies for Operation of a Dedicated-Platform with Imperfect State Information - A POMDP Framework, Springer-Verlag Lecture Notes in Computer Science, Symbolic and Quantitative Approaches to Reasoning with Uncertainty, Volume: 3571, 2005, Pages: 198 - 208. item A. Farrokh and V. Krishnamurthy, Opportunistic Scheduling for Streaming Users in High-Speed Downlink Packet Access (HSDPA), Global Telecommunica-tions Conference, (Dallas, USA), GLOBECOM '04. IEEE , Volume: 6, 29 Nov. -3 Dec. 2004, Pages: 4043 - 4047. The following paper, is accepted for publication with minor revisions: 6. A. Farrokh, V. Krishnamurthy, and R. Schober, Optimal Adaptive Modulation and Coding with Switching Costs, accepted with minor revisions for IEEE Transactions on Communications, August 2007. Also, the following paper is in preparation for submission to IEEE transactions on wireless communications: 7. A. Farrokh, V. Krishnamurthy, and R. Schober, Optimal Power and Retransmis-sion Control Policies over Fading Channels, August 2006. 1.5 Organiza t ion of Thesis In this chapter, the motivation for this research and an overview of the methodology have been given. Subsequent chapters are organized as follows: 16 Chapter 1. Introduction • In Chapter 2, a discrete event model for streaming users in HSDPA with meaningful QoS constraint is formulated. We then present a joint opportunistic user-scheduling and M C M assignment policy as a feasible solution to the above problem. The proposed policy enables a smooth play-out for HSDPA users if it is possible with any other policy. • Chapter 3 formulates an adaptive modulation and coding problem with switching costs as a discounted infinite horizon MDP. It is shown that optimal MCS selec-tion policies have monotone (threshold-based) structures under certain sufficient conditions. These sufficient conditions are characterized by the submodularity of the switching cost function and the delay cost and first order stochastic dominance conditions on channel transition probabilities. The underlying monotone struc-tures provide a qualitative understanding of the optimal MCS allocation methods and facilitate the real-time implementation of the optimal policy with minimal complexities. • In Chapter 4 , a general adaptive ARQ system as a finite horizon MDP is formu-lated that achieves an optimal trade-off between transmission delay and packet drop penalty costs. We have shown that optimal power and retransmission control policies have monotone structures under certain sufficient conditions. Furthermore, the monotonicity structures for optimal penalty cost allocation across the frame transmission sessions is demonstrated. The underlying monotone structures pro-vide qualitative insight into the optimal power and retransmission control schemes and facilitate the real-time implementation of the optimal policy with minimal complexities. • Chapter 5 presents conclusions and recommendations for future research. 17 Chapter 2 Opportunistic Scheduling in H S D P A High-Speed Downlink Packet Access (HSDPA) achieves high data rates and high spectral efficiency by using Adaptive Modulation and Coding (AMC) schemes and employing multi-Code CDMA. In this chapter we present opportunistic algorithms for scheduling HSDPA users and selecting Modulation/Coding and Multicode (MCM) schemes that exploit channel and buffer variations to increase the probability of uninterrupted media play-out. First, we introduce a stochastic discrete event model for a HSDPA system. By employing the discrete event model, we transform the scheduling problem of providing uninterrupted play-out to a feasibility problem that considers two sets of stochastic Quality of Service (QoS) constraints: stability constraints and robustness constraints. A methodology for obtaining a feasible solution is then proposed by starting with a so called stable algorithm that satisfies the stability QoS constraints. Next, we present Stochastic Approximation algorithms that adapt the parameters of the stable algorithm in a way that a feasible point for the robustness QoS is reached within the feasibility region of the stability QoS. 2.1 In t roduc t ion The IMT-2000 standards for third generation (3G) wireless networks, released in 1999, promise high bandwidth efficiency for supporting a mix of real-time and high data rate traffic. Three of the five standards proposed in IMT-2000, including UMTS (the Euro-pean contribution), are based on Wideband-CDMA. Though, WCDMA/UMTS specifi-cations fully meets the IMT-2000 requirements for 3G networks, there is still an increas-ing demand for much higher downlink data rates along with a better QoS. In order to meet these demands, a new packet concept, called High-Speed Downlink Packet Access (HSDPA), was recently published in Release 5 of 3GPP UTRA-FDD specifications [2]. HSDPA is an extension of the UMTS standard that provides an improved peak data rate 18 Chapter 2. Opportunistic Scheduling in HSDPA and an increased throughput for world-wide cellular systems. The main features that collectively describe HSDPA are: Adaptive Modulation and Coding (AMC) schemes, Turbo codes, higher order modulation (16-QAM), CDMA multi-code operation, fast physical layer hybrid ARQ and short transmission time intervals. These features enable HSDPA to support transmission rates of up to 10.7 Mbit/s [20] and allow high-rate down-link transmission of streaming media and data for large numbers of users in a cellular system. HSDPA is a time-slotted CDMA system and requires a suitable scheduling policy to achieve the desired performance. A transmission scheduler, based on the channel information of each user, must decide which user should be scheduled at each time slot and furthermore, what type of modulation, coding and multi-code combination should be used. Scheduling methodologies are not specifically denned as part of the HSDPA standard. This motivates the need to develop efficient scheduling algorithms that exploit the nature of wireless channel to achieve the required QoS for the users. Literature review: 3G wireless traffic (e.g. real time multimedia) can be very resource demanding and hence employing suitable Radio Resource Allocation (RRA) schemes is an important aspect of a 3G wireless system design. In [20,21], link and network layer performance aspects of 3G WCDMA/HSDPA systems in terms of improving peak data rates and blocking probabilities are studied. The retransmission scheduling is also considered and simulated at both link and network layers. The study shows a trade-off between user fairness and the cell throughput. In [11,33], opportunistic scheduling methods are used for a time-slotted wireless system to optimize the cell throughput while maintaining certain fairness among the users. Two categories of fairness in terms of minimum-throughput guarantee and specific shares of scheduling times are consid-ered. However, in these papers there is no assumption on the status of the user buffers. Assuming a linear relationship between the bit rate and signal-to-noise ratio, it is shown in [30,36] that scheduling one user at a time with full power gives a better throughput performance compared to scheduling simultaneous users. In [12], the problem of schedul-ing CDMA users (one user at a time) in downlink with variable channel conditions is considered. The authors have proposed an algorithm that considers both channel state information and user buffer state information for scheduling decisions. The algorithm in general guarantees the QoS but may not achieve the system capacity. A general approach to support the QoS of multiple real-time data users is proposed in [29]. The proposed scheduling algorithm in [29] is similar to what is presented in [12] and can be used to maximize the number of users that can be supported with the desired QoS. 19 Chapter 2. Opportunistic Scheduling in HSDPA M a i n results: In this chapter we consider scheduling algorithms for streaming multi-media users in a HSDPA transmission system. The main results of this chapter are: (i) In Section 2.2, we introduce a discrete-event stochastic model for streaming users in HSDPA. The key features of HSDPA system as described in 3GPP standards are included in our proposed model. Furthermore, a fading channel, modeled by a finite state Markov process, is considered. (ii) In Section 2.3, we formulate the scheduling problem of providing uninterrupted play-out as a stochastic feasibility problem. First, QoS constraints are formulated in the form of long term throughput constraints and stochastic robustness constraints of the mobile-station buffers. These constraints collectively define the feasibility region of the scheduling problem. Next, by exploiting the structure of the model proposed in (i), equivalent QoS constraints for the base-station buffers are formulated. Long term throughput performance is formulated as the stability constraints on the base-station buffers (queues). Roughly speaking, by the stability of a queue we mean the condition that the queue content does not grow to infinity (the precise meaning of the stochastic stability of a queue will be explained in Section 2.3). The robustness QoS constraints are formulated as probabilistic constraints on the base-station buffers (queues). Note that the terms "buffer" and "queue" are used interchangeably in this chapter, however the former refers to a physical entity while the latter describes a mathematical model. Furthermore, by the term "buffer size", we mean the size of the physical buffer (maximum length of the queue) and by the term "queue length" we mean number of the existing customers (bits) in the queue. (iii) In Section 2.4, we present a joint opportunistic user-scheduling and M C M (Modula-tion/ Coding and Multi-code) assignment algorithm as a solution to the HSDPA schedul-ing problem. Section 2.4.1 contains a complete summary of the proposed opportunistic scheduling algorithm. We first propose a framework for assigning M C M in HSDPA in a way to maximize the achievable bit rates of the scheduled users. We then present a user-scheduling method by customizing a particular scheduling algorithm denoted by Modified Largest-Weighted-(unfinished) Work-First (M-LWWF) [12]. The M-LWWF al-gorithm schedules only one user at a time and provides a simple algorithm with relatively low complexity. It is analytically proven that the proposed customized algorithm pro-vides us with a feasible long term solution (if such a long term feasible solution exists). However, additional degrees of freedom are embedded in the M-LWWF algorithm by as-signing fairness parameters to the users. The scheduling priority across the users can be further controlled by adjusting the fairness parameters to satisfy additional constraints. 20 Chapter 2. Opportunistic Scheduling in HSDPA We propose a stochastic-approximation-based adaptive algorithm that dynamically (i.e., in real time) adjusts each user's weight in a way that both robustness and stability QoS constraints are satisfied. (iv) In Section 2.5, we simulate the HSDPA transmission for streaming users across a finite state Markovian fading channel. In our simulations the relation between Signal to Noise Ratio (SNR) and FER (Frame Error Rate) is extracted from the data provided by Motorola research [23]. The data in [23] accounts for the extensive use of Turbo codes in HSDPA systems. 2.2 H S D P A Streaming Discrete Event Sys tem M o d e l In this section a stochastic discrete event system model for streaming users in HSDPA is presented. Here, the term BS refers to the Base-Station and the term UE refers to the User-Equipment (i.e., mobile station). Throughout this chapter we use superscript index i to denote the i'th component of a vector and subscript index k E Z+ to denote the discrete time slot index, where Z+ = {0,1,2,...} is the set of non-negative integers. It is convenient to outline our HSDPA streaming model in terms of the following nine elements [20], [21]. We consider L users and define 3 to be the set of all users: 3:={1,2,...,L}. (i) Transmission Time Interval (TTI): Time is the resource that is shared among the HSDPA users. The time axis is divided into slots of equal duration referred to as Transmission Time Intervals (TTI). Define: A T = Duration of one TTI. (2.1) By convention, time slot (or discrete time) A:, k £ Z+ is the time interval [kAT, (k + 1)AT). Also, we assume only one user is scheduled at each time slot and scheduling decisions are made at times kAT, k € 2,+ . (ii) Power: Fast power control feature of the WCDMA system is disabled in the HSDPA standards (3GPP UTRA-FDD). Therefore, we assume that transmission power is fixed for all time slots. (iii) Fading Channel: Channel quality for user i at time k is characterized by the average received SNR of the user % at the A;th time slot. Note that since the transmission power is fixed, SNR can be used as a direct measure for the channel quality. The Channel Quality Indicator (CQI) or simply the channel state is defined as the collection of the 21 Chapter 2. Opportunistic Scheduling in HSDPA channel qualities for the L users at time k and is denoted by vector c :^ Cfc = (4 4 •••Cfc)- (2.2) where clk is the channel quality of user i at time k. Assume Cf. € S is an irreducible discrete time Markov chain with state space S. State space S := {si, S 2 , - - -, S M } is ob-tained by quantizing the range of the received instantaneous SNRs into M vector levels. Here, s% are row vectors with L elements that define the quantization levels. We assume that CQI vector is measured by the users and reported to the the BS at each time slot. Therefore, the BS has a perfect knowledge of vector ck for k € Z+. Remark on the Markovian channel assumption: In general any stationary continuous time process can be quantized into a finite state irreducible Markov process. In par-ticular, there are well-known methods for modeling individual Rayleigh fading channels by irreducible Finite State Markov Processes (FSMP) [48], [49]. A finite-state model of channel is needed to formalize the proof of the queues stability. As can be seen in Algorithm 1, in Section 2.4.1, the scheduling decision at any given time is made based on the channel state at that time slot. However, since at any given time slot, the channel state is fully observed, the number of iterations per step or the complexity of the algo-rithm does not depend on the number of the channel states. In this view, any number of channel states (M) according to the desired "resolution" may be selected without any penalty in computational complexity. (iv) Adaptive Modulat ion and Coding ( A M C ) : In HSDPA, different modulation and error-correcting coding techniques can be dynamically selected from a set of Mod-ulation and Coding Schemes (MCS). The reasoning for using the adaptive MCS can be explained by examining two extreme situations for the channel. First, consider a situ-ation that channel experiences a relatively low quality (low SNR and high interference level). In this case, instead of increasing power (note that there is no power control in HSDPA), the transmission scheduler selects a more "robust" modulation scheme (such as QPSK) that will operate well in lower ^ (bit energy to noise density). Furthermore, error coding techniques such as Turbo code, with more redundancy will be employed to handle the lower SNR. On the other hand, if the channel enjoys a relatively high SNR, then the scheduler selects a more "aggressive" (in terms of the data rate) modulation scheme such as 16-QAM. We assume each MCS is represented by a MCS number such that MCS number € M := {1, 2, . . . |M|}. (2.3) 22 Chapter 2. Opportunistic Scheduling in HSDPA In case of scheduling user i at time k: m\ :— MCS number of user i at time k, (2.4) where | M | denotes the cardinality (number of elements) of set M and m\ G M . (v) Multi-Code Operation: In HSDPA, multi-code CDMA is used at each time-slot for transmission to the scheduled user. By choosing a larger set of spreading codes, higher data rates can be achieved. However, higher SNR will be required for transmission with more codes. Depending on the UE capabilities, different number of simultaneous spreading codes can be selected (e.g. 5, 10, 15). Define K to be the set of supported spreading codes for user i. In case of scheduling user i at time k: n\ Number of spreading codes of user i at time k. (2-5) where n\ G Ji. (vi) Transmission Rate Set: We refer to the rate chosen by the BS to transmit the data to the UE as the transmission rate. The transmission rate at each time slot k is fixed and chosen from a finite set of possible transmission rates by the BS. Define: r\ := The (achievable) transmission rate to user i at time k. (2.6) Let RT be the finite set of all possible transmission rates from the BS to the UE so that rlk G RT, i G 3, k G Z+. We assume that a unique transmission rate is associated with any combination of MCS and multi-code. Let r : M x N —> RT be a function that maps the MCS and multi-code combination to the corresponding transmission rate. We have: r ( m i 4 ) : = r i i G 0, k G Z+. (2.7) (vii) Physical-layer Hybrid A R Q (H-ARQ): Automatic Repeat Request (ARQ) is one of the error-control procedures used in data communication systems. If the user receiver detects transmission errors in a frame, it will automatically request a retrans-mission from the BS. The BS retransmits the frame until it receives an acknowledgement (ACK) from the receiver or the error persists beyond a predetermined number of retrans-missions. By a frame of data we mean the data that is delivered at any time slot (TTI). Note that the length of a frame in terms of bits is varying due to the different data rates at different time slots. H-ARQ (Hybrid-ARQ) uses a code with more redundancy that will correct some frame errors and hence reduces the average number of retransmissions. 23 Chapter 2. Opportunistic Scheduling in HSDPA H-ARQ operation in HSDPA is relatively fast since it is performed in the physical layer between the base station and the user equipment. We define fl as a function that gives the instantaneous Frame Error Rate (FER) of user i in terms of MCS, number of codes, and channel state of user i. The term error rate here is used to denote the error probability. By the instantaneous FER we mean the true F E R that indicates the probability of the occurrence of a frame error at the given time slot. Define: fl :=instantaneous FER at time k for the scheduled user i =/VL4,4) (2.8) (viii) Buffers and Streaming Rates: Figure 2.1 shows a queuing model for streaming HSDPA users. The streaming system is modeled with symmetric buffers, i.e., each UE Node-B buffers UE buffers l ! ..1 D D D v k 1 Figure 2.1: Wireless Streaming Media System Model buffer has an equivalent buffer in the BS with the same size. Assume that the data in the UE buffer is discharged with the constant rate Dl which is referred to as the user i play-out rate. Furthermore, assume the incoming data rates from the server are also constant. Let Dl denote the arriving data rate to the user i buffer in the BS (incoming rate from the server). Note that the above model can be extended to the case where the arrival rate from the server is not constant. In general, the results of this chapter, i.e., stability and robustness of the BS buffers under the proposed opportunistic algorithm, are valid if the arrival rate from the server is any renewal process [12]. However, this assumption poses more formal technicalities and render the exact analysis of our proposed model intractable. Define the user i BS queue length as: Vkl := Number of bits in the user i BS buffer at time k (2.9) 24 Chapter 2. Opportunistic Scheduling in HSDPA Also, define the collection of buffer states for L users as: Vk = {v£,v£,...,vkL} (2.10) Similarly, the number of bits in the buffer of UE i at time k is denoted by UK. Let B% be the full size of the BS (or the UE) buffer of user i. The relation between VK% and UK is then given by: V£ + Vik = Bi, iEJ, kEZ+ (2.11) where we have assumed Vg = 0 and UQ = BL. Remark 3 Note that the user buffer in BS can be assumed, without loss of generality, as a virtual buffer such that the relation in (2.11) holds. This is valid since our focus is the wireless link as the bottleneck of the system and we can assume the data to the BS buffers are provided based on demand and with no additional delays on the system. Let v\ be the arriving data rate (bit/sec) to the buffer of UE i at time k. Also let X\ be the rate that the data is discharged from the buffer i of the BS at time k. By convention, we assume if a frame error occurs in time slot k = s then A* = 0 . This is reasonable since the BS does not transmit a new frame to the UE till it receives an ACK confirming a successful previous transmission. The effective transmission rate at time s is then zero because the same packet will be retransmitted at the next scheduling time. We conclude that the two rates can be regarded to be identical: v\ = A .^. The evolution of the BS and the UE buffer content for user i is then given by the two following equations: V^^V^-^AT + D^T, i e 3 , keZ+ (2.12) U~l+l = ULK + XkAT - DiAT, ie3, k e Z+ (2.13) where A T is defined in (2.1). (ix) Per-TTI (instantaneous) capacities: Recall that HSDPA system performs ARQ in the physical layer. This implies that if a frame error occurs at any given time slot, the entire frame will be re-transmitted in the next scheduled time slot. In order to quantify the achievable throughput to user i at time k, we consider a notion of per-TTI (instantaneous) capacity that incorporates the event that a frame error occurs. Define 25 Chapter 2. Opportunistic Scheduling in HSDPA a random variable /xi as: /xjj. := User i per-TTI capacity at time k (2-14) This means, in the event of scheduling user i at time k, user i receives p\AT bits at time slot k (or the entire queue % content whichever is smaller). In terms of transmission rate, we have Pk = rfcl{No Frame Error at time k}i i E3, k € Z+ (2.15) where r\ is defined in (2.6) and 1{A] is the indicator function of the event A (it is equal to 1 if A occurs, otherwise is zero). As can be seen in (2.15), if a frame error occurs in a time slot, the per-TTI capacity is defined to be zero at that time slot. 2.3 Formula t ion of H S D P A Scheduling P r o b l e m w i t h QoS Requirements In this section we consider the problem of opportunistic joint user scheduling and M C M assignment in HSDPA. The objective is to enable a smooth play-out by applying a suitable user-scheduling and M C M assignment policy. First, the general structure of stationary opportunistic policies is defined. The term opportunistic refers to the fact that the channel and/or buffer (state) variations are exploited by the scheduler to assign relatively better users (in a sense to be defined below). Next, using the discrete event system model described in Section 2.2, the condition of smooth play-out is formulated as a feasibility problem with stochastic robustness and stability QoS constraints. Informally speaking, the objective is to keep the UE buffers non-empty in a long run and limit the variations of the buffers to avoid any interruption in the media play-out. Main result The main result of this section is the formulated problem in (2.25), (2.26) and (2.27) which states the required QoS constraints for streaming users in HS-DPA. 2.3.1 Opportunistic Joint User Scheduling and M C M Assignment In this subsection, we define a general structure for opportunistic joint user scheduling and M C M assignment policies in HSDPA. The user scheduling and M C M assignment 26 Chapter 2. Opportunistic Scheduling in HSDPA are dynamically performed based on the system state information at any time slot. The state of the system can be defined as the collection of the channel and the buffer states: xk-={ck,Vk)eX (2.16) where ck and Vk are defined in (2.2) and (2.10), respectively, and X is the state space. For now assume xk is a stationary and ergodic Markov process. Later based on the proposed scheduling algorithm and assuming stability of the queues this assumption is justified. A scheduling policy <f> is defined as a function that maps the system state to a system user at a generic time slot. The mapping <f>(x) = j , 4> : X —* 3 means that user j is scheduled if the state is reported to be x. As seen above, we assume only one user is scheduled at any given time slot. Furthermore, we only consider the class of stationary policies and hence 4> is not a function of time. These assumptions will be justified later in Section 2.3.3 where we formulate the scheduling problem as a feasibility problem. An M C M assignment rule ip is defined as a function that maps the system state to a pair of multi-code and MCS number at any time slot: ip(x) = (m, n), 4> : X -> M x X (2.17) (2.17) means that MCS number m along with n multi-code is chosen for the scheduled user if the state is reported to be x. A joint user scheduling and M C M assignment policy A is defined as the combination of the scheduling policy <f> and M C M assignment rule xp, in a generic time slot: • A = 0,V} (2.18) Hence, if user j £ 3 is scheduled at time k, we have: A(xk) — (j, rnk, nJk) where mk and nlk are defined in (2.4) and (2.5), respectively for all i £ 3. 2.3.2 Formulation of QoS Constraints for H S D P A Streaming In this subsection we state the QoS constraints that enable the smooth media play-out for streaming HSDPA users. Stability QoS: To have a smooth (uninterrupted) play-out, the user buffer content (in the UE) must have a non-zero steady-state regime. In other words, in a long run the user buffers in the UE must remain non-empty. Our streaming model, based on (2.11), 27 Chapter 2. Opportunistic Scheduling in HSDPA (2.12), and (2.13), allows us to express an equivalent requirement for the users buffers in the BS. (2.11) asserts that an empty buffer in the UE corresponds to a full buffer in the BS. Hence the situation that a UE buffer i is "under-running" corresponds to the situation that the BS buffer i is "blowing up". It can be concluded that for having uninterrupted play-out in a long run, the users queues in the BS must be stable. By stability we mean the existence of a stationary regime for the queue content process so the content of the queues does not show the tendency to grow to infinity [12]. In other words, the steady-state probability distribution of the queue length process Vk should be well defined. This condition is expressed as follows (with P denoting the probability measure): lim P(V£ <6) = Fi(9), i G 0, (2.19) k—>oo where Fi(-) is a valid distribution function (i.e., -Fj(-) > 0 is right continuous and l i m ^ o o Fi{0) = 1), and V£ is defined in (2.9). Remark: In the following, we assume only the steady-state behavior of the queues, i.e., time k G Z+ is sufficiently large. By formulating equivalent QoS constraints for user buffers in the BS (rather than for the UE buffers) we will be able to use certain stochastic stability results from [12]. First recall that the net discharging rate of queue i (in the BS) at time k is denoted by X\. Therefore Xk is equal to the user i throughput at time k and is given by Ai = m i n K l { f e H } ^ } , (2.20) where pk is given by (2.15) and <fi is the scheduling policy. The equality in (2.20) holds because in the event that the available rate completely discharges the BS buffer during one time slot, the effective data rate of user i during that time slot is equal to -gf. The general expression for Xk is then described as in (2.20). For stable queues the arriving rate Dl matches the net discharging rate Xlk in a long run. Assuming a scheduling policy (f) under which the system is stable, i.e., the discrete Markov chain xk (state) is ergodic, we have (with E denoting the expectation operator) E(X{) = D\ i el. (2.21) The sufficient and necessary condition for (2.21), i.e., stability of user queues, is given 28 Chapter 2. Opportunistic Scheduling in HSDPA by (see Theorem 1 in [12]) E ( / 4 l { # C f c ) = i } ) > Di, iE3. (2.22) Robustness QoS: To allow an uninterrupted real-time streaming of the users, the situation of an empty or under-run buffer must be avoided. Condition (2.22) ensures the existence of a long-term (stationary) non-zero process for the UE buffer but it does not address the variations of the UE buffer content. It is possible that under a stable scheduling policy individual users often encounter an empty or under-run buffers. To avoid this situation a stochastic robustness QoS is considered as follows: the probability of having an under-run buffer at each time slot needs to be smaller than a certain threshold. This constraint can be formulated more generally as P (ui < di) < sh t e a , k e z+ (2.23) where Oi is a threshold level for the U E buffer of user i and <5; is the threshold probability for user i. Using (2.11) an equivalent constraint can be written for the BS buffers P {Vi > ru) <Sh i€3, keZ+ (2.24) where r/j — Bi — 6{ is a threshold level for the BS buffer of user i and 5i is the threshold probability. Recall that Bi is the full BS buffer size of user i given in (2.11) (which is also equal to the full UE buffer size of user i by the symmetric-buffers model shown in Figure 2.1). 2.3.3 H S D P A Scheduling Problem In this subsection, based on the QoS constraints introduced in Section 2.3.2, we for-mulate the scheduling problem for streaming users in HSDPA as a feasibility problem. A more general problem can be stated as an optimization problem (i.e., minimizing a certain cost function) with the stability and robustness QoS constraints. This prob-lem can be generally formulated as a special MDP denoted by average-cost-per-stage infinite horizon problems [4]. The solution to this MDP can be obtained by Dynamic Programming techniques and may require scheduling more than one user at a given time slot. However, as the number of buffer and channel states increases, the complexity of the MDP grows exponentially and renders the solution to be impractical [50]. In this chapter by trading-off optimality with feasibility, we will be able to obtain a practical 29 Chapter 2. Opportunistic Scheduling in HSDPA opportunistic algorithm with low complexity that assumes only one user is scheduled at any time slot. Also note that similar to infinite horizon MDP's, the scheduling policy is assumed to be stationary otherwise no simple formulation of the problem exists [4]. The feasibility problem can be stated as the collection of the robustness and stability constraints introduced in (2.24) and (2.22), respectively. An additional constraint is also considered in (2.27) which asserts that the instantaneous FER for each user needs to be kept below a specific level f^MX at each time slot (if the channel quality allows). The constraint on the maximum instantaneous FER prevents excessive retransmissions which may not be practical. The scheduling problem for streaming users can be formulated as follows: Problem: Find a joint scheduling and MCM assignment policy A = {</>, ip} (defined in (2.18)) to satisfy the following QoS constraints for HSDPA users: {<p{xk)=i}) > A (Stability constraints) (2.25) P(V£ > rji) < Si (Robustness constraints) (2.26) fl. < flax (2-27) i £ 3, k € Z+ where jj,k is given by (2.15) and fk is defined in (2.8). 2.4 Oppor tun i s t i c Scheduling A l g o r i t h m For St reaming H S D P A Users In this section we present an opportunistic algorithm for joint user scheduling and M C M assignment as a solution to the problem outlined in (2.25), (2.26) and (2.27). First, we outline the methodology to obtain a feasible solution and summarize the steps of the algorithm. Next, each step of the algorithm is discussed in more detail. M a i n result: The main result of this section is Algorithm 1 in Section 2.4.1, which is proposed as a solution to the HSDPA scheduling problem. Remark: If not specified, throughout this section, we assume i e 3, k € Z+. 30 Chapter 2. Opportunistic Scheduling in HSDPA 2 A.l Summary of the Opportunistic Streaming Algorithm We consider the following general structure for the HSDPA scheduling algorithm: 0(x f c) = a r g m a x 7 i v 2 / 4 (2.28) where a\ are the per-TTI capacities defined in (2.14) and Vk are the queues' lengths defined in (2.9). 7$ are real constants, denoted by fairness parameters, that provide additional flexibility for assigning priorities (fairness) among the users. The idea is to assign higher priorities to the relatively better channels (higher u^) and hence ex-ploiting channel variations to maximize the overall throughput. On the other hand the algorithm assigns higher priorities to the users with larger queue lengths and therefore reducing the probability of the queues growing larger. The fairness parameters can then be adjusted to further shape the priority distribution among the users. The algorithm in (2.28) is denoted by M-LWWF (Modified Largest-Weighted-(unfinished) Work-First) and is proven to guarantee the stability of the queues (if possible with any other pol-icy) [12]. We present a joint opportunistic scheduling and M C M assignment algorithm by further modifying the M-LWWF scheduling algorithm and combining it with a suit-able M C M assignment scheme (in particular, the per-TTI capacities in M-LWWF are replaced with their optimal minimum-variance estimates). We prove that the resulting joint opportunistic scheduling and M C M assignment algorithm guarantees the stability of the queues as well and hence satisfies the stability condition in (2.25) (if feasible). Next, we present Stochastic Approximation methods to adapt the fairness parameters 7i in a way that the robustness constraint in (2.26) is satisfied as well (if feasible) and there-fore a solution for HSDPA scheduling problem is obtained. The proposed opportunistic algorithm is denoted by Algorithm 1 and is summarized in the following steps: Algori thm 1: Joint Opportunistic Scheduling and M C M Assignment Step 0 - Initialization Step 1 - Estimating per-TTI capacities: Apply equation (2.34) t° estimate the maximum achievable rate p\ for all users i 6 3 at time k. Step 2 - Computing a throughput optimal MCM for each user: Apply equation (2.37) to calculate the optimal MCM denoted by a(i, xk) for each user i E 3. The optimal MCM of user i is calculated in a xuay that the throughput of user i is maximized (in case of scheduling user i). Step 3 - Joint MCM assignment and M-LWWF scheduling: Apply equation (2.41) to schedule user ik at time k. Substitute ik in Step-2 by applying equation (2.42) 31 Chapter 2. Opportunistic Scheduling in HSDPA to assign a(ik,xk) as the MCM of the scheduled user ik. Step 4 - Updating the fairness parameters: Apply Stochastic Approximation algo-rithm in (2.46) and (2.47). Set k —• k + 1 and go to Step 1. 2.4.2 Initialization We start with empty queues: VQ =0, i G 3 and assume all the fairness parameters are identical and equal to one: 7g = 1, i £ 3 (note that later in the simulations, we may also consider random initial fairness parameters). 2.4.3 Estimating per-TTI Capacities Here, Step-1 of Algorithm 1 is discussed. The main result of this step is to show how to evaluate the per-TTI capacities fik, i G 3 in the M-LWWF algorithm (since the occurrence of a frame error is not known a priori, the relation in (2.15) cannot be directly used). We prove that instead of the exact values of the per-TTI capacities p\, i € 3, the optimal estimate (i.e., conditional expectation) of plk can be used by the scheduler without degrading the stability of the queues. Consider the optimal (Minimum-Variance) estimate of u.k. i G 3, defined as: ti-.= E(pi\xk). (2.29) Now consider the following well-known smoothing property of conditional expectations [51, Thm.10, Chapter 4.6, p. 106]: E ( /4 l { 0 ( S f c ) = i }) = E (E (fik\xk) l{*(Sfc)=i}) • Substituting from (2.29) gives: E {Pk1W(xk)=i}) = E (i»Uwife)=t}) • (2-30) The stability condition in (2.25) can then be reformulated as: E(Atfci{0(xfc)=o) > Di- ' (2-31) The relations in (2.30) and (2.31) state that any scheduling policy that keeps the buffers stable with per-TTI capacities i 6 3, will do so as well with per-TTI capacities u.xk, i G 3, (and vice versa). 32 Chapter 2. Opportunistic Scheduling in HSDPA Assuming perfect channel information, plk has a deterministic value at time k. From (2.29) and (2.15), p[ is given by: Pk = E ('rU{No Frame Error} W) = 4(1 - fl), (2.32) where r\ and flk are defined in (2.6) and (2.8), respectively. In the proposed algorithm, we use the deterministic value in (2.32) as the measure for user i capacity at time k which is equivalent to the original definition in (2.15) in terms of stability of queues. Here, we only need to consider the M C M combinations that satisfy the maximum FER constraints in (2.27). Define: &{xk) := {(rn,, n) e M x N s.t. 0 < ft < fmax}, (2.33) Let p\ be the maximum per-TTI capacity (estimate) of user i at time k. We have: Pk = , J'k = , r(m,,n)[l- f(m,n,clk)}, (2.34) where r(-) and /'(•) are defined in (2.7) and (2.8), respectively. Also define: Pk = (Pk,pl, •••,($)• (2-35) Note that in practice the exact form of /*(•) due to the use of Turbo-code and com-plexity of the system may not be known. Later in the simulations we use the simulation results and measurements from Motorola research group to map the channel state to the corresponding FER [23]. 2.4.4 Computing Optimal M C M ' s Here we elaborate step 2 of Algorithm 1. We compute the throughput optimal M C M for each user i € 3 at time k, i.e., the M C M that gives the highest throughput for user i (in case of scheduling user i at time k). Define: a(i, xk) = (m\, 4 ) , ^ » : X x 3 - > M x W , (2.36) where mk and nk are given by (2.4) and (2.5), respectively. In order to satisfy (2.31), the pair (mlk,nk) must be chosen to maximize the per-TTI capacity (estimate) jl\ while keeping the instantaneous FER below the allowable threshold. Therefore, by using (2.32) 33 Chapter 2. Opportunistic Scheduling in HSDPA and (2.8) the following gives the optimal M C M for each user: a(i,Xk)= argmax pi (2.37) — argmax r(m, n)[l — fl(rn, n, cjL)]. (2.38) (m,n)£&{xk) where Gi(xk) is defined in (2.33). 2.4.5 Joint M - L W W F User Scheduling and M C M Assignment Here, Step-3 of Algorithm 1 is elaborated. First consider a joint scheduling and M C M assignment policy A as defined as in (2.18). We provide additional degrees of freedom if we introduce a parameter vector 7 in the policy A and define: A(-, 7) = {>(-, 7), V(-, 7)}, A : X -> 3 x M x N . Here </>(-, 7) is the scheduling policy, ip(, 7) is the M C M assignment rule, both depending on 7, and 7 = ( 7 l 7 l . . . 7 L ) , (2.39) where 7$ 6 3?+, i € 3 are any arbitrary positive real constants denoted by fairness parameters (parameters of the algorithm). The main result of this section can be stated by the following theorem which presents a parametric joint scheduling and M C M assignment policy that guarantees the stability of queues (see the constraint in (2.25). Theorem 1 Assume maximum per-TTI capacity pk, defined in (2.35), is an irreducible finite-state discrete-time Markov process. Consider the following parametric policy: A(-,7) = 7); <K", 7)}, where: (2.40) 4>(xk, 7) = arg max 7,; Vkl plk, (M-LWWF user scheduling) (2.41) ik 0(x/;,7), (Scheduled user index) ip{xk,l) — a(ik,Xk), (MCM assignment) (2.42) where i, ik E 3, xk € X and k E Z+. Then for any set of parameters 7 E 31+, the policy 34 Chapter 2. Opportunistic Scheduling in HSDPA A ( : , 7 ) satisfies the stability QoS constraints in (2.25), if it is possible with'any other policy. Recall that a(-) is defined in (2.37), p\ and Vkl are defined in (2.34) and (2.9), respectively. Distribution parameters 73 E 31+, are any arbitrary positive real constants and 7 = [ji] E tt£ as in (2.39). Proof: As given by (2.34), p\, i 6 3, is the estimate of the maximum bit rate available to user i at time k. With the understanding that (2.31) is equivalent to (2.25), the schedul-ing policy described in (2.41), will be reduced to the M-LWWF scheduling algorithm introduced in [12]. Additionally, since by assumption the vector pk is an irreducible discrete-time Markov process, the proof in [12] is directly applicable. Remark: Note that the Markovian assumption on pk in Theorem 1 is not restrictive. In fact, if we consider a one-to-one mapping from the Markovian channel state ck (or xk) to pk then the assumption of Theorem 1 holds. The user scheduling algorithm in (2.41) simply selects the user for which the weighted product of the queue content and the channel quality is maximum. This way users with larger queue contents, larger fairness parameters and better channel qualities (compared to the other users) receive higher scheduling priority compared to the other users. Remark on feasibility: Theorem 1 states that if stability is feasible then the proposed algorithm will make the queues stable. In this view, the proposed algorithm can be used as a feasibility test for the queues stability. Numerical methods [52] can be used to verify the stability or boundedness of the queues within a desired confidence interval. 2.4.6 Updating Fairness Parameters by Stochastic Approximation Here we elaborate Step-4 of Algorithm 1. The main result of this step is to update the fairness parameters by Stochastic Approximation as in (2.46) and (2.47). Assuming the problem is feasible, Theorem 1 states that the policy A ( - , 7 ) , defined in (2.40), satisfies the stability QoS in (2.25) for all 7 = [7*] 6 Therefore, there exists a stationary regime for each queue content process. We can therefore rewrite the robustness QoS in (2.24) by dropping the time index k. P(V{ > ru) < Su i E 3. (2.43) We employ the method of Stochastic Approximation to adjust the parameter vector 7 so that the policy A ( - , 7 ) satisfies the robustness constraint in (2.43) as well and hence 35 Chapter 2. Opportunistic Scheduling in HSDPA a solution to the HSDPA scheduling problem is obtained. Let J~C C 31+ be the feasible region of the policy A ( - , 7 ) in terms of parameter vector 7 . We propose a Stochastic Approximation algorithm to generate a sequence of updates •jk = ( 7 ^ 7 ^ ... 7 ^ ) at time k — {0,1,2,...}' which converge to a feasible parameter vector 7 * G "K. Note that with each 7 ^ , a policy A ( - , 7 f c ) is associated. The initial value 7 0 is set to the vector of ones 7o = 1, i G 3 which implies that initially no additional priority is granted to any user. . Consider the robustness QoS constraint in (2.43). In general, the complementary dis-tribution of the buffer content P(Vl > rn) is not known. The Stochastic Approximation algorithm estimates P ( V l > tn) at each time slot and then uses this estimate to adjust the parameter 7 ^ at time k [53]. The procedure is formulated as follows: Algorithm: Assume the policy A ( - , 7 f c ) , defined in (2.40), is applied at time k. Con-sider the event {V£ > rn}, where i G 3 denotes the user number and s denotes time. An unbiased estimate of the overflow probability P(Vl > rn) can be obtained at time k by the number of times that the event {Vy > rn} occurs up to time k divided by k. Let 7%k denote the unbiased estimate of P ( V l > rn) at time k. We have 7\ = ^ X)s=i ^{v*>r,i}-Writing by iteration gives: r ' « = ( ^ T ) r ' + ( ^ T ) 1 ( ^ - " - ( 2 4 4 ) If the estimate 7lk is exceeding the threshold di, then it means that user i, based on the current estimate, does not receive his/her required QoS. Hence, the parameter 7^. is increased proportionally to assign higher priority (higher service share) to user i. If the estimate IPjj. does not exceed the threshold <5j, 7^. is left unchanged: iUi = rt + Mrk-8i)i{n>*i}> ( 2 - 4 5 ) where the step size can be set to ak — for the stationary case i.e., when the true parameter is constant. In the non-stationary case (when the true parameter is slowly varying with time), ak can be set to a small constant to track the slow variations of the system [33], [53]. Assuming stationarity and ak = (2.44) and (2.45) are collectively written as: n + 1 = n + a f c ( i { ^ + i > 7 ) i } - n ) (2.*$) 7JUi = ll + a*(n - Si)l{7jc>Si}, (2.47) 36 Chapter 2. Opportunistic Scheduling in HSDPA with initial values of 7Q = 1 and T 0 = 0. (2.46) and (2.47) can be written collectively in vector format as follows: ?k+l = ?k + a k { l { V k + 1 > r i } - ?k) (2.48) 7fc+i = Ik + ak{yk - 8) diag{l { y f c > < s } }. (2.49) where V •= (vi,---,riL)., $ •= {6i,...,6L), ^k •= (Pk, yk), l{VFC>77} : = (1{V^>77i}' • • •' 1{Vfci>»jL})> with initial values of 7 0 = 1 and To = 0(1 denotes a (1 x L) row vector with all elements equal to one). Also note that diag{y} represents a diagonal matrix whose diagonal elements are the elements of y. The algorithm proceeds by applying the policy A ( - , 7 / c + 1 ) at time k + 1. The iterations (2.48) and (2.49) will be repeated by setting k —> k + 1 till the convergence is obtained (assuming a feasible policy A ( - , 7 * ) exists). Starting with the policy A(-, 1), numerical results in Section 2.5 show that if a feasible A ( - , 7 * ) exists then the algorithm in (2.46) and (2.47) converges rapidly to a feasible point in Jx. 2.4.7 Convergence of the Stochastic Approximation Algorithm Here we discuss the convergence of the Stochastic Approximation algorithm in (2.46) and (2.47) (with decreasing step size). Discussion: Since the state of the system xk = {Vk,ck) is Markovian, we need to use the Ordinary Differential Equation (ODE) approach of [53] to prove the convergence of the Stochastic Approximation algorithm in (2.46) and (2.47). The main idea behind the ODE approach is that the asymptotic behavior of (2.46) and (2.47) is captured by a deterministic ordinary differential equation which represent the mean trajectory of the system [53]. This ODE is obtained by stochastic averaging of (2.46) and (2.47) and can be written as [53]: nv\it)>m)-m.it) (2-50) C P } ( 7 i ) - * ) l { a » ( 7 o > * } (2-51) dt H dt 37 Chapter 2. Opportunistic Scheduling in HSDPA where i E 3 and the variable t denotes continuous time. Note that for clarity, in the above equations we explicitly denote the dependence of V1 and 7\ on 7. If stable, the ODEs in (2.50) and (2.51) converge to their fixed points P* and .7* where: Note that 7* 6 M is a feasible point of the policy A{-,7}. Theorem 2 Under the condition that P* and 7* are bounded with probability 1 (i.e., a feasible solution exists), the estimates {T/c} and {7^}, generated by algorithm in (2.48) and (2.49), converge with probability 1 to the fixed points P* and 7* defined in (2.52) and (2.53) . Proof: The convergence proof requires showing the convergence of the Stochastic Ap-proximation algorithm for Markovian state-dependent noise. The proof follows from [53, Thm.4.2, Chapter 8.4 p.243]. The conditions required for convergence are verified as follows: (i) The averaging condition: (A.4.12) in [53] holds since the pair (14, ck) is a positively Harris recurrent process (see the discussion in [12] for the proof of Harris recurrence). (ii) Uniform integrability: the uniform integrability condition (A.4.11) for the vector holds since 7\ is uniformly bounded between 0 and 1. Also note that jk can be always kept bounded by projection into a compact set. The proof therefore follows from [53, Thm.4.2]. For situations where statistics of the channel are slowly time varying, the optimal pa-rameter vector 7* will be time varying. Denote the optimal time-varying parameter 7* by 7f*. In such cases it is necessary to adaptively track the time-varying 7*. This can be done by using a constant step size version of (2.46) and (2.47) with = a, where a is a small constant. For this adaptive case it can be shown, using [53, Thm.4.1, Sec.8.4.1, p.240] that the sequence {?),} and {7^ .} converges weakly (in distribution) to P* and 7* for i e 3. P*:={Pl...,PL), 7*:=(7i , . - - ,7* i ) (2.52) (2.53) (2.54) 38 Chapter 2. Opportunistic Scheduling in HSDPA 2.5 N u m e r i c a l Resul ts A summary of simulation parameters is presented in Table 2.2. Furthermore, simulation scenarios are described in Section 2.5.1. Channel: The channel is first simulated by a Rayleigh slow and flat fading model with a Doppler spectrum (Jake's model [54]). Let gi(k) be the available SNR for user i at time k. Based on the Jake's model, gi(k) can be written as the transmitted SNR scaled by a fading component and a path-loss component. Expressing in dB, we have: 9i(k)(dB) = {Pt/Pn){dB) - L(R)(dB) + {<>i(k)2}{dB). (2.55) where Pt is the transmission power, Pn is the noise power (includes interference as well), L(R) is the path-loss component in terms of the distance R and cii(k) is the Rayleigh fading gain process for the user i. Based on common assumptions for HSDPA in [55], we adopt the following path-loss model L{R)(dB) = 128.1 + 37.6 log a 0(fl), (2.56) where R is the distance of the user from the BS in Kilometers. Furthermore, in the simulations we choose a value for Pt/Pn to obtain reasonable SNR's for the UE-to-BS distances. We assume: Pt/Pn = 122(dB). (2.57) At the next step, to justify the assumptions of this chapter, we present the simulated Rayleigh channel by a Finite State Markov Process (FSMP). Here, we model the simu-lated fading gain process oti(k) (or equivalently gi(k)) by a 10-state Markov process Ci(k). In general Ci(k) can be correlated between different users, however, in our simulations independent ai(k) (or equivalently Cj(/c)), {i — 1,2,...,} are considered. 5 states are assigned for the SNR region corresponding to the single code transmission and 5 states are assigned to the SNR region corresponding to 15-code transmission. By examining the SNR-FER curves, obtained by Motorola research group [23], the number of states is chosen in a way to attain sufficient resolution for instantaneous capacities. For example, if for some SNR region the FER curve is more or less constant, only one state is assigned to that region. On the other hand, if F E R has large deviations in a particular SNR region, more sates are needed to model that SNR region (see for a rigorous procedure for choosing number of states and the threshold levels in [48,49]). M C S and Mult i -Code: In our simulations, the MCS and multi-code selection is based 39 Chapter 2. Opportunistic Scheduling in HSDPA on Table 2.1 that shows the numerical values for MCS number and spreading codes of the users. These numbers are extracted from the descriptions of HSDPA provided in [20]. For simplicity, we assume either a single code or 15 codes are used for each user. The selected MCS roughly corresponds to the frame error rate of = / * n a x = 0.1 at any time slot. M C S Modulation Code Rate Data Rate (1 Code) Data Rate (15 Codes) 1 2 3 Q P S K Q P S K Q P S K i 1 I 4 0.6 Mbps 1.2 Mbps 1.8 Mbps 1.8 Mbps 3.6 Mbps 5.3 Mbps 4 5 1 6 - Q A M 1 6 - Q A M I 4 477 kbps 712 kbps 7.2 Mbps 10.8 Mbps Table 2.1: Modulation and Coding Schemes Parameters: Table 2.2 summarizes the parameters and assumption used for our simula-tion. We have tried to choose realistic parameters from common simulation assumption for HSDPA provided by Ericsson, Motorola and Nokia in [55]. The simulation starts Cell radius RCeii 500m Max. number of users L 15 Duration of time slot A T 2 ms Spreading factor 16 Channel estimation Perfect Carrier frequency 2 G H z Chip rate 3.84 Mcps Jmax 0.1 Transmission S N R : ^ 122dB Fast fading model Ray leigh-fading Propagation model: L(R) 128.1 + 37.6 log (J?) Channel model 10 State Markov chain U E buffer time constant ts 1 second (for all users) Simulation duration 150000 A T (5 min) Table 2.2: Simulation Parameters with full UE buffers. 2.5.1 Simulation Scenarios We compare the performance of our opportunistic policy with the well known Round-Robin scheduling and Max C/I (Maximum Carrier to Interference) scheduling. In Round-Robin scheduling time is shared evenly among the users but the channel or buffer varia-tions are not exploited (non-opportunistic scheduling). In Max C/I the scheduler simply 40 Chapter 2. Opportunistic Scheduling in HSDPA picks the best channel (highest signal to noise/interference). Therefore, Max C/I can be regarded as a greedy opportunistic scheduling. Three scenarios are considered in terms of the distribution of the users in a single wireless cell. For each scenario we determine the maximum number of the users that can be supported with pSUccess > 90% at a given streaming rate D where pSUCCess is the probability of uninterrupted play-out (assumed to be the same for all users). Therefore, in the QoS formulation in (2.24), we roughly have: P (U{ = 0) < 0.9. Hence, di = 0 and Si — 0.1 for i = {1,2.... Note that the condition Uk < 0 is equivalent to Ulk = 0 since Uk is always non-negative by definition. The simulation scenarios are elaborated in the following. Note that the terms BS and Node-B may be used interchangeably to denote the base-station which is assumed to be located at the cell center. Scenario 1: The first scenario we consider is the worst case scenario. Al l users are located on the cell edge (R = Rceu = 500m, see Figure 2.2(a)) and consequently have relatively poor channels which only support low data rates (note that in all figure Node-B denotes the Base-Station). In this scenario single-code transmission is used. As shown in Figure 2.2(b), in scenario 1, our algorithm (Algorithm 1 in Section 2.4.1) gives the best performance. Max C/I also performs exceptionally well because of the given scenario: all users have the same distance to the BS with the same streaming rate. In this case, it is a relatively good solution to only pick the instantaneously best channel without regarding their queue lengths. The total average throughput will be maximized in this case and because of the symmetry, no user will be particularly starved. In a long run each user receives more or less the same portion of the maximized throughput and hence the overall performance is relatively good. Round-Robin performs poorly in this scenario since channel variations are not exploited." Scenario 2 : In this scenario, we assume that the users are distributed uniformly throughout the cell (10m <R< 500m) (see Figure 2.3(a)). The distances of the users and the BS (Node-B) are: Ri — J^J • {Rceii — l-min) + l-min where lrnin is the minimum distance a UE has from the BS. We observe in Figure 2.3(b) that our algorithm (Algorithm 1) performs significantly better than the other two al-gorithms. Max C/I shows very poor performance because users that are further away from the BS are not served at all. We can see in Figure 2.3(b) that Round-Robin gives significantly better performance than Max C/I. Scenario 3 : Here, we concentrate our simulation on the cell center (10m < R < 250m), where high data rates can be achieved. We use 15 multi-codes for transmission and this way increase our transmission rates significantly. As shown in 41 Chapter 2. Opportunistic Scheduling in HSDPA (a) H S D P A users located on the cell edge Scenario 1, single code, 15000 slots E \ \ \ Y \ \ \ \ \ \ \ \ Algorithml Round-Robin Max C/I \s i • " 0 \ 1 1 1 1 1 > 20 40 60 80 100 120 140 Play-out Rate[kbps] (b) Maximum H S D P A users supported vs. streaming rates Figure 2.2: Comparison of maximum supported HSDPA users in Scenario 1 42 Chapter 2. Opportunistic Scheduling in HSDPA Figure 2.4(a), the users are distributed uniformly between the BS (Node-B) and the edge of the cell center (10m < R < 250m) and Ri — • (R^ — l.min) + lmin where Z„Mn is the minimum UE distance from the BS and i?is is the maximum UE distance from the BS (user 15 distance from the center). It is possible to support high streaming rates to a reasonable number of users in the cell center. Again we see in Figure 2.4(b) the advantage of our algorithm. The Max C/I algorithm completely fails in this scenario, because users that are further away from the BS are not served at all. 2.6 Conclus ions In this chapter, we have proposed a discrete event model for streaming users in HSDPA to formulate the QoS requirements of the HSDPA streaming system as a feasibility problem. We then present a practical joint opportunistic user-scheduling and M C M assignment policy as the solution to the above feasibility problem. The proposed policy enables a smooth play-out for HSDPA users if it is possible with any other policy. The numerical results show that employing the proposed opportunistic policy achieves a significant improvement in the maximum number of users that can be supported with the desired QoS. 43 Chapter 2. Opportunistic Scheduling in HSDPA (a) Users distributed uniformly from the cell edge to the Node-B Scenario 2, single code, 15000 slots A -Q O c CL o •o c JO E 3 - Algorithml f- Round-Robin *- Max C/I 20 40 60 80 100 120 140 160 180 200 Play-out Rate[kbps] (b) Maximum HSDPA users supported vs. streaming rates Figure 2.3: Comparison of maximum supported HSDPA users in Scenario 2 44 Chapter 2. Opportunistic Scheduling in HSDPA Figure 2.4: Comparison of maximum supported HSDPA users in Scenario 3 45 Chapter 3 Transmission Rate Control with Switching Costs In this chapter, we present an optimal adaptive modulation and coding policy that mini-mizes the transmission latency and modulation/coding switching cost across a finite-state Markovian fading channel. We formulate the optimal tradeoff between transmission la-tency and modulation/coding switching cost as a discounted infinite horizon Markov Decision Problem (MDP). By exploiting special structures of the formulated MDP and under certain sufficient conditions, we show the optimality of monotone Modulation and Coding Scheme (MCS) policies. These monotone optimal policies are computationally less expensive to implement and are scalable in terms of channel and switching cost pa-rameters. Numerical results confirm the monotonicity and the threshold-based structure of the optimal MCS selection policies under the proposed sufficient conditions. 3.1 In t roduct ion Emerging packet-based multiple access wireless networks require efficient link adapta-tion strategies that maximize wireless network utilization under given Quality of Ser-vice (QoS) (e.g. delay) constraints. Examples include Wireless Local Area Networks (WLANs) with Internet backbones, ad hoc and sensor networks, micro-machine commu-nications, and short-range peripheral-device networking, where at any time slot a frame (packet) of data is transmitted. A common technique for link adaptation in these sys-tems is to employ adaptive modulation and coding, where in each time slot the MCS is adapted to the channel quality. The MCS selection decision can be made by either the intelligent mobile agent or the associated base station. In either case we refer to the decision-maker unit as the scheduler. Context and Related Work: The effectiveness of adaptive modulation and coding techniques for improving energy efficiency and network throughput over fading channels is well studied in the literature. In [37] adaptive modulation techniques based on M-ary 46 Chapter 3. Transmission Rate Control with Switching Costs Quadrature Amplitude Modulation (QAM) are shown to be significantly more power efficient than non-adaptive modulations in fading. In [38] and [39] a coded version of the adaptive M-ary Q A M system is shown to exhibit additional gains in power and spectral efficiency. Other practical aspects of adaptive modulation and coding such as delayed feedbacks and outdated channel estimates are considered in [56]. For further details and references on the subject of adaptive modulation and coding the reader may refer to [57, Ch.9]. In [40] the problem of excessive switching between different MCS's is addressed by assigning switching costs for changing the active MCS. In this chapter, we further investigate the switching problem and provide a rigorous mathematical treatment under the Markov Decision Problem (MDP) framework. Motivation: In order to achieve an efficient link adaptation, typical packet-based wireless networks may employ fairly short transmission time intervals to cope with fast channel variations (e.g. 0.4 ms for Zigbee defined based on the IEEE 802.15.4 standard [58]). Hence, an optimal MCS selection policy aiming at minimizing the expected delay of the network may have to switch between different MCS's rapidly. Since any switching event must be negotiated between the mobile agents or between the mobile and the base station, frequent switching between different MCS's can result in excessive protocol overheads with delay and system reconfiguration costs. Furthermore, increased use of feedback channels during the rate negotiation process generates additional interference along with higher bandwidth and power usage which in turn degrades the overall spectral efficiency. In view of this, there is a motivation to develop MCS selection solutions that achieve an optimal tradeoff between transmission latency and MCS switching rate. Contributions: Our major contributions in this chapter are summarized as follows. • The discounted infinite horizon MDP formulation of the MCS selection problem with switching cost as described in [40] is generalized by considering Markovian channel models with non-adjacent channel transitions. Therefore, in the proposed formulation, fast time-variant channels are modeled with improved accuracy. • Optimality of monotone and threshold-based MCS selection policies with switching costs under proposed sufficient conditions is proved in Theorem 3. By considering a Markovian channel dynamic, these results generalize some aspects of the works in [59] and [60]. • Optimality of monotone and threshold based policies for two MCS's and con-stant switching cost is proved under two different sets of conditions (see Theo-rems 4 and 5). One of these conditions results in the MDP formulation in [40]. We 47 Chapter 3. Transmission Rate Control with Switching Costs show that in this case the proof needs substantially more mathematical treatment than what was presented in [40]. • Some insightful properties of optimal MCS selection solutions with constant switch-ing costs are outlined in Propositions 3.4.1 and 3.4.2. • Counterexamples are constructed (1) to demonstrate the significance of the suffi-cient conditions in Theorems 3, 4 and 5 by showing the non-existence of monotone optimal MCS selection policies when these conditions are not satisfied, and (2) to show that a straightforward generalization of monotonicity results (as claimed in [40]) to more than two MCS's with constant switching cost is not possible. Organization: This chapter is organized as follows. In Section 3.2, we introduce the system model. The MCS selection problem is formulated as a discounted infinite horizon MDP in Section 3.3. In Section 3.4, we present a general form of the solution and show that the optimal selection policies have threshold-based structures. In Section 3.5, counterexamples are given for non-monotone optimal MCS selection policies. Numerical results are presented in Section 3.6. Finally, some conclusions are drawn in Section 3.7. 3.2 Sys tem M o d e l In this section, a stochastic discrete event system model is presented. We consider a time slotted system where in each time slot a block of data is transmitted. We assume the symbol rate is fixed so different data rates are realized by adjusting the constellation and the coding rates via adaptive modulation and coding. It is convenient to outline our model in terms of the following five elements. (i) Transmission Intervals and Data Frames: The time axis is divided into slots of equal duration A T referred to as transmission intervals. In each transmission interval a block of data referred to as a data frame is transmitted. Note that the length of a frame in terms of bits is varying due to the different data rates in different time slots. By convention, time k, k E Z+, is the time interval [kAT,(k + I)AT), where Z+ = {0,1,2,...} is the set of non-negative integers. MCS selection decisions are made at times kAT, k E Z+. Throughout this chapter, the terms transmission interval, time slot, and frame duration are used interchangeably. (ii) Power: We assume fast power control is disabled and link adaptation is performed solely by adaptive modulation and coding. This assumption is used to isolate the effect of the underlying adaptive modulation and coding algorithms from other link adaptation 48 Chapter 3. Transmission Rate Control with Switching Costs methods. Furthermore, power adaptation in addition to rate adaptation gives negligible additional gains when the number of rates is high, cf. e.g. [38,61]. (iii) Markovian Fading Channel: Let sk denote the channel state at time k obtained by quantizing the received Signal-to-Noise-Ratio (SNR), averaged over the fc'th interval, of a continuous-time Rayleigh fading channel. We assume sk G S is an irreducible discrete time Markov chain with state space S :— {1,2,... ,N}. The received SNR from the Rayleigh fading channel can be quantized into N levels by partition 0 := [01,02,-•• , 0 J V + I ] - Here, 0\ — 0, 0jv+i — oo, and 9\ < 62 < • • • < 0jv+i- The channel is in state I G S if the SNR lies between 9i and Let Q :— [g s ,s] jVxAf denote the channel transition matrix (i.e., the transition matrix of Markov process Sk). The channel transition probabilities qsj a r e defined as qs>£ = P(sfe = sl-s/c-i = s), k G Z+ with P denoting the probability measure. The SNR range of each state given by partition 0 should be sufficiently large so that the probability of transition during one time slot is negligible. This ensures that a single state meaningfully represents the channel quality in each transmission interval. On the other hand, as the SNR range for each state increases, the precision of quantization decreases, and the Markovian model may not be able to resolve channel variations in different intervals. In view of this, we adopt the method introduced in [49] with some modifications (to allow for non-adjacent channel transitions) to select an appropriate partition. The key idea is to choose 0 in such a way that the average duration of each state is equal to cAT, where c is a positive constant. (iv) Adaptive Modulation and Coding: In each time slot an MCS is dynamically selected from a set of MCS's based on the observed channel quality. Define M := {1,2 , . . . , L} to be the set of available MCS's in descending order of robustness (MCS 1 is the most robust). By a more "robust" MCS we mean an MCS that gives a lower frame error probability for the same SNR. This usually corresponds to a relatively lower order modulation scheme and/or a coding scheme with more redundancy, and hence, a lower transmission rate. We refer to the rate chosen by the scheduler to transmit the data in each time slot as the transmission rate. Since link adaptation is performed solely by adaptive modulation and coding, there is a unique transmission rate associated with each MCS. Let r : M —» T be a function that maps a given MCS to the associated transmission rate. Note that for two MCS's i,j G M , if j is more robust than i, then r(i) > r{j). (v) Frame Error Rate: Recall that the data delivered in a time slot is referred to as a frame. Define / : S x M —> [0,1] to be a function that maps the channel state and the 49 Chapter 3. Transmission Rate Control with Switching Costs applied MCS to the instantaneous Frame Error Rate (FER). Here, the empirical term "error rate" is used to denote the error probability. By the instantaneous FER, we mean the probability of the occurrence of a frame error in a given time slot. 3.3 P r o b l e m Formula t ion for O p t i m a l A d a p t i v e M o d u l a t i o n and C o d i n g - M D P Framework The aim here is to formulate the optimal MCS selection problem as a discounted infinite horizon MDP [4]. The optimal MCS selection problem can be described as follows. Problem: Consider a time slotted transmission system where in each time slot the scheduler selects an MCS for transmitting a data frame. There is a delay cost associ-ated with each MCS that depends on the corresponding transmission rate and channel-dependent FER. Furthermore, if the scheduler changes the current MCS j G JA to MCS u G M., with u ^ j, the system incurs a switching cost 0 < C(s,j,u) < oo (with C{s,j,j) — 0), where s denotes the channel state. The transmission session is termi-nated with some probability 0 < 1 — a < 1 in each time slot and the objective is to determine the optimal sequence of MCS selections that gives the minimum expected cost until the termination of the transmission session. In the rest of this section we formulate the above problem as an MDP where the optimal MCS selection policy can be obtained from the Bellman equation in (3.4). 3.3.1 Components of the Discounted Infinite Horizon Markov Decision Problem ( M D P ) The formulated discounted infinite horizon MDP can be described by five components. (i) Actions: The action at time k is defined by the MCS selected at time k and denoted by uk G M . (ii) System States: Let xk G X = 8 x M be the system state at time k, defined by %k '•— (sfe, w/c-i), k G Z+, where XQ :— (SQ,U-\) and sk is the Markovian channel state at time k and U k - \ 6 M is the selected MCS at time k — 1. u-i represents any MCS that is selected initially, prior to the start of the transmission session and may be regarded as a virtual value. We assume the first MCS selection is made at time k — 0 with no switching cost. We assume that the channel state is reported to the scheduler in each time slot so the scheduler has perfect knowledge of the state xk for k G Z+. (iii) Non-Randomized Stationary MCS Selection Policies: A general non-randomized 50 Chapter 3. Transmission Rate Control with Switching Costs stationary MCS selection policy ji : X —> M is defined as a time invariant function that maps state xk into action uk as u(xk) — uk- We often use p*(xk) or u*(s,j) to denote the optimal policy with the understanding that u*(s,j) := u*(xk = (s,j)) is the optimal action at state (s,j). It can be shown that under some natural boundedness condi-tions [4, vol.2, p.9], there are always optimal stationary policies for discounted infinite horizon MDPs. Therefore, we consider only the class of stationary time-invariant policies that can be meaningfully implemented by any a finite mapping from states to actions. (iv) Action-Dependent Transition Probabilities: Let px,y(u) be the transition probabilities of our controlled Markov process under action u so px>y(u) := P(xk — y | X f c - i = x, Uk-i — u). Assume x = (s, I) and y = (s, rn) (s and s are channel states; m and I are previously selected MCS's). For x,y G X, we have px,y(u) — Qs,sl{u=m}, where lt.y denotes the indicator function and qSt^ are channel transition probabilities. (v) Cost-per-Stage: The immediate delay cost for MCS u G M and state xk = (s,j) (channel state s and previously selected MCS j) can be meaningfully defined as follows (see [40] and [62]) d(s,u) := 1 - r , (3.1) r(u)(l - f{s,u)) where r(u) and f(s,u) are the transmission rate and FER, respectively. Note that the delay cost d(s,u) is independent of the previously selected MCS j and is determined in terms of the current channel sate s and the current selected MCS u. The denominator term r(u)(l — f(s,u)) represents the expected throughput for the given time slot. Remark 4 Note that minimizing the sum of per-frame delay measures in (3.1) does not necessarily maximize the total expected throughput. However, the cost function in (3.1) achieves a suitable trade-off between the total throughput and the per-frame throughput and somewhat alleviates sharp delay variations across frames which is desirable for frame-based transmission systems. In addition to the delay cost, a finite switching cost 0 < C(s,j,u) < oo is incurred for switching from MCS j to MCS u. Here, we assume C(s,j,j) — 0, Vj G M . The cost-per-stage, can then be defined as the sum of delay and switching costs g(s,j,u) :=d{s,u) + C(s,j,u), (3.2) 51 Chapter 3. Transmission Rate Control with Switching Costs 3.3.2 Formula t ion as a Discounted Infinite H o r i z o n M D P The goal here is to minimize the expected total cost up to the end of a transmission session. This problem can be conveniently formulated as a discounted infinite horizon MDP [4] with the discount factor a. Clearly, 1 - a represents the probability that the transmission session is terminated at any given time slot. The objective is to find an MCS selection policy [i : X —• M that minimizes the expected cost function [40] = n 1 ™ o E | ^ « f e 5 ( ^ > M ( a - - f e ) ) | , xQ = x, (3.3) n ^ ° ° U=o J where E denotes the expectation operator over all pathes of the random process xk and g(xk,n(xk) := g(sk,Jk-x,Kxk)) assuming xk = (sk,jk-i). Note that the function g is bounded and therefore, the limit in (3.3) exists. Given an initial state x £ X, the optimal cost or the value function V(x) is defined by V(x) = min JJx). Equivalently, the value function V(s, j) := V(x = (s, j)) is given by V(s, j) = min JJs. j), where J^(s,j) :— Jp,(x = (s, j)). The value function V(s,j) and the corresponding opti-mal policy are obtained by solving the following Bellman equation [4] V(s, j) = min \ g(s, j, u) + a V qs,sV(s, u) \ . (3.4) k s=l ) where qsj are channel transition probabilities. From [4], optimal stationary MCS selec-tion policies are given by j) = argmin \ d(s,u) + aV(s,u) + C(s,j,u) \ , (3.5) where V(s, u) is the conditional-cost-to-go defined as N V(s,u)=^qsjV(s,u). (3.6) 8 = 1 Remark 5 Note that in general argmin operator in (3.5) gives a set of minimizers and therefore the optimal policy is not unique. To avoid additional technicalities, throughout 52 Chapter 3. Transmission Rate Control with Switching Costs this chapter, we assume argmin gives the minimum value of such minimizers (unless otherwise specified). In this case, by the optimal policy we usually refer to the following policy M*(s)j) = m m {argmin< d(s,u) + aV(s,u) + C(s,j, u) > }. (3.7) 3.4 Solutions to the M C S Selection M D P - S t ruc tu ra l Resul ts Generally, numerical algorithms such as value iteration and policy iteration algorithms [4] can be used to solve the Bellman equation in (3.4). However, as the number of channel states N and the number of available actions L grow, these methods become computationally more complex. Furthermore, these methods provide little qualitative understanding of optimal MCS selection policies. In this section, we focus on proving certain monotonicity (threshold-based) structures of the optimal policy. Remark 6 There are two main reasons behind the importance of structural results such as monotonicity of the solutions : (1) [Most important] These results usually describe characteristics of the solution for a large class of MDP parameters. Since generally the MDP parameters cannot be known exactly, these structural results are very useful for gaining qualitative understanding of the solutions. (2) These structural results can reduce the complexity of the numerical methods used for solving MDPs. In this con-text, the reader may refer to [5, p.259] for "monotone policy iteration" methods that exploit the monotonicity of the solution to reduce the complexity of the policy iteration algorithm. 3.4.1 Monotone Optimal M C S Selection Policies In the following, we present our two main theorems regarding the monotonicity of the optimal MCS selection policy outlined in (3.7). These monotonicity results are stated using optimization properties of submodular functions. Before formalizing the main results, we introduce the definition of submodular functions in the 3Xn space [63]. Definition 3.4.1 Let X, C 31. A function f(x\, :i'2,. • •, xn) : YYi=i * s ca^ed submodular in FliLi ^ * (or ^n (x'i> • • • > xn)) if for any x, y 6 rT?=i / ( a ; V y ) + / (xAy) < f(x) + f(y), where V (A) is the element-by-element max (min) operator. 53 Chapter 3. Transmission Rate Control with Switching Costs Remark 7 Based'on the above definition a function f(xi,X2) : X i x X2 —* 3i is sub-modular in X i x X2 if and only if f(x\,X2) has decreasing differences, i.e., if X2 > £2, then the variation f(x\,X2) — f(x\,X2) is decreasing in x\. Remark 8 Note that by convention we assume constant functions fit the definition of "decreasing" or "increasing". We use strictly increasing/decreasing to convey the strict inequalities. At this stage, we introduce the following two assumptions on the delay cost variations and Markovian channel transitions. These assumptions are naturally justified and are part of sufficient conditions for the monotonicity of optimal MCS selection policies. Assumption 3.4.1 Delay Variations - The delay cost d(s,u) defined in (3.1) is sub-modular in (s,u), i.e., if m > I then d(s,m) — d(s,l) is decreasing in s. Remark on the assumption - This assumption essentially states that as the quality of the channel improves, the relative performance of MCS m to MCS I, m > I, improves. Figure 3.1 in Section 3.6 illustrates the delay curves associated with various simulated modulation schemes. These graphs show the validity of Assumption 3.4.1 for M-ary QAMs. Assumption 3.4.2 Let qsj be the channel transition probabilities and Q = [qaid]NxN be the Markovian channel transition matrix. We assume each row of Q dominates its previous row (and consequently all rows before) in the First Order Stochastic Dominance sense [10], i.e., the channel transition probabilities qsj satisfy N N 9«-M ^ Y, for 2 < s < N, 1 < k < N. (3.8) s=k s=fe Remark on the assumption - By definition, the condition in (3.8) means that each row of the channel transition matrix Q dominates its previous row (and consequently all rows before) in the first order stochastic dominance sense. This implies that the better the current channel state is, the better is the expected value (conditional to the current state) of a future channel state. In the context of the Markovian fading channels, the condition in (3.8) is somewhat natural. To illustrate this, assume a special case where only adjacent channel transitions are allowed. In this case, Assumption 3.4.2 is equivalent to qs-\,s < Qs,s + <7&-,s+i, for 2 < s < N. The above can be justified based 54 Chapter 3. Transmission Rate Control with Switching Costs on the discussions on the Markovian channel model in Section 3.2 and [49]. Recall that the duration of each channel state must be large enough so that the SNR at each time slot is meaningfully represented by a Markovian state. This implies that typically the probability of no transition must be larger than the probability of a transition, otherwise, the required state duration may not be achieved. The assumption in (3.8) is also confirmed by our numerical results where a continuous-time Rayleigh fading channel is quantized into a Markovian channel based on the procedures described in [49]. Now we are ready to state one of the main theorems of this section regarding the monotonicity of the optimal MCS selection policy. Theorem 3 Suppose Assumptions3.4-l and 3.4-2 are satisfied. Furthermore, assume switching cost function C(s,j,u) is submodular in 8 x M x M . Let a* : 8 x M —> M denote the optimal MCS selection policy described in (3.7). Then u*(s,j) is increasing in (s,j), i.e., for r E 8 and i E M , /x*(s, j) > p*(r, i) if s > r and j > i. Proof: See Section 3.8.1 for the proof. Corollary 3.4.1 There exist threshold levels lyj £ 8 with I £ JvCu{0}U{L + l} , r)jto — 0, Vj,L+i — °°; and 0 < r)j,\ < r]jt2 < • • • < Vj,L < oo such that for any j £ M , we have fi*(s,j)=i < s < i G M . (3.9) Furthermore, for any given i £ M , r/i^ > ??2,i > • • • > VL,i-Proof: See Section 3.8.1 for the proof. Now, to state the next monotonicity result, we introduce the following assumption on the Markovian channel. Assumption 3.4.3 Markovian Channel - Assume channel transitions occur only be-tween the adjacent states, i.e., qs>^ — 0 if \s — s\ > 1, where qs^ denotes the channel transition probabilities. The following theorems outlines the monotonicity results for constant switching cost and two available MCS's. Theorem 4 Two MCS's with constant switching costs - Assume there are only two MCS's available with constant switching cost CS) i.e., C(s,j,u) = C s l{ u ^j} and M = 55 Chapter 3. Transmission Rate Control with Switching Costs {1,2}. Then, the optimal policy u*(s,j) described in (3.7) is increasing in (s,j) if As-sumption 3.4-1 and Assumption 3.4-2 are satisfied. Proof: See Section 3.8.1 for the proof. Theorem 5 Two MCS's with constant switching costs - Assume there are only two MCS's available with constant switching cost Cs, i.e., C(s,j,v) = Cs1{u^j} o-nd M = {1, 2}. Then, there is an optimal policy u*(s,j) as described in (3.5) that is increasing in (s,j) if Assumption 3.4-1 and Assumption 3-4-3 are satisfied. Furthermore, a necessary condition for switching is given by P*(s,j)^j \d(s,l)-d(s,2)\>(l-a)Cs. Proof: See Appendix B for the proof. Remark 9 Note that Theorem 5 essentially states the main monotonicity result in [40]. However, as seen in.Appendix B the rigorous proof of this result requires substantially more mathematical treatment than what is presented in [40]. Corollary 3.4.2 If either of the conditions in Theorem 4 o-nd 5 hold, then there are threshold levels d > 0 and (2 > 0 such that u*(s,l) = 2 d(s, 1) — d(s,2) > (j, and u*(s,2) — 1 d(s,2) — d(s,l) > (2, where the {Q} are generally different for each condition. Furthermore, under conditions of Theorem 5, we have Q > C,,(l — a), i = {l,2}. Proof: See Section 3.8.1 for the proof. 3.4.2 Properties of Optimal M C S Selection Policies with Constant Switching Cost Here, we introduce certain properties of the optimal MCS selection policy with constant switching cost. First, the following proposition outlines some important properties of the value function described in (3.4). 56 Chapter 3. Transmission Rate Control with Switching Costs Proposition 3.4.1 Properties of the value function with constant switching cost - Sup-pose switching from MCS j to MCS i ^ j incurs a fixed cost 0 < Cs < oo, i.e., C(s,j,u) = C sl{U 5£j}. Denote by u*(-), V(-), and V(-) the optimal policy, the value function, and the conditional cost-to-go as given by (3.7), (3.4), and (3.6), respectively. The following statements are then applicable to the value function described in (3.4)-(i) For any given state (s,j), we have \V(s,j)-V(s,l)\<Cs and \V(s,j) - V(s,l)\ < Cs, V j , Z e M . (ii) If the optimal action in state (s,j) is to switch to a new MCS, i.e., fi*(s,j) ^ j then V(s,j)>V(s,l), VI EM. Proof: See Section 3.8.2 for the proof. The following proposition outlines some important and intuitive properties of the optimal MCS selection policy for the case of constant switching cost. Proposition 3.4.2 Properties of optimal MCS selection policy with constant switching cost - Assume C(s,j,u) — Csl^u^jj. The following statements are then applicable to the optimal MCS selection policy described by (3.7). (i) If there is no switching cost (Cs = 0), the optimal MCS selection is reduced to a pure opportunistic (greedy) policy, i.e., at any time slot it is optimal to select the MCS with the lowest delay cost. (ii) If Cs > 0, it is never optimal to switch to an MCS with currently "worse" or equal delay, i.e., u*(s,j) ^ I if d(s,l) — d(s,j) > 0. (iii) Define Q(s,j) :— {I E M : d(s,l) -d(s,j) < -2CS}. If Q{s,j) ^ 0, then in state (s,j), it is optimal to switch to some MCS I E C(s,j). (iv) Suppose the only optimal action in state (s,j) is to switch into a new MCS u*(s,j) j. Then, the optimal action for any state (s, i) is either to remain in MCS i or to switch into MCS u*(s,j). In other words, if channel state is s, it is never optimal to switch into any MCS other than u*. Proof: See Section 3.8.2 for the proof. 57 Chapter 3. Transmission Rate Control with Switching Costs 3.5 Coun te r Examples In this section, we focus on the case of constant switching costs, i.e., C(s,j,v) = C s l { u ^ j} . To explore the extent of the validity of the monotonicity results in The-orems 3, 4 and 5, two examples of MCS selection MDP with non-monotone optimal policies are constructed. Example 1 - Non-monotonicity of optimal MCS selection policies with constant switching cost with more than two actions: This example demonstrates that optimal MCS selection policies are not monotone in general for constant switching cost (C(s,j, u) — CSI{U7LJ}), and more than two actions (|M| > 2) (non-submodular), even if Assump-tions 3.4.1, 3.4.2, and 3.4.3 are true. This is in contrast to [40] where it was conjectured that the extension of the monotonicity results to more than two MCS's is straightfor-ward. We choose M = {1,2,3}, S = {1,2,3,4}, Cs = 0.2, and a = 0.9. Furthermore, the channel transitions matrix Q and delay matrix D are chosen as below: " 0.5 0.5 0 0 0.01 0.98 0.01 0 0 0.03 0.03 0.94 0 0 0.05 0.95 Here, delay cost d(s,j) is given by the (s, j)'th element (row s and column j) of ma-trix D. It is easy to verify that Assumptions 3.4.1, 3.4.2, and 3.4.3 are valid by direct calculations. The optimal policy fi*(s,j) for j = {1,2,3} is obtained exactly via the policy iteration algorithm [4]. In this case, we have verified that there is a unique sta-tionary optimal policy partially described as /i*(l, 1) = 1, p*{2,1) = 2, /x*(3,1) = 1, and /x*(4,1) = 3. Therefore, /x*(s, 1) is non-monotone which contradicts any straightforward generalization of Theorems 4 and 5 to more than two MCS's. Qualitatively speaking, matrix Q is selected such that the transition from channel state 3 to 4 occurs with a high probability. Therefore, in state (3,1), it is optimal to remain in MCS 1 and pay only one switching cost to MCS 3 once transition to channel state 4 occurs. Example 2 - Significance of the proposed sufficient conditions for monotonicity of op-timal MCS selection: This example shows the significance of the Markovian channel Assumptions 3.4.2 and 3.4.3 with respect to monotonicity of the optimal policy. Sup-pose Assumption 3.4.1 on delay variations is true. It is demonstrated that optimal MCS selection policies, even with two actions, may not be monotone, if neither of the Marovian channel Assumptions 3.4.2 and 3.4.3 are true. In fact, this example shows monotonicity, D 1.2 1.5 4 1 0.9 2 1 0.9 2 1 0.9 0.1 (3.10) 58 Chapter 3. Transmission Rate Control with Switching Costs even in one variable, does not hold, i.e., if Assumptions 3.4.2 and 3.4.3 are not true, p*{s,j) may not be monotone in s for a given j. We assume constant switching cost (C(s,j,u) = Cal{uftj}) and choose M = {1,2}, S = {1,2,3,4}, Cs = 5, and a = 0.9. Furthermore, channel transition matrix Q and delay matrix D are chosen as below: Q 0.97 0.01 0.2 0 0.01 0.98 0.01 0 0.02 0.01 0.6 0.45 0 0 0.19 0.55 D 1.8 1.8 1.7 1.4 2.5 1.1 1 0.5 (3.11) where as before, the delay cost d(s,j) is given by the (s, j)'th element of matrix D. It is easy to verify that Assumption 3.4.1 on delay variation is satisfied. However, non-adjacent channel transitions are allowed and Assumption 3.4.2 is not valid (row 3 does not dominate row 2 in Q). The optimal policy u*(s.j) for j = {1,2} is obtained via the policy iteration algorithm. We have a unique stationary solution partially described by /i*(l, 1) = 1, p*(2,1) = 2, /i*(3,1) = 1, and /x*(4,1) = 1. Therefore, p*(s, 1) is non-monotone in s. Qualitatively speaking, this is because the switching cost is selected large enough so that MCS 1 remains the optimal MCS for channel states {1,3,4}. However, transition probabilities are chosen in such a way that channel state 2 is somewhat isolated from other channels. In particular, 92,2 ~ 1 so the delay cost in state (s, 1) accumulates for a long horizon which justifies paying a switching cost and moving to MCS 2. This can be shown more formally as follows. Let V^*(2,1) denote the minimum expected cost in initial state (2,1) among the stationary policies /Z : S x M —> M such that p(2,1) = 1. Clearly, if ii*(2,1) = 1 then 1^.(2,1) =.V(2,1). Based on q2;2 ~ 1, we have Vr (2,1) - V(2, 2) « f ; ak(d(2,1) - d(2, 2)) = 1) ~ 2 ) f—' 1 — a k=0 1.8-1.1 1-0.9 = 7 > Cs = 5. Therefore, in state (2,1) it is better to pay a switching cost Cs and move to MCS 2 which gives V(2,1) = V(2,2) + Cs. 3.6 N u m e r i c a l Resul ts For our simulations, a time-slotted transmission system is considered. Duration of a time slot (duration of a fame) is A T = 2 ms. For simplicity we assume only adaptive modula-59 Chapter 3. Transmission Rate Control with Switching Costs tion without coding. A total of six M-ary Q A M schemes with M = 2k, k = {1,2,3,4,6}, are considered. Table 3.1 summarizes the simulation parameters and assumptions. Our simulation procedure and results can be described by the following stages (I to IV): I - Simulating the fading channel: In this stage, a Rayleigh fading channel is simulated and quantized into a Markovian process. We start by simulating a Rayleigh fading chan-nel with a mean received SNR of 12.55 dB and Doppler shift fm — 33 Hz (or velocity of about 10 m/s). The Rayleigh process is then sampled every 1 ms with each sample representing the received Rayleigh gain for the associated frame. Note that this implies the assumption that the received SNR (obtained from the Rayleigh gain) remains more or less unchanged during any frame. In the next step, we quantize this Rayleigh fad-ing channel into a Markovian process. In our simulations, we assume that the average duration of each state is equal to 1.6AT. We have observed that this value allows for sufficiently large channel variations while on average one frame can be represented by a single state. In the next step, we use a similar procedure as in [49] with some modi-fications (to allow non-adjacent channel transitions) to compute the SNR quantization levels. In this way we obtain 24 SNR quantization levels as given by Table 3.2 with the first level 9\ — 0. The quantization levels in Table 3.2 are then used to quantize the SNR of the simulated Rayleigh process. The Markov transition matrix is estimated by counting the transitions within the quantized SNR sequence. II - Computing FERs and delay costs: By repeated experiments a look-up table for the FER in terms of channel states for available modulations is constructed. The number of symbols per frame is assumed to be T = 25. Figure 3.1 shows the obtained delay costs (defined in (3.1)) for 2fc-ary Q A M with k — {1,2,. . . , 6}. Note that the graphs in Figure 3.1 confirm Assumption 3.4.1. III - Obtaining optimal MCS selection policies by policy iteration: The policy iteration algorithm as described in [4] is implemented to numerically obtain optimal MCS as-signment policies for different switching costs and MCS sets. In our simulations, we set a = 0.9 and optimize three cases as follows. Case 1: Constant switching cost and two modulation schemes {QPSK, 16-QAM}. Case 2: Constant switching cost and six 2fc-ary Q A M schemes with A; = {1,2,. . . , 6}. Case 3: Submodular switching cost and six 2fc-ary Q A M schemes with k = {1, 2, . . . , 6}. The results for each case are shown by plotting the optimal modulation selection policy and the optimal cost. Figure 3.2 shows the plots of the optimal MCS selection policy for Case 1 with Cs = 30 and a = 0.9. The dashed lines in the figures represent the channel state thresholds for the case of no switching cost (Cs = 0). It can be seen in the figures that the optimal policy exhibits 60 Chapter 3. Transmission Rate Control with Switching Costs a certain "Hysteresis" effect by remaining in MCS 1 (2) for a number of channel states, even if d(s, 1) > d(s,2) (d(s,2) > d(s, 1)). These simulation results confirm the analyt-ical results in Theorem 4. Figure 3.3 shows the optimal cost for Case 1. The optimal MCS selection policies for Case 2 and Case 3 are shown in Figures 3.4(a) and 3.4(b) with Cs — 30, respectively. It is easy to verify from Figure 3.4(b) that V(s,j) is monotone in (s,j). This confirms Theorem 3 based on the fact that Cs(u — j)2 is submodular in (s,j,i). Note that the optimal policy of Case 1 shown in Figure 3.4(a), is not monotone in (s, j). However, the shown policy is monotone in s for any given j. As explained in Example 1, the optimal MCS selection policy with constant switching cost and more than two MCS's is not monotone in general (even in a single variable s) under the adopted conditions. The optimal costs V(s.j) for simulation Case 2 and Case 3 are plotted in Figure 3.5. Note that each curve gives the optimal V(s,j) for a given previous MCS j. The graphs in Figure 3.5(b) confirm the submodularity of V(s,j) for Case 3. IV - Performance evaluation: In this stage, the performance of the optimal policy in terms of overall throughput and switching rate is evaluated. Figure 3.6(a) shows the achieved throughput in terms of Cs for Case 2 and Case 3. As Cs increases the optimal policy switches more reluctantly among the different MCS's and hence sacrifices the over-all throughput. Figure 3.6(b) shows the switching rate drops by increasing the switching cost. Note that Cs — 0 corresponds to pure opportunistic MCS selection which gives the highest throughput but also the highest switching rate. A similar adaptive algorithm as in [40] can be applied to determine an appropriate switching cost Cs. If the number of switchings are excessive, the algorithm increases the switching cost correspondingly. On the other hand, if a higher throughput is required, the switching cost is reduced. 3.7 Conclusions In this chapter, we have formulated the MCS selection problem as a discounted infinite horizon MDP that achieves an optimal trade-off between transmission delay and MCS switching costs. Under certain proposed sufficient conditions, we have shown that there are optimal MCS selection policies that have monotone (threshold-based) structures. These sufficient conditions are characterized by the submodularity of the switching cost function and the delay cost and first order stochastic dominance conditions on chan-nel transition probabilities. The underlying monotone structures provide a qualitative understanding of the optimal MCS allocation methods and facilitate the real-time im-plementation of the optimal policy with minimal complexities. Numerical results show 61 Chapter 3. Transmission Rate Control with Switching Costs significant reduction in switching rate at the expense of a moderate decrease in through-put. 3.8 A n a l y t i c a l Proofs 3.8.1 Proofs of the Main Monotonicity Results Here, we present the proofs for the main results presented in Section 3.4.1. Proof of Theorem 3: First, we show that value function V(s,j) is submodular in (s, j). Let Vn(x) be the optimal cost over a horizon of length n, defined as Vn(x) := min E <^  akg(xk, u(xk)) \. M U=o J Then by substituting x — (s,j), the value function Vn(s,j) := Vn(x = (s,j)) satisfies the following optimality equation (see [4, vol.1]) Vn(s,j) = min \d(s,u) + C(s,j,u) + aVn-1{s,u)\ , (3.12) where Vn(s,u) :— 2~2gLi Qs,4Vn{s,u) and VQ(S,J) — 0. We.show that Vn(s,j) is submodu-lar in (s, j) for any n. Since VQ(S, j) — 0, Vn(s,j) is submodular for n — 0. Now, suppose the claim is true for n — 1, i.e., Vn-i(s,j) is submodular in (s,j). First, we show that aVn-i(s,j) is submodular. To see this, let ji, j2 E M and j\ > j2- By definition N. [Vn-i(s,ji)-Vn-i(s,J2)] =^2qaAVn-i(s,jl) - yri-l(s,j2)}. s=l Submodularity of Vn-\(s,j) means that Vn-\(s,ji) — Vn-i(s,j2) is decreasing in s, hence by the First Order Stochastic Dominance property in Assumption 3.4.1, if s > r then ^2qs,s[Vn-l(s,jl) - Vn-i(s,j2)\< ^2qrAVn-l(s,ji) ~ Vn-i(s,j2)]-s s Hence, [Vn-i(s, j i ) — Vn~i(s,j2)] is decreasing in s, which means Vn-\(s,j) (and conse-quently otVn-\(s,j)) is submodular in (s, j). By Assumption 3.4.1, d(s,u) is submodular in (s, u) and since d(s, u) is independent of j, d(s, u) is also submodular in (s,j, u). Fur-thermore, by assumption, C(s,j,u) is submodular in (s,j,u) and by Lemma 2.6.1(b) in [63] the sum of submodular functions is submodular. Therefore, d(s,u) + C(s,j,u) + 62 Chapter 3. Transmission Rate Control with Switching Costs aVn-i(s,j) is submodular in (s,j,u). From (3.12) and Theorem 2.7.7 [63] (preservation of submodularity), Vn(s,j) is submodular in (s,j). This completes the induction proof. The submodularity of V(s, j) — lim Vn(s, j) then follows from the fact that the limit of n—>oc a sequence of submodular functions is submodular (see Lemma 2.6.1(c) in [63]). There-fore, R(s,j,u) :— d(s,u) + C(s,j,u) + a^2^qs^V(s,u) is submodular in (s,j,u) under a similar argument as above. Hence, the optimal policy p*(s,j) = argminims, j,u) is increasing in (s,j) by Theorem 2.8.2 in [63]. • Proof of Corollary 3.4.1: By Theorem 3, fi*(s,j) is increasing in s for any given j. Since the set of available actions is finite, (3.9) is immediately concluded. Furthermore, suppose l,j € M , I > j. Let s = 77^ -1, S O (3.9) gives p*{s,l) — i. By Theorem 3, P*(s,j) < i, therefore by (3.9) we have T ^ . J - I > s or r]j,i-i > Vi^-it which completes the proof. • Proof of Theorem 4: Observe that C(s,j, 2) - C(s,j, 1) = Csl{j^2) ~ C s l{jyi}-Hence, C(s,j,u) has decreasing differences and is submodular in (j,u). Since by as-sumption C(s,j,u) is independent of s, C(s,j,u) is also submodular in (s,j,u). By Assumptions 3.4.1 and 3.4.2 and submodularity of C(s,j, u), Theorem 3 implies p*(s,j) is increasing in (s,j). • Proof of Corollary 3.4.2: By Theorems 4 and 5, /x*(s,j) is increasing in (s,j). Corollary 3.4.1 then implies that there are threshold levels 771,1 and 772,2 such that p*(s,l) = 2 s > 771,1 and p*(s,2) — 1 s < 772,2- This implies, by Assump-tion 3.4.1, the existence of threshold level £1, C2 > 0 for the delay differences. Further-more, the necessary condition for switching in Theorem 5 gives Q > Cs(l — a), i = {1, 2}. 3.8.2 Proof of Optimal M C S Selection Properties Proof of Proposition 3.4.1: The proof for each part of the proposition is outlined below. (i) - Suppose j I. Clearly, at any time we can pay a switching cost Cs to change the state from (s, j) to (s,l). Hence V(s,j) < V(s,l) + Cs for any j,l € M and the proof follows. For the second inequality, apply the definition in (3.6) to obtain \ V(s,j) — V(s,l)\ < Zi=iQs,i\V(s,j) - V(s,l)\ < Ca. (ii) - Define u* := /i*(s,j) and suppose I 7^  j. Since u* may not be the optimal action in state (s, I), we have V(s, I) < d(s, u*)+Csl{u*^+aV(s, u*) < d(s. u*)+Cs+aV(s, u*) = V(s,j). • 63 Chapter 3. Transmission Rate Control with Switching Costs Proof of Proposition 3.4.2: The proof for each part of the proposition is outlined below. (i) - Since there are no switching restrictions, the conditional cost-to-go functions are independent of the initial MCS or V(s, h)-V(s, k) - 0 for h , l2 G M . The result follows by substituting this in (3.4) with g(s,j,u) — d(s,u). (ii) - Since the switching cost is fixed for all time slots and for all MCS's, we would do no worse if we postpone switching to a future time slot where there is an immediate improvement in the delay cost-per-stage. More formally, from (3.4), we have a[V(s,j) — V(s, i)] > d(s, i) — d(s,j) + Cs. If d(s, i) — d(s,j) > 0, then the right hand side is greater than Cs which contradicts Proposition 3.4.1 (ii), hence we must have d(s,i) < d(s,j). (iii) - It is clear that if the improvement in the delay cost-per-stage from MCS j to MCS I exceeds a "round trip" cost from MCS j to / then we can reduce the cost by switching. (iv) - Suppose for some i, p*(s,i) ^ i. Then V(s,i) := min I d(s,l) + aV(s,l) + Cs \. By (3.7), p*(s,j) = argmin \ d(s,l) + aV(s,l) + Cs \ so we have /x*(s,i) = p*(s,j). • leM t > 64 Chapter 3. Transmission Rate Control with Switching Costs co fl a P-. fl o C O 00 o 'c3 C O T3 < + H I qfl -fl bfl > tf \% CD ci K co u CD co O LO E"H < o o o o LO fl O T3 «3 rH H H fl .9 fl -3 ^ J H 0j U H fl .52 fl T3 LO LO CN CO i—I CO N 03 W a o CD a C O co is • Cu OH t-* Ji o fl ca S Q Q U > 5 u fl CP fl cr CD S H 03 =15 T3 o CD fl fl cc3 fl '> o r^ ccJ T3 CD > JD fl S3 IS) '•s cc3 fl cr C O CN co Ji CN rH •as 15.99 21.70 -H rH C£j 15.15 CO CN 21.27 O 14.24 CN CN <£, 20.81 13.2081 t-H CN 20.33 13.2081 0 CN sc. 19.82 CO <i> 12.044 19.82 12.044 crs 19.2853 10.70 19.2853 10.70 00 18.72 to C£> 9.11 18.72 I O <35 7.17 T-H <CE> 18.11 4.67 18.11 4.67 to <ac> 18.11 CO 1.15 18.11 1.15 rH 17.46 CN OS -4.87 17.46 -4.87 •* rH 16.75 U-l .a 7 16.75 CO < ^ 15.99 15.99 65 Chapter 3. Transmission Rate Control with Switching Costs 10 " 1 ' i i i I - 5 0 5 10 15 20 25 SNR (dB) Figure 3.1: Delay costs of Af-ary QAMs with 25 symbols per frame vs. channel SNR. 66 Chapter 3. Transmission Rate Control with Switching Costs Previous M C S j =1 CO o CD E -+—* CL O M C S 1: Q P S K M C S 2: 16 -QAM C, = 16.7531 (dB) 0 5 10 15 Channe l S N R (dB) 20 (a) Optimal MCS n*{s,l) vs. channel SNR. Previous MCS j =2 2 CO o CD £ -i—• Q. o M C S 1: Q P S K ; M C S 2: 16 -QAM C = 12.0439 (dB) B - B - H B H H H B I l M l H I i ™ 5 10 15 Channe l S N R (dB) 20 (b) Optimal MCS /i*(s,2) vs. channel SNR. Figure 3.2: Optimal MCS selection policy p*(s,j) for Case 1: Two modulation schemes QPSK and 16-QAM and constant switching cost Cs — 30. 67 Chapter 3. Transmission Rate Control with Switching Costs 700 600 500 -4—' w O 400 CD •I 300 Q . O 200 100 Optimal Cost vs. SNR °14 u \ -•-V(s,1) - - \/fc 0\ \ w J*— / *\ i • 1 • i 1 16 18 20 Channel SNR (dB) 22 Figure 3.3: Optimal cost V(s,j) vs. channel SNR. 68 Chapter 3. Transmission Rate Control with Switching Costs Optimal MCS 6\ Channel SNR (dB) (a) Optimal MCS selection vs. channel state SNR for Case 2 with constant switching cost C, = 30. Optimal MCS vs. SNR 6. Previous "-j 1 \ 1 < I \ MCS - 5 0 5 10 15 20 25 Channel SNR (dB) (b) Optimal MCS selection vs. channel state SNR for Case 3 with sub-modular switching cost C(s,j,u) = 30(w — j)2. Figure 3.4: Optimal MCS selection vs. channel state SNR for Case 2 and 3 with M = 2k-ary QAMs, fc = {1,2,... ,6} 69 Chapter 3. Transmission Rate Control with Switching Costs Optimal cost vs. channel SNR 250 200 o O m 150r ra £ o 100 16 18 20 Channel SNR (dB) (a) Optimal cost V(s, j) vs. SNR for simulation Case 2. Optimal cost vs. channel SNR 16 18 20 Channel SNR (dB) 22 (b) Optimal cost V(s,j) vs. SNR for simulation Case 3. Figure 3.5: Optimal Costs vs. channel state SNR for simulation Case 2 and Case 3. Each curve represents V(s,j) where j is a M-ary Q A M scheme shown in the legend. 70 Chapter 3. Transmission Rate Control with Switching Costs 14000r t Tj 12000 cu CO e10000 -4—' Q. o> 8000 o i— 6000 4000. Throughput vs. Switching Cost •6-MCS: Constant Switching Cost •6-MCS: Switching cost C (i-j)2 0 20 40 60 80 Switching Cost C 100 (a) Throughput vs. switching cost for the optimal M C S selection. Switching Rate vs. Switching Cost 100 i 80 N :r CD « CO Dl CD Jz o '1 -Q-6-MCS: Constant Switching Cost -o-6-MCS: Switching cost CJi-j) 2 20 40 60 80 Switching Cost C 100 (b) Switching rate vs. switching cost for the optimal M C S selection. Figure 3.6: Performance evaluation of optimal MCS selection in terms of the throughput and the switching rate for Case 2 and Case 3. 71 Chapter 4 Optimal Power and Retransmission Control in ARQ In this chapter, we present a power and retransmission control policy that achieves an optimal tradeoff between transmission power, delay, and packet drop penalty costs over a radio fading channel. We formulate the underlying power and retransmission control as a Markov Decision Problem (MDP). Under certain sufficient conditions, we show that the optimal transmission power monotonically decreases in the channel quality and the num-ber of retransmission slots left. Furthermore, we present optimal monotone schemes for dynamically assigning penalty costs across frame retransmission sessions. These struc-tural results lead to power allocation policies that are computationally inexpensive for on-line implementation and are scalable in terms of transmission parameters. Numeri-cal examples confirm the monotonicity results and demonstrate the improvement in the transmission cost enabled by the optimal policy. 4.1 In t roduc t ion Next generation wireless communications systems primarily aim at seamless integration of high speed data, voice, and multimedia services by using unified packet-based so-lutions with internet backbones. In these solutions a combination of different wireless link adaptation methods will be utilized simultaneously over the radio fading channel so that (i) the Quality-of-Service (QoS) requirements for different types of traffic are guaranteed, and (ii) the wireless network utilization is maximized by achieving suitable trade-offs between the relevant performance measures (e.g. delay) and the involved sys-tem resources (e.g. power). Typically, Automatic Repeat Request (ARQ) schemes are employed in the physical/link layer to achieve low error rates for error sensitive and delay tolerant data services. In contrast, these retransmission schemes may introduce delays not suitable for more delay sensitive services such as voice and video. Additional link adaptation methods such as power and rate control may be applied in this case to 72 Chapter 4. Optimal Power and Retransmission Control in ARQ achieve the goals described in (i) and (ii). Examples include power control schemes pro-visioned for the Universal Mobile Telecommunications System (UMTS) random access retransmissions [45,64], and rate control via adaptive modulation and coding in High Speed Downlink Packet Access (HSDPA) systems [1]. In this chapter, we consider the problem of power and retransmission control in adaptive ARQ systems over a fading radio channel under a Marokov Decision Prob-lem (MDP) framework. We devise power control algorithms that exploit radio channel variations and other relevant information across the retransmission frames to achieve a suitable trade-off between power, delay, and packet drop penalty costs. Furthermore, we propose an extension to traditional ARQ power and retransmission control policies by provisioning an option to abort retransmission in any given time slot. This feature may be particularly useful in poor channel qualities where there is a low probability for successful transmission in the given horizon of time slots. The proposed framework is designed to be scalable in terms of the involved parameters, i.e., depending on the traffic type and the available resources we may choose to calibrate the transmission towards less delay, less packet drops, or less power consumption by increasing the corresponding cost parameters. Context: Adaptive techniques for energy conservation and delay minimization over fading channels have been extensively studied in the literature. In [41], an On/Off file transfer ARQ control is presented that minimizes power and delay costs over a partially observed two-state Markovian channel (Gilbert-Elliot). In [42] an MDP framework for optimal power and retransmission control for random access systems is considered. In [44] a joint power/rate control for minimizing average delay under average power constraints in an MDP framework is proposed. The effectiveness of power ramping in ARQ systems over a single path Rayleigh fading channel is experimentally confirmed for a WCDMA downlink in [45]. In the absence of the Markovian dynamics, [46, pp.8-11] proves certain insightful monotonicity results for a sequential allocation problem subject to penalty costs. Contributions: Our major contributions in this chapter are summarized as follows. • A meaningful trade-off between power, delay, and packet drop costs in an adaptive ARQ system is formulated as a finite horizon MDP. This formulation generalizes some aspects of the work in [46, pp.8-11] by including a Markovian dynamic. • The monotonicity of optimal power and retransmission control policies in number of retransmissions, delay, and penalty cost is proven in Theorem 6. Interestingly, this 73 Chapter 4. Optimal Power and Retransmission Control in ARQ theorem provides a rigorous and elegant analytical explanation for the experimental results in [45] on the effectiveness of power ramping in ARQ systems. • The monotonicity of optimal power and retransmission control policies in the chan-nel state is proven in Theorem 7 under certain sufficient conditions on the Marko-vian channel. This theorem generalizes some of the monotonicity results in [46] for Markovian dynamics. Organization: This chapter is organized as follows. In Section 4.2. we introduce the system model. The power and retransmission control problem is formulated as a finite horizon MDP in Section 4.3.1. In Section 4.3.2. we show certain monotonicity structures of the optimal power and retransmission control policies. In Section 4.4, methods for assigning penalty costs to frame transmission sessions are discussed. Numerical results are presented in Section 4.5. Finally, some conclusions are drawn in Section 4.7. 4.2 Sys tem M o d e l In this section, a stochastic discrete7event system model for a packet transmission sys-tem is presented. We consider a general time-slotted wireless transmission system. We assume transmission of a packet (e.g. Internet Protocol (IP) packet) that is divided into several data frames. Each data frame is then fitted into a transmission time slot and transmitted via ARQ over a fading radio channel. Our transmission model can be outlined in terms of the following five elements: (i) Transmission Intervals and Data Frames: Define A T to be the duration of one transmission interval. At each transmission interval or time slot a block of data referred to as a data frame is transmitted. By convention, time k, k 6 {1,2,. . . , iV+1} is the time interval [(k — 1)AT, kAT) (the fcth time slot). Here, iV denotes the maximum number of transmission attempts. Note that after N attempts the control is terminated and in case of transmission failure, a penalty cost is incurred at time N + 1. (ii) Power: We assume that fast power control is enabled and power control decisions are made at the start of each time slot. Let 0 < yk < ymax denote the transmission power at time k with ymax denoting the maximum allowable power. (iii) Markovian Fading Channel: We consider a Rayleigh fading channel modeled by a finite-state Markovian process. Let sk denote the channel state at time k obtained by quantizing the received SNR of a continuous-time Rayleigh fading channel where the average received SNR is normalized to one. We assume sk € Sis an irreducible discrete 74 Chapter 4. Optimal Power and Retransmission Control in ARQ time Markov chain with state spaces 8 := {1,2,. . . , L}. The received normalized SNR from the Rayleigh fading channels can be quantized into L level by some partition 9 defined as 0 = (0i, 62, • • • , 9L+I) with Q\ = 0 and 9r,+x = 00 and 6\ < 62 < • • • < OL+I-Here, a procedure similar to [49] is adopted to select an appropriate partition. The channel is in state I E 8 if the SNR lies between 61 and 9i+\. Let Q :— [qij]LxL denote the channel transition matrix, where channel transition probabilities q,[j are defined as qij = P(s/c = j|sfe_i = i), k E Z+, with P denoting the probability measure. (iv) Link/Physical-layer A R Q : We assume data frames are transmitted via an ARQ scheme. If the user receiver detects transmission errors in a frame, it will automatically request a retransmission. The frame is retransmitted until an acknowledgement (ACK) is received from the user or the error persists beyond AT — 1 retransmissions. (v) Frame Error Probability: Define / : 8 x [0,ymaa-] —> [0,1] to be a function that maps the channel state and the applied power to the instantaneous Frame Error Probability (FEP). Therefore, 1 — f(s,y) is the probability of a successful transmission in channel state s with power y. Naturally, function / is assumed to be decreasing in both s and y. Also, define a frame transmission session to be the collection of time slots consumed for transmitting/retransmitting a data frame. Note that the length of a transmission session is a random variable. 4.3 Frame Retransmiss ion C o n t r o l - M D P Framework In this section, we consider the problem of power and retransmission control in the frame level as stated below. Problem: Consider a transmission system where a single data frame is transmitted at each time slot over a Markovian fading channel. Upon error detection the frame is retransmitted up to N—1 consecutive time slots. If the transmission is not successful after these N attempts, the data frame is dropped and a penalty cost 0 < Cf < 00 is incurred. Alternatively, in any time slot, the scheduler can decide to quit the transmission session with a penalty cost Cf. Associated with the transmission of each frame with the allocated power y is a fixed delay cost D, and a power cost H(y) which is increasing in y. The objective is to dynamically determine in any time slot whether to quit or to continue the transmission, and in the latter case, to allocate the optimal transmission power which minimizes the total expected cost until the termination of the frame transmission session. 75 Chapter 4. Optimal Power and Retransmission Control in ARQ 4.3.1 M D P Formulation of Frame Retransmission Control Here, we formulate the above problem as an MDP and derive the corresponding Bellman equation in Proposition 4.3.1. The formulated MDP can be described by the following four components. (i) System States: Let xk G X be the system state at time 1 < k < N + 1, defined by xk := (sk, ifc, jk-i), where sk G § is the Markovian channel state at time k, ik G {0,1} is the number of the frames left for transmission at time k with i\ — 1, jk G {0,1} is an indicator that the transmission session is aborted without success at time k with jo = 0, and X := S x {0,1} x {0,1} is the state space. Note that we have ^k 1{Unsuccessful transmission at time k — l}i jk 1 {Transmission session aborted without success at time k}t where i\ = 1, jo = 0, and l / . j is the indicator function. Here, we assume the states (•, 0, •) and (•, •, 1) are all absorbing states with no immediate costs since the transmission session is terminated in these states. Hence, we have j k = 1 j f c + i = 1, and ik = 0 => ik+i =0, 1 < k < N. We assume that the channel state is reported to the scheduler in each time slot so the scheduler has perfect knowledge of the state xk for k = 1,..., N + 1. (ii) Actions: Let uk := [yk,jk) G U be the action at time k with U denoting the action space. Here, yk. is the transmission power at time A: and jk — 1(0) is the decision to abort (transmit) as given by (4.2). Recall that we assume 0 < yk < ymax and therefore, U=[0,ymax] x {0,1}. (iii) Non-Randomized Power and Retransmission Control Policies: A non-randomized power and retransmission control policy {/J.k} is defined as a sequence of functions u-k : X —• U, k G {1, 2,... , AT -f- 1} that maps state xk into actions uk as Mfc(xfc) = uk '•— (Vkijk)- Note that since the control at time k — N + 1 is terminated, function HN+I is irrelevant and is only considered for the formal formulation of the cost function. Notation: Throughout this chapter, we often use the concise notation \ik instead of {u-k} to represent a non-randomized policy. (iv) Cost-per-Stage: The cost-per-stage at time k, in state Xk = ( s k , i k , j k - i ) £ X, with action u k = (ykJk) € IX is denoted by gk(xk,Uk) 9k{sk,ik,jk-i,yk,jk), 1 < k < 76 (4.1) (4.2) Chapter 4. Optimal Power and Retransmission Control in ARQ N + 1, where 9k(sk,ik,jk-i,yk,3k) = { 0, Cf + H(yk), Cf, k D + H(yk), ik = 0 or jk-i = 1, (jk, jk-i) = (1,0), (k, iN+l, jN) = (N + 1,1 otherwise, 0), (4.3) where 0 < H(y) < oo is the cost of the consumed power y with H(0) — 0, 0 < Cf < oo is the penalty cost associated with dropping the frame, and 0 < D < oo is the delay cost associated with the passage of the current time slot in an active transmission session. Remark 10 Here, we explain the cost-per-stage given by (4.3) in more detail. First, note that if the transmission is successful at time k — 1 then there will be no frame left at time k (ik = 0) and the cost-per-stage will be zero at time k and all of the subsequent time. slots. Also, if we choose to abort transmission at time k (jk — 0), an immediate penalty cost Cf is incurred at time k but no cost will be incurred for the subsequent time slots (transmission session terminated). Note that in the case of aborting the transmission Uk — 0), a power cost H(yk) is also added to the penalty cost (the second line in (4.3)). This additional power cost is essentially zero by optimally allocating no transmission power yk = 0 (H(0) = 0) and is merely introduced to simplify the problem formulation. Next, if the frame is not transmitted after consuming all of the time slots (i;v+i = 1) then a penalty cost Cf incurs at the final time slot N + 1. For an active transmission slot with the allocated power y the cost-per-stage is the sum of the delay cost D and the power cost H(y), where H(y) is an increasing function of y. H(y) can be regarded as the sum of two different components: (1) the actual energy cost, and (2) the interference cost imposed on the system by the frame transmission power y. Remark 11 Due to considerations of user interference, it is appropriate to choose the power cost function H(y) such that transmission power is penalized by an increasing rate in consumption. A proper candidate is the class of power functions H(y) = ya, a > 1. Here, a determines how dominant the power cost is relative to the other costs. In situ-ations where minimal energy consumption is required, a may be increased accordingly. In our numerical results, we find that a = 2.5 (or H(y) = y 2 , 5 ) gives a reasonable range of power consumption for the given simulation parameters. The goal is to minimize the expected total cost up to the end of a transmission session. This problem can be conveniently formulated as a finite horizon MDP. The 77 Chapter 4. Optimal Power and Retransmission Control in ARQ objective is then to find policy {u-k}. k G { 1 , 2 , . . . , iV + 1} that minimizes the following cost function at initial time n in state x G X (N+l 1 4"*>(x) = E ^ gk(xk, u.k(xk)) \ , xn = x, (4.4) {k=n J where x„ is the initial state, and E is the expectation operator over all paths of the random process xk. The optimal cost function at initial time n and in state x — (s, i.j) G Z is given by J*{s,i,j) := min J^k}(x = (s,i,j)), {»k} where s denotes the current channel state, i and j represent ik and in (4.1) and (4.2) in a generic time slot, respectively. If there is no packet left to send (i = 0) or the frame transmission session has been aborted in the prior time slot (j — 1) then no more power is allocated and the optimal remaining cost is zero, i.e., J*(s,n,0,j) = 0 and J*(s,n,i,l) — 0. Therefore, it suffices to formulate and solve the problem for (i, j) = ( 1 , 0 ) . In this case, we define the value function at time n as Vn(s):= J r : (M,0) , (4.5) which denotes the optimal cost assuming that the packet has not been successfully trans-mitted and the session has not been aborted yet. The following proposition then presents the optimality equation governing this value function. Proposition 4.3.1 The value function defined in (4-5) satisfies the following Bellman equation Vn(s)= mm {H(y) + (l-j)[D + f(8,y)Vn+1(s)}+jCf} with VN+i{s) = Cf, (4.6) where the cost-to-go function is defined as Vn(s) := X ^ ' ^ s ' ) , (4.7) s' with qs^> denoting the channel transition probabilities. Optimal power and retransmission 78 Chapter 4. Optimal Power and Retransmission Control in ARQ control policies / i * : 8 x [1, N + 1] —> [0, ymax] x {0,1} at time n are given by Mn( s ) == (Vn(s), JnOO) = argmin {H(y) + (1 - j)[D + f(s, y)Vn+1(s)] + j Cf) . (4.8) Proof: See Section 4.6.2 for the proof. Remark 12 In general argmin operator in (4.8) may give a set of minimizers and there-fore the optimal policy is not unique. To avoid additional technicalities, throughout this chapter, we assume argmin gives the minimum value of such minimizers (unless otherwise specified). 4.3.2 Optimal Monotone Frame Retransmission Control Policies Generally, closed-form solutions for (4.6) may not exist. Alternatively, by quantizing the power to discrete values numerical dynamic programming algorithms can be used to solve the Bellman equation in (4.6) [4], However, as the number of power levels and the number of channel states grow, these methods become computationally more complex. Furthermore, these numerical methods provide little qualitative understanding of opti-mal power and retransmission control policies. Hence, there is a motivation to explore whether optimal power and retransmission control policies have any special structures. These special structures may help us (i) to gain intuition into the behavior of the optimal policy without perfect knowledge of the system parameters, and (ii) to reduce the com-plexity of the numerical dynamic programming algorithms. In this section, we focus on showing certain monotonicity properties of the optimal power and retransmission con-trol policies by utilizing the optimization techniques from supermodular function theory (cf. [63]). We proceed by introducing the following proposition which states a result regarding the monotonicity of the optimal cost. Essentially, this proposition states the intuitive fact that as the number of the avail-able transmission slots increases, the optimal cost defined in (4.5) decreases and as the delay or the penalty costs increases the optimal cost increases. These results describe the monotonicity properties of the solution for a large class of MDP parameters and hence helps to gain intuition into the behavior of the optimal policy without a perfect knowledge of the system parameters. This intuition may be very useful in developing simple and efficient rule-of-thumb power and retransmission control policies. Furthermore, these monotonicity results can alleviate the complexity of 79 Chapter 4. Optimal Power and Retransmission Control in ARQ the dynamic programming algorithms by reducing the space of the optimal policies. In particular, due to the presence of a continuous power variable y in (4.6), the reduction in computational complexity may be quite notable. Proposition 4.3.2 Let Vn(s;Cf,D) be the value function defined in (4-5), where its dependence on (Cf,D) is shown explicitly. Then, 0 < Vn(s;Cf,D) < Cf and for any given channel state s G 8, Vn(s; Cf,D) is increasing in (n,Cf,D). Proof: See Section 4.6.2 for the proof. Notation 1: Throughout this chapter, we use ";" as in Vn(s;Cf, D), to separate the implicit and explicit variables in the argument of a function. Note that by changing Cf or D (implicit variables) in Vn(s; Cf, D), we essentially get new functions. Notation 2: Throughout this'chapter, we assume constant functions fit the defini-tion of "decreasing" or 'increasing" (we use strictly increasing/decreasing to convey the strict inequalities). At this point we present the first theorem which states a significant result regarding the monotonicity of the optimal policy. Theorem 6 Let (y*(s;Cf,D),j*(s;Cf,D)) be the optimal policy given in (4-8), where the dependence on the penalty cost and the delay cost is shown explicitly. Then, for any given channel state s G S, the following statements are true. (i) The optimal aborting decision j*(s;Cf, D) is increasing in (n,—Cf,D), i.e., if (ii,—Cf,D) > (n,—Cf,D) (element-wise), then j^(s;Cf, D) > j^(s;—Cf,D). (ii) The optimal power y (^s; Cf, D) is increasing in (n, Cf, D), assuming that the trans-mission session is not aborted. Proof: See Section 4.6.2 for the proof. Remark 13 Theorem 6 provides natural and intuitive interpretations. As the delay cost increases (assuming the rest of the parameters are unchanged), it is optimal to increase the transmission power and to show more tendency for aborting the transmission session in order to avoid additional delay costs. As the penalty cost increases (assuming the rest of the parameters are unchanged), it is optimal to increase the transmission power and to show less tendency for aborting the transmission session in order to reduce the 80 Chapter 4. Optimal Power and Retransmission Control in ARQ probability of facing the penalty cost. As the available transmission slots N — n + 1 increase, it is optimal to use less transmission power in an active transmission session and furthermore show less tendency to abort transmission. This is because in this case the power allocation and abort decisions can be diversified over an increased number of available transmission slots ahead. On the other hand, as the available transmission slots decrease, i.e., as we approach the maximum number of retransmission slots, it is optimal to increase the transmission power (if the transmission is not aborted yet). Interestingly, this analytical result elegantly explains the experimental results in [45] regarding the effectiveness of power ramping in ARQ systems. The following corollary states two useful special cases of the optimal policy outlined in Theorem 6. Corollary 4.3.1 R is optimal to not abort the transmission session, i.e., j*(; •, •) = 0, if either of the following conditions are true. (i) The delay cost is zero D — 0. (ii) The penalty cost is sufficiently large so that there exist some channel-dependent power level yo 6 ^ such that H(y0) + D Cf > r, S Eb. 1 - f(s, y 0) Proof: See Section 4.6.2 for the proof. In the next step, we show the monotonicity of the optimal power and retransmis-sion control in terms of the channel state. First, we introduce the following natural assumption for Markovian fading channels. Assumption 4.3.1 Let qs^ be the channel transition probabilities and Q = [qs,s\LxL be the Markovian channel transition matrix. We assume each row of Q dominates its previous row (and consequently all rows before) in the First Order Stochastic Dominance sense [65], i.e., the channel transition probabilities qsj satisfy L L 9«-M ^ Yl ^ f°r 2 < s < L, 1 < Ac < L. (4.9) s'=/c s=k 81 Chapter 4. Optimal Power and Retransmission Control in ARQ Remark 14 In the context of Markovian fading channels, ( 4 . 9 ) is quite natural. Assum-ing a special case where only adjacent channel transitions are allowed, ( 4 . 9 ) is equivalent to qs-i,s < qs,s + 9 s , s + i i for 2 < s < L. This condition is valid if the probability of no transition is larger than the probability of a transition. This is typically true by the fact that the duration of each channel state must be large enough so that the SNR at each time slot is meaningfully represented by a Markovian state (cf. [49]). The assumption in ( 4 . 9 ) is also confirmed by the numerical results in Section 4 . 5 . The following proposition states the intuitive fact that the optimal cost is monoton-ically decreasing in terms of the channel state (see similar results in [46, p .37]) . Proposition 4.3.3 Suppose Assumption 4-3-1 is satisfied. Let Vn(s) denote the value function in (4-5). Then Vn(s) is decreasing in s. Proof: See Section 4 . 6 . 2 for the proof. Combining the Propositions 4 . 3 . 3 and 4 . 3 . 2 gives the following corollary. Corollary 4.3.2 Vn(s;Cf,D) is decreasing in (s,n,—Cf,—D). At this stage, we present certain sufficient conditions for the optimal power and retransmission control policy to be monotone. First, we introduce the following defini-tion of supermodular (submodular) functions [46] (see Section 4.6.1 for a more general definition and some useful properties of the supermodular functions). Definition 4.3.1 Function f(x,y) is supermodular (submodular) in (x,y) if and only if f(x,y) has increasing (decreasing) differences, i.e., if x\ > xi, then the variation f[x\,y) — f{x2,y) is increasing (decreasing) in y. The following theorem states that under some sufficient conditions the optimal trans-mission power decreases as the channel quality improves. Theorem 7 Suppose Assumption 4-3-1 is satisfied and /-t*(s) :— (yn(s). j ? *(s)) is the optimal policy given by (4-8)- Then, the optimal aborting decision j*(s) is decreasing in s. Furthermore, if the FEP function f(s,y) is supermodular in (s,y) and assuming the transmission session is not aborted, the optimal power yn(s) is decreasing in s. 8 2 Chapter 4. Optimal Power and Retransmission Control in ARQ Proof: See Section 4.6.2 for. the proof. Theorem 6 and Theorem 7 directly give the following corollary regarding the mono-tonicity of the optimal policy in the transmission parameters. Corollary 4.3.3 Suppose Assumption 4-3.1 is satisfied and u^(s) := (Vn(s)i 3n(s)) is die optimal policy given by(4-8). Then, the optimal aborting decision j*(s;Cf,D) is decreasing in (s,—n,Cf,—D). Furthermore, assuming the transmission is not aborted and f(s,y) is supermodular in (s,y), the optimal transmission power y*(s;Cf,D) is increasing in (—s,n,Cf,D). 4.4 Pena l ty Cost A l l o c a t i o n for Frame Transmiss ion Sessions In this section, we consider the problem of transmitting a packet (e.g. IP packet) consist-ing of multiple frames. Assume there are M frames in the packet and the mth data frame is transmitted/retransmitted via the power allocation policy outlined in (4.8) with the delay parameter D and a penalty cost Cf(m) that can be assigned dynamically across the.frames. We assume that if the transmission of any frame in the packet fails, the whole packet is dropped with a penalty cost Cp. Associated with each frame transmis-sion there is a cost G(Cf(m)) which accounts for the power cost and time resources of the system. The objective is to dynamically allocate Cf(m) at the start of each frame transmission session in order to minimize the total expected cost. In Theorem 8, we prove an intuitive fact that as we transmit more frames in a packet, it is optimal to increase the frame drop penalty costs in order to improve the probability of successful frame transmissions. 4.4.1 Formulation of the Penalty Cost Allocation Problem Here, we formulate the parameter allocation problem as a Bellman equation. Assume penalty costs are allocated at the start of each frame transmission session. Define zm := (rn, lm-\) £ 2,; m — {1,2,..., M + 1} to be the system state at the start of the mth frame transmission session, where M is the total number of frames in the current packet, and Im !•{Transmission session is failed/aborted in frame m}i 83 Chapter 4. Optimal Power and Retransmission Control in ARQ where by convention IQ = 0. Here, state ZM+I represents the termination of the packet transmission. We assume the states (-,1) are costless absorbing states representing a situation that the packet has been dropped and the control has been terminated. The cost per stage in state (m, lm-\) 6 Z, m — {1,2,. . . , M + 1}, with the allocated penalty cost Cf is given by m — M + 1 or lm-i — Ij lm = 1, (4.10) otherwise, where G(Cf) denotes the immediate cost imposed on the system in terms of power and interference for a given penalty cost Cf. Here, for analytical tractability reasons, we do not examine the effect of the channel state on the optimal penalty cost allocation. Instead, we focus on determining the optimal penalty cost in terms of the frame number within the transmitting packet. Define p(Cf) to be the probability that a frame transmission session fails under the allocated penalty cost Cf, where the dependence of this probability to the other variables such as channel state is not shown explicitly. By Theorem 6, if Cf increases for a transmission session, more power will be used and the probability of success for that transmission session will be higher. We conclude that G(Cf) is increasing in Cf and p(Cf) is decreasing in Cf. Clearly, if the packet has been already dropped then no control is applied and the optimal remaining cost is zero. Hence, it suffices to find the optimal cost and the optimal policy with lm-i = 0. Define W(m) to be the optimal cost or the value function for an active transmission control in state (m, 0). It is easy to see that W(m) satisfies the following Bellman equation W{m) = mm {G(Cf)+p(Cf)CP+il-p(Cf)W(m+l))}, . (4.11) with W(M + 1) = 0. An optimal assigned penalty cost at the start of the mth frame transmission session is denoted by C*j(m) and given by C*Am) := argmin {G(Cf) + p{Cf)CP + (1 - p(Cf))W(m + 1)} . (4.12) Here, without loss of generality, by the optimal penalty cost we refer to the minimum of h(m,lm-i, Cf) = 84 Chapter 4. Optimal Power and Retransmission Control in ARQ the optimal values in (4.12) given by C}{m) := min { argmin {G(Cf) + p(Cf)CP + (1 - p(Cf))W(m + 1)} }. (4.13) cf 4.4.2 Optimal Monotone Penalty Cost Allocation Policies By appropriately quantizing the penalty cost and the maximum power to finite levels, numerical dynamic programming methods can be used to solve the Bellman equation in (4.11). However, in this section similar to Section 4.3.2, we focus on obtaining special monotonicity properties of the solution in order to gain a qualitative understanding of the optimal policy and to reduce the computational complexity of the underlying numerical dynamic programming methods. The following theorem states the main result of this section regarding the monotonic-ity of the optimal parameter allocation policy. Theorem 8 The optimal allocated penalty cost Cf(m) given in (4-13) is increasing in m. Proof: See Section 4.6.2 for the proof. j Remark 15 Theorem 8 can be intuitively explained as follows. As we successfully transmit more frames in a packet and hence invest more resources in the packet, there is more incentive for a successful transmission of the packet by utilizing more power for the subsequent frames. Following Theorem 6, this can be done by monotonically increasing the frame penalty cost for these frames. In other words, as we reach the final frames in the packet, a successful transmission of the packet becomes more relevant. Hence, the allocated resources for the frame transmission must be increased accordingly by increasing the frame drop penalty cost Cf. Also, it is worthwhile to note that the same monotonicity properties with similar proofs can be established for assigning other transmission parameters such as maximum number of retransmission slots or maximum power. For instance, let N(m) denote the maximum number of retransmission slots for frame m and assume that the system incurs a cost of K(N(m)) by allocating N(m) to a frame transmission session. Here, A (^-) can be any increasing function. Then, with similar arguments it can be shown that optimally, N(m) is increasing in frame number m to improve the likelihood of a successful transmission and thus to reduce the optimal cost. 85 Chapter 4. Optimal Power and Retransmission Control in ARQ 4.5 N u m e r i c a l Resul ts In this section, a time-slotted transmission system with the considered adaptive ARQ scheme over a fading channel is simulated. Table 4.1 summarizes the simulation pa-rameters. Rayleigh fading channel gains normalized to unit received SNR are simulated and quantized into a Markovian process. Each sample gain represents the normalized received SNR for a A T = 1 (ms) frame. In view of this, the quantized levels must be selected carefully such that an appropriate value for the average duration of each Markov state is obtained. The average duration of a state should not be much smaller than a frame duration A T otherwise one or more state transitions are expected to occur in a single frame. On the other hand, the average duration of a state cannot be chosen much larger than A T since the quantized channel may then mask channel variations over con-secutive frames and represent in effect a slower fading channel. We have observed that the average duration of 1.6AT for each channel state is appropriate. We then use similar procedures as in [49] to obtain 24 SNR quantization levels as given by Table 4.2. The channel transition matrix is estimated by counting the transitions within the quantized SNR (channel state) sequence. A portion of this transition matrix is given in Table 4.3. Next, by repeated simulations, a look-up table for the FEPs with Quadrature Phase-Shift Keying (QPSK) modulation in terms of the channel state and the transmission power is constructed. The number of symbols per frame is assumed to be 25. We assume a power cost function H(y) = y2-5 so that power will be priced with an increasing rate in consumption. Also, power is quantized uniformly to 201 levels from 0 to ymax = 20 with steps of 0.1. The optimal power and retransmission control policies are then obtained numerically by backward dynamic programming [4]. First, we consider a scenario that the penalty cost and the average power cost are substantially larger than the delay cost. In particular, let Cf = 10000 and D = 1. We observe that in this case, the persistent transmission is almost always optimal. Fig-ure 4.1(a) and 4.1(b) show the monotonicity of the optimal power allocation in this case. These simulation results confirm the analytical results of Theorem 6. Also, the mono-tonicity of the optimal power allocation in the retransmission slot and the channel state is shown in Figures 4.1(b) and 4.2(a)). As proved in Theorem 6 the optimal transmission power increases as the available retransmission slots decrease (n increases). Now, we consider a case where optimal aborting is more relevant. In particular, for large delay costs and in poor channel qualities we observe that it might be optimal to quit the transmission session. Figure 4.3 shows the optimal aborting for Cf = 10000, 86 Chapter 4. Optimal Power and Retransmission Control in ARQ s = 2, and D = 500. We see that j£(s) as defined in (4.8) is monotone in n and the optimal aborting occurs for n > 5. Observe that as the available transmission slots decreases, there is more tendency to quit the transmission session since there is less chance to achieve a successful transmission due to the small horizon ahead. Next, we evaluate the performance of the optimal power and retransmission control policy by comparing it to heuristic retransmission policies with no power control. We evaluate the average costs associated with retransmission policies with fixed power levels by repeated simulations over the fading channel. Figures 4.4 and 4.5 compare the optimal costs with the cost of fixed power policies for the power levels {2,4,6,... , 20}, the penalty cost Cf = 10000, and the delay costs D = 1 and D — 500, respectively. These simulation results show the improvement in the cost obtained by applying the optimal power and retransmission control. Finally, Figures 4.6(a) and 4.6(b) show two temporal sample paths of the optimal power consumption for D — 1 and D — 500. Note that for relatively large delay costs, the power cost becomes less significant compared to the delay cost and hence the effective power control becomes less relevant. This can be seen by comparing Figures 4.6(a) and 4.6(b), where the optimal power shows less variations for the larger delay cost D = 500.. 4.6 Background and A n a l y t i c a l Proofs In this section we present the analytical proofs for the main results of this chapter along with some background on supermodular functions. 4.6.1 Background: Supermodular Functions and General Monotonicity Results The main monotonicity results are best understood under the concept of Supermodularity and Supermodular functions. Here, a brief background on supermodular functions and some relevant monotonicity results are presented. First, a definition of supermodular functions in the 3ln space is given as follows [63]. Definition 4.6.1 Let Xj C 31. A function f(xi,X2,---,xn) : F J ^ = 1 Xi 31 is called supermodular in IliLi Xi (or in (xi,..., xn)) if for any x, y € EKLi X;, f(xVy) + f(xAy)<f(x) + f(y), 87 Chapter 4. Optimal Power and Retransmission Control in ARQ where V (A) is the element-by-element max (min) operator, i.e., (x1,x2, ...,xn)\J (yi, 2/2, • • •, yn) •= (max(.xi, yi), max(.T2, 2/2), • • •, max(.xn, yn)). Also, note that if the inequality in the above definition reverses, f is called submodular. The following proposition states more intuitive characterization of supermodular functions along with some results regarding optimization of supermodular functions. In particular, part (i) of the proposition establishes the equivalency of Definitions 4.3.1 and 4.6.1. More general versions of these results can be found in [63], however, with substantially more technicalities. Proposition 4.6.1 Let f(x\,x2) : X i x X2 —»• 01, X.; C 01. The following statements are true. (i) Monotonicity of Variations - Function f is supermodular (submodular) in X i x X2 if and only if f(x\,x2) has increasing (decreasing) differences, i.e.. if x2 > x2, then the variation f{x\,x2) — f(xi,x2) is increasing (decreasing) in x\. Note that the roles of x\ and x2 can be interchanged without loss of generality. (ii) Monotonicity of the Optimal Action - Assume function f is supermodular in X i x X 2 . Let y{x\) be the largest value of x2 for which the maximum of f(x\,x2) occurs for a given x\, i.e., y(x\) — max f(x\,x2). Then y(x\) is increasing in x\. X2GX2 Proof: Refer to [63, pp.11-13], for definition of lattices, sublattices, and chains. Here, by Example 2.2.5 in [63], Xj C 01 are chains and sublattices of 01. Part (i) then follows by Theorems 2.6.1 and Corollary 2.6.1 in [63]. Part (ii) follows from Theorem 2.8.2 in [63]. 4.6.2 Analytical Proofs Here, we present the proofs for the theorems and propositions presented in this chapter. Proof of Proposition 4.3.1: Using standard approaches in [46] or [4], the value func-tion J*(s,i,j) satisfies the following Bellman equation ^ + H{y) +f(s,y)Jn+1(s,l,0) + Delay Cost P o w e r C o s t Event: Frame E r r o r J*(s, 1,0) = min (i/j)eu 88 Chapter 4. Optimal Power and Retransmission Control in ARQ [ i - / (* ,y) ]Jn+i(s ,o ,o) l { i = 0 } + Cfl{]=1} }, (4.14) Event: Successful TransmissionJ Frame Drop Penalty Cost, with the terminal condition Jjv+i(s, 1,0) = Cf. Here, Jn(s, i,j) is the cost-to-go-function defined as J(s,n,i,j) :— Y^s°s,sJn(^.h.i)- Clearly, Jn(s,0,j) = 0 hence (4.5) and (4.14) give Vn(s) = min l[D + H(y) + f(s,y)Vn+1(s)]l{j=0} + [Cf + H(y)}l{j=1}\ (4.15) Now, if the optimal decision is to transmit (j = 0), then minimizations in (4.6) and (4.15) are equivalent. If the optimal decision is to abort transmission (j — 1), then since H(y) is increasing with H(0) — 0 the optimal power allocation in (4.15) is y — 0 so H(y) — 0 which gives the equation in (4.6). This completes the proof for (4.6). Eq. (4.8) will follow immediately. • Proof of Proposition 4.3.2: First, note that Vn(s) < Cf since in any time slot we can abort the transmission with the total cost of Cf. Also, by (4.6) it is clear that Vn(s;Cf, D) is increasing in (Cf,D). Now suppose n > n. Starting at time n, we can at least achieve the optimal cost Vn by replicating its optimal policy for [ri, N] in time interval [n, n + TV — n] and possibly aborting transmission at time n + N — n. Hence, Vn(s) < Vn(s) and the proof is complete. • Proof of Theorem 6: Part (i): Define Qn(s, y, Cf,D) := H(y)+D+f(s, y)Vn+1(s; Cf,D) for n = {1, 2, . . . , N} to be the minimum expected cost if we choose to transmit a frame with power y. Suppose j*(s;Cf,D) = 0 for some (n,Cf,D). Equivalently, from (4.6) there exists yo £ ^ such that Qn(s, yQ, Cf, D) < Cf. Now, if D < D then clearly, Vn+l(s; Cf, D) < Vn+l{s; Cf, D) and hence Qn(s,yo,Cf,D) < Qn(s,yo,Cf,D) < Cf or j*(s;Cf,D) = 0. Therefore, j*(s;Cf, D) is increasing in D. If n < n then by Proposition 4.3.2, Vn+i(s; Cf, D) < Vn+l(s;Cf,D), hence QA(s,y0,Cf,D) < Qn(s,y0,Cf,D) < Cf or ft(s;Cf,D) = 0 which implies j*(s;Cf,D) is decreasing in n. Finally, we argue that if Cf > Cf, then 0 < Vn(s; Cf, D) — Vn(s; Cf, D) < Cf - Cf. This is because in computing Vn(s; Cf, D) via the expectation over all the stochastic paths, the penalty cost appears explicitly only for the paths with unsuccessful transmissions. Hence we can write Vn(s;Cf,D) = F(-; Cf)+P (transmission failure; Cf) Cf, where the term P(-; Cf) is the probability mea-sure conditional to the penalty cost Cf and F(-) is a function of the applied actions with 89 Chapter 4. Optimal Power and Retransmission Control in ARQ an implicit dependence on Cf. Now by incrementing the penalty cost to Cf and repli-cating exactly the same actions as with Cf for any stochastic path (non-optimal for Cf), we have Vn(s;Cf,D) < F(-\Cf) + P (transmission failure; Cf)Cf hence Vn(s;Cf,D) -Vn(s;Cf, D) < P (transmission failure; Cf)(Cf - Cf) < Cf — Cf, which completes our ar-gument. It then follows by (4.5) that 0 < Vn+1(s; Cf, D) - Vn+1(s; Cf, D)<Cf- Cf and by direct substitution Qn(s, y0, Cf, D) — Cf < Qn(s, y0, Cf, D) - Cf < 0 and therefore, j*(s; Cf, D) = 0 which implies jn(s; Cf, D) is decreasing in Cf. Part (ii): Assume transmission session is not aborted so j^(s;Cf,D) — 0 and the optimal power is y*(s;Cf.D) = argmin {Qn(s,y.Cf,D)}. By Proposition 4.6.1, it yell suffices to show that Qn(s,y,Cf,D) is submodular in the pairs (y,Cf), (y,D), and (y,n). Assume y > y, and define A(n, Cf,D) := Qn(s,y,Cf,D) - Qn(s,y,Cf,D) = H(y) - H(y) + (f(s,y) - f(s,y))Vn+1(s;Cf,D). Since f(s,y) - f(s,y) < 0 then clearly A(n, Cf,D) is decreasing in Cf, D, and n. Therefore, Qn(s,y,Cf,D) is submodular in the pairs (y,Cf) and (y,D), and (y,n). Thus, the proof is complete. • Proof of Corollary 4.3.1: Part (i): Simply note that at any time and in any state we can do no worse than aborting the session by using the remaining transmission slots with possibly zero transmission power and no additional delay costs. Part (ii): By Proposition 4.3.2, Vn(s) < Cf which implies Vn+i(s) < Cf. Then, Qn(s,y0,Cf,D) := H(y0) + D + f(s,y0)Vn+1(s:Cf,D) < H(y0) + D + f(s,y0)Cf. By the condition in the corollary, Qn(s,yo,Cf, D) < Cf and thus j*(s;Cf,D) — 0 for any (s,n,Cf,D). M Proof of Proposition 4.3.3: The proof is done by induction. The claim is clearly true for n — 0 and V$(s) = Cf. Now suppose the claim is true for n — 1. We show that the claim is also true for n as follows. By definition V^+i(s) = E {V^_i(s)}, ir(s) where E denotes the expectation operator under the probability distribution 7r(s) := 7T(.S) [<?s,i, f?s,2) • • • ,1S,L] (row s of the channel transition matrix Q). By induction assumption Vn-i(s) is monotonically decreasing in s hence, by the first order stochastic dominance property in Assumption 4.3.1, we have E {Vn-i(s)} > E {V„_i(i)} for s > s [10, vol.2]. 7r(s) . 7r(s) Hence, Vn+i(s) < Vn+\(s). Substituting in (4.6) gives Vn(s) < Vn(s) which completes the induction. • Proof of Theorem 7: Suppose j r*( s) — 0 for some (s,n), then from (4.6) there exists yo € y such that Qn(s,.yo) :— H(yo) + D + f(s,yo)Vn+i(s) < Cf. Assume s > s then by Proposition 4.3.3, Vn+i(s) < Vn+i(s). Since / is decreasing in s, this gives Qn(s,yo) < 90 Chapter 4. Optimal Power and Retransmission Control in ARQ Qn(s,yo) < Cf. This implies jn(s) = 0 and therefore, jn(s) is decreasing in s. For the next part, observe that the optimal allocated power for an active transmission is given by yn(s) — argmin {Qn(s,y)}. By Proposition 4.6.1, it suffices.to show that Qn(s,y) yeu is supermodular in (y,s). Assume y > y, so /(s,y) — f(s,y) < 0 is increasing in s by supermodularity of / . Also, by Proposition 4.3.3, Vn+i(s) is decreasing in s. Therefore, the variation H(y) — H(y) + [f(s, y) — f(s, y)}Vn+i(s) is increasing in s. Supermodularity of Qn{s,y) then follows. Hence, y,*(s) is decreasing in s and the proof is complete. • Proof of Theorem 8: It suffices to show that Q(m, Cf) := G(C}) + p{Cf)CP + (1 -p(Cf))W(m + 1) is submodular in (m,Cf). Suppose m > m then clearly W(rh + 1) < W(m + 1). Since p(Cf) is decreasing in Cf, the variation Q(rh,Cf) — Q(m,Cf) = (1— p(Cf))[W (rh+1) — W(rn+1)] is decreasing in Cf. Therefore, Q(m, Cf) is submodular in (m, C/) and by Proposition 4.6.1, Cj(m) is increasing in m. • 4.7 Conclus ions In this chapter, we have formulated a general adaptive ARQ scheme as a finite hori-zon MDP that achieves an optimal trade-off between transmission delay and packet drop penalty costs. We have shown that optimal power and retransmission control policies have monotone structures under certain sufficient conditions. Furthermore, we have shown monotonicity structures for optimal penalty cost allocation across the frame transmission sessions. The underlying monotone structures provide a qualitative un-derstanding of the optimal power and retransmission control schemes and facilitate the real-time implementation of the optimal policy with minimal complexities. 91 Chapter 4. Optimal Power and Retransmission Control in ARQ Table 4.1: Simulation Parameters Modulation : QPSK Fast fading model: Rayleigh flat-fading Doppler Shift fm: 33.33 Hz Quantized channel: 24 State Markov chain Duration of a time slot AT: 1 ms Symbols per frame: 25 Maximum Normalized Power ymax: 20 Power quantization: Uniform with 201 levels Power cost function H(y): y 2 . 5 Maximum retransmissions N: 8 Channel estimation: Perfect Simulation duration: 50000 A T (50 sec) 92 Chapter 4. Optimal Power and Retransmission Control in ARQ <3i CD T 3 o a fl rt cS rt O fl 1 I-I SH d cp fl CD > JO fl .2 cd N fl CO CN *^ r2 E3 Oi CO o CN CN st> CN CO' CN i—i oo rH -tf CO I -o rH Oi CN co rH o CU i—i LO oi 00 00 00 <U o Oi CN LO co r-o rH co LO "* CD <U o oo 00 CN lO CU O o CO CO rH I 1 <CC> O CN t -O CO o rH 00 rH O CN OS O <U O LO CO CO •* Oi CM <U oi 00 r-LO CO o CN <U oi OO CO CN CN CN CN cu 00 LO Oi co r-CN CU i ^ CN LO Oi O CO CN <CE> CO rH CO Oi Oi CU LO CN co 00 co CU LO CO CN rH r- b -cu o co co CD rH H CU - t f rH CO o> LO CU CO CN os •tf o CU CO LO o CO co co cu CN 00 CD O CD fl d rt, cj > c3 CD rfl o •ei~ & CO CD rO cc3 rQ o OH fl o fl ccj |H fl .2 HH >H O OH CO rH rH rH II .0004 o o O o o o o o o rH II .0015 ,0256 o o o o o © o o Oi II 6000 0260 2225 "*-> o o o o o o o o oo .0006 .0321 .2276 ,4623 •<-> o o o o o o o o II .0004 ,0348 .2366 ,4630 ,2546 o o o o o o o o CO II .0018 .0348 .2475 .4626 .2507 .0329 o o o o o o o o LO II .0012 .0397 .2587 .4578 .2342 .0302 .0016 o o o o o o o o II ,0035 .0433 ,2656 .4638 ,2278 ,0320 6000 • » > o o o o o o o o co II .0865 ,3117 .4824 ,2159 ,0313 ,0016 " » i o o o o o o o o CN II ,4526 ,4868 .1930 ,0259 ,0002 '5-1 o o o o o o o o rH II .4573 ,1571 .0176 ,0004 o o o o o o o o rH CN co "tf LO co I'- oo II II II II II II ll II. 93 Chapter 4. Optimal Power and Retransmission Control in ARQ 20 18 16 14 t 12 3 co •I 8 ° 6 4 2 0 Optimal Power vs. Penalty Cost . - n=1 • - • n=7 — n=8 0.5 1 1.5 Penalty Cost Cf x 10 (a) Optimal power vs. penalty cost for D = 1 and s = 5. Optimal Power 20 15 ^i, 10 « c Retransmission slot n o o Penalty Cost C f (b) Optimal power vs. penalty cost and retransmission slots for a = 5. Figure 4.1: Monotonicity structures of the optimal power allocation. 94 Chapter 4. Optimal Power and Retransmission Control in ARQ Optimal Power vs. Retransmission Slots (a) Opt imal Power vs. Retransmission Slots. Optimal Power vs. Channel State (b) Opt imal Power vs. Channel State. Figure 4.2: Optimal power allocation vs. channel states and retransmission slots with Cf = 10000 and D = 1. 95 Chapter 4. Optimal Power and Retransmission Control in ARQ 1 0.9 0.8 0.7 0.6 ,0.5 0.4 0.3 0.2 0.1 * Optimal Aborting Decision Abrt jting Trar I ismission " * * | B 1 i a * * * * * * * * * t : « :"* \ « : it •9 Figure 4.3: Optimal aborting decisions vs. retransmission slots with D = 500, Cf 1000, and s = 2. 4500 4000 3500 3000 ^ 2500 <n o ° 2000 1500 1000 Optimal _ _ Cost 500 0 Optimal Cost vs. Fixed Power Costs D = 1 ,C ( = 10000 Optimal cost = 701.145 2 4 6 10 12 14 16 18 20 Fixed Power Levels Figure 4.4: Comparison of the optimal policy to fixed power policies with Cf = 10000 and D = 1. 96 Chapter 4. Optimal Power and Retransmission Control in ARQ 7000 6000 1 5000 Optimal Cost vs. Fixed Power Costs 4000 [ 10 o O 3000 2000 Optimal Cost -tooof-2 4 6 8 10 12 14 16 18 20 Fixed Power Levels Figure 4.5: Comparison of the optimal policy to fixed power policies with Cf = 10000 and D = 500. 97 Chapter 4. Optimal Power and Retransmission Control in ARQ Optimal Power Consumption Pattern Time (a) A sample path of the optimal power consumption with Cf = 10000 and D = 1. Optimal Power Consumption vs. time Time (b) A sample path of the optimal power consumption with C} = 10000 and D = 500. Figure 4.6: Patterns of the optimal power consumption. 98 Chapter 5 Conclusions and Future Work 5.1 T h e m e o f T h e s i s In this thesis, we have demonstrated the effectiveness of the MDP framework in solving several resource allocation problems. More specifically, we have considered three prob-lems under the MDP framework regarding the optimal resource allocation in time-slotted wireless networks. As a common theme to each of these problems, we have established meaningful trade-offs between the involved system resources and the relevant perfor-mance measures through the underlying MDP formulations. Furthermore, in each case, we have shown that the optimal and feasible resource allocation strategies exhibit certain special structure. The importance of these structural results can be stated as follows: • These structural results often describe characteristics of the optimal resource allo-cation solutions for a large class of MDP parameters. Generally, due to existing complexities, the MDP parameters cannot be known exactly. Hence, the structural results are very helpful to gain qualitative understanding of the optimal solutions without perfect knowledge of the MDP parameters. • As the number of system states and the number of actions grow, the numerical dynamic programming methods used for solving MDPs become more computa-tionally complex. These structural results can be utilized to reduce the complexity of the numerical dynamic programming methods. The usefulness of the proposed structural results has been highlighted throughout this thesis. Indeed, the main analytical efforts of this thesis were to explore and to prove these structural results. One example of such efforts is presented in Appendix B where a detailed proof of certain monotonicity results in transmission rate control is presented. In general, we paid an special attention to the monotonicity and threshold-based results. We have generally observed that powerful mathematical tools such as supermodularity methods may be needed to prove the main monotonicity results presented in this thesis. 99 Chapter 5. Conclusions and Future Work These tools were employed cleverly to obtain insightful results that facilitate developing novel and computationally efficient resource allocation schemes in wireless networks. 5.2 S u m m a r y of W o r k Accompl i shed We have considered three specific problems in stochastic resource allocation for time-slotted wireless systems. In each problem we have proposed and proved some novel results that are summarized in the following. Furthermore, Figure 5.1 illustrates the different layers of the work accomplished in this thesis. Streaming I'rnhli-m SRAType: Schednliiii* & M C M Selection Formulation: Stability & Robustness Constraints Customizing M L W W F vi:i Stochastic Approximation Efficient Opportunistic SKA Policies Stochastic Resource Allocation (SRA) fl Rate Control PiuhUin SRA l\pc: MCS Sclccti.ni Formulation: Delay and Switching Cost Trade-Off Nil merical Solutions via Dynamic Programming Power and Retransmission Control Problem i SRAType: Power & Time-Slot Allocation. Formulation: Power, Delay & Drop Cost Trade-Off Explore & Provc\ (iZ) [ Monotonicity j (ZZ) Structures- / Efficient Threshold-Based „ SRA Policies / <Z\ Numerical Solutions \i:i Dwiamic Programming Efficient .Monotone SKA Policies Figure 5.1: Illustration of the work accomplished. • Streaming Problem: In the first problem, we considered optimal scheduling strategies for streaming users in HSDPA. We have presented a discrete event model 100 Chapter 5. Conclusions and Future Work for streaming users in HSDPA and have formulated meaningful QoS constraints for smooth streaming. We have then presented a joint opportunistic user-scheduling and M C M selection policy that guarantees the given QoS constraints (if feasible). The main idea was to start from the stable M-LWWF algorithm and reach a feasible solution. It was shown that the proposed resource allocation algorithm effectively enables smooth play-out for HSDPA users with relatively low computational com-plexity due to its opportunistic structure. • Transmission Rate Control Problem: In this problem, optimal adaptive mod-ulation and coding strategies with switching costs were investigated. The optimal trade-off between latency and switching costs was formulated as an infinite horizon discounted cost MDP. It was rigorously shown that under certain sufficient condi-tions, optimal MCS selection policies have monotone (threshold-based) structures. These sufficient conditions are characterized by the submodularity of the switching cost and the delay cost functions and the first order stochastic dominance condi-tions on the channel transition probabilities. The underlying monotone structures provides qualitative understanding of the optimal MCS allocation methods and enables the real-time implementation of the optimal policy with reduced compu-tational complexity. • Power and Retransmission Control Problem: In the third problem, an opti-mal trade-off between power, transmission delay and packet drop penalty costs in a ARQ system has been formulated as a finite horizon MDP. We have shown that under certain sufficient conditions, optimal power and retransmission control poli-cies have monotone structures. This way, we have provided rigorous and elegant analytical explanations for the previous experimental results on the effectiveness of power ramping in ARQ systems. Additionally, the monotonicity structures for optimal penalty cost allocation across the frame transmission sessions has been demonstrated. 5.3 Future W o r k Here, we present some directions and recommendations for future research. The main idea is to tackle more complex systems through structural results under generalized frameworks and/or developing effective suboptimal solutions. We outline two possible ideas for the extension of this research as follows: 101 Chapter 5. Conclusions and Future Work • Structural Results with Partially Observable States: One of the main limitations of the MDP formulations in this thesis is that we assume perfect knowledge of the involved system states. However, due to practical limitations, the system states may be only partially known, fn this case, the proper formulation of the resource allocation problem falls under the class of Partially Observable Markov Processes (POMDPs), where the information states are some conditional distri-butions (Baysian estimates). A natural extension of the works presented in this thesis is to explore the underlying structural results under the POMDP framework. The results in stochastic ordering of the information states may be used to explore the monotone properties of the optimal resource allocation policies with partially observed states. A word of caution is that POMDP structural results are often rare and very difficult to prove. To put these ideas in more perspective, in Appendix A, we have presented our work in optimal on/off control of a dedicated platform. A careful and detailed methodology for solving POMDP by reducing them to regular MDPs under new information states has been presented. Furthermore, optimality of the threshold-based on/off control policies is demonstrated (structural results). • Reinforcement Learning Algorithms: In realistic and complex environments dy-namic programming methods greatly suffer from the curse of dimensionality. Fur-thermore, in these complex environments acquiring the exact knowledge of the underlying system parameters is often impractical. The reinforcement learning methods and decentralized policy improvement algorithms may be good subopti-mal alternatives-to dynamic programming methods. This research can be extended by exploring structural results in these reinforcement learning methodologies and obtaining efficient and possibly decentralized resource allocation schemes. 102 Bibliography [1] G. TR25.848, "Physical Layer Aspects of UTRA HSDPA," http://www.3gpp.org, 2005. [2] H. Holma and A. Toskala, WCDMA for UMTS : Radio Access for Third Generation Mobile Communications, 2nd ed. J. Wiley, 2004. [3] P. J. A. Gutierrez, "Packet Scheduling and Quality of. Service in HSDPA," in Ph. D. Thesis, Aalborg University, October 2003. [4] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Athena Scientific, 2000. [5] M . L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Pro-gramming, 1st ed. Wiley-Interscience, 1994. [6] E. Altman, Constrained Markov Decision Processes (Stochastic Modeling), 1st ed. Chapman and Hall/CRC, 1999. [7] D. Djonin and V. Krishnamurthy, "Q-Learning Algorithms for Constrained Markov Decision Processes with Randomized Monotone Policies: Applications in Transmis-sion Control," IEEE Transactions Signal Processing, vol. 55, pp. 2170-2181, Oct. 2007. [8] E. Altman, Discrete-Event Control of Stochastic Networks: Multimodularity and Regularity, 1st ed. Springer-Verlag, 2004. [9] A. Lovejoy, "A Survey of Algorithmic Methods for Partially Observed Markov De-cision Processes," Annals of Operations Research, vol. 28, pp. 47-66, 1991. 10] M . J. Sobel and D. P. Heyman, Stochastic Models in Operations Research, 1st ed. Dover Publications, 2003. 103 Bibliography [11] X . Liu, E. K. P. Chong, and N. B. Shroff, "Opportunistic Transmission Scheduling with Resource-Sharing Constraint in Wireless Networks," IEEE Journal on Selected Areas in Communications, vol. 19, pp. 2053-2064, October 2001. [12] M . Andrews, K. Kumaran, K. Ramanan, S. Stolyar, R. Vijayakumar, and P. Whit-ing, "CDMA Data QoS Scheduling on the Forward Link with Variable Channel Conditions," Bell Labs Technical Memorandum, 2000. [13] A. Farrokh and V. Krishnamurthy, "Opportunistic Scheduling for Streaming Multi-media Users in High Speed Downlink Packet Access (HSDPA)," IEEE Transactions on Multimedia, vol. 8, pp. 844-855, Aug. 2006. [14] , "Optimal threshold policies for operation of a dedicated-platform with im-perfect state information - a pomdp framework," in Symbolic and Quantitative Ap-proaches to Reasoning with Uncertainty, ser. Lecture Notes in Computer Science, vol. 3571. Springer Berlin, 2005. [15] P. Bender, P. Black, M . Grob, R. Padovani, N. Sindhushayana, and A. Viterbi, "CDMA/HDR: A Bandwidth-Efficient High-Speed Wireless Data Service for No-madic Users," IEEE Communications Magazine, vol. 38, pp. 70-77, July 2000. [16] S. Nanda, K. Balachandran, and S. Kumar., "Adaptation Techniques in Wireless Packet Data Services ," IEEE Communications Magazine, vol. 38, pp. 54-64, Jan-uary 2000. [17] S. Parkvall, E. Englund, P. Malm, T. Hedberg, M . Persson, and J. Peisa, "WCDMA Evolved - High-Speed Packet-Data Services," Ericsson Review, pp. 56 - 65, 2004. [18] T. E. Kolding. K. I. Pedersen, J. Wigard, F. Frederiksen, and P. E. Mogensen. "High Speed Downlink Packet Access: WCDMA Evolution," IEEE Vehicular Technology Society News, pp. 4 - 10, February 2003. [19] I. Poole, "What Exactly is... HSDPA?" IEE Communications Engineer, vol. 2, pp. 1789-1801, June-July 2004. [20] T. E. Kolding, F. Frederiksen, and P. E. Mogensen, "Performance Aspects of WCDMA Systems with High Speed Downlink Packet Access (HSDPA)," in Pro-ceedings of IEEE 56th Vehicular Technology Conference, vol. 1, Vancouver, BC, Canada, 2002, pp. 477-81. 104 Bibliography [21] F. Fredriksen and T. E. Kolding, "Performance and Modeling of WCDMA/HSDPA Transmission/H-ARQ Schemes," in Procedings of 56th IEEE Vehicular Technology Conference VTC 2002-Fall, vol. 1, Vancouver, BC, Canada, September 2002, pp. 472-476. [22] T. J. Moulsley, "Performance of UMTS high speed downlink packet access for data streaming applications," in G Mobile Communication Technologies, 2002. Third International Conference on, no. 489, May 2002, pp. 302-307. [23] R. Love, A. Ghosh, R. Nikides, L. Jalloul, M . Cudak, and B. Classon, " High speed downlink packet access performance," in Vehicular Technology Conference, 2001. VTC 2001 Spring. IEEE VTS 53rd, vol. 3, May 2001, pp. 2234-2238. [24] W. S. Jeon, D. G. Jeong, and B. Kim, "Packet Scheduler for Mobile Internet Ser-vices using High Speed Downlink Packet Access," IEEE Transactions on Wireless Communications, vol. 3, pp. 1789-1801, September 2004. [25] T. E. Kolding, "Link and system performance aspects of proportional fair scheduling in WCDMA/HSDPA," in IEEE 58th Vehicular Technology Conference Proceedings, vol. 1, 2003, pp. 1717-1722. [26] Y . Ofuji, A. Morimoto, S. Abeta, and M . Sawahashi, " Comparison of packet scheduling algorithms focusing on user throughput in high speed downlink packet access," in Personal, Indoor and Mobile Radio Communications, 2002. The 13th IEEE International Symposium on, vol. 3, Sept. 2002, pp. 1462-1466. [27] A. Farrokh, F. Blomer, and V. Krishnamurthy, "A comparison of opportunistic scheduling algorithms for streaming media in high-speed downlink packet access (hsdpa)," in Interactive Multimedia and Next Generation Networks: Second Inter-national Workshop on Multimedia Interactive Protocols and Systems, MIPS 2004, . Grenoble, France, November 16-19, 2004- Proceedings, ser. Lecture Notes in Com-puter Science, vol. 3311. Springer, 2004. [28] A. Farrokh and V. Krishnamurthy, "Opportunistic scheduling for streaming users in high-speed downlink packet access (HSDPA)," in IEEE Globecom 2004 Conference, vol. 6, Dallas, Texas, 29 Nov.-3 Dec. 2004, pp. 4043 - 4047. 105 Bibliography [29] M . Andrews, K. Kumaran, K. Ramanan, A. Stolyar, and P. Whiting, "Providing Qulaity of Service over a Shared Wireless Link," IEEE Communications Magazine, pp. 150-154, February 2001. [30] F. Berggren and R. Janitti, "Asymptotically Fair Scheduling on Fading Channels," in Proceedings of IEEE 56th Vehicular Technology Conference , vol. 4, Vancouver, BC, Canada, 2002, pp. 1934-8. [31] S. Lu, V. Bharghavan, and R. Srikant, "Fair Scheduling in Wireless Packet Net-works," IEEE/ACM Transactions on Networking, vol. 7, pp. 473-489, Aug 1999. [32] W. S. Jeon and D. G. Jeong, "Comparison of Time Slot Allocation Strategies for C D M A / T D D Systems," Selected Areas in Communications, IEEE Journal on, vol. 18, pp. 1271-1278, July 2000. [33] X . Liu, E. K. P. Chong, and N. B. Shroff, "Transmission Scheduling for Efficient Wireless Resource Utilization with Minimum-Utility Guarantees,," in Proceedings of IEEE VTC Fall 2001, Atlantic City, USA, October 2001, pp. 824-828. [34] , "Joint Scheduling and Power-Allocation for Interference Management in Wire-less Networks,," in Proceedings. VTC 2002-Fall. 2002 IEEE 56th, vol. 3, September 2002, pp. 1892-1896. [35] R. S. Tupelly, J. Zhang, and E. K. P. Chong, "Opportunistic Scheduling for Stream-ing Video in Wireless Networks," in 2003 Conference on Information Science and Systems, The Johns Hopkins University, March 2003. [36] F. Berggren, S. L. Kim, R. Jantti, and J. Zander., "Joint power control and intracell scheduling of DS-CDMA nonreal time data," vol. 19, pp. 1860-1870, October 2001. [37] A. J. Goldsmith and S.-G. Chua, "Variable-Rate Variable-Power M Q A M for Fading Channels," IEEE Transactions on Communications, vol. 45, pp. 1218-1230, Oct. 1997. [38] , "Adaptive Coded Modulation for Fading Channels," IEEE Transactions on Communications, vol. 46, pp. 595-601, May 1998. [39] S. Vishwanath and A. J. Goldsmith, "Adaptive Turbo-Coding Modulation for Flat-Fading Channels," IEEE Transactions on Communications, vol. 51, pp. 964-972, June 2003. 106 Bibliography [40] J. Razavilar, K. J. R. Liu. and S. I. Marcus, "Jointly Optimized Bit-Rate/Delay Conrtrol Policy for Wireless Packet Networks with Fading Channels," IEEE Trans-actions on Communications, vol. 50, pp. 484-494, March 2002. [41] L. A. Johnston and V. Krishnamurthy, "Opportunistic File Transfer over, a Fading Channel - A POMDP Search Theory Formulation with Optimal Threshold Policies," IEEE Transactions on Wireless Communications, vol. 5, pp. 394-405, Feb. 2006. [42] G. del Angel and T. Fine, "Optimal Power and Retransmission Control Policies for Random Access Systems ," IEEE/ACM Transactions on Networking, vol. 12, pp. 1156- 1166, Dec. 2004. [43] G. Koole, Z. Liu, and R. Righter, "Optimal Transmission Policies for Noisy Chan-nels," Operations Research, vol. 49, pp. 892-899, 2001. [44] I. Bettesh and S. Shamai, "Optimal Power and Rate Control for Minimal Average Delay: The Single-User Case," IEEE Transactions on Information Theory, vol. 52, pp. 4115-4141, Sept. 2006. [45] S.-H. Hwang, "Effect of the Power Ramping under Retransmission in an ARQ for the WCDMA Downlink in One Path Rayleigh Fading Channel ," IEICE Transactions on Communications, vol. E.89-B, pp. 1024-1026, March 2006. [46] S. M . Ross, Introduction to Stochastic Dynamic Programming, 1st ed. Academic Press, 1983. [47] I. MacPhee and B. Jordan, "Optimal Search for a Moving Target," Probability in the Engineering and Information Sciences, vol. 9, pp. 159-182, 1995. [48] H. S. Wang and N. Moayeri, "Finite-State Markov Channel - A Useful Model for Radio Communication Channels," IEEE Transactions on Vehicular Technology, vol. .44, pp. 163-171, February 1995. [49] Q. Zhang and S. A. Kassam, "Finite-State Markov Model for Rayleigh Fading Chan-nels," IEEE Transactions on Communications, vol. 47, pp. 1688-1692, November 1999. [50] F. Blomer, "Scheduling for Streaming Users in HSDPA," in M.Sc Thesis, Munich University of Technology, Germany, Dec 2003. 107 Bibliography [51] G. Grimmet and D. Stirzaker, Probability and Random Processes, 3rd ed. Oxford, 2001. [52] G. C. Pflug, Optimization of Stochastic Models: The Interface Between Simulation and Optimization, 1st ed. Kluwer International Series in Engineering and Computer Science, 1996. [53] H. Kushner and G. Yin, Stochastic Approximation Algorithms and Applications. New York: Springer-Verlag, 1997. [54] T. S. Rappaport, Wireless Communications Principles and Practice, 1st ed. Pren-tice Hall, 1996. [55] Ericsson, Motorola, and Nokia, "Common HSDPA system simulation assump-tions," in 3GPP TSG RAN WG1 Meeting #15, Berlin, Germany, 2000, tSGRl#15(00)1094. [56] D. Goeckel, "Adaptive Coding for Time-Varying Channels using Outdated Fading Estimates," IEEE Transaction on Communications, vol. 47, pp. 844-855, June 1999. [57] A. Goldsmith, Wireless Communications, 1st ed. Cambridge University Press, 2005. [58] Zigbee, "Zigbee Alliance," http://www.Zigbee.org, 2005. [59] M . Y . Kitaev and R. F. Serfozo, " M / M / l Queues with Switching Costs and Hys-teretic Optimal Control," Operations Research, vol. 47, pp. 312-1132, March-April 1999. [60] F. V. Lu and R. F. Serfozo, " M / M / l Queueing Decision Processes with Monotone Hysteretic Optimal Policies," Operations Research, vol. 32, pp. 1116-1132, Sept.-Oct. 1984. [61] A. Gjendemsj0, G. E. 0ien, and H. Holm, "Optimal Power Control for Discrete-Rate Link Adaptation Schemes with Capacity-Approaching Coding," in Proceedings of IEEE Globecom 2005, vol. 6, 2005, pp. 3498-3502. [62] D. P. Bertsekas and R. Gallager, Data Networks, 2nd ed. Prentice-Hall, 1992. [63] D. M . Topkis, Supermodularity and Complementarity, 1st ed. Princeton University Press, 1998. 108 Bibliography [64] G. TR25.214, "Physical Layer Procedure," http://www.3gpp.org, 2005. [65] J. E. Smith and K. F. McCardel, "Structural Properties of Stochastic Dynamic Programs," Operations Research, vol. 50, pp. 796-809, Sept. 2002. [66] R. R. Weber, "Optimal Search for a Randomly Moving Object," Journal of Applied Probability, vol. 23, pp. 708-717, 1986. [67] S. J. Benkoski, M . G. Monticino, and J. R. Weisinger, "A Survey of the Search Theory Literature," Naval Research Logistics, vol. 38, pp. 469-494, 1991. [68] L. A. Johnston and V. Krishnamurthy, "Optimality of Threshold Transmission Poli-cies in Gilbert Elliott Fading Channels," in IEEE International Conference on Com-munications, ICC '03„ vol. 2, May 2003, pp. 1233-1237. [69] D. Zhang and K. M . Wasserman, "Energy efficient Data Communications over Fading Channels," Proceedings of IEEE Wireless Communications and Network-ing Conference, pp. 986-991, 2000. 109 Appendix A Optimal Scheduling of a Dedicated-Platform In this Appendix, we consider the general problem of optimal stochastic control of a dedicated-platform that processes one primary function or task (target-task). The dedicated-platform has two modes of action at each period of time: it can attempt to process the target-task at the given period, or suspend the target-task for later comple-tion. We formulate the optimal trade-off between the processing cost and the latency in completion of the target-task as a Partially Observable Markov Decision Process (POMDP). By reformulating this POMDP as a Markovian search problem, we prove that the optimal control policies are threshold in nature. Threshold policies are compu-tationally efficient and inexpensive to implement in real time systems. Numerical results demonstrate the effectiveness of these threshold based operating algorithms as compared to non-optimal heuristic algorithms. A . l In t roduct ion Many applications in manufacturing, personal telecommunications and defense involve a dedicated-platform that utilizes the system resources in order to process or execute one primary function or task (target-task). Conservation of the system resources and minimization of the latency in completing the target-task are important issues in efficient operation of this dedicated-platform, fn a real-time system, the target-task may become stochastically active and inactive (a task must be active in order to be processed suc-cessfully). The dedicated-platform must then adapt its operation to the dynamic of the target-task: The dedicated-platform attempts to process (utilizes the system resource) only when the target-task is active and hence the task can be completed with a non-zero probability. In this work, we consider the problem of optimally operating a dedicated-platform when the dynamic of the target-task is not directly observed. The only information avail-110 Appendix A. Optimal Scheduling of a Dedicated-Platform able to the controller is whether the task is successfully processed or not. Associated with each processing attempt is a cost that represents limited resources of the system, independent of whether the task is active or inactive. On the other hand, the latency to perform a successful processing also incurs a cost representing the task completion delay. Therefore, there is a strong motivation to devise a novel algorithm to attempt or suspend processing to minimize the average cost up to the completion of the target-task (successful processing). We present a computationally efficient control algorithm that achieves an optimal trade-off between the processing cost and the latency in completing the target task. Main results: The main results of this work are organized as follows: (i) In Section 2, we introduce a stochastic model for the operation of the dedicated-platform. We assume the controller decisions as whether to attempt or suspend the task are made at discrete times and the cost at each discrete time is only dependent on the action taken for that time. The state of the target-task, e.g. whether the target-task is present or not (active or inactive), is assumed to be a two-state Markov chain. Since we assume the task dynamic is not directly observed, the state of the target-task is described by a Hidden Markov model. (ii) In Section 3, we use the model in Section 2 to formulate the dedicated-platform control problem as an optimal search problem with POMDP framework. The optimal solutions to the Markovian search problems are generally complex and computationally intractable. However, there are special structures of the search problems whose opti-mal solutions are shown to be threshold in nature and hence efficiently computable. We show that in our case, the control of the dedicated-platform can be formulated as a search problem with a special structure described in [47], [46] for which the optimal policy is threshold in nature. (iii) In Section 4, we adapt the results of the Markovian search problem in [47] to present the optimal threshold policies for the control of the dedicated-platform. We show that depending on the system parameters, the operating control systems can be categorized into three different classes. The optimal policy for each system has a different threshold level. (iv) In section 5, we use numerical examples to demonstrate the performance improve-ment that can be obtained by applying the optimal threshold policies as compared to heuristic algorithms. Literature Review: Several papers consider the search problem formulation used in this work. Ross [46] first conjectured the existence of threshold policies for this search' 111 Appendix A. Optimal Scheduling of a Dedicated-Platform problem. Weber [66] solves the continuous time version of the problem. Recently, MacPhee and Jordan [47] prove the Ross' conjecture for an overwhelming proportion of transition rules and system parameters. Also, a useful overview of general search problems is given by Benkoski, Monticino and Weisinger [67]. In our work, we mainly rely on [47] to derive the optimal control algorithm for the dedicated-machine. A similar formulation is also used in [68] to find optimal retransmission policies in the Gilbert-Elliot fading channels. A . 2 Sys tem M o d e l In this section a stochastic model is presented to describe the operation dynamic of our dedicated-platform. We outline our model by the following five elements: (i) Time: The time axis is divided into slots of equal duration denoted by A T . By convention, discrete time k, k 6 Z+ is the the time interval [A;AT, (k + 1)AT), where Z+ is the set of non-negative integers. We assume that the attempt or suspend decisions by the controller are made in discrete times k 6 Z+. (ii) Markovian target-task: Assume a dynamic where the target-task is active or inactive based on a two state Markov chain. Note that the task must be active in order to be processed successfully. Define the target-task state space as: S = {Active = 1, Inactive = 2}. (A.l) Also define sk 6 S as: Sfc = State of the target-task at discrete time k. (A.2) We assume sk is a two-state irreducible Markov chain with the transition matrix A, where: r a 1 — a l - h h A = (A.3) Here, a < 1 is the probability that an active task remains in the "Active" state and h < 1 is the probability that an inactive task remains in the "inactive" state, (iii) Actions: At each discrete time k € Z+, the controller makes a decision as whether to attempt to process or suspend the task. Define the action space U as: U — {Attempt to process = At , Suspend processing = Su}. (A.4) 112 Appendix A. Optimal Scheduling of a Dedicated-Platform Also, define the action uk E U = {At, Su}: Uk = Action taken by the controller at time k. (A.5) (iv) Observations: The state of the target-task is not directly observed and hence it is described by a Hidden Markov Model (HMM). At each time, the controller can only observe whether the completion of the target-task is successful or not. For example, if for a given discrete time the target-task is inactive and the controller attempts to process, the controller will observe at the next discrete time that the completion is not successful. Define the observation space Y: Y = {Completion Affirmed = AFF, Completion Not Affirmed = NAF}, (A.6) and define the observation yk EY: yk = Observation by the controller at time A:. (A.7) If the target-task is inactive at time k (i.e., sk = 2) or if there is no attempt to process (i.e., Uk = Su) then yk = NAF. On the other hand, if at time k the target-task is "Active" (sfe — 1) and the attempt is made to process the task (uk — At), then the target-task is successfully completed with probability pa which represents the precision of the dedicated-machine: Pd = Probability of successful processing of an active task. (A.8) We then have: P(y* = NAF\sk = 1 Uk = At) = 1 ~Pd = NAF\sk = 2 Uk = At) = 1 = NAF\sk = 1 Uk = Su) = 1 = NAF\sk = 2, Uk - S u ) = 1 (A.9) (v) Cost: We assume the cost at each discrete time k E Z+ only depends on the 113 Appendix A. Optimal Scheduling of a Dedicated-Platform action uk G U. In particular, associated with each processing (uk = At) a cost c\ incurs (independent of the current state or observation). c\ represents the cost in utilizing the dedicated-platform and the limitations in the system resources. Also, each suspension of processing incurs a cost c<i which represents the cost associated with the latency in completing the target-task. Assume g : U —* {ci, c2} is a function that maps the action space to the corresponding costs. We then have: Cl=g(At), c2 = g(Su). (A.10) A.3 Formulation as a Markovian Search Problem - A POMDP Framework In this section we formulate the dedicated-platform control problem as a special Marko-vian search problem studied in [47], [46]. This search problem is proven to have optimal solutions that are threshold in nature and hence efficiently computable. The Markovian search problem described in [47], [46] is as follows: Markovian search problem : Consider an object that moves randomly between two sites. The movement is modeled with a two-state Markov chain. One of the sites is searched at each discrete times k G Z+ till the object is found. Associated with each search of site i G {1,2} there is a cost Ci and an overlook probability a.i (c<i is the prob-ability that the object is not found while it is in the searched box i). The aim is to find the object with minimum average cost. It is roughly seen that the structure of the above Markovian search problem fits into the framework that we have so far developed for finding an optimal control policy of the dedicated-platform. The movement of the object corresponds to the activation and deactivation of the target-task. Object in site 1 corresponds to an active target-task and object in site 2 corresponds to an inactive target-task. Searching site 1 corresponds to processing the task and searching site 2 corresponds to suspending the task. Finding the object corresponds to the completion of the target-task. Also, the overlook probabilities are: a i = l - P d , " 2 = 1, (A- 1 1 ) where precision pd is defined in (A.8). Note that a2 = 1 since if we suspend processing, almost surely the task will not be completed. At this point, we formulate the optimal control problem as a POMDP and derive 114 Appendix A. Optimal Scheduling of a Dedicated-Platform its dynamic programming equations. We observe that the optimality equation of this POMDP has the exact same structure as a Markovian search problem in [47]. Let Ik be the information available to the controller at discrete time k € Z+. This information consists of observations up to time k and actions up to time k — 1: h = (yi, • • •, yk, u i , U k - i ) , h = yi, (A.12) where yk and Uk are observations and actions defined in (A.7) and (A.S), respectively. Since upon the completion of the target-task, the control is terminated, throughout the following analysis we assume yk — NAF for 0 < k < N, where N is the stopping time denoting the discrete-time that the target-task is completed. At each discrete time k the controller takes an action based on the available information Ik- However, since Ik is growing in dimension, we summarize Ik in quantities denoted by sufficient statistics which are of smaller dimension than Ik and yet embodies all of the essential content of Ik as far as the control is concerned [4]. By checking the conditions in [4], it can be easily shown that a sufficient statistic can be given by the conditional probability distribution PSk\ik of the target-task state Sk, given the information vector Ik- PSk\ik then summarizes all the information available to the controller at time k. We have: PSk\ik = \pk QkY, . (A.13) where: P k = P(sk = l\Ik), qk = P(s f e = 2\Ik) (A.14) where P denotes the probability measure, pk is the probability that the target-task is in the "Active" state at time k based all on the available information (e.g. knowing that the task is not yet completed). Since pk + qk = 1, we can further reduce the dimension of the sufficient statistics by choosing pk as the information state, pk then contains all the information relevant to the platform control at time k. To complete the formulation of the problem, we need to describe the evolution of the information state.. Let <fi be a function that describe the evolution of the information state. We then have: 4>(pk, uk) = Pk+i = P(sfc+i = l|7fc+i), (A.15) where Uk 6 U — {At , Su} is the action at time k. By expanding the R.H.S in (A.15) 115 Appendix A. Optimal Scheduling of a Dedicated-Platform and applying basic probability rules, we have: <j>{pk, At) = P ( s f c + 1 = 11 uk = At , yk+1 = N A F , Ik) = p( sfc+i = 1 . 2/fc+i = NAF | »fc = At , Ik) P(yk+1 = NAF | u f c = At , Ik) = P(l/fc+i = NAF | s f c + 1 = 1, uk = At , J f c )P(s f c + 1 = 1 ] uk = A t , Jfc) P(2/fc+i = NAF | itfe = A t , Ik) 1 ' j Evaluate the R.H.S in (A. 16) by conditioning on sk: .( i f l a(l + (1 - h)(l -pk) 4>{pk,At) = r — r (A. 17) (1 ~Pd)Pk + (1 -Vk) where a and h are the target-task transition probabilities defined in (A.3). Similar calculations give the expression for the updated state if we suspend processing: <f>(pk, Su) = P(sk+l = 11 uk = Su, yk+1 = N A F , Ik) (A.18) = (a + h- l)pk + l - h Equations (A.17) and (A.18) collectively describe the evolution of the information state Ph- Now, we formulate the optimality equation and show it has the same structure as the search problem in [47]. Let u = {ui, U2, • •.} be a sequence of the actions uk 6 U taken by the controller at discrete times k 6 Z+. Define U ( n ) = {un, u n + i , . . . } . Let V(pi;u) to be the average cost of completing the target-task using the policy u with initial information state p\. Clearly, V(-,u) satisfies: V(p!, u) = g(Ul) + V(<f>(pk, uky, u ( 2 ) )P(y 2 = N A F | U l , h) (A.19) where g(-), defined in (A.10), is the cost associated with each action and 4>(pk,Uk) is given by (A.17) and (A.18). The term P(y2 — NAF | u\, J\) is needed since it is the probability that the completion of the task is not affirmed and controlling decisions will still continue at time 2. Now, let V(j>\) denote the minimum average cost starting with initial state p\. Then: V(pi) = inf V(pi;u), (A.20) 116 Appendix A. Optimal Scheduling of a Dedicated-Platform and V(pi) satisfies the Bellman dynamic programming functional equation: V(pi) = min{ci + V{cf>{Pl, At))((l - P d ) P l + 1 - P l ) ; c2 + V(<p(Pl, Su))}, (A.21) where ci and c 2 are the costs given in (A. 10). It is well known [46] that this functional equation has a unique bounded solution and furthermore V is concave in p. The Bellman equation in (A.21) has the exact same structure as the the optimality equation in [47] with «i = 1—pd and « 2 = 1. We therefore conclude that the problem of optimal platform control has been formulated as a Markovian search problem described in [47]. A .4 O p t i m a l Pol ic ies for the Opera t ion of the Dedica ted-P la t fo rm Generally, a value iteration algorithm as described in [9] can be used to solve the Bell-man equation in (A.21). However, this method is often computationally complex and inefficient. In this section we obtain the solution to the optimality equation in (A.21) by using the special structure of the equivalent Markovian search problem described in [47]. For this Markoivan search problem Ross [46] conjectured the existence of op-timal threshold policies. In 1995, MacPhee and Jordan proved this conjecture for an overwhelming proportion of the possible transition matrices, search cost and overlook probabilities. By adapting the results in [47] we show that the optimal policy for the dedicated-platform control is a threshold policy: The controller attempts to process if the probability of the target-task being active is greater or equal than a, certain threshold level, otherwise processing is suspended. The following theorem states the existence of an optimal threshold policy for the dedicated-platform control problem: Theorem 9 Let P k be the state information at time k in the dedicated-platform control problem. Then there exists a threshold value, 5, such that for any k £ Z+. if Pk > 5, the optimal action at time k is to attempt to process, and if pk < 5, the optimal action at time k is to suspend processing. Proof: By observing the corresponding optimality equations, we established in Section 3 that our platform control problem is equivalent to a special form of a two-state Marko-vian search in [47]. The reader is then referred to [47] to see the details of the proof for 117 I Appendix A. Optimal Scheduling of a Dedicated-Platform the equivalent search problem. It is shown in [47] that depending on the system parameters {a, h, c\, C2,pd}, there are three different threshold levels ,5. In particular, whether which threshold level is applicable depends explicitly on the fixed points of the evolution equations in (A. 17) and (A.18). Let Pj\t be the fixed point of c/>(-, At) in equation (A.17) and Ps u be the fixed point of (/)(-, Su) in equation (A. 18). P^t and Ps u are then given by: l-p, where we have defined: P 2 - ((1 - pd)q + h) - - pd)a + hf - 4(1 - p~^ P A T = . (A .22) l - / i PSu = fi = a + h - l . (A.23) /i is an eigenvalue of the target-task transition matrix, A (defined in (A.3)), and hence can be regarded as a measure of the memory in the target-task state transitions (e.g. if H — 0 then the target-task state transitions are i.i.d). Also, to express the main result of this section we need to define the following mappings of the information state by two different consecutive actions (i.e. {At, Su} and {Su, At}): PAt,Su(-) = <M0(-,At),Su)) Psu,At(-) = <K<K',Su),At)), (A.24) where </>(•, u € {At, Su}) is defined in (A.15) and is given by equations (A.17) and (A.18). The following proposition states the main result of [47] adapted to our platform control problem: Proposition A.4.1 The platform control system is categorized into three different classes - Class 1, Class 2 and 3. Each class has a different threshold value Si, S2 and S3. Class membership rules are as follows: Class 1 Class 2 Class 3 Si < PAL (A.25) PM <S2< PSU and Ps^Atih) < S2 < PAt,Su{S2) PAt <Sz< PSu and {S3 > PAt,su(<53) or S3 < Psu,At(S3)}, 118 Appendix A. Optimal Scheduling of a Dedicated-Platform where the fixed points PAt md Psu are given in (A.22) and PAt,Su end Psu,At o,re defined in (A.24)- The Threshold levels for class 1 and class 2 are given by: = ( 1 : " ) ( C 1 - C 2 ) . (A.26) (1 - M ) C I _ (1 - /Q(ci -pc2) f . s (1 - jLt)(Cl +c 2) where ii is defined in (A.23). The threshold level for Class 3, 5$, cannot be obtained in closed form but as explained in ( [47]), £3 can be numerically computed by applying multiple compositions of (/>(•, At) and <p(-, Su) [68]. The following is an important conceptual consequence of the above proposition: Corollary A.4.1 Each class is uniquely determined by the system parameters {a,h,ci,C2,Pd}- Furthermore, the system belongs to one and only one of the classes. At this point, we have obtained the optimal threshold policies for our dedicated-platform control problem. By analyzing the properties of these policies [68], we observe that for Class 1 systems, the optimal control policy is to suspend processing till the information state pk exceeds the threshold 5\. After that, the controller successively attempts to process up to the completion of the target-task. This is because in Class 1, once pk exceeds the threshold, the updated information state will be still above the threshold S\ after each attempt. In the case of Class 2 and 3, the optimal policy may have a more complex form, i.e., the optimal actions may vary between successive attempts and suspensions. In the next section we justify our results by numerical examples to demonstrate the improvement in the performance that can be obtained by the optimal threshold policies as compared to heuristic algorithms. A . 5 N u m e r i c a l Examples The purpose of this section is to evaluate by numerical experiments the performance of the optimal threshold policy in terms of the incurred average cost up to the completion of the target-task. We consider three different scenarios, whereby different costs and different processing precisions pd are selected. Also, we examine three different control policies: optimal threshold policy, persistent attempt and Suspend-M. The persistent attempt is the most aggressive method where the controller chooses to attempt to process at each discrete time until the target-task is completed. Suspend-M denotes a method 119 Appendix A. Optimal Scheduling of a Dedicated-Platform that controller waits for M discrete-time after an unsuccessful attempt before attempting to process the target-task again [69]. The number M generally increases with the state transition memory as described in [69]. We assume the stationary distribution of the target-task states is TT — [1/2 1/2] so in a long term the target-task can be active or inactive with equal probabilities. The stationary distribution of matrix A, defined in (A.3), is simply calculated as JEJ]- Therefore, we have: - / A 1-/J.i 1 - a 1-h 1 /.t 1 — fi 2 (A.28) where JJ, = a + h — 1. The above gives a = h which is also obvious from the symmetry of our assumption. The results for c\ — 4, c\ — 1, and pd — 0.7 are shown in Fig A . l . It is clear that the threshold policy gives the best performance. In the case that the processing precision increases to pa = 0.9, as shown in Fig A.2, the Suspend-M policy gives a better performance. However, the threshold policy still gives the lowest average cost. By reducing the cost of processing attempt to c\ — 2, as shown in Fig A.3, the persistent attempt policy gives a close performance to the optimal policy. In all cases when the memory, p, increases, the Suspend-M policy shows a degraded performance while the persistent attempt policy shows much less variations. A . 6 Conclusions We have derived stochastic control algorithms to achieve the optimal trade-off between the processing cost and the latency in completing the target-task by a dedicated-platform. The structural results in Makovian target search problem has been used to derive opti-mal threshold control policies. The resulting threshold policies are efficiently computable and easy to implement. We have shown by numerical examples that these polices out-perform non-optimal heuristic algorithms in terms of the average cost of the target-task completion. 120 Appendix A. Optimal Scheduling of a Dedicated-Platform 11 10 o O Optimal Threshold Suspend-M Persistent Attempt 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Target-Task State Transition Memoryu 0.8 0.9 Figure A . l : Average cost vs. Target-task transition memory: a = h, c\ — 4, c\ = 1, Pd = 0.7. , 121 Appendix A. Optimal Scheduling of a Dedicated-Platform 5.5 4.5 3.5 2.5 Optimal Threshold — Suspend-M Persistent Attempt 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Target-Task State Transition Memory^ Figure A.2: Average cost vs. Target-task transition memory: a = h, c\ = 4, c\ = 1, Pd = 0-9. 122 Appendix A. Optimal Scheduling of a Dedicated-Platform o O 0) 3 5 Optimal Threshold Suspend-M Persistent Attempt 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Target-Task State Transition Memoryu 0.8 0.9 Figure A.3: Average cost vs. Target-task transition memory: a — h, c\ — 2, c\ = 1, Pd = 0.7. 123 Appendix B Proof of Theorem 5 In this Appendix we provide a proof for Theorem 5 which in effect is a proof of a-discounted infinite horizon formulation with adjacent channel transitions. B . l M a i n P r o o f Denote by u*(-) : X —> M an optimal MCS selection policy described in (3.5) and by V*(-) : X —> 01 the associated value function. The proof is presented by contradiction: suppose an optimal policy u*(s,j) that is non-increasing in s. We will show that an increasing policy must exist with a less or equal cost. First, note that Proposition 3.4.2 (ii) implies that u*(s, 1) = 2 =>• u*(s, 2) — 2 and u*(s, 2) — 1 u*(s, 1) = 1. Hence u(s,j) is increasing in j , and consequently, it is enough to show that an optimal policy u*(s,j) exists that is increasing in s. Here, without loss of generality, assume j = 1 is fixed and we show that u*(s, 1) is increasing in s. The case of j — 2 can be treated with a similar argument. Define ro € § as ro — min {s € § : (his) < d\(s)} where di(s) are delay costs. Note that based on Assumption 5.1, ci2(s) < d\(s) for all s > TQ and d2(s) > di(s) for s < TQ. Define forte region i, denoted by It, as the collection of states for which the transmission mode i gives the minimum delay, i.e., 7\ = {s € S : s < ro} and J2 — {s E S : s > ro}. Clearly, Proposition 3.4.2 (ii) implies that for any optimal policy /i*(s,j) /z*(s, 1) = 1, s e J i , /i*(s,2) = 2, 5 6 ^ 2 . (B.l) Assume an optimal policy u*(s,j) such that u*(s,l) is not increasing in s. It will be shown that the cost associated with u*(s,j) can be improved or replicated by a monotone policy. Without loss of generality, assume l ^ l > 1 since otherwise u*(s,l) must be monotone by (B.l) (here \A\ denotes the cardinality of set A). Let m i , 777,2, and 7713 be some integers. It is easy to see that by adjusting integers mi > 0, 7712 > 0, and 7712 > 0 any optimal non-monotone policy u*(s,l) for s E 3\ can be described, starting from 124 Appendix B. Proof of Theorem 5 state ro, as follows. ro — 1 < s < r 0 + mi, ro + mi < s < r\ := ro + mi + rn2 r\ < s < r-i := ro + m\ + m 2 + 77x3 s = r 2 , (if r 2 < JV), s > r 2 . 2 1 2 x where x denote the undetermined part of the policy ("don't care"), and TV is the maxi-mum channel state. Figures B.l(a) and B.l(b) show the portion of the policy constructed by (B.l) and (B.2) with TV > r 2 and TV — r 2 — 1, respectively. Here, the circles in row j and column s of the grid gives the action (selected transmission mode) in state (s,j). States associated with action 1 are shown by blank circles and states associated with action 2 are shown by filled circles. The hatched circles represent the states for which the optimal action is not determined by (B.l) and (B.2). The vertical lines in the fig-ures, show the border of the forte regions. It should be clear that the descriptions in (B.l) and (B.2) are valid for any non-monotone optimal policy for some integers mi > 0 and m 2 > 0, and rn3 > 0. This is as much construction as we need to complete the contradiction proof. We show that policies described in (B.l) and (B.2) and shown in Figure B . l can be improved by monotone policies. First, we derive conditions on delay variations for policies described by (B.l) and (B.2). Both cases in Figures B.l(a) and B.l(b) can be treated with the same argument as follows. Let 7*1 := ro + mi-t-m 2 and 2 := {ri,r\ + l,... , r 2 — 1}. Assuming initial channel state r i , define random variable n to be the number of time-slots that the Markovian channel remains in set B before the first departure. Starting from initial channel state r i , let Ai denote the event the that channel first visits state r i — 1 (departure from left). Also, starting from initial channel state r i , let Ar denote the event that the channel first visits state r 2 (departure from right). Let pr := P(Ar) and pi := P(.A/) with the understanding that if r 2 > TV (Figure B.l(b)) then Ar = 0 and pr — 0. Note that since only adjacent channel transitions are allowed, the first departure must be in either one of the states r\ — 1 or r 2 . The optimal cost in state (rj, 1) and (ri,2) is then given by V*(r1,l).= plE(an\Al)V*(r1-l,l)+ prE(ah\Ar)V*(r2,1) + E(J s ( r , 1, n)). (B.2) 125 Appendix B. Proof of Theorem 5 j = 2 j = l j = 2 j = l ro ro + mi T\ (a) General form of non-monotone optimal policies with ri < N. i'Q ro + mi r i (b) General form of non-monotone optimal policies with 1-2 > N. T2 N Figure B . l : A general form of non-monotone optimal transmission mode selection poli-cies as described by (B.l) and (B.2): Blank and filled circles represent transmission mode-1 and transmission mode-2, respectively. Hatched circles represent undetermined transmission modes. r ( r i , 2 ) = p , E ( a f i | ^ ) r ( r i - 1,2)+ pr E(ah\Ar) V*(r2, 2) + E ( J s ( n , 2, n)), (B.3) where E denotes expectation over h and Js ( r i , j , h) denotes the conditional expected cost over all trips of length h. started from state (r\.j) and before any departure from set ¥>. Since by (B.2), p*(s,j) — j for s 6 B, so the switching cost is zero and the cost-per-stage is given by dj(sk) for sk G 23, and we have ^ s ( n , j , n ) = E \ yZakg(sk,j,j)\n Sfce23 U=o 126 Appendix B. Proof of Theorem 5 n = i ' E a ' d j W l f i - 5o = r i . (B.4) S k & U=o J By (B.2), / z * ( n - 1,1) = 2, and ,j*(r2, 1) = 2, so V > i - l , l ) = C s-rV * ( r i - l , 2 ) , V*(r2,l) = Cs + V*(r2,2). (B.5) Subtracting equations in (B.2) and substituting from (B.5) gives V\rul)-V%ru2)=plE(aA\Ai)Cs+prE(afi\Ar)C8-r • E ( J B ( r i , l , f i ) ) - E(J s (r i ,2 ,n)) - E (a"C s + J s ( n , l , 7 i ) - Mn,2,n)) . (B.6) Also, since u*(ri, 1) = 1, we need V*(ri, 1) — V*(ri , 2) < C s , otherwise, in state (ri, 1), it is not worse to pay switching cost Cs and select the transmission mode 2 which is a contradiction. It follows by (B.6) that E (ahCs + J B ( n , 1, n) - Jv(n, 2, n)) < Cs. (B.7) Therefore, there exists some integer value n — TLQ such that an°Cs + Mn, 1,n0) - Mn,2,n0) < C s . (B.8) On the other hand, (B.4) gives J s(7-i , l ,n 0)-Ja}(ri,2,no)) = V\f2«k(di(sk)-d2(sk))\. (B.9) S A : 6 U=o J Let A(s) := di(s) - d2(s). Assumption 5.1 then implies A(s) > A(ri) > 0, Vs € 03. It follows by (B.9) that Mn,l,n0) - Mn,2,nQ) > A ( n ) ^ - . (B.10) 1 — a 127 Appendix B. Proof of Theorem 5 By (B.8) and (B.10), we have 1 - an° A ( n ) - < (1 - an°)Cs. I — a The dependence on no remarkably vanishes, hence the following local necessary condition is obtained A ( r x ) < (1 - a)Cs. (B.ll) Therefore, for any non-monotone pattern described by (B.2), we have a necessary con-dition given by (B.ll) on the delay variation in state r\ (smallest channel state in set We repeat the above argument for a single element set B ' — {n — 1} with the optimal action p*(r\ — 1,1) = 2. Assuming initial channel state r\ — 1, define random variable fh to be the first departure time of the Markovian channel from set B ' . As defined in (B.4), J!%(r, l ,m) is then the conditional expected cost over all trips of length fh inside B ' before any departure. Furthermore, observe that Proposition 3.4.l(i) gives V * ( r i - l , l ) < C a + V * ( r i - l , 2 ) , V*(r2,l) <CS + V*(r2,2). (B.12) Repeating similar arguments in (B.2)-(B.6) and using (B.12) instead of (B.5) gives V*(n - 1,1) - ^*(ri - 1,2) < E (a™Cs + Mn - 1,1, m) - J B ( n - 1,2, fh)) , (B.13) where E denotes expectation over fh. By Assumption 5.1 and (B. l l ) , we have A ( r i - l ) < A ( r i ) < (1 — a)Cs, so for any given value fh = m, we have by (B.4) a^s+Mn - 1,1, m) - Mn ~ 1, 2, m) rn = amCs + Y,ak^-1) k=0 m <amCs + ^akA(n) k=0 m < amCs + ^2ak(l- a)Cs = Cs (B.14) fc=o 128 Appendix B. Proof of Theon It then follows from (B.13) that V*(ri-lA)<V*(n-l,2) + Cs But this means p*(r\ - 1,1) = 1 which is a contradiction. Consequently, a monotone pattern for p*(s, 1) must exist. Note that we have also obtained a sufficient condition for u*(s',l) — 1 for any channel state s' 6 S. In fact, if the condition in (B.l l ) is true in channel state s', i.e., A(s') < (1 - a)Cs, then repeating the above line of proof in (B.12)-(B.14) gives p*(s',l) = 1. Therefore, the necessary condition for u*(s',l) = 2 is that A(s') > (1 - a)C8, which completes the proof. • 129 

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0302179/manifest

Comment

Related Items