UBC Theses and Dissertations


Cross-layer resource scheduling for wireless systems over correlated fading channels Karmokar, Ashok Kumar 2007


Full Text

CROSS-LAYER RESOURCE SCHEDULING FOR WIRELESS SYSTEMS OVER CORRELATED FADING CHANNELS

by

ASHOK KUMAR KARMOKAR

B.Sc. Engg., Bangladesh University of Engineering and Technology, 1998
M.Sc. Engg., Bangladesh University of Engineering and Technology, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

August 2007

© Ashok Kumar Karmokar, 2007

Abstract

Packet scheduling is very important in future wireless networks due to their limited resources and increasing demand for high data rates. Time-varying incoming traffic and channel gains make the scheduling decision very challenging. Due to the inherent dynamic nature of packet scheduling, we formulate the scheduling problem as a Markov decision process. We consider a single user communicating over a correlated fading channel. The incoming traffic is randomly varying and stored in a finite buffer before transmission. We formulate the scheduling problem from a cross-layer viewpoint, by considering both the physical and the data link layer optimization objectives. Our objective is to maximize throughput, and minimize power, buffering delay, packet overflow and bit error rate. First, we present optimal and suboptimal packet scheduling over Rayleigh fading channels. We analyze the problem using both the information-theoretic and the multilevel modulation transmission models. Two different ways of computing optimal policies are given and their benefits and drawbacks are discussed. The performance of the suboptimal scheduler is compared with that of the optimal scheduler. We also show that, in addition to employing a diversity technique at the receiver, adapting packet transmission across different layers lets the scheduler save a significant amount of power.
The rate adaptation problem is then extended to schemes with a multiple-input multiple-output channel and a selective-repeat automatic repeat request protocol. Second, we develop several adaptation techniques for type-I hybrid automatic repeat request schemes. We investigate the rate and power adaptation problem for two cases: when both perfect channel state information and observation feedback are known, and when only the latter is known. We analyze the adaptation problem for both flat-fading and frequency-selective channels. A policy heuristic-based solution for coding rate adaptation is then given for these schemes when perfect channel state information is not available, and it is extended to joint coding rate and modulation order adaptation. Finally, cross-layer scheduling for the incremental redundancy hybrid automatic repeat request system is given using rate-compatible punctured codes. We propose three different adaptation models and compare their performance.

CONTENTS

Abstract
Contents
List of Tables
List of Figures
List of Symbols
List of Abbreviations
Acknowledgments
Dedication
1 Introduction
  1.1 Introduction and Overview
  1.2 Background Literature Review
    1.2.1 Channel-Adaptive Transmission Rate and Power
    1.2.2 Adaptive Forward Error Correction Coding with ARQ Protocol
    1.2.3 Adaptive Coded Modulation
    1.2.4 Cross-layer Analysis
    1.2.5 Transmission with Partial Observability of CSI
    1.2.6 Nature of Incoming Traffic
    1.2.7 Memory of Fading Channels
  1.3 Thesis Statement
  1.4 Contributions
  1.5 Thesis Outline
2 Optimal and Suboptimal Packet Scheduling
  2.1 Introduction
  2.2 System Model
    2.2.1 Traffic Model
    2.2.2 Buffer Model and Dynamics
    2.2.3 FSMC for the Wireless Fading Channel
  2.3 Analysis of Packet Transmission Schemes
    2.3.1 Information-Theoretic Transmission Scheme
    2.3.2 Practical M-QAM-based Transmission Scheme
  2.4 Objectives of Scheduling Scheme and Associated Costs
    2.4.1 Transmission Power Cost
    2.4.2 Buffer Costs
    2.4.3 General Formulation
  2.5 Optimal Scheduling
    2.5.1 Unconstrained MDP Formulation
    2.5.2 Constrained MDP Formulation
  2.6 Suboptimal Scheduling
    2.6.1 Log Scheduling
    2.6.2 Threshold Scheduling
    2.6.3 Mixed Scheduling
  2.7 Results and Discussions
  2.8 Conclusions
3 Scheduling with Receiver Diversity
  3.1 Introduction
  3.2 System Model
  3.3 Correlated Nakagami-m Channel
    3.3.1 Selection-Combining Diversity
    3.3.2 Maximal Ratio-Combining Diversity
  3.4 Optimal Rate, Power and BER Adaptation
    3.4.1 Optimal Rate and Power Adaptation with Constant BER
    3.4.2 Optimal Rate and BER Adaptation with Constant Power
    3.4.3 Solution Techniques
    3.4.4 Evaluation of Immediate Costs for the Above Problems
  3.5 Numerical Results and Discussions
  3.6 Conclusions
4 Rate Adaptation over MIMO Channels
  4.1 Introduction
  4.2 System Modeling
    4.2.1 Problem Preliminaries
    4.2.2 Objectives of the Scheduling Problem
    4.2.3 Correlated MIMO Channel Using Transmit Diversity
  4.3 Scheduling for Rate-Adaptive M-QAM Systems
    4.3.1 Description of the Elements of the Formulated MDP
    4.3.2 Immediate Reward for the Objective of the Problem
    4.3.3 PER and ACK/NAK Probability
    4.3.4 System State Transition Probability
    4.3.5 Solution Techniques
  4.4 Numerical Results and Discussions
  4.5 Conclusions
5 Rate and Power Adaptation for HARQ Systems
  5.1 Introduction
  5.2 General Model Description
    5.2.1 System Modeling
    5.2.2 Markov Modeling of Rayleigh Fading Channel
  5.3 Rate and Power Adaptive Transmission for Type-I Hybrid ARQ Systems
    5.3.1 When Both the Perfect CSI and Observations are Known
    5.3.2 When only Previous Observations are Known
  5.4 Solution Techniques for the Scheduling Problems
    5.4.1 Iterative Dynamic Programming based Approach
    5.4.2 Linear Programming based Approach
  5.5 Simulation Results and Discussions
  5.6 Conclusions
6 Scheduling with Partially Observable CSI
  6.1 Introduction
  6.2 Model Description
    6.2.1 Incoming Traffic and Buffer Model
    6.2.2 Hidden Markov Channel Model
  6.3 Packet Scheduling Techniques in Partially Observable Environments
    6.3.1 Expression for ACK Probability
    6.3.2 Observation and State Transition Probabilities
    6.3.3 Problem Formulations
    6.3.4 Transformation into Belief-State MDP
    6.3.5 Optimal Algorithms
    6.3.6 Maximum-Likelihood Policy Heuristic
    6.3.7 Voting Policy Heuristic
    6.3.8 Q-MDP Policy Heuristic
  6.4 Simulation Results and Discussions
  6.5 Conclusions
7 ACM for Type-I HARQ Systems
  7.1 Introduction
  7.2 System Modeling
    7.2.1 Description of the System
    7.2.2 Channel Model
  7.3 Scheduling Techniques using ACM and HARQ
    7.3.1 Costs
    7.3.2 Transition Probability
    7.3.3 Solution Methodology
  7.4 Simulation Results and Discussions
  7.5 Conclusions
8 Scheduling Techniques for IR-HARQ Systems
  8.1 Introduction
  8.2 IR-HARQ Modeling
    8.2.1 System Model
    8.2.2 RCPC Code and Observation Probability
  8.3 SMDP Formulation of the Scheduling Problem
    8.3.1 Three Adaptation Models
    8.3.2 SMDP Kernel and Transition Probability
    8.3.3 Costs Associated with Different Objectives
  8.4 Solution Techniques
    8.4.1 Equivalent Auxiliary DT-MDP Formulation for SMDP
    8.4.2 Constrained MDP Formulation
    8.4.3 Linear Programming Solution Technique
  8.5 Simulation Results
  8.6 Conclusions
9 Conclusions and Future Directions
  9.1 Introduction
  9.2 Summary
  9.3 Future Work
Bibliography
Appendices
Appendix A Markov Decision Process
  A.1 Relative Value Iteration Algorithm
  A.2 Policy Iteration Algorithm
  A.3 Linear Programming Algorithm
Appendix B Weakly Communicating Structure: An Example
Appendix C Algorithm for Tracking Belief
Appendix D Algorithm for Determining Optimal Q-function

LIST OF TABLES

B.1 System transition matrix corresponds to policy P(na)
B.2 System transition matrix corresponds to policy P(nb)

LIST OF FIGURES

2.1 (a) Schematic of the cross-layer adaptive single-transmit single-receive antenna system and (b) schematic of channel transitions among different discrete channel states
2.2 The variation of buffer occupancy in packets with respect to time-slots
2.3 An illustration of the weakly communicating structure of the MDP problem.
2.4 An illustration of the optimal rate allocation policy for the second (ergodic capacity) information-theoretic model
2.5 Comparison of the average power vs. average delay tradeoffs with bounded allowable average packet-dropping probability for different normalized fading rates and different traffic statistics. The first information-theoretic model discussed is considered here
2.6 Comparison of the average packet-dropping probability vs. average delay curves corresponding to Fig. 2.5 for different normalized fading rates and different traffic statistics. The first information-theoretic model discussed is considered here
2.7 Comparison of the average power vs. average delay tradeoffs with optimal packet scheduling, log scheduling and channel threshold scheduling for the second information-theoretic model discussed. No packet-dropping is allowed in this case, and comparison graphs are simulated for different normalized fading rates
2.8 Comparison of the average power vs. average delay tradeoffs with optimal scheduling, log scheduling and mixed scheduling for the third information-theoretic model discussed. No packet-dropping is allowed in this case and comparison graphs are simulated for different normalized fading rates
2.9 Comparison of the average power vs. average delay tradeoffs for the first MQAM model with equal maximal instantaneous BER for all channel states.
The effect of the average allowable packet-dropping probability is also shown, along with the effect of normalized fading rates
2.10 Comparison of the average packet-dropping probability vs. average delay curves corresponding to Fig. 2.9 for the first MQAM model, with equal maximal instantaneous BER for all channel states. The effect of the average allowable packet-dropping probability is also shown, along with the effect of normalized fading rates
2.11 Comparison of the average power vs. average delay tradeoffs for different numbers of actions and different normalized fading rates. The second MQAM model with equal average BER for all channel states has been used
2.12 Comparison of the average power and average delay with bounded allowable average packet-dropping probability for the second MQAM model. Optimal packet scheduling and log scheduling are compared for Poisson arrivals
2.13 Comparison of the average packet-dropping probability with average delay corresponding to Fig. 2.12 for the second MQAM model. Optimal and log scheduling are compared for Poisson arrivals
2.14 Comparison of the average power vs. average delay tradeoffs for the second MQAM model with optimal and suboptimal log schedulers. Constant arrivals and no packet-dropping are considered for this figure
3.1 Schematic of the cross-layer adaptive packet transmission system with receiver diversity
3.2 Influence of the number of receive antennas $n_R$ and normalized fading rate $f_m T_s$ on the average power vs. average delay curve for the selection-combining scheme at the receiver.
3.3 Influence of the number of receive antennas $n_R$ and normalized fading rate $f_m T_s$ on the average power vs. average delay curve for the maximal ratio-combining scheme at the receiver.
3.4 Average power vs. average delay curve for selection combining scheme at the receiver with different Nakagami severity fading parameters $m$
3.5 Effect of the number of receive antennas $n_R$ and the normalized fading rate $f_m T_s$ on the average BER vs. average delay trade-off curve for the selection-combining scheme at the receiver
3.6 Effect of the number of receive antennas $n_R$ and the normalized fading rate $f_m T_s$ on the average BER vs. average delay trade-off curve for the maximal ratio-combining scheme at the receiver
4.1 Schematic of the Adaptive MQAM system using SR-ARQ and STBC
4.2 Average throughput for different numbers of receive branches
4.3 Average throughput for different numbers of actions
4.4 Effect of Nakagami parameter on the average throughput
4.5 Effect of Nakagami parameter on the average delay
5.1 Schematic of the Adaptive Type-I Hybrid ARQ Systems (a) when perfect CSI is available (b) when perfect CSI is not available
5.2 The effect of normalized fading rate $f_m T_B$ on the average power vs. average delay for perfect CSI case over flat fading channels
5.3 The effect of normalized fading rate $f_m T_B$ on the average overflow rate vs. average delay for perfect CSI case over flat fading channels
5.4 The effect of arrival packet rate A on the average power vs. average delay for perfect CSI case over flat fading channels
5.5 The effect of arrival packet rate A on the average overflow rate vs. average delay for perfect CSI case over flat fading channels
5.6 The effect of buffer size $B$ on the average power vs. average delay for perfect CSI case over flat fading channels
5.7 The effect of buffer size $B$ on the average overflow rate vs. average delay for perfect CSI case over flat fading channels
5.8 Influence of channel coding on the power/delay performance of the optimal scheduler in the perfect CSI flat-fading channel
5.9 Influence of channel coding on the overflow/delay performance of the optimal scheduler in the perfect CSI flat-fading channel
5.10 Influence of the channel model on the power vs. delay performance of the optimal scheduler in the perfect CSI flat and frequency-selective fading channel.
5.11 Influence of the channel model on the overflow vs. delay performance of the optimal scheduler in the perfect CSI flat and frequency-selective fading channel.
5.12 Influence of perfect and non-perfect CSI on the power vs. delay performance of the optimal scheduler in the flat-fading channel
5.13 Influence of perfect and non-perfect CSI on the overflow vs. delay performance of the optimal scheduler in the flat-fading channel
6.1 Schematic of the adaptive type-I HARQ systems
6.2 Graphical representation of POMDP belief dynamics
6.3 A comparison among different heuristics and perfect CSI in terms of throughput. The effect of fading rates is also shown
6.4 A comparison among different heuristics and perfect CSI in terms of buffer occupancy. The effect of fading rates is also shown
6.5 Belief state entropy variation as a function of time-slot for two normalized fading rates ($f_m T_B = 0.01$ and $f_m T_B = 0.1$, where $T_B = 10^{-4}$)
6.6 The effect of number of actions on throughput as a function of incoming traffic probability. The performance of different heuristics for different numbers of actions are also shown
6.7 Effect of the maximum number of packet arrivals on the HARQ throughput
6.8 Influence of different numbers of channel partitioning on the HARQ throughput
6.9 Effect of frame length on the HARQ throughput
6.10 Effect of incoming traffic burstiness on the HARQ throughput
6.11 Influence of buffer size on the HARQ throughput
6.12 Effect of buffer size on the buffer occupancy. Difference between no adaptation and adaptation cases is also shown
7.1 Schematic of the HARQ system employing ACM and SR-ARQ over the partially observable Nakagami-m fading channel
7.2 Variation of average throughput with average received SNR for different Nakagami severity parameter $m$ and different policy heuristic
7.3 Influence of channel partitioning on the average throughput
7.4 Effect of fading rate on the average throughput for different heuristic
7.5 Effect of buffer size on the average throughput for different heuristic
8.1 (a) System diagram of the incremental redundancy hybrid ARQ system and (b) Typical sample path for a SMDP.
8.2 Trade-off between average transmitter power and average buffer delay for constant incoming traffic arrival, specified overflow bound and different fading rates. Comparison between power adaptive and no adaptation is shown
8.3 Trade-off between average transmitter power and average buffer delay for Poisson-distributed incoming traffic arrival, specified overflow bound and different fading rates. The effect of different buffer sizes is also shown
8.4 The influence of different packet arrival rates on the average transmitter power/average buffer delay trade-off for fixed buffer size and packet overflow bound
8.5 The effect of unequal channel state stationary probability on the average transmitter power vs. average buffer delay curves for fixed number of channel states and packet overflow bound
8.6 Comparison of average transmitter power as a function of average buffer delay for constant and Poisson-distributed incoming packet arrival for different channel state memory and packet overflow bound
8.7 Comparison of average packet overflow as a function of average buffer delay for constant and Poisson-distributed incoming packet arrival for different channel state memory and packet overflow bound
8.8 Comparison of average power/average delay curves for different immediate delay cost models. All immediate delay cost models give approximately the same performance
8.9 Trade-off between average transmitter power and average buffer delay for constant and incremental transmission power during a particular decision-epoch. For suitably chosen incremental power sets, incremental power actions outperform constant power actions
8.10 Comparison of only power adaptation with combined rate and power adaptation. Joint power and rate adaptation provides better performance than only power adaptation, due to the addition of more degrees of freedom in the action set
8.11 Variation of average power with average buffer delay and average packet overflow. Only power adaptation and constant traffic are considered for the 3-D plot
8.12 Optimal policies are shown as a function of channel state and buffer state. For fixed sets of Lagrangian multipliers, we used relative value iteration (RVI) algorithm to compute optimal deterministic policies by solving corresponding Bellman's equation iteratively. We consider only power adaptation with constant traffic for this plot. In each subplot, the average overflow is below $10^{-4}$.
B.1 An illustration of the weakly communicating structure of the MDP problem.
List of Symbols

o}f = Number of paths of weight d for code CT
$a_i$ = $i$th discrete incoming traffic state
$a^n$ = Incoming traffic state at decision-epoch $n$
$A$ = Number of incoming traffic states
$\bar{A}$ = Average packet arrival rate in packets/time-slot
$\mathcal{A}$ = Set of incoming traffic states
$b_i$ = $i$th discrete system state
$b^n$ = System state at decision-epoch $n$
$B$ = Number of buffer states
$\mathcal{B}$ = Set of transmission buffer states
$c_i$ = $i$th discrete channel state
$c^n$ = Channel state at decision-epoch $n$
$C$ = Number of channel states
$\mathcal{C}$ = Set of channel states
$C_i$ = Code rate of RCPC code in $i$th transmission
dP) = Distance contribution of code $C_1$
d^ = Distance contribution of the added bits to $C_{i-1}$ yielding code $C_i$
d^lee = Free distance of code $C_i$
$D$ = Maximum allowable delay
$\bar{D}$ = Average packet delay in the buffer
$e_i$ = $i$th power level in the set
$e^n$ = Transmitter power level in time-slot $n$
$E$ = Number of transmission power levels
$\mathcal{E}$ = Set of transmission power levels
$f_m$ = Maximum Doppler frequency
$f_i$ = $i$th discrete hidden incoming traffic state
$F$ = Number of hidden incoming traffic states
$\mathcal{F}$ = Set of hidden states of incoming traffic
$f_\gamma(\gamma_{dc})$ = Probability density function of $\gamma_{dc}$
$F_\gamma(\gamma_{dc})$ = Cumulative distribution function of $\gamma_{dc}$
$\mathcal{G}$ = Set of cost/reward vectors
$g(s_i, u_i, s_j)$ = Immediate cost/reward if the system moves to state $s_j$ when action $u_i$ is taken in state $s_i$
$G_x(s, u)$ = Immediate one stage expected cost for state-action pair $(s, u)$ with objective "x"
G% = Long-term expected cost for policy $\pi$ associated with objective "x"
$h(s)$ = Differential reward/cost for state $s$
$h^n(l)$ = The coefficients of $l$th tap at time-slot $n$
$H$ = Horizon of the problem
$\mathbf{H}$ = Channel matrix of MIMO system
$I$ = Number of gain states for a particular tap
$J$ = Number of taps
$K$ = Number of rates offered by the family of RCPC codes
$\mathcal{K}$ = Set of state-action pairs
$l$ = Number of stages in a trellis
$l_i$ = $i$th history state
$L$ = Diversity order of MIMO systems
$L^n$ = History record at time-slot $n$
$\mathcal{L}$ = Set of history states
$m$ = Nakagami fading severity parameter
$M$ = Constellation size of a multilevel modulation
$n_T$ = Number of transmit antennas
$n_R$ = Number of receive antennas
$N_{\gamma_i}$ = Level crossing rate at received SNR $\gamma_i$
$N_f$ = Number of modulated symbols in a physical layer frame (or number of channel uses in a time-slot)
$N_p$ = Number of bits in a data link layer packet
$N_c$ = Number of pilot symbols and control parts
$N_1$ = Length of noncausal part of the MMSE estimator
$N_2$ = Length of causal part of the MMSE estimator
$\mathcal{O}$ = Set of composite channel and traffic observations
$o_i$ = $i$th observation in the observation set
$o^n$ = Composite observation at time-slot $n$
$P_{f_i, f_j}$ = Probability of transition from traffic state $f_i$ to traffic state $f_j$
$P(a_j | f_i)$ = Probability of $a_j$ packet arrivals given that traffic state is $f_i$
$\mathcal{P}$ = Set of state transition probability matrices
$P_{s_i, s_j}(u_i)$ = Probability of moving from state $s_i$ to state $s_j$ if action $u_i$ is chosen
$P_{c_i, c_j}$ = Probability of transition from channel state $c_i$ to state $c_j$
$\mathcal{P}_c$ = Set of channel state transition probability matrices
$P_{of}$ = Maximum allowable packet overflow/time-slot
$P_t$ = Instantaneous transmit power of the transmitter
•pw = Set of observation probability vectors
$P_A(c_j, u_j)$ = ACK probability for a given channel state $c_j$ when action $u_j$ is chosen
$P_N(c_j, u_j)$ = NAK probability for a given channel state $c_j$ when action $u_j$ is chosen
$P_e$ = Bit error rate (BER) for the modulation employed
$\bar{P}_e$ = Average BER
$P_{e,\mathrm{BPSK}}$ = BER for BPSK transmission
$P_{e,M\text{-}\mathrm{PSK}}$ = BER for M-PSK transmission
$P_{e,M\text{-}\mathrm{QAM}}$ = BER for M-QAM transmission
$P_p$ = Packet error rate (PER)
$\bar{P}_p$ = Average PER
$Q$ = Length of the history
$Q_{s_i, s_j}(\tau, u_i)$ = Probability of moving from state $s_i$ to $s_j$ at or before time $\tau$ if action $u_i$ is chosen
$Q(s, u)$ = Differential cost/reward corresponding to state-action pair $(s, u)$
$\mathcal{Q}$ = Set of transition distributions
$R$ = Maximum number of retransmissions
$R_B$ = Number of blocks/second
$R_d$ = Error detection coding rate
$R_c$ = Error correction coding rate
Ri = Number of bits in a modulated symbol
$R_i$ = Transmission rate in bits/symbol for $i$th action
$R^n$ = Transmission rate in bits/symbol at decision-epoch $n$
$s_i$ = $i$th discrete system state, $s_i \in \mathcal{S}$
$s^n$ = System state at decision-epoch $n$
$S$ = Number of system states
$\mathcal{S}$ = Set of system states
$\mathcal{S}_T$ = Set of transient system states
$\mathcal{S}_R$ = Set of recurrent system states
Ta = The duration of a modulated symbol
$T_s$ = Symbol duration
$T_B$ = The duration of a discrete time-slot (also called block)
$T$ = Total length of the horizon
$T_m$ = Completion time of the $m$th transition (also called sojourn time for $m$th decision-epoch)
$t_n$ = Time of occurrence of the start of the $n$th decision-epoch
$\mathcal{T}$ = Set of time-slots
$\mathcal{U}$ = Set of allowable actions
$\mathcal{U}_s$ = Set of allowable actions in state $s$
$U$ = Number of actions
$u_i$ = $i$th action in the action set $\mathcal{U}$
$u^n$ = Action chosen at decision-epoch $n$
$v_{mt}$ = Speed of mobile terminal
$v_i$ = $i$th element of the modulation constellation set
$V$ = Puncturing period of the rate compatible punctured convolutional code
$\mathcal{V}$ = Set of modulation constellations
$\mathbf{V}$ = Matrix of receiver noise
$W$ = Total possible number of transmitted packets
$W_f$ = Bandwidth of the Nyquist pulse shaping filter
$w_i$ = $i$th element of the transmission packet set, which corresponds to $i$th action
$\mathcal{W}$ = Set of number of transmitted packets
$w_{er}$ = Number of packets received in error among $w_i$ transmitted packets
$w_{ac}$ = Number of packets received successfully among $w_i$ transmitted packets
$w^n$ = Number of packets taken from the buffer for transmission
w = Least number of packets in the buffer for transmission with any action
$\mathcal{X}$ = The set of the pair of transmission power and rate
$\mathbf{X}$ = Matrix of transmitted signal
$\mathbf{Y}$ = Matrix of received signal
$z^n(s_i)$ = Initial estimate of the information state (belief state) at the beginning of time-slot $n$
$z^n(s_j)$ = Updated estimate of the information state (belief state) at the end of time-slot $n$
$Z$ = Total number of history states
$\mathbb{Z}^*$ = Set of nonnegative integers
$\alpha^n$ = Channel power gain at time-slot $n$
$\alpha^n(l)$ = The channel power gain of $l$th tap at time-slot $n$
$\bar{\alpha}$ = Average channel power gain
$\hat{\alpha}^n$ = Estimated channel power gain at time-slot $n$
$\alpha_{jk}$ = Path gain between $k$th transmit antenna and $j$th receive antenna
$\gamma$ = Instantaneous received SNR
$\bar{\gamma}$ = Average received SNR
$\Gamma$ = Set of discrete received SNR
$\gamma(m, x)$ = Lower incomplete Gamma function
$\Gamma(m)$ = Complete Gamma function
$\Delta_j$ = Average duration of channel state
$\zeta(s)$ = Traffic state corresponding to composite state $s$
Op*(Si)(ui) = Probability of applying optimal action $u_i$ in state $s_i$
Q(X) = Set of discrete probability distributions over finite set X
$\lambda$ = Optimal reward/cost
$\lambda(s)$ = Average cost/reward for state $s$
$\lambda_{rw}$ = Wavelength of radio wave
$\mu$ = Stationary policy
$\mu^n$ = Decision rule at time slot $n$
$\nu(s, u)$ = Steady-state probability that the process is in state $s$ and action $u$ is applied
$\nu^n$ = Additive white Gaussian noise at time-slot $n$
$\xi_l(c_j)$ = Gain state of $l$th tap for channel state $c_j$
$\pi$ = Policy at time slot $n$, $\pi \in \Pi$
$\Pi$ = Set of admissible policies
$\varrho(s^n)$ = Function that returns the history state of the composite system state $s^n$
$\sigma^2$ = Variance of the additive white Gaussian noise (AWGN) present in the channel
$\varsigma(s)$ = Incoming traffic hidden state corresponding to the composite state $s$
$\tau(u_j)$ = Number of bits in a symbol of the M-PSK scheme that corresponds to action $u_j$
$\phi_i$ = Stationary probability of channel state $c_i$
$\phi^n$ = Stationary probability of channel state $c^n$
$\Phi$ = Function that maps action to transmitter power level
$\theta_{jk}$ = Phase between $k$th transmit antenna and $j$th receive antenna
$\varphi(x, y)$ = Returns the difference of $x$ and $y$ when $x > y$ and 0 when $x < y$
$\chi(s)$ = Channel state corresponding to composite state $s$
$\psi(s)$ = Buffer state corresponding to composite state $s$
$\Psi$ = Function that maps action to number of packets to be transmitted
$\omega_1$ = Acknowledgement
$\omega_2$ = Negative acknowledgement
$\omega^n$ = Channel feedback observation at time-slot $n$
$\Omega$ = Set of feedback observations
$\Omega_{jk} = E[\alpha_{jk}]$ = Average fading power

List of Abbreviations

1xEV-DO = 1x Evolution Data-Optimized
3G = Third Generation
3GPP = 3G Partnership Project
3GPP2 = 3G Partnership Project 2
4G = Fourth Generation
ACK = Acknowledgment
ACM = Adaptive Coding and Modulation
AMC = Adaptive Modulation and Coding
ARQ = Automatic Repeat reQuest
AWGN = Additive White Gaussian Noise
BCH = Bose-Chaudhuri-Hocquenghem
BER = Bit Error Rate
BMAP = Batch Markov Arrival Process
CBR = Constant Bit Rate
CDMA = Code Division Multiple Access
CMDP = Constrained MDP
CPR = Constant Packet Rate
CRC = Cyclic Redundancy Check
CSI = Channel State Information
DP = Dynamic Programming
DT-MDP = Discrete-Time MDP
EDGE = Enhanced Data rates for GSM Evolution
EDM = Equal Duration Method
EGC = Equal Gain Combining
EPM = Equal Probability Method
FEC = Forward Error Correction
FER = Frame Error Rate
FIFO = First-In First-Out
FIR = Finite Impulse Response
FOCS = Fully Observable Channel State
FSMC = Finite State Markov Channel
GBN-ARQ = Go-Back-N ARQ
GPRS = General Packet Radio Services
GSM = Global System for Mobile Communications
HARQ = Hybrid ARQ
HIPERLAN = High Performance Radio Local Area Networks
HMM = Hidden Markov Channel
HRPD = High Rate Packet Data
HSDPA = High Speed Downlink Packet Access
i.i.d. = Independent and Identically Distributed
IEEE = Institute of Electrical and Electronics Engineers
IR-HARQ = Incremental-Redundancy HARQ
ISI = Inter Symbol Interference
LCR = Level Crossing Rate
LOS = Line of Sight
LP = Linear Programming
MAC = Medium Access Control
MCS = Modulation and Coding Scheme
MDP = Markov Decision Process
MIMO = Multiple-Input Multiple-Output
MMP = Markov Modulated Process
MMPP = Markov Modulated Poisson Process
MMSE = Minimum Mean Square Error
MLPH = Maximum-Likelihood Policy Heuristic
MRC = Maximal Ratio Combining
MSE = Mean Square Error
M-QAM = M-ary Quadrature Amplitude Modulation
M-PSK = M-ary Phase Shift Keying
NAK = Negative Acknowledgment
PER = Packet Error Rate
PI = Policy Iteration
POMDP = Partially Observable MDP
POSS = Perfectly Observable System State
QoS = Quality of Service
RCPC = Rate Compatible Punctured Convolutional
RS = Reed-Solomon
RVI = Relative Value Iteration
SBBP = Switched Batch Bernoulli Process
SC = Selection Combining
SINR = Signal-to-Interference and Noise Ratio
SISO = Single-Input Single-Output, Soft-Input Soft-Output
SMDP = Semi-Markov Decision Process
SNR = Signal-to-Noise Ratio
SR-ARQ = Selective-Repeat ARQ
STBC = Space-Time Block Coding
SW-ARQ = Stop-and-Wait ARQ
TDMA = Time Division Multiple Access
TM = Transmission Mode
T-ARQ = Truncated ARQ
UMDP = Unconstrained MDP
VBR = Variable Bit Rate
VPH = Voting Policy Heuristic
V-BLAST = Vertical Bell Laboratories Layered Space-Time
WCDMA = Wideband Code Division Multiple Access
WLAN = Wireless Local Area Networks
WRR = Weighted Round-Robin
WWAN = Wireless Wide Area Networks
XOR = Exclusive OR

Acknowledgments

During the course of this thesis work, I received endless support and help from my family, colleagues and well-wishers. I believe that this thesis would not have been possible without their support, encouragement and guidance.

First, I would like to express my gratitude to my thesis advisor, Professor Vijay K. Bhargava. Both this thesis and my professional development have benefited greatly from his guidance and insight. His enthusiasm and dedication to his students are truly inspiring; I feel fortunate to have become a part of his group. I would like to thank him for his faith in my abilities.

I am grateful to my friend and predecessor lab-fellow Dr. Dejan V. Djonin for his help with much of the work in this thesis. It was a joy to work with him on the many research papers on which we collaborated. I would like to thank him for his help in both technical and non-technical matters.
I would like to thank Professor Vikram Krishnamurthy and Professor Robert Schober for their time serving on my candidacy, internal defense and university defense examination committees. I am grateful for their useful suggestions on my research. I also would like to thank Professor Victor Leung (University Examiner), Professor Brian Marcus (University Examiner, Department of Mathematics), Professor Weihua Zhuang (External Examiner, University of Waterloo), Professor Cyril Leung and Professor Lutz Lampe for serving on my different examination committees, and for their insights, which greatly broadened my horizon of knowledge. My sincere thanks go to Professor Victor Leung, Professor Tim Salcudean and Professor Gail Murphy for serving as chair of my candidacy, internal and university defense examinations, respectively. The work of this thesis is supported jointly by the Natural Sciences and Engineering Research Council (NSERC) of Canada under a strategic project grant and the University of British Columbia Graduate Fellowship awards. I am grateful to these sponsors for their support. I am thankful to my former and current group-mates: Dr. Zeljko Blazek, Dr. Zhiwei Mao, Dr. Poramate Tarasak, Dr. Kin-Kwong Leung, Dr. Masaki Bandai, Daniela Djonin, Serkan Dost, Olivier Gervias-Harreman, Hugues Mercier, Chandika Wavegedara, Chris Nicola, Jahangir Hossain, Mamunur Rashid, Majid Khabbazian, Praveen Kaligineedi, Gaurav Bansal, Ziaul Hashmi and Umesh Phuyal for their cooperation and for adding an element of fun to my Ph.D. life. I would like to acknowledge the sacrifice, love and support of my family members. This thesis is as much theirs as it is mine. My father and mother have been a constant source of encouragement and motivation. My father-in-law, mother-in-law, brother, sisters and brothers-in-law have been enormously supportive of me. Last, but certainly not least, my appreciation, of a larger order of magnitude, is due to my wife, Piplu. 
She makes my Ph.D. life more enjoyable by supporting, inspiring and sharing her with me. Thanks to God for blessing me with my son, Amav Karmokar. Amav has been my great inspiration to complete my thesis.

Dedication

TO My Loving Parents, Wife an(

CHAPTER 1 Introduction

1.1 Introduction and Overview

Wireless communications has already emerged as one of the largest sectors of the telecommunications industry and is expected to grow tremendously in the future. When coupled with the explosive proliferation of web-based services (such as wireless file transfer, web browsing, e-mail, video streaming, etc.), it is obvious that there will be an increasing demand for wireless data services [1]. Various mechanisms have been proposed and recently deployed to support data traffic over wireless media. These schemes have ranged from wireless local area networks (WLANs), mainly based on the IEEE 802.11b or HIPERLAN (High Performance Radio LAN) standards, to wireless wide area networks (WWANs), where data services are supported in the 2.5G (e.g., GPRS, EDGE) and 3G system versions [2]. Several advances have also been introduced for 3G wireless systems to further enhance data rate and system performance in order to cope with the ever-increasing data rate requirements of different emerging wireless applications. Examples include High-Speed Downlink Packet Access (HSDPA), the Enhanced Uplink (EUL) evolution of wideband code division multiple access (WCDMA) systems in the 3G Partnership Project (3GPP), and 1x Evolution Data-Optimized (1xEV-DO, also known as High Rate Packet Data, HRPD) Revision 0 and Revision A of CDMA2000 systems in 3GPP2. To enable high-speed services, advanced techniques such as adaptive modulation and coding (AMC), hybrid automatic repeat request (HARQ), and fast scheduling were introduced in these 3G evolution standards [3,4]. 
Traffic on the present and next-generation wireless networks is a mix of delay-sensitive real-time traffic and delay-tolerant best-effort data traffic [5]. Whereas real-time traffic can tolerate some bit errors, the bit error rate (BER) must be very low for data traffic. Future wireless networks are envisioned to support high data rates, high spectral efficiency, packet-oriented transport, heterogeneous multimedia traffic and a wide range of quality of service (QoS) requirements, e.g., throughput, rate, delay, delay jitter, BER, packet-dropping probability, packet error rate, etc. However, the time-varying nature of the channel as well as scarce wireless resources (e.g., power and bandwidth) pose challenges in delivering such a wide variety of services. In wireless environments, the gain and hence the signal-to-noise ratio (SNR) of the channel fluctuates randomly due to various unpredictable phenomena, such as mobility of the wireless terminal, channel fading, shadowing, interference, noise, etc. However, the quality of the channel at any time instant depends on the previous channel conditions, due to a significant degree of correlation in some of these phenomena [6]. Because of this correlated random variation of the channel gain, errors occur in bursts. Therefore, it is very important to take the memory of the wireless channel into consideration in designing wireless packet transmission protocols [7]. Various adaptation techniques (e.g., adaptive modulation, adaptive coding in conjunction with an ARQ protocol, adaptive transmission power, etc.) in different layers have been investigated to cope with the problems of the time-varying correlated channel and scarce wireless resources (e.g., power and bandwidth) [8]. In the following section, we briefly discuss the literature addressing adaptive resource allocation issues.

1.2 Background Literature Review

Adaptive resource allocation schemes have proven to be powerful techniques both in increasing throughput and in achieving high reliability over time-varying fading channels [9]. Adaptive resource allocation has already been used in several wireless standards, such as Enhanced Data rates for GSM Evolution (EDGE) and 3G evolution standards (e.g., HSDPA, 1xEV-DV). The key idea of these schemes is the adaptation of some parameters, e.g., transmitter power level, symbol transmission rate, constellation size, BER, coding rate/scheme, or any combination of these parameters, to the channel conditions (cf. [10] and the references therein). Good performance of these schemes requires accurate channel estimation at the receiver and a reliable feedback path between that estimator and the transmitter.

1.2.1 Channel-Adaptive Transmission Rate and Power

The potential of adaptive variable-rate transmission over fading channels was first recognized in [11]. In this work, a modulation system was proposed that continuously adjusts its data rate in response to signal strength variations of the fading channel. That is, rate adaptation is achieved through a variation of the symbol-time duration. However, the proposed system did not receive much interest, possibly because of hardware constraints, the lack of good channel estimation techniques and the adoption of systems with point-to-point links using no transmitter feedback [12]. The advent of feasible software radio systems and the availability of fast, flexible and reconfigurable transceivers have been key aspects of a renewed interest in adaptive techniques. 
Whereas rate adaptation achieved through variation of the symbol rate requires complicated hardware and results in variable-bandwidth systems without additional spectral efficiency, rate adaptation achieved through variation of the constellation size is better suited to hardware implementation with a fixed bandwidth, and is spectrally efficient. Using an information-theoretic treatment, Goldsmith and Varaiya showed in [13] that the optimal transmission scheme maximizing long-term throughput (ergodic capacity) is water-filling in time. This scheme was demonstrated to be optimal for power adaptation and variable-rate multiplexed coding schemes when channel state information (CSI) is known at both the transmitter and the receiver. These adaptive transmission schemes are extended in [14,15] to different diversity-combining techniques. In [14], the Shannon capacity (or, equivalently, the upper bound on spectral efficiency) of various adaptive transmission techniques, namely optimal rate and power adaptation, constant power with optimal rate adaptation, and channel inversion with fixed rate, is studied in conjunction with diversity combining. Maximal ratio combining and selection combining are considered, and it is shown that diversity yields large capacity gains for all the techniques, with diminishing returns in the number of branches. In [15], the Shannon capacity analysis of a similar adaptive transmission scheme is extended to correlated Rayleigh fading channels with maximal ratio combining at the receiver. In [16], the capacity of Nakagami multipath fading channels with an average power constraint is studied for three power and rate adaptation policies, namely optimal power and rate, optimal rate and constant power, and channel inversion. 
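
The water-filling in time policy of [13] mentioned above can be illustrated with a minimal Python sketch, assuming a channel with a finite set of fading states. The function name `water_filling`, the state SNRs and the state probabilities are hypothetical choices made for illustration and are not taken from [13]. A state whose SNR γ (per unit transmit power) exceeds a cutoff γ0 receives power 1/γ0 − 1/γ, and states below the cutoff receive none; the cutoff is found here by bisection so that the average power constraint is met.

```python
import math


def water_filling(gammas, probs, avg_power=1.0, iterations=200):
    """Water-filling over a discrete fading distribution (illustrative sketch).

    gammas: SNR of each fading state per unit transmit power (all > 0).
    probs:  probability of each state (sums to 1).
    Returns (cutoff, powers, capacity) with capacity in bits/s/Hz.
    """
    def average_power_used(cutoff):
        # Power 1/cutoff - 1/g is spent in states above the cutoff, none below.
        return sum(p * max(0.0, 1.0 / cutoff - 1.0 / g)
                   for g, p in zip(gammas, probs))

    # The average power used decreases as the cutoff grows, so bisect on it.
    low, high = 1e-12, max(gammas)
    for _ in range(iterations):
        mid = 0.5 * (low + high)
        if average_power_used(mid) > avg_power:
            low = mid   # cutoff too small: too much power is spent
        else:
            high = mid  # cutoff feasible: try a smaller one
    cutoff = high

    powers = [max(0.0, 1.0 / cutoff - 1.0 / g) for g in gammas]
    capacity = sum(p * math.log2(1.0 + g * pw)
                   for g, pw, p in zip(gammas, powers, probs))
    return cutoff, powers, capacity


# Hypothetical three-state channel used only for illustration.
state_snrs = [0.5, 2.0, 8.0]
state_probs = [0.3, 0.4, 0.3]
cutoff, powers, capacity = water_filling(state_snrs, state_probs)
```

In this sketch better channel states receive more power, which is the opposite of channel inversion; a constant-power policy with the same average power is feasible but cannot exceed the capacity returned here.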
The authors of [16] also derived closed-form expressions for the outage probability, spectral efficiency and average bit error rate of a practical constant-power variable-rate M-QAM scheme, assuming perfect channel estimation and negligible time delay between channel estimation and signal set adaptation. The impact of time delay on the BER is analyzed as well. Rate adaptation schemes for a practical M-QAM system are given in [10,17]. In [17], the spectral efficiency of variable-rate variable-power M-QAM schemes over both log-normal and Rayleigh fading channels is derived and compared with the Shannon capacity limit. The authors show that there is a constant power gap between the spectral efficiency of the M-QAM scheme and the channel capacity, and that this gap is a simple function of the required BER. The variable-rate variable-power M-QAM scheme exhibits a 5-10 dB power gain relative to variable-power fixed-rate transmission, and up to 20 dB of gain relative to nonadaptive transmission. When only 5-6 different signal constellations are used, the efficiency is within 1-2 dB of the maximum achievable with unrestricted constellation sets. The effect of channel estimation error and delay on BER performance is also determined. In [10], adaptive modulation schemes using M-QAM and M-PSK for flat fading channels are examined. The data rate, transmit power, and instantaneous BER are varied to maximize spectral efficiency, subject to average power and BER constraints. Analysis is given for four combinations of rate (continuous, discrete) adaptation with BER (average, instantaneous) constraints. The cases in which adaptive policies are restricted to a constant transmit power or a constant rate are also considered. Delay-limited capacity for fading channels is introduced in [18]. It gives the maximum rate of information that can be successfully transmitted even in the worst-case channel-fading scenario. 
In a power-limited regime, the delay-limited capacity is zero (e.g., for the Rayleigh fading channel), since perfect channel inversion is not possible. In a fading channel, it is not always possible to guarantee finite short-term rates with finite available transmission power; the concept of outage is introduced in [19] for such cases. Power adaptation mechanisms to reduce outage probabilities are given in [20]. In [21], adaptive modulation schemes are designed that yield the minimum outage probabilities for wireless systems with strict delay constraints under the assumption of perfect causal channel state information at the transmitter and the receiver. A suboptimal adaptive QAM modulation scheme for a given number of information bits and delay constraints is also proposed. Closed-form expressions of the outage probability, average allocated power, achievable spectral efficiency, and average bit error rate for both voice and data transmission over Nakagami-m fading channels are given in [22]. In [23], a study of joint antenna subset selection and link (e.g., rate and power) adaptation for MIMO systems is reported. The authors have developed link adaptation algorithms based on an estimation of the optimal number of active transmit antennas for Rayleigh i.i.d. fading MIMO channels, and based on channel correlation information for MIMO fading channels. The adaptive selection of transmit antennas for V-BLAST systems, as well as rate and power assignment at each antenna, are explored in [24] with the goal of maximizing throughput. In [25], the average throughput and average packet error rate are analyzed for a selective-repeat automatic repeat-request scheme over a multiple-input multiple-output Markovian Nakagami-m fading channel. The scheme is based on a constant-power variable-rate adaptive M-QAM system combined with selection transmit diversity. 
The impact of outdated and/or imperfect channel state information on the performance of the system is also considered. In [12], an overview of different adaptation techniques, e.g., adaptive modulation and coding and adaptive error control mechanisms, is given. The gain achieved by employing adaptive techniques in systems with diversity and multi-layer adaptivity is also discussed. Some results from information theory are presented, which show the limitations of these techniques and motivate further research on the practical design issues that need to be addressed to enable them. In [26], a statistical decision-making approach is presented for selecting the appropriate modulation and coding scheme according to the estimated channel condition for 3G wireless systems. The objective is to maximize average throughput while maintaining an acceptable frame error rate.

1.2.2 Adaptive Forward Error Correction Coding with ARQ Protocol

Adaptive hybrid automatic repeat-request schemes are another promising technique for increasing both the throughput and the reliability of packet transmission over time-varying fading channels [27]. Hybrid automatic repeat-request (HARQ) schemes include parity bits for both error detection and error correction. They combine the throughput efficiency of physical-layer forward error-correction (FEC) coding and the reliability of data-link layer error-detection coding. The key idea of adaptive coding schemes is to vary the code rate of the error-correcting code with the channel condition. That is, a lower-rate code with more error protection is used when the channel condition is worse, and a higher-rate code with less protection is used to send more information when the channel condition is better [28]. Recently, there has been considerable interest in adaptive HARQ schemes, due to the increased delay tolerance of many applications, such as file transfer, internet browsing, messaging, etc. [29,30]. 
In [27], an adaptive error control system utilizing a class of Hamming codes in a cascaded manner is proposed to provide high throughput over a wide range of channel bit-error probability. The system uses the same decoder for decoding the received information and is shown to provide the same order of reliability as an ARQ system, while improving the error-correcting capability of the code. An adaptive coding scheme using punctured convolutional codes with the maximum-likelihood Viterbi algorithm over a Rician fading finite-state Markov channel is presented in [28]. The throughput gains achieved by the adaptive scheme relative to conventional nonadaptive coding methods are demonstrated by several examples. An adaptive error control technique based on the use of type-I hybrid ARQ protocols over a slowly varying Gilbert-Elliott channel is presented in [29]. The channel state monitor detects changes in the channel state by counting the number of negative acknowledgments (NAKs) during an observation interval (called a frame), consisting of some fixed number of transmitted packets. At the end of each frame, the number of NAKs is compared with a set of thresholds to estimate the state of the channel, and a code rate from a family of Reed-Solomon codes is chosen accordingly. In [31], a sequential scheme for channel state estimation is proposed, where each transmitted packet is scored based on the outcome of the decoding process. When the cumulative score crosses a decision threshold, the coding strategy is altered and the sequential inspection scheme is restarted, using the same scoring routine with different weighting and decision threshold constants. Both the BCH code and the rate-compatible punctured convolutional code are considered to evaluate the performance of the adaptive error control system over slowly varying channels. 
Adaptive rate algorithms that use packet-combining techniques (averaged diversity-combining techniques with packet weights based on either ideal channel state information or weights derived from side information generated by the Viterbi decoder) for type-I hybrid ARQ systems over stationary and time-varying channels are developed in [30] to further improve throughput and reliability. In [32] and [33], XOR-ing of two consecutive erroneous copies is used to estimate the channel BER and to determine whether or not a mode change should be performed. The dependence of the efficiency of type-II hybrid ARQ schemes on the optimum packet size according to the channel bit error rate in the context of packet-combining schemes is also discussed. In [33], the throughput efficiency of adaptive ARQ schemes employing Reed-Solomon codes is evaluated using a computer simulation approach. Performance analyses of adaptive GBN-ARQ and SR-ARQ for time-varying channels are given in [34] and [35], respectively. In [36], the count of successive positive acknowledgments (ACKs) or NAKs is used for the mode change decision in a two-mode adaptive system in a stationary channel. If α ACKs are received in mode-H (high error rate mode), the adaptive system changes to mode-L (low error rate mode). On the other hand, if β NAKs are received in mode-L, the adaptive system changes to mode-H. The values of α and β are found by trial or optimization. In [37], the throughput performance of Yao's adaptive GBN-ARQ scheme in a time-varying Rayleigh channel is analyzed using a simulation study. The simulation is carried out over both fast and slow fading channels using specific M-FSK modulation and Reed-Solomon coding. In [38], the code rate is increased or decreased, based on the symbol error rate at the Reed-Solomon decoder, in a rather ad hoc way. 
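
The two-mode switching rule described for [36] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `track_modes`, the boolean encoding of the feedback (True for ACK, False for NAK) and the example thresholds are hypothetical, and the counters are assumed to count consecutive acknowledgments of the relevant type, resetting whenever the run is broken.

```python
def track_modes(feedback, alpha, beta, start_mode="L"):
    """Return the operating mode after each feedback message.

    feedback: iterable of booleans, True for an ACK and False for a NAK.
    alpha:    consecutive ACKs needed in mode-H to switch to mode-L.
    beta:     consecutive NAKs needed in mode-L to switch to mode-H.
    """
    mode = start_mode
    run_length = 0  # length of the current run of ACKs (mode-H) or NAKs (mode-L)
    modes = []
    for is_ack in feedback:
        if mode == "H":
            # In the high error rate mode, only an unbroken run of ACKs counts.
            run_length = run_length + 1 if is_ack else 0
            if run_length >= alpha:
                mode, run_length = "L", 0
        else:
            # In the low error rate mode, only an unbroken run of NAKs counts.
            run_length = run_length + 1 if not is_ack else 0
            if run_length >= beta:
                mode, run_length = "H", 0
        modes.append(mode)
    return modes


# Hypothetical feedback trace: NAK, NAK, ACK, ACK, ACK, NAK.
trace = [False, False, True, True, True, False]
print(track_modes(trace, alpha=3, beta=2))
# prints ['L', 'H', 'H', 'H', 'L', 'L']
```

A small β makes the system react quickly to a deteriorating channel at the cost of false alarms, while a large α delays the return to the low error rate mode; this is the tradeoff that the trial or optimization of α and β has to resolve.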
A variable-rate type-I hybrid ARQ using Reed-Solomon codes for meteor-burst communications is presented in [39] and compared with both the fixed-rate type-I hybrid ARQ and the ARQ without FEC. Several applications of error-control coding are discussed in [40]. It is noted that all of the literature discussed above proposes to change the operation mode and correspondingly adapt the code rate based on the channel state information, in some ad hoc fashion. Incremental redundancy-hybrid automatic repeat request (IR-HARQ) is provisioned as a part of the EDGE standard and is also proposed as a part of 3G evolution cellular system standards, such as W-CDMA high-speed downlink packet access (HSDPA), for high-speed reliable packet data communications [41,42]. IR-HARQ employs a forward error correction (FEC) technique in the physical layer as well as an ARQ technique in the data-link layer to cope with the time-varying nature of fading channels, and to guarantee both high reliability and high throughput. In IR-HARQ schemes, information packets are first transmitted with no or few parity bits for error detection and correction purposes. Incremental redundancy bits are then transmitted upon a retransmission request. The receiver combines the transmitted and retransmitted bits together to form a more powerful error correction code to recover the information. Rate-compatible punctured convolutional (RCPC) codes proposed in [43,44] are particularly useful for IR-HARQ systems. RCPC codes are constructed from a single rate-1/N convolutional code, wherein a family of higher-rate codes is formed by puncturing successively greater numbers of code symbols. These codes have practical utility in that the system requires only a single rate-1/N convolutional encoder and a Viterbi decoder [45]. 
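
The puncturing idea behind RCPC codes and its use for incremental redundancy can be illustrated with a short Python sketch. The puncturing patterns below are hypothetical examples for a rate-1/2 mother code with period 4 and are not the tables of [43]; the helper names are likewise assumptions. A pattern has one row per encoder output and one column per position in the puncturing period, and a 1 marks a code symbol that is transmitted.

```python
from fractions import Fraction


def puncture(coded, pattern):
    """Keep the code symbols marked 1 in the periodic puncturing pattern.

    coded:   list of N-tuples, one tuple of encoder outputs per information bit.
    pattern: N rows by P columns of 0/1 entries (P is the puncturing period).
    """
    period = len(pattern[0])
    kept = []
    for t, symbols in enumerate(coded):
        column = t % period
        for row, symbol in enumerate(symbols):
            if pattern[row][column]:
                kept.append(symbol)
    return kept


def code_rate(pattern):
    """Rate of the punctured code: information bits per transmitted symbol."""
    return Fraction(len(pattern[0]), sum(sum(row) for row in pattern))


def is_rate_compatible(high_rate, low_rate):
    """True if every symbol kept by the higher-rate code is kept by the lower-rate one."""
    return all(h <= l
               for high_row, low_row in zip(high_rate, low_rate)
               for h, l in zip(high_row, low_row))


def incremental_bits(coded, sent_pattern, next_pattern):
    """Symbols to send on a retransmission request: kept by next_pattern only."""
    extra = [[int(n and not s) for s, n in zip(sent_row, next_row)]
             for sent_row, next_row in zip(sent_pattern, next_pattern)]
    return puncture(coded, extra)


# Hypothetical patterns for a rate-1/2 mother code, period 4.
rate_4_5 = [[1, 1, 1, 1], [1, 0, 0, 0]]
rate_2_3 = [[1, 1, 1, 1], [1, 0, 1, 0]]
# Hypothetical mother-code output for four information bits.
mother_output = [(0, 1), (1, 1), (1, 0), (0, 0)]
first_transmission = puncture(mother_output, rate_4_5)
redundancy = incremental_bits(mother_output, rate_4_5, rate_2_3)
```

Because the rate-4/5 pattern is contained in the rate-2/3 pattern, the retransmission only needs to carry the symbols in `redundancy`; together with `first_transmission` they form the rate-2/3 codeword at the receiver.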
In [43], a truncated IR-HARQ scheme with RCPC codes is analyzed for transmission over both an AWGN channel and an ideally interleaved Rayleigh fading channel, assuming independent decoding attempts. The unequal error protection capabilities of convolutional codes belonging to the family of rate-compatible punctured convolutional codes are studied in [44]. The performance of these codes is analyzed and simulated for fast-fading Rice and Rayleigh channels with differentially coherent four-phase modulation. The authors of [45] have presented the encoding as well as the Viterbi and sequential decoding of high-rate punctured convolutional codes. Weight spectra and upper bounds on the bit error probability of the best known short-memory punctured codes having memory 2 ≤ M ≤ 8 and coding rates 2/3 ≤ R ≤ 7/8, and of long-memory punctured codes having memory 9 ≤ M ≤ 23 and rates 2/3 and 3/4, are also provided in [45]. Generalized type-II HARQ using RCPC codes, which combines the IR-HARQ strategy of Hagenauer with the code-combining ARQ strategy of Chase, is analyzed in [46]. A generalized type-II ARQ scheme using punctured convolutional coding on a two-state non-stationary Markov channel is analyzed in [47]. In [48], hybrid ARQ protocols with adaptive forward error correction using convolutional coding over both non-fading and ideally interleaved Rayleigh fading additive white Gaussian noise channels are proposed and analyzed. Both the adaptive coding-rate ARQ protocols and the adaptive incremental redundancy ARQ protocols are analyzed in conjunction with all three ARQ protocols, namely SW-ARQ, GBN-ARQ and SR-ARQ. In [49], a type-II hybrid automatic repeat-request scheme with adaptive forward error correction using Bose-Chaudhuri-Hocquenghem (BCH) codes along with an incremental redundancy technique is proposed and analyzed. A performance comparison of two hybrid automatic repeat-request combining strategies, namely Chase combining and incremental redundancy, is given in [50]. The authors of [51] have developed an information-theoretic model to explain and predict the coding gains of incremental redundancy over the Chase combining scheme. In [52], the authors propose a reliability-based incremental redundancy HARQ algorithm with convolutional codes, and show performance results for static and time-varying channels. In reliability-based HARQ, the bits that are to be retransmitted are adaptively selected at the receiver, based on the estimated bit reliabilities at the output of a soft-input soft-output (SISO) decoder. A reliability-based hybrid ARQ scheme that uses RCPC codes in the forward channel and source coding in the feedback link is proposed and evaluated in [53]. In [54], a method for reducing the rate of retransmission in a time-varying channel by utilizing the previously received erroneous data frame is proposed. The authors address two questions: how much information is still useful in the erroneous frame, and how a retransmission scheme can be designed to make efficient use of such information. Truncated type-II hybrid ARQ schemes are analyzed over a block-fading channel assuming noisy feedback in [55]. Truncated type-II hybrid ARQ schemes are compared with pure FEC using two different approaches, namely random coding techniques for block codes and union upper bounds for specific terminated convolutional codes. In [56], an IR-HARQ scheme with selective combining using RCPC codes is investigated over a Rayleigh fading finite-state Markov channel (FSMC). Simulation results of real-time video transmission for a time division multiple access (TDMA) system with bounded delay are also given and compared with analytical results in terms of throughput and packet error rate. In [57], the performances of different IR-HARQ schemes are compared over an FSMC using a theoretical method. 
Using a multi-state Markov error structure, analytical throughput estimation methods for adaptive modulation systems combined with ARQ schemes in correlated slow-fading channels are presented in [6]. In [58], a type-II HARQ system with a finite-size receiver buffer is analyzed over a two-state Markov channel using a rate-1/2 convolutional code and truncated HARQ. Adaptive transmissions are powerful techniques to compensate for channel variations. In [59], throughput maximization is analyzed with a finite number of power levels and code rates. An error-recursion approach is developed in [60] to mathematically analyze the throughput, delay, and energy efficiency of reactive and proactive rate-adaptation techniques over fading channels with arbitrary correlations between retransmissions. Using Reed-Solomon codes, the performance tradeoff between throughput and latency for IR schemes is predicted quantitatively. In [61], a combined link adaptation and incremental redundancy protocol, which adjusts the starting code rate based on the channel condition for enhanced data transmission, is proposed. The proposed method offers a tradeoff between delay and throughput, and minimizes the number of retransmissions at the cost of reduced throughput, compared with a pure incremental redundancy scheme. In [62], descriptions and simulation results for the asynchronous and adaptive HARQ schemes for 3G evolution, e.g., HSDPA in 3GPP and 1xEV-DV in 3GPP2, are provided. In [63], a model for wireless downlink data transmission with hybrid ARQ is studied that takes into account both user-channel conditions and retransmissions. A scheduling rule that minimizes the total average cost, where the cost function assigned to each user depends on the queue length and the number of transmissions for the head-of-line packet, is found by transforming the problem into the Klimov framework. 
The authors consider two scenarios: (1) the cost functions are linear, and packets arrive at the queues according to a Poisson process, and (2) the cost functions are increasing and convex, and there are no new arrivals (draining problem). Several heuristic myopic scheduling policies are also compared with the optimal fixed-priority policy. An overview and simulation data of the hybrid ARQ used by the 3G CDMA evolutions (1xEV-DV for CDMA2000 and HSDPA for WCDMA) are provided in [42].

1.2.3 Adaptive Coded Modulation

Adaptive coded modulation (ACM) is a promising tool for increasing the spectral efficiency of time-varying mobile channels while maintaining a predictable bit-error rate (BER) [64]. Trellis and lattice codes, which are special cases of coset codes, are particularly well suited for adaptive coded modulation, since the code design and modulation design are separable. Adaptive coded transmission using trellis-coded M-QAM is discussed in [8] and shown to have an effective coding gain of 3 dB relative to uncoded adaptive M-QAM for a simple four-state trellis code. The performance of an adaptive trellis-coded modulation scheme for airlinks in fully loaded urban micro-cellular networks is analyzed in [65]. Each airlink is degraded by shadowed Nakagami multipath fading, interference from other airlinks and signal path loss. Average link spectral efficiency and average area spectral efficiency approximations are also given for a fully loaded network. In [64], the effects of predicting the CSI with a linear fading-envelope predictor in order to enhance the performance of an adaptive coded modulation system are investigated using multidimensional trellis codes over Rayleigh fading channels. 
In [66], the performance of an adaptive trellis-coded modulation system, where receive antenna diversity is implemented by means of maximal ratio combining, is analyzed over Rayleigh fading channels in the presence of estimation and prediction errors. An optimal adaptation strategy based on turbo-coded modulation schemes over Rayleigh flat-fading channels for maximizing throughput with a given BER under an average power constraint is considered in [67]. In [68], adaptive trellis-coded modulation schemes are designed for scenarios where neither the Doppler frequency nor the exact shape of the autocorrelation function of the channel-fading process is known. Such schemes use only a single outdated fading estimate, and can provide a significant increase in bandwidth efficiency over their nonadaptive counterparts on time-varying channels. A forward error correction (FEC) strategy and a medium access control (MAC) protocol that support a high-speed asymmetric physical-layer design based on equalization and precoding are presented. The adaptive FEC algorithm is based on the use of variable-rate trellis-coded modulation with fast channel estimation, while the MAC protocol employs a centralized, dynamic slot allocation technique.

1.2.4 Cross-layer Analysis

The studies discussed in the previous sections optimize only the physical-layer performance parameters. This literature assumes that the transmission buffer is of infinite length and that there is always a packet available in the buffer for transmission. Recently, it has been realized that the performance of wireless networks can be significantly improved by adapting system parameters to higher-layer parameters in addition to the physical-layer time-varying channel gain. 
Therefore, the design of wireless protocols that rely on the interactions between various layers has received significant attention in the wireless research community and has become an active and promising area of research. A cross-layer design that combines adaptive modulation and coding at the physical layer with a truncated ARQ at the data-link layer is described in [69]. The authors derive the expression for average spectral efficiency over Nakagami-m block fading channels in order to maximize system throughput under prescribed delay and error performance constraints. Queuing analysis for adaptive modulation and coding over Nakagami-m wireless links is given by the same authors in [70] to optimize packet error rate, packet loss rate and average throughput. In [71], the authors develop a cross-layer design for multiuser scheduling at the data-link layer by classifying users into QoS-guaranteed and best-effort users, with each user employing adaptive modulation and coding at the physical layer. In [72], the authors have addressed some of the design issues associated with the choice of the modulation and coding scheme used for transmission, given that an ARQ scheme is being used. An approach that optimizes the mapping between the signal-to-interference and noise ratio (SINR) and the modulation and coding scheme (MCS) so as to maximize throughput, taking into account the type of HARQ scheme employed, is proposed. The authors also propose incorporating frame error rate (FER) and retransmission information as part of the scheduling decision. In [73], a medium access control technique that efficiently manages adaptive modulation and coding to achieve maximum channel capacity for 3GPP's HSDPA scheme is presented. 
The authors adapt transmission parameters to the channel condition only, and give a queueing analysis for three different traffic models, namely Poisson arrivals, batch arrivals with modified geometric message length and batch arrivals with Pareto message length. In [74], the radio link-level delay statistics in a wireless network using adaptive modulation and coding (AMC), weighted round-robin (WRR) scheduling, and automatic repeat-request-based error control are analyzed. A framework for cross-layer management of packet-data transmissions in MIMO systems employing orthogonal space-time block coding (STBC), adaptive M-QAM and truncated automatic repeat request (T-ARQ) over Nakagami fading channels is provided in [75]. Closed-form expressions of the average packet error rate, the average spectral efficiency and the outage probability are derived over an equivalent SISO channel model for the system that maximizes the system's spectral efficiency under prescribed delay and error-rate constraints. The authors provide closed-form expressions of the bit error rate for M-QAM and M-PSK in [76] over Nakagami MIMO fading channels employing orthogonal STBC. Expressions for the Shannon capacity achieved by a transmit-diversity scheme over MIMO Rayleigh fading channels under adaptive transmission and channel-estimation errors are given in [77]. In [78], the authors exploited the single-input single-output equivalency of orthogonal space-time block coding in order to analyze its performance over a nonselective Nakagami fading channel in the presence of spatial fading correlation. Exact symbol-error probabilities of coherent M-PSK and M-QAM are derived. The cross-layer designs given above do not consider the inherently dynamic nature of the adaptation problem. Because of the randomness of the channel gain and the incoming traffic, the state of the system is dynamic. 
Therefore, a true scheduling policy should itself be dynamic. Early work in [79] developed a dynamic programming framework for transmission policies over a simple two-state Gilbert-Elliott channel model, with constraints on average delay and peak power. The author considers critical backlog transmission policies and shows by numerical study that critical backlog policies are near optimal. In [80], the issue of the optimal trade-off between the average power and average delay for a single user communicating over a memoryless block-fading channel is addressed from an information-theoretic standpoint. The authors discuss two models: one corresponding to fixed-length/variable-rate codewords, and the other corresponding to variable-length codewords. The behavior of this tradeoff is quantified in the regime of large delay, and the connection to the delay-limited capacity and the expected capacity of fading channels is also discussed. Convexity and the decreasing property of the optimal power/delay trade-off curve for a finite-buffer single-user system were established in [81] for Gaussian channels and uniformly distributed bursty i.i.d. traffic. In [82], the utility of packet scheduling for i.i.d. traffic over both AWGN and fading channels is demonstrated. Convexity properties and characterization of the delay-power region of different schedulers (e.g., zero-outage schedulers, power-efficient schedulers, schedulers guaranteeing absolute delay bounds) are discussed. In [83], an unconstrained Markov decision process formulation for minimizing the average transmission power and packet loss rate for an M-QAM system is presented. Poisson traffic is considered and the structure of the optimal policy is discussed. In [84], the multiuser cross-layer adaptive transmission problem is decoupled into an equivalent number of single-user problems, so that optimal adaptive single-user policies can be used.
In [2], the authors consider several basic cross-layer resource allocation problems (e.g., the transmission rate and power assigned to each user) for wireless fading channels. The fundamental performance limits with higher-layer QoS (such as delay) are characterized in the survey. The authors of [85] have extended some structural results given in [80] to develop optimal policies that minimize mean delay subject to an average power constraint for an independent and identically distributed (i.i.d.) channel. The existence of a stationary average optimal policy is demonstrated. Rate and power-control strategies for transferring a fixed-size file successfully over fading channels under constraints on both transmit energy and average transmission delay are discussed in [86]. The author considers two delay constraints: an average delay constraint and a strict delay constraint. Performance degradation caused by imperfect (delayed or erroneous) channel knowledge is also investigated. An optimization-based suboptimal scheduler is described in [87] for an i.i.d. block-fading channel. The authors optimize the average power subject to average delay and packet loss rate constraints. The transmission policy of the suboptimal threshold scheduler is determined by the transmission rate threshold, the channel state threshold and the transmission buffer size. An offline packet scheduling scheme for an additive white Gaussian noise channel model has been analyzed in [88,89], with the goal of minimizing energy, subject to a deadline constraint. An online lazy packet scheduling algorithm is also devised that varies transmission time according to backlog. It is shown to be more energy efficient than a deterministic schedule with the same queue stability region and similar delay. Optimal transmission scheduling with energy and deadline constraints for a satellite transmitter is given using a Shannon capacity equation in [90,91].
The authors consider two optimization problems: first, maximization of average data throughput under a given number of time-slots and a fixed amount of energy, and second, minimization of expected energy to send a fixed amount of data given deadline constraints. Dynamic programming methods are employed to determine the optimal policy. A closed-form optimal policy is given for the special case of a piecewise linear energy-throughput relationship. The analysis is given for both the known channel quality case and the unknown channel quality case. Power adaptation strategies for delay-constrained channels are investigated in [92] to maximize expected capacity and minimize outage capacity under both short-term and long-term power constraints over i.i.d. flat block-fading channels. It is assumed that the channel state information is fed back to the transmitter in a causal manner. In [93], two energy-efficient delay-constrained packet scheduling problems over AWGN channels are given. In the first problem, an optimal off-line scheduling scheme with a deadline constraint is analyzed and extended to take into account energy recovery of the battery. In the second problem, optimal packet scheduling given an average delay constraint is considered, using a constrained dynamic programming method. The authors then incorporate a simple energy-recovery model for the battery into the problem and provide some heuristic methods of devising energy-efficient transmission schemes. Poisson-distributed traffic is considered in the work. However, the paper considers neither fading of the wireless channel nor practical coding and transmission schemes.

1.2.5 Transmission with Partial Observability of CSI

As mentioned earlier, because of the random variation of the channel SNR, errors occur in bursts. Hence, the exact state of the current channel sometimes cannot be accurately predicted at the receiver and may not be available at the transmitter at the time of transmission.
Rather, the packet error feedback signal is received from the receiver after the packet has been transmitted. In the literature reviewed below, perfect CSI is not assumed at the transmitter. In [94], the authors formulate the decision process of determining transmission (when to attempt or suspend) over a Gilbert-Elliott fading channel as a partially observable Markov decision process. The optimal policy for the throughput vs. energy efficiency tradeoff problem is derived for a time horizon less than or equal to thirteen, and shown to be a threshold rule that varies with the memory present in the channel error process. A suboptimal, limited-lookahead implementation of this policy is also simulated and its performance compared with the persistent retransmission and probing protocols. In [95], the work of [94] is extended by studying the impact of various feedback structures and the effect of channel memory on the performance, design, and structure of this scheme. In both works, the channel state is not directly observable, thus the transmission decisions must be based on ACK/NAK information provided over a feedback channel. The authors extended their work in [96] by adapting transmission power along with the decision on when to attempt a transmission. The optimal transmission scheme has been interpreted as a back-off rule: at the end of the transmission, depending on the channel quality during the past time-slot, the transmission may be suspended for some time-slots. A formulation of the opportunistic file-transfer problem using a Stop-and-Wait ARQ transmission protocol over a two-state Gilbert-Elliott fading channel is given in [97]. The optimal tradeoff between the transmission energy and the latency is formulated as a partially observable Markov decision process.
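To make the role of ACK/NAK feedback concrete, the following is a minimal sketch of how a transmitter might track its belief about a two-state Gilbert-Elliott channel when the state itself is hidden. It is not taken from any of the cited works; the function names, the transition probabilities `p_gg` and `p_bg`, and the per-state success probabilities `succ_good` and `succ_bad` are all hypothetical parameters introduced for illustration.

```python
def predict(belief_good, p_gg, p_bg):
    """Propagate P(channel is good) one slot through the two-state Markov chain.

    p_gg: P(good -> good), p_bg: P(bad -> good). Both are assumed known.
    """
    return belief_good * p_gg + (1.0 - belief_good) * p_bg


def update_on_feedback(belief_good, ack, succ_good, succ_bad):
    """Bayes update of P(good) after an ACK (ack=True) or a NAK (ack=False).

    succ_good / succ_bad: assumed packet success probabilities in each state.
    """
    like_good = succ_good if ack else 1.0 - succ_good
    like_bad = succ_bad if ack else 1.0 - succ_bad
    num = like_good * belief_good
    den = num + like_bad * (1.0 - belief_good)
    # If the observation has zero probability under the model, keep the prior.
    return belief_good if den == 0.0 else num / den


if __name__ == "__main__":
    b = 0.5
    for ack in (True, True, False):
        b = update_on_feedback(b, ack, succ_good=0.9, succ_bad=0.1)
        b = predict(b, p_gg=0.95, p_bg=0.10)
        print(round(b, 4))
```

A policy defined on this belief (for example, "transmit only when the belief exceeds a threshold") is one way to read the threshold and back-off rules described above, although the exact rules in the cited works depend on their cost structure.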
As POMDP problems require exponential computational complexity and memory, the problem is then reformulated as a Markovian search problem, whose optimal control policies are threshold in nature. A limitation of the threshold policy result presented in [97] is that it is no longer optimal when the channel has more than two states or the transmission power has multiple levels. In [98], a theoretical analysis of a combined optimization of the scheduling layer with the physical layer is given. The channel is modeled with a hidden Markov model (HMM), and the resulting optimization problem is shown to be a partially observable Markov decision process. With the aid of an information vector, the problem is converted to a fully observable MDP problem. A linear programming methodology is indicated for the optimal solution of the problem, where discretization of HMM states is assumed. However, no discretization techniques and results are shown, probably due to the large state space. In [99], the problem of buffer and channel adaptive transmission for maximizing system throughput under an average transmission power constraint is studied over the Rayleigh fading channels with perfect and imperfect channel state information. The authors give dynamic programming solution techniques for both problems. In the second problem, delayed or erroneous channel state information is assumed, and the problem is formulated as a partially observable Markov decision process. It is solved by using an approximate finite number of information channel states with dynamic programming algorithms.
1.2.6 Nature of Incoming Traffic

As mentioned above, future-generation wireless networks will support a wide variety of incoming traffic with different QoS requirements: constant-bit-rate (CBR) real-time traffic such as voice; variable-bit-rate (VBR) real-time traffic such as video streaming, teleconferencing and games; and CBR or VBR best-effort data traffic such as file transfer, web browsing and messaging. Some traffic, such as interactive data and compressed video, is highly bursty, while other traffic, such as large files, is continuous. Therefore, incorporation of traffic models into the analysis is important for packet scheduling in wireless networks [100-102]. Since incoming traffic in wireless networks is non-constant and random in nature, the distribution of the traffic plays an important role in determining the buffer state for systems with a finite-size transmission buffer. In this section, we survey some commonly used traffic models in the literature. Simple constant traffic consists of a sequence of single arrivals of discrete packets. Compound traffic consists of batch arrivals; that is, arrivals may consist of more than one packet at an arrival instant. To describe compound traffic fully, one also needs to specify a non-negative random sequence $\{B_i\}_{i=1}^{\infty}$, where $B_i$ is the random number of packets in the $i$th batch. Measurements of traffic in voice and data systems have shown that in a wide range of applications, bursty traffic generation can be modeled as a Poisson process [103]. A Poisson process is characterized as a counting process satisfying $P\{a_n = i\} = \exp(-\lambda T_B)\,(\lambda T_B)^i / i!$, $i = 0, 1, \ldots$, and where the numbers of arrivals in disjoint intervals are statistically independent. The Poisson process is very popular in the literature [88] because of its elegant analytical properties, such as superposition and memorylessness.
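As a small illustration of the Poisson arrival model, the sketch below evaluates the probability of i arrivals in one slot, exp(-mean) * mean^i / i! with mean = rate * duration, and builds a truncated distribution of the kind one might use when the number of arrivals per slot must be capped in a finite-state model. The function names and the choice to lump the tail mass into the last bin are assumptions made for this example, not something prescribed by the text.

```python
import math


def poisson_pmf(i, rate, duration):
    """Probability of exactly i arrivals in a slot of length `duration`."""
    mean = rate * duration
    return math.exp(-mean) * mean ** i / math.factorial(i)


def poisson_pmf_truncated(max_arrivals, rate, duration):
    """pmf over 0..max_arrivals; the tail P{arrivals >= max_arrivals}
    is lumped into the last entry so that the list sums to one."""
    pmf = [poisson_pmf(i, rate, duration) for i in range(max_arrivals)]
    pmf.append(max(0.0, 1.0 - sum(pmf)))
    return pmf


if __name__ == "__main__":
    # Hypothetical numbers: 2 packets per unit time, slot of half a unit.
    print(poisson_pmf_truncated(5, rate=2.0, duration=0.5))
```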
Another popular traffic model for analysis at the data-link layer [104] is the Bernoulli-distributed traffic model, where the arrival probability p in a particular time-slot is independent of any other time-slot. It follows that the number of arrivals in n slots is binomial, $P\{n = k\} = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \ldots, n$. Whereas the time between arrivals for Poisson traffic is exponential, it is geometric for Bernoulli arrivals. Compound Poisson and Bernoulli processes are defined in the same way as the compound traffic described above.

Some of the key traffic supported in the next-generation wireless networks, such as compressed video and file transfer, exhibits "burstiness". A Markov traffic model can potentially capture the burstiness of such traffic, because of nonzero correlations among batch sizes [105]. In the Markov traffic model, the activities of a source can be modeled by a finite number of states, and the probability of the next state depends only on the current state. If the state transitions occur at integer time instants, the Markov chain is discrete-time and the time spent in a state is geometrically distributed [106]. Markov-modulated models constitute an extremely important class of traffic models, where an auxiliary Markov process evolves in time and its current state controls (modulates) the probability distribution of the traffic [105]. The most commonly used Markov-modulated model is the Markov-modulated Poisson process (MMPP) model, which combines the simplicity of the modulating (Markov) process with that of the modulated (Poisson) process [107]. The interrupted Poisson process (IPP) model is a two-state MMPP, where one state is an "ON" state with an associated positive Poisson rate, and the other is an "OFF" state with an associated rate of zero. MMPP is in fact a special case of the batch Markov arrival process (BMAP).
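The Markov-modulated idea can be sketched in a few lines. The following is an illustrative, slotted approximation of an MMPP in which a discrete-time Markov chain selects the Poisson rate used in each slot; with two states and one rate set to zero it behaves like an IPP. The function names, the slotted time axis and all numerical values are assumptions for this example only.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson variate by cdf inversion (adequate for small means)."""
    u, i = rng.random(), 0
    p = math.exp(-mean)
    cdf = p
    while u > cdf and i < 1000:  # cap guards against rounding in the tail
        i += 1
        p *= mean / i
        cdf += p
    return i


def simulate_mmpp(trans, rates, n_slots, slot_len=1.0, seed=0):
    """Arrivals per slot for a slotted Markov-modulated Poisson source.

    trans[s][t]: P(next modulating state = t | current state = s)
    rates[s]:    Poisson arrival rate while the modulating chain is in state s
    The chain starts in state 0.
    """
    rng = random.Random(seed)
    state, arrivals = 0, []
    for _ in range(n_slots):
        arrivals.append(sample_poisson(rates[state] * slot_len, rng))
        # Move the modulating chain to its next state.
        u, acc = rng.random(), 0.0
        for nxt, prob in enumerate(trans[state]):
            acc += prob
            if u < acc:
                state = nxt
                break
    return arrivals


if __name__ == "__main__":
    # Hypothetical IPP: state 0 is OFF (rate 0), state 1 is ON (rate 3).
    ipp_trans = [[0.9, 0.1], [0.2, 0.8]]
    print(simulate_mmpp(ipp_trans, rates=[0.0, 3.0], n_slots=20))
```

Long runs of zeros followed by clusters of arrivals in the output are the burstiness that a memoryless Poisson or Bernoulli source cannot reproduce.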
For BMAP, the arrival is compound and each arrival contains a random number of packets with probability distribution {Bn} [103,108]. Another important special case of BMAP is the switched-batch Bernoulli process (SBBP), where the phase process is a Markov chain with a finite number of states. The arrival process is a Bernoulli batch process characterized by a probability density function (pdf) depending on the state of the phase process; this case is introduced in [109] for non-controlled MPEG video sources. In [110], it has been reported that actual network traffic is self-similar and long-range dependent in nature, and can be modeled with a Pareto distribution. However, the computational complexity associated with self-similar traffic is extremely high, due to long-range dependence [109].

1.2.7 Memory of Fading Channels

Because of the inherent memory of the radio channel, the channel quality shows significant correlation among consecutive states [6]. However, most of the models in the literature for the performance analysis of packet transmission protocols have assumed that the channel states are independent and identically distributed. Also, many protocols and coding schemes have been designed for an independent and identically distributed channel, and techniques have been developed to eliminate channel memory (e.g., by channel interleaving). In fact, turning a channel with memory into a memoryless one by interleaving is not necessarily an efficient way of using it, since the interleaving operation introduces complexity and may substantially reduce the channel capacity [111, 112]. It has recently been shown that being able to capture the memory of the wireless channel is very important in order to accurately assess the performance of a wireless system [7]. A natural way to model a channel with memory is to approximate it by means of a Markov model.
There is an extensive body of literature dealing with the representation and analysis of bursty error channels using simple Markov models. The classical two-state Gilbert-Elliott model, with a good state having small error probability and a bad state having larger error probability, has been widely used and analyzed for burst noise channels [113,114]. In another variation of the two-state model, the transition between good and bad states is allowed; errors do not occur in good states and occur with a probability of 1 in bad states [104,115,116]. In some cases, modeling a radio channel as a two-state Gilbert-Elliott model is not adequate when the channel quality varies dramatically (e.g., with fast Doppler spread) [117]. In [28], a multi-state quasi-stationary Markov channel model is used to characterize the wireless non-stationary channel. The model is formed based on experimental measurements of some real channels. The finite-state Markov channel (FSMC) model for Rayleigh fading is built by partitioning the instantaneous received signal-to-noise ratio (SNR) into a finite number of non-overlapping states [117,118]. For packet/block level communications, when the block length is large, this first-order FSMC has been shown to be much more accurate than the two-state model, and is popular due to its good balance between accuracy and complexity [7,119,120]. In [117], the received SNR of the channel is partitioned into a finite number of states so that the stationary probability associated with all the states is equal. A theoretical analysis is conducted to show the usefulness of the FSMC compared with that of two-state Gilbert-Elliott channels. The second-order statistics of the received SNR are used to approximate the Markov transition probabilities. The validity and accuracy of the model are confirmed by the state equilibrium equations and computer simulation.
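The equal-probability partition and the use of second-order statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions: Rayleigh fading (so the received SNR is exponentially distributed with mean `mean_snr`), transitions only between adjacent states, and transition probabilities approximated by the level crossing rate at the shared threshold multiplied by the slot duration and divided by the stationary probability of the state. The function names and numerical values are hypothetical, and the approximation is only sensible when the product of Doppler frequency and slot duration is small.

```python
import math


def equal_probability_thresholds(n_states, mean_snr):
    """SNR thresholds giving each state stationary probability 1/n_states.

    For exponentially distributed SNR, P(SNR > g) = exp(-g / mean_snr).
    Returns n_states + 1 values: 0, ..., infinity.
    """
    inner = [mean_snr * math.log(n_states / (n_states - k))
             for k in range(n_states)]
    return inner + [math.inf]


def level_crossing_rate(threshold, mean_snr, doppler_hz):
    """Expected crossings per second of `threshold` in one direction."""
    if math.isinf(threshold):
        return 0.0
    x = threshold / mean_snr
    return math.sqrt(2.0 * math.pi * x) * doppler_hz * math.exp(-x)


def fsmc_transition_matrix(n_states, mean_snr, doppler_hz, slot_s):
    """Thresholds and a tridiagonal transition matrix for the FSMC."""
    th = equal_probability_thresholds(n_states, mean_snr)
    pi = 1.0 / n_states  # stationary probability of every state
    P = [[0.0] * n_states for _ in range(n_states)]
    for k in range(n_states):
        if k + 1 < n_states:  # move up across the upper threshold
            P[k][k + 1] = level_crossing_rate(th[k + 1], mean_snr, doppler_hz) * slot_s / pi
        if k > 0:  # move down across the lower threshold
            P[k][k - 1] = level_crossing_rate(th[k], mean_snr, doppler_hz) * slot_s / pi
        P[k][k] = 1.0 - sum(P[k])  # remaining mass: stay in the same state
    return th, P


if __name__ == "__main__":
    thresholds, P = fsmc_transition_matrix(4, mean_snr=1.0, doppler_hz=10.0, slot_s=1e-3)
    print(thresholds)
    for row in P:
        print([round(p, 4) for p in row])
```

Because every state has the same stationary probability, the up and down transition probabilities across a shared threshold coincide, which is one form of the state equilibrium check mentioned above.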
Optimization of the received SNR thresholds to minimize the mean square error (MSE) of the state BER value using least-squares quantization and the Lloyd-Max algorithm is proposed in [121]. The authors analyze the performance of the concatenated hybrid ARQ scheme, comprising a Reed-Solomon code and a rate-compatible punctured convolutional code, for low bit-rate video packet transmission over wireless channels. The experimental results were found to agree reasonably well with the simulation results. The authors of [122] used an equal-duration method for partitioning the received SNR. In this method, the average fade durations for all the partitioned states are equal. When the parameter values are chosen appropriately, this method is shown to give a better representation of the actual fading channels. A Rician channel is modeled with a Kth-order Markov channel in [123]. The parameterization and accuracy of Gilbert-Elliott channel models are also investigated. In the papers mentioned above, FSMCs are mostly designed to model flat fading channels without employing any diversity-combining or equalization techniques at the receiver. On the other hand, diversity is taken into consideration in [124], and an FSMC is designed for a Rayleigh fading channel by partitioning the received combined SNR at the output of a selection-combining (SC) diversity receiver. In [125], the authors extend the use of FSMCs to the Nakagami-m fading channels with diversity using SC, maximal ratio combining (MRC) and equal gain combining (EGC). In this thesis, we extend the use of the FSMC to fading channels with an equalizer at the receiver and with multiple transmit and multiple receive antennas.
Second-order statistics (e.g., the average fade duration and the level crossing rate) over the Nakagami-m fading channels without diversity techniques, and with diversity-combining techniques at the receiver (i.e., pure selection-combining, equal-gain-combining and maximal-ratio-combining diversity), are given in [126], [127] and [128], respectively. In [129], an analytical methodology is described for evaluating the average level crossing rate and the average outage duration of generalized selection combining (a hybrid form of diversity that combines selection combining with maximal-ratio combining) for independent identically distributed Rayleigh fading channels. The time-varying fading channels are modeled as a hidden Markov model (HMM) in [130]. HMMs can accurately model channels with memory, and are general enough to capture various statistical properties (e.g., autocorrelation function, level-crossing rates, etc.) of a wide range of practical fading channels [130]. In [108], a three-state HMM of the packet error process is used to analyze packet queue length and packet delay distribution over slow Rayleigh fading channels. In [131], several soft-input/soft-output (SISO) equalization algorithms based on the minimum mean square error (MMSE) criterion are explored. It is shown in [131] that for the turbo equalization application, the MMSE-based SISO equalizers perform well compared with a MAP equalizer, while providing a tremendous reduction in complexity.

1.3 Thesis Statement

In a wireless network, the wireless terminals usually rely on a battery with a limited amount of energy. Therefore, minimizing transmission power can lead to more efficient utilization of battery energy and hence longer battery life. By delaying transmission and storing packets in the buffer, transmission power can be saved.
However, different users can have different QoS (i.e., delay, packet-dropping, packet error rate, bit error rate, etc.) requirements, and allowing excessive delays can result in buffer overflows and hence packet-dropping in practical systems with finite-size buffers. Most of the adaptive transmission schemes at the physical layer maximize long-term throughput by adapting to the channel state only. These schemes assume that there is always sufficient data waiting to be transmitted and/or consider an infinite storage buffer, and do not consider arrival statistics. However, from a practical point of view, an adaptation policy that neglects packet arrival statistics and the current buffer state is unrealistic and does not optimize overall system performance, since the size of the transmission buffer for practical systems is finite. Hence, the buffer may be empty at some times and full at others, due to nonconstant and nondeterministic arrivals. With a finite-size buffer, random arrivals and random channel gains, packet overflows and the packet-dropping probability cannot be neglected. Moreover, reducing transmission power at the physical layer may result in higher error rates or lower transmission rates, which affect network-layer performance.

The main focus of this thesis is cross-layer optimization, where physical-layer parameters are combined with data-link/network-layer parameters. These parameters are adapted to channel conditions, buffer occupancy and input traffic to minimize delay, delay jitter, transmission power, BER, packet error rate and packet-dropping rate, and to maximize throughput. Since these objectives sometimes conflict, we consider a tradeoff between them and find the optimal policy by formulating the problem as an average-cost Markov decision process (MDP) and solving it with dynamic programming (DP) techniques.
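As an illustration of solving an average-cost MDP by dynamic programming, the sketch below implements a basic relative value iteration on a generic finite MDP. It is a toy example under stated assumptions, not the formulation used in this thesis: the two-state, two-action model in the usage part, the function name and the choice of state 0 as reference state are all hypothetical, and the plain iteration shown here assumes the chain under every policy is aperiodic.

```python
def relative_value_iteration(P, cost, n_iter=10000, tol=1e-9):
    """Relative value iteration for a finite average-cost MDP.

    P[a][s][t]: probability of moving from state s to t under action a
    cost[s][a]: immediate cost of taking action a in state s
    Returns (gain, h, policy): the average-cost estimate, the relative
    value function (with h[0] = 0) and a greedy policy.
    """
    n_s, n_a = len(cost), len(P)
    h = [0.0] * n_s
    gain, q = 0.0, [[0.0] * n_a for _ in range(n_s)]
    for _ in range(n_iter):
        # One Bellman backup for every state-action pair.
        q = [[cost[s][a] + sum(P[a][s][t] * h[t] for t in range(n_s))
              for a in range(n_a)] for s in range(n_s)]
        v = [min(row) for row in q]
        gain = v[0]  # subtract the value of reference state 0
        h_new = [x - gain for x in v]
        diff = max(abs(x - y) for x, y in zip(h_new, h))
        h = h_new
        if diff < tol:
            break
    policy = [min(range(n_a), key=lambda a, s=s: q[s][a]) for s in range(n_s)]
    return gain, h, policy


if __name__ == "__main__":
    # Hypothetical toy model. Action 0 costs nothing in state 0 and 2 in
    # state 1, and leaves the next state to chance. Action 1 always costs 1
    # and forces a return to state 0.
    P = [
        [[0.5, 0.5], [0.5, 0.5]],  # action 0
        [[1.0, 0.0], [1.0, 0.0]],  # action 1
    ]
    cost = [[0.0, 1.0], [2.0, 1.0]]
    print(relative_value_iteration(P, cost))
```

In the toy model the best policy uses action 0 in state 0 and action 1 in state 1, giving a long-run average cost of 1/3; in a scheduling problem the states would instead combine buffer occupancy and channel state, and the cost would weight power against delay and overflow.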
In all of our work, we consider a practical finite-size buffer, random arrival statistics and correlation of the fading channels. Both unconstrained and constrained MDP formulations are considered. Also, we find a computationally less expensive suboptimal policy that avoids the two curses of DP (the curse of modeling and the curse of dimensionality). We also investigate the effect of different receive and transmit diversity techniques on the performance of adaptive transmission.

Instead of considering adaptive coding at the physical layer and ARQ at the data-link layer separately, in cross-layer design these two layers can be combined. Hence, the stringent error control requirement at the physical layer is alleviated, and the number of retransmissions at the data-link layer is reduced. Therefore, in an adaptive type-I HARQ system, the system performance can be improved by adapting parameters across both layers. The actual state of the channel is hidden for the HARQ system, but can be observed through ACK/NAK feedback from the receiver. As mentioned above in Section 1.2.2, all of the studies reviewed change the operation mode and correspondingly adapt the coding rate with ACK/NAK feedback in some ad hoc way. They assume that packets are always available in the buffer, and neglect buffer queueing delay and packet-dropping. However, from the practical point of view, queueing delay and packet-dropping should be considered for retransmission systems where packets are stored in a finite-size buffer before transmission. In our work, we adapt transmission parameters (e.g., power, coding rate) of type-I HARQ systems to channel conditions, input arrivals and the number of packets in the buffer to optimize different cross-layer system objectives, as mentioned above. Since the state of the system is partially observable, the problem can be formulated as a partially observable Markov decision process (POMDP) problem.
We give solutions based on both the equivalent history-dependent MDP and the equivalent belief-dependent MDP for the formulated POMDP problems.

Incremental redundancy hybrid ARQ has been proposed and used in different wireless standards, and has been shown to have better performance than chase combining. We investigate different scheduling techniques for an incremental redundancy HARQ system, using a semi-Markov decision-process framework. In our work, we consider the adaptation of both the modulation and coding together at the physical layer with the higher-layer parameters, which is one of the key technologies used in 3G evolution standards.

1.4 Contributions

Optimal and Suboptimal Scheduling over Rayleigh Fading Channels ([132,133]): We investigate optimal packet scheduling over correlated Rayleigh fading channels involving trade-offs between the minimization of three goals: average transmission power, average delay and average packet-dropping probability. Both the information-theoretic and the practical M-QAM models are discussed. For the information-theoretic model, we present three models for computing transmission power. The first model gives the power upper bound, where power is calculated using the lower received SNR threshold. The second and third models give the average value of the power, where power is found using the ergodic capacity notion and average received SNR, respectively. For the M-QAM scheme, the power cost is evaluated for two models for a particular BER using the worst and average channel BER, respectively. We show that the problem forms a weakly communicating Markov decision process and formulate the problem both as an unconstrained Markov decision process (UMDP) problem and a constrained Markov decision process (CMDP) problem.
A relative value iteration (RVI) algorithm is used to find the optimal deterministic policy for the unconstrained problem, while the optimal randomized policy for the constrained problem is obtained using a linear programming (LP) technique. The benefits and drawbacks of these two approaches are discussed. Whereas with RVI only a finite number of scheduling policies can be obtained over the feasible delay region, LP can produce policies for all feasible delays with a fixed dropping probability, and is computationally faster than RVI. We show the structure of the optimal deterministic policy as a function of the channel state and buffer state, and form a simple logarithmic functional suboptimal scheduler that approximately follows the optimal structure of the policy. Performance results are given for both constant and bursty Poisson arrivals, and the proposed suboptimal scheduler is compared with the optimal scheduler and the channel threshold scheduler. Our suboptimal scheduler performs close to the optimal scheduler for every feasible delay, and is robust to different channel parameters, as well as to different numbers of actions and incoming traffic distributions. Our proposed suboptimal scheduler also outperforms the channel threshold scheduler.

Optimal Scheduling over Nakagami-m Fading Channels with Diversity on Receive ([134,135]): We study two cross-layer optimization problems for M-QAM systems over diversity Nakagami-m fading channels. In both schemes, the scheduler adapts the transmission rate to the channel state and buffer occupancy. We consider two diversity-combining techniques at the receiver, namely selection combining and maximal-ratio combining. We formulate both problems as constrained Markov decision process problems and provide linear programming-based solutions.
In the first problem, our objective is to minimize average transmission power under constraints on average delay and packet-dropping probability. In the second problem, we minimize average bit error rate (BER) under average delay and packet-dropping probability constraints. The Nakagami-m fading channel with diversity combining is described as a finite-state Markov channel. The incoming traffic is assumed to have a Poisson distribution. Simulation results show that system performance can be improved by adapting the rate to the buffer state, and hence delaying packets in the buffer, in addition to employing diversity combining at the receiver.

Adaptive Modulation for MIMO Channels with SR-ARQ ([136]): A rate-adaptive M-QAM system that maximizes throughput and minimizes packet error rate, delay and overflow over multiple-input, multiple-output Nakagami-m fading channels is analyzed using the framework of a Markov decision process. We study the system from the cross-layer point of view by considering a finite-size buffer and random incoming traffic arrivals. Transmit diversity and adaptive M-QAM are employed in the physical layer, and SR-ARQ is employed in the data-link layer. The fading channel is assumed to be Nakagami-m, and is modeled with a finite-state Markov channel. Transmit diversity is achieved with an orthogonal space-time block code. To schedule packets, the transmitter depends on both the buffer state and channel state information, and selective-repeat ARQ is used at the data-link layer for error detection. Simulation results show that the throughput maximization depends not only on the received SNR but also on the number of actions and incoming traffic statistics. Increasing the number of actions does not necessarily increase the throughput.

Joint Rate and Power Adaptation for Type-I Hybrid ARQ Systems ([134,137]): A general framework for simultaneous rate and power adaptation of type-I hybrid ARQ systems is studied. This framework can be applied to the adaptive resource allocation problem on correlated flat-fading or frequency-selective fading channels for bursty non-constant packet arrivals. The optimal rate and power control policy can be obtained by solving the formulated weakly communicating Markov decision process. We consider two cases of the problem. In the first, we assume that the transmitter knows the channel state perfectly at the beginning of the transmission. The transmitter is also provided with the decoding results for the previous transmission, in terms of observation feedback at the end of the transmission. In the second, we consider the scheduling problem when the transmitter does not know the channel state at the time of transmission, but makes the transmission decision based on the history of previous transmission decisions and corresponding outcomes. In both cases, our objective is to minimize transmission power, and the optimal policies are computed under two different buffer cost constraints, namely, the average buffer delay and the average packet overflow rate. Simulation results for both cases are shown and compared over both the frequency-flat and the frequency-selective channels.

Scheduling with Partially Observable Channel State Information ([138,139]): We address the issue of optimal coding rate scheduling for adaptive type-I hybrid automatic repeat-request wireless systems. In this scheme, the coding rate is varied depending on channel, buffer and incoming traffic conditions. In general, we consider the hidden Markov model for both time-varying flat fading channels and bursty correlated incoming traffic.
We show that the appropriate framework for computing the optimal coding rate allocation policies is a partially observable Markov decision process (POMDP). In this framework, the optimal coding rate allocation policy maximizes the reward function, which is a weighted sum of throughput and buffer occupancy with an appropriate sign. Since a polynomial amount of space is needed to calculate the optimal policy for even a simple POMDP problem, we investigate the maximum-likelihood, voting and Q-MDP policy heuristics for the purpose of an efficient, real-time solution. Our results show that all three heuristics perform close to the completely observable system state case if the fading and/or traffic state mixing rate is slow. On the other hand, when the channel fading is fast, the Q-MDP heuristic is the most throughput-efficient among the considered heuristics, and its performance is close to that of the optimal coding rate allocation policy of the fully observable system state case. We also explore the performance of the proposed heuristics in the bursty correlated traffic case and show that the maximum-likelihood and voting heuristics consistently outperform the non-adaptive case.

Adaptive Coding and Modulation over Partially Observable Channel ([140]): We examine the coding and modulation rate adaptation problem for HARQ systems with a partially observable state from the cross-layer viewpoint. The rate of convolutionally coded M-QAM is adapted jointly to the buffer state and the channel state. We assume that perfect channel state information is not known at the transmitter, but can be estimated from previous actions and observations. The underlying correlated channel is assumed to have a Nakagami-m distribution and is modeled as a finite-state Markov channel.
Throughput-efficient selective-repeat ARQ is employed at the data-link layer to control packet retransmission in the event of decoding failure. A POMDP-based approach is utilized to formulate the problem, where average throughput is maximized, and average delay, packet error rate and overflows are minimized. To solve the cross-layer adaptation problem approximately, we discuss two heuristic-based methods and compare their applicability to the considered problem by simulation against the case of a completely observable channel state.

Scheduling Techniques for Incremental Redundancy HARQ ([141, 142]): Incremental redundancy hybrid automatic repeat request (IR-HARQ) schemes are proposed in several wireless standards for increased throughput efficiency and greater reliability. We investigate transmit-power and modulation-order adaptation strategies for IR-HARQ schemes over correlated Rayleigh fading channels. In order to jointly analyze the physical layer and the link layer, the transmitter model incorporates a finite-size buffer that receives randomly varying traffic from a higher-layer application. It is assumed that channel variations can be modeled with a first-order Markov chain. We show that the optimal transmission power and rate adaptation law under the buffering delay and packet overflow constraints can be obtained using the framework of a semi-Markov decision process. We discuss three different adaptation models for the IR-HARQ schemes and compare their performance with that of the non-adaptive scheme. We show that a unique optimal policy exists for each case and that it can be computed using a linear programming approach.

1.5 Thesis Outline

The outline of the remainder of this thesis is as follows. In Chapter 2, we address the issues surrounding optimal and suboptimal scheduling over Rayleigh fading channels.
The contribution of this chapter is extended in Chapter 3 with the inclusion of receiver diversity over Nakagami-m fading channels. Also, two different objectives are analyzed. In Chapter 4, the rate adaptation problem is formulated for MIMO channels, where throughput is maximized instead of minimizing power and BER as in the previous two chapters. We also incorporate the SR-ARQ protocol to obtain feedback about decoding results. Joint rate and power adaptation for type-I HARQ systems is given in Chapter 5 for both the perfect and the delayed CSI cases. Scheduling over partially observable Rayleigh fading channels is studied in Chapter 6, and extended in Chapter 7 with the inclusion of adaptive coding and modulation, as well as SR-ARQ, over Nakagami-m fading channels. Scheduling techniques for IR-HARQ systems over Rayleigh fading channels are discussed in Chapter 8. Finally, in Chapter 9, we summarize the contributions of this thesis and identify some future research directions.

Chapter 2. Optimal and Suboptimal Packet Scheduling

2.1 Introduction

Decision making in wireless networks is crucial due to their limited resources. Because of the time-varying fading of wireless channels, the decision about the transmission rate and power to be used depends primarily on channel conditions. The randomly varying incoming traffic also plays an important role in scheduling wireless resources, as practical systems are equipped with a finite storage buffer. Therefore, schemes that adapt optimally to the channel alone may not optimize the system as a whole. Because of the randomness of incoming packets and channel gains, exact scheduling policies cannot be determined using static optimization techniques. Since the packet scheduling problem is inherently dynamic in nature, we use stochastic dynamic programming algorithms to determine the optimal adaptation policies.
Dynamic programming methods are very popular decision-making techniques in the fields of artificial intelligence, robotics and business. In this thesis, we use dynamic programming techniques to make transmission decisions that consider both physical layer and data-link layer optimization.

In this chapter, we present a unified cross-layer sequential stochastic optimization of a single-user communication system over a realistic correlated fading channel with a Rayleigh distribution. We consider minimization of average power, with constraints on average delay and average packet-dropping probability, over a range of fading rates. We investigate different incoming traffic models as well as different theoretical and practical transmission schemes. A brief summary of our contributions is presented below:

• We formulate the cross-layer optimization problem over a correlated fading channel, considering a finite-size transmission buffer. Both the information-theoretic and the practical M-QAM models are discussed. For each of these transmission models, we analyze two different methods, one for determining the upper bound of the transmission power and one for determining the average transmission power.

• The structural property of the adjoined Markov decision process (MDP) with average cost criteria is analyzed and shown to constitute, in general, a weakly communicating MDP problem.

• Two approaches for dealing with these weakly communicating MDP problems, namely the unconstrained Lagrangian formulation and the constrained formulation, are introduced and their benefits and drawbacks are compared. We also discuss using a linear-programming (LP) approach to find optimal randomized policies for the weakly communicating constrained MDP (CMDP) problem.
• Our formulation includes constraints on both average delay and average packet-dropping probability, and encompasses both packet-dropping and non-dropping cases through appropriate buffer costs. This model is most suitably solved using the constrained MDP formulation with the linear programming technique.

• A novel log-scheduling policy that avoids the need for optimization is introduced and shown to be more suitable for correlated channels than the channel threshold policy over all allowable delays.

• We present and discuss extensive numerical results for all the mentioned frameworks with constant and bursty Poisson arrivals.

Note that optimal policies are computed off-line using dynamic programming techniques and are stored in the memory of the wireless terminals. After determining the instantaneous state of the system, the terminal applies the optimal policy to decide on the transmission power, rate, etc.

The remainder of the chapter is organized as follows. The system model is described in Section 2.2. In this section, we describe the operation of the proposed system, including the incoming traffic model, buffer model and channel model. We analyze different packet transmission schemes in Section 2.3. In Section 2.4, the optimization objectives of our cross-layer adaptation schemes are discussed and the corresponding costs are given. In Section 2.5, we give two formulations: we formulate the trade-off between average power, average delay and average packet-dropping probability as an unconstrained problem as well as a constrained MDP problem. The optimal stationary deterministic policies for the unconstrained problem and the randomized policies for the constrained problem are obtained using iterative dynamic programming algorithms (e.g., relative value iteration, policy iteration) and linear programming techniques, respectively.
In Section 2.6, we discuss a suboptimal scheduler with logarithmic dependency on the buffer and channel states. We provide simulation results and comparisons between the optimal and suboptimal schedulers in Section 2.7, and conclude in Section 2.8. A simple example illustrating the structure of the problem is given in Appendix B.

2.2 System Model

We consider a single-transmit single-receive antenna system, where a single user with a finite transmission buffer is communicating over a wireless fading channel. The time during which the transmitter sends packets is divided into a countably infinite number of discrete time-slots, where a time-slot corresponds to a single block (also called a physical-layer frame) of $N_f$ channel uses. Let $T_B$ denote the duration of a time-slot in seconds. We assume that the packets transmitted in a block all experience the same channel gain, and describe the channel as a block-fading finite-state Markov channel (FSMC). The data packets arrive from a higher-layer application and are first placed into a finite buffer of size $B$ packets. Incoming data, data stored in the buffer, and data being sent in a block are all packetized, where the size of each incoming packet is $N_p$ bits. The schematic of the system is shown in Fig. 2.1. In the next sections, we provide detailed descriptions and models of the incoming traffic, buffer behavior and channel conditions.

Figure 2.1: (a) Schematic of the cross-layer adaptive single-transmit single-receive antenna system and (b) schematic of channel transitions among different discrete channel states.
2.2.1 Traffic Model

Unless otherwise specified, in this thesis we use the superscript $n$ to denote the value of a variable at time-slot $n \in \mathbb{Z}^{*} = \{0, 1, \ldots\}$. For example, $a^n$ denotes the number of packets arriving at the buffer input in time-slot $n$. Suppose that, in general, $\{a^n\}$ forms an ergodic Markov chain with state space $\mathcal{A} = \{a_0, a_1, \ldots, a_A\}$ and average packet arrival rate $\lambda = E\{a^n\}$ packets/time-slot. As in [80], we also assume that $\{a^n\}$ is independent of the channel fading and noise processes. In particular, we consider two special cases of incoming traffic: (1) constant packet rate (CPR) with $a^n = \lambda$ and (2) Poisson-distributed bursty traffic with average arrival rate $E\{a^n\} = \lambda$ packets/time-slot. For Poisson traffic, let $a_i$ denote $i$ packet arrivals, $i = 0, 1, 2, \ldots, A$. The distribution of $a^n$ for the Poisson traffic is given by

$$ p(a^n = a_i) = \exp(-\lambda T_B)\,\frac{(\lambda T_B)^{i}}{i!}, \quad i = 0, 1, \ldots, A \tag{2.1} $$

where $T_B$ is the block period. The truncated Poisson distribution with a maximum of $A$ incoming packets per slot is found by assuming $p(a^n = a_A) \to 0$ and normalizing the distribution.

Figure 2.2: The variation of buffer occupancy in packets with respect to time-slots.

2.2.2 Buffer Model and Dynamics

Assume that at the beginning of the $n$th time-slot the transmitter chooses action $u^n$ and correspondingly takes $\Psi(u^n)$ packets from the buffer and maps these packets into a codeword of rate $N_p \Psi(u^n)/N_f$ that will be transmitted over the next $N_f$ channel uses. We assume fixed-length, variable-rate codewords, i.e., all codewords are sent over the same number of channel uses, but the number of possible codewords can vary.
Let $\mathcal{B} = \{b_0, b_1, \ldots, b_B\}$ denote the state space of the buffer's packet occupancy, where $b_i$ corresponds to $i \in \{0, 1, 2, \ldots, B\}$ packets in the buffer. The dynamics of the buffer in terms of packet occupancy are then given by

$$ b^{n+1} = \min\left\{\max\left(b_0,\; b^n - \Psi(u^n) + a^n\right),\; b_B\right\}, \tag{2.2} $$

where $\Psi(u^n)$ is the number of packets taken from the buffer under action $u^n$, the max operator ensures that the buffer state is no less than $b_0$, and the min operator ensures that the buffer state is no more than $b_B$. We assume that the transmitter can choose $u^n$ based on the buffer state $b^n$, the channel state $c^n$, and the source state $a^n$. All packets that arrive in time-slot $n$ can be transmitted only in time-slot $n + 1$ or later. A natural constraint on $u^n$ is that $0 \le \Psi(u^n) \le b^n$ for all $n$. That is, there is no transmission when the buffer is empty, and the scheduler cannot transmit more packets than are available in the queue.

Let $\mathcal{U}$ be the set of all possible actions with $U$ elements, i.e., $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$, with $u_1$ denoting no transmission, while the interpretation of $u_j$, $j = 2, 3, \ldots, U$, depends on the transmission scheme under consideration. We use a symbolic representation of actions in order to cover all the cases analyzed in the thesis, with the function $\Psi(u_j)$ returning the number of packets taken from the buffer when the applied action is $u_j$. A pictorial representation of the buffer occupancy dynamics is given in Fig. 2.2.

In this thesis, we assume that the packet arrival rate to the queue is lower than or equal to the average departure rate from the queue. The actions are chosen so that queue instability is avoided. Packet-dropping occurs when the buffer has no vacancy, and the packet that arrived first is also dropped first.

2.2.3 FSMC for the Wireless Fading Channel

Let us consider a user communicating over a wireless fading channel, as shown in Fig. 2.1(a).
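As a brief aside on Sections 2.2.1 and 2.2.2, the traffic and buffer models can be sketched in a few lines of Python. This is only an illustrative sketch, not the simulator used in the thesis. The function names, the parameter `mean_arrivals` (standing for the Poisson mean per slot), and the greedy "send as much as allowed" policy are assumptions introduced here; the buffer update follows the min/max recursion (2.2), and the arrival distribution is a Poisson pmf truncated at a maximum number of arrivals and renormalized.

```python
import math
import random


def truncated_poisson_pmf(mean_arrivals, max_arrivals):
    """Poisson pmf restricted to {0, ..., max_arrivals} and renormalized."""
    weights = [
        math.exp(-mean_arrivals) * mean_arrivals ** i / math.factorial(i)
        for i in range(max_arrivals + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def next_buffer_state(b, packets_sent, arrivals, buffer_size):
    """Buffer recursion (2.2): clip b - Psi(u) + a to the range [0, B]."""
    if not 0 <= packets_sent <= b:
        raise ValueError("cannot send more packets than are buffered")
    return min(max(0, b - packets_sent + arrivals), buffer_size)


def simulate(buffer_size, mean_arrivals, max_arrivals, max_send, slots, seed=0):
    """Run the buffer under a hypothetical greedy policy (illustration only)."""
    rng = random.Random(seed)
    pmf = truncated_poisson_pmf(mean_arrivals, max_arrivals)
    b, dropped = 0, 0
    for _ in range(slots):
        sent = min(b, max_send)  # assumed policy: send up to max_send packets
        arrivals = rng.choices(range(max_arrivals + 1), weights=pmf)[0]
        # Packets that do not fit into the remaining vacancy are dropped.
        dropped += max(0, b - sent + arrivals - buffer_size)
        b = next_buffer_state(b, sent, arrivals, buffer_size)
    return b, dropped
```

Calling `simulate(buffer_size=5, mean_arrivals=2.0, max_arrivals=6, max_send=2, slots=200)` returns the final occupancy and the number of dropped packets for one sample path; an optimal policy would replace the greedy rule with a table lookup indexed by the system state.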
A slowly varying wireless fading channel can be modeled as a finite-state Markov channel (FSMC) [117], which is done by partitioning the received signal-to-noise ratio (SNR), which is proportional to the square of the received signal amplitude, into a finite number $C$ of non-overlapping states. The FSMC model is very useful due to its good balance between accuracy and complexity [125]. It is a popular block-fading channel model for packet-level communications that considers the correlation of the fades between blocks [7]. As mentioned in [120], this first-order Markov model accurately models the practical fading channel for packet/block-level communication when the block length is sufficiently large.

Let $P$ denote the average transmit signal power, $\bar{\alpha}$ denote the average channel power gain, and $\sigma^2$ denote the variance of the channel noise, which is assumed to be additive white Gaussian. The receiver estimates the channel at each time-slot and sends the information to the transmitter over a feedback path. We assume this path to be instantaneous and error-free. Therefore, the channel-gain estimate $\hat{\alpha}^n$ equals the channel gain $\alpha^n$. For a constant transmit power $P$, the instantaneous received SNR is $\gamma^n = \alpha^n P/\sigma^2$. In the sequel, we omit the time reference $n$ of $\gamma$ since $\alpha^n$ is stationary. Therefore, at time-slot $n$, when the transmit power is $P_t$, the instantaneous received SNR is given by $\gamma P_t/P$. Ideal coherent phase detection is assumed.

Let $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$ denote the state space of the FSMC and $\Gamma = \{\gamma_0, \gamma_1, \ldots, \gamma_C\}$ denote the corresponding received SNR thresholds in increasing order, with $\gamma_0 = 0$ and $\gamma_C = \infty$. The fading channel is said to be in state $c_i$, $i = 1, 2, \ldots, C$, if the received SNR is in the interval $[\gamma_{i-1}, \gamma_i)$. We assume that the fading of the channel is slow enough that the received SNR remains in a certain state for the duration of a block.
Furthermore, the channel states associated with consecutive blocks are assumed to be neighboring states, i.e., $p_{c_i,c_j} = 0$ for all $|i - j| > 1$. The transition probability $p_{c_i,c_{i+1}}$ from state $c_i$ to state $c_{i+1}$ is approximated by the ratio of the expected number of level crossings at the received SNR $\gamma_i$ to the average transmission rate in state $c_i$. Similarly, the transition probability $p_{c_i,c_{i-1}}$ from state $c_i$ to state $c_{i-1}$ is approximated by the ratio of the expected number of level crossings at the received SNR $\gamma_{i-1}$ to the average transmission rate in state $c_i$. Let $\phi_i$ denote the steady-state probability associated with channel state $c_i$, $i = 1, 2, \ldots, C$, and $R_B$ denote the number of blocks per second of the block-fading channel. The average number of blocks per second during which the channel is in state $c_i$ is then $R_B^{(i)} = R_B \phi_i$. Therefore, the crossover transition probabilities can be written as

$$ p_{c_i,c_{i+1}} \approx \frac{N_{\gamma_i}}{R_B^{(i)}}, \quad i = 1, 2, \ldots, C-1 \tag{2.3} $$

and

$$ p_{c_i,c_{i-1}} \approx \frac{N_{\gamma_{i-1}}}{R_B^{(i)}}, \quad i = 2, 3, \ldots, C \tag{2.4} $$

where $N_{\gamma_i}$, $i = 1, 2, \ldots, C$, is the expected number of times per second that the received SNR passes downward across the corresponding threshold $\gamma_i$, and is given by

$$ N_{\gamma_i} = \sqrt{\frac{2\pi\gamma_i}{\bar{\gamma}}}\, f_m \exp\left(-\frac{\gamma_i}{\bar{\gamma}}\right), \tag{2.5} $$

with $\bar{\gamma}$ the average received SNR. In this expression, $f_m = v_{mt}/\lambda_{rw}$ is the maximum Doppler frequency, where $v_{mt}$ is the speed of the mobile terminal and $\lambda_{rw}$ is the wavelength of the radio wave. We can express the steady-state probability of channel state $c_i$ as

$$ \phi_i = \int_{\gamma_{i-1}}^{\gamma_i} f_\gamma(\gamma)\, d\gamma = F_\gamma(\gamma_i) - F_\gamma(\gamma_{i-1}). \tag{2.6} $$

The transition probability of staying in the same state can be found from the fact that the sum of all outgoing transition probabilities is equal to one.
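The construction of the FSMC transition matrix described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the thesis implementation: it assumes an exponentially distributed received SNR (the Rayleigh case), places the thresholds so that every state is equally likely (one possible partition, chosen here only to obtain concrete numbers), and all function and parameter names are hypothetical. Crossover probabilities follow the level-crossing-rate ratio of (2.3) and (2.4), and each self-transition probability is one minus the two neighboring probabilities.

```python
import math


def equal_probability_thresholds(num_states, mean_snr):
    """Thresholds gamma_0..gamma_C giving each state probability 1/C.

    Assumes an exponential SNR distribution, so F(gamma) = 1 - exp(-gamma/mean).
    """
    inner = [
        -mean_snr * math.log(1.0 - i / num_states) for i in range(1, num_states)
    ]
    return [0.0] + inner + [math.inf]


def level_crossing_rate(gamma, mean_snr, doppler_hz):
    """Expected crossings per second of threshold gamma, as in (2.5)."""
    if math.isinf(gamma):
        return 0.0
    return (
        math.sqrt(2.0 * math.pi * gamma / mean_snr)
        * doppler_hz
        * math.exp(-gamma / mean_snr)
    )


def fsmc_transition_matrix(thresholds, mean_snr, doppler_hz, block_time):
    """Tridiagonal transition matrix; row i (0-based) is channel state c_{i+1}."""
    num_states = len(thresholds) - 1
    blocks_per_second = 1.0 / block_time
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for i in range(num_states):
        low, high = thresholds[i], thresholds[i + 1]
        # Steady-state probability of the interval [low, high), as in (2.6).
        phi = math.exp(-low / mean_snr) - math.exp(-high / mean_snr)
        rate_in_state = blocks_per_second * phi
        up = down = 0.0
        if i + 1 < num_states:
            up = level_crossing_rate(high, mean_snr, doppler_hz) / rate_in_state
        if i > 0:
            down = level_crossing_rate(low, mean_snr, doppler_hz) / rate_in_state
        matrix[i][i] = 1.0 - up - down
        if i + 1 < num_states:
            matrix[i][i + 1] = up
        if i > 0:
            matrix[i][i - 1] = down
    return matrix
```

For example, `fsmc_transition_matrix(equal_probability_thresholds(4, 1.0), 1.0, 10.0, 1e-3)` builds a four-state chain for a 10 Hz Doppler frequency and 1 ms blocks. The approximation is only meaningful for slow fading, where the crossover probabilities stay well below one.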
Thus, the self-transition probabilities of the channel states can be written as

$$ p_{c_i,c_i} = 1 - \sum_{j \in \{i-1,\, i+1\}} p_{c_i,c_j}, \quad i = 1, 2, \ldots, C, \tag{2.7} $$

where the redundant probabilities are $p_{c_1,c_0} = p_{c_C,c_{C+1}} = 0$.

Rayleigh Fading Channel

In a rich multipath propagation environment, the instantaneous received signal amplitude is commonly modeled with the Rayleigh distribution. For Rayleigh fading channels, the received SNR $\gamma$ is exponentially distributed, and its pdf can be written as (cf. [143])

$$ f_\gamma(\gamma) = \frac{1}{\bar{\gamma}} \exp\left(-\frac{\gamma}{\bar{\gamma}}\right) \quad \text{for } \gamma \ge 0, \tag{2.8} $$

where $\bar{\gamma}$ is the average received SNR. The cumulative distribution function (cdf) of $\gamma$ can be evaluated using the following expression:

$$ F_\gamma(\gamma_i) = \int_0^{\gamma_i} f_\gamma(\gamma)\, d\gamma. \tag{2.9} $$

Channel Partitioning Methods

The received instantaneous SNR can be partitioned using several methods. In this thesis, we consider the following two partitioning schemes:

• Equal Probability Method: In the equal probability method (EPM), the instantaneous received SNR is partitioned so that the steady-state probability of every channel state is the same [117]. Thus, in EPM,

$$ \phi_1 = \phi_2 = \cdots = \phi_C = \frac{1}{C}. \tag{2.10} $$

For Rayleigh fading channels, the steady-state probability $\phi_i$, $i = 1, 2, \ldots, C$, is given by

$$ \phi_i = \int_{\gamma_{i-1}}^{\gamma_i} f_\gamma(\gamma)\, d\gamma = \exp\left(-\frac{\gamma_{i-1}}{\bar{\gamma}}\right) - \exp\left(-\frac{\gamma_i}{\bar{\gamma}}\right). \tag{2.11} $$

• Equal Duration Method: In the equal duration method (EDM), the instantaneous received SNR is partitioned so that the average duration of all states is the same. Let $\bar{\tau}_i$ denote the average duration of channel state $c_i$, $i = 1, 2, \ldots, C$ [122]. Thus, in EDM, $\bar{\tau}_1 = \bar{\tau}_2 = \cdots = \bar{\tau}_C$, and we can write

$$ \bar{\tau}_i = r T_B, \quad \text{for } i = 1, 2, \ldots, C, \tag{2.12} $$

where $r$ is a constant that should be larger than 1.
The average duration $\bar{\tau}_i$ of a state $c_i \in \mathcal{C}$ is expressed as

$$ \bar{\tau}_i = \frac{\mathrm{Prob}\{\gamma_{i-1} \le \gamma < \gamma_i\}}{N_{\gamma_{i-1}} + N_{\gamma_i}} = \frac{\phi_i}{N_{\gamma_{i-1}} + N_{\gamma_i}}. \tag{2.13} $$

Combining (2.12) and (2.13), for Rayleigh fading channels we obtain

$$ \left(r_1\sqrt{\gamma_{i-1}} - 1\right) e^{-\gamma_{i-1}/\bar{\gamma}} + \left(r_1\sqrt{\gamma_i} + 1\right) e^{-\gamma_i/\bar{\gamma}} = 0, \tag{2.14} $$

where $i = 1, 2, \ldots, C$ and $r_1 = \sqrt{2\pi/\bar{\gamma}}\; r f_m T_B$. For given values of $C$, $f_m$ and $T_B$, the set of $C$ nonlinear equations found from (2.14) can be solved numerically to obtain the values of $r$ and $\gamma_1, \ldots, \gamma_{C-1}$ [144].

Of these two methods, the first is widely accepted and used in the literature, and we have used it for most of our simulations. The reason for this choice is its better trade-off between simplicity and complexity. It is also more robust to changes in the simulation parameters.

2.3 Analysis of Packet Transmission Schemes

Two different transmission schemes, namely an information-theoretic transmission scheme and the M-QAM transmission scheme, are analyzed in this chapter. For the first scheme, we provide three different approaches to calculate the transmission power and discuss their differences. For the second scheme, we provide two different ways of computing the power. Let $P_t^{(k)}(c_i, u_j)$ be the required power for the $k$th analyzed case ($k = 1, 2, 3, 4, 5$) during a block in which the channel is in state $c_i$, $i = 1, 2, \ldots, C$, and the transmitter chooses action $u_j$, $j = 1, 2, \ldots, U$, for transmitting $\Psi(u_j)$ packets.

2.3.1 Information-Theoretic Transmission Scheme

The first transmission scheme we analyze is based on an information-theoretic channel capacity model, namely the mutual information model. We assume that $P_t^{(k)}(c_i, u_j)$ is the power required for the mutual information rate to equal $N_p \Psi(u_j)/N_f$. We extend this scheme, previously analyzed in [80] for i.i.d. block-fading channels, to correlated FSMC channels. We assume that the discrete transmission rate corresponding to action $u_j$ is to take $j - 1$ packets from the buffer, i.e., $\Psi(u_j) = j - 1$.
Therefore, the set of transmission rates is given by $\mathcal{W} = \{w_1, w_2, \ldots, w_U\} = \{0, 1, \ldots, U - 1\}$. This scheme can be analyzed in three different ways, as discussed below.

Lower SNR Threshold Case

In the first case, we assume that the lower threshold $\gamma_{i-1}$ of the received normalized SNR of the channel state $c_i$ represents the channel SNR for that state. Therefore,

$$ \frac{N_p \Psi(u_j)}{N_f} = \log_2\left(1 + \frac{\gamma_{i-1} P_t^{(1)}(c_i, u_j)}{P}\right). \tag{2.15} $$

Assuming the same channel SNR during the block, relation (2.15) provides a pessimistic estimate of the transmission rate. Solving (2.15) for $P_t^{(1)}(c_i, u_j)$, we get

$$ P_t^{(1)}(c_i, u_j) = \frac{P}{\gamma_{i-1}} \left(2^{N_p \Psi(u_j)/N_f} - 1\right), \tag{2.16} $$

which gives the maximal power necessary to achieve an information rate of $N_p \Psi(u_j)/N_f$ for any possible received SNR, given that the channel is in state $c_i$. Thus, (2.16) gives the upper bound of the transmission power for scheduling packets. Since $\gamma_0 = 0$, the power required in the lowest channel state $c_1$ is unbounded; packet-dropping is therefore inevitable in that state, and unity average delay is not attainable in this case. The assumption for this case gives a pessimistic estimate of the transmission rate. However, this estimate becomes more accurate as the number of channel states increases.

Ergodic Capacity Case

An alternative approach that gives an optimistic estimate of the available rates is to consider that transmission rates based on ergodic capacity (for a detailed definition of ergodic capacity, readers are referred to [13]) can be achieved in a certain channel state $c_i$, i.e.,

$$ \frac{N_p \Psi(u_j)}{N_f} = E\left[\log_2\left(1 + \frac{\gamma P_t^{(2)}(c_i, u_j)}{P}\right) \,\middle|\, c_i\right], \tag{2.17} $$

where the expectation is over the distribution of the normalized SNR $\gamma$, given that the channel is in state $c_i$. This model is valid under the condition that sufficiently long codewords with fixed code rates can be used during a block, which must be long enough for the fading to reflect its ergodic nature. Because of this assumption, (2.17) gives an optimistic estimate of the achievable rates in a certain channel state for the slow-fading FSMC model. Note that code rate estimates from
Note that code rate estimates from (2.15) (2.16) (2.17) C H A P T E R 2. O P T I M A L A N D S U B O P T I M A L P A C K E T S C H E D U L I N G 41 (2.15) and (2.17) become more accurate as the number of channel partitions increases and converge to the same value. In the special case of a Rayleigh fading channels we have where transmit power (cj, Uj) can be obtained numerically. Average Received SNR Case Another optimistic estimate of the transmission power can be found using the average received SNR of a certain channel state. For channel state Q when action Uj is chosen, we assume that Pt3\ci, Uj) is the required power, so that the mutual information rate over Nf channel uses is equal to [133]. Therefore, we can write = i 0 g 2 + lifll^pM^ a n c j t h u s P t ( 3 ) ( c i ) % ) = ^ 2 "f (2.19) Therefore, for all % with |7;| > 0, P^3\ci,Uj) is an increasing and strictly convex function of Uj > 0. The average SNRs for state C j , i — 1,2, • • • , C are found as follow: * = T I" 7 / 7 ( 7 ) ^ 7 (2-20) 2.3.2 Practical M-QAM-based Transmission Scheme 1 The second transmission scheme we analyze is based on practical multilevel adaptive modu-lation schemes, namely, the M - Q A M transmission scheme. In these schemes, we assume that M - Q A M is used adaptively at the transmitter without any error-correction coding. We assume that U different modulation schemes are available at the transmitter. The first action and the second action correspond to no transmission and BPSK transmission, respectively. Action Uj, j — 3, • • • , U, corresponds to 2 2 - 7 _ 4 -QAM transmissions. Note that, for the M - Q A M case, the number of modulated symbols transmitted during a time-slot is equal to the number of channel 'We consider uncoded transmission in this chapter. However, our proposed framework can also valid for any coded transmission. C H A P T E R 2. O P T I M A L A N D S U B O P T I M A L P A C K E T S C H E D U L I N G 42 uses. 
That is, $N_p \Psi(u_j)/v_j = N_f$, where $v_j$ is the number of bits that modulate a $2^{2j-4}$-QAM symbol. We consider two different ways of fixing the BER to a specified value. In both cases, we fix an equal BER for all channel states and choose the power costs for a certain channel state-action pair. The details of these two techniques are discussed below.

Fixed Instantaneous BER Case

First, we constrain the instantaneous BER of each state to the same maximal instantaneous value, considering the worst possible received SNR of that channel state. For channel state $c_i$, when the scheduler takes action $u_j$, $j = 3, \ldots, U$, and correspondingly chooses $2^{2j-4}$-QAM transmission with $\Psi(u_j) = N_f (2j - 4)/N_p$, the instantaneous BER, valid for both low and high SNR, can be expressed approximately as [143]

$$ P_{e,\mathrm{MQAM}}(c_i, u_j) = 4 \,\ldots \tag{2.21} $$

[Equation (2.21) is truncated in the source text.]

A useful approximation for the BER that can be inverted or differentiated is derived in [10] as follows:

$$ P_{e,\mathrm{MQAM}}(c_i, u_j) = 0.2 \exp\left(-\frac{1.5\, \gamma_{i-1} P_t^{(4)}(c_i, u_j)}{P\,(2^{v_j} - 1)}\right). \tag{2.22} $$

The approximation (2.22) is tight to within 1 dB for $v_j \ge 2$ and $P_e \le 10^{-3}$. For BPSK transmission,

$$ P_{e,\mathrm{BPSK}}(c_i, u_2) = \frac{1}{2}\, \mathrm{erfc}\left(\sqrt{\frac{\gamma_{i-1} P_t^{(4)}(c_i, u_2)}{P}}\right). \tag{2.23} $$

As in (2.15), the above BER equations give a pessimistic estimate of the power necessary to achieve a specified BER for every possible SNR of the channel state $c_i$, $i = 1, 2, \ldots, C$. For a specified instantaneous BER, the upper bound of the power $P_t^{(4)}(c_i, u_j)$ can be determined using the lower threshold of the received SNR for the different actions and channel states.

Fixed Average BER Case

An optimistic estimate of the transmitter power can be obtained by fixing an equal average BER $\bar{P}_e$ for all channel states, given that the channel is in the respective state, i.e.,

$$ \bar{P}_e = \frac{1}{\phi_i} \int_{\gamma_{i-1}}^{\gamma_i} P_e\left(\frac{\gamma P_t^{(5)}(c_i, u_j)}{P}\right) f_\gamma(\gamma)\, d\gamma, \tag{2.24} $$

where $P_e(\cdot)$ denotes the instantaneous BER of the selected modulation as a function of the received SNR. For a given fixed average BER, the transmission power $P_t^{(5)}(c_i, u_j)$ can be evaluated numerically from (2.24).
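To make the pessimistic power estimates of Section 2.3 concrete, the following sketch evaluates two of them. It is an illustration under stated assumptions, not the thesis code: the first function assumes that the lower-threshold case equates the rate $N_p\Psi(u_j)/N_f$ with $\log_2(1 + \gamma_{i-1}P_t/P)$, and the second inverts the exponential BER approximation $0.2\exp(-1.5\,\gamma_{i-1}P_t/(P(2^{v_j}-1)))$ for the transmit power. The function names and the default `avg_power=1.0` are hypothetical.

```python
import math


def power_lower_threshold(gamma_low, packets, packet_bits, channel_uses,
                          avg_power=1.0):
    """Power needed for `packets` packets when the state's SNR is taken to be
    its lower threshold gamma_low (information-theoretic, pessimistic case)."""
    if packets == 0:
        return 0.0  # no transmission costs no power
    if gamma_low <= 0.0:
        return math.inf  # lowest state: threshold 0 would need unbounded power
    rate = packet_bits * packets / channel_uses  # bits per channel use
    return avg_power / gamma_low * (2.0 ** rate - 1.0)


def power_mqam_fixed_ber(gamma_low, bits_per_symbol, target_ber,
                         avg_power=1.0):
    """Power meeting `target_ber` at the lower threshold, obtained by inverting
    the assumed approximation BER = 0.2 * exp(-1.5 * snr / (2**v - 1))."""
    if gamma_low <= 0.0:
        return math.inf
    return (
        -math.log(5.0 * target_ber)
        * avg_power
        * (2.0 ** bits_per_symbol - 1.0)
        / (1.5 * gamma_low)
    )
```

Evaluating either function over all channel states and actions yields a power table indexed by (state, action), which is the form in which the transmission cost enters the scheduling problem. The approximation behind the second function is only intended for small target BERs and at least two bits per symbol.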
2.4 Objectives of Scheduling Scheme and Associated Costs

The objectives of the packet scheduling scheme are threefold: minimizing transmission power (which is very important for extended battery life), minimizing buffering delay and minimizing the probability of packet-dropping from the buffer. Each objective is associated with a cost that expresses its numeric importance when the three objectives are pursued simultaneously. In the following sections, we model these three objectives in terms of costs.

2.4.1 Transmission Power Cost

Wireless devices are usually powered by a battery of limited energy content. Thus, efficient utilization of battery energy can extend battery life. Also, interference to other wireless devices can be reduced by decreasing the transmit power of the wireless device. Hence, minimizing transmission power is one of the main goals of wireless systems. The immediate transmission power cost is the instantaneous power level that the scheduler uses for transmitting packets. Therefore, the transmission power cost for the $k$th analyzed case is given by

$$ G_P(s^n, u^n) = P_t^{(k)}\left(\chi(s^n), u^n\right), \quad \forall\, b^n \in \mathcal{B} \text{ and } a^n \in \mathcal{A}. \tag{2.25} $$

In (2.25), the function $\chi(s^n)$ gives the channel state for the composite system state $s^n$. Note that the transmission cost depends only on the channel condition and the action chosen; it is independent of the buffer occupancy and the incoming traffic.

2.4.2 Buffer Costs

In this chapter, we consider two buffer costs, namely the packet delay in the buffer and the probability of packet-dropping from the buffer.

Packet Delay in the Buffer

Delay is an important parameter to consider for communication systems involving transmission buffers. The maximum tolerable packet delay for a particular system depends on the quality of service requirements of the application being handled.
For example, real-time traffic must have very low delay. For this traffic, a received packet is only useful when the strict delay requirements are maintained by the scheduler. On the other hand, best-effort traffic is not real-time and is quite tolerant of delay. The delay experienced by a packet is composed of buffer-queuing, encoding, propagation and decoding delay. In this thesis, we consider only the buffer delay, since the encoding, propagation and decoding delays are usually fixed and negligible compared to the buffer delay. The average packet delay in the buffer is related to the average buffer occupancy via Little's theorem, as follows:

$$ \text{average delay} = \frac{\text{average buffer occupancy}}{\lambda}, \tag{2.26} $$

where $\lambda$ is the average packet arrival rate. The scheduler takes buffer delay into account through the buffer delay cost $G_D(s^n, u^n)$. Therefore, we can write the immediate buffer delay cost as²

$$ G_D(s^n, u^n) = \frac{\psi(s^n)}{\lambda}. \tag{2.27} $$

In (2.27), the function $\psi(s^n)$ gives the buffer state for the composite system state $s^n$.

² Note that, for an infinite-size buffer, the time-average cost of (2.27) is equal to the time-average delay in the buffer when no packets are lost due to overflow. With a finite-size buffer, if the control policies are constrained to avoid overflows, then this equality also holds. However, if packet overflows are allowed, the time-average of (2.27) gives a lower bound on the average delay.

Packet-Dropping Probability

When the transmission buffer is of infinite length (which is not realizable in practice), no packets are lost due to lack of storage space in the buffer. However, when the buffer size is finite, which is true for real communication systems, packets are dropped from the buffer for bursty incoming traffic. As will become evident later in this chapter, if the incoming traffic is constant in nature, packet-dropping can be avoided at the cost of increased transmission power.
Therefore, the second buffer cost considered in this chapter is the packet-dropping probability cost. At the beginning of a time-slot, the available storage space is given by $b_B - b^n + \Psi(u^n)$. If the number of incoming packets $a^n$ exceeds this vacancy, packet-dropping occurs. We assume a First In First Out (FIFO) buffer, i.e., the packets that arrive first are transmitted first, or are dropped first when the vacancy is insufficient. The packet-dropping cost is given by

$$G_O(s^n, u^n) = \sum_{k = b_B - a_A + 1}^{b_B} I_{\{\psi(s^n) - \Psi(u^n) = k\}} \sum_{a_j = b_B - k + 1}^{a_A} P\big(\zeta(s^n) = a_j\big), \quad \forall\, c^n \in \mathcal{C}, \tag{2.28}$$

where $I_{\{x\}}$ is an indicator function that returns 1 when $x$ holds and 0 otherwise. In (2.28), the function $\zeta(s^n)$ gives the traffic state for composite system state $s^n$. Note that since no error control coding is used, it is assumed that the packets are either received with no error or with a specified small error. The transmitter does not keep any copy of the packets taken from the buffer for transmission. Hence, no retransmission is made.

2.4.3 General Formulation

We formulate the problem as an average-cost Markov decision process (MDP) with a composite state space $\mathcal{S} = \mathcal{B} \times \mathcal{C} \times \mathcal{A} = \{s_1, s_2, \ldots, s_S\}$, where $S = (L+1) \times C \times (A+1)$ is the total number of states. Let $\Pi$ denote the set of all Markov stationary policies $\pi = \{\mu, \mu, \ldots\}$ with $\mu : \mathcal{B} \times \mathcal{C} \times \mathcal{A} \to Q(\mathcal{U})$, where $Q(\mathcal{U})$ represents the set of discrete probability distributions over the action set $\mathcal{U}$. For brevity, we refer to $\{\mu, \mu, \ldots\}$ as the stationary policy $\mu$. Note that, for a finite MDP, given any history-dependent policy there exists a Markov policy with the same average cost. Therefore, it is sufficient to restrict attention to Markov policies while seeking an optimal policy (cf. Theorem 5.5.1 of [145]).
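As a concrete illustration of the packet-dropping cost (2.28) defined above, the following minimal sketch computes the probability that arrivals exceed the vacancy left after transmission. The function name, its arguments, and the representation of the arrival distribution as a list indexed by the number of arrivals are assumptions made for this example, not the thesis implementation.

```python
def dropping_cost(b, tx, arrival_pmf, buffer_size):
    """Probability that new arrivals overflow the buffer in one time-slot.

    b           -- buffer occupancy at the start of the slot (hypothetical name)
    tx          -- packets removed from the buffer by the chosen action
    arrival_pmf -- arrival_pmf[a] = P(a packets arrive), a = 0..max arrivals
    buffer_size -- buffer capacity in packets
    """
    residual = b - tx                 # packets left after transmission
    vacancy = buffer_size - residual  # free space available to new arrivals
    # Dropping occurs exactly when the number of arrivals exceeds the vacancy.
    return sum(p for a, p in enumerate(arrival_pmf) if a > vacancy)


if __name__ == "__main__":
    pmf = [0.5, 0.3, 0.2]  # illustrative distribution over 0, 1 or 2 arrivals
    print(dropping_cost(10, 1, pmf, 10))  # full buffer, one packet sent
```

The cost is zero whenever the vacancy is at least the maximum number of arrivals, which matches the lower summation limit in (2.28).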
With stationary policy $\mu \in \Pi$, the expected long-term average power cost is

$$\bar{G}_P^{\mu} = \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} E\big[G_P(s^n, \mu(s^n))\big], \tag{2.29}$$

the expected long-term average delay cost is

$$\bar{G}_D^{\mu} = \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} E\big[G_D(s^n, \mu(s^n))\big], \tag{2.30}$$

and the expected long-term packet-dropping probability cost is

$$\bar{G}_O^{\mu} = \limsup_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} E\big[G_O(s^n, \mu(s^n))\big]. \tag{2.31}$$

Here, the expectation operator is over the random received SNRs and random arrivals.

2.5 Optimal Scheduling

In this section, we discuss the evaluation of the optimal scheduling policy through dynamic programming algorithms. We provide two different approaches to the problem. Whereas the first formulation gives deterministic policies, the second formulation gives randomized policies. The details of these two formulations and their solution techniques are given below.

2.5.1 Unconstrained MDP Formulation

As mentioned earlier concerning the system in Fig. 2.1, we are interested in minimizing three conflicting objectives, namely, the average delay, average power and average packet-dropping probability. Since these goals conflict, for unconstrained Markov decision processes (UMDP) we consider minimizing a weighted combination of the three criteria involved, and formulate the problem as an average-cost MDP. Our objective is to find the optimal stationary policy $\mu^*$, over the set $\Pi$ of all Markov deterministic stationary policies, that minimizes the average cost per stage given by (2.32).

At each decision epoch $n$, the controller observes the state $s^n$ of the system. Based on this state, the controller chooses a control action $u^n$ and corresponding transmission rate $\Psi(u^n)$, and incurs a per-stage cost of

$$G_T(s^n, \mu(s^n)) = G_P(s^n, \mu(s^n)) + \beta_1 G_D(s^n, \mu(s^n)) + \beta_2 G_O(s^n, \mu(s^n)), \tag{2.33}$$

where the non-negative weighting factors $\beta_1$ and $\beta_2$ assume the role of Lagrangian multipliers and indicate the relative importance of average buffer delay and average dropping probability over the average power. Smaller values of $\beta_1$ and $\beta_2$ correspond to placing less importance on average delay and average dropping probability, respectively. It is evident that the state spaces are finite and that, therefore, the costs are bounded. Also, the system is stationary, that is, the system equation, cost per stage and transition probabilities do not change from one stage to the next. Such UMDP problems can be solved using dynamic programming (DP) techniques, such as relative value iteration algorithms or policy iteration algorithms [145].

Definition 1. A stationary policy whose associated Markov chain has a single recurrent class and a possibly empty set of transient states is called a unichain policy. A stationary policy whose associated Markov chain has two or more closed irreducible classes is called a multichain policy.

Definition 2. An MDP is referred to as communicating if, for every pair of states $s_i$ and $s_j \in \mathcal{S}$, there exists a deterministic stationary policy under which $s_j$ is accessible from $s_i$, even though some of the policies may be multichain. An MDP is said to be weakly communicating if there exists a closed set of states, with each state in that set accessible from every other state in that set under some deterministic stationary policy, plus a possibly empty set of states that is transient under every policy. Therefore, weakly communicating models may be viewed as communicating models with additional transient states.
In general, the scenario being considered forms a weakly communicating MDP because some policies are multichain, yet there exist policies under which the transition probability matrix is unichain; we show a simple example for constant incoming traffic in Appendix B to illustrate this fact. However, in the special case of Poisson traffic with $a_A > \Psi(u_U)$, all policies are unichain, and the sup operator in (2.29)-(2.32) can be removed because the expectation is well defined for each policy. Bellman's optimality equations for this model, $\forall\, s_i \in \mathcal{S}$, are given below:

$$\min_{u \in \mathcal{U}_{s_i}} \Big\{ \sum_{s_j \in \mathcal{S}} p_{s_i,s_j}(u)\, \lambda(s_j) - \lambda(s_i) \Big\} = 0 \tag{2.34}$$

and

$$\lambda(s_i) + h(s_i) = \min_{u \in \mathcal{U}_{s_i}} \Big\{ G_T(s_i, u) + \sum_{s_j \in \mathcal{S}} p_{s_i,s_j}(u)\, h(s_j) \Big\}, \tag{2.35}$$

where $\lambda(s_i)$ is the gain or average cost for $s_i \in \mathcal{S}$, $h(s_i)$ is the relative cost or differential cost for each state $s_i \in \mathcal{S}$, $p_{s_i,s_j}(u)$ is the stationary transition probability with which the system moves to state $s_j$ from state $s_i$ when control action $u$ is applied, and $\mathcal{U}_{s_i} \subseteq \mathcal{U}$ is the set of allowable actions in state $s_i$. For a unichain policy, the gain is constant, i.e., $\lambda(s_i) = \lambda$, $\forall\, s_i$. Therefore, (2.34) vanishes for any stationary policy that is unichain. Also, for both the communicating and weakly communicating models, the optimal gain is constant. Hence, the relative value iteration algorithm for the unichain model can be applied directly to both models to find a stationary $\epsilon$-optimal policy and its corresponding gain [145]. The algorithm converges to a unique optimal deterministic policy in a finite number of iterations, and the optimal average cost is independent of the initial state [146].

The problem can also be solved using a policy iteration algorithm. We have seen from our simulations that all of the improved policies are unichain, and hence the policy iteration algorithm terminates in a finite number of iterations with an optimal stationary deterministic policy [146].
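The relative value iteration mentioned above can be sketched as follows for a small MDP with constant gain. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the names `P`, `G` and `ref` are hypothetical, every action is assumed to be allowed in every state, and the chain is assumed to be aperiodic so that the plain iteration converges.

```python
def relative_value_iteration(P, G, ref=0, tol=1e-9, max_iter=100_000):
    """Relative value iteration for an average-cost MDP (constant-gain case).

    P[u][s][j] -- probability of moving from state s to state j under action u
    G[s][u]    -- per-stage cost of action u in state s (e.g. the weighted cost G_T)
    ref        -- reference state whose relative cost is pinned to zero
    Returns (gain, relative_costs, policy).
    """
    n_states, n_actions = len(G), len(G[0])
    h = [0.0] * n_states
    for _ in range(max_iter):
        # One-step lookahead: immediate cost plus expected relative cost.
        q = [[G[s][u] + sum(P[u][s][j] * h[j] for j in range(n_states))
              for u in range(n_actions)]
             for s in range(n_states)]
        t = [min(row) for row in q]
        gain = t[ref]                    # current estimate of the average cost
        h_new = [v - gain for v in t]    # subtract the gain to keep h bounded
        if max(abs(a - b) for a, b in zip(h_new, h)) < tol:
            policy = [row.index(min(row)) for row in q]
            return gain, h_new, policy
        h = h_new
    raise RuntimeError("relative value iteration did not converge")


if __name__ == "__main__":
    # Toy example: both actions share the same transitions, only costs differ.
    P = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
    G = [[1.0, 2.0], [4.0, 3.0]]
    print(relative_value_iteration(P, G))
```

In the toy example the cheapest action is chosen in each state and the gain is the stationary average of those costs. For periodic chains an aperiodicity transformation would be needed before applying this iteration.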
For given values of $\beta_1$ and $\beta_2$, let $\mu^*$ be an optimal policy and let $\bar{G}_P^{\mu^*}$, $\bar{G}_D^{\mu^*}$ and $\bar{G}_O^{\mu^*}$ be the corresponding average power, average delay, and average packet-dropping probability as given in (2.29), (2.30) and (2.31), respectively. Here, $\bar{G}_P^{\mu^*}$ must be the minimum average power such that the average delay and average packet-dropping probability do not exceed $\bar{G}_D^{\mu^*}$ and $\bar{G}_O^{\mu^*}$, respectively. By varying $\beta_1$ and $\beta_2$, we can find a finite number of different stationary deterministic optimal policies, and their corresponding average powers, average delays and average packet-dropping probabilities. By joining the adjacent points corresponding to deterministic schedulers with average dropping probabilities below a fixed bound, we get the piecewise linear optimal power/delay curve. Points above the optimal power/delay curve are achievable with some scheduler, while no scheduler can have power/delay performance below the curve.

Figure 2.3: An illustration of the weakly communicating structure of the MDP problem.

2.5.2 Constrained MDP Formulation

In the previous section, we minimized the weighted sum of the three objectives. In this section, we formulate the problem as a constrained MDP (CMDP), where one type of cost is minimized while keeping the other types of cost below some given bounds. We bound the average delay and average packet-dropping probability by specific values and seek the optimal stationary policy $\mu^*$ that minimizes the average power, as follows:

$$\min_{\mu}\ \bar{G}_P^{\mu} \quad \text{subject to:} \quad \bar{G}_D^{\mu} \le D \ \text{ and } \ \bar{G}_O^{\mu} \le P_{dropping}, \tag{2.36}$$

where $D$ and $P_{dropping}$ are the maximum tolerable average delay and average packet-dropping probability.

It is known that, for a given state $s \in \mathcal{S}$, optimal Markov policies $\mu^*$ for constrained MDP problems are randomized [145]. The optimal randomized scheduler $\mu^*$ is uniquely characterized by the probability $\theta_{\mu^*(s)}(u)$ of applying action $u \in \mathcal{U}_s$ in state $s \in \mathcal{S}$. Unlike the general randomized scheduler, in the case of the deterministic scheduler the probabilities $\theta_{\mu^*(s)}(u)$ take on values of only 0 or 1. The constrained MDP problem formulated above can be solved using an equivalent linear programming (LP) methodology, as presented below (cf. [145]). It can be shown that there is a one-to-one correspondence between the LP optimal solution and the CMDP optimal solution; the LP is feasible if and only if the CMDP is feasible. Let $\nu(s, u)$ represent the "steady-state" probability that the process is in state $s$ and action $u$ is applied. We seek the control policy that is represented in terms of a probability distribution $\nu$ over $\mathcal{S} \times \mathcal{U}$. The optimal policy $\nu^*$ can be obtained by solving the linear program:

$$
\begin{aligned}
\min_{\nu}\quad & \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} G_P(s,u)\, \nu(s,u) \\
\text{subject to:}\quad & \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} G_D(s,u)\, \nu(s,u) \le D, \\
& \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} G_O(s,u)\, \nu(s,u) \le P_{dropping}, \\
& \sum_{u \in \mathcal{U}_{s'}} \nu(s',u) = \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} \nu(s,u)\, p_{s,s'}(u), \quad \forall\, s' \in \mathcal{S}, \\
& \nu(s,u) \ge 0, \quad \forall\, s \in \mathcal{S},\ \forall\, u \in \mathcal{U}_s.
\end{aligned}
\tag{2.37}
$$

This formulation provides a simple tool for the determination of optimal randomized scheduling policies. Suppose there exists an optimal solution $\nu^*$ to the LP problem. Then there exists an optimal randomized stationary policy $\mu^*$ for the CMDP problem, where $\mu^*$ satisfies (2.38) [145]. If $\sum_{u' \in \mathcal{U}_s} \nu^*(s, u') = 0$ for some $s \in \mathcal{S}$, an action that drives the system to the recurrent class of states $\mathcal{S}_R = \{s \in \mathcal{S} : \sum_{u' \in \mathcal{U}_s} \nu^*(s, u') > 0\}$ is chosen in each such state [145]. An algorithm for efficient identification of states in the recurrent class is given in [147]. We note from our simulations that the number of states within the recurrent class where a randomized policy is applied is not greater than one. This resembles the behavior of the recurrent constrained MDP shown in Theorem 4.4 of [148].
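To make the link between an LP solution and a randomized scheduler concrete, the sketch below converts a given state-action distribution into action probabilities by normalizing over the actions of each state, which is the usual way such a policy is recovered, and evaluates an average cost as a weighted sum. The function names and the nested-list layout of `nu` and `G` are assumptions for this example; states with zero total mass are only flagged, since the choice of an action that leads back to the recurrent class is problem specific. In practice one would normally also require the entries of `nu` to sum to one, a condition that does not appear in the extracted text of (2.37).

```python
def policy_from_occupation(nu, tol=1e-12):
    """Turn nu[s][u] (probability of state s with action u) into a policy.

    Returns a list where entry s is either a list of action probabilities
    or None when state s carries no steady-state mass (outside the
    recurrent class, so an action must be chosen by other means).
    """
    policy = []
    for row in nu:
        mass = sum(row)
        if mass > tol:
            policy.append([x / mass for x in row])  # normalize per state
        else:
            policy.append(None)
    return policy


def average_cost(G, nu):
    """Average cost sum_{s,u} G[s][u] * nu[s][u] for per-stage cost G."""
    return sum(G[s][u] * nu[s][u]
               for s in range(len(nu)) for u in range(len(nu[s])))


if __name__ == "__main__":
    nu = [[0.5, 0.0], [0.25, 0.25], [0.0, 0.0]]   # illustrative values only
    print(policy_from_occupation(nu))
    print(average_cost([[1, 2], [3, 5], [9, 9]], nu))
```

In this example the second state is the only one where the policy randomizes, which mirrors the observation above that randomization occurs in at most one recurrent state.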
In general, linear programming can handle problems with a large number of variables, and the above linear program can be easily solved using interior-point methods (cf. [144]).

2.6 Suboptimal Scheduling

In certain cases of practical interest, it is not possible to use the optimal scheduling policy explained in the previous section. This situation may occur when computational resources at the transmitter or the receiver are limited, or when the statistical characterization of the channel is not known in advance. The computational load of optimal scheduling increases exponentially as the state space grows. We present here a simple approximate functional form of policy that we call log scheduling. This suboptimal policy does not depend on the statistics of the fading channel, and gives a functional relationship between the action, channel state and buffer state. Such a scheduler avoids the curse of modeling and the curse of dimensionality (as termed in [149]), which may make the optimal scheduler infeasible.

Figure 2.4: An illustration of the optimal rate allocation policy for the second (ergodic capacity) information-theoretic model.

2.6.1 Log Scheduling

The motivation for choosing the log function for scheduling stems from the shape of the optimal rate-allocation policies that can be produced from, for example, the unconstrained formulation. An example of the dependence of the optimal rate allocation on buffer and channel state, for the information-theoretic model based on ergodic capacity, is given in Fig. 2.4. The following parameters are used: $f_m T_B = 0.01$, $\beta_1 = 1.5849$ and $\beta_2 = 0$. It can be seen that the optimal rate allocation increases approximately logarithmically in both buffer and channel states.
The previous guidelines can be accommodated with the following policy:

$$u^n = \mu(b^n, c^n) = \big\lfloor \log\big(r\, b^n (\gamma^n)^{\kappa}\big) \big\rfloor. \tag{2.39}$$

Coefficient $r$ assumes a role similar to that of factor $\beta_1$ and determines how aggressive the rate allocation policy is going to be, and $\lfloor x \rfloor$ denotes the largest integer that is not larger than $x$. A larger value of $r$ implies that power allocation is going to be more aggressive, assigning higher rates for all states, and the policy resembles channel inversion power control. Consequently, the allowable delay decreases and the scheduler needs more average power. On the other hand, a smaller value of $r$ implies that weaker states choose lower transmission rates while stronger states are assigned higher transmission rates, and the policy resembles water-filling power control. Power $\kappa$ influences the dependence of the scheduling policy on the channel state and can be chosen to appropriately shape the policy. It determines the greediness of the policy as the channel state improves. In determining the log policy, the average normalized SNR for each channel state has been used in order to find the appropriate rate action for that state. For the first transmission case, substituting (2.39) in (2.16), it can be noticed that power is linearly proportional to buffer occupancy for all channel states, thus avoiding excessive transmission power. Similar conclusions can be drawn for the other three cases. We have observed that the class of policies (2.39) parameterized by $r$ and $\kappa$ is very rich and can closely approximate optimal policies for different values of average delay, fading rate and average packet-dropping probability. For practical implementations, the scheduling policy of (2.39) has to be modified, and can be given as

$$u^n = \mu(b^n, c^n) = \max\Big(\min\big(u_{atbest},\ \big\lfloor \log\big(r\, b^n (\gamma^n)^{\kappa}\big) \big\rfloor\big),\ u_{atleast}\Big), \tag{2.40}$$

where $u_{atbest} = \max_u \{u \in \mathcal{U} \mid \Psi(u) \le b^n\}$ and the minimization operation ensures that the number of transmitted packets at time-slot $n$ is not greater than the buffer occupancy $b^n$, and $u_{atleast} = \min_u \{u \in \mathcal{U} \mid \Psi(u) \ge b^n - b_B + A\}$ and the maximization operation ensures that there are no buffer overflows for CPR traffic of $A$ packets. For bursty arrivals, $A$ should be replaced by the maximum number of arrivals per time-slot to avoid overflows. In order to allow dropping, $A$ can be replaced by another suitable value.

2.6.2 Threshold Scheduling

We compare the performance of our scheduler with the channel threshold scheduler proposed in [87]. For the channel threshold scheduling policy, the action in state $s^n$ is chosen as

$$u^n = \begin{cases} \max_u \{u \in \mathcal{U}_s \mid \Psi(u) \le \min(b^n, r_t)\} & \text{if } c^n \ge \gamma_t, \\ u_1 & \text{otherwise.} \end{cases} \tag{2.41}$$

The threshold rate parameter $r_t$ and received SNR parameter $\gamma_t$ are chosen to minimize average power under the constraints on average delay and packet loss rate.

2.6.3 Mixed Scheduling

We have compared the optimal scheduler and the proposed suboptimal log scheduler with the information-theoretic results of channel inversion and water-filling power control policies. The reader is referred to [13] for a detailed definition of these policies. For simplicity and insight, we discuss the constant incoming traffic for these policies. Let the deterministic policy $\mu_C$ be defined as

$$u^n = \mu_C(b^n, c^n) = \min(u_{atbest}, u_U). \tag{2.42}$$

That is, the maximum possible number of packets is always sent from the buffer, irrespective of the channel state. This policy resembles the channel inversion power allocation policy [13]. Furthermore, let the deterministic policy $\mu_W$ be defined as

$$u^n = \mu_W(b^n, c^n) = \max\big(\min\big(u_{atbest},\ u_{wf}(\gamma^n)\big),\ u_{atleast}\big), \tag{2.43}$$

where $u_{wf}(\gamma^n)$ is the opportunistic rate allocation policy dependent only on the current channel state.
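Returning to the practical log-scheduling rule (2.40) of Section 2.6.1, a minimal sketch of the clamped rule is given below. It rests on several assumptions that are not fixed by the text: the logarithm is taken as natural, actions are represented by integer indices into a hypothetical list `rates` that plays the role of the action-to-rate mapping, `rates` is non-decreasing with `rates[0] == 0`, and an empty buffer maps to the lowest action.

```python
import math


def log_schedule(b, gamma, r, kappa, rates, buffer_size, arrivals):
    """Clamped log-scheduling rule in the spirit of (2.40).

    b           -- current buffer occupancy in packets
    gamma       -- normalized SNR of the current channel state
    r, kappa    -- shaping parameters of the log policy
    rates       -- rates[u] = packets sent under action u (hypothetical table)
    buffer_size -- buffer capacity in packets
    arrivals    -- packets for which room must be guaranteed next slot
    """
    # Largest action that does not send more packets than are buffered.
    u_atbest = max(u for u, psi in enumerate(rates) if psi <= b)
    # Smallest action that leaves room for the guaranteed arrivals.
    feasible = [u for u, psi in enumerate(rates)
                if psi >= b - buffer_size + arrivals]
    u_atleast = min(feasible) if feasible else len(rates) - 1
    arg = r * b * gamma ** kappa
    # Natural logarithm assumed; an empty buffer gives the lowest action.
    u_log = math.floor(math.log(arg)) if arg > 0 else 0
    return max(min(u_atbest, u_log), u_atleast)


if __name__ == "__main__":
    rates = [0, 1, 2, 3, 4]  # illustrative: action u sends u packets
    for b, g in [(0, 1.0), (10, 0.1), (8, 3.0), (1, 10.0)]:
        print(b, g, log_schedule(b, g, 1.0, 2, rates, 10, 2))
```

With a full buffer and a poor channel, the lower clamp takes over and forces enough transmission to prevent overflow, while with a nearly empty buffer and a good channel, the upper clamp limits the action to what the buffer holds.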
The policy $\mu_W$ corresponds to the water-filling power allocation policy for the given fading statistics and capacity [13]. Let us consider the solution to the following optimization problem:

$$\max_{P_t :\, \mathcal{C} \to \mathbb{R}^+}\ E_{\gamma}\Big\{ \log\Big(1 + \frac{P_t(\gamma)\,\gamma}{\sigma^2}\Big) \Big\} \quad \text{subject to:} \quad E_{\gamma}\{P_t(\gamma)\} \le P, \tag{2.44}$$

where $\gamma$ is a random variable with steady-state distribution $\pi_{\gamma}$ and $P_t : \mathcal{C} \to \mathbb{R}^+$ is a power allocation, i.e., a function that indicates the average power used for each channel state $c_k \in \mathcal{C}$. For a given rate $C$, the solution of (2.44) gives the optimum power allocation $P_t(\gamma)$. This power allocation is known as the "water-filling" allocation over the channel state space and is given by (2.45), $\forall\, \gamma \in \mathcal{C}$, where $\rho$ is a constant chosen so that the average power constraint is met. The maximization and minimization in (2.43) modify the opportunistic rate allocation policy $u_{wf}(\gamma^n)$ so that there are no dropped packets in the buffer and the number of packets sent from the buffer does not exceed its occupancy.

We now form the mixed scheduling policy, $\mu_M$, by randomizing the previous two policies. The policy $\mu_M$ applies scheduling according to the policy $\mu_W$ with probability $\alpha$ and scheduling according to the policy $\mu_C$ with probability $1 - \alpha$, for some $\alpha \in [0, 1]$.

2.7 Results and Discussions

In this section, we show simulation results for the average power versus the average delay trade-off with fixed allowable average packet-dropping probability. We show results for both the optimal scheduler and the suboptimal scheduler. The simulations are performed under different normalized fading rates $f_m T_B$, different numbers of actions $U$ and different incoming traffic models (e.g., constant arrivals and Poisson arrivals). We present results for all five cases outlined in Section 2.3, with appropriate power costs. We also include the results for average packet-dropping probability vs.
average delay curves whenever applicable. Policies are obtained using both the UMDP and the CMDP formulations; since both approaches give the same results, they are not shown separately in the figures. In UMDP, we use the relative value iteration (RVI) algorithm and the policy iteration (PI) algorithm to find the optimal deterministic policies. The PI algorithm converges 2-3 times faster than the RVI algorithm for the problems under consideration. It has been observed that with the PI algorithm all of the improved policies produced during the running of the algorithm are unichain. On the other hand, we use a linear programming (LP) technique to solve the CMDP and find optimal randomized policies. The importance of the UMDP formulation is that it produces deterministic policies that can be implemented more easily in practice than the randomized policies. However, our simulations showed that the constrained LP approach is computationally more efficient and is 5-6 times faster than the RVI approach. Another benefit of using the LP approach is that it is able to generate the policy for any feasible average delay with any constraint on average packet-dropping probability, whereas this is not possible when the set of policies is restricted to deterministic policies. As the number of constraints increases, it is easier to handle the problem with CMDP by specifying the bounds on the constrained costs than by using the UMDP formulation. For the second, third and fifth analyzed cases, the optimal schedulers can be chosen to avoid instances of packet-dropping. In any state, the optimal schedulers are designed to avoid accepting any transmission rate greater than the number of available packets in the buffer.

For all of the models under investigation, we fix the following parameters: average arrival rate $A = 2$ packets/block, $N_p/N_f = 1$, buffer size $B = 100$ packets, number of channel states $C = 10$, block rate $R_B = 10^4$ blocks/time slot, power $P = 1$ mW and corresponding normalized average SNR $\bar{\gamma} = 1$.
We investigate the FSMC with i.i.d. Rayleigh fading as well as correlated Rayleigh fading with different normalized fading rates $f_m T_B = 0.005, 0.01, 0.02, 0.04$.

For the first information-theoretic model, Fig. 2.5 demonstrates the influence of the normalized fading rates and the incoming traffic models (CPR and Poisson traffic with $a_A = 10$) on the optimal average power/delay curves. Since a lower bound on the SNRs is used to estimate the achievable rates for each channel state, no transmission is possible in the weakest channel state for Rayleigh fading. Packet-dropping events are therefore inevitable; the probability of packet-dropping vs. average delay is shown in Fig. 2.6. Here, the probability of packet-dropping is constrained to $P_{dropping} = 0.04$. Because no transmission takes place in the lowest channel state, the least possible average delay is more than 1. The average transmission power decreases as the average delay increases. Relaxing the constraint on average delay gives the scheduler the flexibility to store more packets when the channel state is lower. When the channel moves to a higher state, these stored packets can be sent using a lower transmission power. Therefore, by putting a less stringent constraint on the average delay, the scheduler can save transmission power on average. It can also be seen that an increase in fading rate decreases the necessary power for the same average delay, since as the fading rate increases, the channel stays in a particular state for a shorter time. It can be noted from Fig. 2.4 that in a lower channel state, the scheduler chooses actions that transmit a smaller number of packets or no packet. Therefore, packet accumulation is greater in a lower channel state.
If the channel stays in a lower channel state longer, then to maintain the same average delay the scheduler has to use a higher transmission power to send more packets, due to the larger number of packets in the buffer. Thus, when channel fading is slower the average power is higher for the same delay. Furthermore, it can be seen that due to the burstiness of the Poisson incoming traffic, a larger transmit power is required than for CPR traffic for the same delay and the same constraint on dropping probability. In contrast to the continuous transmission rates used for information-theoretic analysis, in this thesis we assume that only discrete transmission rates are available. Discrete transmission rates are practical since a practical modulation scheme can only transmit an integer number of packets. The non-smooth nature of the dropping probability curves may be due to the discrete nature of the action set.

The results of the second information-theoretic model assuming ergodic capacity with constant incoming traffic are shown in Fig. 2.7. Transmission in this case is possible even in the weakest channel state, and the packet-dropping probability is constrained to zero. A similar trend of decreasing necessary power with increased fading rate and increased average delay is also observed in this case. Note that very long delays are not feasible due to the finite size of the buffer. We also show results for the log scheduler (2.40) and the channel threshold scheduler (2.41) for the same settings. Channel threshold policies for different average delays are produced by changing the channel threshold $\gamma_t$ from $\gamma_1$ to $\gamma_C$ and finding the threshold transmission rate $r_t$ that gives the smallest average power. It can be noted that such a channel threshold policy performs well only for small delays (or small values of $\gamma_t$); moreover, not all feasible delays can be achieved with such a policy for a limited number of channel states and actions.
For higher average delays, this policy performs poorly, due to the poor utilization of the channel when the channel state is below the threshold $\gamma_t$. Different log-scheduling policies and corresponding average powers and delays are obtained by varying the value of $r$. We set $\kappa = 2$, as it has been observed through simulations that this value provides very good performance over a range of fading rates and numbers of actions. The proposed log policy performs better than the channel threshold policy for all ranges of average delays. The other benefit of the log-scheduling policy is that it avoids the need for optimization. Its performance comes consistently closer to that of the optimal policy as the fading rate increases.

The log scheduler is compared with the optimal and mixed schedulers in Fig. 2.8, where the influence of the coefficient $\alpha$ on the power/delay curve of mixed scheduling is shown. For $\alpha = 0$, the mixed scheduling policy is equivalent to the deterministic channel inversion power-control policy (with unit delay), while for $\alpha = 1$, the mixed scheduling policy is equivalent to the water-filling power-control policy. It can be easily observed that this mixed policy behaves worse than log scheduling for almost all values of $\alpha$. As the value of $\alpha$ increases, the average power first increases and then, after crossing some value of $\alpha$, decreases again. We may explain this phenomenon as follows. When the normalized received SNR is in a lower state of the channel state space $\mathcal{C}$, the water-filling allocation policy allows packets to accumulate in the buffer, but the channel inversion allocation policy sends these packets at a very high cost (i.e., using a higher transmission power).
It is also noted that the average power required decreases as the normalized fading rate increases, because as the fading rate increases, there is less chance that the channel will remain in a lower channel state for a long time and consequently less chance for packets to accumulate in the buffer for a long time. This prevents the average buffer occupancy from becoming very large for fast fading. It is interesting that for slower fading rates and $\alpha = 1$, i.e., if pure water-filling is applied, the necessary power is larger for this policy than for the log scheduler. Therefore, we can conclude that straightforward application of water-filling power allocation is not suitable for practical systems where buffer dynamics as well as delay are taken into account.

We now discuss the MQAM model with equal maximal instantaneous BER for all states. Fig. 2.9 shows the power vs. delay curve for different fading rates and an allowable packet-dropping probability of $P_{dropping} = 0.04$. As in the first information-theoretic model, packet-dropping is inevitable, and its dependence on the average delay is shown in Fig. 2.10. As expected, as the fading rate increases, the minimum obtainable packet-dropping probability decreases. In order to explore the influence of dropping probability on the necessary transmission power, we have also included the results for $P_{dropping} = 0.02$ and $f_m T_B = 0.02$. It can be seen that placing a tougher constraint on the dropping probability necessitates a higher power in the larger average delay region, and decreases the maximum feasible average delay.

In Fig. 2.11, we show the power vs. delay trade-off for the average BER MQAM model. Since no dropping is necessary in the weakest channel state of this transmission model, we constrain the dropping probability to 0. As in the previous cases, higher fading rates allow for the use of a lower power for the same average delays.
Furthermore, due to very large differences in power costs $P_t^{(4)}(c_i, u_j)$ in terms of channel state, the optimal power vs. delay tradeoff curve drops rapidly as the delay increases. The rate of this decrease is higher for higher fading rates. Note that adding more actions does not provide any decrease in the necessary power for very small and very large average delays.

The average BER MQAM model with Poisson arrivals and maximum allowable dropping probability $P_{dropping} = 0.01$ is shown in Figs. 2.12 and 2.13 for both log and optimal schedulers. The maximum number of packet arrivals for truncated Poisson traffic is assumed to be 10. The optimal scheduler provides 3-10 dBm less power for a delay of 8 time-slots compared with the suboptimal case with the same bounds on dropping probability. The gain of the optimal scheduler diminishes with increased fading rate. The optimal power vs. delay curves for the average BER MQAM model without dropping are shown in Fig. 2.14. It can be seen that in the higher delay regions, the difference between the optimal and suboptimal case decreases. Note from Figs. 2.11 and 2.14 that as the average BER decreases, the scheduler saves more power by delaying packets, especially in the lower delay regions.

2.8 Conclusions

In this chapter, we have analyzed the power/delay tradeoffs of a single-user system with a finite buffer and fixed allowable packet-dropping probability over a Rayleigh-faded finite-state Markov channel. We have shown that the cross-layer packet scheduling problem forms a weakly communicating Markov decision process. Both unconstrained and constrained formulations have been proposed to solve the problem and obtain optimal deterministic and randomized policies, respectively. The formulated models are general in the sense that they incorporate different constraints and different transmission models with appropriate costs.
Extensive simulation results have been given for both constant and Poisson-distributed packet arrivals. The influence of different fading rates, action sets, bit error rates, and constraints on dropping probability has been shown. A simple log-functional suboptimal scheduler has been proposed, and its performance has been shown to be close to that of the optimal scheduler over a range of fading rates, arrival types and action sets. We have also shown that it performs better than other methods in the literature (e.g., the optimization-based channel threshold scheduler).

Figure 2.5: Comparison of the average power vs. average delay tradeoffs with bounded allowable average packet-dropping probability for different normalized fading rates and different traffic statistics. The first information-theoretic model discussed is considered here.

Figure 2.6: Comparison of the average packet-dropping probability vs. average delay curves corresponding to Fig. 2.5 for different normalized fading rates and different traffic statistics. The first information-theoretic model discussed is considered here.

[Figure 2.7 plot: C = 10, B = 100, U = 5; curves for the optimal, log and threshold schedulers at f_mT_B = 0.05, 0.1, 0.2, 0.4 and i.i.d. fading; horizontal axis: average delay.]

Figure 2.7: Comparison of the average power vs.
average delay tradeoffs with optimal packet scheduling, log scheduling and channel threshold scheduling for the second information-theoretic model discussed. No packet-dropping is allowed in this case, and comparison graphs are simulated for different normalized fading rates.

[Figure 2.8 plot. Simulation data: B=100, C=5, U=10, average arrival = 2 packets/time-slot. Curves: optimal, mixed and log schedulers for i.i.d. fading and $f_m T_B$ = 0.05, 0.01. Horizontal axis: average delay.]

Figure 2.8: Comparison of the average power vs. average delay tradeoffs with optimal scheduling, log scheduling and mixed scheduling for the third information-theoretic model discussed. No packet-dropping is allowed in this case, and comparison graphs are simulated for different normalized fading rates.

[Figure 2.9 plot. Parameters: C=10, B=100, U=4, BER=$10^{-3}$. Curves: $P_{\text{dropping}}$ = 0.04 with $f_m T_B$ = 0.01, 0.02, 0.04 and i.i.d. fading; $P_{\text{dropping}}$ = 0.02 with $f_m T_B$ = 0.02. Horizontal axis: average delay.]

Figure 2.9: Comparison of the average power vs. average delay tradeoffs for the first M-QAM model with equal maximal instantaneous BER for all channel states. The effect of the average allowable packet-dropping probability is also shown, along with the effect of normalized fading rates.

Figure 2.10: Comparison of the average packet-dropping probability vs. average delay curves corresponding to Fig. 2.9 for the first M-QAM model, with equal maximal instantaneous BER for all channel states.
The effect of the average allowable packet-dropping probability is also shown, along with the effect of normalized fading rates.

[Figure 2.11 plot. Parameters: C=10, B=100. Curves: U=4 and U=6, each for $f_m T_B$ = 0.005, 0.01, 0.02, 0.04 and i.i.d. fading. Horizontal axis: average delay.]

Figure 2.11: Comparison of the average power vs. average delay tradeoffs for different numbers of actions and different normalized fading rates. The second M-QAM model with equal average BER for all channel states has been used.

Figure 2.12: Comparison of the average power and average delay with bounded allowable average packet-dropping probability for the second M-QAM model. Optimal packet scheduling and log scheduling are compared for Poisson arrivals.

Figure 2.13: Comparison of the average packet-dropping probability with average delay corresponding to Fig. 2.12 for the second M-QAM model. Optimal and log scheduling are compared for Poisson arrivals.

Figure 2.14: Comparison of the average power vs. average delay tradeoffs for the second M-QAM model with optimal and suboptimal log schedulers. Constant arrivals and no packet-dropping are considered for this figure.

CHAPTER 3
Scheduling with Receiver Diversity

3.1 Introduction

To satisfy high data rate wireless services with different quality of service requirements over notoriously time-varying fading channels, various adaptive schemes have been proposed in the literature and standards.
Adaptive modulation schemes employing spectrally efficient multilevel modulation, such as M-QAM, have been used as powerful techniques to provide high data rates and compensate for channel degradation [10]. Space diversity is another useful technique to combat fading, and can often be combined with adaptive modulation to further improve system performance [14]. Considering the fact that in practical systems the packet arrivals are non-constant and the transmission buffer is of finite size, we present a cross-layer optimal packet scheduling scheme that considers physical layer random fading, as well as data-link layer buffering delay and packet-dropping due to random packet arrivals. Whereas in Chapter 2 we studied a packet scheduling system that adapts its transmission parameters over a single-transmit, single-receive antenna wireless link, in this chapter we study two cross-layer optimization problems for M-QAM systems that adapt the transmission rate with channel state and buffer occupancy over a single-transmit, multiple-receive antenna Nakagami-m fading channel.

A brief summary of our contribution is presented below:

• We present two optimal packet scheduling schemes for M-QAM systems over Nakagami-m fading channels with different receive diversity-combining techniques. The objective is to optimize the system performance across different layers.

• We formulate both cross-layer problems as constrained Markov decision process problems and give linear programming-based solutions. In the first problem, our objective is to minimize average transmission power under constraints on average delay and average packet-dropping probability. In the second problem, we minimize average bit error rate (BER) with average delay and average packet-dropping probability constraints.
• We describe the Nakagami-m fading channel with diversity-combining as a finite-state Markov channel. We consider both selection combining and maximal-ratio combining at the receiver. We also consider a finite-size transmission buffer and bursty traffic.

• Simulation results are given for different diversity-combining techniques. The gain from adaptive transmission is discussed, as is the gain from employing multiple antennas at the receiver.

The remaining part of this chapter is organized as follows. Section 3.2 describes the system model and notations used in the chapter. The modeling of the channel as a FSMC is given in Section 3.3. In Section 3.4, we formulate the problem as a CMDP, give its ingredients and describe its LP solution methodology. Section 3.5 provides numerical results and discussions. We conclude in Section 3.6.

3.2 System Model

The schematic of an adaptive packet transmission system with receiver diversity is illustrated in Fig. 3.1. The incoming packets from a higher-layer application are assumed to be randomly varying and to obey a Poisson distribution, as described in Section 2.2.1. The transmitter consists of a finite-size transmission buffer, an adaptive modulator, a controller and a single transmit antenna. The controller determines the rate and power of the transmission depending on the instantaneous buffer occupancy and the channel condition. The fading channel over which the transmitter sends information is assumed to have a Nakagami distribution. The buffer occupancy evolves as described in Section 2.2.2.

[Figure 3.1 diagram. Blocks: Higher Layer Application, Finite Buffer, Adaptive Modulator, Power/Rate Controller, Correlated Nakagami-m Fading Channel, Receive Diversity Combiner, Demodulator, Perfect Channel Estimator, Noiseless Feedback of CSI, Higher Layer Application.]

Figure 3.1: Schematic of the cross-layer adaptive packet transmission system with receiver diversity.
Unlike the system model described in Section 2.2, the receiver in this system has multiple receive antennas. Let $n_R$ denote the number of receive antennas. In addition to multiple receive antennas, the receiver consists of a perfect channel state estimator and a demodulator. In the next section, we describe the modeling of the diversity channel.

3.3 Correlated Nakagami-m Channel

In land-mobile and indoor-mobile multipath propagation environments, the received signal envelope has a Nakagami-m distribution. This distribution spans a wide range of fading conditions, and the fading severity parameter $m$ ranges from $1/2$ to $\infty$. In the special case where $m = 1$, the signal corresponds to the Rayleigh distribution and does not have a direct line-of-sight (LOS) component. For $m = 2, 3, \ldots$, it closely approximates the Rice distribution, which does have a LOS component. As $m$ increases, the LOS component becomes gradually stronger. When $m \to +\infty$, the Nakagami-m fading channel converges to a non-fading AWGN channel.

Like a non-diversity wireless channel, a slowly varying diversity Nakagami-m fading channel can be modeled as a FSMC by partitioning the instantaneous received SNR $\gamma_{dc}$ at the output of the diversity combiner, which is proportional to the square of the received signal amplitude at the output of the diversity receiver, into a finite number $C$ of non-overlapping states [125]. Let $f_\gamma(\gamma_{dc})$ and $F_\gamma(\gamma_{dc}) = \int_0^{\gamma_{dc}} f_\gamma(y)\,dy$ denote the probability density function (pdf) and cumulative distribution function (cdf), respectively, of $\gamma_{dc}$. Let $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$ denote the state space of the combined diversity channel, and let $\Gamma_{dc} = \{\gamma_{dc_0}, \gamma_{dc_1}, \ldots, \gamma_{dc_C}\}$ be the corresponding set of received SNR thresholds of the diversity combiner in increasing order, with $\gamma_{dc_0} = 0$ and $\gamma_{dc_C} = \infty$. The average received SNR of each independent branch is denoted by $\bar{\gamma} = E\{\gamma\}$.
When the fading of the considered Nakagami-m channel is slow, the first-order FSMC model can be approximated as a birth-and-death process. The average duration of the channel in state $c_i$ is TBi = = TslVda. The crossover and self-transition probabilities of the channel states can be found from (2.3), (2.4) and (2.7) by using the appropriate level crossing rate values given in the following sections. In the next sections, we discuss how to partition the received SNR and how to calculate the level crossing rate needed to compute transition probabilities for the different diversity-combining techniques considered in this thesis. We consider two such techniques. We assume that the diversity channels are independent and the fading parameters of all channels are identical.

3.3.1 Selection-Combining Diversity

The selection-combining (SC) diversity technique processes only one of the diversity branches, specifically the one with the largest instantaneous received SNR. The pdf and cdf of the received output SNR of an $n_R$-branch diversity combiner are [125], respectively,

\[ f_\gamma(\gamma_{sc}) = \frac{n_R}{[\Gamma(m)]^{n_R}} \left(\frac{m}{\bar{\gamma}_{sc}}\right)^{m} \gamma_{sc}^{m-1}\, e^{-m\gamma_{sc}/\bar{\gamma}_{sc}} \left[\gamma\!\left(m, \frac{m\gamma_{sc}}{\bar{\gamma}_{sc}}\right)\right]^{n_R-1} \tag{3.1} \]

and

\[ F_\gamma(\gamma_{sc}) = \left[\frac{\gamma\!\left(m, \frac{m\gamma_{sc}}{\bar{\gamma}_{sc}}\right)}{\Gamma(m)}\right]^{n_R}, \tag{3.2} \]

where $\bar{\gamma}_{sc} = E\{\gamma_{sc}\}$ is the average received SNR at the combiner output. $\Gamma(m) = \int_0^\infty t^{m-1} e^{-t}\,dt$ and $\gamma(m, x) = \int_0^x t^{m-1} e^{-t}\,dt$ are the complete and the lower incomplete Gamma functions, respectively. Using EPM (i.e., combining (2.6), (2.10) and (3.2)), the received SNR thresholds can be obtained numerically from the following equation:

\[ \left[\gamma\!\left(m, \frac{m\gamma_{sc_i}}{\bar{\gamma}_{sc}}\right)\right]^{n_R} - \left[\gamma\!\left(m, \frac{m\gamma_{sc_{i-1}}}{\bar{\gamma}_{sc}}\right)\right]^{n_R} = \frac{[\Gamma(m)]^{n_R}}{C}. \tag{3.3} \]

The average received SNR of state $c_i$ can be given by

\[ \bar{\gamma}_{sc_i} = \frac{1}{p_{sc_i}} \int_{\gamma_{sc_{i-1}}}^{\gamma_{sc_i}} \gamma\, f_\gamma(\gamma)\, d\gamma \tag{3.4} \]

and the LCR of state $c_i$ can be given by

\[ N_{sc_i} = \frac{n_R \sqrt{2\pi}\, f_m}{[\Gamma(m)]^{n_R}} \left(\frac{m\gamma_{sc_i}}{\bar{\gamma}_{sc}}\right)^{m-\frac{1}{2}} e^{-m\gamma_{sc_i}/\bar{\gamma}_{sc}} \left[\gamma\!\left(m, \frac{m\gamma_{sc_i}}{\bar{\gamma}_{sc}}\right)\right]^{n_R-1}. \tag{3.5} \]

3.3.2 Maximal Ratio-Combining Diversity

SC diversity is the simplest form of the "space diversity on receive" technique and yields suboptimal performance.
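
The equal-probability partition of Section 3.3.1 can be sketched numerically as follows. This is a minimal illustration, not the procedure used in the thesis: it assumes an integer fading parameter $m$ (so that the regularized lower incomplete Gamma function has a closed form), uses bisection as the root finder, and the names `reg_lower_gamma_int`, `sc_cdf`, `epm_thresholds` and `mean_snr` are hypothetical.

```python
import math

def reg_lower_gamma_int(k: int, x: float) -> float:
    """gamma(k, x) / Gamma(k) for integer k >= 1 (closed form, assumption: m is an integer)."""
    if x <= 0.0:
        return 0.0
    term, total = 1.0, 1.0              # j = 0 term of sum_{j<k} x^j / j!
    for j in range(1, k):
        term *= x / j
        total += term
    return 1.0 - math.exp(-x) * total

def sc_cdf(snr: float, m: int, n_r: int, mean_snr: float) -> float:
    """CDF of the selection-combiner output SNR, in the form of (3.2)."""
    return reg_lower_gamma_int(m, m * snr / mean_snr) ** n_r

def epm_thresholds(num_states: int, m: int, n_r: int, mean_snr: float) -> list[float]:
    """Thresholds 0 = g_0 < g_1 < ... < g_C = inf such that each state has probability 1/C."""
    thresholds = [0.0]
    for i in range(1, num_states):
        target = i / num_states
        lo, hi = 0.0, 1.0
        while sc_cdf(hi, m, n_r, mean_snr) < target:   # grow the bracket until it contains the root
            hi *= 2.0
        for _ in range(200):                            # plain bisection on the monotone CDF
            mid = 0.5 * (lo + hi)
            if sc_cdf(mid, m, n_r, mean_snr) < target:
                lo = mid
            else:
                hi = mid
        thresholds.append(0.5 * (lo + hi))
    thresholds.append(math.inf)
    return thresholds
```

For example, `epm_thresholds(10, 2, 2, 1.0)` returns the eleven thresholds of a ten-state partition for $m = 2$ and $n_R = 2$; the same routine applies to any combiner once `sc_cdf` is replaced by the corresponding cdf.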
The maximal ratio-combining (MRC) diversity technique with perfect combining (perfect knowledge of the branch amplitudes and phases) is the optimal diversity scheme among linear diversity-combining techniques, but it requires knowledge of all branch parameters and independent processing of each branch. The pdf and cdf of the received output SNR of MRC schemes are given by [125], respectively,

\[ f_\gamma(\gamma_{mrc}) = \left(\frac{m_{mrc}}{\bar{\gamma}_{mrc}}\right)^{m_{mrc}} \frac{\gamma_{mrc}^{m_{mrc}-1}}{\Gamma(m_{mrc})}\, e^{-m_{mrc}\gamma_{mrc}/\bar{\gamma}_{mrc}} \tag{3.6} \]

and

\[ F_\gamma(\gamma_{mrc}) = \frac{\gamma\!\left(m_{mrc}, \frac{m_{mrc}\gamma_{mrc}}{\bar{\gamma}_{mrc}}\right)}{\Gamma(m_{mrc})}, \tag{3.7} \]

where $m_{mrc} = m n_R$ and $\bar{\gamma}_{mrc} = \bar{\gamma} n_R$. The average received SNR for state $c_i$ can be given by

\[ \bar{\gamma}_{mrc_i} = \frac{\bar{\gamma}_{mrc}}{p_{mrc_i}\, m_{mrc}\, \Gamma(m_{mrc})} \left[\gamma\!\left(m_{mrc}+1, \frac{m_{mrc}\gamma_{mrc_i}}{\bar{\gamma}_{mrc}}\right) - \gamma\!\left(m_{mrc}+1, \frac{m_{mrc}\gamma_{mrc_{i-1}}}{\bar{\gamma}_{mrc}}\right)\right] \tag{3.8} \]

and the LCR corresponding to state $c_i$ can be given by

\[ N_{mrc_i} = \frac{\sqrt{2\pi}\, f_m}{\Gamma(m_{mrc})} \left(\frac{m_{mrc}\gamma_{mrc_i}}{\bar{\gamma}_{mrc}}\right)^{m_{mrc}-\frac{1}{2}} e^{-m_{mrc}\gamma_{mrc_i}/\bar{\gamma}_{mrc}}. \tag{3.9} \]

3.4 Optimal Rate, Power and BER Adaptation

We consider two transmission rate, power and BER adaptation problems for M-QAM systems with constraints on buffer costs. We formulate both cross-layer adaptation problems as CMDP problems. The formulated CMDP problem can be described by the following ingredients: a set of time-slots $\mathcal{T} = \{1, 2, \ldots, H\}$, a set of system states $\mathcal{S} = \mathcal{B} \times \mathcal{C} = \{s_1, s_2, \ldots, s_S\}$ with total number of states $S = (B+1) \times C$, a set of actions $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$, a set of state- and action-dependent transition probability matrices $\mathcal{P}$ and a set of state- and action-dependent immediate cost vectors $\mathcal{G}$. Each action is mapped to a modulation scheme or M-QAM constellation. The first action corresponds to no transmission, the second action corresponds to BPSK transmission, and the next higher actions correspond to a particular M-QAM constellation.
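
The state and action ingredients listed above can be enumerated with a short sketch. It assumes that composite states are ordered channel-major (all buffer levels of channel state 1 first) and that the M-QAM actions follow the square-constellation mapping $2^{2i-4}$-QAM given with the numerical results of Section 3.5; the function names are hypothetical.

```python
from itertools import product

def build_state_space(buffer_size: int, num_channel_states: int) -> list[tuple[int, int]]:
    """Enumerate composite states (b, c) with b = 0..B packets and c = 1..C.

    Assumption: channel-major ordering, so list position k (0-based)
    corresponds to state s_{k+1} in a 1-based numbering.
    """
    return [(b, c) for c, b in product(range(1, num_channel_states + 1),
                                       range(buffer_size + 1))]

def action_to_modulation(action_index: int) -> tuple[str, int]:
    """Map a 1-based action index to (scheme name, bits per symbol)."""
    if action_index == 1:
        return ("no transmission", 0)
    if action_index == 2:
        return ("BPSK", 1)
    bits = 2 * action_index - 4          # i-th action uses 2^(2i-4)-QAM
    return (f"{2 ** bits}-QAM", bits)
```

With $B = 100$ and $C = 10$, `build_state_space(100, 10)` has $(B+1) \times C = 1010$ entries, and the six actions carry 0, 1, 2, 4, 6 and 8 bits per symbol.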
At the start of a particular time-slot $n$, the scheduler takes a particular action $u_l$ and moves to a system state $s_j$ from a state $s_i$ according to the state transition probability given by

\[ P_{s_i, s_j}(u_l) = P_{\chi(s_i), \chi(s_j)} \sum_{a_k=0}^{A} \delta\big(\psi(s_j) - \psi(s_i) - a_k + u_l\big)\, p(a_k), \tag{3.10} \]

where the function $\delta(x)$ returns 1 if $x = 0$ and 0 otherwise. The function $\chi(s)$ gives the corresponding channel state of the composite state $s$ and can be expressed as

\[ \chi(s) = \sum_{i=1}^{C} c_i\, \delta\!\left(\left\lceil \frac{s}{B+1} \right\rceil - i\right), \tag{3.11} \]

where the function $\lceil x \rceil$ is called the ceiling function of $x$ and gives the smallest integer that is greater than or equal to $x$. The buffer state corresponding to the system state $s$ can be expressed as

\[ \psi(s) = \sum_{j=0}^{B} \sum_{i=1}^{C} b_j\, \delta\big(s - (B+1)(i-1) - j - 1\big). \tag{3.12} \]

It can be seen that the considered problem forms a weakly communicating average cost MDP, since there exists a closed set of states, with each state in that set accessible from every other state in the same set under some deterministic stationary policy, plus a possibly empty set of states which is transient under every policy [145]. The objective of the cross-layer adaptation problem is to compute an optimal decision rule to be used at all time-slots, $\pi = \{\mu^1, \mu^2, \ldots, \mu^H\}$, from the set of all admissible policies $\Pi$. We assume that the immediate costs and transition probabilities do not vary with respect to time-slot $n$. Therefore, the policy to be determined is stationary (does not depend on the time-slot) and we can write it as $\pi = \{\mu, \mu, \ldots, \mu\}$. For brevity, we denote it by $\mu$.

3.4.1 Optimal Rate and Power Adaptation with Constant BER

In the first problem, we adapt the transmission rate and power with the channel condition and buffer occupancy, keeping the BER of all channel states at a specified equal value. We are interested in minimizing three objectives, namely, average power, average delay, and average packet-dropping probability.
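
The composite-state bookkeeping behind the transition probability can be sketched as below. Several points are assumptions rather than statements from the text: states are numbered $1, \ldots, S$ in channel-major order, arrivals follow a Poisson pmf truncated at $A$ and renormalized, packets that do not fit in the buffer are dropped (the `min` with $B$), and an action that sends more packets than are buffered is treated as infeasible. All function names are hypothetical.

```python
import math

def channel_index(s: int, buffer_size: int) -> int:
    """chi(s): 1-based channel state index, ceil(s / (B + 1)) in integer arithmetic."""
    return -(-s // (buffer_size + 1))

def buffer_level(s: int, buffer_size: int) -> int:
    """psi(s): number of buffered packets (0..B) for composite state s = 1..S."""
    return (s - 1) % (buffer_size + 1)

def truncated_poisson(mean: float, max_arrivals: int) -> list[float]:
    """Poisson pmf truncated at max_arrivals and renormalized (assumption)."""
    weights = [math.exp(-mean) * mean ** a / math.factorial(a)
               for a in range(max_arrivals + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def transition_prob(s_from: int, s_to: int, packets_sent: int,
                    channel_matrix: list[list[float]],
                    arrival_pmf: list[float], buffer_size: int) -> float:
    """Channel transition probability times the buffer transition probability."""
    c_from = channel_index(s_from, buffer_size)
    c_to = channel_index(s_to, buffer_size)
    b_from = buffer_level(s_from, buffer_size)
    b_to = buffer_level(s_to, buffer_size)
    if packets_sent > b_from:            # cannot send more packets than are buffered
        return 0.0
    # Sum the arrival probabilities that move the buffer from b_from to b_to.
    p_buffer = sum(p for a, p in enumerate(arrival_pmf)
                   if min(b_from - packets_sent + a, buffer_size) == b_to)
    return channel_matrix[c_from - 1][c_to - 1] * p_buffer
```

Summing `transition_prob` over all destination states for a feasible action gives 1, which is a convenient sanity check when assembling the transition matrices.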
Since the objectives are conflicting, in the CMDP problem we minimize average power with bounds on average delay and average packet-dropping probability. The objective in this case is to find the optimal stationary policy $\mu^*$ so that

\[ \min_{\mu} G_P = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_P(s^n, \mu^n(s^n))\big] \tag{3.13} \]
\[ \text{subject to: } \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_D(s^n, \mu^n(s^n))\big] \le D \]
\[ \text{and } \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_O(s^n, \mu^n(s^n))\big] \le P_{\text{dropping}}, \]

where $G_P(s^n, \mu^n(s^n))$ is the immediate transmission power cost, and $G_D(s^n, \mu^n(s^n))$ and $G_O(s^n, \mu^n(s^n))$ are the buffer-related costs described in Section 3.4.4. $D$ and $P_{\text{dropping}}$ are the maximum tolerable average delay and packet-dropping probability.

3.4.2 Optimal Rate and BER Adaptation with Constant Power

In the second problem, the transmitter power is assumed to be always the same, and we adapt the transmission rate and BER with the channel condition and buffer occupancy. Our goal is to minimize average BER, average delay, and average packet-dropping probability. We formulate the problem as a CMDP where we minimize average BER with constraints on average delay and average packet-dropping probability. Therefore, the objective is to find the optimal stationary policy $\mu^*$ so that

\[ \min_{\mu} G_B = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_B(s^n, \mu^n(s^n))\big] \tag{3.14} \]
\[ \text{subject to: } \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_D(s^n, \mu^n(s^n))\big] \le D \]
\[ \text{and } \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\big[G_O(s^n, \mu^n(s^n))\big] \le P_{\text{dropping}}, \]

where $G_B(s^n, \mu^n(s^n))$ is the immediate BER cost.

3.4.3 Solution Techniques

A linear programming (LP) methodology can be used to solve the CMDP problems (3.13) and (3.14) formulated above [145]. There is a one-to-one correspondence between the feasible (and hence optimal) solution of the LP and the feasible (and hence optimal) solution of the CMDP; the LP is feasible if and only if the CMDP is feasible [148]. Let $\nu(s, u)$ represent the "steady-state" probability that the process is in state $s$ and action $u$ is applied.
We seek the control policy that is represented in terms of a probability distribution $\nu$ over $\mathcal{S} \times \mathcal{U}$. The optimal policy $\nu^*$ can be obtained by solving the linear program

\[ \min_{\nu} \sum_{s \in \mathcal{S}, u \in \mathcal{U}_s} G_X(s, u)\, \nu(s, u) \tag{3.15} \]
\[ \text{s.t.: } \sum_{s \in \mathcal{S}, u \in \mathcal{U}_s} G_D(s, u)\, \nu(s, u) \le D \]
\[ \sum_{s \in \mathcal{S}, u \in \mathcal{U}_s} G_O(s, u)\, \nu(s, u) \le P_{\text{dropping}} \]
\[ \sum_{u \in \mathcal{U}_{s'}} \nu(s', u) = \sum_{s \in \mathcal{S}, u \in \mathcal{U}_s} \nu(s, u)\, p_{s, s'}(u), \quad \forall s' \in \mathcal{S} \]
\[ \sum_{s \in \mathcal{S}, u \in \mathcal{U}_s} \nu(s, u) = 1; \quad \nu(s, u) \ge 0, \ \forall s \in \mathcal{S} \text{ and } u \in \mathcal{U}_s, \]

where $\mathcal{U}_s$ is the set of actions that are allowed in state $s$ and $G_X(s, u)$ represents either the immediate power cost $G_P(s, u)$ or the BER cost $G_B(s, u)$, depending on the problem. Suppose there exists an optimal solution $\nu^*$ to the LP problem (3.15). Then there exists a stationary policy $\mu^*$ that is optimal for the respective CMDP problem. The optimal policy $\mu^*$ for the CMDP is randomized and is uniquely characterized by the probability $\theta_{\mu^*(s)}(u)$ of applying action $u \in \mathcal{U}_s$ in state $s \in \mathcal{S}$, where

\[ \theta_{\mu^*(s)}(u) = \frac{\nu^*(s, u)}{\sum_{u' \in \mathcal{U}_s} \nu^*(s, u')}. \tag{3.16} \]

If $\sum_{u' \in \mathcal{U}_s} \nu^*(s, u') = 0$ for some $s \in \mathcal{S}$, an action that drives the system to $\mathcal{S}_R = \{s \in \mathcal{S} : \sum_{u' \in \mathcal{U}_s} \nu^*(s, u') > 0\}$ is chosen in each such state [145]. In general, LP can handle problems with large numbers of variables, and the above linear programs can be easily solved using interior-point methods [144].

3.4.4 Evaluation of Immediate Costs for the Above Problems

When the scheduler takes an action in a particular state, it incurs a one-step immediate cost $\mathcal{G} : \mathcal{K} \mapsto \mathbb{R}$, where $\mathcal{K} = \{(s, u) : s \in \mathcal{S}, u \in \mathcal{U}_s\}$ is the set of state-action pairs. The costs for the above two problems are discussed below.

Buffer-Related Costs

We consider two different costs associated with the buffer, namely, the buffer-delay cost and the buffer packet-dropping probability cost, which can be found from (2.27) and (2.28), respectively.
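
The step in Section 3.4.3 that turns an LP solution into a randomized policy (normalizing the state-action probabilities over the actions of each state) can be sketched as follows. The dictionary layout, the tolerance `tol` and the function name are illustrative assumptions; the LP itself would be solved separately with any LP solver.

```python
def randomized_policy(nu: dict[tuple[int, int], float], tol: float = 1e-12):
    """Convert state-action probabilities nu[(s, u)] into action probabilities per state.

    Returns (policy, transient), where policy[s][u] = nu(s, u) / sum_u' nu(s, u')
    and transient lists the states whose total probability is (numerically) zero.
    For those states an action leading back to the recurrent set must be chosen
    separately, as described in the text.
    """
    totals: dict[int, float] = {}
    for (s, _u), value in nu.items():
        totals[s] = totals.get(s, 0.0) + value
    policy: dict[int, dict[int, float]] = {}
    transient: list[int] = []
    for s, total in totals.items():
        if total > tol:
            policy[s] = {u: value / total
                         for (s2, u), value in nu.items() if s2 == s}
        else:
            transient.append(s)
    return policy, transient
```

A state with a single action of nonzero probability yields a deterministic rule; states where the constraints are active typically mix two actions.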
Power and BER Costs

The M-QAM scheme is a useful modulation technique for achieving high data rate transmission without increasing the bandwidth of wireless communications systems. An approximate expression of BER for M-QAM, valid for both low and high SNR, can be expressed as [150] / / \ F««- - ^ - TM} |>c ((2i - ^m^p) • (3-'7) where $\nu = \log_2(M)$ is the number of bits that modulate a $2^\nu$-QAM symbol. For BPSK transmission, the BER is F e,BPSK = ^erfc (J^jPj • (3-18)

• Power Cost: For a certain channel state $c_i$ and action $u_j$, and with a fixed specified bit error rate $P_e$ for all channel states, the power cost $G_P = P_t$ can be found numerically from (3.17) for M-QAM or (3.18) for BPSK using the average received channel SNR, (3.4) and (3.8) for selection combining and maximal ratio combining, respectively. It can also be calculated numerically from the following equation:

\[ P_e(c_i, u_j) = \frac{1}{p_{dc_i}} \int_{\gamma_{dc_{i-1}}}^{\gamma_{dc_i}} P_{e,\text{BPSK/M-QAM}}(c_i, u_j)\, f_\gamma(\gamma_{dc})\, d\gamma_{dc}. \tag{3.19} \]

• BER Cost: Similarly, for a fixed transmitter power $P_t$ and for a certain channel state $c_i$ and action $u_j$, the bit error rate cost $G_B = P_e$ can be found either from (3.17) and (3.18) using the average received channel SNR value or from (3.19) numerically. Note that in (3.19), $dc = sc$ or $mrc$.

3.5 Numerical Results and Discussions

In this section, we present simulation results for both problems, employing both selection-combining and maximal ratio-combining diversity techniques at the receiver. Results are given for average transmit signal power $P = 1$ mW, average received SNR of each branch $\bar{\gamma} = 1$, packet size to number of channel uses ratio $N_p/N_j = 1$, block duration $T_B = 10^{-4}$ seconds, number of channel states $C = 10$, buffer size $B = 100$ packets, number of actions $U = 6$, and maximum allowable packet-dropping probability $P_{\text{dropping}} = 10^{-3}$. The set of packet transmission rates is $\mathcal{W} = \{0, 1, 2, 4, 6, 8\}$ packets/time-slot.
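
The numerical evaluation of the power cost described in Section 3.4.4 (finding the transmit power that meets a target BER in a given channel state) can be sketched as follows. The BER expressions used here are the standard textbook approximations for BPSK and square M-QAM, which may differ in detail from (3.17) and (3.18), and the received SNR is assumed to scale linearly with transmit power through a hypothetical per-state factor `gain`.

```python
import math

def ber_bpsk(snr: float) -> float:
    """BPSK bit error rate at SNR per bit `snr`."""
    return 0.5 * math.erfc(math.sqrt(snr))

def ber_square_mqam(snr: float, bits: int) -> float:
    """Standard approximate BER of square 2^bits-QAM (bits even) at symbol SNR `snr`."""
    m_size = 2 ** bits
    root_m = int(round(math.sqrt(m_size)))
    coeff = (2.0 / bits) * (1.0 - 1.0 / root_m)
    return coeff * sum(
        math.erfc((2 * i - 1) * math.sqrt(1.5 * snr / (m_size - 1)))
        for i in range(1, root_m // 2 + 1)
    )

def required_power(target_ber: float, gain: float, bits: int) -> float:
    """Smallest power p with BER(p * gain) <= target_ber, found by bisection."""
    if bits == 1:
        ber = lambda p: ber_bpsk(p * gain)
    else:
        ber = lambda p: ber_square_mqam(p * gain, bits)
    lo, hi = 0.0, 1.0
    while ber(hi) > target_ber:          # BER decreases monotonically with power
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if ber(mid) > target_ber:
            lo = mid
        else:
            hi = mid
    return hi
```

As expected, `required_power` grows with the constellation size for a fixed target BER and shrinks as `gain` improves, which is the structure that lets the scheduler trade delay for power.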
The first action corresponds to no transmission, the second action corresponds to BPSK transmission and the $i$th action, $i = 3, 4, \ldots, U$, corresponds to square-constellation $2^{2i-4}$-QAM transmission. We also assume that the input traffic is Poisson distributed with an average value $\lambda = 2$ packets/time-slot and a maximum number of packet arrivals in a slot $A = 10$ packets. For the first problem, the power cost is determined using average BER $P_e = 10^{-4}$.

In Fig. 3.2, we present the variation of average power with average delay for the SC diversity technique with the Nakagami severity parameter $m = 2$. It can be seen that, as for the non-diversity channel discussed in Chapter 2, with the dynamic packet-scheduling technique for the diversity channel the average power decreases as the delay increases. Also, instead of transmitting in the next immediate time-slot (time-slot 1), by delaying packet transmission by only 1 more time-slot, the scheduler can save more than 50% of transmitter power. The rate of decrease of power with delay increases as the fading rate increases. When the number of selection-diversity branches increases, the receiver has more flexibility to choose a branch with the best received SNR. Therefore, as expected, the average transmitter power decreases for the same delay as the number of branches $n_R$ increases from 2 to 3.

[Figure 3.2 plot. Parameters: C=10, m=2, U=6, square M-QAM, B=100, average BER=$10^{-4}$, $P_{\text{dropping}}$=$10^{-3}$. Curves: $n_R$ = 2 and $n_R$ = 3, each for $f_m T_B$ = 0.01, 0.02, 0.04. Horizontal axis: average delay (time-slots).]

Figure 3.2: Influence of the number of receive antennas $n_R$ and normalized fading rate $f_m T_B$ on the average power vs. average delay curve for the selection-combining scheme at the receiver.

Fig. 3.3 shows the average power vs. average delay tradeoff for MRC diversity with the Nakagami parameter $m = 2$. As with the SC case, we show the influences of fading rate and number of diversity branches in this case as well. These parameters have a similar effect on the performance curves as in the SC case. However, the power savings for MRC as compared to SC increase as the number of branches increases. Since MRC takes all diversity branches into account optimally, it can be seen that the average power is less for MRC than for SC for the same average delay. Also, the rate of decrease of average power with delay for MRC is less than for SC.

[Figure 3.3 plot. Parameters: C=10, m=2, U=6, square M-QAM, B=100, average BER=$10^{-4}$, $P_{\text{dropping}}$=$10^{-3}$. Horizontal axis: average delay.]

Figure 3.3: Influence of the number of receive antennas $n_R$ and normalized fading rate $f_m T_B$ on the average power vs. average delay curve for the maximal ratio-combining scheme at the receiver.

The effect of the Nakagami fading severity parameter $m$ on the power/delay curve is shown in Fig. 3.4. The curve is plotted for a 2-branch SC diversity combiner and different fading rates. Note that the power savings improve as the value of $m$ increases. This result is expected because as the value of $m$ increases, the channel condition improves toward that of a non-fading channel. Therefore, the received SNR has fewer fluctuations.

[Figure 3.4 plot. Parameters: C=10, $n_R$=2, U=6, square M-QAM, B=100. Curves: m = 2 and m = 3, each for $f_m T_B$ = 0.01, 0.02, 0.04. Horizontal axis: average delay.]

Figure 3.4: Average power vs. average delay curve for the selection-combining scheme at the receiver with different Nakagami fading severity parameters $m$.
However, the gain obtained by increasing the number of diversity branches is much larger than the gain obtained by increasing the parameter $m$ of the channel.

Fig. 3.5 depicts the average BER performance as a function of average delay for an SC diversity receiver with parameter $m = 1$, which corresponds to Rayleigh fading. The transmitter power is kept constant at $P_t = 8$ mW, and curves are given for two different numbers of branches, namely, $n_R = 2$ and 3. The influence of the fading rate is shown, and it can be seen that as the fading rate increases, the average BER decreases for the same delay. This result is due to the fact that as the fading rate increases, the channel stays in a particular channel state for a shorter time. Therefore, the accumulation of packets in the transmission buffer is smaller, and the scheduler can send packets with a lower-order modulation, since lower-order modulation has a lower BER for a particular channel state with fixed transmission power. Therefore, the average BER decreases as the fading rate increases. It can also be noted that for a larger number of branches, the rate of decrease of BER increases as the fading rate increases. As we discussed earlier, for a larger number of diversity branches, the scheduler has more flexibility to choose the best branch with the highest received SNR; therefore, the BER is lower for a higher number of diversity branches. Since in the lower average delay region another degree of freedom is added, the flexibility of the scheduler is even greater in these regions, resulting in an increased rate of BER decrease.

Figure 3.5: Effect of the number of receive antennas $n_R$ and the normalized fading rate $f_m T_B$ on the average BER vs. average delay tradeoff curve for the selection-combining scheme at the receiver. Horizontal axis: average delay (time-slots).

The BER/delay tradeoff curve for an MRC diversity scheme is shown in Fig. 3.6. As with the SC case, this curve is also plotted for $m = 1$ and shows a similar effect of the parameters $n_R$ and $f_m$. However, the transmitter power in this case is kept at $P_t = 4$ mW. It can be noted that the adaptive MRC diversity M-QAM scheme shows a BER/delay performance curve similar to that of the adaptive SC diversity M-QAM scheme, but with 50% of the transmitter power. However, this gain comes with increased receiver complexity in the MRC.

Figure 3.6: Effect of the number of receive antennas $n_R$ and the normalized fading rate $f_m T_B$ on the average BER vs. average delay tradeoff curve for the maximal ratio-combining scheme at the receiver. Horizontal axis: average delay (time-slots).

3.6 Conclusions

The M-QAM scheme is a useful modulation technique for achieving high data rate transmission without increasing the bandwidth of wireless communications systems. We have presented two constrained Markov decision process-based transmission adaptation techniques over a diversity Nakagami-m fading channel with memory. For each cross-layer adaptation problem, we have shown performance results for selection-combining and maximal ratio-combining techniques with different fading rates, Nakagami-m parameters, and numbers of branches. The results show that the scheduler can save power or reduce BER by adapting the transmission rate to buffer occupancy in addition to channel conditions.
CHAPTER 4
Rate Adaptation over MIMO Channels

4.1 Introduction

In this chapter, we extend the contribution of Chapters 2 and 3 by considering a multiple-input, multiple-output (MIMO) channel and the selective-repeat automatic repeat-request (SR-ARQ) protocol. We study a cross-layer optimization problem for a rate-adaptive M-QAM system that maximizes throughput, and minimizes packet error rate, delay and overflow, using the framework of a Markov decision process over a Nakagami-m MIMO fading channel.

Spatial diversity using multiple transmit and/or receive antennas can mitigate channel fading without sacrificing bandwidth resources, and can increase capacity significantly over single-antenna systems. Space-time coding is an effective transmit diversity technique to combat fading in wireless communications. Space-time block codes (STBC) are attractive techniques in wireless communication because of their low complexity and the high bit-rate and high capacity transmissions that result. STBCs are designed to achieve the maximum diversity order for a given number of transmit and receive antennas, using a simple maximum-likelihood decoding algorithm based on linear processing. STBC schemes using two transmit antennas are proposed in [151], and later generalized for multiple transmit antennas in [152] using orthogonal designs. As discussed in previous chapters of this thesis, link adaptation is another powerful technique to compensate for the random variation of wireless channels, where some parameters, such as transmitter power, symbol transmission rate, constellation size, coding rate/scheme or any combination of these parameters, are adapted to channel conditions [16]. In this chapter, we combine both techniques to evaluate the performance of M-QAM systems. We employ SR-ARQ at the data-link layer to guarantee link layer reliability.
A brief summary of our contribution is presented below:

• A cross-layer adaptation for an M-QAM system is investigated for use in a Nakagami-m MIMO channel under the framework of a Markov decision process. The objective of the scheme is to maximize throughput, and minimize packet error rate, delay and overflow.

• We analyze the rate adaptation problem from the cross-layer point of view by considering transmit diversity and adaptive modulation at the physical layer, and selective-repeat ARQ at the data-link layer. Transmit diversity is achieved using an orthogonal space-time block code.

• We derive the level crossing rate of the Nakagami-m MIMO fading channel and model the channel as a finite-state Markov channel.

• We formulate the problem as a Markov decision process and provide a policy iteration algorithm for computing optimal cross-layer adaptation policies. All of the elements of the MDP (e.g., system state transition probability, costs, etc.) are derived.

• Simulation results are provided for different Nakagami-m parameters, different numbers of actions and different numbers of receive antennas, considering a finite-size buffer and random incoming traffic arrivals.

The remainder of the chapter is organized as follows. In Section 4.2, we describe the system model, including the modeling of incoming traffic, the buffer and the Nakagami-m MIMO channel with transmit diversity. We also explain the nature of the problem and our objective in this section. The formulation of the problem as a Markov decision process problem, as well as the derivation of ACK/NAK probabilities, cost functions, transition probability, and solution technique, are discussed in Section 4.3. We provide simulation results in Section 4.4 and conclude in Section 4.5.

4.2 System Modeling

Data is encoded using a space-time block code, and the encoded data are split into $n_T$ streams that are simultaneously transmitted using $n_T$ transmit antennas.
The received signal at each receive antenna is a linear superposition of the $n_T$ transmitted signals perturbed by noise. Maximum-likelihood decoding is achieved in a simple way by decoupling the signals transmitted from different antennas, rather than by joint detection. This procedure exploits the orthogonal structure of the space-time block code and yields a maximum-likelihood decoding algorithm based only on linear processing at the receiver.

In this chapter, we consider a MIMO system with a finite transmission buffer, as shown in Fig. 4.1. It employs adaptive modulation (AM) and transmit diversity through orthogonal STBC at the physical layer, and SR-ARQ at the data-link layer. The packets come from a higher-layer application and are stored in a buffer of size $B$ packets. We assume that the incoming packet arrivals are random and follow the Poisson distribution given by (2.1).

[Figure 4.1: Schematic of the adaptive M-QAM system using SR-ARQ and STBC. Blocks shown: random incoming packets, finite buffer, SR-ARQ controller with ACK/NAK feedback, rate controller, adaptive STBC modulator, MIMO fading channel, symbol detection, demodulator, packet output, and perfect channel estimator with noiseless feedback of CSI.]

4.2.1 Problem Preliminaries

We assume that the scheduler determines the number of packets to be transmitted in each time-slot. To decide how many packets should be transmitted, the scheduler relies on the randomly varying buffer state information and channel state information. The buffer state information is available locally at the transmitter; we further assume that perfect channel state information is available at the transmitter without latency over a noise-free feedback channel. Let $\mathcal{B} = \{b_0, b_1, \ldots, b_B\}$ and $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$ denote the state spaces of the buffer and the channel, respectively. Thus, $b_i$ corresponds to $i$ packets in the buffer and $b_i < b_{i+1}$.
Note that the scheduler determines the number of packets to be transmitted dynamically. For example, no packet will be transmitted if the buffer has insufficient packets to transmit, even though the channel state may be very good.

4.2.2 Objectives of the Scheduling Problem

Since the nature of the problem is dynamic, it falls into the general category of stochastic dynamic programming problems. To solve this dynamic problem, we form a composite state space by combining the buffer state space and the channel state space. Let us denote this composite state space by $\mathcal{S} = \mathcal{B} \times \mathcal{C} = \{s_1, s_2, \ldots, s_S\}$, where $s_k = [b_i, c_j]$, $i = 0, 1, \ldots, B$, $j = 1, 2, \ldots, C$, and $k = 1, 2, \ldots, S$. The total number of states of the system is $S = (B + 1) \times C$. At each time-slot $n$, the scheduler chooses an action depending on the current composite system state $s^n$. A decision rule, denoted by $\mu^n$, specifies the action at time-slot $n$. The decision rules for all time-slots constitute a policy of the problem. Therefore, this policy (also called a contingency plan) can be given by $\pi = \{\mu^0, \mu^1, \ldots, \mu^H\}$, where $H$ is the horizon of the problem. We consider an infinite-horizon problem, where our objective is to optimize the long-term average expected cost for the different goals to be achieved over the horizon $H \to \infty$. We are interested in maximizing throughput and minimizing packet error rate, delay and overflow. Note that, for the ARQ case, these objectives are uni-directional, that is, maximizing the effective throughput automatically minimizes packet error rate, delay and overflow. Let $\Pi$ denote the set of all admissible policies $\pi$, i.e., the set of all sequences of functions $\pi = \{\mu^0, \mu^1, \ldots\}$ with $\mu^n : \mathcal{S} \mapsto \mathcal{U}_s$, where $\mathcal{U}_s$ denotes the set of actions available in state $s$. A stationary policy is an admissible policy of the form $\pi = \{\mu, \mu, \ldots\}$, and its corresponding cost function is denoted by $G^{\mu}$.
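The composite state space described above can be enumerated directly. The following is a minimal Python sketch, assuming a buffer of $B = 50$ packets and $C = 4$ channel states as illustrative values; the function name `build_state_space` and the zero-based position `k` are hypothetical choices made for the example, not notation from this chapter.

```python
from itertools import product


def build_state_space(buffer_size, num_channel_states):
    """Enumerate composite states s_k = (b_i, c_j).

    The buffer occupancy i runs over 0..B and the channel index j over 1..C,
    so the returned list has (B + 1) * C entries.
    """
    states = list(product(range(buffer_size + 1),
                          range(1, num_channel_states + 1)))
    # Reverse lookup: composite state -> position k (0-based in this sketch).
    index = {state: k for k, state in enumerate(states)}
    return states, index


# Illustrative values (assumed for the example): B = 50 packets, C = 4 states.
states, index = build_state_space(buffer_size=50, num_channel_states=4)
print(len(states))  # (50 + 1) * 4 composite states
```

A stationary decision rule can then be stored as a list or dictionary with one action per composite state.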
The long-term average expected throughput associated with the stationary policy $\mu$ can be expressed as

\[ G_{TH}^{\mu} = \lim_{H \to \infty} \frac{1}{H} E_{\mu}\left[ \sum_{n=1}^{H} G_{TH}(s^n, \mu(s^n)) \right], \tag{4.1} \]

where $G_{TH}(s^n, \mu(s^n))$ is the immediate throughput reward at state $s^n$ for action $\mu(s^n)$.

4.2.3 Correlated MIMO Channel Using Transmit Diversity

Let the MIMO system have $n_T > 1$ transmit antennas and $n_R$ receive antennas. We assume that the channel is slowly varying and flat-fading, and can be modeled with an independent quasi-static Nakagami-m distribution. That is, the spacing between antenna elements at both ends of the transmission link is assumed to be sufficient, and the received SNR remains constant for at least one time-slot. The time-slot duration over which the received SNR remains constant satisfies the condition $T_B < 0.423/f_m$, where $f_m$ is the maximum Doppler frequency. Therefore, the MIMO system with diversity order $L = n_T n_R$ can be represented by the channel matrix

\[ \mathbf{H} = \left[ \alpha_{jk} \exp(\mathrm{i}\theta_{jk}) \right]_{j,k=1}^{n_R, n_T}, \tag{4.2} \]

where $\mathrm{i}$ is the imaginary unit, i.e., $\mathrm{i}^2 = -1$, $\alpha_{jk}$ is the path gain between the $k$th transmit and $j$th receive antennas, and $\theta_{jk}$ is the phase, which is uniformly distributed over $[0, 2\pi]$. The input-output relationship can then be expressed as

\[ \mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{V}, \tag{4.3} \]

where the received signal $\mathbf{Y}$ is an $n_R \times N_f$ matrix, $\mathbf{X}$ is the $n_T \times N_f$ matrix of transmitted symbols, the receiver noise matrix $\mathbf{V}$ is $n_R \times N_f$ with elements that are i.i.d. complex circular Gaussian random variables, each with a $\mathcal{CN}(0, \sigma^2)$ distribution, and $N_f$ is the number of symbols transmitted in a time-slot. The probability density function (pdf) of the path gain $\alpha_{jk}$ for a Nakagami-m distribution is given by

\[ f(\alpha_{jk}) = \frac{2 m^m \alpha_{jk}^{2m-1}}{\Gamma(m)\, \Omega_{jk}^{m}} \exp\left( -\frac{m \alpha_{jk}^2}{\Omega_{jk}} \right), \quad \alpha_{jk} \ge 0, \tag{4.4} \]

where $\Omega_{jk} = E[\alpha_{jk}^2]$ is the average fading power. To achieve transmit diversity through STBC over wireless links, each block of $N_i \le N_f$ input symbols is mapped into $n_T$ orthogonal sequences of length $N_f$ that are transmitted simultaneously with $n_T$ transmit antennas. The information code rate of the STBC is therefore given by $R_c = N_i/N_f$.
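To make the channel model concrete, the squared Frobenius norm of the channel matrix can be sampled numerically. The sketch below relies on the standard fact that the square of a Nakagami-m path gain with average power $\Omega$ is Gamma distributed with shape $m$ and scale $\Omega/m$. The function name `sample_channel_power`, the seed, and the parameter values ($m = 1$, $\Omega = 1$, $n_T = 2$, $n_R = 1$) are assumptions chosen only for illustration.

```python
import random


def sample_channel_power(m, omega, n_t, n_r, rng):
    """Draw one realization of ||H||_F^2, the sum of alpha_jk^2 over all links.

    If alpha is Nakagami-m with E[alpha^2] = omega, then alpha^2 is Gamma
    distributed with shape m and scale omega / m (so its mean is omega).
    """
    return sum(rng.gammavariate(m, omega / m) for _ in range(n_t * n_r))


def stbc_code_rate(n_input_symbols, n_frame_symbols):
    """Information code rate R_c = N_i / N_f of the space-time block code."""
    return n_input_symbols / n_frame_symbols


rng = random.Random(1)  # fixed seed so the experiment is repeatable
samples = [sample_channel_power(m=1.0, omega=1.0, n_t=2, n_r=1, rng=rng)
           for _ in range(20000)]
mean_power = sum(samples) / len(samples)
print(mean_power)  # should be close to L * omega = 2 for these values
```

With unit average fading power on every link, the sample mean approaches the diversity order $L = n_T n_R$.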
The input-output relationship of each sub-channel before maximum likelihood detection can be described by the following relationship [75]:

\[ y_s = c \|\mathbf{H}\|_F^2\, x_s + v_s, \tag{4.5} \]

where $c$ is a code-dependent constant that satisfies $P_s = P_t/(c\, n_T R_c)$, $P_s$ is the average energy per transmitted symbol and $P_t$ is the total transmit power per symbol time. In (4.5), $\|\cdot\|_F$ denotes the matrix Frobenius norm, $x_s$ corresponds to either the real or the imaginary part of a transmitted symbol with power $P_s/2$, and $v_s$ is the noise term after STBC decoding with distribution $\mathcal{N}(0, c\|\mathbf{H}\|_F^2 \sigma^2/2)$. The effective received SNR per symbol at the output of the decoder is therefore given by [75]

\[ \gamma_s = \frac{\bar{\gamma}}{m\, n_T R_c} \|\mathbf{H}\|_F^2, \tag{4.6} \]

where $\bar{\gamma} = m P_t/\sigma^2$ is the average SNR per receive antenna, and $\|\mathbf{H}\|_F^2 = \sum_{j,k} \alpha_{jk}^2$ is the sum of $L$ independent Gamma random variables, each with parameter $m$ and unit mean. Using random variable transformation techniques, the SNR at the output of the receiver follows a Gamma distribution with parameter $m_T = mL$ and mean $\gamma_T = \langle \gamma_s \rangle = mL\bar{\gamma}/(m\, n_T R_c) = n_R \bar{\gamma}/R_c$. Its pdf is given by [75]

\[ f(\gamma_s) = \frac{1}{\Gamma(m_T)} \left( \frac{m_T}{\gamma_T} \right)^{m_T} \gamma_s^{m_T - 1} \exp\left( -\frac{m_T \gamma_s}{\gamma_T} \right), \quad \gamma_s \ge 0. \tag{4.7} \]

We model the slowly varying Nakagami-m flat-fading MIMO channel as an FSMC by quantizing the received SNR at the output of the decoder into a finite number of states. We use the equal probability method (EPM) (2.10) to obtain the received SNR thresholds, where the stationary probabilities are found using (2.6). Let $\gamma_{s_{j-1}}$ and $\gamma_{s_j}$ denote the lower and upper received SNR thresholds associated with channel state $c_j$.
Thus, the probability of staying in channel state $c_j \in \mathcal{C}$ can be given by

\[ p_{c_j} = \frac{\gamma\left(m_T, \frac{m_T \gamma_{s_j}}{\gamma_T}\right) - \gamma\left(m_T, \frac{m_T \gamma_{s_{j-1}}}{\gamma_T}\right)}{\Gamma(m_T)}, \tag{4.8} \]

where $\gamma(\cdot, \cdot)$ denotes the lower incomplete gamma function and the cumulative distribution function (cdf) of the received SNR is given as

\[ F(\gamma_s) = \frac{\gamma\left(m_T, \frac{m_T \gamma_s}{\gamma_T}\right)}{\Gamma(m_T)}. \tag{4.9} \]

The level crossing rate at the corresponding received SNR threshold $\gamma_{s_j}$ can be deduced in a similar way as in [125]:

\[ N_{\gamma_{s_j}} = \frac{\sqrt{2\pi}\, f_m}{\Gamma(m_T)} \left( \frac{m_T \gamma_{s_j}}{\gamma_T} \right)^{m_T - \frac{1}{2}} \exp\left( -\frac{m_T \gamma_{s_j}}{\gamma_T} \right). \tag{4.10} \]

Using the level crossing rate given by (4.10), the crossover probabilities of the channel states can be approximately calculated using (2.3) and (2.4). The self-transition probabilities of the channel states can be found using (2.7).

4.3 Scheduling for Rate-Adaptive M-QAM Systems

It is clear from the discussion in the previous section that the cross-layer scheduling problem forms an average-cost discrete-time Markov decision process (DT-MDP) problem [145], where our task is to maximize the long-term average throughput. Hence, mathematically, our objective is to find a stationary optimal policy $\mu \in \Pi$ such that

\[ \max_{\mu} G_{TH}^{\mu} = \max_{\mu} \lim_{H \to \infty} \frac{1}{H} E_{\mu}\left[ \sum_{n=1}^{H} G_{TH}(s^n, \mu(s^n)) \right]. \tag{4.11} \]

The average-cost DT-MDP problem can be solved using dynamic programming algorithms, such as the relative value iteration (RVI) algorithm, the policy iteration (PI) algorithm, etc. For the scheduling problem at hand, the PI algorithm is computationally less expensive and converges faster than the RVI algorithm. Therefore, we use a PI algorithm to compute stationary optimal deterministic policies.

4.3.1 Description of the Elements of the Formulated MDP

The formulated MDP problem consists of the following elements: a set of states $\mathcal{S}$, a set of actions $\mathcal{U}$, a set of costs $\mathcal{G}$ and a set of transition probabilities $\mathcal{P}$ [145]. Let $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$ denote the set of $U$ actions available for the rate-adaptive M-QAM system.
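The finite-state Markov channel construction of Section 4.2.3 can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the exact procedure used here: equations (2.3), (2.4) and (2.7) are not reproduced in this chapter, so the code assumes the commonly used approximation in which the probability of moving to an adjacent state equals the level crossing rate at the shared threshold times the slot duration, divided by the stationary probability of the current state. The level crossing rate is the standard expression for a Gamma-distributed SNR. All function names and the numeric values (four states, Gamma parameter 2, mean SNR 10, Doppler frequency 100 Hz, slot duration $10^{-4}$ s) are illustrative.

```python
import math


def gamma_cdf(x, shape, mean):
    """CDF of a Gamma variable with given shape and mean, computed with the
    series form of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    if math.isinf(x):
        return 1.0
    z = shape * x / mean
    term, total, n = 1.0, 1.0, 0
    while term > 1e-16 * total and n < 10000:
        n += 1
        term *= z / (shape + n)
        total += term
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape + 1.0)
    return min(1.0, total * math.exp(log_prefactor))


def equal_probability_thresholds(num_states, shape, mean):
    """SNR thresholds such that every state has probability 1 / num_states."""
    thresholds = [0.0]
    for j in range(1, num_states):
        target = j / num_states
        lo, hi = 0.0, mean
        while gamma_cdf(hi, shape, mean) < target:
            hi *= 2.0
        for _ in range(200):  # plain bisection on the CDF
            mid = 0.5 * (lo + hi)
            if gamma_cdf(mid, shape, mean) < target:
                lo = mid
            else:
                hi = mid
        thresholds.append(0.5 * (lo + hi))
    thresholds.append(math.inf)
    return thresholds


def level_crossing_rate(snr, shape, mean, doppler_hz):
    """Level crossing rate (per second) of a Gamma-distributed SNR."""
    if snr <= 0.0 or math.isinf(snr):
        return 0.0
    z = shape * snr / mean
    return (math.sqrt(2.0 * math.pi) * doppler_hz / math.gamma(shape)
            * z ** (shape - 0.5) * math.exp(-z))


def fsmc_transition_matrix(thresholds, shape, mean, doppler_hz, slot_s):
    """Stationary probabilities and a tridiagonal transition matrix.

    Assumed approximation: p(j -> j+1) = N(upper threshold) * T / p_j and
    p(j -> j-1) = N(lower threshold) * T / p_j.
    """
    num_states = len(thresholds) - 1
    cdf = [gamma_cdf(t, shape, mean) for t in thresholds]
    probs = [cdf[j + 1] - cdf[j] for j in range(num_states)]
    rates = [level_crossing_rate(t, shape, mean, doppler_hz) for t in thresholds]
    matrix = [[0.0] * num_states for _ in range(num_states)]
    for j in range(num_states):
        if j + 1 < num_states:
            matrix[j][j + 1] = rates[j + 1] * slot_s / probs[j]
        if j > 0:
            matrix[j][j - 1] = rates[j] * slot_s / probs[j]
        matrix[j][j] = 1.0 - sum(matrix[j])  # remaining mass stays in state j
    return probs, matrix


thresholds = equal_probability_thresholds(num_states=4, shape=2.0, mean=10.0)
probs, matrix = fsmc_transition_matrix(thresholds, shape=2.0, mean=10.0,
                                       doppler_hz=100.0, slot_s=1e-4)
```

The approximation is only sensible for slow fading, where the product of Doppler frequency and slot duration is small enough that every row keeps a non-negative self-transition probability.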
The evolution of our MDP problem can be stated as follows: at the start of a particular time-slot $n$, the scheduler determines the state of the system $s^n$ from the knowledge of the buffer state $b^n$ and the channel state $c^n$. It applies the optimal action $u^n$ from the action set available for state $s^n$. As a result of taking a particular action $u^n$ in state $s^n$, the scheduler receives a reward $G_{TH}(s^n, u^n)$ and moves to a new state $s^{n+1} \in \mathcal{S}$, determined by the transition probability. Each action $u^n$ corresponds one-to-one to a transmission mode $TM^n$ that uses a particular M-ary quadrature amplitude modulation (M-QAM) scheme and transmits a particular number of packets $w^n$ from the buffer. Let $\mathcal{W} = \{w_1, w_2, \ldots, w_U\}$ denote the set of numbers of packet transmissions that corresponds to the set of transmission modes, or set of actions. Therefore, we can write $w^n = \Phi(u^n)$, where $\Phi : \mathcal{U} \mapsto \mathcal{W}$. The updated buffer occupancy can be calculated from the decoding results and the incoming traffic, and can be expressed as

\[ b^{n+1} = \min\left\{ b^n - (w^n - w^{er}) + a^n,\; b_B \right\}, \tag{4.12} \]

where $w^{er}$ is the number of packets in error in a time-slot. Note that $w^n$ is constrained between 0 and $b^n$, i.e., $w^n \in [0, b^n]$.

Each packet in the data-link layer comprises $N_p$ information bits that include the serial number, payload, and cyclic redundancy check (CRC) bits. Let transmission mode $TM_i$ correspond to an M-QAM rate of $R_i = \log_2(M_i)$ bits/symbol; hence, each packet is mapped to a block of $N_p/R_i$ symbols. For action $u_i$ and a corresponding M-QAM rate $R_i$, the total number of symbols per time-slot at the input of the STBC encoder is given by

\[ N_i = N_c + w_i N_p / R_i, \tag{4.13} \]

where $N_c$ is the number of pilot and control symbols.
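The buffer update (4.12) and the symbol count (4.13) translate into a few lines of Python. This is a minimal sketch; the function names and the numeric values in the example calls (including the number of control symbols) are hypothetical and serve only to exercise the formulas.

```python
def next_buffer_occupancy(b, w, w_err, arrivals, buffer_size):
    """Buffer update of (4.12): erroneous packets stay in the buffer for
    retransmission, new arrivals are added, and the result is capped at B."""
    if not 0 <= w <= b:
        raise ValueError("cannot transmit more packets than are buffered")
    if not 0 <= w_err <= w:
        raise ValueError("erroneous packets cannot exceed transmitted packets")
    return min(b - (w - w_err) + arrivals, buffer_size)


def symbols_per_slot(n_control, w, packet_bits, bits_per_symbol):
    """Symbol count of (4.13): N_i = N_c + w_i * N_p / R_i."""
    return n_control + w * packet_bits / bits_per_symbol


# 10 packets buffered, 4 sent, 1 in error, 3 arrivals, B = 50.
print(next_buffer_occupancy(b=10, w=4, w_err=1, arrivals=3, buffer_size=50))
# Arrivals beyond the free space are lost to overflow.
print(next_buffer_occupancy(b=48, w=0, w_err=0, arrivals=5, buffer_size=50))
# Two 1080-bit packets with 16-QAM (4 bits/symbol), no control symbols.
print(symbols_per_slot(n_control=0, w=2, packet_bits=1080, bits_per_symbol=4))
```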
4.3.2 Immediate Reward for the Objective of the Problem

As we discussed in Section 4.2.2, for systems with an ARQ protocol in the data-link layer, maximizing the average throughput is equivalent to minimizing the packet error rate, the buffer delay, or the packet overflow. Therefore, our only objective is to maximize the average throughput.

As in [16], we assume a Nyquist pulse shaping filter with bandwidth $W_f = 1/T_s$, where $T_s = T_B/N_f$ is the symbol duration. When action $u_i$ is chosen, each transmitted symbol carries $R_{t_i} = R_c \log_2(M_i)$ information bits, where $M_i$ is the constellation size of the M-QAM scheme corresponding to action $u_i$.

Since SR-ARQ is used to handle packet retransmissions, only the erroneous packets in a particular time-slot will be retransmitted. Therefore, the immediate effective throughput can be defined as the number of packets received without error for a particular action in a particular state. The immediate throughput reward in state $s_j$ when action $u_i$ is taken can thus be expressed as

\[ G_{TH}(s_j, u_i) = \sum_{w^{sc}=0}^{w_i} w^{sc}\, p(w^{sc} \mid s_j, u_i), \tag{4.14} \]

where $p(w^{sc} \mid s_j, u_i)$ is the probability of receiving $w^{sc}$ packets successfully without error, given that $w_i$ packets have been transmitted for the state-action pair $(s_j, u_i)$. The number of packets $w^{sc} = w_i - w^{er}$ received without error for the state-action pair $(s_j, u_i)$ can be obtained from the single packet error rate discussed in Section 4.3.3. We analyze the performance of the scheme in terms of average delay in addition to average throughput, where the immediate buffer delay cost is given by (2.27).

4.3.3 PER and ACK/NAK Probability

The instantaneous packet error rate with received SNR $\gamma_s$ can be given in terms of the BER by the following:

\[ P_p(\gamma_s) = 1 - \left(1 - P_b(\gamma_s)\right)^{N_p}. \tag{4.15} \]

The exact expression for the BER with rectangular M-QAM transmission can be given by [153]

\[ P_b = \frac{1}{\log_2(M)} \left( \sum_{k=1}^{\log_2 I} P_I(k) + \sum_{l=1}^{\log_2 J} P_J(l) \right), \tag{4.16} \]
where $M = I \times J$, and $P_I(k)$ is the error probability of the $k$th bit, $k \in \{1, 2, \ldots, \log_2 I\}$, in $I$-ary PAM and can be obtained from [153] as follows:

\[ P_I(k) = \frac{1}{I} \sum_{i=0}^{(1-2^{-k})I - 1} \left\{ (-1)^{\left\lfloor \frac{i\, 2^{k-1}}{I} \right\rfloor} \left( 2^{k-1} - \left\lfloor \frac{i\, 2^{k-1}}{I} + \frac{1}{2} \right\rfloor \right) \mathrm{erfc}\left( (2i+1) \sqrt{\frac{3 \log_2(M)\, \gamma_s}{I^2 + J^2 - 2}} \right) \right\}. \]

The expression for $P_J(l)$ is similar (see [153] for details). The average packet error rate for channel state $c_j$ when the scheduler chooses action $u_i$ in a particular time-slot $n$ can be calculated numerically as follows:

\[ P_p(c_j, u_i) = \frac{1}{p_{c_j}} \int_{\gamma_{s_{j-1}}}^{\gamma_{s_j}} P_p(\gamma_s) f(\gamma_s)\, d\gamma_s, \tag{4.17} \]

where $p_{c_j}$ is the stationary probability of channel state $c_j$ given by (4.8). The receiver sends a negative acknowledgment (NAK) to the transmitter, requesting retransmission of the same packet, when the packet errors are uncorrectable by the error correction code. Therefore, the NAK probability $P_N(c_j, u_i)$ for the channel state-action pair $(c_j, u_i)$ is given by (4.17). The positive acknowledgment (ACK) probability is simply $P_A(c_j, u_i) = 1 - P_p(c_j, u_i)$. The probability of $w^{er}$ packets being in error among $w_i$ transmitted packets for the channel state-action pair $(c_j, u_i)$ can be found as follows:

\[ P(w^{er} \mid c_j, u_i) = \binom{w_i}{w^{er}} P_p(c_j, u_i)^{w^{er}} \left(1 - P_p(c_j, u_i)\right)^{w_i - w^{er}}. \tag{4.18} \]

Note that the packet error rate, and hence the NAK/ACK probability, does not depend on the buffer state. Therefore, for all $b^n$, the NAK probability for the state-action pair $(s^n, u^n)$ can be written as $p(\omega_2 \mid s^n, u^n) = P_N(c^n, u^n)$.

4.3.4 System State Transition Probability

The state transition probability for all $u_i \in \mathcal{U}_{s_i}$ and $s_i, s_j \in \mathcal{S}$ can be written as follows:

\[ P_{s_i, s_j}(u_i) = P_{\chi(s_i), \chi(s_j)} \sum_{w^{sc}=0}^{w_i} \sum_{a_i = a_0}^{a_A} p(a_i)\, p(w^{sc} \mid s_i, u_i)\, \delta\left( \psi(s_j) - \min\left(\psi(s_i) + a_i - w^{sc},\; b_B\right) \right). \]

It can be easily proved that for nonconstant traffic with $a_A > \Phi(u_U)$, the Markov chain associated with a particular policy has a unichain structure and hence the problem can be formulated as a unichain MDP [132].
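The relations (4.15), (4.18) and (4.14), together with the buffer part of the state transition probability, can be sketched in Python as follows. The average packet error rate of (4.17) is taken here as a given input rather than integrated numerically, and the channel part of the transition probability is left out, so this is only a partial illustration. The function names, the arrival distribution and all numbers in the example are hypothetical.

```python
import math


def packet_error_rate(bit_error_rate, packet_bits):
    """Packet error rate of (4.15) for independent bit errors."""
    return 1.0 - (1.0 - bit_error_rate) ** packet_bits


def error_count_pmf(w, per):
    """Binomial pmf of (4.18): probability of k erroneous packets out of w."""
    return [math.comb(w, k) * per ** k * (1.0 - per) ** (w - k)
            for k in range(w + 1)]


def throughput_reward(w, per):
    """Immediate reward of (4.14): expected number of error-free packets."""
    pmf = error_count_pmf(w, per)
    return sum((w - k) * pmf[k] for k in range(w + 1))


def buffer_transition_row(b, w, per, arrival_pmf, buffer_size):
    """Distribution of the next buffer occupancy when w of b packets are sent.

    arrival_pmf[a] is the probability of a arrivals in the slot. Multiplying
    each entry by the channel transition probability gives the composite
    state transition probability.
    """
    row = [0.0] * (buffer_size + 1)
    for k, p_err in enumerate(error_count_pmf(w, per)):
        for a, p_arr in enumerate(arrival_pmf):
            nxt = min(b - (w - k) + a, buffer_size)
            row[nxt] += p_err * p_arr
    return row


per = packet_error_rate(bit_error_rate=1e-4, packet_bits=1080)
reward = throughput_reward(w=5, per=0.1)
row = buffer_transition_row(b=6, w=5, per=0.1,
                            arrival_pmf=[0.2, 0.5, 0.3], buffer_size=8)
```

Because the number of erroneous packets is binomial, the reward reduces to the number of transmitted packets times the ACK probability, which gives a quick sanity check on any tabulated reward values.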
A unichain MDP has a recurrent class, where for every pair of states there exists a stationary policy under which one state is accessible from the other, and a possibly empty transient class.

4.3.5 Solution Techniques

The DT-MDP problem at hand can be solved using dynamic programming techniques. We use a policy iteration algorithm that consists of two steps: policy evaluation and policy improvement [145]. In the $k$th policy evaluation step, we obtain the average cost $\lambda^{(k)}$ and the differential cost $h^{(k)}(s_i)$ for policy $\mu^{(k)}$ and $s_i \in \mathcal{S}$ satisfying

\[ \lambda^{(k)} + h^{(k)}(s_i) = G_{TH}(s_i, \mu^{(k)}(s_i)) + \sum_{j=1}^{S} P_{s_i, s_j}(\mu^{(k)}(s_i))\, h^{(k)}(s_j). \tag{4.19} \]

Note that $\lambda^{(k)}$ and $h^{(k)}$ can be computed as the unique solution of the linear system of equations (4.19) together with the normalizing equation $h^{(k)}(s_t) = 0$, where $s_t$ is any reference state. The stationary policy for the first policy evaluation step can be initialized arbitrarily. In the $k$th policy improvement step, we find a stationary policy $\mu^{(k+1)}$, where for all $s_i$, $\mu^{(k+1)}(s_i)$ is such that

\[ G_{TH}(s_i, \mu^{(k+1)}(s_i)) + \sum_{j=1}^{S} P_{s_i, s_j}(\mu^{(k+1)}(s_i))\, h^{(k)}(s_j) = \max_{u \in \mathcal{U}_{s_i}} \left[ G_{TH}(s_i, u) + \sum_{j=1}^{S} P_{s_i, s_j}(u)\, h^{(k)}(s_j) \right]. \]

If $\mu^{(k+1)} = \mu^{(k)}$, the algorithm terminates; otherwise, the process is repeated, with $\mu^{(k+1)}$ replacing $\mu^{(k)}$.

4.4 Numerical Results and Discussions

Simulation results for the optimal cross-layer packet scheduling problem over a Nakagami-m MIMO channel are given in this section. Unless otherwise specified, we use the following data for all the simulations: number of channel states $C = 4$, maximum Doppler frequency $f_m = 100$ Hz, Nakagami parameter $m = 1$, number of blocks per second $R_B = 10^4$
blocks/second, buffer size $B = 50$ packets, average arrival rate of 4 packets/time-slot, maximum packet arrival $A = 15$ packets, full-rate STBC (i.e., $R_c = 1$), number of transmit antennas $n_T = 2$, number of receive antennas $n_R = 1$, packet size $N_p = 1080$ bits/packet, and number of actions $U = 8$. The first action corresponds to no transmission, the second action corresponds to BPSK transmission, and the $i$th action corresponds to $2^{i-1}$-QAM transmission, where $i = 3, 4, \ldots, 8$. We assume that the scheduler does not drop any packet if the transmission is unsuccessful. The only unavoidable packet drop from a finite-size buffer is therefore due to overflow.

In Fig. 4.2, we show the average throughput as a function of the average received SNR for various numbers of receive antennas and various normalized fading rates. It can be seen that the average throughput increases as the average received SNR increases. This result is expected because, as the average received SNR increases, the probability of packet error decreases for all modulation schemes. Hence, the scheduler can select a higher-order modulation scheme to send a greater number of packets, which results in lower packet overflow and buffering delay, and hence higher throughput. For a fixed average received SNR, the throughput increases as the number of receive antennas increases. As the number of receive antennas increases, the receiver has more diversity and is more likely to see a better channel state, which improves the throughput. Higher fading rates also give larger throughput, due to more frequent switching from a bad channel state to a good channel state, which results in fewer packets being stored in the buffer and consequently less overflow from it.

We show the effect of the number of actions on the average throughput for a particular traffic model in Fig. 4.3.
This figure shows the maximum number of actions that the scheduler needs in order to cope with a particular traffic load. We consider two different Poisson-distributed traffic models. It can be seen from the figure that for an average arrival rate of 2 packets/time-slot, the throughput increases as the number of actions increases from $U = 3$ to $U = 4$. However, a further increase in the number of actions does not increase the throughput further. A similar limit is also seen for an average arrival rate of 4 packets/time-slot, where more than $U = 6$ actions are unnecessary. Therefore, we can conclude that the rate set should not be chosen arbitrarily, but rather should be chosen according to the traffic at hand.

[Figure 4.2: Average throughput for different numbers of receive branches. Horizontal axis: average received SNR in dB. Curves: $f_m T_B = 0.01$ and $f_m T_B = 0.04$, each for $n_R = 1, 2, 3, 4$.]

[Figure 4.3: Average throughput for different numbers of actions. Horizontal axis: average received SNR in dB. Curves: average arrival rate of 4 packets/time-slot with $U = 5, 6, 8$, and average arrival rate of 2 packets/time-slot with $U = 3, 4, 6$.]

In Fig. 4.4, the effect of the Nakagami severity parameter $m$ on the average throughput vs. average received SNR curve is shown. It can be seen that in the lower SNR region, the throughput decreases as the value of $m$ increases, but in the higher SNR region, the throughput increases as the value of $m$ increases. The variation of the channel fading decreases as the value of $m$ increases. Therefore, in the lower average received SNR region, the packet error rate in the higher channel state increases as $m$ increases, which gives reduced average throughput. On the
other hand, in the higher average received SNR region, the packet error rate for a lower channel state decreases as $m$ increases, and hence the average throughput increases.

[Figure 4.4: Effect of Nakagami parameter on the average throughput. Horizontal axis: average received SNR in dB.]

The influence of the parameter $m$ on the average delay is shown in Fig. 4.5. It is seen that the average delay decreases for all the curves as the average received SNR increases. Packet transmission is more successful under better channel conditions. For this reason, fewer packets are stored as the average SNR increases, and hence the delay decreases. As mentioned earlier, since the packet error rate in a lower channel state decreases in the higher SNR region, the delay decreases as $m$ increases.

[Figure 4.5: Effect of Nakagami parameter on the average delay.]

4.5 Conclusions

In this chapter, we have presented a cross-layer optimization problem that maximizes throughput and minimizes the packet error rate, buffer delay and packet overflow. An MDP-based framework has been utilized to schedule packets based on both the channel state and the buffer state information. Transmit diversity using orthogonal STBC and adaptive M-QAM have been employed in the physical layer, and SR-ARQ has been employed in the data-link layer. We have established that the throughput of the cross-layer adaptation scheme depends on the received SNR as well as on the number of actions and the incoming packet arrival statistics. Increasing the number of actions does not necessarily increase the throughput.
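As a closing illustration for this chapter, the two-step policy iteration procedure of Section 4.3.5 (policy evaluation followed by policy improvement) can be sketched in Python for a generic unichain average-reward MDP. This is a minimal sketch and not the simulation code behind the reported results: the two-state, two-action example at the bottom uses made-up transition probabilities and rewards, the reference state is fixed to the first state, and the helper names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def evaluate_policy(policy, trans, reward):
    """Policy evaluation: solve gain + h(i) = r(i) + sum_j P(i, j) h(j)
    with the normalization h(0) = 0 (state 0 is the reference state)."""
    n = len(policy)
    a, b = [], []
    for i in range(n):
        p_row = trans[i][policy[i]]
        # Unknowns: x[0] = gain, x[j] = h(j) for j >= 1.
        a.append([1.0] + [(1.0 if i == j else 0.0) - p_row[j]
                          for j in range(1, n)])
        b.append(reward[i][policy[i]])
    x = solve_linear(a, b)
    return x[0], [0.0] + x[1:]


def policy_iteration(trans, reward):
    """trans[i][u][j]: transition probability, reward[i][u]: immediate reward."""
    n = len(trans)
    policy = [0] * n  # arbitrary initial stationary policy
    while True:
        gain, h = evaluate_policy(policy, trans, reward)
        new_policy = []
        for i in range(n):
            def q(u):
                return reward[i][u] + sum(p * hj for p, hj in zip(trans[i][u], h))
            best = policy[i]  # keep the current action unless strictly improved
            for u in range(len(trans[i])):
                if q(u) > q(best) + 1e-12:
                    best = u
            new_policy.append(best)
        if new_policy == policy:
            return policy, gain, h
        policy = new_policy


# Toy example with invented numbers: 2 states, 2 actions per state.
trans = [[[0.5, 0.5], [0.9, 0.1]],
         [[0.5, 0.5], [0.1, 0.9]]]
reward = [[1.0, 0.0],
          [0.0, 2.0]]
policy, gain, h = policy_iteration(trans, reward)
print(policy, gain)
```

For the scheduling problem, `trans` and `reward` would be filled from the system state transition probability of Section 4.3.4 and the throughput reward (4.14); for large state spaces a sparse or library-based linear solver would replace the dense elimination used here.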
CHAPTER 5

Rate and Power Adaptation for HARQ Systems

5.1 Introduction

In the last three chapters, we discussed the cross-layer adaptation of the transmission rate with both physical layer and data-link layer parameters. In this chapter, we present a general framework for joint rate and power adaptation of type-I hybrid ARQ systems. This framework can be applied to the adaptive resource allocation problem on correlated flat-fading or frequency-selective fading channels for bursty non-constant packet arrivals. As discussed in Chapter 1, reliability and throughput are two important aspects of packet-oriented communication systems over wireless channels. Hybrid automatic repeat request (hybrid ARQ), which employs one code for error correction and another code for error detection, is frequently used to achieve these two goals simultaneously. As discussed in the previous chapters, from a higher-layer perspective, resource adaptation techniques for time-varying fading channels in general require the use of a finite transmission buffer to provide a degree of flexibility in handling a time-varying number of transmitted packets. This buffer introduces a time-varying delay in the transmission of packets, which can have detrimental effects on the transmission of delay-sensitive data, such as voice or video traffic. The effect of the delay variation is exacerbated in the case of slow-fading channels, where long deep fades can introduce intolerable delays in transmission. Buffer overflow is another important factor that should be taken into consideration when considering communications of finite-buffer systems over slow-fading channels.
However, conventional adaptive transmission schemes with perfect CSI at the transmitter assume an infinite transmitter buffer and therefore do not consider the buffering delay and the packet loss due to overflow. These schemes also do not consider the packet arrival statistics and assume that packets are always available for transmission. A brief summary of our contributions in this chapter is presented below:

• We propose cross-layer adaptation techniques to optimize the transmission power, buffering delay and packet overflow rate for an adaptive type-I hybrid ARQ scheme. Both the transmission power and the rate are varied with the buffer occupancy and the channel conditions to obtain the optimal scheduling strategy over correlated fading channels. We consider two cases for packet scheduling that are of practical importance.
• In the first problem of this chapter, we tackle the scheduling problem when both perfect information of the channel and the observation feedback of the decoding results are available from the receiver. The transmitter takes its transmission decision based on the instantaneous channel state, and the SW-ARQ protocol is used to send the decoding results of the transmission. We formulate and solve the cross-layer optimization problem using the tools provided by the theory of Markov decision processes (MDP).
• In the second problem, we investigate the adaptation of the transmission power and rate for the case where perfect channel state information is not available, and transmission decisions are taken based on the observation feedback of the previous transmissions. For this case, we propose a new technique to estimate the channel conditions. The control actions and observation feedback of several previous time-slots are used as a history record to estimate the probability of ACK/NAK for the present time-slot. We model the delay and packet overflow constrained adaptive type-I hybrid ARQ as an MDP problem.
In essence, this is a partially observable Markov decision process (POMDP) problem. However, by truncating the tracked history and posing the problem as an MDP, finding the optimal policies becomes much less computationally intensive.
• We analyze the performance of both models, relying on knowledge of the channel statistics, where the channel is assumed to be modeled as a finite-state Markov channel. We also present a finite-state analysis of the frequency-selective channel with the MMSE receiver in this chapter. In this chapter, we also assume bursty Poisson arrivals and a finite-size buffer.
• The optimal adaptation laws for both models are computed using the theory of MDP by applying relative value iteration, policy iteration or linear programming algorithms for both the flat-fading and the frequency-selective fading channel. Simulation results are given for different fading rates and buffer lengths for all cases.

The organization of the chapter is as follows. Section 5.2 introduces the general system model, the traffic model and the channel model for the adaptive type-I hybrid ARQ scheme considered in the chapter. In Section 5.3, we present the formulations of the power and rate adaptation problems as MDPs. We discuss two cases of the problem according to the availability of perfect CSI through the feedback channel. We also provide the associated costs and transition matrices for both cases of CSI knowledge. The solution techniques employing iterative dynamic programming and linear programming are given in Section 5.4. Numerical results and discussions are given in Section 5.5, while conclusions are given in Section 5.6.

5.2 General Model Description

We consider a point-to-point wireless link between two units, equipped with a finite transmission buffer, over a fading channel. Packets arrive from a higher-layer application and are placed into a transmission buffer of length $B$.
As in Chapter 2, we assume a discrete-time model and communication over an infinitely long horizon. Unless otherwise specified, the descriptions and notation of the previous chapters also apply to this chapter.

5.2.1 System Modeling

[Figure 5.1: Schematic of the adaptive type-I hybrid ARQ systems (a) when perfect CSI is available and (b) when perfect CSI is not available. Blocks shown: higher-layer application, finite buffer of size $B$, rate and power controller, error detection/correction encoder, adaptive modulator, fading channel, perfect channel state estimator, demodulator, error correction/detection decoder, and observation feedback.]

We assume that at the beginning of each time-slot $n$, the transmitter takes $w^n$ packets from the buffer and first encodes the corresponding $k = w^n N_p$ bits using a high-rate error detection code $(k', k)$. We use a cyclic redundancy check (CRC) for error detection purposes and assume that it is capable of detecting all the errors in the received codeword perfectly. These encoded packets are subsequently encoded using a forward error correction (FEC) code. A particular $(n', k')$ Bose-Chaudhuri-Hocquenghem (BCH) code (cf. [154]) is adopted in this chapter for error correction. Note that the analysis is general enough to take into account other codes as well, including convolutional codes, turbo codes, etc. The resulting codeword is modulated with an adaptive multi-level modulator and transmitted over the fading channel in time-slot $n$. The modulation constellation and the transmission power depend on the action taken at the start of that particular time-slot. The control action chosen at the start of a time-slot is maintained until the end of that time-slot. We assume that all the packets transmitted in a block during a time-slot experience the same channel gain.
Note that the processing unit at the data-link layer is a packet consisting of $N_p$ bits, and the processing unit at the physical layer is a frame consisting of $N_f$ modulated symbols. We assume a Nyquist pulse shaping filter with bandwidth $W_f = 1/T_s$, where $T_s = T_B/N_f$ is the symbol duration. Therefore, the number of symbols per frame can be expressed as

\[ N_f = N_c + \frac{w^n N_p}{R_d R_c R_i}, \]

where $N_c$ is the number of pilot and control symbols, and $R_d = k/k'$ and $R_c = k'/n'$ are the error detection and error correction coding rates, respectively. $R_i = \log_2(M)$ is the number of bits carried by a single transmitted symbol in the frame for constellation size $M$ of the M-ary modulation scheme. The value of $w^n$ depends on the modulation and coding rates and can be given by

\[ w^n = \frac{(N_f - N_c) R_d R_c R_i}{N_p} = \text{constant} \times R_i. \]

After being received at the receiver, the frame is demodulated. The error correction decoder first attempts to correct any errors in the received demodulated frame; the decoded frame is then checked for errors by the error detection decoder. Observation feedback is sent to the transmitter over a noiseless and fast feedback channel. We use the SW-ARQ protocol in the data-link layer for handling the retransmission of erroneous packets. If no errors are detected, the packets are delivered to the higher-layer application and an ACK observation feedback is sent to the transmitter. Accordingly, the transmitter removes those packets from the buffer. Otherwise, the receiver discards the packets and sends a NAK observation feedback requesting a retransmission of the same packets. The process is repeated until the packets are successfully received. We assume that the observation feedback is received without any delay. Note that ACK and NAK are the observations in our framework, and we denote the observation state space by $\Omega = \{\omega_1, \omega_2\} = \{\text{ACK}, \text{NAK}\}$. The number of packets arriving in a particular time-slot is distributed either as i.i.d.
uniform distribution or as i.i.d. Poisson distribution. For the i.i.d. uniform distribution, the probability $p(a^n = a_i)$ of $a_i$ packets arriving in time-slot $n$ is the same for every admissible $a_i$, and for the i.i.d. Poisson distribution it is given by (2.1). The dynamics of the buffer in terms of packet occupancy are given by the following:

\[ b^{n+1} = \min\left\{ \max\left(b_0,\; b^n - \omega^n w^n + a^n\right),\; b_B \right\}. \tag{5.1} \]

The value of the parameter $\omega^n$ equals 1 if the previous transmission is successful and an ACK is received, and 0 if the previous transmission is unsuccessful and a NAK is received. It is clear that the transmitter can send at most the number of packets that are currently stored in the buffer, i.e., $0 \le w^n \le b^n$, even when the channel condition is good enough to send more packets. Also, if the number of arriving packets exceeds the empty space that the buffer currently has, the additional packets will be dropped due to overflow.

5.2.2 Markov Modeling of Rayleigh Fading Channel

Let $x(m)$ be the sequence of signals sent by the transmitter, with $P_t = P_t(m) = E[(x(m))^2]$ denoting the transmit power. The signaling is assumed to be symmetric, i.e., $E[x(m)] = \bar{x}(m) = 0$. In time-slot $n$, the channel with inter-symbol interference (ISI) can be approximated by an equivalent discrete-time linear filter. It has a finite-length impulse response of $J$ taps with coefficients $h(0), h(1), \ldots, h(J-1)$ that are assumed to be known to the receiver. The sequence of signals on the receiver side, $z(m)$, is given by

\[ z(m) = \sum_{l=0}^{J-1} h(l)\, x(m-l) + \nu(m), \tag{5.2} \]

where $\nu(m)$ is the additive Gaussian noise with variance $\sigma_\nu^2$. Let us denote the received power gain of the $(l+1)$th tap as $\alpha(l) = (h(l))^2$, $l = 0, 1, \ldots, J-1$. Also let $\{c_1, c_2, \ldots, c_C\}$ denote the state space of the channel with $C$ states. We consider both the flat-fading and the frequency-selective fading channel for analyzing the performance of the rate and power adaptive type-I HARQ system.
Whether flat fading or frequency-selective fading occurs depends on the bandwidth of the modulated symbol and the coherence bandwidth of the channel. If the bandwidth of the transmitted symbol is larger than the coherence bandwidth, the channel undergoes frequency-selective fading [155].

Flat Fading Case

The flat-fading channel can be considered a special case of the frequency-selective channel. For the flat-fading case, the number of taps $J$ is equal to 1 and the channel is uniquely described by the sequence of fading power gains $\alpha = \alpha(0)$. We assume that the fading of the channel is slow enough to model it as a first-order finite-state Markov channel. We quantize the sequence of channel gains with the following set of received power gain thresholds: $\mathcal{A} = \{\alpha_0, \alpha_1, \ldots, \alpha_I\}$, where $\alpha_0 = 0$, $\alpha_I = \infty$, and $\alpha_i < \alpha_j$ for $i < j$. In the flat-fading case, the number of channel states $C$ is equal to $I$. Without loss of generality, in our simulations we use the equal probability method for partitioning the channel gain thresholds because it combines simplicity and accuracy, as described in Section 2.2.3. The stationary probability of the $i$th, $i = 1, 2, \ldots, I$, power gain state of each tap is given by

$$\phi_i = \int_{\alpha_{i-1}}^{\alpha_i} f_\alpha(\alpha)\, d\alpha, \tag{5.3}$$

where, for the Rayleigh fading distribution, the channel gain $\alpha = \alpha(0)$ is exponentially distributed with probability density function

$$f_\alpha(\alpha) = \frac{1}{\bar{\alpha}} \exp\left(-\frac{\alpha}{\bar{\alpha}}\right), \quad \alpha \ge 0, \tag{5.4}$$

where $\bar{\alpha} = E\{\alpha\}$ is the average power gain of the tap. The level crossing rate and the transition probabilities between channel gain states are found using (2.5), (2.3), (2.4) and (2.7), with the received SNR $\gamma$ replaced by the channel gain $\alpha$. For a flat-fading channel with power gain $\alpha$, the received SNR $\gamma$ can be calculated as

$$\gamma = \frac{\alpha P_t}{\sigma_\nu^2}, \tag{5.5}$$

with $\sigma_\nu^2$ denoting the noise variance.

Frequency-Selective Fading Case

In the frequency-selective channel, the channel power gains $\alpha(l) = (h(l))^2$, $l = 0, 1, \ldots, J-1$, of all taps are quantized into $I$ levels as discussed above.
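To make the equal probability partitioning of the flat-fading case concrete, the sketch below computes thresholds for an exponentially distributed power gain so that every state has the same stationary probability. It assumes the closed form obtained by inverting the exponential distribution function, $\alpha_i = -\bar{\alpha} \ln(1 - i/I)$; the function names are hypothetical and the details of Section 2.2.3 are not reproduced here.

```python
import math

def equal_probability_thresholds(mean_gain, num_states):
    """Thresholds alpha_0..alpha_I that split an exponential gain
    distribution with mean `mean_gain` into equiprobable intervals."""
    inner = [-mean_gain * math.log(1.0 - i / num_states)
             for i in range(1, num_states)]
    return [0.0] + inner + [math.inf]

def stationary_probabilities(thresholds, mean_gain):
    """Probability mass of each interval [alpha_{i-1}, alpha_i),
    i.e. the integral of the exponential density over the interval."""
    def cdf(a):
        return 1.0 - math.exp(-a / mean_gain)
    return [cdf(hi) - cdf(lo) for lo, hi in zip(thresholds, thresholds[1:])]
```

With `num_states = 4` and unit mean gain, the inner thresholds are $\ln(4/3)$, $\ln 2$ and $\ln 4$, and each of the four states has stationary probability 0.25.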
Let $\bar{\alpha}(l)$ denote the average gain of tap $l = 0, 1, \ldots, J-1$ of the frequency-selective channel. The whole channel is therefore characterized by $C = I^J$ states, where each state $c_i$ is described by a $J$-tuple $\{\xi_0(c_i), \xi_1(c_i), \ldots, \xi_{J-1}(c_i)\}$. The function $\xi_l(c_i)$ denotes the channel gain state of the $l$th tap for channel state $c_i$. Based on the experimental studies of [155], we assume that the channel gain in each tap is independent and follows the Rayleigh distribution. Therefore, the transition probability from composite channel state $c_i$ to composite channel state $c_j$ is given by

$$p_{c_i, c_j} = \prod_{l=0}^{J-1} p_{\xi_l(c_i), \xi_l(c_j)}, \tag{5.6}$$

where $p_{\xi_l(c_i), \xi_l(c_j)}$ is the channel gain transition probability discussed in the flat-fading case. We consider the use of a minimum mean square error (MMSE) receiver to mitigate the inter-symbol interference caused by frequency-selective channels [131]. The operation of the MMSE receiver is equivalent to filtering with a suitably chosen finite impulse response filter. Let $N_1$ and $N_2$ denote the lengths of the noncausal and the causal parts of the estimator. The sequence of received symbols $\mathbf{z}(m) = [z(m - N_2)\ z(m - N_2 + 1)\ \cdots\ z(m + N_1)]^T$ is given by

$$\mathbf{z}(m) = \mathbf{H}\mathbf{x}(m) + [\nu(m - N_2)\ \nu(m - N_2 + 1)\ \cdots\ \nu(m + N_1)]^T, \tag{5.7}$$

where $\mathbf{x}(m) = [x(m - N_2 - J + 1)\ x(m - N_2 - J + 2)\ \cdots\ x(m + N_1)]^T$ and $\nu(m)$ is the sequence of i.i.d. noise samples with variance $\sigma_\nu^2$.
The channel convolution matrix $\mathbf{H}$ has dimension $N \times (N + J - 1)$, with $N = N_1 + N_2 + 1$, and is given by [131],

$$\mathbf{H} = \begin{pmatrix} h(J-1) & h(J-2) & \cdots & h(0) & 0 & \cdots & 0 \\ 0 & h(J-1) & h(J-2) & \cdots & h(0) & \cdots & 0 \\ \vdots & & \ddots & & & \ddots & \vdots \\ 0 & \cdots & 0 & h(J-1) & h(J-2) & \cdots & h(0) \end{pmatrix}. \tag{5.8}$$

Using the sequence of received symbols $\mathbf{z}(m)$, a linear estimate $\hat{x}(m)$ of the transmitted symbol $x(m)$ at the output of the MMSE receiver is given by

$$\hat{x}(m) = (\mathbf{q}(m))^H \mathbf{z}(m), \tag{5.9}$$

where $(\cdot)^H$ is the Hermitian operator and $\mathbf{q}(m)$ is the coefficient vector of the estimator, which can be found by solving

$$\arg\min_{\mathbf{q}(m)} E\left(|x(m) - \hat{x}(m)|^2\right), \tag{5.10}$$

where $E(\cdot)$ is the expectation operator. The MMSE solution of (5.10) is

$$\mathbf{q}(m) = \mathrm{Cov}(\mathbf{z}(m), \mathbf{z}(m))^{-1}\, \mathrm{Cov}(\mathbf{z}(m), x(m)) = P_t(m)\, \boldsymbol{\Sigma}(m)^{-1} \mathbf{s}, \tag{5.11}$$

where the expressions for $\boldsymbol{\Sigma}(m)$ and $\mathbf{s}$ are

$$\boldsymbol{\Sigma}(m) = \mathrm{Cov}(\mathbf{z}(m), \mathbf{z}(m)) = \sigma_\nu^2 \mathbf{I}_N + \mathbf{H}\mathbf{V}(m)\mathbf{H}^H,$$
$$\mathbf{V}(m) = \mathrm{Cov}(\mathbf{x}(m), \mathbf{x}(m)) = \mathrm{diag}[P_t(m - J - N_2 + 1), \ldots, P_t(m + N_1)],$$
$$\mathbf{s} = \mathbf{H}\,[\mathbf{0}_{1 \times (N_2 + J - 1)}\ \ 1\ \ \mathbf{0}_{1 \times N_1}]^T.$$

Therefore, the estimate of the transmitted symbol becomes

$$\hat{x}(m) = \mathrm{Cov}(x(m), \mathbf{z}(m))\, \mathrm{Cov}(\mathbf{z}(m), \mathbf{z}(m))^{-1}\, \mathbf{z}(m), \tag{5.12}$$

where $\mathrm{Cov}(\mathbf{x}_1, \mathbf{x}_2) = E(\mathbf{x}_1 \mathbf{x}_2^H) - E(\mathbf{x}_1)E(\mathbf{x}_2^H)$ is the covariance operator. Using an approach similar to that in [131] and [156], we approximate the output of the MMSE receiver with a Gaussian distribution (see footnote 1). This is a very useful approximation because it allows the bit error rate to be calculated using standard expressions for additive Gaussian noise channels. It is therefore assumed that the probability density functions $p(\hat{x}(m) \mid x(m))$ are Gaussian, with mean and variance given, respectively, by

$$\varepsilon(m) = \Lambda(m)\, (\mathbf{r}(m))^H \mathbf{s}\, x(m), \tag{5.13}$$
$$(\sigma(m))^2 = (\Lambda(m))^2 \left((\mathbf{r}(m))^H \mathbf{s} - P_t(m)\, (\mathbf{r}(m))^H \mathbf{s}\mathbf{s}^H \mathbf{r}(m)\right), \tag{5.14}$$

where $\Lambda(m) = \left(1 + (1 - P_t(m))\, (\mathbf{r}(m))^H \mathbf{s}\right)^{-1}$ and $\mathbf{r}(m) = \boldsymbol{\Sigma}(m)^{-1}\mathbf{s}$. Therefore, the SNR after the MMSE receiver depends implicitly on the power gains of the individual taps $\alpha(0), \alpha(1), \ldots, \alpha(J-1)$ and the transmission power $P_t$.
This SNR can be expressed as

$$\gamma(\alpha(0), \ldots, \alpha(J-1); P_t) = \frac{(\varepsilon(m))^2}{(\sigma(m))^2} = \frac{\left((\mathbf{r}(m))^H \mathbf{s}\, x(m)\right)^2}{(\mathbf{r}(m))^H \mathbf{s} - P_t(m)\, (\mathbf{r}(m))^H \mathbf{s}\mathbf{s}^H \mathbf{r}(m)}, \tag{5.15}$$

and this expression will be used to evaluate the packet error probabilities of the adaptive type-I hybrid ARQ protocol.

Footnote 1: The merits and accuracy of this approximation have been previously addressed by [157] for MMSE multiuser

5.3 Rate and Power Adaptive Transmission for Type-I Hybrid ARQ Systems

We consider the combined transmission rate and power adaptation technique for type-I hybrid ARQ systems over correlated flat-fading channels as well as ISI channels. The heterogeneous traffic types (e.g., voice, video, interactive messages, file transfers, web browsing) that modern and future-generation wireless networks are envisioned to support have different QoS requirements. For example, each traffic type has its own delay requirement. In practice, the transmission buffers of wireless systems are of finite size. Because of the finite buffer size and the burstiness of the incoming traffic, packet overflows from the buffer are unavoidable over wireless links. Likewise, the tolerance on the maximum packet overflow rate differs between traffic types. Transmission power is another important factor for wireless mobile devices, which usually operate with limited battery capacity. Therefore, minimizing transmission power is of paramount importance for wireless networks. In this section, we address scheduling problems over wireless links that minimize the transmission power subject to the constraint that both the packet delay in the buffer and the packet overflow rate from the buffer stay below prescribed values. We adapt both the transmitter power and the modulation constellation to the instantaneous channel conditions and the instantaneous buffer occupancy.
Thus, we are concerned with a system that can choose its modulation constellation from a discrete finite set of constellations $\mathcal{V}$. Let $W$ be the total number of different constellations in the set and $\mathcal{W} = \{w_1, w_2, \ldots, w_W\}$ be the set of numbers of packets that can be transmitted. There is a one-to-one correspondence between the set of constellations and the set of numbers of packets to be transmitted. In this chapter, we consider only discrete modulation, since it was observed in [10] that the maximum spectral efficiency is nearly the same under continuous and discrete rate adaptation.

We discuss two different cases of the problem at hand. In the first case, we assume that perfect CSI is tracked at the receiver and sent to the transmitter without latency or errors. In this case, the scheduler adapts the transmission rate and power to the perfectly known channel state. Note that the buffer state information is always known at the transmitter before transmission. Although the transmission decision is accurate when perfect CSI is available at the transmitter, a packet error can still occur depending on the state of the channel. This error becomes known after the received packet is decoded. We use the SW-ARQ protocol to report the decoding result to the transmitter in the form of an observation. The same feedback channel is used for both purposes.

In certain practical circumstances, perfect CSI may not be available at the transmitter, and the transmission decision has to be based on the observations only. The second case therefore deals with transmission rate and power adaptation based on the buffer occupancy and the history of the observations and actions taken in previous time-slots. This case is necessarily suboptimal compared with the case in which perfect CSI is known.
5.3.1 When Both the Perfect CSI and Observations are Known

We formulate the problem as an average cost Markov decision process that is defined through the following ingredients: a finite set of system states $\mathcal{S} = \{s_1, s_2, \ldots, s_S\}$, a finite set of actions $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$, a set of state and action dependent immediate costs $\mathcal{G} : \mathcal{K} \mapsto \mathbb{R}$, and a set of state and action dependent transition probabilities $\mathcal{P} : \mathcal{K} \mapsto \Theta(\mathcal{S})$, where $\mathcal{K} = \{(s, u) : s \in \mathcal{S}, u \in \mathcal{U}_s\}$ is the set of state-action pairs, $\mathbb{R}$ is the set of real numbers and $\Theta(\mathcal{S})$ is the set of discrete probability distributions over the set $\mathcal{S}$. The system state space for the perfect CSI case is composite and consists of the buffer state and the channel state, namely, $\mathcal{S} = \mathcal{B} \times \mathcal{C} = \{(b_0, c_1), (b_0, c_2), \ldots, (b_B, c_{C-1}), (b_B, c_C)\}$ with a total of $S = (B + 1) \times C$ states. The action space maps to a set of transmission modes $\mathcal{TM}$, which is composite and made up of discrete transmission power levels and discrete multilevel modulation schemes. The first action $u_1$ corresponds to no transmission. The scheduler chooses this action when the buffer does not have enough packets to transmit, or when the channel condition is bad and the scheduler wants to save transmitter power, provided that it can still satisfy the constraints on the average delay and the average overflow rate. Each of the other actions corresponds to a particular power level and modulation scheme from the set of available transmission modes. Let $\mathcal{TM} = \{TM_1, TM_2, TM_3, \ldots, TM_U\} = \{(0, 0), (P_{t_1}, R_1), (P_{t_1}, R_2), \ldots, (P_{t_E}, R_{W-1}), (P_{t_E}, R_W)\}$ denote the set of transmission modes, where $E$ is the total number of power levels. We denote the mappings from the action to the transmission power and to the number of packets to be transmitted by the functions $\Phi$ and $\Psi$, respectively, i.e., $\Phi : \mathcal{U} \mapsto \mathcal{E}$ and $\Psi : \mathcal{U} \mapsto \mathcal{W}$, where $\mathcal{E}$ is the set of transmission power levels.
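The composite state and action spaces described above can be enumerated directly. The following sketch is only illustrative: the function names are hypothetical, and the example values (power levels of 20 mW and 50 mW, 2 and 3 bits per symbol for Q-PSK and 8-PSK, $B = 50$, $C = 4$) are the simulation settings quoted later in Section 5.5.

```python
from itertools import product

def build_actions(power_levels_mw, bits_per_symbol):
    """Transmission modes: index 0 is 'no transmission' (0, 0), followed by
    every (power level, bits per symbol) pair in the order of the set TM."""
    return [(0, 0)] + list(product(power_levels_mw, bits_per_symbol))

def build_states(buffer_size, num_channel_states):
    """Composite states (b, c) with b = 0..B packets and c = 1..C."""
    return list(product(range(buffer_size + 1),
                        range(1, num_channel_states + 1)))

# Example with the simulation settings of Section 5.5 (illustrative only).
actions = build_actions([20, 50], [2, 3])
states = build_states(50, 4)
```

With these settings the sketch yields five actions, matching $U = 5$ in Section 5.5, and $(B + 1) \times C = 204$ composite states.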
The expressions for the costs and the transition probabilities are given in the following subsections.

Costs Related to the Objectives

Our objective is to minimize three quantities: the long-term average transmitter power, the long-term average delay and the long-term average packet overflow rate, which are given, respectively, by

$$\bar{G}_P = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G_P(s^n, u^n)\}, \tag{5.16}$$
$$\bar{G}_D = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G_D(s^n, u^n)\}, \tag{5.17}$$

and

$$\bar{G}_O = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G_O(s^n, u^n)\}, \tag{5.18}$$

where $G_P(s^n, u^n)$, $G_D(s^n, u^n)$ and $G_O(s^n, u^n)$ are the immediate power cost, the immediate cost of packet delay in the buffer and the immediate cost of packet overflow from the buffer for state $s^n$ and action $u^n$. These immediate costs are defined below.

The Power Cost: The immediate power cost depends on the transmitter power level chosen. When the channel condition is bad, the transmission power can be raised to send packets. On the other hand, when the channel is in a higher state, the packets can be sent with the lowest transmitter power level. The power cost is therefore the transmitter power level chosen for a particular action $u^n$ in a particular system state $s^n$ and is given by

$$G_P(s^n, u^n) = \Phi(u^n), \quad \forall s^n \in \mathcal{S}. \tag{5.19}$$

The power cost is independent of the current system state, but the action should be chosen according to the current system state so that both the delay bound and the packet overflow rate bound are satisfied. Note that the current system state depends on the current channel state, the previous packet arrivals and the previous channel states (and hence the decoding results).

The Delay Cost: The immediate delay cost is independent of the current channel state and of the action taken in the current time-slot; it depends only on the current buffer state. However, the current buffer state is determined by the decoding results of previous time-slots (which depend on the channel state) and by the previous packet arrivals.
As discussed in Chapter 2, the immediate buffering delay experienced by a packet in the queue depends on the buffer occupancy and the average arrival rate and is obtained from the well-known Little's law using (2.27).

The Overflow Cost: Buffer overflow occurs when the buffer has less free space than the number of incoming packets. Since the maximum number of packet arrivals is $A$, there is a certain probability of buffer overflow if the current buffer state is in $\{b_{B-A+1}, b_{B-A+2}, \ldots, b_B\}$. For state $s^n$ and action $u^n$, the buffer overflow rate in packets/time-slot can be written as

$$G_O(s^n, u^n) = \sum_{i} \sum_{j} \varphi\left(\psi(s^n) + a_j - \omega_i \Psi(u^n),\, b_B\right) p(a_j)\, p(\omega_i \mid \chi(s^n), u^n), \tag{5.20}$$

where the sums run over all observations $\omega_i$ and all arrival values $a_j$, $\psi(s^n)$ returns the buffer state of the composite state $s^n$, the function $\varphi(x, y)$ returns the difference of $x$ and $y$ when $x > y$ and 0 otherwise, and $p(\omega_i \mid \chi(s^n), u^n)$ is the probability of observation $\omega_i$ for the given channel state $c^n = \chi(s^n)$ and action $u^n$. Let the $(n', k')$ binary forward error-correction code be capable of correcting $t$ bit errors. Then, the NAK probability (frame error probability) for channel state $c^n$ and action $u^n$ can be written as

$$p(\omega_2 \mid c^n, u^n) = \sum_{l=t+1}^{n'} \binom{n'}{l} \left(P_e(c^n, u^n)\right)^l \left(1 - P_e(c^n, u^n)\right)^{n'-l}, \tag{5.21}$$

where $P_e(c^n, u^n)$ is the average bit-error probability for channel state $c^n$ and action $u^n$. Although any set of modulation schemes can be used in our framework, without loss of generality we consider M-PSK modulation for the simulations. We present the BER expression below only for the frequency-selective channel, since the flat-fading channel is the special case in which the number of taps in the channel filter model is 1. There are two possible ways to determine the average bit error probability.
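The NAK probability (5.21) is a binomial tail and is straightforward to evaluate numerically. The sketch below assumes, as (5.21) does, independent bit errors and a decoder that corrects up to $t$ errors in a block of $n'$ bits; the function and argument names are illustrative.

```python
import math

def nak_probability(bit_error_prob, n_bits, t):
    """Frame error (NAK) probability: more than t bit errors among n_bits
    bits when each bit fails independently with bit_error_prob."""
    p = bit_error_prob
    return sum(math.comb(n_bits, l) * p**l * (1.0 - p)**(n_bits - l)
               for l in range(t + 1, n_bits + 1))
```

For an uncoded frame ($t = 0$) the sum reduces to $1 - (1 - P_e)^{n'}$, which gives a quick sanity check of an implementation.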
Firstly, it can be computed using the average received SNR $\bar{\gamma}(c_k, u_i)$ of the channel state $c_k$ when action $u_i$ is chosen [158] as

$$P_e(c_k, u_i) \approx \frac{1}{\Upsilon(u_i)}\, \mathrm{erfc}\left(\sqrt{\bar{\gamma}(c_k, u_i)}\, \sin\left(\frac{\pi}{2^{\Upsilon(u_i)}}\right)\right), \tag{5.22}$$

where $\Upsilon(u_i)$ is the number of bits in a symbol of the M-PSK scheme that corresponds to action $u_i$. The average received SNR for state $c_k = (\xi_0(c_k), \xi_1(c_k), \ldots, \xi_{J-1}(c_k))$ can be computed using the relation

$$\bar{\gamma}(c_k, u_i) = \frac{\int_{\alpha_{k_0 - 1}}^{\alpha_{k_0}} \cdots \int_{\alpha_{k_{J-1} - 1}}^{\alpha_{k_{J-1}}} \gamma_{\mathrm{mmse}}(u_i)\, f_\alpha(\alpha(0)) \cdots f_\alpha(\alpha(J-1))\, d\alpha(0) \cdots d\alpha(J-1)}{\int_{\alpha_{k_0 - 1}}^{\alpha_{k_0}} \cdots \int_{\alpha_{k_{J-1} - 1}}^{\alpha_{k_{J-1}}} f_\alpha(\alpha(0)) \cdots f_\alpha(\alpha(J-1))\, d\alpha(0) \cdots d\alpha(J-1)}, \tag{5.23}$$

where $k_l = \xi_l(c_k)$ is the gain state of the $l$th tap for channel state $c_k$ and $\gamma_{\mathrm{mmse}}(u_i) = \gamma(\alpha(0), \ldots, \alpha(J-1); \Phi(u_i))$. Alternatively, the average bit-error probability for M-PSK modulation corresponding to action $u_i$ in the channel state $c_k$ can be calculated as

$$P_{e,\text{M-PSK}}(c_k, u_i) = \frac{\int_{\alpha_{k_0 - 1}}^{\alpha_{k_0}} \cdots \int_{\alpha_{k_{J-1} - 1}}^{\alpha_{k_{J-1}}} \mathrm{erfc}\left(\sqrt{\gamma_{\mathrm{mmse}}(u_i)}\, \sin\left(\frac{\pi}{2^{\Upsilon(u_i)}}\right)\right) f_\alpha(\alpha(0)) \cdots f_\alpha(\alpha(J-1))\, d\alpha(0) \cdots d\alpha(J-1)}{\Upsilon(u_i) \int_{\alpha_{k_0 - 1}}^{\alpha_{k_0}} \cdots \int_{\alpha_{k_{J-1} - 1}}^{\alpha_{k_{J-1}}} f_\alpha(\alpha(0)) \cdots f_\alpha(\alpha(J-1))\, d\alpha(0) \cdots d\alpha(J-1)}. \tag{5.24}$$

The integrals in the above expressions can be evaluated numerically. If the number of quantization levels is large, the two methods give the same results.

Transition Probability Matrix

The transition probability for the considered type-I hybrid ARQ system depends on the incoming traffic statistics, the current buffer occupancy, the feedback observations and the channel statistics. It can be expressed as

$$p_{s^n, s^{n+1}}(u^n) = \sum_{i} \sum_{j} p(a_j)\, p_{\chi(s^n), \chi(s^{n+1})}\, p(\omega_i \mid \chi(s^n), u^n)\, \delta\left(\psi(s^{n+1}) - \min\left(\psi(s^n) + a_j - \omega_i \Psi(u^n),\, b_B\right)\right). \tag{5.25}$$

Since no packet is transmitted for action $u_1$, the transition probability in that case does not depend on the observation probability. For the other actions, when the buffer does not have enough packets to transmit, the transition matrix and cost matrix are corrected so that those actions are avoided.
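The M-PSK approximation (5.22) earlier in this subsection can be evaluated with the standard library alone. The sketch below takes the average SNR as a linear (not dB) quantity and the number of bits per symbol as inputs; both argument names are illustrative, and the averaging over channel states in (5.23) and (5.24) is not included.

```python
import math

def mpsk_bit_error_prob(snr, bits_per_symbol):
    """Approximate M-PSK bit-error probability for a linear-scale SNR,
    with M = 2**bits_per_symbol constellation points, as in (5.22)."""
    m = 2 ** bits_per_symbol
    return math.erfc(math.sqrt(snr) * math.sin(math.pi / m)) / bits_per_symbol
```

At the same SNR, 8-PSK (3 bits per symbol) gives a higher bit-error probability than Q-PSK (2 bits per symbol), which is why the higher-rate modes are only attractive in the better channel states.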
5.3.2 When only Previous Observations are Known

In this section, we present a general framework for identifying the control law of the adaptive type-I hybrid ARQ system when perfect knowledge of the channel state information is not supplied to the transmitter. In this case the transmitter is supplied with the decoding results of previous transmissions in the form of ACK/NAK feedback. We call the previous actions and their corresponding observations the history. Since the channel state is only partially observable through the observations, the system state of the problem is also partially observable. The control of dynamic stochastic systems whose state cannot be observed perfectly falls under the theory of partially observable Markov decision processes (POMDPs). It is known from [159] that the optimal control policies of POMDP problems can be formulated as dependent on either the complete history or the belief state. Unfortunately, finding optimal policies is computationally intractable for all but very simple examples. Since the state space analyzed in this chapter can be very large because of a possibly large buffer length, we resort to finding an approximate control policy that depends only on a finite-length history. Let $Q$ be the length of the history that is tracked and let $L^n = \{u^{n-1}, \omega^{n-1}, \ldots, u^{n-Q}, \omega^{n-Q}\}$ denote the history record at time-slot $n$. Thus, the history state space, denoted by $\mathcal{L} = \{l_1, l_2, \ldots, l_Z\}$, has a total of $Z = (2 \times U)^Q$ states. It is intuitively clear that by letting the observation history grow to infinity we obtain the optimal policy for the problem at hand.

We also formulate the second problem as an average cost Markov decision process, as in Section 5.3.1. The system state space in this case comprises the buffer state and the history state; therefore, $\mathcal{S} = \mathcal{B} \times \mathcal{L} = \{(b_0, l_1), (b_0, l_2), \ldots, (b_B, l_{Z-1}), (b_B, l_Z)\}$ with a total of $S = (B + 1) \times Z$ states.
Our target is to minimize the average power, average delay and average overflow rate as given by (5.16), (5.17) and (5.18), respectively. We use the same set of actions, or in other words the same set of transmission modes, as in Section 5.3.1 to achieve the goals for this problem as well.

Costs Related to the Objectives

Since the power cost depends only on the action and the delay cost depends only on the buffer state, these costs are the same for the fully observable and the partially observable channel state cases and are given by (5.19) and (2.27), respectively.

Overflow occurs under the same buffer condition as in Section 5.3.1. The packet overflow rate from the buffer can be expressed as

$$G_O(s^n, u^n) = \sum_{i} \sum_{j} \varphi\left(\psi(s^n) + a_j - \omega_i \Psi(u^n),\, b_B\right) p(a_j)\, p(\omega_i \mid g(s^n), u^n), \tag{5.26}$$

where the function $g(s^n)$ returns the history state of the composite system state. The function $p(\omega_i \mid g(s^n), u^n)$ is the probability of observation $\omega_i$ for the given history $L^n = g(s^n)$ and action $u^n$ at time-slot $n$ and can be written as

$$p(\omega_i \mid L^n, u^n) = p(\omega_i \mid u^{n-1}, \omega^{n-1}, \ldots, u^{n-Q}, \omega^{n-Q}, u^n) \tag{5.27}$$
$$= \frac{p(\omega_i, \omega^{n-1}, \ldots, \omega^{n-Q} \mid u^n, u^{n-1}, \ldots, u^{n-Q})}{p(\omega^{n-1}, \omega^{n-2}, \ldots, \omega^{n-Q} \mid u^{n-1}, \ldots, u^{n-Q})}. \tag{5.28}$$

Conditional probabilities of the observation sequence given the action sequence can now be calculated by considering all possible combinations of the states of the underlying FSMC, i.e.,

$$p(\omega^n, \omega^{n-1}, \ldots, \omega^{n-Q} \mid u^n, u^{n-1}, \ldots, u^{n-Q}) = \sum_{c^n, \ldots, c^{n-Q}} p(c^n, \ldots, c^{n-Q}) \prod_{i=0}^{Q} p(\omega^{n-i} \mid c^{n-i}, u^{n-i}), \tag{5.29}$$

where $p(c^n, \ldots, c^{n-Q})$ denotes the joint probability of all the channel states that the system occupies in the time-slots from $(n - Q)$ to $n$, given by $p(c^n, \ldots, c^{n-Q}) = \phi_{c^{n-Q}} \times p_{c^{n-Q}, c^{n-Q+1}} \times \cdots \times p_{c^{n-1}, c^n}$.
(5.30) In the above expression, $\phi_{c^{n-Q}}$ denotes the stationary probability of channel state $c^{n-Q} \in \mathcal{C}$, and $p_{c_i, c_j}$, with $c_i = c^{n-Q}, c^{n-Q+1}, \ldots, c^{n-1}$ and $c_j = c^{n-Q+1}, c^{n-Q+2}, \ldots, c^n$, denotes the state transition probability given in Section 5.2.2. Note that the expression $p(\omega^n, \omega^{n-1}, \ldots, \omega^{n-Q} \mid u^n, u^{n-1}, \ldots, u^{n-Q})$ can also be calculated with the forward-backward method explained in [160]. This is because the observation process $\omega^n$ is in fact a version of a hidden Markov model (HMM) in which the conditional probabilities of observation $\omega^n$ given the underlying FSMC state $c^n$ also depend on the currently applied action $u^n$.

The choice of a separate buffer delay cost and buffer overflow cost in the MDP posed above deserves a more detailed explanation. Firstly, buffer overflows can always occur with non-zero probability in an ARQ system. Even with the highest transmission power and the underlying channel in its strongest state, there is a non-zero probability of a long succession of unsuccessful transmissions. Secondly, buffer overflow produces a different kind of transmission disruption than transmission delay, and the two costs cannot be considered jointly. The choice of the maximum allowable delay and buffer overflow rate depends very much on the application. Data transmission can tolerate substantial delays, while buffer overflows (and packet losses) have to occur with negligible probability. In contrast, real-time video and audio transmission might not tolerate high transmission delays, while buffer overflows that result in the loss of video or audio frames might be acceptable with some small probability.
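The ratio of observation-sequence probabilities used to obtain $p(\omega_i \mid L^n, u^n)$ can be evaluated without enumerating all channel-state sequences by a forward recursion over the FSMC. The sketch below is a minimal illustration of that idea rather than the exact procedure of [160]; the function name, the argument layout and the convention that every action in the history produced an ACK/NAK observation are assumptions.

```python
def nak_probability_given_history(phi, P, nak, history, action):
    """Probability of a NAK for `action` given a finite action/observation history.

    phi[c]    : stationary probability of channel state c
    P[c][d]   : channel transition probability from state c to state d
    nak[c][u] : NAK probability in channel state c under action u
    history   : list of (action, observed_nak) pairs, oldest first
    """
    C = len(phi)
    alpha = list(phi)  # distribution of the oldest tracked channel state
    for u, observed_nak in history:
        # Weight each channel state by the likelihood of the observation.
        alpha = [alpha[c] * (nak[c][u] if observed_nak else 1.0 - nak[c][u])
                 for c in range(C)]
        # Propagate one time-slot forward through the channel Markov chain.
        alpha = [sum(alpha[c] * P[c][d] for c in range(C)) for d in range(C)]
    joint_nak = sum(alpha[d] * nak[d][action] for d in range(C))
    return joint_nak / sum(alpha)  # numerator / denominator of the ratio
```

For a two-state channel with positive correlation between consecutive states, a NAK in the history raises the predicted NAK probability above its stationary value and an ACK lowers it, which is the information the history-based scheduler exploits.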
Transition Probability Matrix

The transition probability for the partially observable channel state case of the type-I hybrid ARQ system depends on the incoming traffic statistics, the current buffer occupancy and the feedback observations. We can express it as

$$p_{s^n, s^{n+1}}(u^n) = \sum_{i} \sum_{j} p(a_j)\, p(\omega_i \mid g(s^n), u^n)\, \delta\left(\psi(s^{n+1}) - \min\left(\psi(s^n) + a_j - \omega_i \Psi(u^n),\, b_B\right)\right). \tag{5.31}$$

Note that the same history is carried forward to the next time-slot if the no-transmission action $u_1$ is chosen in a particular time-slot.

5.4 Solution Techniques for the Scheduling Problems

For both cases, at each time-slot $n$, suppose the system occupies a state $s^n = s_i$ and the scheduler selects an action $u^n$ from the set of actions $\mathcal{U}_{s_i}$ available at state $s_i \in \mathcal{S}$, where $\bigcup_{s_i \in \mathcal{S}} \mathcal{U}_{s_i} = \mathcal{U}$. After an action $u^n$ is selected, the system moves to the next state $s_j \in \mathcal{S}$ according to the probability distribution $p_{s^n = s_i, s^{n+1} = s_j}(u^n)$ and the decision maker incurs a one-step immediate cost $G(s^n, u^n)$. The selection of an action $u^n$ may depend on the current state, the current time-slot and the available information about the history of the system. For each problem, a decision rule prescribes the procedure for action selection in each state at a specified time-slot. Let $\mu^n$ denote a decision rule at time-slot $n$; then $\mu^n : \mathcal{S} \mapsto \mathcal{U}_s$. A policy $\pi$ for either problem specifies the decision rule to be used at all time-slots, i.e., $\pi = \{\mu^1, \mu^2, \ldots, \mu^H\}$. For each case, let $\Pi$ denote the set of all admissible policies $\pi$. We assume that the policy does not vary with the time-slot $n$. Therefore, the policy is stationary (i.e., $\mu^n = \mu$, $\forall n \in \mathcal{T}$) and has the form $\pi = \{\mu, \mu, \ldots, \mu\}$; for brevity we denote it by $\mu$. The expected long-term average cost per stage under the stationary policy $\mu$ is

$$G^\mu = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G(s^n, \mu(s^n))\}. \tag{5.32}$$

When the Markov chain induced by the policy $\mu$ is ergodic, we have $G^\mu = E\{G(s, \mu(s))\}$.
The policy $\mu^*$ that minimizes the average cost per stage (5.32) over the set of all stationary policies $\Pi$ is called the optimal policy, and the corresponding optimal cost per stage is given by

$$G^{\mu^*} = \min_{\mu \in \Pi} G^\mu. \tag{5.33}$$

For a finite MDP, it is known that for any history-dependent policy there exists a Markov policy, dependent only on the current state, with the same average cost. It is therefore sufficient to restrict attention to Markov policies when seeking the optimal policy [145]. For non-constant incoming traffic, the formulated MDP is, in general, weakly communicating [132]. An MDP is said to be weakly communicating if there exists a closed set of states, with each state in that set accessible from every other state in that set under some deterministic stationary policy, plus a possibly empty set of states that is transient under every policy. If the state spaces are finite, the costs are bounded and the system is stationary (that is, the system equation, the cost per stage and the transition probabilities do not change from one stage to the next), the average cost per stage problem (5.33) can be solved using dynamic programming techniques.

5.4.1 Iterative Dynamic Programming based Approach

If we want to minimize transmitter power, the scheduler will send with a lower power level. This in turn increases the delay and the overflows, because the probability of success is reduced. Since the considered objectives conflict, in this chapter we investigate the trade-off between them. A problem with conflicting objectives can be solved in two ways, namely, by formulating it as an unconstrained MDP (UMDP) or as a constrained MDP (CMDP).
In the UMDP, we find the total immediate cost as a weighted combination of the different immediate costs, i.e., $G_T(s^n, \mu(s^n)) = G_P(s^n, \mu(s^n)) + \beta_1 G_D(s^n, \mu(s^n)) + \beta_2 G_O(s^n, \mu(s^n))$, where $\beta_1$ and $\beta_2$ are nonnegative constants, and formulate the problem as an average cost per stage problem as in (5.32). The optimal policy $\mu^*$ of such a weakly communicating UMDP can be computed for particular fixed values of the constants $\beta_1$ and $\beta_2$ using any iterative dynamic programming technique, such as the relative value iteration algorithm or the policy iteration algorithm [146].

5.4.2 Linear Programming based Approach

In the CMDP, one objective cost is minimized while the other objective costs (also called the constrained costs) are kept below given bounds. In the proposed framework, our objective is to minimize the average power cost while imposing upper bounds on the average delay cost and the average overflow rate cost. Mathematically, the stationary policy that solves the constrained optimization problem below is our optimal policy $\mu^*$:

$$\min_{\mu \in \Pi_R} G_P^{\mu}, \quad \text{subject to: } G_D^{\mu} \le D \text{ and } G_O^{\mu} \le O_F, \tag{5.34}$$

where $G_P^{\mu}$, $G_D^{\mu}$ and $G_O^{\mu}$ are the average power, average delay and average overflow rate costs of policy $\mu$, and the minimization is over the set of stationary randomized policies $\Pi_R$. The nonnegative constants $D$ and $O_F$ are the maximum allowable average delay in time-slots and the maximum allowable average overflow rate in packets/time-slot, respectively. The values of these two bounds depend on the application being considered. The CMDP problem formulated above can be solved using the equivalent linear programming (LP) methodology described in [145]. It can be shown that there is a one-to-one correspondence between the feasible (and the optimal) solutions of the LP and the feasible (and the optimal) solutions of the CMDP. The LP is feasible if and only if the CMDP is feasible [148].
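Returning to the unconstrained formulation of Section 5.4.1, the following sketch shows one common form of relative value iteration for an average cost MDP. It is an illustration under stated assumptions, not the implementation used for the results in this chapter: the array layout `P[u][s][t]`, the reference state `ref`, the stopping rule and the requirement that the induced chains be aperiodic are all assumptions, and `G[s][u]` stands for the weighted total cost $G_T$.

```python
def relative_value_iteration(P, G, ref=0, tol=1e-9, max_iter=10000):
    """Relative value iteration for an average-cost MDP.

    P[u][s][t] : probability of moving from state s to state t under action u
    G[s][u]    : immediate (weighted total) cost of action u in state s
    Returns (average cost, greedy policy, relative value function).
    """
    num_states, num_actions = len(G), len(P)
    h = [0.0] * num_states
    for _ in range(max_iter):
        q = [[G[s][u] + sum(P[u][s][t] * h[t] for t in range(num_states))
              for u in range(num_actions)] for s in range(num_states)]
        best = [min(row) for row in q]
        gain = best[ref]                      # current average-cost estimate
        new_h = [v - gain for v in best]      # keep h[ref] pinned at zero
        if max(abs(a - b) for a, b in zip(new_h, h)) < tol:
            policy = [row.index(min(row)) for row in q]
            return gain, policy, new_h
        h = new_h
    raise RuntimeError("relative value iteration did not converge")

# Tiny two-state, two-action example in which transitions ignore the action.
P_demo = [[[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]]
G_demo = [[1.0, 2.0], [3.0, 0.0]]
gain_demo, policy_demo, _ = relative_value_iteration(P_demo, G_demo)
```

In the demo each state is visited half of the time regardless of the action, so the cheapest action is chosen in each state and the average cost is $(1 + 0)/2 = 0.5$. Sweeping the weights $\beta_1$ and $\beta_2$ and re-running the iteration is one way to trace out a power-delay trade-off curve.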
Let $\nu(s, u)$ represent the "steady-state" probability that the process is in state $s$ and action $u$ is applied. We seek the control policy, which is represented in terms of the probability distribution $\nu$ over $\mathcal{S} \times \mathcal{U}$. The optimal distribution $\nu^*$ can be obtained by solving the linear program

$$\text{minimize} \sum_{s \in \mathcal{S}, u \in \mathcal{U}} G_P(s, u)\, \nu(s, u) \tag{5.35}$$
$$\text{subject to:} \sum_{s \in \mathcal{S}, u \in \mathcal{U}} G_D(s, u)\, \nu(s, u) \le D \tag{5.36}$$
$$\sum_{s \in \mathcal{S}, u \in \mathcal{U}} G_O(s, u)\, \nu(s, u) \le O_F \tag{5.37}$$
$$\sum_{u \in \mathcal{U}} \nu(t, u) = \sum_{s \in \mathcal{S}, u \in \mathcal{U}} \nu(s, u)\, p(t \mid s, u), \quad \forall t \in \mathcal{S} \tag{5.38}$$
$$\sum_{s \in \mathcal{S}, u \in \mathcal{U}} \nu(s, u) = 1 \tag{5.39}$$
$$\nu(s, u) \ge 0, \quad \forall s \in \mathcal{S},\ \forall u \in \mathcal{U}. \tag{5.40}$$

The inequality constraints (5.36) and (5.37) keep the long-term average delay and average overflow rate below the specified bounds, whereas the equality constraint (5.38) enforces the well-known Chapman-Kolmogorov equation. The constraint (5.39) guarantees that the probabilities $\nu(s, u)$ sum to one, while (5.40) ensures that the individual probabilities are nonnegative. Suppose there exists an optimal solution $\nu^*$ to the LP problem. Then there exists a stationary policy $\mu^*$ that is optimal for the CMDP problem. The optimal policy $\mu^*$ for the CMDP is randomized and is uniquely characterized by the probability $\epsilon_{\mu^*(s)}(u)$ of applying action $u \in \mathcal{U}$ in state $s \in \mathcal{S}$, where

$$\epsilon_{\mu^*(s)}(u) = \frac{\nu^*(s, u)}{\sum_{u' \in \mathcal{U}} \nu^*(s, u')}, \quad \text{if } \sum_{u' \in \mathcal{U}} \nu^*(s, u') > 0. \tag{5.41}$$

If $\sum_{u' \in \mathcal{U}} \nu^*(s, u') = 0$ for some $s \in \mathcal{S}$, an action that drives the system to $\mathcal{S}_R = \{s \in \mathcal{S} : \sum_{u' \in \mathcal{U}} \nu^*(s, u') > 0\}$ is chosen in each such state [145]. The above linear program can easily be solved using interior-point methods [144]. Using standard optimization software packages such as Matlab, a linear program with $10^4$ variables can be solved easily.

5.5 Simulation Results and Discussions

We present numerical results to explore the performance of different schemes for the system parameters introduced in Section 5.2. The performance analysis of the adaptive type-I hybrid ARQ system is given for both the flat-fading channel and the frequency-selective fading channel. We illustrate the qualitative behavior of the proposed scheduling framework using the costs associated with the objectives described in Section 5.3. We show the average power and the average overflow rate as functions of the average delay for different values of the Doppler frequency, buffer size, incoming traffic rate and frequency selectivity of the channel. We also compare the two considered cases, namely, when perfect CSI is available and when it is not. Unless specified otherwise, we use the following set of data for all numerical simulations: maximum Doppler frequency $f_m = 200$ Hz, number of channel states $C = 4$, number of blocks per second $R_B = 10^4$, noise power $\sigma_\nu^2 = 1$ mW, average received power gain $\bar{\alpha} = 1$ dB for each tap [10], available transmitter power levels $P_{t_1} = 20$ mW and $P_{t_2} = 50$ mW, Q-PSK and 8-PSK modulation schemes, buffer size $B = 50$ packets, Poisson-distributed traffic with maximum packet arrivals $A = 7$ packets/time-slot and average packet arrival rate $\lambda = 1$ packet/time-slot, incoming packet size $N_p = 255$ bits, and frame size $N_f = 255$ symbols. The bits for error detection and the control symbols are not counted in the throughput calculations (i.e., $N_c = 0$). Forward error correction (FEC) coding is used only in Fig. 5. From the above data, it is clear that the number of actions is $U = 5$. In the above example the dimensionality of the state space is $(B + 1) \times C = 51 \times 4 = 204$, and the optimal policy is easily computed using either relative value iteration or linear programming. The data rate in bits per second depends on three parameters: the number of blocks per second $R_B$, the incoming packet rate $\lambda$ in packets per time-slot and the number of bits per packet $N_p$. Therefore, the data rate is given by $R_B \lambda N_p$.
For the above parameters, the data rate is $10^4 \times 1 \times 255 = 2.55$ Mb/s. The optimal policies and costs are computed using the following steps: (i) computation of the transition probability matrices $\mathcal{P}$ using equation (5.25) and the cost matrices $\mathcal{G}$ using equations (5.19), (2.27) and (5.20), (ii) computation of the optimal probability distribution $\nu^*$ by solving the linear program given by (5.35)-(5.40), (iii) calculation of the optimal policy for all states using (5.41), and (iv) computation of the optimal power, delay and overflow rate costs.

Figure 5.2: The effect of normalized fading rate $f_m T_B$ on the average power vs. average delay for perfect CSI case over flat fading channels.

Figure 5.3: The effect of normalized fading rate $f_m T_B$ on the average overflow rate vs. average delay for perfect CSI case over flat fading channels.

Observation 1 (Study of fading rate on the average power and packet overflow over flat-fading channels when perfect CSI is known): We show the dependence of the average power and average packet overflow rate for three values of maximum Doppler frequency, namely, $f_m = 200, 300, 400$ Hz, in Figs. 5.2 and 5.3, respectively. In order to demonstrate the effects of the overflow rate bound, we have chosen $O_F = 5 \times 10^{-3}$ packets/time-slot for this example. Uniformly distributed packet arrival with $A = 2$ is assumed in this case. It is seen from the figures that the average power decreases as the average delay increases. Further, for the same average delay, the average power is lower for higher Doppler frequencies (and hence higher fading rates). This fact can be explained as follows: as the fading rate increases, the likelihood of getting a higher channel state after a lower channel state increases.
In order to save power, the optimal scheduler stores packets in the buffer in the lower channel states and sends them in the higher channel states with a lower power. However, in order to meet a specified delay requirement for a particular application, in slower fading channels the scheduler is forced to send packets even in the lower channel states with a higher transmission power action. As a result, the average power for the same delay requirement increases when the fading rate decreases. The simulations show that the packet overflow rate does not vary significantly with the fading rate (Fig. 5.3).

Observation 2 (Study of packet arrival rate on the average power and packet overflow over flat-fading channels when perfect CSI is known): Figs. 5.4 and 5.5 show the effect of different packet arrival rates on the average power vs. average delay and the average packet overflow rate vs. average delay curves, respectively. The overflow rate bound for this observation has been chosen to be $O_F = 5 \times 10^{-2}$ packets/time-slot. The scheduler needs more power when the average arrival rate increases, because with the increase of the packet arrival rate the scheduler gets less idle time (the time when the scheduler has nothing to send). Therefore, the scheduler needs to send packets even in a weaker channel state to be able to maintain the delay requirements. As the chance of a full buffer is lower when the average arrival rate is lower, the packet overflow rate is also lower for a lower arrival rate. Note that for the larger incoming packet rate of $\lambda = 1.5$ packets/time-slot, the overflow rate bound (5.37) with the average overflow rate of $O_F = 5 \times 10^{-2}$ packets/time-slot is attained for all feasible delays $D$. This is because a buffer of size $B = 50$ packets is too small to cope with very high incoming packet rates, rendering overflows inevitable.
Observation 3 (Study of buffer size on the average power and packet overflow over flat-fading channels when perfect CSI is known): In Figs. 5.6 and 5.7, we plot the power/delay tradeoff and the corresponding overflow rate vs. delay curves, respectively, for different buffer sizes $B$. The plots show that as the buffer size decreases, the minimum achievable power increases and the feasible delay region shrinks. In the low delay region, the power is almost the same for all considered buffer sizes $B = 30, 40, 50$ packets. This is due to the non-utilization of the whole buffer under more stringent delay requirements. The power is slightly lower for the smaller buffer due to the increased packet overflow rate. In the higher delay region, where the packet overflow rate attains the bound of $O_F = 4 \times 10^{-3}$ packets/time-slot, the power is higher for smaller buffer sizes. This is because, when the buffer size is larger, the scheduler has increased flexibility to store more packets in the lower channel states and send them with a lower power in the higher channel states. Therefore, to maintain the same delay, the scheduler has to use higher power as the buffer size decreases.

Figure 5.4: The effect of arrival packet rate $\lambda$ on the average power vs. average delay for perfect CSI case over flat fading channels.

Figure 5.5: The effect of arrival packet rate $\lambda$ on the average overflow rate vs. average delay for perfect CSI case over flat fading channels. (Curves for $\lambda = 0.5$, $1.0$ and $1.5$ packets/time-slot; horizontal axis: average delay in time-slots.)

Observation 4 (Study of the influence of channel coding on the power performance of ARQ scheme): Channel coding is used to increase error resilience of data bits sent over a noisy
channel by adding redundancy to the code words. We have explored the use of the (255,215) binary BCH code, which corrects up to 5 bit errors, to protect packets sent over the time-varying flat-fading channel [154]. To ensure a fair comparison with the case without coding, it is assumed that the bandwidth of the system is constant, making the code symbol rate in the coding case equal to the bit rate in the no-coding case. Therefore, by employing the above code each packet contains only 215 bits instead of 255 in the no-coding case. Furthermore, to ensure a fair comparison between the two cases, in the coding case we have increased the average number of incoming packets and the buffer occupancy by the factor of 255/215 to keep the average incoming traffic in bits and the buffer capacity in bits constant. Figs. 5.8 and 5.9 show the comparison of the power vs. delay and the corresponding overflow vs. delay performance of the hybrid ARQ system with and without coding. The buffer overflow rate was constrained to $O_F = 4 \times 10^{-2}$ packets/time-slot, with the buffer capacity equal to $255 \times 50$ bits and the Doppler frequency equal to 200 Hz.

Figure 5.6: The effect of buffer size $B$ on the average power vs. average delay for perfect CSI case over flat fading channels.

Figure 5.7: The effect of buffer size $B$ on the average overflow rate vs. average delay for perfect CSI case over flat fading channels.

Figure 5.8: Influence of channel coding on the power/delay performance of the optimal scheduler in the perfect CSI flat-fading channel. (Curves: no coding and BCH (255,215) code; horizontal axis: average delay in time-slots.)

It can be seen that for the lower values of the delay, the use of coding substantially reduces the operating average power of the optimal scheduler.
However, for very large delays the no-coding case needs slightly lower power than the coding case. As a result of adding redundancy in the coding case, less information is contained in each packet and less data can be sent during the longer intervals of favorable channel conditions. This is of particular importance under less stringent delay requirements. For the simulation settings outlined above, it has been observed that the packet success rate is very high for the higher channel states even without coding, so that coding brings no benefit in the higher channel states.

Figure 5.9: Influence of channel coding on the overflow/delay performance of the optimal scheduler in the perfect CSI flat-fading channel.

Figure 5.10: Influence of the channel model on the power vs. delay performance of the optimal scheduler in the perfect CSI flat and frequency-selective fading channel. (Curves: flat fading and frequency-selective fading, each for $f_m T_B = 0.02$ and $0.05$; horizontal axis: average delay in time-slots.)

Observation 5 (Study of the influence of frequency-selective fading on the power performance of ARQ scheme): Figs. 5.10 and 5.11 show the comparison of the average power and average overflow rate performance, respectively, in terms of the delay for the flat fading and the frequency-selective fading channels. The buffer capacity is assumed to be 255 packets, the buffer overflow rate is constrained to $O_F = 5 \times 10^{-2}$ packets/time-slot, and the frequency-selective
channel is modeled as a finite impulse response (FIR) filter with number of taps $J = 2$, non-causal part length $N_1 = 5$ and causal part length $N_2 = 5$. Each tap of the frequency-selective channel is assumed to follow the Rayleigh distribution. The total received energy is assumed to be constant to ensure a fair comparison between the flat-fading and the frequency-selective channel. Therefore, the average gain is $\alpha = 10$ for the flat-fading case, while the average gain for the frequency-selective case is $\alpha(0) = \alpha(1) = 5$ for each of the taps of the FIR filter describing the frequency-selective channel. It can be seen that, due to the combining of the two discernible paths of the frequency-selective channel performed by the MMSE receiver, the delays achieved by the optimal scheduler are significantly lower than for the flat-fading channel. Note that the equivalent channel gain after the MMSE combining receiver has less time variation than the channel gains of each of the taps of the frequency-selective channel prior to MMSE combining. This implies that the scheduler does not need to utilize the buffer (as much as in the flat fading case) and increase the delay in order to schedule transmission in a higher channel state.

Figure 5.11: Influence of the channel model on the overflow vs. delay performance of the optimal scheduler in the perfect CSI flat and frequency-selective fading channel. (Curves: flat fading and frequency-selective fading, each for $f_m T_B = 0.02$ and $0.05$; horizontal axis: average delay in time-slots.)

Observation 6 (Comparison of the power performance of hybrid ARQ schemes with history tracking and perfect CSI): Knowledge of CSI enables the scheduler to suitably adapt the transmitted power and the employed transmission rate to the time-varying channel conditions. In cases when CSI is not perfectly known, such as the case of history tracking presented in Section 5.3.2, the scheduler has to act more cautiously, since it is not certain of the precise channel state.
Figs. 5.12 and 5.13 demonstrate the increase (compared to the perfectly known CSI) in the needed average transmission power for the case of history tracking. It is assumed that the scheduler tracks the channel based only on the previous observation (ACK/NAK) and the previous action taken, i.e., a history length of $Q = 1$. In order to demonstrate the effects of the overflow rate bound we have chosen $O_F = 5 \times 10^{-2}$. Note that the optimal history-dependent policy is computed by solving the MDP problem as described in Section 5.3.2. The long-term average power, average delay and average overflow rate costs for both the perfect CSI case and the history tracking case are computed using (5.16), (5.17) and (5.18) of Section 5.3.1.

As expected, it can be observed in Fig. 5.12 that the power needed for transmission with history tracking is larger than in the case of perfect CSI. With the increase of the fading rate $f_m T_B$, the average powers for the same delay in both the perfect and imperfect CSI cases decrease. While that phenomenon has been explained for the perfect CSI case in Observation 1, the case of history tracking deserves additional explanation.

Figure 5.12: Influence of perfect and non-perfect CSI on the power vs. delay performance of the optimal scheduler in the flat-fading channel.

Two effects influence the power performance of the history tracking algorithm as the fading rate increases: (i) the increase of the diversity provided by the more frequent changes in the channel (as discussed in Observation 1), and (ii) the diminished ability to predict the outcome of the next transmission, due to the more frequent changes in the channel. The two effects have opposite influences on the power performance, and it can be seen that for the simulation settings in Fig. 5.12 the first
effect is dominant, facilitating the decrease of the needed power with the increased Doppler frequency.

Figure 5.13: Influence of perfect and non-perfect CSI on the overflow vs. delay performance of the optimal scheduler in the flat-fading channel. (Curves: perfect CSI and history tracking, each for $f_m T_B = 0.02$ and $0.05$; horizontal axis: average delay in time-slots.)

5.6 Conclusions

This chapter presents a general approach for optimal delay and packet overflow constrained adaptive joint power and rate allocation for type-I hybrid ARQ systems. The presented optimal adaptation laws have been obtained through a control-theoretic framework of Markov decision processes. We have discussed the adaptation strategy for two cases: when perfect knowledge of the channel state information is provided by the estimator at the receiver, and when perfect knowledge of the channel state information is not available and the channel state is estimated by the history tracking mechanism.

For both problems the state spaces, control actions, transition probabilities and cost functions of the respective MDPs have been identified. We have suggested several algorithmic approaches to solve such MDPs effectively, namely, the relative value iteration, policy iteration, and linear programming algorithms. We have explored the influence of the channel model (in terms of frequency selectivity) on the power performance of the optimal scheduling schemes. The transmitter power can be reduced by increasing the transmission delay. Also, power decreases with an increase of the fading rate, an increase of the buffer size, and a decrease of the average arrival rate. It has been seen that, due to the increased diversity provided by the MMSE filter receiver for frequency-selective channels, the variation of the SNR at the output of the filter is decreased. This results in a decreased delay for the same power as compared with the flat-fading case.
We have also investigated the case when perfect CSI is not available at the transmitter at the time of transmission, and compared the results with those of the perfect CSI case. The power allocation based on history tracking results in an increase in average power as compared with the perfectly observed case. It has been seen through the simulations that the packet scheduling for the hybrid ARQ scheme with no perfect CSI works better as the fading rate decreases.

CHAPTER 6
Scheduling with Partially Observable CSI

6.1 Introduction

So far we have assumed that the transmitter has perfect knowledge of the channel state information before attempting any transmission. This channel state information is estimated at the receiver and fed back to the transmitter over a noiseless channel. However, in many practical systems perfect knowledge of the channel cannot be available at the transmitter before transmission. Only the outcome of the previous transmission can be known. In this chapter, we assume that the transmitter does not have any knowledge of the channel state at the time of transmission. The only knowledge it has is the history of the previous actions taken and the results of decoding in terms of observation feedback. Utilizing this previous information on actions and observations, we explore a new control-theoretic adaptation scheme for type-I hybrid ARQ systems.

Adaptive HARQ schemes can broadly be classified into two categories. In the first category of schemes (e.g., type-II HARQ), information bits are transmitted with few or no parity bits. Incremental redundancy bits are transmitted on request if previous transmissions are not successfully decoded at the receiver. The receiver combines the transmitted and retransmitted bits to recover the information. The key idea of the second category of schemes is to vary the code rate of the FEC with the channel conditions. That is, the transmitter chooses a lower-rate code (more error protection) when the channel condition is worse, and chooses a higher-rate code (less error protection) to send more information bits when the channel condition is better. Each transmission is decoded independently and does not depend on the previous transmissions. In this chapter, we consider the second type of scheme, namely the adaptive type-I HARQ scheme.

Most authors in the recent literature propose to change the operation mode and correspondingly adapt the code rate based on the channel state information in some ad hoc fashion. In Chapter 5, we investigated the adaptation of transmission power and rate to the channel conditions and buffer occupancies for a type-I HARQ system. We formulated the problem as a Markov decision process (MDP) by truncating the history of actions and corresponding ACK/NAK feedback to several previous time-slots. In this chapter, however, we formulate the problem as a partially observable Markov decision process (POMDP). A brief summary of our contribution is presented below:

• We consider the adaptation of the coding rate of a type-I HARQ scheme, where the incoming traffic states and the time-varying fading channel states are hidden from the transmitter. We formulate the problem as a partially observable Markov decision process (POMDP) since the true state of the system is not known exactly.

• Unlike previous coding rate adaptation schemes in the literature, the proposed scheme does not require channel estimation by counting the number of NAKs. Instead, control actions are chosen based on the belief of the states. The calculation of the belief distribution is derived, and the belief is maintained by tracking the observations of the hidden states.

• In the formulated POMDP problem, the scheduler tries to maximize reward (e.g., throughput) and/or minimize cost (e.g., delay) in the face of noisy system state information.
Instead of considering FEC at the physical layer, and automatic repeat request (ARQ) and buffer delay at the link layer separately, we present a cross-layer formulation that combines these layers judiciously to maximize throughput and minimize buffer occupancy (and hence delay).

• The complexity of finding the optimal solution of the POMDP problem is discussed and the feasibility of three heuristics to find an approximate solution of the POMDP problem is explored. Extensive simulation results are given to compare the performance of the three proposed heuristic policies with the fully observable channel state and traffic state case, and to verify the performance of the proposed heuristic policies.

The chapter is organized as follows. Section 6.2 describes the different models adopted in the chapter, such as the system model, channel model, incoming traffic model and buffer model. The formulation of the problem as a POMDP and its optimal and heuristic policy solution techniques are discussed in Section 6.3. Section 6.4 provides performance results and discussions. Conclusions are presented in Section 6.5.

6.2 Model Description

In this chapter, we consider an adaptive type-I HARQ system as shown in Fig. 6.1, where a single wireless terminal with a finite transmission buffer is communicating over a time-varying wireless fading channel. The packets arrive from a higher-layer application and are stored in the buffer for transmission in subsequent time-slots. The actual state of the incoming traffic is unknown at the transmitter. However, the transmitter knows the number of packets that arrive at the buffer in a time-slot. At the beginning of a particular time-slot $n$, which is defined as the interval $[(n-1)T_B, nT_B)$, the controller decides the action to be applied. The control action taken at the start of a time-slot is continued until the end of that time-slot.
The choice of control action for the adaptive system depends on the estimated channel condition and the current buffer occupancy. Accordingly, the transmitter takes $w^n$ packets from the buffer and first encodes the corresponding $k = w^n N_p$ bits using a high-rate $(k', k)$ error detection code, where $k'$ is the number of bits in the codeword after adding cyclic redundancy check (CRC) bits to the packets. These encoded packets are subsequently encoded using an outer FEC code of rate $k'/n'$ into a codeword of length $n'$ bits to be transmitted in that particular time-slot. Clearly, the transmitter can send at most the number of packets that are currently stored in the buffer, even when the channel condition is good enough to send more packets.

In this chapter, we consider the adaptation of the coding rate for the type-I hybrid ARQ scheme. $U$ different $(n', k')$ FEC codes corresponding to $U$ actions are adopted for error correction. We use BCH and convolutional codes to evaluate the performance of the system for shorter and longer block size FEC codes, respectively. In a time-slot, a particular code rate is chosen according to the controller decision on the action. There is a one-to-one correspondence between the elements of the set of actions and the elements of the set of coding rates. The resulting codeword is modulated and transmitted over the fading channel. After being received at the receiver, the block is demodulated. The decoder first attempts to correct any errors in the received demodulated block, and then the decoded block is checked for error detection. If no errors are detected, the packets are delivered to the higher layer and an ACK is sent to the transmitter. Accordingly, the transmitter removes those packets from the buffer and updates the buffer occupancy for the next time-slot. Otherwise, the receiver discards the packets and sends a NAK requesting retransmission of the same packets.
The process is repeated until the packet is successfully received. We assume that the feedback channel is error free and that the ACK/NAK feedback is received without delay. This assumption could be at least approximately satisfied by using a fast feedback link with powerful error control for the feedback information.

6.2.1 Incoming Traffic and Buffer Model

Since the state of the incoming traffic is unknown at the transmitter, we model the incoming traffic as an HMM process (also sometimes called a Markov modulated process (MMP)). Let $\mathcal{F} = \{f_1, f_2, \ldots, f_F\}$ be the set of hidden states of the incoming traffic, with $p_{f_i,f_j}$ being the probability of transition from traffic state $f_i$ to traffic state $f_j$. Let $\mathcal{A} = \{a_0, a_1, \ldots, a_A\}$ be the set of all possible numbers of incoming packets (i.e., distinct observations), with $a_0$ corresponding to zero packet arrivals, and let $p(a_j|f_i)$ be the probability of $a_j$ packet arrivals conditioned on the traffic state being $f_i$. Let the number of incoming packets in time-slot $n$ be $a^n$. We assume that packets arriving in time-slot $n$ can only be transmitted in time-slot $n+1$ or later. The packets transmitted in a particular time-slot experience the same channel gain. It is also assumed that packet arrival is independent of the channel fading and noise processes.

Figure 6.1: Schematic of the adaptive type-I HARQ systems. (Blocks: higher-layer application, finite buffer of size $B$, error detection/correction encoder, modulator, fading channel, demodulator, error correction/detection decoder, coding rate controller; signals: buffer state $b^n$, action $u^n$, ACK/NAK feedback observation, incoming traffic observation $a^n$.)

The HMM traffic model is particularly useful for modeling bursty and correlated incoming traffic, e.g., compressed video traffic, file transfer, etc. [109].
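A Markov-modulated arrival process of the kind described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `simulate_mmp`, the list-of-lists layout of the transition matrix and of the conditional arrival distributions, and the numerical values in the usage lines are all assumptions made for the example and are not taken from the thesis.

```python
import random


def simulate_mmp(trans, arrival_pmf, n_slots, seed=0, start_state=0):
    """Sample packet arrivals from a hidden Markov (Markov-modulated) traffic source.

    trans[i][j]       : probability of moving from hidden traffic state i to j
    arrival_pmf[i][k] : probability of k packet arrivals while in hidden state i
    Returns only the observed arrival counts; the hidden state is not exposed.
    """
    rng = random.Random(seed)
    state = start_state
    arrivals = []
    for _ in range(n_slots):
        # Observation: number of arrivals drawn from the current state's pmf.
        counts = range(len(arrival_pmf[state]))
        arrivals.append(rng.choices(counts, weights=arrival_pmf[state])[0])
        # Hidden dynamics: move to the next traffic state.
        next_states = range(len(trans[state]))
        state = rng.choices(next_states, weights=trans[state])[0]
    return arrivals


# Hypothetical two-state bursty source: a "quiet" and a "busy" state.
TRANS = [[0.95, 0.05],
         [0.10, 0.90]]
ARRIVAL_PMF = [[0.9, 0.1, 0.0],   # quiet: mostly 0 packets per slot
               [0.1, 0.3, 0.6]]   # busy: mostly 2 packets per slot
sample = simulate_mmp(TRANS, ARRIVAL_PMF, n_slots=20)
print(sample)
```

Because the hidden state persists for many slots when the self-transition probabilities are close to one, the observed arrival counts come in bursts, which is the correlation property the model is meant to capture.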
An important special case of the MMP is the switched-batch Bernoulli process (SBBP), which is usually used to model discrete-time queues. In the SBBP, the arrival batch size is Bernoulli distributed and is modulated by an underlying two-state Markov chain. Let us denote the probability of $a_1$ packet arrivals in traffic states $f_1$ and $f_2$ by $p(a_1|f_1) = q_1$ and $p(a_1|f_2) = q_2$, respectively. The dwell times in states $f_i$, $i = 1, 2$, are geometrically distributed with mean $v_i = (1 - p_{f_i,f_j})/p_{f_i,f_j}$, $j \neq i$. Note that when the number of states of the MMP is one, it reduces to Bernoulli-distributed traffic (e.g., in [7]), where the probability of $a_1$ packet arrivals in a time-slot is $p(a_1|f_1) = q$ and is independent of previous packet arrivals.

The behavior of the buffer for the problem at hand is similar to that of the transmission buffer described in Chapter 5. Therefore, the buffer update equation is given by (5.1).

6.2.2 Hidden Markov Channel Model

We adopt the HMM to model the time-varying Rayleigh fading channel in this chapter. This model is general enough to capture various statistical properties (such as the autocorrelation function, level-crossing rates, etc.) [130] of a wide range of practical fading channels. An HMM is a probabilistic function of the states of a Markov chain and is a doubly embedded stochastic process with an underlying stochastic process that is not observable, but can only be observed through another stochastic process that produces the sequence of observations. There are two possible HMM-based approaches applicable to fading channel modeling for applications involving ARQ. In the first approach, the error processes over the fading channel are modeled as an HMM. This approach stems from the early work of Gilbert and Elliott [114], where hidden states correspond to "good" and "bad" states of the fading channel.
This approach was subsequently extended in [161] to packet-level transmission with error-correction coding. The second approach is to model the quantized fading power gain states of the channel as hidden states of an HMM. In this manuscript, we adopt the second approach as it provides a deeper physical understanding of the channel. Furthermore, this model can be readily extended to more general situations with diversity reception and space-time coding (cf. [108]).

An HMM channel is characterized by the following: a set of hidden states of the channel $\mathcal{C} = \{c_1, c_2, \ldots, c_C\}$, a state transition probability matrix $\mathcal{P} = [p_{c_i,c_j},\ c_1 \le c_i, c_j \le c_C]$, a set of distinct observations $\Omega = \{\text{ACK}, \text{NAK}\} = \{\omega_1, \omega_2\}$, an initial state probability vector $\pi = [p(c^0 = c_i),\ c_1 \le c_i \le c_C]$, and an observation probability matrix $\mathcal{P}_\omega = [p(\omega_i|c_j),\ c_1 \le c_j \le c_C,\ \omega_i \in \Omega]$, where $C$ is the number of HMM states. The HMM parameters can be estimated by fitting the model to simulation or experimental data (e.g., by minimizing the Kullback-Leibler divergence using the Baum-Welch iterative algorithm). We model the underlying Markov chain of the HMM as a finite-state Markov channel (FSMC), where the hidden states of the channel are represented by a finite number $C$ of non-overlapping received power gain states [117]. The upper and lower channel gain thresholds of all states and the transition probabilities of the underlying hidden Markov chain can be found from Section 5.2.2.

6.3 Packet Scheduling Techniques in Partially Observable Environments

The scheme described in Section 6.2 can be formulated as a POMDP problem since the exact state of the system is not known at the time of transmission. Although the exact buffer state may be known at the transmitter, both the channel state and the incoming traffic state are hidden at the transmitter.
Nevertheless, the observation of the previous time-slot is known after the feedback is received from the receiver and the incoming packets arrive at the buffer input. This makes the system state partially observable. The POMDP problem at hand can be described by the following elements: a set of time-slots $\mathcal{T} = \{1, 2, \ldots, H\}$ over which decisions have to be made, a set of system states $\mathcal{S}$, a set of actions $\mathcal{U}$, a set of transition probabilities $\mathcal{P}$, a set of observations $\mathcal{O}$, a set of observation probabilities $\mathcal{P}_\omega$ and a set of rewards $\mathcal{G}$.

Let $\mathcal{S} = \mathcal{B} \times \mathcal{C} \times \mathcal{F} = \{(b_0, c_1, f_1), (b_1, c_1, f_1), \ldots, (b_B, c_C, f_F)\} = \{s_1, s_2, \ldots, s_S\}$ denote the composite state space for our model, with a total number of $S = (B+1) \times C \times F$ states. The action space $\mathcal{U}_{s_i}$ is the set of possible choices of coding rates in state $s_i$. There is a one-to-one correspondence between action $u_i \in \mathcal{U} = \{u_1, u_2, \ldots, u_U\}$ and an error correction code $(n', k')$ of rate $k'/n'$, where $U$ denotes the number of actions. Action $u_i$ denotes the transmission of $w_i = \Psi(u_i)$ packets, where $\Psi : \mathcal{U} \mapsto \mathcal{W}$ and $\mathcal{W} = \{w_1, w_2, \ldots, w_U\}$ is the set of numbers of packets to be transmitted. After an action $u_i$ is chosen at time-slot $n$, the system moves from state $s_i \in \mathcal{S}$ to the next state $s_j \in \mathcal{S}$ according to the transition probability $p_{s^n = s_i,\, s^{n+1} = s_j}(u_i)$ and incurs a one-step immediate cost $G(s_i, u_i) = \sum_{j=1}^{S} p_{s_i,s_j}(u_i)\, g(s_i, u_i, s_j)$, where $g(s_i, u_i, s_j)$ is the immediate cost if the system moves to state $s_j$ when action $u_i$ is taken in state $s_i$. As a consequence of taking a particular action in a particular state, the system also receives an observation. Let $\mathcal{O} = \{o_{\omega_1,a_0}, o_{\omega_1,a_1}, \ldots, o_{\omega_1,a_A}, o_{\omega_2,a_0}, o_{\omega_2,a_1}, \ldots, o_{\omega_2,a_A}\}$ be the set of observations, where $o_{\omega_1,a_i}$ corresponds to the ACK feedback from the receiver and receiving $a_i$ incoming packets in the current time-slot.
Similarly, $o_{\omega_2,a_i}$ corresponds to the NAK feedback from the receiver and receiving $a_i$ incoming packets in the current time-slot. For convenience, let us consider that the variable $\omega_1$ has a numerical value of 1 and $\omega_2$ has a value of 0. Symbolically, we can write the transition probabilities, rewards and observation probabilities as $\mathcal{P} : \mathcal{S} \times \mathcal{U} \mapsto \Pi(\mathcal{S})$, $\mathcal{G} : \mathcal{S} \times \mathcal{U} \mapsto \mathbb{R}$ and $\mathcal{P}_\omega : \mathcal{S} \times \mathcal{U} \mapsto \Pi(\mathcal{O})$.

The immediate cost at time-slot $n$ for the problem at hand can be given as the signed weighted sum of two objectives, namely, maximizing the throughput and minimizing the buffer occupancy:

$$G_T(s^n, u^n) = -p(\omega_1|\chi(s^n), u^n)\,\Psi(u^n) + \theta\,\psi(s^n) \tag{6.1}$$

where $\theta$ is a nonnegative weighting factor that signifies the importance of achieving throughput at the cost of delay and vice versa. The first term on the right-hand side of (6.1) is the immediate throughput reward $G_{TH}(s^n, u^n) = p(\omega_1|\chi(s^n), u^n)\,\Psi(u^n)$, and the second term contains the immediate buffer occupancy cost $G_{BO} = \psi(s^n)$. It can be noted that the buffer occupancy cost is proportional to the buffer delay cost and therefore represents the cost due to buffer delay indirectly.

6.3.1 Expression for ACK Probability

We consider two families of FEC codes: BCH codes with shorter codeword lengths and convolutional codes with longer codeword lengths. Let us assume that the $(n', k')$ binary FEC block code is capable of correcting $t$ bit errors. Then, the ACK probability for a given channel state $c^n$ when action $u^n$ (and the corresponding BCH code of rate $k'/n'$) is taken can be written as

$$p(\omega_1|c^n, u^n) = \sum_{i=0}^{t} \binom{n'}{i} \left(P_e(c^n)\right)^{i} \left(1 - P_e(c^n)\right)^{n'-i} \tag{6.2}$$

where $P_e(c^n)$ is the bit-error probability for channel state $c^n$. If a convolutional code of rate $k'/n'$ and codeword length $n'$ is used, the ACK probability can be upper bounded by the union bound as

$$p(\omega_1|c^n, u^n) \le \left(1 - \sum_{d=d_{\text{free}}}^{\infty} a_d\, P_e(d, c^n)\right)^{n'} \tag{6.3}$$

where $d_{\text{free}}$ is the free distance of the employed convolutional code and $a_d$
is the weight spec-tra coefficients of the convolutional code (i.e., the number of paths with weight d). Here, we assume that the error detection code can detect all the remaining errors. Although any mod-ulation scheme can be used for transmitting the coded bits, we shall concentrate on BPSK modulation for simulation purposes. For BPSK modulation, the BER can be approximated by, Pe(cn) — |erfc (^\Ja^2P^, where Pt is the transmitted power and a2 is the variance of the Gaussian noise present in the channel. The average power gain for state Ci can be found as fQl afa(a)da a ^ ~ £ / „ ( • ) * . ( 6 ' 4 ) 6.3.2 Observation and State Transition Probabilities The observation probability conditioned on the state sn and action un is equal to p(oUi,a]\sn,un)=p(uJi\x(sn),un)p(aj\c(sn)), Vui etlmdaj € A (6.5) where function c(s) gives the incoming traffic hidden state of the composite state s. The state transition probability for all u{ € USi and s^, Sj € S can be written as Psusj{ui)= P«(*o,«(»i)P3c(»i).x(»i) Y J]p(aik(s»))p(wilx(si),'"») x 5 (ip(Sj) - min(tKsi) + * - bB)) • (6.6) 6.3.3 Problem Formulations In this chapter, our objective is to maximize throughput and minimize buffer delay. The ex-pected long-term average cost per time-slot with stationary policy p, is Assuming that the current system state is perfectly known, the objective is to find the optimal policy p,* over all stationary policies that minimizes the expected long-term average cost given C H A P T E R 6. S C H E D U L I N G W I T H P A R T I A L L Y O B S E R V A B L E C S I 1 5 2 T i m e s lo t n-1 T i m e s lo t n T i m e s lo t n+1 Figure 6.2: Graphical representation of POMDP belief dynamics. by (6.7). 
To solve this problem, we can use the Bellman equation, which for $s_i = s_1, s_2, \ldots, s_S$ is given by [146]

$$ \lambda + h(s_i) = \min_{u \in \mathcal{U}_{s_i}} \left[ G_T(s_i, u) + \sum_{j=1}^{S} p_{s_i, s_j}(u)\, h(s_j) \right] \tag{6.8} $$

where $\lambda$ is the optimal cost, and $h(s_i)$ has the interpretation of a differential cost (or relative value function) for each state $s_i$.

6.3.4 Transformation into Belief-State MDP

In a POMDP model, the state of the system is not known exactly, so the previous approach cannot be applied directly. However, based on the observations, a belief state space or information state space of the system can be formed. The belief state represents a sufficient statistic for the history of previous actions and observations, and the optimal actions can be chosen depending on that belief state. The belief state $z$ is defined as a probability distribution over all possible states given the history of actions and observations. The belief at time-slot $n$ can be given by

$$ z^n(s^n) = p(s^n \mid o^n, u^n, o^{n-1}, u^{n-1}, \ldots, o^0, u^0, s^0), \quad \forall s^n \in \mathcal{S}. \tag{6.9} $$

By maintaining the prior distribution $p(s^{n-1})$ over states $s^{n-1} \in \mathcal{S}$, the belief can be computed recursively using Bayes' rule. In the ARQ case, the observation available at the start of a particular time-slot consists of the number of incoming packets and the ACK/NAK for the action chosen in the previous system state, not for the action taken in the current system state. Therefore, we have to slightly modify the belief state update equation for the POMDP problem of [159] by introducing the notion of an initial belief $\tilde{z}(s)$ and an updated belief $\hat{z}(s)$, as shown in Fig. 6.2.
For the problem at hand, the initial belief $\tilde{z}^n$ is based on the updated belief $\hat{z}^{n-1}$ of the previous time-slot, and for a particular time-slot $n \in \mathcal{T}$ it can be written as

$$ \tilde{z}^n(s_j) = \theta_1 \sum_{i=1}^{S} p_{s_i, s_j}(u^{n-1})\, \hat{z}^{n-1}(s_i) \quad \text{for } j = 1, \ldots, S. \tag{6.10} $$

The control policy is found by a mapping $\mu^*(\tilde{z}^n)$ from the initial belief to actions. After getting the incoming packets and the ACK/NAK observation for the action taken, the belief of state $s_j$, $j = 1, 2, \ldots, S$, at time-slot $n$ can be updated as

$$ \hat{z}^n(s_j) = \theta_2\, p(o^n \mid u^n, s_j) \sum_{i=1}^{S} p_{s_i, s_j}(u^{n-1})\, \hat{z}^{n-1}(s_i). \tag{6.11} $$

In (6.10) and (6.11), $\theta_1$ and $\theta_2$ are normalizing constants that make the belief distribution sum to 1. The detailed algorithm for tracking the belief is given in Appendix C.

6.3.5 Optimal Algorithms

The concept of belief state is the basis for the operation of a POMDP. Whereas an MDP policy dictates an action for every completely observable physical state, a POMDP policy dictates an action for every possible belief state. It can be shown that solving a POMDP on a physical state is equivalent to solving an MDP on the corresponding belief state [159]. Since a POMDP can be considered as a belief state MDP [159], it may seem reasonable to apply a dynamic programming algorithm for an MDP directly (e.g., the value iteration algorithm) to find the optimal policy by computing a value function over the belief state space. Unfortunately, the belief space is continuous and lies in $\mathbb{R}^{S-1}$, which would mean applying the MDP algorithm to an uncountably infinite state space.

Although there are algorithms that can solve POMDPs exactly and are guaranteed to converge to an optimal solution, none of them is useful for solving our large-dimensional problem. Sondik/Monahan's enumeration algorithm and Cheng's linear support algorithm can be used for solving POMDP problems with 2 to 5 states, and the more recently developed witness, incremental pruning, and generalized incremental pruning algorithms have been used to solve problems with 10-20 states [162]. It has been shown that finding the optimal policy even for a simplified finite-horizon POMDP is PSPACE-complete or harder [163]. In complexity theory, the class PSPACE is the set of decision problems that can be solved using a polynomial amount of space (memory). A decision problem is PSPACE-complete if it is in PSPACE and every problem in PSPACE can be reduced to it in polynomial time. PSPACE-complete problems are considered to be harder than NP-complete problems. The general case of an infinite-horizon stochastic POMDP is EXPTIME-hard for Boolean rewards, and is known to be undecidable for general (but bounded) rewards [164]. The relation between the complexity classes can be expressed as NP $\subseteq$ PSPACE $\subseteq$ EXPTIME [163].

On the other hand, there are a number of heuristics for finding suboptimal policies that perform well in different real-world situations. In the sequel, we describe three such heuristics and apply them to the problem.

6.3.6 Maximum-Likelihood Policy Heuristic

The first policy heuristic is a simple maximum-likelihood policy heuristic (MLPH) [165], where at time-slot $n$ for a particular belief $\tilde{z}^n(s)$, the policy can be represented as

$$ \mu_{ML}(\tilde{z}^n) = \mu_{MDP}\left(\arg\max_{s} \tilde{z}^n(s)\right). \tag{6.12} $$

In this expression, $\mu_{MDP}(s_i)$ is the optimal policy¹ for state $s_i$ of the system and can be determined as

$$ \mu_{MDP}(s_i) \in \arg\min_{u \in \mathcal{U}_{s_i}} \left\{ G_T(s_i, u) + \sum_{j=1}^{S} p_{s_i, s_j}(u)\, h(s_j) \right\} \tag{6.13} $$

where $s_i = s_1, \ldots, s_S$, and $h(s_j)$, $s_j \in \mathcal{S}$, can be determined via the relative value iteration algorithm for the average cost per stage MDP using the Bellman equation (6.8). The maximum-likelihood policy given by (6.12) is stationary since the optimal policy of the underlying perfectly observable MDP is stationary.
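The belief tracking of (6.10) and (6.11) and the action selection of (6.12) can be illustrated with a minimal sketch. This is not the algorithm of Appendix C; the function names, the two-state transition matrix, the observation likelihoods and the policy labels below are hypothetical and chosen only for illustration.

```python
def predict_belief(prev_updated_belief, trans):
    """Initial belief as in (6.10): propagate the previous updated belief
    through trans[i][j] = p_{s_i,s_j}(u^{n-1}) and normalise."""
    n_states = len(prev_updated_belief)
    pred = [sum(prev_updated_belief[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)]
    total = sum(pred)
    return [x / total for x in pred]


def update_belief(initial_belief, obs_likelihood):
    """Updated belief as in (6.11): weight each state by
    obs_likelihood[j] = p(o^n | u^n, s_j) and normalise."""
    post = [l * b for l, b in zip(obs_likelihood, initial_belief)]
    total = sum(post)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [x / total for x in post]


def mlph_action(belief, mdp_policy):
    """MLPH as in (6.12): apply the MDP policy of the most likely state.
    Ties are broken by taking the first maximiser (an arbitrary choice)."""
    most_likely = max(range(len(belief)), key=lambda j: belief[j])
    return mdp_policy[most_likely]


# Hypothetical two-state example (state 0 = good channel, state 1 = bad channel).
trans = [[0.9, 0.1], [0.2, 0.8]]          # assumed transition matrix
mdp_policy = ["high_rate", "low_rate"]    # assumed MDP policy per state
initial = predict_belief([0.5, 0.5], trans)
updated = update_belief(initial, [0.9, 0.3])  # assumed ACK likelihoods
action = mlph_action(updated, mdp_policy)
```

The sketch keeps the full belief vector between time-slots and only collapses it to a single state when an action is needed.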
¹ Note that other good heuristic state-dependent policies can be used instead of $\mu_{MDP}$.

This heuristic works as follows. First, it finds the most probable state of the system from the belief. If two or more states are equally likely, it chooses one arbitrarily. Then the best policy for that particular state (as if it were the true present state of the system), as dictated by the dynamic programming algorithm for the MDP, is applied. A full belief is maintained during execution, but the scheduler uses the maximum-likelihood state for determining the next action (using the underlying MDP). This heuristic assumes that the system is in the most likely state and that future actions will be based upon the underlying system state. It is intuitively clear that this policy can only perform well when the most probable state of the system is much more probable than all the other states. As a general rule of thumb, this policy performs well when the entropy of the belief distribution is low.

6.3.7 Voting Policy Heuristic

The second policy heuristic is the voting policy heuristic (VPH) [166], which can be regarded as a smoother version of MLPH. A major problem of MLPH is that it neglects all but a single state when determining the action. It chooses the action that is optimal for the most likely state even when the system is more likely to be in one of several other states for which a different action is best. Instead of selecting a single state, VPH forms a probability distribution over the actions. As in the MLPH case, only the underlying MDP is solved during planning using the dynamic programming algorithm, while a full belief is maintained during execution. The policy for VPH at time-slot $n$ can be obtained as

$$ \mu_{V}(\tilde{z}^n) = \arg\max_{u \in \mathcal{U}} \sum_{s_i \in \mathcal{S}} \tilde{z}^n(s_i)\, \mathbb{1}\{\mu_{MDP}(s_i) = u\}. \tag{6.14} $$

This heuristic assumes that the belief state represents competing hypotheses on the state of the system, with the policy being the same for several competing hypotheses.
The VPH chooses the action that is most probable under these assumptions.

6.3.8 Q-MDP Policy Heuristic

The MLPH and VPH do not address the exploratory nature of the optimal POMDP policy, i.e., these heuristic policies are chosen only to minimize the average cost of the underlying MDP problem. However, the optimal POMDP policies are also information-gathering, that is, they are designed not only to minimize the average cost per stage but also to improve the knowledge of the system state by refining the belief state. We address here a simple information-gathering heuristic, sometimes called the Q-MDP heuristic (Q-MDPH) or the fully-observable-after-one-step heuristic [162], which computes one step of POMDP value iteration and then uses the MDP Q-function values for the future expected cost. For a particular belief, the Q-MDPH is given as

$$ \mu_{Q}(\tilde{z}^n) = \arg\min_{u \in \mathcal{U}} \sum_{s_i \in \mathcal{S}} \tilde{z}^n(s_i)\, Q^*(s_i, u) \tag{6.15} $$

where $Q^*(s_i, u)$ is the optimal differential cost corresponding to the state-action pair $(s_i, u)$. We give the complete algorithm for determining the optimal Q-function in Appendix D. It can easily be deduced that the Q-MDPH becomes the optimal control policy if the true state of the system becomes observable after a single step. The information-gathering aspect of this policy is that it takes into account the ambiguity of the state knowledge in the current step and chooses the actions more conservatively than the previous two heuristics. A drawback of this policy is that it may choose a suboptimal action if the ambiguity about the system state does not disappear after a single step.

6.4 Simulation Results and Discussions

In this section, we present simulation results that illustrate the performance of the three heuristics discussed in Section 6.3. We compare the performance of these heuristics, in terms of average throughput and average buffer occupancy, with that of the perfectly observable system state (POSS) case. We have used the average-cost per stage MDP to find the policy in the POSS case. For all the simulations, we use the following data: the horizon over which the Monte-Carlo simulation has been performed is $H = 10^6$ time-slots, the weighting factor is $\beta = 0$, the average channel gain is 1, the noise power is $\sigma^2 = 1$ mW and the block rate is $R_B = 10^4$ blocks per second. In all the simulations, we have assumed that the current buffer state is completely observable. In Figs. 6.3-6.9, the incoming traffic is assumed to be Bernoulli distributed with packet arrival probability $q$.

Figure 6.3: A comparison among different heuristics and perfect CSI in terms of throughput. The effect of fading rates is also shown. (x-axis: packet arrival probability, q.)

Figure 6.4: A comparison among different heuristics and perfect CSI in terms of buffer occupancy. The effect of fading rates is also shown.

Observation 1 (The effect of the normalized fading rate on the average throughput and average buffer occupancy): We show the dependence of the average throughput and average buffer occupancy on the maximum Doppler frequency, and hence on the normalized fading rate of the channel, for BCH codes in Figs. 6.3 and 6.4, respectively. For this experiment, we use fading rates $f_m T_B = 0.01, 0.1$. The curves are drawn for three actions that correspond to three different coding rates of a BCH code. As an example, we have used $u_1$ = no transmission, $u_2$ = (63,18,10) BCH code, $u_3$ = (63,36,5) BCH code. The number of hidden channel states is $C = 2$, the buffer length is $B = 30$ packets, the transmission power is $P_t = 0.8$ mW, the packet size is $N_p = 18$ bits, the block size is $n' = 63$ bits and the maximum number of packet arrivals in a time-slot is 1 packet.
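To make the action set above concrete, the BCH ACK probability of (6.2) can be evaluated for the two codes and the power settings just listed. The following is only a sketch: the SNR form inside `bpsk_ber` and the two channel gain values are assumptions made for illustration, not values taken from the simulations.

```python
from math import comb, erfc, sqrt


def bpsk_ber(gain, tx_power_mw, noise_power_mw):
    """BPSK bit-error probability, assuming SNR = gain * P_t / sigma^2."""
    snr = gain * tx_power_mw / noise_power_mw
    return 0.5 * erfc(sqrt(snr))


def bch_ack_probability(n, t, ber):
    """Equation (6.2): probability that at most t of the n coded bits are in error."""
    return sum(comb(n, i) * ber**i * (1.0 - ber)**(n - i) for i in range(t + 1))


# (n', k', t) triples of the two transmitting actions of Observation 1.
ACTIONS = {"u2": (63, 18, 10), "u3": (63, 36, 5)}
# Hypothetical average power gains of a bad and a good channel state.
HYPOTHETICAL_GAINS = {"bad": 0.3, "good": 1.5}

for state, gain in HYPOTHETICAL_GAINS.items():
    ber = bpsk_ber(gain, tx_power_mw=0.8, noise_power_mw=1.0)
    for name, (n, k, t) in ACTIONS.items():
        print(state, name, "rate", round(k / n, 3),
              "ACK probability", round(bch_ack_probability(n, t, ber), 4))
```

For a fixed bit-error probability, the lower-rate code (larger t) always has the higher ACK probability, which is the trade-off the scheduler exploits when it selects a coding rate.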
It is seen that both the throughput and the buffer occupancy increase with the packet arrival probability. We also investigate the performance of the three heuristics as compared with the POSS case. It is seen from the plots that when the fading rate is slow, all three heuristics perform almost identically and as well as the POSS case. As the fading rate increases, Q-MDPH outperforms the other two heuristics in the region of higher packet arrival probability. It can also be noticed that the throughput increases as the fading rate increases.

To explain the change of throughput with the fading rate, let us first consider the POSS case. In a bad channel state, the scheduler takes lower-rate actions with more redundancy bits, and thus more information packets accumulate in the buffer. When fading is slow, lower-rate actions are often applied several times in a row and, because the buffer is finite, this may result in packet overflows and a decrease of throughput. When the fading rate increases, the chance of reaching a better channel state soon after a bad channel state increases. Therefore, when the fading rate is higher, the channel moves quickly from the bad state to the good state and the scheduler can take higher-rate actions in good channel states. This reduces the continual accumulation of packets in the bad channel state, and hence reduces packet overflows and throughput loss. Therefore, the delay decreases as the fading rate increases.

In the non-perfectly observable case, the scheduler uses the channel information estimated through the belief state. For the analyzed cases, all policy heuristics can determine the actual channel state very well from the knowledge of the belief state, so their performance is also close to that of the POSS case. The loss in the throughput performance of the proposed heuristics compared with the POSS case may be even higher for a higher fading rate.
This is due to their uncertain knowledge of the CSI and the large belief state entropy (cf. Fig. 6.5). However, fading rates even higher than the ones considered in this section are of limited practical importance.

Note that a value of $\beta$ equal to zero means that the buffer occupancy is not taken into consideration. Although not shown in the figures, we have explored the influence of the weighting factor $\beta$ of equation (6.1) on the performance of the HARQ schemes. It was observed that increasing $\beta$, and consequently placing more importance on the buffer occupancy, does not change the throughput and buffer occupancy performance significantly. This effect can be explained by the fact that minimizing the buffer delay in an adaptive HARQ scheme also maximizes the throughput and vice versa.

In Fig. 6.5, we plot the belief state entropy as a function of time-slot for fading rates $f_m T_B = 0.01, 0.1$, where MLPH is considered and the number of channel states is $C = 4$. It can be observed that as the fading rate increases, the level of entropy also increases. It is remarkable that although the values of entropy for $f_m T_B = 0.1$ and $C = 4$ are close to the upper bound of the belief state entropy, MLPH performs very well compared with the POSS case.

Observation 2 (The effect of the total number of actions on the throughput): Fig. 6.6 gives the results for the following settings: number of channel states $C = 4$, transmission power $P_t = 5$ mW, normalized fading rate $f_m T_B = 0.01$, packet size $N_p = 60$ bits. We show the influence of different numbers of actions ($U = 3, 5$) on the throughput for the three heuristics and the POSS case. For $U = 3$, the actions are $u_1$ = no transmission, $u_2$ = (127,64,10) BCH code, $u_3$ = (127,120,1) BCH code, and for $U = 5$ the actions are $u_1$ = no transmission, $u_2$ = (255,63,30) BCH code, $u_3$ = (255,123,19) BCH code, $u_4$ = (255,187,9) BCH code, $u_5$ = (255,247,1) BCH code.
The length of the buffer is kept at $B = 30$ packets. The results show that the throughput increases as the total number of actions increases from 3 to 5. As the number of actions is increased, the system has more flexibility to deal with the time-varying nature of the channel, which improves its performance. The throughput increase is distinct for $q > 0.3$.

Figure 6.5: Belief state entropy variation as a function of time-slot for two normalized fading rates ($f_m T_B = 0.01$ and $f_m T_B = 0.1$, where $T_B = 10^{-4}$). (Panels: maximum-likelihood heuristic with $f_m = 100$ Hz and with $f_m = 1000$ Hz; x-axis: time slot.)

Figure 6.6: The effect of number of actions on throughput as a function of incoming traffic probability. The performance of different heuristics for different numbers of actions is also shown. (x-axis: packet arrival probability, q.)

Observation 3 (Study of the effect of maximum packet arrivals, number of channel partitions, and frame length on the throughput for convolutional codes): We explore the effect of maximum packet arrivals, channel partitioning, and frame length in Figs. 6.7, 6.8, and 6.9, respectively, for a rate-compatible punctured convolutional (RCPC) code. Since the PER of convolutional codes is based on the upper bound, all throughput results should be interpreted as lower bounds of the actual throughput. The higher-rate codes of the RCPC family are obtained from a rate-1/2 mother code by puncturing a successively greater number of code symbols. For all settings with convolutional codes, we simulate results using four actions that correspond to coding rates of 0, 1/2, 3/4, and 1. The rate 3/4 code is obtained by puncturing the rate-1/2 mother RCPC code with memory $m_b = 6$ and generator polynomial [133,171]. The free distances corresponding to rates 1/2 and 3/4 are 10 and 5, respectively. The first six weight spectrum coefficients for rates 1/2 and 3/4 are given respectively by (36 0 211 0 1404 0) and (8 31 160 892 4512 23307) [45,167]. We assume an incoming packet size of $N_p = 384$ bits.

Figure 6.7: Effect of the maximum number of packet arrivals on the HARQ throughput.

The effect of the maximum number of packet arrivals in a time-slot on the throughput per unit packet arrival is studied in Fig. 6.7 for a maximum of 1 packet/time-slot and of 2 packets/time-slot. The experimental setup for this observation is as follows: transmission power $P_t = 2$ mW, number of channel states $C = 6$, normalized fading rate $f_m T_B = 0.04$, buffer size $B = 50$ packets. Since the size of the buffer is finite, more incoming packets mean more overflows. Therefore, the throughput per unit packet arrival decreases as the maximum packet arrival increases.

Figure 6.8: Influence of different numbers of channel partitioning on the HARQ throughput. (Curves: POSS, MLPH, VPH and Q-MDPH for C = 4, C = 6 and C = 10; x-axis: packet arrival probability, q.)

In Fig. 6.8 we show the throughput performance for different numbers of channel states. It can be seen that all heuristics perform relatively close to the POSS case for any number of channel states. Note that a comparison between throughput performances for different numbers of channel states is of no merit; it is only fair to compare the relative performance of different schemes for the same number of channel states. This is a consequence of the fact that, for the same fading rate, increasing the number of channel states produces an effectively slower channel model, due to the imposed birth-death structure of the channel model explained in Section 6.2.2: to reach a particular gain level from another gain level, the channel needs more transitions when it has more states. Note that in this model only transitions to adjacent channel states are allowed, and adding more states effectively reduces the mixing rate of the channel states. We use the following settings for the simulations: transmission power $P_t = 2$ mW, buffer size $B = 50$ packets, maximum number of packet arrivals of 1 packet/time-slot, and normalized fading rate $f_m T_B = 0.04$. It is seen that the throughput is lower for a higher number of channel states. Since the throughput is mostly determined by the weaker states, and the PER in the weaker states increases as the channel has more states, this result is expected.

Fig. 6.9 shows the effect of the transmission frame size on the throughput and compares the adaptive case with the non-adaptive case. In this setting, we have kept the system bandwidth, symbol rate, traffic statistics and buffer length fixed. We have simulated results for three frame lengths $n' = 384, 768, 1152$. The duration of the block, and hence the maximum number of incoming packets and transmitted packets in a time-slot, in the second and third cases are two and three times those of the first case, respectively. The buffer size is taken as $B = 50$ packets. Since the time-slots are 2 and 3 times longer in the second and third cases than in the first case, the block rates for the second and third cases are $R_B = 5 \times 10^3$ blocks per second and $R_B = 10^4/3$ blocks per second, respectively. Other data for the curves are: transmission power $P_t = 0.8$ mW, number of channel states $C = 4$, Doppler frequency $f_m = 200$ Hz, maximum number of packet arrivals of 2 packets/time-slot for $n' = 384$.
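The scaling of the three frame-length cases can be checked with a short calculation. The sketch below assumes that the symbol rate is fixed, so that the slot duration grows in proportion to $n'$, and that $T_B = 1/R_B$; the function name and dictionary keys are hypothetical.

```python
def frame_timing(frame_bits, base_frame_bits=384, base_block_rate=1e4,
                 doppler_hz=200.0, base_max_arrivals=2):
    """Scale block rate, normalized fading rate and maximum arrivals
    with the frame length, relative to the n' = 384 case."""
    scale = frame_bits / base_frame_bits       # slot is `scale` times longer
    block_rate = base_block_rate / scale       # blocks per second
    return {
        "block_rate": block_rate,
        "normalized_fading_rate": doppler_hz / block_rate,  # f_m * T_B
        "max_arrivals": round(base_max_arrivals * scale),   # packets per slot
    }


for n_prime in (384, 768, 1152):
    print(n_prime, frame_timing(n_prime))
```

Under these assumptions a longer frame makes the channel appear faster per time-slot and lets more packets arrive between two scheduling decisions, which is the "effective traffic" effect discussed for Fig. 6.9.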
It can be seen from the figure that the throughput decreases for all heuristics as the frame size increases. Two effects adversely influence the throughput when the codeword length changes: the PER of the convolutional code² increases with the frame length, as can be seen from (6.3), and the effective traffic increases with the frame length. The latter effect also decreases the throughput, since the buffer length is fixed and the likelihood of buffer overflows between two scheduling decisions increases when the effective traffic increases. We plotted the non-adaptive case for a convolutional code with rate 1/2. Note that the throughput of the adaptive schemes is significantly higher than that of the non-adaptive schemes. In practice, longer codewords are used to reduce the burden on the feedback and the scheduler.

² This fact is reversed for turbo codes, where the increase in code length decreases the PER, due to the changing structure of the turbo code facilitated by the use of interleavers.

Figure 6.9: Effect of frame length on the HARQ throughput. (x-axis: packet arrival probability, q.)

Figure 6.10: Effect of incoming traffic burstiness on the HARQ throughput. (x-axis: packet arrival probability of state 1, q.)

Observation 4 (Study of the influence of SBBP traffic on the throughput and the buffer occupancy with different buffer lengths and traffic models): We consider incoming traffic distributed as an SBBP with two states $\mathcal{F} = \{f_1, f_2\}$ for these experiments. We give simulations for two incoming traffic models: $TS_1$ with $p_{f_1, f_2} = p_{f_2, f_1} = 0.01$ and $TS_2$ with $p_{f_1, f_2} = p_{f_2, f_1} = 0.1$. Therefore, the dwelling times of $TS_1$ and $TS_2$ are 99 and 9, respectively, for both traffic states.
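The dwelling times quoted above agree with the mean number of consecutive self-transitions of a two-state Markov chain, $(1 - p)/p$, where $p$ is the probability of leaving the state. Reading the figures 99 and 9 this way is an assumption; the sketch below simply checks the arithmetic and compares it with a seeded simulation. Function names are hypothetical.

```python
import random


def mean_self_transitions(p_leave):
    """Expected number of consecutive stays in a state that is left
    with probability p_leave at each transition: (1 - p) / p."""
    return (1.0 - p_leave) / p_leave


def simulated_self_transitions(p_leave, runs=20000, seed=0):
    """Monte-Carlo estimate of the same quantity with a fixed seed."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        stays = 0
        while rng.random() >= p_leave:  # stay with probability 1 - p_leave
            stays += 1
        total += stays
    return total / runs


for name, p in (("TS1", 0.01), ("TS2", 0.1)):
    print(name, mean_self_transitions(p), simulated_self_transitions(p))
```

If the time-slot in which the state is entered is counted as well, the mean sojourn becomes $1/p$, i.e. 100 and 10 time-slots for the two traffic models.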
We show the performance of the system with SBBP traffic in Figs. 6.10, 6.11, and 6.12, where the packet arrival probability of the first state is taken as $q_1 = q$ and that of the second state as $q_2 = 1 - q$. In this way, we explore the effect of the burstiness of the traffic (or the relative unbalance of packet arrivals) on the proposed scheduling schemes while keeping the average incoming traffic constant. The following data are used for all three figures: number of channel states $C = 6$, normalized Doppler frequency $f_m T_B = 0.04$, maximum number of packet arrivals of 2 packets/time-slot.

Figure 6.11: Influence of buffer size on the HARQ throughput. (x-axis: packet arrival probability, q.)

Figure 6.12: Effect of buffer size on the buffer occupancy. Difference between no adaptation and adaptation cases is also shown. (Curves: POSS, MLPH, VPH and no adaptation for B = 25 and B = 10; x-axis: packet arrival probability of state 1, q.)

In Fig. 6.10, we show the effect of the two traffic models on the throughput for buffer size $B = 50$ packets and transmission power $P_t = 0.8$ mW. It can be seen that the throughput decreases as the traffic states become more unbalanced. The throughput is also lower for a higher dwelling time. The Q-MDPH does not perform well when the dwelling time increases, whereas MLPH and VPH perform better in all cases.

The effect of buffer size on the throughput is shown in Fig. 6.11 for transmission power $P_t = 0.8$ mW and traffic model $TS_1$. It is seen that the throughput increases as the buffer length increases.
This result is expected, since overflow events are less frequent for a larger buffer and the scheduler has more flexibility to deal with the packets (that is, storing packets in worse channel states and sending them in better channel states).

In Fig. 6.12, we show the variation of the buffer occupancy for different buffer lengths using traffic model $TS_1$. The transmission power has been kept at $P_t = 2.0$ mW. For the adaptive case, it is seen that the buffer occupancy increases as the buffer length increases. This can be explained in the same way as for Fig. 6.11: as the buffer length increases, the buffer can store more packets in worse channel conditions and send them in good channel conditions. For the case of no adaptation, the buffer occupancy is higher when the traffic states are more unbalanced. This is expected, as the number of packets stored in this situation is larger than in the balanced case and the scheduler cannot take the opportunity of sending more packets in the good channel states. Therefore, the average buffer occupancy increases with increased unbalance of the traffic states.

6.5 Conclusions

We have shown that finding optimal policies for an adaptive type-I HARQ coded wireless system over a time-varying channel is inherently a POMDP problem. Since the optimal solution is infeasible, in this chapter we have explored the effectiveness of three approximate policy heuristic algorithms for the solution of the POMDP-based coding rate allocation problem. We have also compared the performance of these three heuristics with the fully observable channel state and no adaptation cases in terms of throughput and average buffer occupancy. Our results have been expressed as a function of input packet arrival probability, Doppler frequency, number of available actions, maximum packet arrivals, number of channel states, frame sizes, and buffer length as parameters.
The explored policies depend on the belief distribution of the channel and traffic states, which is maintained by tracking the ACK/NAK observation feedback from the receiver, the number of incoming packets, and the action taken at the transmitter. When the fading rate is slow, the performance of all the heuristics is almost the same and they pick the appropriate action in almost all cases, as if perfect CSI were known. However, the Q-MDP heuristic consistently outperforms the maximum-likelihood heuristic and the voting heuristic as the fading rate increases. It has been demonstrated that an increase in the number of available coding rates (i.e., actions), as well as an increase in the length of the buffer, leads to increased throughput. Finally, we have explored the influence of traffic burstiness on the throughput of the analyzed heuristics. Simulation results have shown that the throughput decreases as the mixing rate of the traffic states becomes slower, and that MLPH and VPH cope best with the slower-changing bursty traffic.

CHAPTER 7. ACM for Type-I HARQ Systems

7.1 Introduction

In Chapter 6, we explored coding rate adaptation for type-I hybrid ARQ systems over a partially observable Rayleigh fading channel. The policy heuristic based control laws were given for Markov modulated incoming traffic, and the SW-ARQ protocol was used at the data-link layer for retransmission of the erroneous packets detected at the receiver. In this chapter, we present a cross-layer adaptation scheme that adapts both the coding rate and the modulation scheme (ACM) to the buffer occupancy and the hidden channel state. We consider Poisson distributed traffic and the selective repeat ARQ protocol for retransmission of those packets that are in unrecoverable error.
ACM coupled with HARQ has been proposed in different wireless standards, such as 3GPP HSDPA, 3GPP2 1xEV-DO and IEEE 802.16 (WiMAX), for increased throughput and reliability over time-varying wireless channels. Among the three ARQ schemes, selective repeat ARQ (SR-ARQ) has been reported to show the best throughput performance [25]. Therefore, in this chapter, by employing SR-ARQ, we analyze the system performance to obtain an upper limit on the throughput that an ARQ protocol can achieve in practice. A brief summary of our contribution is presented below:

• We study a cross-layer scheduling problem employing ACM at the physical layer and SR-ARQ at the data-link layer by considering the dynamic nature of the problem, due to random channel gain and random packet arrivals. The wireless channel is correlated and assumed to have a Nakagami-m distribution. The underlying correlated wireless channel is modeled as a Nakagami-m fading finite-state Markov channel (FSMC) and its state is assumed to be hidden at the transmitter.

• We consider convolutionally coded M-ary QAM for transmissions, and cyclic redundancy check (CRC) for error detection. We present adaptation techniques that not only maximize throughput but also minimize delay, packet error rate (PER) and packet overflows. We take into consideration the fact that the channel SNR may be sufficiently good to transmit with the highest transmission mode, while the buffer may not have a packet at all to transmit, and we optimize the mentioned goals accordingly. This important fact has been ignored by most of the previous authors in the literature.

• We formulate the cross-layer optimization problem as a partially observable Markov decision process (POMDP) and give policy heuristic based control algorithms, since the optimal solution is not feasible. We give simulation results to show the performance of two heuristics in this scheme and compare the results with the case when the channel state is fully observable.

The remaining part of the chapter is organized as follows. In Section 7.2, we describe the system model, channel model, incoming traffic model and buffer model of our work. The notion of information state, the formulation of the problem as a POMDP, and its solution techniques are discussed in Section 7.3. We give simulation results in Section 7.4 and conclude in Section 7.5.

7.2 System Modeling

In this chapter, an adaptive type-I HARQ system communicating over an unknown channel with a single transmit antenna and a single receive antenna is considered. The system diagram is depicted in Fig. 7.1. The system consists of an ACM module at the physical layer and an ARQ module at the data-link layer. The processing unit at the data-link layer is a packet, which comprises multiple information bits. On the other hand, the processing unit at the physical layer is a frame, which is a collection of multiple transmitted symbols. The fading channel over which the system communicates is assumed to be correlated Nakagami-m.

7.2.1 Description of the System

Figure 7.1: Schematic of the HARQ system employing ACM and SR-ARQ over the partially observable Nakagami-m fading channel. (Blocks: higher-layer applications, finite buffer of size B, CRC encoder, convolutional coding encoder and modulator, correlated fading channel, demodulator and FEC decoder, CRC decoder, coding and modulation rate controller, SR-ARQ controller, observation feedback from the receiver.)

Let us assume that the system state is composite and determined by the buffer state and the channel state. Let $\mathcal{S} = \mathcal{B} \times \mathcal{C} = \{s_1, s_2, \ldots, s_S\}$ denote an arbitrary finite set of system states, where $\mathcal{B}$ and $\mathcal{C}$ are the buffer state space and the channel state space, respectively.
We assume that perfect CSI is not known at the transmitter. However, the decoding results of the previous transmissions are known in terms of observations. Note that ACK and NAK are the observations for the considered HARQ systems. We denote the observation state space by $\Omega = \{\omega_1, \omega_2\} = \{\mathrm{ACK}, \mathrm{NAK}\}$. In each time-slot $n$, the scheduler chooses a particular transmission mode $TM^n$, consisting of a specific coding and modulation pair (as in the HIPERLAN/2, IEEE 802.11a and 3GPP standards), from the set of transmission modes available for system state $s^n$. Since the scheduler does not know the current channel state perfectly, it estimates the system state from the knowledge of the previously chosen actions and the corresponding observations received. Let $\mathcal{U} = \{u_1, u_2, \ldots, u_U\}$ denote the available action set of the problem, where $U$ is the total number of actions. Each action $u^n$ corresponds one-to-one to a transmission mode $TM^n$. Let $w^n$ denote the number of packets taken from the buffer for transmission mode $TM^n$, and let $\mathcal{W} = \{w_1, w_2, \ldots, w_U\}$ denote the set of the numbers of transmitted packets. Coherent demodulation and maximum-likelihood decoding are used at the receiver. We assume that the CRC used for error detection can detect all the errors. The feedback channel is assumed to be error free.

Each data-link layer packet contains $N_p$ bits, which consist of the serial number, payload, and CRC bits. After coding and modulation with mode $TM^n$ of rate $R^n$ bits/symbol, each packet is mapped to a block containing $N_p / R^n$ symbols. $w^n$ such blocks, together with $N_c$ pilot symbols and control parts, constitute one frame to be transmitted at the physical layer. Therefore, the number of symbols per frame transmitted in a time-slot is $N_f = N_c + w^n N_p / R^n$.
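The frame-size expression can be checked numerically. The following Python sketch evaluates $N_f = N_c + w^n N_p / R^n$ for the transmission modes listed later in Section 7.4 (packet length, rates and packets per frame are taken from there). The pilot/control overhead `N_C = 40` symbols is purely an illustrative assumption, since the text does not give a value, and the function name is hypothetical.

```python
from fractions import Fraction

N_P = 1080  # bits per data-link packet (Section 7.4)
N_C = 40    # pilot/control symbols per frame: assumed value, not from the thesis

# mode index -> (rate R in bits/symbol, packets per frame w), from Section 7.4
MODES = {
    2: (Fraction(1, 2), 1),
    3: (Fraction(1), 2),
    4: (Fraction(3, 2), 3),
    5: (Fraction(3), 6),
    6: (Fraction(9, 2), 9),
}


def frame_symbols(rate, packets, n_p=N_P, n_c=N_C):
    """Symbols per frame: N_f = N_c + w * N_p / R."""
    return n_c + packets * Fraction(n_p) / rate


for mode, (rate, packets) in MODES.items():
    print(mode, frame_symbols(rate, packets))
```

With these mode values the data part of every frame occupies the same number of symbols, because the number of packets per frame scales with the rate; under the assumed overhead, all modes therefore give the same $N_f$.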
We assume that the packets arriving from a higher-layer application are random and bursty in nature, and describe them with the Poisson distribution given in (2.1). The packets are first stored in the transmission buffer and then transmitted in successive time-slots. The buffer follows a first-in first-out (FIFO) service strategy. The updated buffer occupancy under the SR-ARQ transmission protocol can be calculated from the decoding results and the incoming traffic using (4.12).

7.2.2 Channel Model

As mentioned earlier, we describe the underlying channel with the Nakagami-m distribution. We assume that the channel varies slowly and model it as a finite-state Markov channel. The probability density function of the received SNR $\gamma$ (which is proportional to the square of the received signal amplitude) for a Nakagami-m fading environment is a gamma distribution and is given by

$$f_\gamma(\gamma) = \frac{m^m \gamma^{m-1}}{\bar{\gamma}^m \, \Gamma(m)} \exp\left(-\frac{m\gamma}{\bar{\gamma}}\right), \quad \gamma \ge 0,$$ (7.1)

where $\bar{\gamma}$ is the average received SNR and $\Gamma(\cdot)$ is the gamma function. In this chapter, we again use the equal probability method for partitioning the channel. The stationary probability of channel state $c_i$, $i = 1, \ldots, C$, can be expressed as (2.6), where the cumulative distribution function of $\gamma$ is given by

$$F_\gamma(\gamma) = \frac{\gamma(m, m\gamma/\bar{\gamma})}{\Gamma(m)},$$ (7.2)

with $\gamma(\cdot,\cdot)$ denoting the lower incomplete gamma function. Since the channel is assumed to be slow, the first-order Markov chain can be described by a birth-and-death process, and the channel transition probabilities can be computed by (2.3), (2.4) and (2.7). The level crossing rate at the received SNR threshold $\gamma_j$, in either the positive or the negative direction, is given by

$$N(\gamma_j) = \sqrt{2\pi \frac{m\gamma_j}{\bar{\gamma}}} \, \frac{f_m}{\Gamma(m)} \left(\frac{m\gamma_j}{\bar{\gamma}}\right)^{m-1} \exp\left(-\frac{m\gamma_j}{\bar{\gamma}}\right),$$ (7.3)

where $f_m$ is the Doppler frequency.

7.3 Scheduling Techniques using ACM and HARQ

As in Chapter 6, we formulate the cross-layer coding and transmission rate adaptation problem as a POMDP, since the system state is uncertain at a particular time-slot. This is because, although the buffer state is known at the transmitter, the exact channel state at the time of transmission is unknown.

The evolution of our problem in the POMDP framework can be described as follows: at the start of a particular time-slot $n$, the scheduler takes an action $u^n \in \mathcal{U}$ depending on its knowledge of the estimated information state. As a consequence, it incurs costs $G(s^n, u^n) \in \mathcal{G}$ and gets an observation $\omega^n(s^n, u^n) \in \Omega$ with probability $p(\omega^n | s^n, u^n) \in \mathcal{O}_\omega$. The system also moves to a new state with probability $p_{s^n s^{n+1}}(u^n) \in \mathcal{P}$. The information state distribution is updated and the next time-slot begins.

The action $u^n$ taken at time-slot $n$ maps to a convolutionally coded M-QAM transmission. Exact closed-form BER and PER expressions for the coded modulations are not available. Therefore, as in [69] and the references therein, we rely on the following approximate PER expression for transmission mode $TM_i$:

$$P_{p_i}(\gamma) = \begin{cases} 1, & \text{if } 0 < \gamma < \gamma_{th_i}, \\ T_i \exp(-\theta_i \gamma), & \text{if } \gamma \ge \gamma_{th_i}, \end{cases}$$ (7.4)

where the parameters $T_i$, $\theta_i$ and $\gamma_{th_i}$ are mode-dependent (i.e., they depend on the weight distributions of the code). The values of these parameters for a particular convolutional code and modulation pair can be obtained by fitting (7.4) with the least squares method to the exact PER obtained through Monte-Carlo simulations. Let $P_p(c_j, u_i)$ denote the average packet error rate for channel state $c_j$ when the scheduler chooses action $u_i$ in a particular time-slot $n$. Then $P_p(c_j, u_i)$ can be obtained from (7.1), (2.6) and (7.4), and is given by

$$P_p(c_j, u_i) = \frac{1}{p(c_j)} \int_{\gamma_{j-1}}^{\gamma_j} P_{p_i}(\gamma) f_\gamma(\gamma) \, d\gamma = \frac{T_i}{p(c_j)\,\Gamma(m)} \left(\frac{m}{\bar{\gamma} d_i}\right)^m \left(\gamma(m, d_i \gamma_j) - \gamma(m, d_i \gamma_{j-1})\right),$$ (7.5)

where $p(c_j)$ is the stationary probability of channel state $c_j$ and $d_i = \frac{m}{\bar{\gamma}} + \theta_i$. When packet errors at the receiver are unrecoverable by the error correction code, a NAK is sent to the transmitter requesting retransmission of the same packet. Therefore, the NAK probability $p(\omega_2 | c_j, u_i)$ for channel state $c_j \in \mathcal{C}$ and action $u_i \in \mathcal{U}$ is given by (7.5). The ACK probability for the pair $(c_j, u_i)$ is simply $p(\omega_1 | c_j, u_i) = 1 - P_p(c_j, u_i)$.
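As an illustration of the PER model, the Python sketch below evaluates the piecewise approximation (7.4) with the fitted parameters quoted in Section 7.4, and averages it over one SNR interval of a Nakagami-m channel. It assumes that the exponent in (7.4) uses the SNR on a linear scale while the thresholds are quoted in dB, which is consistent with the quoted parameters. The average is obtained by simple numerical integration rather than the closed form (7.5), and the function names are hypothetical.

```python
import math

# mode -> (T_i, theta_i, gamma_th_i in dB), values quoted in Section 7.4
PER_PARAMS = {
    2: (274.7229, 7.9932, -1.5331),
    3: (90.2514, 3.4998, 1.0942),
    4: (67.6181, 1.6883, 3.9722),
    5: (53.3987, 0.3756, 10.2488),
    6: (35.3508, 0.09, 15.9784),
}


def per_approx(mode, snr_db):
    """Approximate instantaneous PER of (7.4) at a received SNR given in dB."""
    t_i, theta_i, gamma_th_db = PER_PARAMS[mode]
    if snr_db < gamma_th_db:
        return 1.0
    gamma = 10.0 ** (snr_db / 10.0)  # assumed: linear SNR in the exponent
    return min(1.0, t_i * math.exp(-theta_i * gamma))


def nakagami_snr_pdf(gamma, mean_snr, m):
    """Gamma pdf of the received SNR under Nakagami-m fading."""
    scale = (m / mean_snr) ** m / math.gamma(m)
    return scale * gamma ** (m - 1) * math.exp(-m * gamma / mean_snr)


def average_per(mode, lo, hi, mean_snr, m=1.0, steps=20000):
    """Average PER over the linear-scale SNR interval [lo, hi), midpoint rule."""
    h = (hi - lo) / steps
    num = den = 0.0
    for k in range(steps):
        g = lo + (k + 0.5) * h
        f = nakagami_snr_pdf(g, mean_snr, m)
        num += per_approx(mode, 10.0 * math.log10(g)) * f
        den += f
    return num / den
```

For example, `average_per(2, 0.8, 2.0, mean_snr=3.0)` gives the average PER of mode 2 over a channel state spanning linear SNRs 0.8 to 2.0; the interval boundaries here are arbitrary and would in practice come from the equal-probability partition.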
The probability of $w_{er}$ packets being in error among $w_i$ transmitted packets for the channel state-action pair $(c_j, u_i)$ can be found as

$$p(w_{er} | c_j, u_i) = \binom{w_i}{w_{er}} P_p(c_j, u_i)^{w_{er}} \left(1 - P_p(c_j, u_i)\right)^{w_i - w_{er}}.$$ (7.6)

Note that the packet error rate, and hence the NAK/ACK probabilities, do not depend on the buffer state. Therefore, for all $b^n$, the NAK probability for state-action pair $(s^n, u^n)$ can be written as $p(\omega_2 | s^n, u^n) = p(\omega_2 | c^n, u^n)$.

7.3.1 Costs

Maximizing spectral efficiency, as in [69] and the references therein, does not maximize the overall throughput for the dynamic scheduling problem at hand. This is because maximizing spectral efficiency means that the scheduler always chooses the highest-order modulation with rate $R_U$. In a weaker channel state, this increases the packet error rate and the number of retransmissions, and consequently decreases the effective throughput. The buffer may also hold fewer packets than are suitable for transmission with rate $R_U$. Therefore, both the channel state and the buffer state have to be taken into consideration when scheduling packets. In this chapter, our objective is to maximize the average throughput, and to minimize the average delay, packet error rate and packet overflows. Note that maximizing the effective throughput automatically minimizes the packet error rate and packet overflows. Therefore, it is sufficient to consider maximization of throughput and minimization of delay.

In our scheme, SR-ARQ is used to handle packet retransmissions. That is, only the erroneous packets in a particular frame are retransmitted. The effective immediate throughput reward resulting from taking action $u^n$ in state $s^n$ is given by (7.7), where $p(w_{sc} | s^n, u^n)$ is the probability of receiving $w_{sc} = w^n - w_{er}$ packets successfully (without any error), given that $w^n$ packets have been transmitted for state-action pair $(s^n, u^n)$. The immediate buffer delay cost for buffer occupancy $b^n$, using Little's theorem, can be written as (2.27).

7.3.2 Transition Probability

The state transition probability for all $u_i \in \mathcal{U}_{s_i}$ and $s_i, s_j \in \mathcal{S}$ can be written as

$$p_{s_i s_j}(u_i) = \sum_{w_{sc}=0}^{w_i} \sum_{a_l=0}^{A} p(a_l) \, p(w_{sc} | s_i, u_i) \, \delta\left(\psi(s_j) - \min(\psi(s_i) + a_l - w_{sc}, \, b_B)\right).$$ (7.8)

7.3.3 Solution Methodology

As discussed in the last chapter, the optimal solution of the formulated POMDP problem is impractical to determine. Therefore, we apply a maximum-likelihood policy heuristic and a voting policy heuristic to obtain approximate solutions. The details of these two heuristics are discussed in Sections 6.3.6 and 6.3.7, respectively. The immediate cost for the underlying MDP is given by the Lagrangian sum of the immediate throughput and the immediate delay as

$$G_T(s^n, \mu(s^n)) = -G_{TH}(s^n, \mu(s^n)) + \beta_1 G_D(s^n, \mu(s^n)),$$ (7.9)

where $\beta_1$ is a nonnegative constant that signifies the relative importance of delay compared to throughput. In general, the problem at hand forms a weakly communicating MDP, because some policies may be multichain, yet there exist policies under which the transition probability matrix is unichain [132]. As mentioned earlier, for any unichain policy the average cost is constant, that is, it is the same for all the states. For weakly communicating models, the optimal cost is constant. Therefore, the relative value iteration algorithm for the unichain model can be applied directly to the weakly communicating model. The Bellman optimality equation for the underlying MDP is given by (6.8).

7.4 Simulation Results and Discussions

We present simulation results for the HARQ scheme using SR-ARQ and ACM in this section. The transmission modes are adopted from the HIPERLAN/2 standard (although any transmission mode can be fitted into our general formulation). With packet length $N_p = 1080$ bits, the PER approximation parameters of (7.4) have been found by fitting curves to the simulated PER [69]. The generator polynomial of the mother code is [133, 171]. The coding rates are obtained from the puncturing pattern P2 in the HIPERLAN/2 standard. The modulation format, coding rate, transmission rate (in bits/symbol), $T_i$, $\theta_i$ and $\gamma_{th_i}$ (in dB) for transmission modes 2, 3, ..., 6 are as follows:

Mode 2: (BPSK, 0.5, 0.5, 274.7229, 7.9932, -1.5331)
Mode 3: (QPSK, 0.5, 1, 90.2514, 3.4998, 1.0942)
Mode 4: (QPSK, 0.75, 1.5, 67.6181, 1.6883, 3.9722)
Mode 5: (16-QAM, 0.75, 3, 53.3987, 0.3756, 10.2488)
Mode 6: (64-QAM, 0.75, 4.5, 35.3508, 0.09, 15.9784)

No packets are transmitted in transmission mode 1. The numbers of packets transmitted in transmission modes 2, 3, 4, 5 and 6 are 1, 2, 3, 6 and 9, respectively. Unless otherwise specified, the other simulation data are as follows: buffer size $B = 50$ packets, Doppler frequency $f_m = 100$ Hz, number of channel states $C = 4$, Nakagami parameter $m = 1$, blocks per second $R_B = 10^4$, simulation length $10^5$ time-slots, average packet arrival rate $\lambda$ = A packets/time-slot, maximum packet arrival $A = 15$ packets/time-slot, and weighting factor $\beta_1 = 1$. All the curves are drawn as a function of the average received SNR.

Figure 7.2: Variation of average throughput with average received SNR for different Nakagami severity parameters m and different policy heuristics.

In Fig. 7.2, the variation of the average effective throughput is shown as a function of the average received SNR for two values of m. It is seen from the plot that the throughput increases with the average received SNR. This result is expected, since the packet error rate decreases as the average SNR increases. Since the channel gain fluctuates less for a higher value of m, the throughput is also higher for a higher value of m.
It is seen that the performance of the two policy heuristics is almost the same and close to that of the fully observable channel state (FOCS) case.

The effect of the number of channel partitions is shown in Fig. 7.3. Although the throughput differs among the three channel partitioning schemes, it does not change significantly with the number of channel partitions. As the fading rate increases, fewer packets accumulate during a bad channel state, because the channel stays there for a shorter time. Hence, the buffer overflow is smaller and consequently the throughput is higher. This fact is shown in Fig. 7.4. The effect of a larger buffer size is shown in Fig. 7.5. It can be seen that by increasing the buffer size from 50 to 100, the throughput can be increased slightly. This is because a buffer with more capacity can store more packets during the bad channel state, which in turn gives a higher throughput. However, the buffer size should be chosen carefully so that the buffer is not underutilized. We have also evaluated the effect of the number of actions. It has been seen that action $u_6$ is unnecessary for the considered traffic and that the first five actions yield the same performance. Hence, it can be concluded that increasing the number of actions does not always increase the throughput. The action set has to be chosen according to the traffic statistics. Our simulation also shows that the weighting factor $\beta_1$ does not have any effect on the throughput performance.

Figure 7.3: Influence of channel partitioning on the average throughput.

Figure 7.4: Effect of fading rate on the average throughput for different heuristics. (Horizontal axis: average received SNR in dB.)

7.5 Conclusions

We have analyzed the cross-layer optimization problem for HARQ systems with ACM when the state is unknown, due to the unavailability of perfect CSI at the transmitter.
We have formulated the problem as a POMDP, in which we have adapted the coding and modulation rate depending on the buffer state and the channel state so that the effective goodput is maximized, and the delay, packet error rate and packet overflows are minimized. It has been seen through simulations that the two heuristics perform as well as the fully observable channel state case. Also, the consideration of buffer dynamics along with channel fading has been shown to be important for adapting the transmission rate. For a particular application, the incoming traffic statistics should be considered to determine the proper action set.

Figure 7.5: Effect of buffer size on the average throughput for different heuristics.

CHAPTER 8. Scheduling Techniques for IR-HARQ Systems

8.1 Introduction

Incremental redundancy hybrid automatic repeat request (IR-HARQ) is provisioned as a part of the EDGE standard and is also proposed as a part of 3G evolution cellular system standards, such as W-CDMA high-speed downlink packet access (HSDPA) and CDMA2000 1x Evolution-Data Optimized (EVDO), for high-speed reliable packet data communications. IR-HARQ employs a forward error correction (FEC) technique in the physical layer as well as an ARQ technique in the data-link layer to cope with time-varying fading channels and to guarantee both high reliability and high throughput. In IR-HARQ schemes, information packets are first transmitted with no or few parity bits for error detection and correction. Incremental redundancy bits are transmitted upon a retransmission request. The receiver combines the transmitted and retransmitted bits to form a more powerful error correction code to recover the information. So far we have studied cross-layer adaptation schemes for type-I HARQ systems.
In this chapter we study the rate and power adaptation issues for type-II incremental redundancy HARQ systems. We adapt the transmission power and modulation order of an IR-HARQ system based on both the channel state and the buffer state to minimize three quantities: transmission power, buffer delay and packet overflow. Because of the incorporation of a finite buffer, the calculation of delay and overflow is dynamic in nature, and we cannot resort to a static optimization problem to find the optimal control policy. The three objectives are conflicting, and therefore the ensuing problem can be formalized as the minimization of the average transmission power under constraints on the average buffering delay and the average packet overflow. Furthermore, we assume that the control actions in the IR-HARQ problem are made at the beginning of the transmission and kept unchanged during the possible retransmissions. Since the optimization criterion is dynamic in nature, the problem falls under the purview of stochastic dynamic programming methods. Further, due to the stochastic nature of the duration of successive decision-epochs and the dependence of the costs on the decision-epoch duration, the optimization problem is formulated as a semi-Markov decision process (SMDP) problem. The optimal solution of the formulated cross-layer adaptation problem is found by converting it into an equivalent auxiliary discrete-time Markov decision process (DT-MDP) problem and utilizing linear programming (LP) methods.
To the best of our knowledge, this is the first work that analyzes an SMDP-based cross-layer adaptation law under latency and overflow constraints for IR-HARQ systems.¹ We briefly summarize the contributions of this chapter below:

• We present a general framework for making scheduling decisions for the rate and power adaptive IR-HARQ transmission scheme, with the objectives of minimizing transmission power, buffering delay, and packet overflow.
• We propose the SMDP-based model, motivated by the inherent dynamic nature of the problem, for obtaining optimal control laws for IR-HARQ systems. This framework provides a new way to compute the optimal power and rate allocation policy considering cross-layer optimization goals.
• We propose and discuss how to choose and calculate the costs for the transmission parameters, and show how the SMDP framework can be translated into the MDP framework in order to calculate the optimal policy efficiently. The proposed framework can be applied to any system that uses a hybrid ARQ method utilizing either incremental redundancy or packet combining techniques at the receiver.
• We discuss three different adaptation models for the IR-HARQ schemes and compare their performance with one another and with the non-adaptive scheme. The convexity structure of the adaptation problem is analyzed.

¹ The methodology described in this chapter can equally be applied to any HARQ scheme with packet (e.g., Chase) combining at the receiver.

The remaining part of the chapter is organized as follows. In Section 8.2, we describe the system model, including the incoming traffic, buffer and channel models used in the chapter. The observation probability for the scheme being considered is also described. We explain the formulation of the problem as an SMDP and three adaptation models in Section 8.3. In Section 8.4, the CMDP formulation of the equivalent DT-MDP and its solution techniques are given.
The non-increasing and convexity properties of the average power with respect to the average delay and overflow constraints are also discussed in that section. We give simulation results in Section 8.5 to show the performance of all adaptation models and conclude the chapter in Section 8.6.

8.2 IR-HARQ Modeling

We consider a type-II HARQ system using an RCPC code with a single-transmit, single-receive antenna, as shown in Fig. 8.1. The transmitter is equipped with a finite buffer that can accommodate a maximum of $B$ packets. A discrete-time representation of the relevant variables is adopted in this chapter as well. The discrete duration of a transmission or retransmission, the decoding of the received packets and the observation feedback together constitute a time-slot (also referred to as a block). We represent the duration of each discrete time-slot by $T_B$. In general, the transmission/retransmission interval is smaller than the time-slot. However, for simplicity of notation, unless otherwise specified, it will be assumed that the transmission/retransmission interval is equal to $T_B$. Thus, time-slot $k$ is the interval between time $t^k$ and time $t^{k+1}$, as shown in Fig. 8.1. Unless otherwise specified, we use superscript $k$ to denote the value of a particular variable at the $k$th time-slot. We assume that the channel condition is known perfectly at the transmitter through a noiseless feedback channel.

8.2.1 System Model

Figure 8.1: (a) System diagram of the incremental redundancy hybrid ARQ system and (b) typical sample path for an SMDP. (Blocks shown in (a): higher-layer application, finite buffer, CRC encoder, RCPC encoder consisting of a low-rate 1/N convolutional encoder and a puncturing matrix, modulator, correlated fading channel, demodulator, RCPC and CRC decoder, rate and power controller, perfect channel estimator, noiseless feedback of CSI, and observation (ACK/NAK) feedback.)

Let $a^k$ denote the number of incoming packets at the buffer in time-slot $k$. We assume that the incoming traffic is non-constant, and is independent and identically distributed (IID), with $p(a_i)$ being the probability of $a_i$ packet arrivals. In particular, for Poisson-distributed traffic, the probability of $a_i$ packet arrivals in time-slot $k$ is given by

$$p(a^k = a_i) = \exp(-\lambda T_B) \frac{(\lambda T_B)^{a_i}}{a_i!}, \quad a_i \in \{0, 1, \ldots, A\},$$

where $A$ is the maximum number of packet arrivals in a time-slot and $p(a^k = a_A) \to 0$. Therefore, the state space of the incoming traffic can be expressed as $\mathcal{A} = \{a_0, a_1, \ldots, a_A\}$.

Assume that at the start of decision-epoch $n \in \mathbb{Z}^* = \{0, 1, \ldots\}$, the scheduler chooses a particular action $u^n$ to transmit $w^n$ packets from the buffer. The buffer occupancy $b^n$ and the channel condition $c^n$ in decision-epoch $n$ determine the choice of action $u^n$. Each decision-epoch consists of 1 up to a maximum of $R + 1$ time-slots, where $R$ is the maximum number of retransmissions. The duration of the decision-epoch is a random variable and depends on the decoding result. The control action is taken at the start of a decision-epoch and the same action is continued until the end of that decision-epoch. Each control action corresponds to a specific modulation constellation and transmitter power level. We denote the decision-epoch by the superscript $n$ to distinguish it from the time-slot (which is denoted by superscript $k$ in this chapter). Let $\mathcal{B} = \{b_0, b_1, \ldots, b_B\}$ denote the buffer state space in terms of packet occupancy, where $b_i$ corresponds to $i \in \{0, 1, \ldots, B\}$ packets in the buffer.
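The truncated Poisson arrival model can be tabulated directly. The Python sketch below builds the arrival probabilities over $\{0, 1, \ldots, A\}$; lumping the residual tail mass beyond $A$ into the last state so that the probabilities sum to one is an assumption of this sketch (the text only states that the probability of $A$ arrivals is negligible). The function name and the example values are hypothetical.

```python
import math


def truncated_poisson_pmf(mean_per_slot, max_arrivals):
    """Arrival pmf over {0, ..., A} for a Poisson mean of lambda * T_B per slot.

    The probability mass beyond A is added to the last state (assumption).
    """
    pmf = [
        math.exp(-mean_per_slot) * mean_per_slot ** a / math.factorial(a)
        for a in range(max_arrivals + 1)
    ]
    pmf[-1] += max(0.0, 1.0 - sum(pmf))
    return pmf


# Illustrative values only: mean of 2 packets per time-slot, A = 15.
pmf = truncated_poisson_pmf(mean_per_slot=2.0, max_arrivals=15)
print(len(pmf), sum(pmf))
```

Choosing $A$ well above the mean keeps the lumped tail mass negligible, which matches the requirement that the probability of $A$ arrivals tends to zero.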
Note that no transmission is possible when the buffer has an insufficient number of packets for transmission. That is, the transmitter is in idle mode when $b^{n+1} < w$, where $w$ is the least number of packets that can be transmitted with any action. The buffer dynamics, which give the number of packets in the buffer at the start of a particular decision-epoch, are given by

$$b^{n+1} = b^n - \eta w^n + a^k + \cdots + a^{k+r},$$ (8.1)

where the multiplier $\eta$ has a value of 1 for positive acknowledgment and 0 for negative acknowledgment feedback in the last retransmission. In (8.1), $a^n = a^k$ is assumed, and $r$ is the random variable that characterizes the number of retransmissions. Note that the ACK and NAK feedback from the receiver, which reflects the decoding result, constitutes the observation for the IR-HARQ scheme.

Let $C_1 > C_2 > \cdots > C_K$ denote the $K$ rates offered by a family of RCPC codes, which are obtained from the best low-rate code $C_K = 1/N$ (e.g., 1/2 or 1/3). Parity check bits $m_{crc}$ for error detection, and tail bits $m_{tb}$ to properly terminate the encoder memory and decoder trellis, are appended to the $m_{ib} = w^n N_p$ information bits that correspond to the $w^n$ packets taken from the buffer, with a packet size of $N_p$ bits/packet. The total $m_T = m_{ib} + m_{crc} + m_{tb}$ bits are encoded with the original mother code of rate $C_K$. In the first transmission, the code bits of the starting code $C_1$ are sent to the receiver over the correlated fading channel with the modulation constellation and transmitter power level determined by the scheduler. A Viterbi decoder is used for error correction, followed by cyclic redundancy check error detection. If no error is detected, the receiver sends an ACK to the transmitter and the next decision-epoch starts; otherwise, the receiver sends a NAK. The incremental redundancy bits yielding code $C_2$ from code $C_1$, which were deleted by the puncturing process, are then transmitted, and decoding is performed using code $C_2$ by combining the first and the second sets of transmitted bits. This process is continued until the decoding process results in no error being detected or the maximum number of retransmissions is reached. In either case, the buffer occupancy is updated and the whole process starts again in the next decision-epoch. Note that the maximum number of retransmissions is less than the number of available rates $K$, and that the Viterbi decoding process is performed by combining the current and all the previously received bits to form a lower-rate code for error correction.

The wireless channel in the analyzed IR-HARQ system is assumed to be ergodic and flat-fading, obeying a Rayleigh distribution. We model the channel as a finite-state Markov channel as given in Section 2.2.3. We use the equal probability method for partitioning the received channel SNR. In the special case of $C = 2$, the model can be described as a Gilbert-Elliot channel containing a "bad" state (state $c_1$) and a "good" state (state $c_2$). The steady-state probabilities of such a two-state channel are given by

$$p(c_1) = \frac{1 - p_{c_2 c_2}}{2 - (p_{c_1 c_1} + p_{c_2 c_2})}, \qquad p(c_2) = \frac{1 - p_{c_1 c_1}}{2 - (p_{c_1 c_1} + p_{c_2 c_2})}.$$

8.2.2 RCPC Code and Observation Probability

In a family of rate-compatible punctured convolutional codes, all the code-word bits of the higher-rate codes are embedded in the lower-rate codes, which guarantees a smooth transition to different lower rates. A mother rate-$1/N$ convolutional code is periodically punctured with period $V$ to obtain a family of rate-compatible convolutional codes with decreasing rates $\frac{V}{V+L}$, where $L$ can be varied from 1 to $(N - 1)V$. The operation of deleting bits from the output of the low-rate $1/N$ encoder is represented by an $N \times V$ puncturing matrix. A zero is used to indicate deleted bits. Therefore, for variable code-rate adaptive schemes, only the puncturing matrix needs to be chosen judiciously so that all the punctured codes of interest are obtained from the same low-rate encoder.
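The rate family and the puncturing operation described above can be sketched in a few lines of Python. The sketch enumerates the rates $V/(V+L)$ for a rate-$1/N$ mother code and applies an $N \times V$ puncturing matrix to a stream of encoder outputs. The puncturing matrix used in the usage note is purely illustrative and is not taken from any standard or from the thesis; the function names are hypothetical.

```python
from fractions import Fraction


def rcpc_rates(n, v):
    """Rates V/(V+L), L = 1..(N-1)V, punctured from a rate-1/N mother code."""
    return [Fraction(v, v + l) for l in range(1, (n - 1) * v + 1)]


def puncture(coded, matrix):
    """Delete encoder output bits according to an N x V puncturing matrix.

    coded  : list of N-bit tuples, one tuple per trellis stage
    matrix : N rows of V entries; a 0 marks a deleted bit
    """
    v = len(matrix[0])
    return [
        bit
        for t, branch in enumerate(coded)
        for i, bit in enumerate(branch)
        if matrix[i][t % v]
    ]


print(rcpc_rates(2, 4))
```

For a rate-1/2 mother code with period 4, an illustrative matrix such as `[[1, 1, 1, 1], [1, 0, 0, 0]]` keeps 5 of every 8 coded bits, which corresponds to the highest rate in the family; adding ones to the matrix (never removing them) yields the lower rates, which is what makes the family rate-compatible.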
Assume that a code $C_r$ is obtained from a family of RCPC codes by combining $r$ successive transmissions starting with $C_1$. Then, given the channel states $\{c^1, c^2, \ldots, c^r\}$, the upper bound on the first error event probability of the Viterbi decoding algorithm with code $C_r$ is expressed as [56]

$$P_E(C_r | c^1, c^2, \ldots, c^r) \le \sum_{d = d_{free}^{(r)}}^{\infty} \; \sum_{d^{(1)} + d^{(2)} + \cdots + d^{(r)} = d} a_d^{(r)}(d^{(1)}, d^{(2)}, \ldots, d^{(r)}) \, P_d(d^{(1)}, d^{(2)}, \ldots, d^{(r)} | c^1, c^2, \ldots, c^r),$$ (8.2)

where $d_{free}^{(r)}$ is the free distance of code $C_r$, $a_d^{(r)}(d^{(1)}, d^{(2)}, \ldots, d^{(r)})$ is the number of paths of weight $d$, $d^{(1)}$ is the distance contribution of code $C_1$, $d^{(2)}$ is the contribution of the bits added to $C_1$ to yield code $C_2$, and so on. The term $P_d(d^{(1)}, d^{(2)}, \ldots, d^{(r)} | c^1, c^2, \ldots, c^r)$ is the probability that a wrong path at distance $d$ from the correct path is selected and, for hard-decision decoding [56], it is given by²

$$P_d(d^{(1)}, \ldots, d^{(r)} | c^1, \ldots, c^r) = \sum_{e_1, \ldots, e_r} \binom{d^{(1)}}{e_1} \epsilon_{c^1}^{e_1} (1 - \epsilon_{c^1})^{d^{(1)} - e_1} \times \binom{d^{(2)}}{e_2} \epsilon_{c^2}^{e_2} (1 - \epsilon_{c^2})^{d^{(2)} - e_2} \times \cdots \times \binom{d^{(r)}}{e_r} \epsilon_{c^r}^{e_r} (1 - \epsilon_{c^r})^{d^{(r)} - e_r},$$ (8.3)

where $\epsilon_{c^i}$, $i = 1, 2, \ldots, r$, is the bit error rate (BER) in channel state $c^i$. We consider BPSK and M-QAM for the transmission of coded bits. The average BER for a particular channel state for BPSK and M-QAM can be found from (3.18) and (3.17), respectively, where the average received SNR can be found using (2.20). In (8.3), $e_i$ is the number of bits received in error among the $d^{(i)}$ bits, in their specific positions, transmitted in channel state $c^i$. The locations of the $d^{(i)}$ bits are determined by the specific path through the decoding trellis. The sum of the $e_i$ satisfies the following inequality:

$$\sum_{i=1}^{r} e_i > \frac{1}{2} \sum_{i=1}^{r} d^{(i)}.$$ (8.4)

² To the best knowledge of the authors, a soft-decision decoding analysis of the packet error rate of rate-compatible codes in time-varying Markovian channels is currently not available. However, as discussed in [158], Section 8.2.4, the coding gain achieved by using soft-decision decoding instead of hard-decision decoding is 2.5 dB or less for binary codes in additive Gaussian noise channels. Therefore, under the assumption that the channel gain changes slowly, it is reasonable to conclude that approximately the same coding gain of 2.5 dB between soft-decision and hard-decision decoding can be achieved for the rate-compatible convolutional codes.

In the event of decoding success, i.e., when the received packets are totally error free, the transmitter receives an ACK observation. Truncated ARQ, in which the number of retransmissions is limited to a finite number (e.g., 2 or 3), has been shown to increase throughput and decrease delay at the same time. However, successful transmission cannot be guaranteed even after $R$ retransmissions, and there is some probability of packet error [55]. Let us denote the event {decoding failure with code $C_r$ under channel states $\{c^1, c^2, \ldots, c^r\}$} by $\{\mathrm{NAK} | c^1, c^2, \ldots, c^r\}$. The probability of $\{\mathrm{NAK} | c^1, c^2, \ldots, c^r\}$ with action $u$ is given by

$$P_N(c^1, c^2, \ldots, c^r, u) = 1 - \left(1 - P_E(C_r | c^1, c^2, \ldots, c^r)\right)^{l},$$ (8.5)

where $l$ is the number of stages in the trellis for decoding the $(m_{ib} + m_{crc} + m_{tb})$ information bits with code $C_r = \frac{V}{V+L}$, and $l = m_{ib} + m_{crc} + m_{tb}$. Here, we assume that all the errors can be detected by the error detection code. Note that for the case of $C_1$, the probability of decoding failure or NAK, $P_N(c^1, u)$, is given by

$$P_N(c^1, u) = 1 - (1 - \epsilon_{c^1})^{m_{ib} + m_{crc} + m_{tb}}.$$ (8.6)

The probability of the event {decoding success with code $C_r$ under channel states $\{c^1, c^2, \ldots, c^r\}$}, i.e., the probability of $\{\mathrm{ACK} | c^1, c^2, \ldots, c^r\}$, can be found as $P_A(c^1, c^2, \ldots, c^r, u) = 1 - P_N(c^1, c^2, \ldots, c^r, u)$.
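The hard-decision pairwise error probability and the NAK probabilities above lend themselves to a small numerical sketch. The Python code below sums the product of binomial terms of (8.3) over all error patterns in which more than half of the $d$ differing bits are in error, and evaluates the NAK probabilities of (8.5) and (8.6). Ties (exactly half of the bits in error) are not counted, which is an assumption about how the selection rule treats them; the function names are hypothetical.

```python
import math
from itertools import product


def pairwise_error_prob(dists, bers):
    """Hard-decision probability of selecting a wrong path, as in (8.3).

    dists[i] : distance contribution d^(i) of the i-th transmission
    bers[i]  : bit error rate in the channel state of the i-th transmission
    Only error patterns with more than half of the bits in error are counted.
    """
    d_total = sum(dists)
    prob = 0.0
    for errs in product(*(range(d + 1) for d in dists)):
        if 2 * sum(errs) > d_total:
            term = 1.0
            for d, e, eps in zip(dists, errs, bers):
                term *= math.comb(d, e) * eps ** e * (1.0 - eps) ** (d - e)
            prob += term
    return prob


def nak_prob_first_tx(ber, m_ib, m_crc, m_tb):
    """NAK probability (8.6) after the first transmission with code C_1."""
    return 1.0 - (1.0 - ber) ** (m_ib + m_crc + m_tb)


def nak_prob(event_error_prob, stages):
    """NAK probability (8.5) from the first-error-event bound and l stages."""
    return 1.0 - (1.0 - event_error_prob) ** stages
```

For instance, `pairwise_error_prob([4, 3], [0.05, 0.01])` evaluates a path whose weight is split 4 + 3 over two transmissions sent in channel states with different BERs. When all transmissions see the same BER, the result reduces to the usual binomial tail for a single transmission.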
8.3 SMDP Formulation of the Scheduling Problem

We have discussed in Section 8.2 that the time between successive control choices is variable and depends on the current state and the choice of action. The cost per decision-epoch depends on the time required for the transition from one state to the next. Therefore, the problem at hand forms a semi-Markov decision process (SMDP) problem ([146], Section 5.3). The SMDP problem can be modeled through a tuple $\{\mathcal{S}, \mathcal{U}, \mathcal{T}, \mathcal{Q}, \mathcal{G}\}$, where

• $\mathcal{S} = \{s_1, \cdots, s_S\}$ is the system state space, which contains a finite number of states. The system state of the IR-HARQ problem at hand is composite and consists of the buffer state and the channel state. We represent it as $\mathcal{S} = \mathcal{B} \times \mathcal{C} = \{(b_0, c_1), (b_1, c_1), \cdots, (b_B, c_C)\}$, where the total number of states is $S = (B + 1)C$.

• $\mathcal{U} = \{u_1, \cdots, u_U\}$ is the finite action space. We consider three adaptation models, where each action is mapped to a set of transmission parameters. The action mappings for these cases are discussed in Section 8.3.1. We denote by $\mathcal{U}_s \subseteq \mathcal{U}$ those actions that are available at state $s \in \mathcal{S}$. The choice of action in a state is determined by a policy. In general, the policy $\pi$ in the policy space $\Pi$ can be described as $\pi = \{\mu^1, \mu^2, \cdots\}$, where action $u^n = \mu^n(s^n)$ is applied at decision-epoch $n$.

• The sojourn time for decision-epoch $n$, $\tau^n = t^{n+1} - t^n$, represents the time spent in a particular state before moving to the next state, where $t^n$ is the time at which the $n$th decision-epoch starts, with $t^0 = 0$ (see Fig. 1(b) for pictorial details). In the IR-HARQ scheme, the sojourn time is at most 1 time-slot for the first transmission, 2 time-slots for the first retransmission, and $R + 1$ time-slots for $R$ retransmissions, and it depends on the channel state and the action chosen.
• $\mathcal{Q}$ is the set of transition distributions (also called SMDP kernels), and $Q_{s_i, s_j}(\tau, u_i)$ represents the probability of moving from state $s_i$ to state $s_j$ at or before time $\tau$ if action $u_i$ is chosen.

• $\mathcal{G}$ is the set of cost matrices. We denote the cost associated with state-action pair $(s, u)$ for objective "X" by $G_X(s, u)$. In this chapter, we consider the minimization of three objectives, namely, power, delay and buffer overflow. The costs associated with the different objectives are explained in Section 8.3.3.

We consider the average cost criterion for scheduling packets in IR-HARQ schemes. There are two natural definitions of average cost per time-slot for an SMDP. According to one definition, called time-average, the average cost is the limit of the expected total cost for a specified policy $\pi$ over a finite deterministic horizon divided by the length of the horizon,

\[ G_X^{\pi} = \limsup_{T \to \infty} \frac{1}{T} E_s^{\pi} \left\{ \int_0^{T} G_X(s(t), u(t)) \, dt \right\}. \tag{8.7} \]

According to the second definition, referred to as ratio-average, the average cost is the limit of the expected total cost over a finite number of jumps divided by the expected cumulative time of these jumps,

\[ G_X^{\pi} = \limsup_{m \to \infty} \frac{1}{E_s^{\pi}\{T^m\}} E_s^{\pi} \left\{ \int_0^{T^m} G_X(s(t), u(t)) \, dt \right\}, \tag{8.8} \]

where $s(t) = s^n$ and $u(t) = u^n$ for $t^n \le t < t^{n+1}$, and $T^m$ is the completion time of the $m$th transition. The expectation operator $E_s^{\pi}$ is the conditional expectation when the probability measure is determined by the policy $\pi$ and the conditioning event is $\{s^0 = s\}$. Although these criteria are different in general, for the unichain problem at hand the time-average cost equals the ratio-average cost [146]. Therefore, we adopt the second definition due to its analytical convenience.

8.3.1 Three Adaptation Models

We consider three different adaptation models for the IR-HARQ problem, each of which is described by choosing a different set of transmission parameters of the SMDP problem.
Let $\mathcal{E} = \{e_1, e_2, \cdots, e_E\}$ denote the set of all allowable transmitter power levels, and $\mathcal{W} = \{w_1, w_2, \cdots, w_W\}$ denote the set of available transmission rates that corresponds to the set of modulation constellations $\mathcal{V} = \{v_1, v_2, \cdots, v_V\}$. Two mapping functions $\Phi$ and $\Psi$ respectively map the action into the power level and the transmission rate, i.e., $\Phi : \mathcal{U} \mapsto \mathcal{E}$ and $\Psi : \mathcal{U} \mapsto \mathcal{W}$.

Power Adaptation with Constant Power throughout the Decision Epoch

In this case, each action corresponds to a transmission power level and the number of transmission rates is $W = 1$. The power level $P_t = \Phi(u^n)$ chosen at the start of decision-epoch $n$ is kept fixed until the end of that decision-epoch.

Power Adaptation with Incremental Power for Additional Retransmission

Changing the transmission power level and taking the corresponding action during the retransmission phase would be a more general problem, and may decrease the number of retransmissions compared with the case where the control action is decided at the beginning of the decision-epoch and kept fixed during the retransmission phase, as in the previous model. However, this approach would considerably increase both the computational burden of the device and the memory needed for storing the optimal transmission policy. For example, if we consider the problem where transmitter power levels are adapted in each time-slot, irrespective of whether it carries a new transmission or a retransmission of an IR-HARQ system, the problem can be formed as a constrained Markov decision process in which the state space has to be enlarged to include all the previous retransmission information. Therefore, to transmit a particular packet, all previous actions and channel states have to be tracked from the beginning of the new transmission up to the current retransmission.
This is necessary to compute the probability of a packet error in a certain retransmission time-slot (see (8.2) for details) and to form the costs needed to find the optimal policy. Storing previous actions and channel states means that the state space would have to contain $B \times C \times (C \times U)^R$ states instead of the $B \times C$ states needed when no transmission adaptation is considered during retransmission. This reduces the feasibility of implementing such a policy even for moderate dimensions of the state and action spaces.

One way to deal with this problem is to use pre-decided increasing power levels in successive retransmission time-slots. We assume that the transmission rate remains the same throughout the decision-epoch, but the transmission power is different in different time-slots. Let $e_i^{(0)}$ and $e_i^{(r)}$ denote the transmission power levels for the first transmission and the $r$th retransmission, respectively, when an action corresponding to the $i$th power level has been chosen, i.e., $e_i = (e_i^{(0)}, e_i^{(1)}, \cdots, e_i^{(R)})$. Therefore, each action corresponds to a set of power levels, where successive power levels are increasing and are applied in consecutive retransmission time-slots.

Joint Rate and Power Adaptation

We adapt both the transmission power and the rate simultaneously in this model, and therefore each action corresponds to a pair of transmission power level and rate. We denote the set of pairs of transmission power and rate by $\mathcal{X} = \mathcal{E} \times \mathcal{W} = \{x_1, x_2, \cdots, x_U\} = \{(e_1, w_1), (e_1, w_2), \cdots, (e_E, w_W)\}$.

8.3.2 SMDP Kernel and Transition Probability

The transition probability is defined as the probability of switching from the current state to a future state at or before the next decision-epoch. Therefore, for certain actions, it determines the length of the decision-epoch in terms of time-slots.
Since the transition probabilities and the costs for a state-action pair depend on the length of the decision-epoch, the transition probability distribution is a very important parameter of an SMDP problem. Mathematically, the transition distribution specifies the joint distribution of the transition interval $\tau$ and the next state $s^{n+1}$, and for a given state-action pair $(s^n, u^n)$ it can be expressed as [145]

\[ Q_{s^n, s^{n+1}}(\tau, u^n) = P\{t^{n+1} - t^n \le \tau, \; s^{n+1} \mid s^n, u^n\}. \tag{8.9} \]

For a particular fixed point $T_f$ on the time-axis, let $f(\tau, T_f)$ denote a step function that has value 1 if $\tau \ge T_f$ and zero elsewhere,

\[ f(\tau, T_f) = \begin{cases} 1, & \text{if } \tau \ge T_f; \\ 0, & \text{elsewhere.} \end{cases} \tag{8.10} \]

In practical systems, truncated HARQ with a limited number of retransmissions has been found to improve both the throughput and the delay, i.e., it increases the throughput and reduces the packet delay in the buffer at the same time [55, 58]. It is also intuitively clear that, rather than retransmitting a large number of times, it is better to try a limited number of times and, if decoding still fails, to transmit again with a higher transmission power and/or a lower rate. Without loss of generality, and to avoid cumbersome long expressions, we assume that the maximum number of retransmissions is two.
Therefore, the transition distribution function can be expressed as

\[ \begin{aligned} Q_{s^n, s^{n+1}}(\tau, u^n) = {} & f(\tau, T_B) P_A(c^{n+1} \mid c^n, u^n) P_A(b^{n+1} \mid b^n, u^n) \\ & + f(\tau, 2T_B) P_{N,A}(c^{n+1} \mid c^n, u^n) P_{N,A}(b^{n+1} \mid b^n, u^n) \\ & + f(\tau, 3T_B) P_{N,N,A}(c^{n+1} \mid c^n, u^n) P_{N,N,A}(b^{n+1} \mid b^n, u^n) \\ & + f(\tau, 3T_B) P_{N,N,N}(c^{n+1} \mid c^n, u^n) P_{N,N,N}(b^{n+1} \mid b^n, u^n), \end{aligned} \tag{8.11} \]

where the channel transition probabilities upon getting an observation, for $c^{n+1} \in \mathcal{C}$, are given by the following:

\[ P_A(c^{n+1} \mid c^n, u^n) = P_A(c^n, u^n) \, p_{c^n, c^{n+1}}, \tag{8.12} \]

\[ P_{N,A}(c^{n+1} \mid c^n, u^n) = \sum_{c^{k+1}} P_N(c^n, u^n) P_A(c^n, c^{k+1}, u^n) \, p_{c^n, c^{k+1}} \, p_{c^{k+1}, c^{n+1}}, \tag{8.13} \]

\[ P_{N,N,A}(c^{n+1} \mid c^n, u^n) = \sum_{c^{k+2}} \sum_{c^{k+1}} P_N(c^n, u^n) P_N(c^n, c^{k+1}, u^n) P_A(c^n, c^{k+1}, c^{k+2}, u^n) \, p_{c^n, c^{k+1}} \, p_{c^{k+1}, c^{k+2}} \, p_{c^{k+2}, c^{n+1}}, \tag{8.14} \]

\[ P_{N,N,N}(c^{n+1} \mid c^n, u^n) = \sum_{c^{k+2}} \sum_{c^{k+1}} P_N(c^n, u^n) P_N(c^n, c^{k+1}, u^n) P_N(c^n, c^{k+1}, c^{k+2}, u^n) \, p_{c^n, c^{k+1}} \, p_{c^{k+1}, c^{k+2}} \, p_{c^{k+2}, c^{n+1}}. \tag{8.15} \]

The term $P_{N,N,A}(c^{n+1} \mid c^n, u^n)$, for example, is the probability of switching to channel state $c^{n+1}$ from channel state $c^n$ for action $u^n$, when that action causes a NAK in the first transmission and in the first retransmission and an ACK in the second retransmission. In (8.11), the term $P_{N,N,A}(b^{n+1} \mid b^n, u^n)$, for example, is the probability of occupying buffer state $b^{n+1}$ from buffer state $b^n$ for incoming traffic $a^n$ and action $u^n$, when that action causes a NAK in the first transmission and in the first retransmission and an ACK in the second retransmission. It can be given by

\[ P_{N,N,A}(b^{n+1} \mid b^n, u^n) = \sum_{a^n \in \{0, 1, \cdots, 3A\}} \delta\{b^{n+1} - (b^n - \Psi(u^n) + a^n)\} P(a^n), \tag{8.16} \]

where $P(a^n)$ is the probability of a total of $a^n \in \{0, 1, \cdots, l \times A\}$ packet arrivals in $l \in \{1, 2, 3\}$ time-slots. The other probability terms can be explained similarly. We assume that for all states $s^n$ and $s^{n+1}$, and controls $u^n \in \mathcal{U}_{s^n}$, the distributions $Q_{s^n, s^{n+1}}(\tau, u^n)$ are known and that the average transition time is finite, i.e.,

\[ \sum_{s^{n+1}} \int_0^{\infty} \tau \, Q_{s^n, s^{n+1}}(d\tau, u^n) < \infty. \tag{8.17} \]

The expected value of the transition time corresponding to state-action pair $(s^n, u^n)$ can be given by

\[ \sum_{s^{n+1}} \int_0^{\infty} \tau \, Q_{s^n, s^{n+1}}(d\tau, u^n) = \sum_{c^{n+1}} T_B \left[ P_A(c^{n+1} \mid c^n, u^n) + 2 P_{N,A}(c^{n+1} \mid c^n, u^n) + 3 P_{N,N,A}(c^{n+1} \mid c^n, u^n) + 3 P_{N,N,N}(c^{n+1} \mid c^n, u^n) \right]. \tag{8.18} \]

As a consequence of choosing a particular action $u^n$ in a particular state $s^n$, the system moves to a new state $s^{n+1}$ with a probability given by the transition probability. The transition probabilities can be obtained from the transition distributions via

\[ \begin{aligned} p_{s^n, s^{n+1}}(u^n) = \lim_{\tau \to \infty} Q_{s^n, s^{n+1}}(\tau, u^n) = {} & P_A(c^{n+1} \mid c^n, u^n) P_A(b^{n+1} \mid b^n, u^n) + P_{N,A}(c^{n+1} \mid c^n, u^n) P_{N,A}(b^{n+1} \mid b^n, u^n) \\ & + P_{N,N,A}(c^{n+1} \mid c^n, u^n) P_{N,N,A}(b^{n+1} \mid b^n, u^n) + P_{N,N,N}(c^{n+1} \mid c^n, u^n) P_{N,N,N}(b^{n+1} \mid b^n, u^n). \end{aligned} \tag{8.19} \]

8.3.3 Costs Associated with Different Objectives

In this chapter, our objective is to minimize three parameters that guarantee the QoS requirements for the IR-HARQ problem: transmitter power, buffer delay and packet overflow. We discuss the corresponding costs for achieving these objectives in the following sections. The one-stage expected transition cost for objective "X" corresponding to state-action pair $(s^n, u^n)$ is defined as

\[ G_X(s^n, u^n) = \sum_{s^{n+1} \in \mathcal{S}} \int_0^{\infty} g_X(s^n, u^n, s^{n+1}, \tau) \, \tau \, Q_{s^n, s^{n+1}}(d\tau, u^n). \tag{8.20} \]

Power Cost

Minimizing transmission power is of particular importance for wireless devices, which usually operate with a battery of limited energy. The power cost for a particular state-action pair is the transmission power corresponding to that action and does not depend on the future buffer state.
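The channel-side outcome probabilities (8.12)-(8.15) and the expected transition time (8.18) can be illustrated with a small sketch for the two-retransmission case. This is not the thesis implementation: the function names are hypothetical, the action is assumed to be fixed, and the ACK probability is supplied through an assumed callable `p_ack` that maps the tuple of channel states seen so far to a success probability.

```python
def branch_probabilities(p, p_ack, c_now):
    """Channel-side outcome probabilities of (8.12)-(8.15) for R = 2.

    p[i][j]  : channel transition probability from state i to state j.
    p_ack(h) : hypothetical callable giving the ACK probability when the
               transmissions so far saw the channel states in tuple h
               (the action is assumed to be fixed inside p_ack).
    Returns a dict: outcome -> list indexed by the next channel state.
    """
    n = len(p)
    out = {key: [0.0] * n for key in ("A", "NA", "NNA", "NNN")}
    a1 = p_ack((c_now,))
    for c_next in range(n):
        out["A"][c_next] = a1 * p[c_now][c_next]  # (8.12)
        for k1 in range(n):
            a2 = p_ack((c_now, k1))
            # NAK in the first slot, ACK in the second slot: (8.13)
            out["NA"][c_next] += (1 - a1) * a2 * p[c_now][k1] * p[k1][c_next]
            for k2 in range(n):
                a3 = p_ack((c_now, k1, k2))
                path = p[c_now][k1] * p[k1][k2] * p[k2][c_next]
                both_nak = (1 - a1) * (1 - a2) * path
                out["NNA"][c_next] += both_nak * a3        # (8.14)
                out["NNN"][c_next] += both_nak * (1 - a3)  # (8.15)
    return out


def expected_transition_time(branches, slot_duration):
    """Expected length of a decision-epoch, as in (8.18)."""
    slots = {"A": 1, "NA": 2, "NNA": 3, "NNN": 3}
    return slot_duration * sum(
        slots[key] * sum(values) for key, values in branches.items()
    )
```

With a row-stochastic channel matrix, the four outcome probabilities summed over all next channel states add up to one, which gives a simple consistency check on the reconstruction of (8.12)-(8.15).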
The immediate power cost for state-action pair $(s^n, u^n)$ can be written as

\[ \begin{aligned} G_P(s^n, u^n) = \sum_{c^{n+1} \in \mathcal{C}} T_B \big[ & g_P(s^n, u^n, s^{n+1}, 1) P_A(c^{n+1} \mid c^n, u^n) + g_P(s^n, u^n, s^{n+1}, 2) P_{N,A}(c^{n+1} \mid c^n, u^n) \\ & + g_P(s^n, u^n, s^{n+1}, 3) P_{N,N,A}(c^{n+1} \mid c^n, u^n) + g_P(s^n, u^n, s^{n+1}, 3) P_{N,N,N}(c^{n+1} \mid c^n, u^n) \big], \end{aligned} \tag{8.21} \]

where $g_P(s^n, u^n, s^{n+1}, 1) = e_i^{(0)}$ and $g_P(s^n, u^n, s^{n+1}, r + 1) = e_i^{(0)} + e_i^{(1)} + \cdots + e_i^{(r)}$, $r = 1, \cdots, R$, are the immediate power costs for the first transmission and the total up to the $r$th retransmission, respectively (see footnote 3). For the first and the third models in Section 8.3.1 (constant power, and joint rate and power adaptation), the transmitter power does not change during the decision-epoch; therefore, $e_i^{(0)} = \cdots = e_i^{(R)}$. Note that (8.21) is the sum of four terms. The first term is due to an ACK in the first time-slot; the second term is due to a NAK and an ACK in the first and the second time-slots, respectively; the third term is due to a NAK, a NAK and an ACK in the first, second and third time-slots, respectively; and the fourth term is due to a NAK in each of the first, second and third time-slots.

Footnote 3: The above power cost formulation can also account for unequal transmission intervals within equal-duration time-slotted retransmissions by weighting the transmission power levels with appropriate factors. For example, for a scheme with a maximum of two retransmissions, if the transmission interval of the first transmission is $\epsilon_1 T_B$ and the transmission intervals of the first and the second retransmissions are $\epsilon_2 T_B$ and $\epsilon_3 T_B$, respectively, then the transmission power levels should be weighted to $\epsilon_1 g_P(s^n, u^n, s^{n+1}, 1)$, $\epsilon_2 g_P(s^n, u^n, s^{n+1}, 2)$ and $\epsilon_3 g_P(s^n, u^n, s^{n+1}, 3)$.

Buffer Delay Cost

Since different traffic types have different delay sensitivities, delay is another important parameter that quantifies QoS requirements in modern wireless networks. Note that since packets arrive in each time-slot, the delay is not fixed throughout the decision-epoch.
The immediate delay cost for the IR-HARQ problem therefore depends on the present buffer state as well as the next buffer state, and on the stochastic evolution of the buffer occupancy between two decision-epochs. We propose three variants of the approximate immediate buffer delay cost in this chapter and show their differences numerically in Section 8.5. The three models for the immediate delay cost $g_D(s^n, u^n, s^{n+1})$, each defined for all $c^n, c^{n+1} \in \mathcal{C}$, are as follows:

First Model (8.22): based on the buffer occupancy $b^n$ at the start of a decision-epoch.

Second Model (8.23): based on the average buffer occupancy during a decision-epoch, $(b^n + b^{n+1})/2$.

Third Model (8.24): based on the buffer occupancy $b^{n+1}$ at the end of a decision-epoch.

The expected delay cost for the IR-HARQ system with two retransmissions is given by the following:

\[ \begin{aligned} G_D(s^n, u^n) = \sum_{b^{n+1} \in \mathcal{B}} \sum_{c^{n+1} \in \mathcal{C}} g_D(s^n, u^n, s^{n+1}) \, T_B \big[ & P_A(c^{n+1} \mid c^n, u^n) P_A(b^{n+1} \mid b^n, u^n) + 2 P_{N,A}(c^{n+1} \mid c^n, u^n) P_{N,A}(b^{n+1} \mid b^n, u^n) \\ & + 3 P_{N,N,A}(c^{n+1} \mid c^n, u^n) P_{N,N,A}(b^{n+1} \mid b^n, u^n) + 3 P_{N,N,N}(c^{n+1} \mid c^n, u^n) P_{N,N,N}(b^{n+1} \mid b^n, u^n) \big]. \end{aligned} \tag{8.25} \]

Buffer Overflow Cost

While the scheduler is trying to send packets in a particular time-slot, some incoming packets may be dropped due to insufficient space in the buffer. Therefore, the packet overflow rate from the buffer is an important QoS requirement when the incoming traffic is bursty and the buffer is of finite length. The buffer overflow rate costs depend on the current buffer occupancy, the maximum number of packets that can arrive in a time-slot, and the observation for a certain action and channel state. When the buffer has $B - A + 1$ or more packets, then $0, 1, 2, \cdots$, up to a maximum of $3A$ packets can be dropped as a result of overflow.
The expected overflow rate cost for the above buffer states can be given by the following:

\[ \begin{aligned} G_O(s^n, u^n) = \sum_{c^{n+1} \in \mathcal{C}} T_B \Big[ & P_A(c^{n+1} \mid c^n, u^n) \sum_{a^n = 1 + r^n}^{A} P(a^n)(a^n - r^n) + P_{N,A}(c^{n+1} \mid c^n, u^n) \sum_{a^n = 1 + r^n}^{2A} P(a^n)(a^n - r^n) \\ & + P_{N,N,A}(c^{n+1} \mid c^n, u^n) \sum_{a^n = 1 + r^n}^{3A} P(a^n)(a^n - r^n) + P_{N,N,N}(c^{n+1} \mid c^n, u^n) \sum_{a^n = 1 + r^n}^{3A} P(a^n)(a^n - r^n + \Psi(u^n)) \Big], \end{aligned} \tag{8.26} \]

where $r^n$ is the maximum number of packets that can be accommodated without overflow and equals $B + \Psi(u^n) - b^n$, with $\Psi(u^n)$ the number of packets taken for transmission. Note that when $B - 2A + 1 \le b^n \le B - A$, the number of packets that can be dropped due to overflow is $0, 1, 2, \cdots$ up to a maximum of $2A$. Therefore, the expected overflow cost is given by the sum of the last three terms of (8.26). When $B - 3A + 1 \le b^n \le B - 2A$, the number of packets that can be dropped due to overflow is $0, 1, 2, \cdots$ up to a maximum of $A$. The expected overflow cost for these buffer occupancies can be given similarly. For all other buffer occupancies, no packet overflow occurs, and therefore the cost is zero.

8.4 Solution Techniques

The semi-Markov average cost problem formulated in Section 8.3 can be transformed into an auxiliary discrete-time average cost problem, which can be solved with the dynamic programming algorithms for the discrete-time MDP (DT-MDP). The details of the equivalence between the SMDP and the DT-MDP are discussed in the next section.

8.4.1 Equivalent Auxiliary DT-MDP Formulation for SMDP

Let $\bar{\tau}_{s_i}(u_i)$ denote the expected transition time (8.18) for state-action pair $(s_i, u_i)$. Let us assume that $p_{s_i, s_i}(u_i) < 1$ for all $s_i \in \mathcal{S}$ and $u_i \in \mathcal{U}_{s_i}$, and let $\gamma$ be any scalar satisfying the following inequality:

\[ 0 < \gamma < \frac{\bar{\tau}_{s_i}(u_i)}{1 - p_{s_i, s_i}(u_i)}. \tag{8.27} \]

Bellman's optimality equation for the equivalent auxiliary discrete-time problem is given by the following:

\[ \tilde{h}(s_i) = \min_{u_i \in \mathcal{U}_{s_i}} \Big[ \tilde{G}(s_i, u_i) - \lambda + \sum_{s_j \in \mathcal{S}} \tilde{p}_{s_i, s_j}(u_i) \tilde{h}(s_j) \Big], \quad s_i = s_1, \cdots, s_S, \tag{8.28} \]

where the relation between the differential cost functions of the two problems is given by the following:

\[ h(s_i) = \gamma \tilde{h}(s_i), \quad s_i = s_1, \cdots, s_S. \tag{8.29} \]

The transition probability corresponding to state-action pair $(s_i, u_i)$ for the auxiliary discrete-time problem can be obtained from the following ([146], Section 5.3):

\[ \tilde{p}_{s_i, s_j}(u_i) = \begin{cases} \dfrac{\gamma \, p_{s_i, s_j}(u_i)}{\bar{\tau}_{s_i}(u_i)}, & \text{if } s_i \ne s_j; \\[2mm] 1 - \dfrac{\gamma \, (1 - p_{s_i, s_i}(u_i))}{\bar{\tau}_{s_i}(u_i)}, & \text{if } s_i = s_j. \end{cases} \tag{8.30} \]

In (8.30), for all $s_i$ and $s_j$, we have

\[ \tilde{p}_{s_i, s_j}(u_i) \ge 0, \qquad \sum_{s_j \in \mathcal{S}} \tilde{p}_{s_i, s_j}(u_i) = 1, \qquad \tilde{p}_{s_i, s_j}(u_i) = 0 \text{ if and only if } p_{s_i, s_j}(u_i) = 0 \text{ for } s_i \ne s_j. \tag{8.31} \]

The expected average cost per time-slot corresponding to state-action pair $(s_i, u_i)$ for the auxiliary problem can be given as

\[ \tilde{G}(s_i, u_i) = \frac{G(s_i, u_i)}{\bar{\tau}_{s_i}(u_i)}. \tag{8.32} \]

An SMDP is considered unichain if every policy $\mu^n$ induces a single recurrent class plus a possibly empty set of transient states (i.e., under every $\mu^n$, the state process is an ergodic Markov chain). For a finite SMDP with unichain structure and bounded costs, the optimal policy is stationary and Markovian, i.e., it depends only on the current system state. Note that the infinite-horizon average cost of a finite unichain SMDP does not depend on the initial state $s^0$ [145], and thus the dependence on the initial state has been dropped from the notation. The analyzed transmission rate and power adaptation problem for the IR-HARQ system possesses the unichain structure. To prove this statement, we first note that the channel component of the state variable is independent of the actions and its evolution is ergodic. Next, we demonstrate that the buffer component of the state variable is also ergodic under any policy. Let the current buffer state be $b^n$; then the next buffer state $b^{n+1}$ can fall into the interval $[\max(b^n - w_i, 0), \cdots, \min(b^n + (R + 1)A, b_B)]$ with non-zero probability under any action. Repeating this reasoning for a sufficient number of decision-epochs, any buffer state is reachable from any other buffer state. Thus, the buffer state component of the state variable is also ergodic under any policy. This guarantees that the SMDP has a unichain structure. The auxiliary DT-MDP and the SMDP have the same probabilistic structure [146].
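The conversion from the SMDP to the auxiliary DT-MDP can be sketched in a few lines. The code below is an illustration under stated assumptions, not the thesis implementation: it uses the standard transformation from [146], Section 5.3, the function name and the nested-list layout `p[s][u][s']`, `tau[s][u]`, `cost[s][u]` are hypothetical, and the condition on the scalar `gamma` is checked for every state-action pair.

```python
def auxiliary_dtmdp(p, tau, cost, gamma):
    """Build the auxiliary discrete-time problem from SMDP data.

    p[s][u][j] : SMDP transition probability from state s to j under u.
    tau[s][u]  : expected transition time of the pair (s, u).
    cost[s][u] : expected one-stage cost G(s, u).
    gamma      : scalar that must satisfy 0 < gamma < tau / (1 - p_ss).
    Returns (p_aux, cost_aux) in the same nested-list layout.
    """
    p_aux, cost_aux = [], []
    for s in range(len(p)):
        p_row, c_row = [], []
        for u in range(len(p[s])):
            stay = p[s][u][s]
            if stay >= 1.0 or not 0.0 < gamma < tau[s][u] / (1.0 - stay):
                raise ValueError("gamma violates the admissible range")
            scale = gamma / tau[s][u]
            # Off-diagonal entries are rescaled by gamma / tau ...
            probs = [scale * q for q in p[s][u]]
            # ... and the self-transition absorbs the remaining mass.
            probs[s] = 1.0 - scale * (1.0 - stay)
            p_row.append(probs)
            # Cost per unit time of the auxiliary problem.
            c_row.append(cost[s][u] / tau[s][u])
        p_aux.append(p_row)
        cost_aux.append(c_row)
    return p_aux, cost_aux
```

As a small check, for a two-state, single-action example with `tau = [[2.0], [1.0]]`, `cost = [[4.0], [3.0]]`, `p = [[[0.0, 1.0]], [[0.5, 0.5]]]` and `gamma = 1.0`, the ratio-average cost of the SMDP and the average cost of the auxiliary chain both equal 2.5.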
Thus, if a stationary policy is unichain for the SMDP problem, the same is true for the auxiliary DT-MDP problem. Therefore, dynamic programming (DP) algorithms for the DT-MDP can be applied to the auxiliary problem in order to solve the semi-Markov problem. The multi-objective DT-MDP problem can be solved in two ways. In the unconstrained Lagrangian formulation, the optimal policy is obtained iteratively using a DP algorithm by solving Bellman's optimality equation (cf. [132]). In the constrained formulation, the long-term average cost for one objective is minimized while keeping the other costs (called constraint costs) below some specified bounds. We consider the constrained Markov decision process (CMDP) formulation for the equivalent DT-MDP problem. The motivation for the CMDP formulation is its mathematical elegance in solving problems with a large number of states and the ease of incorporating more than one constraint.

8.4.2 Constrained MDP Formulation

The CMDP problem can be expressed by the following equations, where our objective is to find the optimal policy $\pi^*$ over the set of all stationary policies $\Pi$ that satisfies [145] the following:

\[ \min_{\pi \in \Pi} G_P^{\pi}, \quad \text{subject to: } G_D^{\pi} \le D \text{ and } G_O^{\pi} \le P_{of}, \tag{8.33} \]

where the long-term expected average power cost for policy $\pi$ can be given by the following, with $\tilde{G}_X$ denoting the per-stage costs (8.32) of the auxiliary DT-MDP:

\[ G_P^{\pi} = \limsup_{H \to \infty} \frac{1}{H} E_{\pi} \Big[ \sum_{n=1}^{H} \tilde{G}_P(s^n, \pi(s^n)) \Big]. \tag{8.34} \]

The constrained long-term expected costs $G_D^{\pi}$ and $G_O^{\pi}$ in (8.33) for policy $\pi$ can be given, respectively, by

\[ G_D^{\pi} = \limsup_{H \to \infty} \frac{1}{H} E_{\pi} \Big[ \sum_{n=1}^{H} \tilde{G}_D(s^n, \pi(s^n)) \Big] \tag{8.35} \]

and

\[ G_O^{\pi} = \limsup_{H \to \infty} \frac{1}{H} E_{\pi} \Big[ \sum_{n=1}^{H} \tilde{G}_O(s^n, \pi(s^n)) \Big]. \tag{8.36} \]

The nonnegative constants $D$ and $P_{of}$ are the maximum allowable long-term average delay and long-term average overflow rate, respectively. The choice of these parameters depends on the QoS requirements of a particular application. The constraints in (8.33) are called active if equality holds for the optimal policy $\pi^*$. The CMDP formulation can be solved using the linear programming method given in the next section.
8.4.3 Linear Programming Solution Technique

Let $\nu(s, u)$ represent the "steady-state" probability that the process is in state $s$ and action $u$ is applied. We seek the control policy that is represented in terms of the probability distribution $\nu$ over $\mathcal{S} \times \mathcal{U}$. The optimal policy $\mu^*$ can be obtained by solving the following linear program, in which $\tilde{G}_X$ and $\tilde{p}$ are the per-stage costs (8.32) and transition probabilities (8.30) of the auxiliary DT-MDP:

\[ \begin{aligned} \min_{\nu} \; & \sum_{s_i \in \mathcal{S}, \, u_i \in \mathcal{U}_{s_i}} \tilde{G}_P(s_i, u_i) \nu(s_i, u_i) \\ \text{subject to: } & \sum_{s_i \in \mathcal{S}, \, u_i \in \mathcal{U}_{s_i}} \tilde{G}_D(s_i, u_i) \nu(s_i, u_i) \le D; \quad \sum_{s_i \in \mathcal{S}, \, u_i \in \mathcal{U}_{s_i}} \tilde{G}_O(s_i, u_i) \nu(s_i, u_i) \le P_{of}; \\ & \sum_{u_j \in \mathcal{U}_{s_j}} \nu(s_j, u_j) = \sum_{s_i \in \mathcal{S}, \, u_i \in \mathcal{U}_{s_i}} \tilde{p}_{s_i, s_j}(u_i) \nu(s_i, u_i), \quad \forall s_j \in \mathcal{S}; \\ & \sum_{s_i \in \mathcal{S}, \, u_i \in \mathcal{U}_{s_i}} \nu(s_i, u_i) = 1; \quad \nu(s_i, u_i) \ge 0, \quad \forall s_i \in \mathcal{S} \text{ and } \forall u_i \in \mathcal{U}_{s_i}. \end{aligned} \tag{8.37} \]

Since there is a one-to-one correspondence between the feasible solutions of the LP and the feasible solutions of the CMDP, there exists an optimal randomized stationary policy $\mu^*$ for the CMDP problem if there exists an optimal solution $\nu^*$ to the LP problem [145]. The probability of applying action $u_i \in \mathcal{U}_{s_i}$ in state $s_i \in \mathcal{S}$ satisfies

\[ d_{s_i}(u_i) = \frac{\nu^*(s_i, u_i)}{\sum_{u \in \mathcal{U}_{s_i}} \nu^*(s_i, u)}, \quad \text{if } \sum_{u \in \mathcal{U}_{s_i}} \nu^*(s_i, u) > 0. \tag{8.38} \]

The linear program given in (8.37) can be solved efficiently using interior point methods [144] or using the optimization toolbox of a mathematical software package (e.g., the function linprog in MATLAB).

8.5 Simulation Results

In this section, we present simulation results of the IR-HARQ scheme for all three adaptation techniques. Unless otherwise specified, we use the following set of data: number of channel states $C = 2$, buffer size $B = 100$ packets, power $P = 1$ mW, average received SNR for $P$ of $\gamma = 1$, maximum number of retransmissions $R = 2$, block rate $R_B = 10^4$ blocks/sec, average arrival rate of 1 packet/time-slot, maximum number of packet arrivals in a particular time-slot $A = 7$ packets/time-slot, number of actions $U = 3$, and binary phase shift keying (BPSK) transmission. The transmission power $P_t$ may depend on the action taken and on the particular time-slot within a particular decision-epoch.
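As a brief illustration of how an LP solution from Section 8.4.3 is turned into a scheduling rule, the following sketch applies the normalization in (8.38) and evaluates a long-term average cost as the occupation-weighted sum used in the LP objective and constraints. The function names and the nested-list layout `nu[s][u]` are hypothetical, and returning `None` for states with zero steady-state mass is a choice made here, since (8.38) leaves the rule unspecified for such states.

```python
def randomized_policy(nu):
    """Map steady-state probabilities nu[s][u] to action probabilities.

    For each state with positive mass, the action probabilities are
    nu[s][u] divided by the total mass of that state, as in (8.38).
    States that are never visited get None (an assumption of this sketch).
    """
    policy = []
    for row in nu:
        mass = sum(row)
        if mass > 0:
            policy.append([value / mass for value in row])
        else:
            policy.append(None)
    return policy


def average_cost(nu, cost):
    """Occupation-weighted cost, i.e., sum over (s, u) of cost * nu."""
    return sum(
        nu[s][u] * cost[s][u]
        for s in range(len(nu))
        for u in range(len(nu[s]))
    )
```

For example, with `nu = [[0.2, 0.2], [0.6, 0.0], [0.0, 0.0]]` the first state randomizes equally between its two actions, the second state always applies its first action, and the third state is never visited.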
The set of rates of the family of RCPC codes at the transmitter is {1, 1/2, 1/3}, which are generated from a parent rate-1/3 code with memory $m_{tb} = 4$ (Table IV, [56]). We use LP to find the optimal randomized scheduling policy and the average transmission power for specified delay and overflow bounds.

Observation 1 (Power Adaptations Only). First we consider the first adaptation model proposed in Section 8.3.1. We choose the set of power levels so that, for all actions and channel states, the values of the ACK/NAK probabilities are distinct and provide diversity for the choice of power levels in different channel states. In Observation 2, we show an optimization procedure to calculate optimal power levels for a specific delay and overflow constraint. For simulation purposes in Figs. 8.2-8.7, we use the following: set of power levels $\mathcal{E} = \{6.4, 15, 30\}$ mW, number of packets taken for transmission in each decision-epoch $w = 2$, and overflow bound $P_{of} = 10^{-4}$. When the scheduler has nothing to send, the transmit power is equal to zero, as it is in the idle state at that time. In Fig. 8.2, we show the trade-off between the average transmission power and the average buffer delay for constant incoming traffic of 1 packet/time-slot. It is seen from the figure that the power decreases as the delay increases, and that the rate of decrease of power is greater for faster fading channels in the lower delay regions. In the higher delay regions, however, the rate of decrease of power is greater for slower fading channels. For smaller delays, the flexibility of storing packets is limited. Therefore, when the fading rate is slow, the scheduler has to choose a higher transmission power to send the packets. For larger delays, the scheduler has more flexibility to store packets in bad channel states when the fading is slow and can allow more fluctuation in the buffer. Also, for slower fading, the channel states in a decision-epoch are more predictable, and therefore the decoding success is increased. Hence, in these regions the faster fading channels need more power. We also compare the achievable average power vs. average delay curves for the adaptive and non-adaptive cases. For the non-adaptive case with a specified bound of $10^{-4}$ on the average packet overflow, we vary the immediate transmission power from 30 mW to 7 mW, and find the average power and the average delay. It can be seen that the non-adaptive case needs approximately twice the average transmission power of the adaptive case for the same average delay and average overflow bounds. Fig. 8.3 shows the effect of buffer sizes on the power/delay curves for different fading rates and Poisson-distributed traffic with the same average rate. The figure shows the same trend as for constant traffic. Since a larger buffer gives more flexibility in terms of storing packets in a lower channel state, the feasible delay region is larger and hence the achievable average power is smaller. The influence of different packet arrival rates with a fixed buffer size is shown in Fig. 8.4. It is seen that, for the same average delay, the average transmission power increases as the average arrival rate increases. Since the scheduler has to send more packets with larger power in the lower channel state to maintain the same delay and overflow, the average transmission power increases as the arrival rate increases. The effect of unequal stationary probabilities of the channel states is shown in Fig. 8.5 for a two-state channel. The figure shows that the average power increases as the stationary probability of the lower channel state increases. Since the scheduler uses a larger transmission power action to send packets in a lower channel state, the average power increases with a larger stationary probability of the lower channel state, due to the longer stay there. In Fig. 8.6, we give a comparison between constant and Poisson-distributed incoming traffic for the same average packet arrival rate of 1 packet/time-slot and different channel state memory. It is seen from the figure that the difference between the transmission power for constant and Poisson traffic with the same delay decreases as the fading correlation increases. The packet overflow as a function of delay for constant and Poisson traffic is shown in Fig. 8.7. Note that, to maintain a fixed overflow rate, Poisson traffic suffers more delay. It can also be noticed that the packet overflow rate is almost zero in the smaller delay region, but it increases as the delay increases. Therefore, when the buffer size is large, the overflow rate bound is not important in the lower delay regions. That is, in the lower delay regions, it is sufficient to fix a constraint on the average delay, as the average overflow constraint does not represent an independent degree of freedom. In the higher delay regions, however, both are important to consider.

Figure 8.2: Trade-off between average transmitter power and average buffer delay for constant incoming traffic arrival, specified overflow bound and different fading rates. Comparison between power adaptive and no adaptation is shown. (Horizontal axis: Average Delay [Time-slot]; legend: $f_m T_B$ = 0.01, 0.02, 0.05, 0.07, 0.1.)

Figure 8.3: Trade-off between average transmitter power and average buffer delay for Poisson-distributed incoming traffic arrival, specified overflow bound and different fading rates. The effect of different buffer sizes is also shown.

Figure 8.4: The influence of different packet arrival rates on the average transmitter power/average buffer delay trade-off for fixed buffer size and packet overflow bound. (Horizontal axis: Average Delay [Time-slot]; legend: arrival rates 0.9 and 1.0.)

Figure 8.5: The effect of unequal channel state stationary probability on the average transmitter power vs. average buffer delay curves for fixed number of channel states and packet overflow bound. (Horizontal axis: Average Delay [Time-slot].)

Figure 8.6: Comparison of average transmitter power as a function of average buffer delay for constant and Poisson-distributed incoming packet arrival for different channel state memory and packet overflow bound.

Figure 8.7: Comparison of average packet overflow as a function of average buffer delay for constant and Poisson-distributed incoming packet arrival for different channel state memory and packet overflow bound. (Horizontal axis: Average Delay [Time-slot]; legend: $p_{c_i, c_i}$ = 0.95, 0.9, 0.8, 0.7, $\forall i$.)

Observation 2 (Optimization of Power Levels). In this observation, we explain the choice of the power level set used in Observation 1. The optimal choice of the power level set is made for a given constraint on average delay and a given constraint on average overflow. We carried out an outer nonlinear optimization to calculate the best power level set, i.e., the one that gives the minimum long-term average power for specific average delay and average overflow bounds.
Therefore, the optimization problem is divided into two distinct problems: (1) an inner dynamic programming optimization problem, which provides the optimal power control law $\pi^*$ and the optimal average power $G_P^*(D, P_{of})$ for a fixed transmission power level set, and (2) an outer static optimization problem, which chooses the best set of transmission power levels that gives the minimum average power $G_P^*$. The MATLAB optimization toolbox is used for this purpose. We computed optimal power level sets for average delay bounds from $D = 2$ to $D = 15$ and average overflow bound $P_{of} = 10^{-4}$. For example, the optimal power level sets found for delay bounds $D = 4, 6, 10$ and 15 are, respectively, as follows: $\mathcal{E}^*$ = [6.8524 14.9988 29.9380] mW, [6.8151 14.9192 29.6175] mW, [6.5433 15.5318 29.6426] mW and [6.4000 15.1252 23.8367] mW. The corresponding optimal average powers are 7.1769, 6.6073, 6.1859 and 6.0265 mW, respectively. From the calculated optimal power level sets for different delays, we choose [6.4 15 30] mW for our simulations. The optimal average powers for this set and the above delays are 7.2808, 6.6786, 6.2154 and 6.0351 mW, respectively. The simulation results are given for $B = 50$ packets, $f_m T_B = 0.1$ and Poisson-distributed traffic.

Observation 3 (Comparison between Different Buffer Costs). In Section 8.3.3, we proposed three approximations for the immediate costs that account for the delay in the transmission buffer. These three immediate costs model the change of the buffer occupancy and delay between two subsequent decision-epochs in three different ways. In Fig. 8.8, we explore the influence of these models on the power vs. delay performance for the same set of data as Fig. 8.3. It can be seen that the choice of the immediate delay cost does not significantly influence the power performance of the IR-HARQ scheme.
Therefore, whichever of these three delay costs is chosen in this framework, the results are approximately the same.

Figure 8.8: Comparison of average power/average delay curves for different immediate delay cost models. All immediate delay cost models give approximately the same performance. (Horizontal axis: Average Delay [Time-slot]; curves: first, second and third delay models for $f_m T_B$ = 0.01, 0.02 and 0.05.)

Observation 4 (Incremental Power Adaptations). The performance of incremental power adaptation in successive time-slots is shown in Fig. 8.9 for Poisson traffic with $\lambda = 7$ packets/time-slot. Two sets of transmission power are compared with the constant power case described in Observation 1. Sets 1 and 2 are, respectively, $\mathcal{L}$ = {(6.4, 8, 10), (15, 17, 20), (25, ...)} mW and $\mathcal{L}$ = {(6.4, 8, 10), (15, 18, 22), (30, 32, 35)} mW. The incremental power schemes in general perform better than the constant power case. However, a trial-and-error search or an outer optimization is needed to find the best set of incremental powers. In our example, power set 1 performs better than set 2.

Figure 8.9: Trade-off between average transmitter power and average buffer delay for constant and incremental transmission power during a particular decision-epoch. For suitably chosen incremental power sets, incremental power actions outperform constant power actions. (Horizontal axis: Average Delay [Time-slot].)

Observation 5 (Rate and Power Adaptations). Results for the third adaptation model described in Section 8.3.1 are given in Fig. 8.10 for Poisson traffic with $\lambda = 7$ packets/time-slot and $P_{of} = 10^{-4}$. The set of transmission parameters is $X$ = {(9.4, 2), (9.4, 3), (20, 2), (20, 3), (30, 2), (30, 3)}, where 4-QAM and 8-QAM are used to transmit 2 and 3 packets, respectively. The approximate expression for the BER of M-ary quadrature amplitude modulation (M-QAM) that is valid for both low and high SNR is derived in [150] (eq. (18)). The performance of joint rate and power adaptation is compared with the power-only adaptation case with 4-QAM and the same set of power levels $\mathcal{L}$ = {9.4, 20, 30} mW. The figure shows that, because joint rate and power adaptation has more degrees of freedom and hence a larger action set, it needs less power for the same delay than power-only adaptation.

Figure 8.10: Comparison of only power adaptation with combined rate and power adaptation. Joint power and rate adaptation provides better performance than only power adaptation, due to the addition of more degrees of freedom in the action set. (Horizontal axis: Average Delay [Time-slot].)

Observation 6 (Convexity). The dependence of the average power on the average delay and overflow constraints is shown in Fig. 8.11. This figure offers empirical evidence that, for a fixed delay constraint, a change in the overflow constraint does not significantly influence the average power. In the lower delay regions, this implies that it is sufficient to fix a constraint on the average delay, as the average overflow constraint does not represent an independent degree of freedom. In higher delay regions, however, both are important to consider, as is evident from Fig. 8.7.

Figure 8.11: Variation of average power with average buffer delay and average packet overflow. Only power adaptation and constant traffic are considered for the 3-D plot.

Observation 7 (Policy Structure). Finally, in Fig. 8.12 we show the variation of the optimal policy with the current buffer and channel states.
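For fixed Lagrangian multipliers, a policy of the kind plotted in Fig. 8.12 can be computed with relative value iteration. The following is a minimal sketch, not the thesis's implementation: the array layout `P[a][s][t]`, the function name and the two-state toy model are assumptions made for illustration, the immediate cost is assumed to already combine power with the multiplier-weighted delay and overflow terms, and the model is assumed to be unichain and aperiodic so that the iteration converges.

```python
def relative_value_iteration(P, cost, ref=0, tol=1e-9, max_iter=10_000):
    """Average-cost relative value iteration for a finite MDP.

    P[a][s][t] : probability of moving from state s to t under action a
    cost[s][a] : immediate cost of action a in state s
    Returns (gain, bias, policy).
    """
    n_states, n_actions = len(cost), len(cost[0])
    h = [0.0] * n_states
    for _ in range(max_iter):
        # One Bellman backup: Q(s, a) = c(s, a) + sum_t P(t | s, a) h(t)
        q = [[cost[s][a] + sum(P[a][s][t] * h[t] for t in range(n_states))
              for a in range(n_actions)] for s in range(n_states)]
        th = [min(row) for row in q]
        diff = [th[s] - h[s] for s in range(n_states)]
        # Subtract the value at the reference state to keep iterates bounded.
        h = [v - th[ref] for v in th]
        # Stop when the span of the update is small.
        if max(diff) - min(diff) < tol:
            break
    gain = 0.5 * (max(diff) + min(diff))
    policy = [row.index(min(row)) for row in q]
    return gain, h, policy


# Hypothetical toy model: action 0 stays in the state, action 1 switches.
P_toy = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
cost_toy = [[1.0, 1.0],   # state 0 is cheap
            [3.0, 3.0]]   # state 1 is expensive
gain, bias, policy = relative_value_iteration(P_toy, cost_toy)
```

In the toy model the cheapest long-run behaviour is to move to state 0 and remain there, so the returned policy stays in state 0 and switches out of state 1.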
The optimal deterministic policies have been computed for two sets of Lagrangian multipliers. It can be seen that for larger average delays, the optimal scheduler applies a less aggressive transmission policy. That is, it can delay applying actions associated with larger powers until the buffer occupancy is larger.

Figure 8.12: Optimal policies are shown as a function of channel state and buffer state. For fixed sets of Lagrangian multipliers, we used the relative value iteration (RVI) algorithm to compute optimal deterministic policies by solving the corresponding Bellman equation iteratively. We consider only power adaptation with constant traffic for this plot. In each subplot, the average overflow is below $10^{-4}$. (Panels: Average Delay = 7.6935 [Time-slot] and Average Delay = 1.5080 [Time-slot]; horizontal axis: Channel State.)

8.6 Conclusions

An SMDP framework has been utilized to calculate the optimal rate and power adaptation policies of an IR-HARQ system. To the best of our knowledge, this is the first result that analyzes the rate and power adaptation under latency and overflow constraints for an IR-HARQ system. For the stated IR-HARQ system with finite transmission buffer, we have derived the SMDP kernel, transition probability and costs associated with different objectives. The SMDP problem has then been converted into an auxiliary discrete-time CMDP problem and its solution has been obtained by linear programming. Simulation results have been given to examine the influence of randomly varying channel and traffic parameters on the optimal rate and power allocation policies for three different transmission models. It has been shown that by employing optimal power allocation, fast-fading channels perform the best under stringent delay constraints, while the situation is the opposite if delay constraints are relaxed.
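The conversion of an SMDP into a discrete-time problem can be illustrated with the standard textbook data transformation for average-cost semi-Markov models. This is a sketch under assumptions: the text here does not state which transformation was used, and the names `tau` (expected sojourn time of a decision-epoch), `eta` (a positive constant) and the toy numbers are hypothetical. The transformed cost is c(s,a)/tau(s,a), the transformed probability of moving to a different state t is eta p(t|s,a)/tau(s,a), and the remaining probability mass becomes a self-transition; `eta` must be small enough that this self-transition probability stays non-negative.

```python
def smdp_to_mdp(P, cost, tau, eta):
    """Data transformation from an SMDP to a discrete-time MDP.

    P[a][s][t] : embedded transition probabilities of the SMDP
    cost[s][a] : expected cost incurred during one decision-epoch
    tau[s][a]  : expected length of the decision-epoch
    eta        : constant with 0 < eta < tau[s][a] / (1 - P[a][s][s])
    Returns (P_new, cost_new) in the same layouts.
    """
    n_actions, n_states = len(P), len(cost)
    cost_new = [[cost[s][a] / tau[s][a] for a in range(n_actions)]
                for s in range(n_states)]
    P_new = [[[0.0] * n_states for _ in range(n_states)]
             for _ in range(n_actions)]
    for a in range(n_actions):
        for s in range(n_states):
            scale = eta / tau[s][a]
            for t in range(n_states):
                P_new[a][s][t] = scale * P[a][s][t]
            # Leftover mass goes to a self-transition.
            P_new[a][s][s] += 1.0 - scale
            if P_new[a][s][s] < 0.0:
                raise ValueError("eta is too large for state-action pair")
    return P_new, cost_new


# Hypothetical toy SMDP: one action, two states that alternate,
# with epochs of expected length 1 and 3 and costs 2 and 6.
P_smdp = [[[0.0, 1.0], [1.0, 0.0]]]
cost_smdp = [[2.0], [6.0]]
tau_smdp = [[1.0], [3.0]]
P_mdp, cost_mdp = smdp_to_mdp(P_smdp, cost_smdp, tau_smdp, eta=0.5)
```

In the toy model the SMDP incurs a total cost of 8 over 4 time units per cycle, i.e. 2 per unit time, and the transformed per-step cost is 2 in both states, so the discrete-time model has the same average cost.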
A significant power saving can be achieved with either the incremental power allocation policy or the joint rate and power allocation policy, as compared to the power-only allocation policy.

CHAPTER 9 Conclusions and Future Directions

9.1 Introduction

In the final chapter of the thesis, we summarize our main contributions and discuss the advantages, limitations and applications of the proposed work in the field of wireless networks. Further work to extend the wireless systems and protocols of this thesis is also presented. We summarize our work and the results in Section 9.2. Interesting future research directions are suggested in Section 9.3.

9.2 Summary

With the enormous and constant demand for high data rates, wireless networks need an efficient and fast scheduler that can schedule packets optimally. For wireless systems, this is of paramount importance, since the resources are limited. Understanding that system performance can be significantly improved if the adaptations are done across various layers, we have proposed cross-layer optimal and suboptimal schedulers for different wireless schemes in this work. Our simulation results have demonstrated the performance of the proposed schemes under different system parameters.

In our initial work in Chapter 2, we have presented an MDP-based framework for scheduling packets over correlated Rayleigh fading channels. The optimal cross-layer adaptation control laws are given for both the information-theoretic transmission scheme and the M-QAM transmission scheme. Analyzing the structural properties of the associated MDP problem, we have shown that the adaptation problem has a weakly communicating structure. We have examined two approaches to solving the weakly communicating MDP and discussed their relative benefits and limitations.
Analyzing the nature of the optimal policy with changing buffer occupancy and channel conditions, we have proposed a suboptimal log-scheduling policy and compared its performance with the optimal policy, the channel threshold policy and the mixed-scheduling policy. The performance of the log-scheduling policy was found to be close to that of the optimal policy, and better than both the channel threshold policy and the mixed-scheduling policy for correlated fading channels. Instead of using an infinite-size buffer, we have considered a realistic scenario by adopting a finite-size buffer, as well as non-constant random traffic arrivals for our analysis.

In Chapter 3, we extended the optimal scheduling scheme using M-QAM transmission to Nakagami-m channels with diversity combining at the receiver. We have studied two receive diversity-combining techniques for two cross-layer adaptation problems: the minimization of average transmission power under average delay and average packet-dropping probability constraints, and the minimization of average bit error rate under average delay and average packet-dropping probability constraints.

In Chapter 4, this scheme is further extended to a MIMO system utilizing the orthogonal space-time block code. In addition to physical-layer forward-error correction coding, in this chapter we have also studied data-link layer error-detection coding. We have analyzed the system performance for the selective-repeat automatic repeat-request protocol. Adaptive modulation was used in the physical layer to optimize throughput. However, unlike most studies, we have adapted the modulation rate depending on both the channel condition and the buffer occupancy. The latter is important because packet arrivals are not constant in nature and the buffer is finite. We have concluded that the optimal choice of action set is also dependent on incoming traffic statistics.

The joint rate and power adaptations across the physical layer and the data-link layer for hybrid automatic repeat-request systems have been evaluated in Chapter 5. The analyzed framework is general enough to consider both the correlated flat-fading channel and the frequency-selective channel. Depending on the availability of perfect channel state information at the transmitter, we have studied two problems. For both cases, we have assumed that the SW-ARQ protocol is used at the data-link layer for error detection purposes. In the first case, we have given the optimal policy when both the perfect information of the channel state and the observation feedback of the decoding results are available from the receiver. The second case deals with the problem when there is no perfect channel state information. We have proposed and evaluated a new technique to estimate the channel conditions and hence to make transmission decisions. We have addressed the finite-state analysis of frequency-selective channels using an MMSE receiver.

Whereas in Chapter 5 we introduced a new scheduling technique based on history tracking for the case when perfect channel state information is unavailable, in Chapter 6 we have developed another technique for scheduling packets in the same scenario, again without perfect CSI. In Chapter 6, we have investigated the coding rate adaptations of a type-I hybrid ARQ scheme, where both the incoming traffic states and the time-varying fading channel states are unknown at the transmitter. The control laws for this scheme were chosen based on assumptions about the hidden physical system states. The systems in Chapter 5 and Chapter 6 are both described as partially observable Markov decision processes. We have discussed three heuristic-based methods, applied them to our considered systems, and analyzed their benefits and limitations.
The framework of Chapter 6 has been extended in Chapter 7 with the inclusion of joint coding and modulation rate adaptations and the selective-repeat automatic repeat-request protocol. The fading distribution has been generalized by assuming the Nakagami-m distribution.

The final contribution of this thesis is presented in Chapter 8. In this chapter, we have discussed the adaptation of power and rate for an incremental-redundancy hybrid automatic repeat-request system over Rayleigh fading channels. Multi-level modulations and rate-compatible punctured convolutional codes have been used in this scheme. Since the length of the decision-epoch is not fixed and varies stochastically, we have formulated the problem as a semi-Markov decision process. We have proposed three adaptation models and compared their performances via simulations. We have also shown that the optimal average power is a non-increasing function of the average buffer delay and average packet overflow. The optimal policies for all three models have been computed by transforming the semi-Markov decision process problem into an equivalent discrete-time Markov decision process problem.

9.3 Future Work

The research conducted in this thesis can be extended in a number of directions. In Chapter 2, we have proposed an MDP-based optimal scheduler to minimize three objectives: power, delay and overflow. The optimal policies are computed offline for this scheme. We have also proposed a suboptimal scheduler that approximates the policy online using instantaneous channel SNR and buffer occupancy information. The traffic is assumed to be Poisson distributed. The research work of this chapter can be extended in different areas. The distribution of the traffic often cannot be described with a Poisson distribution. The traffic can be modeled more accurately with hidden Markov models.
Therefore, the scheme can be modified to incorporate HMM traffic, and scheduling policies can be investigated for the new scheme. We have assumed that perfect CSI is always available at the transmitter before transmission. However, in many practical situations, the CSI may be either erroneous or delayed. These two facts may be considered in future research. Scheduling decisions are made assuming that the user has only one kind of data traffic to transmit. Sometimes, users may have more than one type of data for transmission. These data can be classified according to their allowable BER requirements or delay requirements. It would be useful to devise a scheduling technique that judiciously takes into consideration the requirements of multiple traffic types and schedules them so that the total transmitter power is minimized and the other respective QoS parameters are maintained. Our research has been carried out assuming a single-user case; therefore, the multiuser case could also be considered. Also, scheduling over multi-carrier channels would be an interesting area for further research. It has also been pointed out in the thesis that the complexity of the scheme increases with the state space of the system. Therefore, several suboptimal, simple heuristic-based or learning-based scheduling policies could be explored that trade off complexity against accuracy. Although we have extended the results for two diversity cases, other diversity-combining techniques could also be considered. It would also be interesting to apply the formulation to the case of V-BLAST. Slow-fading channels have been considered for our scheduling techniques. However, fast-fading channels with ISI could also be considered. Scheduling could be extended for future MIMO-OFDM wireless interfaces.
The joint rate and power adaptation scheme in Chapter 5 could be extended by considering other FEC codes, such as Turbo codes and LDPC codes. The scenarios mentioned above can also be included in future research. The coding rate adaptations in Chapter 6 can use the different coding schemes (e.g., LDPC codes, Turbo codes, etc.) mentioned above. Rate-compatible punctured Turbo codes could be used to extend the results of Chapter 8. Further research on Chapter 8 could also consider the issues discussed above for extending the other schemes.

BIBLIOGRAPHY

[1] T. S. Rappaport, A. Annamalai, R. M. Buehrer, and W. H. Tranter, "Wireless communications: Past events and a future perspective," IEEE Communications Magazine, vol. 40, pp. 148-161, May 2002.
[2] R. A. Berry and E. M. Yeh, "Fundamental performance limits for wireless fading channels - cross-layer wireless resource allocation," IEEE Signal Processing Magazine, vol. 21, pp. 59-68, Sept. 2004.
[3] W. Xiao, F. Wang, R. Love, A. Ghosh, and R. Ratasuk, "1xEV-DO system performance: analysis and simulation," in Proc. IEEE VTC'04-Fall, vol. 7, Los Angeles, CA, Sept. 26-29, 2004, pp. 5305-5309.
[4] R. Love, A. Ghosh, W. Xiao, and R. Ratasuk, "Performance of 3GPP high speed downlink packet access (HSDPA)," in Proc. IEEE VTC'04-Fall, vol. 5, Los Angeles, CA, Sept. 26-29, 2004, pp. 3359-3363.
[5] S. Shakkottai and R. Srikant, "Scheduling real-time traffic with deadlines over a wireless channel," ACM/Baltzer Wireless Networks Journal, vol. 8, no. 1, pp. 13-26, Jan. 2002.
[6] Y. Jungnam and M. Kavehrad, "Markov error structure for throughput analysis of adaptive modulation systems combined with ARQ over correlated fading channels," IEEE Transactions on Vehicular Technology, vol. 54, pp. 235-245, Jan. 2005.
[7] A. Chockalingam, M. Zorzi, L. B. Milstein, and P.
Venkataram, "Performance of a wireless access protocol on correlated Rayleigh-fading channels with capture," IEEE Transactions on Communications, vol. 46, pp. 644-655, May 1998.
[8] A. J. Goldsmith and S.-G. Chua, "Adaptive coded modulation for fading channels," IEEE Transactions on Communications, vol. 46, pp. 595-602, May 1998.
[9] S. Nanda, K. Balachandran, and S. Kumar, "Adaptation techniques in wireless packet data services," IEEE Communications Magazine, vol. 38, no. 1, pp. 54-64, Jan. 2000.
[10] S. T. Chung and A. J. Goldsmith, "Degrees of freedom in adaptive modulation: a unified view," IEEE Transactions on Communications, vol. 49, pp. 1561-1571, Sept. 2001.
[11] J. K. Cavers, "Variable rate transmission for Rayleigh fading channels," IEEE Transactions on Communications, vol. 20, pp. 15-22, Feb. 1972.
[12] E. Cianca, A. D. Luise, M. Ruggieri, and R. Prasad, "Channel-adaptive techniques in wireless communications: an overview," Wireless Communications and Mobile Computing, vol. 2, pp. 799-813, 2002.
[13] A. J. Goldsmith and P. P. Varaiya, "Capacity of fading channels with channel side information," IEEE Transactions on Information Theory, vol. 43, no. 6, pp. 1986-1992, Nov. 1997.
[14] M. S. Alouini and A. J. Goldsmith, "Capacity of Rayleigh fading channels under different adaptive transmission and diversity-combining techniques," IEEE Transactions on Vehicular Technology, vol. 48, pp. 1165-1181, July 1999.
[15] R. K. Mallik, M. Z. Win, J. W. Shao, M.-S. Alouini, and A. J. Goldsmith, "Channel capacity of adaptive transmission with maximal ratio combining in correlated Rayleigh fading," IEEE Transactions on Wireless Communications, vol. 3, pp. 1124-1133, July 2004.
[16] M. S. Alouini and A. J. Goldsmith, "Adaptive modulation over Nakagami fading channels," Kluwer Journal on Wireless Communications, vol. 13, pp. 119-143, May 2000.
[17] A. Goldsmith and S.-G.
Chua, "Variable-rate variable-power MQAM for fading channels," IEEE Transactions on Communications, vol. 45, pp. 1218-1230, Oct. 1997.
[18] S. Hanly and D. Tse, "Multi-access fading channels: Part II: Delay-limited capacities," IEEE Transactions on Information Theory, vol. 44, no. 8, pp. 2816-2831, Nov. 1998.
[19] L. Ozarow, S. Shamai, and A. Wyner, "Information theoretic considerations for cellular mobile radio," IEEE Transactions on Vehicular Technology, vol. 43, pp. 359-378, May 1994.
[20] G. Caire, G. Taricco, and E. Biglieri, "Optimum power control over fading channels," IEEE Transactions on Information Theory, vol. 45, pp. 1468-1489, July 1999.
[21] K. M. Kamath and D. L. Goeckel, "Adaptive-modulation schemes for minimum outage probability in wireless systems," IEEE Transactions on Communications, vol. 52, pp. 1632-1635, Oct. 2004.
[22] M. S. Alouini, X. Tang, and A. J. Goldsmith, "An adaptive modulation scheme for simultaneous voice and data transmission over fading channels," IEEE Journal on Selected Areas in Communications, vol. 17, pp. 837-850, May 1999.
[23] Q. Zhou and H. Dai, "Joint antenna selection and link adaptation for MIMO systems," IEEE Transactions on Vehicular Technology, vol. 55, pp. 243-255, Jan. 2006.
[24] A. Milani, V. Tralli, and M. Zorzi, "On the use of rate and power adaptation in V-BLAST systems for data protocol performance improvement," IEEE Transactions on Wireless Communications, vol. 5, pp. 16-22, Jan. 2006.
[25] G. Femenias, "SR ARQ for adaptive modulation systems combined with selection transmit diversity," IEEE Transactions on Communications, vol. 53, pp. 998-1006, June 2005.
[26] J. Yang, A. K. Khandani, and N. Tin, "Statistical decision making in adaptive modulation and coding for 3G wireless systems," IEEE Transactions on Vehicular Technology, vol. 54, pp. 2066-2073, Nov. 2005.
[27] M. A. Kousa and M.
Rahman, "An adaptive error control system using hybrid ARQ schemes," IEEE Transactions on Communications, vol. 39, pp. 1049-1057, July 1991.
[28] B. Vucetic, "An adaptive coding scheme for time-varying channels," IEEE Transactions on Communications, vol. 39, pp. 653-663, May 1991.
[29] M. Rice and S. B. Wicker, "Adaptive error control over slowly varying channels," IEEE Transactions on Communications, vol. 42, pp. 917-925, Feb./Mar./Apr. 1994.
[30] B. A. Harvey and S. B. Wicker, "Packet combining systems based on the Viterbi decoder," IEEE Transactions on Communications, vol. 42, pp. 1544-1557, Feb./Mar./Apr. 1994.
[31] M. Rice and S. B. Wicker, "A sequential scheme for adaptive error control over slowly varying channels," IEEE Transactions on Communications, vol. 42, pp. 1533-1543, Feb./Mar./Apr. 1994.
[32] S. S. Chakraborty, M. Liinaharja, and E. Yli-Juuti, "An adaptive ARQ scheme with packet combining for time varying channels," IEEE Communications Letters, vol. 3, pp. 52-54, Feb. 1999.
[33] H. Minn, M. Zeng, and V. K. Bhargava, "On ARQ scheme with adaptive error control," IEEE Transactions on Vehicular Technology, vol. 50, pp. 1426-1436, Nov. 2001.
[34] S. S. Chakraborty and M. Liinaharja, "On the performance of an adaptive GBN scheme in a time-varying channel," IEEE Communications Letters, vol. 4, pp. 143-145, Apr. 2000.
[35] S. S. Chakraborty and M. Liinaharja, "Performance analysis of an adaptive SR ARQ scheme for time-varying Rayleigh fading channels," in Proc. IEEE ICC'01, vol. 8, Helsinki, Finland, June 11-14, 2001, pp. 2478-2482.
[36] Y. Yao, "An effective go-back-N ARQ scheme for variable error rate channels," IEEE Transactions on Communications, vol. 43, pp. 20-23, Jan. 1995.
[37] A. Mehta, D. Kagaris, and R. Viswanathan, "Throughput performance of an adaptive ARQ scheme in Rayleigh fading channels," IEEE Transactions on Wireless Communications, vol. 5, pp. 12-15, Jan. 2006.
[38] S. Choi and K. G.
Shin, "A class of adaptive hybrid ARQ schemes for wireless links," IEEE Transactions on Vehicular Technology, vol. 50, pp. 777-790, May 2001.
[39] M. B. Pursley and S. S. Sandberg, "Variable-rate hybrid ARQ for meteor-burst communications," IEEE Transactions on Communications, vol. 40, pp. 60-73, Jan. 1992.
[40] D. J. Costello, J. Hagenauer, H. Imai, and S. B. Wicker, "Applications of error-control coding," IEEE Transactions on Information Theory, vol. 44, pp. 2531-2560, Oct. 1998.
[41] D. Garg and F. Adachi, "Packet access using DS-CDMA with frequency-domain equalization," IEEE Journal on Selected Areas in Communications, vol. 24, no. 8, pp. 161-170, Jan. 2006.
[42] R. Love, B. Classon, A. Ghosh, and M. Cudak, "Incremental redundancy for evolutions of 3G CDMA systems," in Proc. IEEE VTC'02, vol. 1, May 6-9, 2002, pp. 454-458.
[43] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC Codes) and their applications," IEEE Transactions on Communications, vol. 36, pp. 389-400, Apr. 1988.
[44] J. Hagenauer, N. Seshadri, and C.-E. W. Sundberg, "The performance of rate-compatible punctured convolutional codes for digital mobile radio," IEEE Transactions on Communications, vol. 38, pp. 966-980, July 1990.
[45] D. Haccoun and G. Begin, "High-rate punctured convolutional codes for Viterbi and sequential decoding," IEEE Transactions on Communications, vol. 37, pp. 1113-1125, Nov. 1989.
[46] S. Kallel and D. Haccoun, "Generalized type II hybrid ARQ scheme using punctured convolutional coding," IEEE Transactions on Communications, vol. 38, pp. 1938-1946, Nov. 1990.
[47] S. Kallel, "Analysis of memory and incremental redundancy ARQ schemes over a non-stationary channel," IEEE Transactions on Communications, vol. 40, pp. 1474-1480, Sept. 1992.
[48] S. Kallel, "Efficient hybrid ARQ protocols with adaptive forward error correction," IEEE Transactions on Communications, vol. 42, pp. 281-289, Feb./Mar./Apr. 1994.
[49] A.
Shiozaki, "Adaptive type-II hybrid broadcast ARQ system," IEEE Transactions on Communications, vol. 44, pp. 420-422, Apr. 1996.
[50] P. Frenger, P. Parkvall, and E. Dahlman, "Performance comparison of HARQ with Chase combining and incremental redundancy for HSDPA," in Proc. IEEE VTC'01-Fall, vol. 3, Oct. 7-11, 2001, pp. 1829-1833.
[51] J.-F. Cheng, Y.-P. Wang, and S. Parkvall, "Adaptive incremental redundancy," in Proc. IEEE VTC'03-Fall, vol. 2, Oct. 6-9, 2003, pp. 737-741.
[52] E. Visotsky, Y. Sun, V. Tripathi, M. L. Honig, and R. Peterson, "Reliability-based incremental redundancy with convolutional codes," IEEE Transactions on Communications, vol. 53, pp. 987-997, June 2005.
[53] A. Roongta and J. M. Shea, "Reliability-based hybrid ARQ and rate-compatible punctured convolutional (RCPC) codes," in Proc. IEEE WCNC'04, vol. 4, Atlanta, GA, Mar. 21-25, 2004, pp. 2105-2109.
[54] X. Wang and M. T. Orchard, "On reducing the rate of retransmission in time-varying channels," IEEE Transactions on Communications, vol. 51, pp. 900-910, June 2003.
[55] E. Malkamäki and H. Leib, "Performance of truncated type-II hybrid ARQ schemes with noisy feedback over block fading channels," IEEE Transactions on Communications, vol. 48, pp. 1477-1487, Sept. 2000.
[56] Q. Zhang and S. A. Kassam, "Hybrid ARQ with selective combining for fading channels," IEEE Journal on Selected Areas in Communications, vol. 17, pp. 867-880, May 1999.
[57] F. Babich, "Performance of hybrid ARQ schemes for the fading channel," IEEE Transactions on Communications, vol. 50, pp. 1882-1885, Dec. 2002.
[58] L. Lugand, J. D. J. Costello, and R. H. Deng, "Parity retransmission hybrid ARQ using rate 1/2 convolutional codes on a nonstationary channel," IEEE Transactions on Communications, vol. 37, pp. 755-765, July 1989.
[59] L. Lin, R. D. Yates, and P. Spasojevic, "Adaptive transmission with discrete code rates and power levels," IEEE Transactions on Communications, vol. 51, pp.
2115-2125, Dec. 2003.
[60] T. Ji and W. Stark, "Rate-adaptive transmission over correlated fading channels," IEEE Transactions on Communications, vol. 53, pp. 1663-1670, Oct. 2005.
[61] L. Zhao, J. W. Mark, and Y. C. Yoon, "A combined link adaptation and incremental redundancy protocol for enhanced data transmission," in Proc. IEEE Globecom'01, vol. 2, San Antonio, TX, Nov. 25-29, 2001, pp. 1277-1281.
[62] A. Das, F. Khan, and A. Nanda, "A²IR: An asynchronous and adaptive hybrid ARQ scheme for 3G evolution," in Proc. IEEE VTC'01-Spring, vol. 1, Rhodes, Greece, May 6-9, 2001, pp. 628-632.
[63] J. Huang, R. A. Berry, and M. L. Honig, "Wireless scheduling with hybrid ARQ," IEEE Transactions on Wireless Communications, vol. 4, pp. 2801-2810, Nov. 2005.
[64] G. E. Øien, H. Holm, and K. J. Hole, "Impact of channel prediction on adaptive coded modulation performance in Rayleigh fading," IEEE Transactions on Vehicular Technology, vol. 53, pp. 758-769, May 2004.
[65] K. J. Hole and G. E. Øien, "Spectral efficiency of adaptive coded modulation in urban microcellular networks," IEEE Transactions on Vehicular Technology, vol. 50, pp. 205-222, Jan. 2001.
[66] D. V. Duong, G. E. Øien, and K. J. Hole, "Adaptive coded modulation with receive antenna diversity and imperfect channel knowledge at receiver and transmitter," IEEE Transactions on Vehicular Technology, vol. 55, pp. 458-465, Mar. 2006.
[67] S. Vishwanath and A. J. Goldsmith, "Adaptive turbo-coded modulation for flat-fading channels," IEEE Transactions on Communications, vol. 51, pp. 964-972, June 2003.
[68] D. L. Goeckel, "Adaptive coding for time-varying channels using outdated fading estimates," IEEE Transactions on Communications, vol. 47, pp. 844-855, June 1999.
[69] Q. Liu, S. Zhou, and G. B. Giannakis, "Cross-layer combining of adaptive modulation and coding with truncated ARQ over wireless links," IEEE Transactions on Wireless Communications, vol. 3, pp. 1746-1755, Sept. 2004.
[70] Q. Liu, S. Zhou, and G. B. Giannakis, "Queuing with adaptive modulation and coding over wireless links: cross-layer analysis and design," IEEE Transactions on Wireless Communications, vol. 4, pp. 1142-1153, May 2005.
[71] Q. Liu, S. Zhou, and G. B. Giannakis, "Cross-layer scheduling with prescribed QoS guarantees in adaptive wireless networks," IEEE Journal on Selected Areas in Communications, vol. 23, pp. 1056-1066, May 2005.
[72] H. Zheng and H. Viswanathan, "Optimizing the ARQ performance in downlink packet data systems with scheduling," IEEE Transactions on Wireless Communications, vol. 4, pp. 495-506, Mar. 2005.
[73] L. Caponi, F. Chiti, and R. Fantacci, "A dynamic rate allocation technique for wireless communication systems," in Proc. IEEE ICC'04, vol. 7, Paris, France, June 20-24, 2004, pp. 4263-4267.
[74] L. B. Le, E. Hossain, and A. S. Alfa, "Service differentiation in multirate wireless networks with weighted round-robin scheduling and ARQ-based error control," IEEE Transactions on Communications, vol. 54, pp. 208-215, Feb. 2006.
[75] A. Maaref and S. Aïssa, "Combined adaptive modulation and truncated ARQ for packet data transmission in MIMO systems," in Proc. IEEE Globecom'04, vol. 6, Dallas, TX, Nov. 29-Dec. 3, 2004, pp. 3818-3822.
[76] A. Maaref and S. Aïssa, "Exact closed-form expression for the bit error rate of orthogonal STBC in Nakagami fading channels," in Proc. IEEE VTC'04-Fall, vol. 5, Los Angeles, CA, Sept. 26-29, 2004, pp. 1243-1247.
[77] A. Maaref and S. Aïssa, "Capacity of space-time block codes in MIMO Rayleigh fading channels with adaptive transmission and estimation errors," IEEE Transactions on Wireless Communications, vol. 4, pp. 2568-2578, Sept. 2005.
[78] A. Maaref and S. Aïssa, "Performance analysis of orthogonal space-time block codes in spatially correlated MIMO Nakagami fading channels," IEEE Transactions on Wireless Communications, vol. 5, pp. 807-817, Apr. 2006.
[79] B. Collins and R. Cruz, "Transmission policy for time varying channel with average delay constraints," in Proc. Allerton Conf. Commun.
Control and Comput.'99, Monticello, IL, Oct. 1999, pp. 709-717.
[80] R. A. Berry and R. G. Gallager, "Communication over fading channels with delay constraints," IEEE Transactions on Information Theory, vol. 48, pp. 1135-1149, May 2002.
[81] D. Rajan, A. Sabharwal, and B. Aazhang, "Delay and rate constrained transmission policies over wireless channels," in Proc. IEEE Globecom'01, vol. 2, San Antonio, TX, Nov. 25-29, 2001, pp. 806-810.
[82] D. Rajan, A. Sabharwal, and B. Aazhang, "Delay-bounded packet scheduling of bursty traffic over wireless channels," IEEE Transactions on Information Theory, vol. 50, pp. 125-144, Jan. 2004.
[83] A. T. Hoang and M. Motani, "Buffer and channel adaptive modulation for transmission over fading channels," in Proc. IEEE ICC'03, vol. 4, Anchorage, AK, May 11-15, 2003, pp. 2748-2752.
[84] A. T. Hoang and M. Motani, "Decoupling multiuser cross-layer adaptive transmission," in Proc. IEEE ICC'04, vol. 5, Paris, France, June 20-24, 2004, pp. 3061-3065.
[85] M. Goyal, A. Kumar, and V. Sharma, "Power constrained and delay optimal policies for scheduling transmission over a fading channel," in Proc. IEEE INFOCOM'03, vol. 1, Mar. 30-Apr. 3, 2003, pp. 311-320.
[86] H. Wang and N. B. Mandayam, "Opportunistic file transfer over a fading channel under energy and delay constraints," IEEE Transactions on Communications, vol. 53, pp. 632-644, Apr. 2005.
[87] H. Wang and N. B. Mandayam, "A simple packet-transmission scheme for wireless data over fading channels," IEEE Transactions on Communications, vol. 52, pp. 1055-1059, July 2004.
[88] E. Uysal-Biyikoglu, A. E. Gamal, and B. Prabhakar, "Energy-efficient packet transmission over a wireless link," IEEE/ACM Transactions on Networking, vol. 10, pp. 487-499, Aug. 2002.
[89] B. Prabhakar, E. Uysal-Biyikoglu, and A. E. Gamal, "Energy-efficient transmission over a wireless link via lazy packet scheduling," in Proc. IEEE INFOCOM'01, vol. 1, Anchorage, AK, Apr. 22-26, 2001, pp. 386-394.
[90] A. Fu, E. Modiano, and J. N.
Tsitsiklis, "Optimal transmission scheduling over a fading channel with energy and deadline constraints," IEEE Transactions on Wireless Commu-nications, vol. 5, pp. 630-641, Mar. 2006. [91] A. Fu, E. Modiano, and J. Tsitsiklis, "Optimal energy allocation for delay-constrained data transmission over a time-varying channel," in Proc. IEEE INFOCOM'03, vol. 2, San Francisco, CA, Mar. 30-Apr. 3, 2003, pp. 1095-1105. [92] R. Negi and J. M . Cioffi, "Delay-constrained capacity with causal feedback," IEEE Transactions on Information Theory, vol. 48, pp. 2478-2494, Sept. 2002. [93] P. Nuggehalli, V. Srinivasan, and R. R. Rao, "Energy efficient transmission scheduling for delay constrained wireless networks," IEEE Transactions on Wireless Communica-tions, vol. 5, pp. 531-539, Mar. 2006. [94] J. D. Choi, K. M . Wasserman, and W. E. Stark, "Effect of channel memory on retrans-mission protocols for low energy wireless data communications," in Proc. IEEE ICC '99, vol. 3, Vancouver, BC, June 6-10, 1999, pp. 1552-1556. [95] D. Zhang and K. M . Wasserman, "Energy efficient data communication over fading channels," in Proc. IEEE WCNC'00, vol. 3, Chicago, IL, Sept. 23-28, 2000, pp. 986-991. BIBLIOGRAPHY 236 [96] , "Transmission schemes for time-varying wireless channels with partial state ob-servations," in Proc. IEEE INFOCOM'02, vol. 2, June 23-27, 2002, pp. 467-476. [97] L. A. Johnston and S. Krishnamurthy, "Opportunistic file transfer over a fading channel: a POMDP search theory formulation with optimal threshold policies," IEEE Transac-tions on Wireless Communications, vol. 5, pp. 394-405, Feb. 2006. [98] A. Ekbal, K.-B. Song, and J. M . Cioffi, "QoS-constrained physical layer optimization for correlated flat-fading wireless channels," in Proc. IEEE ICC'04, vol. 7, Paris, France, June 20-24, 2004, pp. 4211 - 4215. [99] A. T. Hoang and M . Motani, "Buffer and channel adaptive transmission over fading channels with imperfect channel state information," in Proc. IEEE WCNC'04, vol. 
3, Atlanta, GA, Mar. 21-25, 2004, pp. 1891-1896. [ 100] R. Gallager, "A perspective on multiaccess channels," IEEE Transactions on Information Theory, vol. 31, no. 8, pp. 124-142, Mar. 1985. [101] A. Ephremides and B. Hajek, "Information theory and communication networks: An unconsummated union," IEEE Transactions on Information Theory, vol. 44, pp. 2416-2434, Oct. 1998. [102] I. E. Telatar and R. G. Gallager, "Combining queueing theory with information theory for multiaccess," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 963-969, Aug. 1995. [103] J. F. Hayes and T. V. J. G. Babu, Modeling and analysis of telecommunications networks. Hoboken, NJ: John Wiley & Sons, 2004. [104] M . Zorzi, "Data-link packet dropping models for wireless local communications," IEEE Transactions on Vehicular Technology, vol. 51, pp. 710-719, July 2002. BIBLIOGRAPHY 237 [105] V. Frost and B. Melamed, "Traffic modeling for telecommunications networks," IEEE Communications Magazine, vol. 32, pp. 70-81, Mar. 1994. [106] A. Adas, "Traffic models in broadband networks," IEEE Communications Magazine, vol. 35, pp. 82-89, July 1997. [107] Y. H. Kim and C. Kwan, "Performance analysis of statistical multiplexing for hetero-geneous bursty traffic in an ATM network," IEEE Transactions on Communications, vol. 42, pp. 745-753, Feb./Mar./Apr. 1994. [108] W. Turin and M . Zorzi, "Performance analysis of delay-constrained communications over slow Rayleigh fading channels," IEEE Transactions on Wireless Communications, vol. l,pp. 801-807, Oct. 2002. [109] A. Lombardo, G. Morabito, and G. Schembra, "An accurate and treatable Markov model of MPEG-video traffic," in Proc. IEEE INFOCOM'98. [110] W. Willinger, M . S. Taqqu, R. Sherman, and D. V. Wilson, "Self-similarity through high-variability: statistical analysis of ethernet L A N traffic at the source level," IEEE/ACM Transactions on Networking, vol. 5, pp. 71-86, Feb. 1997. [ I l l ] M . Mushkin and I. 
Bar-David, "Capacity and coding for the gilbert-elliott channels," IEEE Transactions on Information Theory, vol. 35, pp. 1277-1290, Nov. 1989. [112] M . Sajadieh, F. R. Kschischang, and A. Leon-Garcia, "A block memory model for cor-related raylegh fading channels," in Proc. IEEE ICC'96, vol. 1, Dallas, TX, June 23-27, 1996,pp. 282-286. [113] E. N . Gilbert, "Capacity of a burst-noise channels," Bell Syst. Tech. J., vol. 39, pp. 1253-1266, Sept. 1960. [114] E. O. Elliott, "Estimates of error rates for codes on burst-noise channels," Bell Syst. Tech. J., vol. 42, pp. 1977-1997, Sept. 1963. B I B L I O G R A P H Y 238 [115] M . Zorzi and R. R. Rao, "On channel modeling for delay analysis of packet commu-nications over wireless links," in Proc. 36th Annual Allerton Conf.'98, Monticello, IL, Sept. 1998, pp. 526-535. [116] W. Turin, Performance Analysis and Modeling of Digital Transmission Systems. New York, NY: Kluwer Academic/Plenum Publishers, 2004. [117] H. S. Wang and N . Moayeri, "Finite-state Markov channel- a useful model for radio communication channels," IEEE Transactions on Vehicular Technology, vol. 44, no. 1, pp. 163-171, Feb. 1995. [118] H. Bischl and E. Lutz, "Packet error rate in the noninterleaved Rayleigh fading channel," IEEE Transactions on Communications, vol. 43, pp. 1375-1382, Feb./Mar./Apr. 1995. [119] M . Zorzi and R. R. Rao, "ARQ error control for fading mobile radio channels," IEEE Transactions on Vehicular Technology, vol. 46, pp. 445-455, May 1997. [120] C. Tan and N . Beaulieu, "On first-order Markov modeling for the Rayleigh fading chan-nel," IEEE Transactions on Communications, vol. 48, pp. 2032-2040, Dec. 2000. [121] H. Liu and M . Zarki, "Performance of H.263 video transmission over wireless channels using hybrid ARQ," IEEE Journal on Selected Areas in Communications, vol. 15, pp. 1775-1786, Dec. 1997. [122] Q. Zhang and S. Kassam, "Finite-state Markov model for Rayleigh fading channels," IEEE Transactions on Communications, vol. 47, pp. 
1688-1692, Nov. 1999. [123] C. Pimentel, T. H. Falk, and L. Lisboa, "Finite-state Markov modeling of correlated Rician-fading channels," IEEE Transactions on Vehicular Technology, vol. 53, pp. 1491-1501, Sept. 2004. BIBLIOGRAPHY 239 [124] J. Lu, K. B. Letaief, and M . L. Liou, "Robust video transmission over correlated mobile fading channels," IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, pp. 737-751, Aug. 1999. [125] C.-D. Iskander and P. T. Mathiopoulos, "Fast simulation of diversity Nakagami fad-ing channels using finite-state Markov models," IEEE Transactions on Broadcasting, vol. 49, pp. 269-277, Sept. 2003. [126] M . D. Yacoub, J. E. V. Bautista, and L. G. de Rezende Guedes, "On higher order statistics of the Nakagami-m distribution," IEEE Transactions on Vehicular Technology, vol. 48, pp. 790-794, May 1999. [127] M . D. Yacoub, C. R. C. M . da Silva, and J. E. V. Bautista, "Second-order statistics for diversity-combining techniques in Nakagami-fading channels," IEEE Transactions on Vehicular Technology, vol. 50, pp. 1464-1470, Nov. 2001. [128] C. D. Iskander and P. T. Mathiopoulos, "Analytical level crossing rates and average fade durations for diversity techniques in Nakagami fading channels," IEEE Transactions on Communications, vol. 50, pp. 1301-1309, Aug. 2002. [129] L. Yang and M.-S. Alouini, "Average level crossing rate and average outage duration of generalized selection combining," IEEE Transactions on Communications, vol. 51, no. 12, pp. 1997-2000, Dec. 2003. [130] W. Turin and R. van Nobelen, "Hidden Markov modeling of flat fading channels," IEEE Journal on Selected Areas in Communications, vol. 16, pp. 1809-1817, Dec. 1998. [131] M . Tiichler, A. C. Singer, and R. Koetter, "Minimum mean squared erorr equalization using a priori information," IEEE Transactions on Signal Processing, vol. 50, pp. 673-683, Mar. 2002. BIBLIOGRAPHY 240 [132] A. K. Karmokar, D. V. Djonin, and V. K. 
Bhargava, "Optimal and suboptimal packet scheduling overtime-varying flat fading channels," IEEE Transactions on Wireless Com-munications, vol. 5, pp. 446-457, Feb. 2006. [133] D. V. Djonin, A. K. Karmokar, and V. K. Bhargava, "Optimal and suboptimal schedul-ing over time-varying flat fading channels," in Proc. IEEE ICC '04, vol. 2, Paris, France, June 20-24, 2004, pp. 906-910. [134] A. K. Karmokar, D. V. Djonin, and V. K. Bhargava, "Delay constrained rate and power adaptation over correlated fading channels," in Proc. IEEE Globecom '04, vol. 6, Dallas, TX, Nov. 29-Dec. 3, 2004, pp. 3448-3453. [135] A. K. Karmokar and V. K. Bhargava, "Optimal packet scheduling over correlated Nakagami-m fading channels with different diversity-combining technique," in Proc. IEEE Globecom '05, vol. 3, St. Louis, MO, Nov. 28-Dec. 2, 2005, pp. 1217 - 1222. [136] , "Optimal packet scheduling using adaptive M - Q A M and orthogonal STBC in MIMO Nakagami-m fading channels," in Proc. IEEE ICC'06, Istanbul, Turkey, June 11-15,2006. [137] D. V. Djonin, A. K. Karmokar, and V. K. Bhargava, "Joint rate and power adaptation for type-I hybrid ARQ systems over correlated fading channels under different buffer cost constraints," IEEE Transactions on Vehicular Technology, in press, accepted on Feb. 19, 2006. [138] A. K. Karmokar, D. V. Djonin, and V. K. Bhargava, "POMDP based coding rate adap-tation for hybrid ARQ systems over fading channels with memory," IEEE Transactions on Wireless Communications, vol. 12, pp. 3512-3523, Dec. 2006. [139] A. K. Karmokar and V. K. Bhargava, "Coding rate adaptation for hybrid ARQ sys-tems over time varying fading channels with partially observable state," in Proc. IEEE ICC'05, vol. 4, Seoul, Korea, May 16-20, 2005, pp. 2797-2801. BIBLIOGRAPHY 2 4 1 [140] , "Adaptive coding and modulation for hybrid ARQ systems over partially observ-able Nakagami-m fading channels," in Proc. IEEE Globecom '06, San Francisco, CA, Nov. 27-Dec. 1,2006. [141] A. K. Karmokar, D. V. 
Djonin, and V. K. Bhargava, "Cross-layer rate and power adap-tation strategies for IR-HARQ systems over fading channels with memory: a SMDP-based approach," IEEE Transactions on Communications, submitted for peer review on Feb. 21, 2006, revised on Dec. 6, 2006. [142] , "Delay-aware power adaptation for incremental redundancy hybrid ARQ over fading channels with memory," in Proc. IEEE ICC'06, Istanbul, Turkey, June 11-15, 2006. [143] M . K. Simon and M.-S. Alouini, Digital Communication over Fading Channels, 2nd ed. New York, NY: John Wiley & Sons, 2005. [144] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004. [145] M . L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Program-ming. New York, NY: John Wiley & Sons, 1994. [146] D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd ed. Belmont, M A : Athena Scientific, 2001. [147] B. L. Fox and D. M . Landi, "An algorithm for identifying the ergodic subchains and transient states of a stochastic matrix," Comm. ACM, vol. 2, pp. 619-621, 1968. [148] E. Altman, Constrained Markov Decision Processes: Stochastic Modeling. London, UK: Chapman and Hall/CRC, 1999. [149] D. P. Bertsekas and J. N . Tsitsiklis, Neuro-Dynamic Programming. Belmont, M A : Athena Scientific, 1996. B I B L I O G R A P H Y 242 [150] J. Lu, K. B. Letaief, J. C.-I. Chuang, and M . L. Liou, "M-PSK and M - Q A M BER com-putation using signal-space concepts," IEEE Transactions on Communications, vol. 47, no. 2, pp. 181-184, Feb. 1999. [151] S. M . Alamouti, "A simple transmit diversity technique for wireless communications," IEEE Journal on Selected Areas in Communications, vol. 16, no. 8, pp. 1451-1458, Oct. 1998. [152] V. Tarokh, H. Jafarkhani, and A. R. Calderbank, "Space-time block codes from orthog-onal designs," IEEE Transactions on Information Theory, vol. 45, pp. 1456-1467, July 1999. [153] K. Cho and D. 
Yoon, "On the general BER expression of one- and two-dimensional amplitude modulations," IEEE Transactions on Communications, vol. 50, pp. 1074-1080, July 2002. [154] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Upper Saddle River, NJ: Prentice Hall, 1995. [155] T. S. Rappaport, Wireless Communications: Principles and Practice, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2002. [156] H. V. Poor and X . Wang, "Iterative (turbo) soft interference cancellation and decoding for coded CDMA," IEEE Transactions on Communications, vol. 47, pp. 1046-1061, July 1999. [ 157] H. V. Poor and S. Verdu, "Probability of error in mmse multiuser detection," IEEE Trans-actions on Information Theory, vol. 43, pp. 858-871, May 1997. [158] J. G. Proakis, Digital Communications, 4th ed. New York, NY: McGraw Hill , 2000. [159] S. Russel and P. Norvig, Artificial Intelligence: A Modern Approach, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 2003. B I B L I O G R A P H Y 243 [160] L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989. [161] J. R. Yee and E. J. Weldon, "Evaluation of the performance of error-correcting codes on a Gilbert channel," IEEE Transactions on Communications, vol. 43, pp. 2316-2323, Aug. 1995. [162] M . Littman, A. Cassandra, and L. Kaelbling, "Learning policies for partially observable environments: scaling up," in Proc. of the Intern. Conf. on Machine Learning'05, San Francisco, CA, 1995, pp. 362-370. [163] C. H. Papadimitriou and J. N . Tsitsiklis, "The complexity of Markov decision pro-cesses," Mathematics of Operations Research, vol. 12, no. 3, pp. 441-450, 1987. [164] O. Madani, S. Hanks, and A. Condon, "On the undecidability of probabilistic planning and related stochastic optimization problems," Elsevier Artificial Intelligence, vol. 147, no. 1-2. [165] I. Nourbakhsh, R. Powers, and S. 
Birchfield, "DERVISH: An office-navigating robot," in AI Magazine '95, vol. 16, 1995, pp. 53-60. [166] R. Simmons and S. Koenig, "Probabilistic navigation in partially observable envi-ronments," in Proc. of the Intern. Joint Conf. on Artificial Intelligence '95, Montreal, Canada, Aug. 1995, pp. 1080-1087. [167] G. C. C. Jr. and J. B. Cain, Error-Correction Coding for Digital Communications. New York, NY: Springer-verlag, 1981. APPENDICES 244 Appendices APPENDIX A. M A R K O V DECISION PROCESS 245 APPENDIX A Markov Decision Process The Markov process is a framework for modeling uncertainty and describes how the state of a system evolves. It rests upon two key assumptions. First assumption is that the system can be described probabilistically. That is, each possible state of the system can be identi-. tied, and there exist a well-defined, stationary probability distributions that describe how the system can change as a consequence of each action. The second assumption is known as the "Markov assumption", where the future is assumed to be independent of the past conditioned upon knowledge of the present. Once current state is known, the past is of no predictive value. Markov decision process (MDP), also referred to as stochastic dynamic programming, sequen-tial stochastic optimization, or stochastic control problems, are models for sequential decision making when outcomes are uncertain. A MDP is defined through the following ingredients: a set of decision-epochs (also called stage) T = {1,2, • • • , H}, a state space S = {si, s2, • • • , ss}, a set of actions bi = {u\, u2, • • • , uu} a set of state and action dependent immediate costs for minimization problems or rewards for maximization problems Q : K i—> R, and a set of state and action dependent transition proba-bilities V : fC >-> 9(5). Set JC = {(s, u) : s G <S, u G Us} to be the set of state-action pairs and Q(S) represents the set of discrete probability distributions over the set S. 
For a discrete-time MDP, a decision epoch is equal to a discrete time-slot. However, for a semi-Markov decision process (SMDP), a decision epoch may consist of several time-slots. Suppose that, at each decision epoch $n \in \mathcal{T}$, the system occupies a state $s^n = s_i$ and the decision maker selects an action $u^n = u$ from the set of actions $\mathcal{U}_{s_i}$ available in state $s_i \in \mathcal{S}$, where $\bigcup_{s_i \in \mathcal{S}} \mathcal{U}_{s_i} = \mathcal{U}$. After an action $u$ is selected, the system moves to the next state $s_j \in \mathcal{S}$ according to the probability distribution $p_{s_i,s_j}(u) = p(s^{n+1} = s_j \mid s^n = s_i, u^n = u)$ and the decision maker incurs a one-step immediate cost $G(s^n, u^n) = \sum_{s^{n+1} \in \mathcal{S}} g(s^n, u^n, s^{n+1})\, p(s^{n+1} \mid s^n, u^n)$. The selection of an action $u^n$ may depend on the current state, the current time-slot, and the available information about the history of the system.

A decision rule of the MDP prescribes a procedure for action selection in each state at a specified time-slot. Let $\mu^n$ denote a decision rule at time-slot $n$; then $\mu^n : \mathcal{S} \mapsto \mathcal{U}_s$. A policy (also called a control law) $\pi$ specifies the decision rule to be used at all decision epochs, i.e., $\pi = \{\mu^1, \mu^2, \ldots, \mu^H\}$. Let $\Pi$ denote the set of all admissible policies $\pi$. We assume that the immediate cost and the transition probabilities do not vary with the time-slot $n$. A policy is called stationary if it does not vary with the time-slot $n$, i.e., $\mu^n = \mu$, $\forall n \in \mathcal{T}$. A stationary policy has the form $\pi = \{\mu, \mu, \ldots, \mu\}$; for brevity it is denoted by $\mu$. The expected long-term average cost per stage with stationary policy $\mu$ is given by

\[ G^{\mu} = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G(s^n, \mu(s^n))\}. \tag{A.1} \]

When the Markov chain induced by the policy $\mu$ is ergodic, we have

\[ G^{\mu} = E\{G(s, \mu(s))\}. \tag{A.2} \]

The policy $\mu^*$ that minimizes the average cost per stage (A.1) over all stationary policies in $\Pi$ is called the optimal policy, and the corresponding optimal cost per stage is given by

\[ G^{*} = \min_{\mu \in \Pi} G^{\mu}. \tag{A.3} \]

For a finite MDP, it is known that given any history-dependent policy, there exists a Markov policy, dependent only on the current state, with the same average cost. So, it is sufficient to restrict attention to Markov policies while seeking an optimal policy [145]. If the state spaces are finite, the costs are bounded, and the system is stationary (i.e., the system equation, the cost per stage and the transition probabilities do not change from one stage to the next), the average cost per stage problem can be solved using dynamic programming (DP) techniques, such as the relative value iteration (RVI) and policy iteration (PI) algorithms [145], [146]. The Bellman equation for the average cost per stage MDP, for $s_i = s_1, s_2, \ldots, s_S$, is given by [146]

\[ \min_{u \in \mathcal{U}_{s_i}} \left[ \sum_{j=1}^{S} p_{s_i,s_j}(u)\, \lambda(s_j) - \lambda(s_i) \right] = 0, \quad \text{and} \tag{A.4} \]

\[ \lambda(s_i) + h(s_i) = \min_{u \in \mathcal{U}_{s_i}} \left[ G(s_i, u) + \sum_{j=1}^{S} p_{s_i,s_j}(u)\, h(s_j) \right], \tag{A.5} \]

where $\lambda(s_i)$, $s_i \in \mathcal{S}$ is the gain or average cost per stage, $h(s_i)$, $s_i \in \mathcal{S}$ is the relative cost for each state, and $\mathcal{U}_{s_i} \subset \mathcal{U}$ is the set of allowable actions in state $s_i$. For recurrent and unichain MDPs, the gain is constant, i.e., $\lambda(s_i) = \lambda$, $\forall s_i \in \mathcal{S}$, and hence (A.4) vanishes.

Since the objectives of a multi-objective problem are often conflicting in practice, we are interested in the trade-off between the objectives. A problem with conflicting objectives can be solved in two ways, namely, by formulating it as an unconstrained MDP (UMDP) or as a constrained MDP (CMDP). In a UMDP, we form the immediate cost $G(s, \mu(s)) = \sum_{k=1}^{K} \beta_k G_k(s, \mu(s))$ as a weighted combination of the different costs to be optimized and formulate the problem as an average cost per stage problem as in (A.1), where the weighting factors $\beta_k$, $k = 1, 2, \ldots, K$ are determined by the relative importance of the objectives. The optimal policy $\mu^*$ can be found using any of the DP techniques mentioned above.
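As a minimal sketch of the weighted-cost UMDP and of the Bellman equation (A.5), the following Python fragment combines two cost tables with weights and runs a relative value iteration on a two-state toy problem. The function names, the data layout (`P[s][u]` as a list of next-state probabilities, `G[s][u]` as a cost) and all numbers are assumptions made for illustration only; they are not taken from the thesis, and the sketch presumes a unichain, aperiodic model.

```python
def weighted_cost(costs, weights):
    """Return G[s][u] = sum_k weights[k] * costs[k][s][u] (the UMDP immediate cost)."""
    return [
        {u: sum(b * g[s][u] for b, g in zip(weights, costs)) for u in costs[0][s]}
        for s in range(len(costs[0]))
    ]


def backup(P, G, h):
    """One Bellman backup: min over u of G(s,u) + sum_j p(s,j|u) h(j), with its minimizer."""
    values, policy = [], []
    for s in range(len(P)):
        q = {u: G[s][u] + sum(p * x for p, x in zip(P[s][u], h)) for u in P[s]}
        best = min(q, key=q.get)
        values.append(q[best])
        policy.append(best)
    return values, policy


def relative_value_iteration(P, G, ref=0, eps=1e-9, max_iter=100_000):
    """Return (gain, relative costs h, greedy policy); assumes a unichain, aperiodic MDP."""
    h = [0.0] * len(P)
    for _ in range(max_iter):
        Th, policy = backup(P, G, h)
        diff = [t - x for t, x in zip(Th, h)]
        if max(diff) - min(diff) < eps:      # span of (Th - h) is small: stop
            return Th[ref] - h[ref], h, policy
        h = [t - Th[ref] for t in Th]        # subtract the reference value so h[ref] = 0
    raise RuntimeError("no convergence within max_iter")


# Hypothetical two-state, two-action example (all numbers are made up).
P = [
    {0: [0.9, 0.1], 1: [0.1, 0.9]},
    {0: [0.1, 0.9], 1: [0.9, 0.1]},
]
G1 = [{0: 1.0, 1: 0.0}, {0: 1.0, 1: 2.0}]   # e.g. a power-like cost
G2 = [{0: 0.0, 1: 0.0}, {0: 4.0, 1: 4.0}]   # e.g. a delay-like cost
G = weighted_cost([G1, G2], [1.0, 1.0])     # beta_1 = beta_2 = 1
gain, h, policy = relative_value_iteration(P, G)
```

Changing the weights passed to `weighted_cost` shifts the trade-off between the two costs and may change the greedy policy that is returned.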
In a CMDP, one type of cost (called the objective cost) is minimized while keeping the other types of costs (called the constrained costs) below some given bounds. Therefore, our objective is to find the optimal stationary policy $\mu^*$ that solves

\[ \min_{\mu} \; G_1^{\mu} \tag{A.6} \]

\[ \text{subject to:} \quad G_k^{\mu} \le \bar{G}_k, \quad k = 2, \ldots, K. \tag{A.7} \]

In the above, $G_1^{\mu}$ and $G_k^{\mu}$ denote the expected long-term averages of the immediate costs $G_1(s^n, \mu(s^n))$ and $G_k(s^n, \mu(s^n))$, $k = 2, \ldots, K$, under stationary policy $\mu$, and are given respectively by

\[ G_1^{\mu} = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G_1(s^n, \mu(s^n))\}, \quad \text{and} \quad G_k^{\mu} = \lim_{H \to \infty} \frac{1}{H} \sum_{n=1}^{H} E\{G_k(s^n, \mu(s^n))\}, \tag{A.8} \]

and $\bar{G}_k$, $k = 2, \ldots, K$ denote the bounds corresponding to the constrained costs. In Sections A.1, A.2 and A.3 below, the dynamic programming algorithms for solving the UMDP and CMDP are discussed.

A.1 Relative Value Iteration (RVI) Algorithm for UMDP

The UMDP problem can be solved using the relative value iteration algorithm to obtain the deterministic optimal policy. The relative value iteration algorithm for a unichain DT-MDP can be described as follows:

1. Set $k = 0$, tolerance $\epsilon > 0$ and initial value $h^{(0)}(s_i) = 0$, $\forall s_i \in \mathcal{S}$.

2. Select a state $s_r$ as a reference state.

3. Find $h^{(k+1)}(s_i)$, $\forall s_i \in \mathcal{S}$ from the following equation:

\[ h^{(k+1)}(s_i) = (Th^{(k)})(s_i) - (Th^{(k)})(s_r), \quad \forall s_i \in \mathcal{S}, \tag{A.9} \]

where the mapping $T$ has the form

\[ (Th^{(k)})(s_i) = \min_{u \in \mathcal{U}_{s_i}} \left[ G(s_i, u) + \sum_{j=1}^{S} p_{s_i,s_j}(u)\, h^{(k)}(s_j) \right]. \tag{A.10} \]

4. Find $\underline{c}^{(k)} = \min_{s_i} [(Th^{(k)})(s_i) - h^{(k)}(s_i)]$ and $\bar{c}^{(k)} = \max_{s_i} [(Th^{(k)})(s_i) - h^{(k)}(s_i)]$. If $\bar{c}^{(k)} - \underline{c}^{(k)} < \epsilon$, go to the next step. Otherwise increment $k$ by 1 and return to step 3.

5. Find the optimal average cost using

\[ \lambda = (Th^{(k)})(s_r) \tag{A.11} \]

and the optimal policy using

\[ \mu^{(k)}(s_i) = \arg\min_{u \in \mathcal{U}_{s_i}} \left[ G(s_i, u) + \sum_{j=1}^{S} p_{s_i,s_j}(u)\, h^{(k)}(s_j) \right], \quad \forall s_i \in \mathcal{S}. \tag{A.12} \]

A.2 Policy Iteration (PI) Algorithm for UMDP

The UMDP problem can also be solved using the policy iteration algorithm to obtain the deterministic optimal policy. The policy iteration algorithm for a unichain DT-MDP can be described as follows:

Initialization: Set $k = 0$ and select an arbitrary initial policy $\mu^{(0)}$.

Policy Evaluation: Select a state $s_r$ as a reference state and put $h^{(k)}(s_r) = 0$. Find $\lambda^{(k)}$, $h^{(k)}(s_i)$, $i = 1, 2, \ldots, S$ by solving the following set of equations:

\[ \lambda^{(k)} + h^{(k)}(s_i) = G(s_i, \mu^{(k)}(s_i)) + \sum_{j=1}^{S} p_{s_i,s_j}(\mu^{(k)}(s_i))\, h^{(k)}(s_j), \quad \forall s_i \in \mathcal{S}. \tag{A.13} \]

Policy Improvement: Find a new improved policy $\mu^{(k+1)}$ using

\[ G(s_i, \mu^{(k+1)}(s_i)) + \sum_{j=1}^{S} p_{s_i,s_j}(\mu^{(k+1)}(s_i))\, h^{(k)}(s_j) = \min_{u \in \mathcal{U}_{s_i}} \left[ G(s_i, u) + \sum_{j=1}^{S} p_{s_i,s_j}(u)\, h^{(k)}(s_j) \right], \quad \forall s_i \in \mathcal{S}. \tag{A.14} \]

Termination: If $\mu^{(k+1)} = \mu^{(k)}$, the algorithm terminates; otherwise, the process is repeated with $\mu^{(k+1)}$ replacing $\mu^{(k)}$.

A.3 Linear Programming (LP) Algorithm for CMDP

The CMDP problem formulated above can be solved using the equivalent linear programming (LP) methodology described in [145]. It can be shown that there is a one-to-one correspondence between feasible (and hence optimal) solutions of the LP and feasible (and hence optimal) solutions of the CMDP. The LP is feasible if and only if the CMDP is feasible [148]. Let $\nu(s, u)$ represent the "steady-state" probability that the process is in state $s$ and action $u$ is applied. We seek the control policy represented in terms of the probability distribution $\nu$ over $\mathcal{S} \times \mathcal{U}$. The optimal $\nu^*$ can be obtained by solving the linear program

\[
\begin{aligned}
\text{minimize} \quad & \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} G_1(s, u)\, \nu(s, u) \\
\text{subject to:} \quad & \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} G_k(s, u)\, \nu(s, u) \le \bar{G}_k, \quad k = 2, \ldots, K, \\
& \sum_{u \in \mathcal{U}_{s'}} \nu(s', u) - \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} p_{s,s'}(u)\, \nu(s, u) = 0, \quad \forall s' \in \mathcal{S}, \\
& \sum_{s \in \mathcal{S},\, u \in \mathcal{U}_s} \nu(s, u) = 1; \quad \nu(s, u) \ge 0, \ \forall s \in \mathcal{S} \text{ and } u \in \mathcal{U}_s.
\end{aligned}
\tag{A.15}
\]

Suppose there exists an optimal solution $\nu^*$ to the LP problem. Then there exists a stationary policy $\mu^*$ that is optimal for the CMDP problem.
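To make the role of the steady-state probabilities $\nu(s, u)$ concrete, the sketch below does not solve the linear program (that would need an LP solver); it only evaluates the LP quantities for one given randomized stationary rule. The rule `theta[s][u]`, the helper names and the two-state numbers are hypothetical and chosen purely for illustration, and the chain induced by the rule is assumed to be ergodic.

```python
def stationary_distribution(P_mu, tol=1e-12, max_iter=100_000):
    """Power iteration for pi = pi * P_mu (assumes an ergodic chain)."""
    n = len(P_mu)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P_mu[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("no convergence within max_iter")


def occupation_measure(P, theta):
    """nu(s,u) = pi(s) * theta[s][u], where pi is stationary for the induced chain."""
    n = len(P)
    P_mu = [
        [sum(theta[s][u] * P[s][u][j] for u in theta[s]) for j in range(n)]
        for s in range(n)
    ]
    pi = stationary_distribution(P_mu)
    return {(s, u): pi[s] * theta[s][u] for s in range(n) for u in theta[s]}


def average_cost(G, nu):
    """Long-run average of G under nu: sum over (s,u) of G(s,u) * nu(s,u)."""
    return sum(G[s][u] * prob for (s, u), prob in nu.items())


def balance_residual(P, nu):
    """Largest violation of sum_u nu(j,u) = sum_{s,u} p(s,j|u) nu(s,u) over states j."""
    n = len(P)
    worst = 0.0
    for j in range(n):
        out_mass = sum(prob for (s, _), prob in nu.items() if s == j)
        in_mass = sum(prob * P[s][u][j] for (s, u), prob in nu.items())
        worst = max(worst, abs(out_mass - in_mass))
    return worst


# Hypothetical two-state example (all numbers are made up).
P = [
    {0: [0.9, 0.1], 1: [0.1, 0.9]},
    {0: [0.1, 0.9], 1: [0.9, 0.1]},
]
G1 = [{0: 1.0, 1: 0.0}, {0: 1.0, 1: 2.0}]    # objective cost
G2 = [{0: 0.0, 1: 0.0}, {0: 4.0, 1: 4.0}]    # constrained cost
theta = [{0: 1.0}, {0: 0.5, 1: 0.5}]         # randomizes between two actions in state 1
nu = occupation_measure(P, theta)
objective = average_cost(G1, nu)
constraint_value = average_cost(G2, nu)      # to be compared with a chosen bound
```

A rule is feasible for the constrained problem when `constraint_value` does not exceed the chosen bound; the LP searches over all such $\nu$ at once instead of testing rules one by one.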
The optimal policy $\mu^*$ for the CMDP is randomized and is uniquely characterized by the probability $\theta_{\mu^*(s)}(u)$ of applying action $u \in \mathcal{U}_s$ in state $s \in \mathcal{S}$, where

\[ \theta_{\mu^*(s)}(u) = \frac{\nu^*(s, u)}{\sum_{u' \in \mathcal{U}_s} \nu^*(s, u')} \quad \text{if } \sum_{u' \in \mathcal{U}_s} \nu^*(s, u') > 0. \tag{A.16} \]

If $\sum_{u' \in \mathcal{U}_s} \nu^*(s, u') = 0$ for some $s \in \mathcal{S}$, an action that drives the system to $\mathcal{S}_R = \{s \in \mathcal{S} : \sum_{u' \in \mathcal{U}_s} \nu^*(s, u') > 0\}$ is chosen in each such state [145].

The dual linear program of the above program is given as follows:

\[
\begin{aligned}
\text{maximize} \quad & \phi - \sum_{k=2}^{K} \varpi_k \bar{G}_k \\
\text{subject to} \quad & \phi + v(s) \le G_1(s, u) + \sum_{k=2}^{K} \varpi_k G_k(s, u) + \sum_{t \in \mathcal{S}} p_{s,t}(u)\, v(t), \quad s \in \mathcal{S},\ u \in \mathcal{U}_s,
\end{aligned}
\tag{A.17}
\]

where $\phi \in \mathbb{R}$ and $v : \mathcal{S} \mapsto \mathbb{R}$ are decision variables and $\varpi_k$, $k = 2, \ldots, K$, are nonnegative. In general, LP can handle problems with a large number of variables, and the above linear programs can easily be solved using interior-point methods [144].

APPENDIX B
Example of Weakly Communicating Structure

Figure B.1: An illustration of the weakly communicating structure of the MDP problem.

We consider a simple illustrative example to explain the general structure of the MDP problem. Let the buffer size be $B = 3$ packets, the number of channel states be $C = 2$ and the set of actions be $\mathcal{U} = \{u_1, u_2, u_3\} = \{0, 1, 2\}$. The state spaces of the buffer and channel are $\mathcal{B} = \{b_0, b_1, b_2, b_3\}$ and $\mathcal{C} = \{c_1, c_2\}$ respectively. Let the incoming traffic be constant with packet arrival rate $A = 1$ packet/block. The composite Markov chain then has eight states and is given by $\mathcal{S} = \{s_1, s_2, s_3, s_4, s_5, s_6, s_7, s_8\}$. The first four states correspond to channel state $c_1$ and the last four states correspond to channel state $c_2$. Actions that may lead to packet dropping are considered infeasible. Therefore, the allowable actions in the states that correspond to buffer states $b_0$, $b_1$, $b_2$, and $b_3$ are $\mathcal{U}(b_0) = \{u_1\}$, $\mathcal{U}(b_1) = \{u_1, u_2\}$, $\mathcal{U}(b_2) = \{u_1, u_2, u_3\}$, and $\mathcal{U}(b_3) = \{u_2, u_3\}$.

Now, let us consider two policies, namely, $\mu_a = [u_1\ u_2\ u_2\ u_3\ u_1\ u_2\ u_2\ u_3]$ and $\mu_b = [u_1\ u_2\ u_3\ u_3\ u_1\ u_2\ u_3\ u_3]$. The transition matrices corresponding to policies $\mu_a$ and $\mu_b$ are given in Table B.1 and Table B.2 respectively.

Table B.1: System transition matrix corresponding to policy $\mu_a$

\[
P(\mu_a) =
\begin{bmatrix}
0 & p_{s_1,s_2}(\mu_a) & 0 & 0 & 0 & p_{s_1,s_6}(\mu_a) & 0 & 0 \\
0 & p_{s_2,s_2}(\mu_a) & 0 & 0 & 0 & p_{s_2,s_6}(\mu_a) & 0 & 0 \\
0 & 0 & p_{s_3,s_3}(\mu_a) & 0 & 0 & 0 & p_{s_3,s_7}(\mu_a) & 0 \\
0 & 0 & p_{s_4,s_3}(\mu_a) & 0 & 0 & 0 & p_{s_4,s_7}(\mu_a) & 0 \\
0 & p_{s_5,s_2}(\mu_a) & 0 & 0 & 0 & p_{s_5,s_6}(\mu_a) & 0 & 0 \\
0 & p_{s_6,s_2}(\mu_a) & 0 & 0 & 0 & p_{s_6,s_6}(\mu_a) & 0 & 0 \\
0 & 0 & p_{s_7,s_3}(\mu_a) & 0 & 0 & 0 & p_{s_7,s_7}(\mu_a) & 0 \\
0 & 0 & p_{s_8,s_3}(\mu_a) & 0 & 0 & 0 & p_{s_8,s_7}(\mu_a) & 0
\end{bmatrix}
\]

Table B.2: System transition matrix corresponding to policy $\mu_b$

\[
P(\mu_b) =
\begin{bmatrix}
0 & p_{s_1,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_1,s_6}(\mu_b) & 0 & 0 \\
0 & p_{s_2,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_2,s_6}(\mu_b) & 0 & 0 \\
0 & p_{s_3,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_3,s_6}(\mu_b) & 0 & 0 \\
0 & 0 & p_{s_4,s_3}(\mu_b) & 0 & 0 & 0 & p_{s_4,s_7}(\mu_b) & 0 \\
0 & p_{s_5,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_5,s_6}(\mu_b) & 0 & 0 \\
0 & p_{s_6,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_6,s_6}(\mu_b) & 0 & 0 \\
0 & p_{s_7,s_2}(\mu_b) & 0 & 0 & 0 & p_{s_7,s_6}(\mu_b) & 0 & 0 \\
0 & 0 & p_{s_8,s_3}(\mu_b) & 0 & 0 & 0 & p_{s_8,s_7}(\mu_b) & 0
\end{bmatrix}
\]

The Markov chain corresponding to policy $\mu_a$ is multichain since it has two recurrent classes, i.e., $\mathcal{S}_{R_1} = \{s_2, s_6\}$ and $\mathcal{S}_{R_2} = \{s_3, s_7\}$, and a transient class $\mathcal{S}_T = \{s_1, s_4, s_5, s_8\}$. The Markov chain for this policy is shown in Fig. B.1(a). On the other hand, the Markov chain corresponding to policy $\mu_b$ is unichain as it has a single recurrent class, $\mathcal{S}_R = \{s_2, s_6\}$, and a transient class $\mathcal{S}_T = \{s_1, s_3, s_4, s_5, s_7, s_8\}$. Therefore, the problem forms a weakly communicating MDP.

Using the initial policy $\mu_0 = [u_1\ u_1\ u_1\ u_2\ u_1\ u_1\ u_1\ u_2]$, for $\beta_1 = 1$ and $\beta_2 = 0$, the policy iteration algorithm yields the improved policies $\mu_1 = [u_1\ u_2\ u_2\ u_2\ u_1\ u_2\ u_3\ u_3]$ and $\mu_2 = [u_1\ u_2\ u_3\ u_3\ u_1\ u_2\ u_3\ u_3]$, with optimal policy $\mu^* = \mu_2 = \mu_b$. The Markov chains corresponding to these policies are shown in Fig. B.1(b), B.1(c), and B.1(d) respectively. It can be seen from the Markov chains of these policies that all stationary policies generated in the course of the algorithm are unichain; therefore, the policy iteration algorithm terminates finitely with an optimal stationary policy [146].

For non-constant incoming traffic, it can be shown that if $\max\{a^n\} < \Psi(u_U)$ and $\max\{a^n\} < B$, then there always exist some policies that are multichain. This condition can be verified by considering the following policy: $\mu_{\text{multichain}} = \max_u \{u \in \mathcal{U} \mid \Psi(u) \le \operatorname{mod}(b^n, \Psi(u_U) + 1)\}$, where $\operatorname{mod}(x, y) = x - y \lfloor x / y \rfloor$.

APPENDIX C
Algorithm for Tracking Belief

This algorithm finds the belief for a particular time-slot $n$:

1. Initialize the belief $z^0(s_j)$, $\forall s_j \in \mathcal{S}$ and the action $u^0$, and set time-slot $n = 1$.

2. Find the initial estimate of the belief $z^n(s_j)$, $\forall s_j \in \mathcal{S}$ using (6.10).

3. Find the action $\mu(z^n) : \mathcal{S} \mapsto \mathcal{U}$ optimally using the dynamic programming algorithm for the perfect CSI case, or using any heuristic.

4. Update the belief $z^n(s_j)$, $\forall s_j \in \mathcal{S}$ using (6.11), increase $n$ by 1 and go to step 2 until $n = H$.

APPENDIX D
Algorithm for Determining Optimal Q-function

This algorithm finds a stationary $\epsilon$-optimal action-value function (or Q-function) $Q^*(s_i, u)$:

1. Initialize the relative or differential cost $Q(s_i, u)$ corresponding to each state-action pair $(s_i, u)$, $\forall s_i \in \mathcal{S}$, $u \in \mathcal{U}_{s_i}$, choose the reference state $s_t = s_1$, specify the tolerance $\epsilon > 0$, and set the iteration number $k = 0$.

2. For all state-action pairs $(s_i, u)$, $s_i \in \mathcal{S}$ and $u \in \mathcal{U}_{s_i}$, compute $Q^{(k+1)}(s_i, u)$ using

\[ Q^{(k+1)}(s_i, u) = \sum_{s_j \in \mathcal{S}} p_{s_i,s_j}(u) \left\{ g(s_i, u, s_j) + \min_{u'} Q^{(k)}(s_j, u') \right\} - \min_{u'} Q^{(k)}(s_t, u'). \tag{D.1} \]

3. If $sp(v) < \epsilon$ go to step 4, where

\[ sp(v) = \max[v(s_i, u)] - \min[v(s_i, u)] \tag{D.2} \]

and

\[ v(s_i, u) = \sum_{s_j = s_1}^{s_S} p_{s_i,s_j}(u) \left( g(s_i, u, s_j) + \min_{u'} Q^{(k)}(s_j, u') \right) - \min_{u'} Q^{(k)}(s_i, u'), \quad s_i = s_1, \ldots, s_S,\ u \in \mathcal{U}_{s_i}. \tag{D.3} \]

Otherwise, set $k = k + 1$ and return to step 2.

4. Set the optimal Q-function, $Q^* = Q^{(k+1)}(s_i, u)$, $s_i \in \mathcal{S}$, $u \in \mathcal{U}_{s_i}$, and stop.
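The Q-function iteration above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the thesis implementation: the data layout (`P[s][u][j]`, `g[s][u][j]`), the function name and the toy numbers are hypothetical, the model is assumed unichain and aperiodic, and the stopping rule used here is the span of the change between successive iterates, which is a common choice and may differ in detail from (D.2)-(D.3).

```python
def q_relative_value_iteration(P, g, ref=0, eps=1e-9, max_iter=100_000):
    """Relative Q-value iteration following the update form of (D.1).

    P[s][u][j] is the transition probability and g[s][u][j] the transition cost.
    Returns (gain estimate, Q table, greedy policy).
    """
    n = len(P)
    Q = [{u: 0.0 for u in P[s]} for s in range(n)]
    for _ in range(max_iter):
        v_min = [min(Q[j].values()) for j in range(n)]   # min over u' of Q(s_j, u')
        offset = v_min[ref]                              # min over u' of Q(s_t, u')
        Q_new = [
            {
                u: sum(P[s][u][j] * (g[s][u][j] + v_min[j]) for j in range(n)) - offset
                for u in P[s]
            }
            for s in range(n)
        ]
        change = [Q_new[s][u] - Q[s][u] for s in range(n) for u in P[s]]
        Q = Q_new
        if max(change) - min(change) < eps:              # assumed span-based stopping rule
            gain = min(Q[ref].values())                  # at a fixed point this equals the gain
            policy = [min(Q[s], key=Q[s].get) for s in range(n)]
            return gain, Q, policy
    raise RuntimeError("no convergence within max_iter")


# Hypothetical two-state, two-action example (all numbers are made up).
P = [
    {0: [0.9, 0.1], 1: [0.1, 0.9]},
    {0: [0.1, 0.9], 1: [0.9, 0.1]},
]
G = [{0: 1.0, 1: 0.0}, {0: 5.0, 1: 6.0}]
# Here the transition cost does not depend on the next state: g(s,u,j) = G(s,u).
g = [{u: [G[s][u]] * len(P) for u in P[s]} for s in range(len(P))]
gain, Q, policy = q_relative_value_iteration(P, g)
```

Working with Q-values rather than relative state costs keeps the per-action values available after convergence, so the greedy action in each state can be read off directly from the table.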
