xio(t) \u2014^ s 1, 0i(\u00b0) = k)^(2.43) p(e+i) > xiB(t)^s, L;(0)^< p(o(t+1)^xio(t) ^O(\u00b0)^k),^(2.44) where k^(k J., . . \u2022 , kid \u2022 Proof (2.43) is implied by the fact that the probability that the buffer goes beyond level x at time t + 1 is greater if the buffer store more packets at time t [10]. We now show (2.44). The time evolution O(t) can be written as o(t+i)^(0?) z(t) )+ x(t) , where 4t) is 1 if user i transmits a packet successfully in time slot t and 0 otherwise, 4 t) is the number of packets that arrive during time slot t. Therefore, in order to show (2.44) it suffices to show that the probability of success is lower in the full-load system, i.e. ^ P(Yi = 11Bi(t) = s, BP) = k) > P(Zi = 110? ) = s, O, \u00b0) = k).^(2.45) It should be noted that the success probability of a user decreases when there is one more user that has a packet to transmit (due to (2.7)). Hence, (2.45) clearly holds.^^ Second, we restate Lemma 3 in [10], which is an application of Pakes' Lemma [98]. This Lemma is used to obtain a sufficient condition for the stability of 0( t) , which implies the stability of e ) . Lemma 2.2. For the Markov chain 0? ) define the drift Ji(j) E[0? +1) \u2014 0 (t) 10 (t) = jj. Suppose that the drift Ji(j) < oo for all j and that for some scalar 6 > 0 and integer Tj > 0 we have Ji(j) < \u20148 for all j Then the Markov chain 0? ) has a stationary distribution. For the full-load system the drift Ji (j) of the Markov chain corresponding to the buffer 50 Chapter 2. Optimal Distributed Medium Access Control of user i is independent of j and is given by K Ji(i) tU^frYmaxi E ^\u2022 k=1 ArDi \u2022 \u2022 forymax 1E4^ivA \/II u bi)dFrli (,),(1 ) \u2014. K ui(-yi))E[Vijk users tx, 1A4c] dFK ('YK)\u2022 ^(2.46) From the above equation it can be seen that a sufficient condition for the stability of 0( t) is E 7max f 7ma x< E H uibi) H (1 - tiibi))E[79 i k users tx, lAr] k=1 ArDi \u00b0^1E4^TOAkK dFl ('Yl) ... dFK (-yK ), which is also a sufficient condition for the stability of Bi t) . Therefore a sufficient condition for the stability of the system is = K i=1 ,ui< f0 rraxTh -ymax K . .^E^ui(-yi) fi (1-nibi\u00bbW(k)(1AK) k=0 AK JEAr^j OA r dF1 (71) \u2022 \u2022 \u2022 d Fic (71() \u2022 (2.48) Therefore, the system maximum stable throughput defined as the supremum of all accumu- lative input rates for which the system is stable is lower-bounded by (2.10). 2.9.2 Proof of Theorem 2.2 Let (u1(.),^, u*K (.)) be a solution to (2.12). Suppose that (74(.),\/6(.),^u*K (.)) does not satisfy Theorem 1. Without loss of generality, suppose u*K (.) is randomized in the sense of Def. 2.1, i.e. u*K (7) {0,1} for some non-zero measure set of values of \u2022-y. We will now show that the solution to the optimization problem UK()^ max 00 , -imax) [0\u202211TIC (UT (\u2022) U2(.), \u2022 \u2022 \u2022 7141C-1(.), UK (\u2022)) ^ (2.49) is non-randomized in the sense of Def. 2.2. Indeed, the system throughput when user K (2.47) 51 where 6(7) = EF(71,-,7K -1 uK (7) 0 otherwise { 1 if 8(7) > TK-1 (2.53) Chapter 2. Optimal Distributed Medium Access Control never transmits is given by 1\/. K-1 E E fi u'iTyi) fi (1-u;(-y3))w(k)(1AL, -1) k=0 4,c iEAfec- 1^jcZAIkc\u2014i dlli(71) \u2022 ..dFK-1(7K-1)\u2022^(2.50) TK \u20141 = L7max When user K uses the transmission policy uK(.) the new system throughput is given by TK(UTO)U 4 (.), \u2022 \u2022 \u2022 u*K \u20141(.),UK(\u2022)) = f -Ymaxo UK (y)6(7) + (1 \u2014 uK (7))TK- dFK (7) (2.51) E H^H (1_,L ,i(7i ))(4,(k+1)(1AK_1, 7))]. k=0^jEAlk(-1 (2.52) It is clear that is a Lebesgue measurable transmission policy that maximizes (2.51). Hence, if we assign uK(\u2022) = uK(\u2022) given by (2.53), the system throughput is still maximized. This step can be repeated until (uT u2(.), , uK(.)) only contains non-randomized policies, hence Theo- r m 1. ^ Remark: If S(.) is non-zero almost everywhere, we can claim a stronger result that solution(s) of (2.12) must consist of pure policies only. Intuitively, (5(.) is non-zero almost everywhere means that the system throughput when user i always transmits is almost surely different from the case when user i never transmits . Verifying this condition is hard. Nevertheless, we conjecture that this condition holds for the SINR reception model for CDMA systems. This condition does not hold for the collision model or the MPR reception model without CSI. 52 Chapter 2. Optimal Distributed Medium Access Control 2.9.3 Proof of Theorem 2.3 We follow the same approach as the proof of Theorem 2. Let (uT(.),^, u*K (.)) be a solution to (2.12). We will show that there exists a non-randomized piecewise continuous solution to (2.49). In particular, we will show that the solution (2.53) is a piecewise con- tinuous, non-randomized transmission policy. This can be done by showing that O(7) given by (2.52) is piecewise continuous. Given the channel states of all users, the probability that the only users indexed by Ar-1 transmit is given by P(ilikc-11-yi, = flicAK- 1 ui(yi) 11 4 dAK-1 (1 \u2014 Uj('yi)). Hence S(.) can be rewritten as (2.54)(k+1) HAK -1 7 7)] TK - 1,6(y) =Efr(71,-,7K-1,ALC-1 ) [kij where P(-y1,^,^AkK-1) is the joint distribution of 71,^k,^. As expec- tation is a smoothing operator, (5(7) is piecewise continuous if (2.17) holds. 1=1 2.9.4 Proof of Theorem 2.4 Let (u1(.), \/60,^, u*K (.)) be a solution to (2.12). Without loss of generality we will show that (2.20) holds for u'Ic (.). For (uI(.),u'2\u0300 (.),^,u*K (.)) to be an optimum, it must be the case that u*K ey) 61\u00b1 UK (-y) V7 s.t. 8(7) ^ 0, where uK (7) is given by (2.53) and (5(-y) is given by (2.52). The proof will be complete if we can show that 6(y) < 0 for all 7 > C. Due to the theorem's assumption, L (71'10, . , u*K (.)) > 2+ co. Furthermore, when user K never transmits the system throughput is TK_1 given by (2.50). From (2.50) it is easy to see that TK_1 > rk - 1 > 1 + \u20ac0. For the SINR threshold reception model (2.9) we have ^ kif(k+1)(,.;. K , n,) Iey > +^E^E (^\/Ak \u00b1 +, 1 Y\\ Ej^i 7. ^TEAK 1 <1+ E ley <^\u2014 N \u2014^< 1+ ^jEAkK- 1^ i=1 > 0 ) K -1 1 (7 < 53 Chapter 2. Optimal Distributed Medium Access Control Hence the following inequalities and equality holds: K -1 N 5(y) < 1 +^ I(ry <^- TK -1 i=1 = 1 + K -1 P('Yz >^TK -1 i=1 K -1 5_ 1 + E - TK -1 i=1 1\\1),^N-7y\" < 1 + \u2014 07 \u2014 K _1 < 1 + - -1K-1 = 1 \u00b1 CO TK -113C (By Markov's inequality) < 0.^(2.55) Hence 8(7) < 0 for all ry > C.^ ^ 2.9.5 Proof of Corollary 2.1 Under the assumption that all users have the same channel distribution, (2.10) becomes (2.21) and the optimization problem (2.12) becomes (2.22). Let (uT (.), , \/4(0) be the unique solution of (2.22). Then due to Theorem 2 \/40 is a pure policy for all i = 1, 2, ... , K. Since (2.21) is symmetric with respect to the transmission policies ui (.), , uK 0 it must be the case that any permutation of (14(.), , u*K (.)) is also a global optimum of (2.21). It follows from the uniqueness of the global optimum that u1(.) = u.;(.) for all i, j E 1, . . . , K = u*(.). Then u*(.), which is non-randomized in the sense of Def. 2.2, is the unique optimum of (2.13). ^ 2.9.6 Proof of Corollary 2.2 Similar to the proof of Corollary 2.1, under the assumption that all users have the same channel distribution, the objective function (2.10) becomes (2.21), which is symmetric with respect to the transmission policies of all users. This symmetry implies that if (ul (.), , u*K (.)) is a global optimum of (2.21) then any permutation of (7\/10, , u'i c (.)) will also be a global optimum of (2.21). It then follows that t; (\u2022) = u*(.) for all i E 1, . . . , K, and u*(.) is piecewise continuous due to Theorem 3 and u*(.) is the unique optimum of (2.1 ). ^ 54 Chapter 2. Optimal Distributed Medium Access Control 2.9.7 Proof of Theorem 2.6 Our approach to prove the theorem is as follows: first, we aim to solve for uT(.) without any assumption or constraint on u2(.). Then by symmetry, we claim the same result for 14(.). 1. yl > A ^ > \/3. (2.39) can be rewritten as fA 00 fo \u00b00 (71) 712('Y2)^> + 0 '1 ) + 1(72 > 13 + 13:YArl) - 2) + nib]) + u2 (72)g-y2 > 13)dF('-x2)dF(71). We need to select u1(.) to maximize f (71) (f u2 (72) (1 (72 < N2-1- \u2014 N) +1(1'2 > + 3 21 ) \u2014 2) dF(72) +1) dF(71) 2 For 71 > Awe have N if- -N > 0+0*. Hence I(-y2 < Nai -N)+I(72 > N aic ) 1 and ^fA u2(-y2) (1(72 <^- N) +1(72 > 13 +,3a) _ 2) dF(72) + 1 > 0.^(2.56) Therefore ut(71) = 1 for all 71 > A. Since (2.56) does not require any assumption on u2(.), by symmetry we can claim that u2(72) = 1 for all 72 > A. 2. \/3 <^< A = Airv_130 For 71 in this region we have 0 < Naf - N < + 021\\71, < A and therefore 1(72 < 1V 2,3 - N) + 1(72 > + \/FP - 2 > -1 for all 72 > A. It then follows that u2(72) (1(72 < N - N) + I(-y2 >+ OITIT ) - 2) dF (72 ) + 1 A ^> -2^u2(-y2)dF(-y2) - f U2 (72)dF(72) + 1 )3 A ^ A^cx) > -21 dF(72) - f dF(-y2) + 1 A = 1 e -Ah - 2e -13h fi,\u00b03 It follows that a sufficient condition for the strict optimality of ui (71) = 1 in this 55 Chapter 2. Optimal Distributed Medium Access Control region is (2.41). We can repeat the arguments to show that if ( 2.41) holds u'(.) = 1 for all \u2022 E [13,A]. Lastly, it is clear from (2.39) that tl(-yi) = 0 and u2(72) = 0 for all -yi,y2 E [003]. ^ 56 Chapter 3 Game Theoretic Approach for Medium Access Control 3.1 Introduction The previous chapter has deployed a centralized design approach to optimize the distributed channel-aware MAC protocol in the system throughput sense. This chapter uses a non- cooperative game theoretic approach to analyse the distributed channel-aware MAC pro- tocol.In particular, it is assumed here that each user accesses the common channel using its own transmission policy to maximize its individual expected reward (utility), instead of having a common objective as in the previous chapter. It will be showed that even though users do not cooperate, multiuser diversity is still enhanced, and results in a system throughput that increases with the number of active users in the network. Game theoretic solutions are autonomous, robust and very scalable. Therefore, game theory has been widely used for distributed resource allocation in wireless networks. For example, [46-48] adopted a game theoretic approach for random access in a collision channel model, [45, 48, 49] used the non-cooperative game theoretic formulation for power control. In general, the drawbacks of game theoretic approaches are sub-optimality of the solutions and the fact that it is rarely straightforward to analytically, or numerically compute an equilibrium of a multiuser game. In this chapter, we use a non-cooperative game theo- retic approach to exploit CSI for random access in multipacket reception networks. The above mentioned drawbacks are partly overcome in this chapter, mostly due to a threshold structural result that will be discussed below. Similarly to the previous chapter, the multiaccess system model of this chapter is wireless 57 Chapter 3. Game Theoretic Approach for Medium Access Control ca 21 H 0 7 Channel state Figure 3.1: An example of a channel-aware transmission policy. (7) 21 0 7rnax Channel state Figure 3.2: Threshold structure of the optimal transmission policy proved in this chap- ter: u*(-y) > 0*). This optimal threshold 0* can be estimated using an adaptive gradient ascent stochastic approximation algorithm. At a Nash equilibrium, every user adopts a policy of this threshold structure. networks with the G-MPR model proposed in [10]. Every user has access to its CSI at every transmission time slot and transmits with a certain probability according to its channel- aware transmission policy. An example of a channel-aware transmission policy is given in Fig. 3.1. Every user is selfish and rational and selects a channel-aware transmission policy to maximize its expected reward, without any concern for other users. As there are an infinite number of channel-aware transmission policies, the transmission game has an infinite strategy space. Therefore, the existence of a Nash equilibrium for the transmission game is not at all straightforward. In this chapter, we prove that the best response transmission policy of each user is always threshold in the channel state, and furthermore, there exists a Nash equilibrium profile at which every user adopts a threshold transmission policy. In other words, in order to optimize its utility (and at a Nash equilibrium), a user will transmit with certainty if and only if its channel state is better than a threshold (see Fig. 3.2). These results are intriguing because they imply that in order to find a Nash equilibrium, it suffices to restrict to the class of threshold policies. Furthermore, in order to optimize its transmission policy, a user has to optimize a channel state threshold, i.e. a single scalar, instead of a function. In this 58 Chapter 3. Game Theoretic Approach for Medium Access Control chapter, the optimal channel state threshold is estimated by a stochastic approximation algorithm that can track slowly time-varying systems. In other words, due to the threshold structural results, each user can adaptively learn its part of the equilibrium and the game theoretic MAC solutions can be estimated online, in an adaptive and completely distributed fashion. It has been mentioned above that a drawback of the game theoretic MAC protocol is the suboptimality in the system performance sense. However, it is shown via simulations that the system throughput achieved at the Nash equilibrium is very close to that of the centralized design optimal MAC. Furthermore, the game theoretic formulation has three parameters that can be designed to improve the system performance: the transmission cost, the waiting cost and the success transmission reward. These parameters offer a means to take into account factors such as battery constraints and delay requirements. The waiting cost, for example, can be adjusted to enhance long-term fairness in the network and can be increased to encourage transmission and reduce delay. On the other hand, the game theoretic MAC solutions can be estimated online, in a distributed fashion, where every user adaptively learn its optimal threshold. That is, the computation of game theoretic MAC solutions is much easier. Furthermore, the proposed stochastic approximation algorithm has the tracking capability and virtually does not re- quire any information other than individual instantaneous channel states, costs and rewards. However, it should be noted that, while the algorithm for estimating the best response pol- icy for each user is provably convergent, the emergence of a Nash equilibrium while users learn their optimal policies is not analytically proved (except for the two-user case). Nev- ertheless, in simulations the system always converges to a Nash equilibrium while each user adapts its optimal policy in slowly time-varying systems. Furthermore, in Section 3.7 of this chapter, we also discuss the generalization of the game theoretic MAC results to the case of wireless networks with packet\/user priorities. Packet priority is a means to prioritize specific application functions and important for networks with multiple types of traffics or classes of users, e.g., wireless local area networks (WLANs) or cellular networks. Packet priorities can even be a means of coordination or collaboration which does not require explicit communications between users. 59 Chapter 3. Game Theoretic Approach for Medium Access Control 3.1.1 Main Results The random access channel-aware MAC problem is formulated as a non-cooperative trans- mission game, where each user aims to find a transmission policy to maximize its individual expected reward. The results presented in this chapter include the following. \u2022 If the probability of correct reception of a packet transmitted by a user (player) is a non-decreasing function of its CSI, there exists a threshold transmission policy that maximizes the expected reward of the user (see Figs. 3.1, 3.2). \u2022 There exists a Nash equilibrium at which every user deploys a threshold transmission policy. Therefore, in order to find the optimal policy for a user, or an equilibrium profile of the system, it suffices to consider only threshold policies. \u2022 For the symmetric network model, where users have identical channel state probability distribution function, identical transmission and waiting costs, there exists a symmet- ric Nash equilibrium profile. That is, at the Nash equilibrium, all users deploy the same threshold transmission policy. In addition, the common threshold is increasing in the number of active users. \u2022 Due to the threshold structural result, in order to determine the optimal (best re- sponse) transmission policy for a user, it suffices to estimate only the optimal thresh- old, which is a single scalar. We propose a gradient-based stochastic approxima- tion algorithm with a constant step size for adaptive estimation of the best response transmission policy for any single user without knowing the policies, or the channel probability distribution functions of other users. \u2022 The deterministic best response dynamics algorithm and the emergence of a Nash equilibrium are studied. Convergence of the best response dynamics algorithm is proved for the two-user transmission game. For the general case of a multiuser trans- mission game, the best response functions are explicitly characterized for Rayleigh fading channels. Local asymptotic stability of a Nash equilibrium can then be deter- ministically (but numerically) verified. \u2022 A stochastic best response dynamics algorithm is derived by combining the determin- istic best response dynamics algorithm with the adaptive stochastic approximation algorithm. Specifically, in the stochastic best response dynamic algorithm, at each 60 Chapter 3. Game Theoretic Approach for Medium Access Control iteration a user updates its best response transmission policy using the proposed adaptive stochastic approximation algorithm. Then the system behavior is asymptot- ically identical to the deterministic best response dynamics algorithm. Simulations show that even without this time coordination, the system still converges to a Nash equilibrium. Furthermore, the stochastic best response dynamics algorithm can track variations in the system, such as fluctuations in the number of active sensors, or any other statistical properties. However, it should be noted that the convergence to a Nash equilibrium is not analytically proved, except for the two-user case. Lastly, in Section 3.7, we present the generalization of all the above structural results and algorithms to incorporate packet priorities into the wireless network model. In the numerical studies section, characteristics of the Nash equilibrium and performance of the proposed algorithms are studied, and the system performance at equilibrium is compared to the optimal distributed MAC protocol derived in the previous chapter. 3.1.2 Related Work The framework of this chapter is very similar to that of Chapter 2. In particular, here we also consider exploiting CSI for random access in wireless networks with the generalized multipacket reception (G-MPR) model proposed in [10]. It has been mentioned in the pre- vious chapter that the first MPR model (without CSI) was proposed in [24] and generalized to the case of an asymmetric network model in [4]. Furthermore, [25, 89, 90] studied the performance of the ALOHA random access protocol for MPR wireless networks, where CSI is not available. In [10], the G-MPR model, which is a generalization of the MPR model in [24], was proposed. In the G-MPR model, the correct reception probabilities depend on the channel states of all transmitting users. The G-MPR model explicitly incorporates CSI into the packet reception process and is suitable for exploiting CSI for cross-layer MAC optimization. Additionally, research work related to this chapter include papers on the use of game theory for various wireless network optimization problems. The game theoretic approach has been used to analyse the ALOHA random access protocol in [39, 46-48]. The channel model assumed in [39, 46] was the collision channel model. In comparison, the channel model assumed in [48] was the MPR model proposed in [25], where the number of transmitting 61 Chapter 3. Game Theoretic Approach for Medium Access Control users determines the probabilistic outcome of each transmission time slot. The ALOHA game in [48] was that every user selects its own transmission policy mapping the numbers of users contending the channel to transmit probabilities to maximize its individual utility. The main assumption in [48] was the global knowledge of the number of users contending the channel at every time slot. The authors proved the existence of a symmetric Nash equilibrium for the collision channel model in [46], [39] and the MPR reception model in [48]. CSI was not exploited in these three papers. Besides medium access (or transmission) control, the game theoretic approach has also been used for power allocation in [45, 49, 53], flow and congestion control [51], load balancing [52], pricing [50]. The main differences between the game theoretic transmission scheduling problem in this chapter and early work on game theoretic analysis of ALOHA networks [39, 46-48] are the exploitation of CSI via channel-aware transmission policies, the relaxation of the assumption that all nodes are symmetric, as well as the elimination of the assumption that the number of active users is known globally. In comparison, the most important difference between the work in this chapter and other work on exploiting CSI for optimal transmission control such as in the previous chapter or [10, 17-19], is the non-cooperative game theoretic formulation, which leads to a highly scalable, adaptive transmission scheduling algorithm. In this chapter the Nash equilibrium is learned via a stochastic best response dynamic algorithm, where every user updates its policy using a stochastic approximation algorithm. As it was mentioned previously, stochastic approximation is useful for problem of optimizing an objective function that can only be observed in noise, with respect to some adjustable parameters. Stochastic approximation is widely used in electrical engineering and wireless network optimization, for example see [100-103]. The remaining of the chapter is organized as follows: Section 3.2 describes the wireless network model. The game theoretic MAC formulation is given in Section 3.3. Section 3.4 presents the theoretical results on the structure of the optimal transmission policies as well as the existence of a Nash equilibrium. Section 3.5 analyses the existence of a symmetric Nash equilibrium for the symmetric game. The stochastic approximation algorithm for estimating the optimal transmission policy and the Nash equilibrium is proposed in Section 3.6. In Section 3.7, the structural results and numerical algorithms are generalized to incorporate packet priorities into the network model. Section 3.8 contains numerical examples. Lastly, the conclusions are given in Section 3.9, followed by Section 3.10, which contains most proofs of the analytical results in the chapter. 62 Chapter 3. Game Theoretic Approach for Medium Access Control 3.2 Distributed MAC for Wireless Networks This section describes the wireless network model and the distributed MAC mechanism that will be analysed and optimized using a non-cooperative game theoretic approach in the later sections. 3.2.1 Random Access Wireless Network Model The wireless network model considered in this chapter is almost identical to that in Chap- ter 2. In particular, we consider a wireless network of K users, where K is a finite positive integer. Index the users by i = 1, 2, ... , K. Assume the users communicate with a common base station via non-orthogonal communication channels, e.g., using CDMA with random signature sequences. Assume that time is divided into slots of equal size and transmissions are synchronized at the beginning of each time slot. Let SNR represent the channel state of a user. It should be noted that besides SNR, any parameter that influences the reception of packets at the base station can be chosen as the channel state. For example, the position of a user with respect to the base station can represent its channel state. If SNR represents the channel state, a user can estimate its individual CSI by measuring the strength of a beacon signal broadcast from the base station to all users. Denote the channel state of user i by -y, for all i E {1, 2, ... , K} and assume that -y, is a continuous random variable in [0, ymax], where -ymax E R+ and -ymax < oo. -ymax can be arbitrarily large so that [0, -ymax ] includes the entire range of CSI that is of practical interest. Denote the continuous (channel) probability distribution function of y2 by Fi(.). Furthermore, at the beginning of each transmission time slot, user i is assumed to know its instantaneous CSI, -y i , perfectly. In the network, a user that does not have any packet to transmit is referred to as an inactive user. In contrast, a user with at least one packet to transmit is referred to as an active user. At each time slot, inactive users perform no action while active users must either transmit or wait, i.e. not transmit. The transmit probability of an active user is determined by its instantaneous CSI and its transmission policy. 63 Chapter 3. Game Theoretic Approach for Medium Access Control Define a transmission policy to be a function mapping channel states of a user to its transmission probabilities: ui(\u2022) : [0, -ymax] -4 [0, 1]^ (3.1) for i E {1,..., K}. A transmission policy is sometimes referred to as a transmit probability function. We now define pure, randomized and threshold transmission policies. Definition 3.1. A transmission policy u(.) : [0, -ymax ] -4 [0, 1] is pure if u(y) E {0, 1} for all -y E [0, 7max] except for possibly a zero measure set (with respect to the probability measure F(.) of the channel state -y) of values of 7. Definition 3.2. A randomized transmission policy is a transmission policy that is not pure. Equivalently, a randomized policy is a transmit probability function u(.) : [0,-ymax] [0, 1] such that 0 < u(y) < 1 for some non-zero measure set (with respect to the probability measure F(.) of the channel state) of values of y. Definition 3.3. A threshold transmission policy is a transmission policy u(.) : [0 ,7,,, ax ] \u2014> [0, 1] such that u(7) a.e.^0^< 0 1 Otherwise (3.2) for some 0 E 0 < 0 < ymax . For a precise formulation of the game theoretic MAC optimization problem, we need to define the space of admissible transmission policies. Consider the normed linear function space L00 [0, -ymax], that contains all Lebesgue measurable functions defined on [0, -ymax] that are bounded almost everywhere (a.e.). The norm of a function in Le.3 [0, -ymax] is its essential supremum [93]. Let BL 00 [0 ,7.] [0, 1] be the set of all functions in L00 [0,-ymax] that have norms in the range [0, 1]. Then BL.[O,rymaxi [0, 1] is the admissible transmission policy space. In a transmission time slot, if user i does not transmit a waiting cost wi > 0 is recorded, if it transmits it has to pay a transmission cost ci > 0. At the end of a transmission time slot, if a packet is received correctly the user receives a reward of 1 unit. More conditions on the cost and the derivation of instantaneous and expected rewards for each user are delayed until Section 3.3. The objective of each user in the network is to find a transmission policy mapping channel states to transmit probabilities to maximize its individual expected reward. This distributed MAC problem is formulated as a non-cooperative game in Section 3.3. 64 Chapter 3. Game Theoretic Approach for Medium Access Control 3.2.2 G-MPR Model This chapter assumes the G-MPR model that is proposed in [10]. A short summary of the G-MPR model is given below for notational clarity and to make the chapter self-contained. Section 2.2.2 gives a more detailed description of the G-MPR model. In the G-MPR model, the outcome of a transmission time slot, when k users indexed by Ar transmit, belongs to an event space where each elementary event is represented by a binary k-tuple OAK 05ti : i E Ain , where 19i = {0, 1} and V, = 1 indicates that the packet sent by user i is correctly received and V, = 0 indicates otherwise. The reception capability of the system is described by a set of K functions, where the k-th function 4)(1AE; OAK) assigns a probability to the outcome eArc when k users indexed by Ar with channel state -YAK transmit: 4)(1Ar ; O Ar) P(eAikc lk users transmit, 1,40 (3.3) Equation (3.3) means that the distribution of the possible outcomes {O AK } is determined by the channel states of the transmitting users. It is assumed that the reception model (3.3) is symmetric (as in (2.5). Furthermore, the probability that the transmission from user i is successful depends on the channel states of all transmitting users as specified below: Pi (1'i \":Y4 At\" \u2014i) = Ed,[19 ilk users tx, 1'AK\u2014i], (3.4) where Ar E C2K is the set of transmitting users. For a time slotted CDMA system with matched filter receivers and the SINR threshold reception model [91], (3.4) can be rewritten as: 'Yi = I x( y; > 0).1 + N (3.5) Throughout it is assumed that Oi (eyi , Iiitc _ i ) defined by (3.4) is a Lebesgue measur- 65 Chapter 3. Game Theoretic Approach for Medium Access Control able function of ryi . Besides Lebesgue measurability, the assumptions listed below are also used. These assumptions are satisfied by most non-trivial systems, e.g., the SINR threshold reception model (3.5). 1. The success probability of user i, defined by (3.4), is non-decreasing in its channel state (-7i,^v-yi > <=>E[Vi l k users tx,^\/Ar \u2014i] > E Pi I k users tx, yi, lAkf\u2014ii V7i > ^ (3.6) Some of the analytical results in this chapter can be strengthened if the inequality in (3.6) is strict, i.e. E[i9i I k users tx, -yi ,^_ i] > E[19i I k users tx, -7i ,^Vry, > 7i .^(3.7) Unless it is stated otherwise, it is assumed that (3.6) holds, but not (3.7). 2. The success probability of a user is lowered when one more user transmits. This is also a condition in [36]. (lAr_i,11)) dh > 0 <#.1E[Vilk users tx, 'Yi,^^_ Epiik + 1 users tx, ryi (lAr _i, h)],^(3.8) where h > 0 is the channel state of the additional user. 3. When the channel state of a user is 0, its success probability is 0: 1Gi ( 0, 1Ar_ i ) = 0 ^ (3.9) 3.3 Game Theoretic MAC Formulation The distributed MAC optimization problem can be formulated as a non-cooperative game with a continuous state space as follows: \u2022 The set of players is the set of active users indexed by i = 1, 2, ... , K. 66 Chapter 3. Game Theoretic Approach for Medium Access Control \u2022 At each time slot user i can choose an action ai E A = {W T}, where W means to wait, T means to transmit. A user can also choose to transmit with some probability. \u2022 A strategy is a transmission policy, defined by (3.1). Pure and randomized transmis- sion policies are defined in Definitions 3.1 and 3.2 respectively. Since the space of pure policies is not finite, the existence of a Nash equilibrium is not straightforward. \u2022 A profile is a set of strategies deployed by all users in the network, denoted by u = 0, \u2022 \u2022 \u2022 ) 1,1K( \u2022) } The only remaining element of the non-cooperative game is the expected reward (utility) function of each user. In a transmission time slot, if user i does not transmit a waiting cost wi is recorded, if it transmits it has to pay a transmission cost ci. At the end of a transmission time slot, if a packet is received correctly, the user receives a reward of 1 unit. The instantaneous reward of user i is then determined as follows: ri = 1 \u2014 ci \u2014ci \u2014wi If ai = T, Ai = 1 If ai = T, 19i = 0 If ai = W, i.e. user i did not transmit. (3.10) The condition that ensures a successful transmission is preferable to no transmission, and no transmission is preferable to an unsuccessful transmission is 1 >ci>wi> 0 Vi=1,2,..., K.^ (3.11) By the symmetric property of the reception model (3.3), the expected reward of user i, denoted by Gi (u,(\u2022),{11_,(\u2022)}), can be easily derived: fmax Ili (TO [ Fax^k=1^leAr_i^JoAL\u2018 Gi (ui(\u2022), {u\u2014i .)})fo ,),max EK E H u1(2,1) II (1- uj(Yi)) ^ X (( 1 ci)1PieTi, -7Ar -i) ci(1- Oi(\"Yi, ;;Ar -i)))^dFi (yi)] jOi \u2014 (1 \u2014 ui(ryi))wi dF(-yi) 67 ]x Oi (-xi, --iAr_i) fi dF3 (73 ) - ci + wi dFi('Yi) \u2014 wi. K .1#1: j=1 (3.12) Chapter 3. Game Theoretic Approach for Medium Access Control \/max^\/Max f (-xi) [fo^\u2022 \u2022 \u2022 fo -rmax E E^u ( 1(\/ )^- ui k=1 Akcpi\/EAr_i^ivAr Let T i bi, {u_ 2 (\u2022)}) : [0, 7max ]^[0, 1], i^1, 2, ... , K be a function mapping channel states of user i to the (average) probability of receiving its packet correctly given the policies ^ of other users. W i^{u_ i OD can be calculated from (3.4) to be [K {tt\u2014i(*)})^E E H uieyo fi(1 -uh.obribi, ,-yAr_o \u2022 (3.13) k=1 ArDi\/EAf,c_i^ivAkc For example, for the SINR threshold reception model (3.5) for CDMA systems we have K^7rnax {U\u2014i( * )}) _ E E H u f^\/\/max )^(1- u\/(7\/)) 0^0k=1 ArDi jEAr\u2014i^\/ctpir x I( 1 \u00b1 ^ > 13)^dFi (7j). E^jOijEAr \u2014ti (3.14) Using (3.13), the expected reward of user i given by (3.12) can be rewritten as -rmaxGi(ui(.), {tt_ i (\u2022)}) = fo^Ui (-yi )^{u_i(.)}) \u2014^w2) dFi (yi ) \u2014^(3.15) In the remaining of the chapter, (3.12) and (3.15) are used interchangeably as the utility function of user i. The problem of maximizing the expected utility for user i is: sup^Gi(ui(\u2022), {u_i(\u2022)}), u i(.)EBL oo [0,,. ] [0,11 (3.16) where Gi (ui(-), fu_i(-)1) is given by (3.12) and B.Lco [0 ,..),.] [0, 1] is defined in Section 3.2.1. The optimization problem (3.16) completes the formulation of the non-cooperative game. The remaining of the chapter is dedicated to analysing the existence and structure of Nash equilibrium profiles. A Nash equilibrium profile is a profile at which no player can benefit 68 Chapter 3. Game Theoretic Approach for Medium Access Control by unilaterally deviating from its current policy [43],[41]. Definition 3.4. A profile a* =^,u*K01 is a Nash equilibrium if and only if Gieal (.), {u*_ i (.)})^Gi(ui(.), {u* j(.)}) for all players i = 1, 2 . . . , K and ui\u2022) E BL.[0,7max] [0, 1]. It should be stressed that at a Nash equilibrium point (3.16) must hold for all users. One observation that can be made at this point is that the relaxation of the assumption that all users are indistinguishable (as in [24, 25, 89, 90]) by allowing for different channel state distributions, Fi (.), can lead to some unfairness. One possible solution is to design the transmission and waiting costs to guarantee certain fairness requirements. The designing of transmission and waiting costs has not been studied in this chapter. 3.4 Structured Nash Equilibrium Having formulated the problem of optimal decentralized transmission control as a non- cooperative transmission game in the previous section, in this section we study the structure of the Nash equilibrium transmission policies. We follow a common technique in game theory to prove the existence of a structured Nash equilibrium profile. This technique consists of three steps: 1. Showing that a particular class of policies is optimal (Theorem 3.1) 2. Proving the existence of a Nash Equilibrium when the policy space is restricted to this class of policies (Theorem 3.2) 3. Proving the existence of a Nash Equilibrium in the original game by showing that a Nash Equilibrium in the game with the restricted policy space is also a Nash equilib- rium in the original game (Corollary 3.2). Readers are referred to [43], [41] for examples of the early use of this technique in game theory. 69 Chapter 3. Game Theoretic Approach for Medium Access Control 3.4.1 Optimality of Threshold Policies We prove that the utility of a user can always be maximized by a threshold transmission policy. Theorem 3.1. Consider the multipacket reception random access network of K active users described in Section 3.2. Consider the non-cooperative transmission game formulated in Section 3.3, where the problem of optimizing the utility for user i =- 1, 2, ... , K is given by (3.16). Assume the reception model (3.4) of the network satisfies (3.6) and (3.9). There exists a transmit probability function that maximizes user i's expected reward (3.12) and is a threshold policy: for some 0 E a.e^0^< u',;(7) = 1 Otherwise (3.17) Proof. The proof of this theorem is an application of the bang-bang principle, presented in [104]. The objective of user i is to maximize its utility, which is given by (3.15): \u201emax Gi (140, {u_i(*)}) = fo^ui(ryi)^fu_i^\u2014 + wi) dFi (yi) \u2014 wi It can easily be seen that if T i (., fu_i(\u2022)1), defined by (3.13), is Lebesgue measurable then u;(7i) =^1 if Wieyi, {u_i(.)}) \u2014 ci + w i > 0 0 otherwise is a function in BL c.,[0,-ymax][0, 1 ], and (3.18) G(14 (.), {u_i(.)}) = sup 1,2(\u2022)eBLc [0 ,max i.3,-^ G(u,(.), fu_ i (.)1). fo,1] In other words, if Ti(., fu_i OD is Lebesgue measurable then it is clear that the supremum of G(ui(\u2022), {u_i(\u2022)}) is attained in BLoo [0 ,..),.] [0,1] at ui ( ) given by (3.18). It will now be shown that T t e, {u_,(.)}) is indeed Lebesgue measurable. Recall that 70 Chapter 3. Game Theoretic Approach for Medium Access Control {u_ i (.)}) is given by (3.13), i.e. [ K {u\u2014i(*)}) =-E{F_i} E E H ttic-yo fi ( 1 - k=1 AL( Di letlr^jcZAr Due to Lebesgue measurability of the set of functions O i (-yi ,lAfc _ { , } ) and all transmission policies, it is clear that T i (-yi, {u_ i (\u2022)}) is Lebesgue measurable. Therefore, the supremum of G(ui(\u2022), fu_i\u20221) is u:(-yi) given by (3.18). In addition, T i bi ,{tt_ i (-)}) is non-decreasing in -yi due to (3.6), and Ti(0, {u_,(\u2022)}) = 0 due to (3.9). The latter implies that kIi i (0, fu_i(\u2022)1) \u2014 ci + wi < 0. If Wieyma,x , {u_ i (.)})-ci +wi < 0 then it must be the case that T i (-y, {u_i(\u2022)})\u2014c,+w i < 0 V -y E [0, -ymaA. The optimal solution is then not to transmit, i.e. u:(-yi) = 0 V -yi E [0, -yma\u201e]. This function is a threshold function with the threshold being equal to '' max . ^ If Wibmax, {u\u2014z(\u2022)})^wi > 0 there must exist a threshold 0 E [0 , ')\/max] so that i (ry,^fu_i\u20221) \u2014 ci + wi^0 for all -y < 0 and tIf i (-y, {u_i(.)}) \u2014 ci + wi > 0 for all -y > 0. Again, the optimal transmission policy defined by (3.18) is a threshold policy. Hence the proof is complete.^ ^ Remarks: \u2022 It can be noted from the above proof that the optimal transmission threshold for player i is the threshold Oi E [0,-y max] that satisfies <0 Vry< Bi Wie7,{u\u2014i0})^+ wi >0^> 0i \u2022 (3.19) \u2022 If the condition (3.6) is strict, i.e. if (3.7) holds, then Wibi, {u_i(.)}) is strictly increasing in ryi . In this case, there exists a unique 0i that satisfies (3.19). In other words, the transmission policy defined by (3.18) is the unique optimal policy, and its optimality is strict. Hence, Theorem 3.1 can be strengthened that the optimal transmission policy 4(-yi) is unique and of the threshold structure for every user i, given the policy of other users. Corollary 3.1. Consider a time slotted CDMA network of K active users, where K is a 71 { 0^< Oi ui (7i) = 1 Otherwise (3.20) Chapter 3. Game Theoretic Approach for Medium Access Control finite integer, with the SINR threshold reception model (3.5) for matched filter receivers. There exists a threshold transmission policy solution to the optimization problem (3.16) for user i in the network, for any i E {1, 2, ... , K}. Proof. The SINR threshold reception model (3.5) satisfies (3.6) and (3.9). The proof then follows from Theorem 3.1.^ ^ 3.4.2 Existence of a Nash Equilibrium We reformulate the transmission game allowing only transmission policies in the class of threshold policies and prove the existence of a Nash equilibrium for this case. The existence of a Nash equilibrium at which every user adopts a threshold transmission policy for the original game then follows. Reformulation of the non-cooperative transmission game: In light of Theo- rem 3.1, it is clear that in order to maximize the expected reward, it is sufficient for a user to search for its transmission policy in the class of threshold policies. Each threshold trans- mission policy is completely defined by a single parameter: the CSI threshold beyond which the user transmits with certainty. Assuming every user seeks for a transmission policy in the class of threshold policies, the non-cooperative transmission game can be reformulated as below: \u2022 The set of players is the set of users indexed by i 1, 2, ... , K. \u2022 Each player i = 1, 2, ... , K selects a transmission policy in the class of threshold policies defined by (3.2). Equivalently, player i selects a threshold 0,, which determines its transmission policy as: The space of pure policies is then the set of possible values of 0\u201e which is [0, 7\u201enax ]. \u2022 The utility function of user i is obtained from (3.15) by replacing u \u2022 by I(. > 03 ) 72 Chapter 3. Game Theoretic Approach for Medium Access Control for j^1, 2, . . . , K: = f llmax^ 7max> 0i) [fo 'imax 100 E H^> 01)k=1 ArDi \/eAr\u2014i H (1_^> 0i) )0i^H dFi (Y ) \u2014^wil dFi^\u2014 wi (3.21) Then the optimization problem (3.16) can be rewritten as sup^Gi(19i3O_i),^ (3.22) ei E[0,-yma,d where Gi(19i, O_i) is given by (3.21). Having reformulated the transmission game, in Theorem 3.2, we use a classical result on the existence of a Nash equilibrium for games with continuous utility functions ([105], [106], [107], also see Theorem 1.2 in [43]) to prove the existence of a Nash equilibrium for the new game. Theorem 3.2. Consider a multipacket reception random access network of K < oo active users, where the system and reception models are given in Section 3.2. Assume that the users only select their transmission policies in the class of threshold transmission policies. Assume that the reception model (3.4) of the network satisfies (3.6) and (3.9). The non-cooperative transmission game formulated above, where the problem of optimizing the expected reward for user i 1, 2, . , K is given by (3.22), has a Nash equilibrium profile. Proof. See Section 3.10.^ ^ Remark: The assumption that the channel distribution function is continuous is essential in the proof of Theorem 2. If the channel distribution is not continuous it is not guaranteed that the space of pure policy is a compact, convex subset of any Euclidean space and the classical result on the existence of a Nash equilibrium for infinite game with continuous payoff ([105], [106], [107]) cannot be used. Finally, we claim that the original strategic non-cooperative transmission game for- 73 Chapter 3. Game Theoretic Approach for Medium Access Control mulated in Section 3.3, where the players search for their policies in the whole subset BL.[0 ,7.] [0, 1] of the normed linear function space L c\u201e,, also has a Nash equilibrium pro- file. Corollary 3.2. Consider a multipacket reception random access network of K active users described in Section 3.2. Assume that the reception model (3.4) of the network satisfies (3.6) and (3.9). Consider the non-cooperative transmission game formulated in Section 3.3, where each user can select a transmission policy in B Leo [0 ,1,..] [0, 1]. This non-cooperative trans- mission game has a Nash equilibrium at which every user deploys a threshold transmission policy. Proof. According to Theorem 3.2, the transmission game where only threshold transmis- sion policies are considered has a Nash equilibrium. Furthermore, due to the optimality of threshold policies (Theorem 3.1), a Nash equilibrium profile in the transmission game considering only threshold policies must also be a Nash equilibrium profile in the origi- nal transmission game, where a user can use any transmission policy in BL co [0 ,7,..] [0, 1]. The existence of a Nash equilibrium at which every user deploys a threshold policy then follo . ^ Remark: If the inequality (3.6) is strict, i.e. if (3.7) holds, then threshold transmission policies are strictly optimal (see the remark after the proof of Theorem 3.1). As a result, all Nash equilibria can be obtained in the class of threshold policies. In comparison, if (3.7) does not hold, the game may admit a Nash equilibrium outside the class of threshold policies. Furthermore, in this case, it is not immediately clear whether there exists a Nash Equilibrium that is Pareto dominant outside the class of threshold policies. In light of the theoretical results presented in this section, in the remaining of the chapter we only consider transmission policies in the class of threshold policies. In that case, the policy of each user i is defined by a channel state threshold 0, the utility by (3.21). For convenience, we introduce the definition of a Nash Equilibrium channel state threshold vector. Definition 3.5. A channel threshold vector 0* = (01, . , \u00b0k) E [0,-yrna,c ]K constitutes a Nash Equilibrium if and only if, Gi (07, 0* i ) > Gi(0i , 0*_ i )^ (3.23) 74 Chapter 3. Game Theoretic Approach for Medium Access Control for all players i = 1, 2, . . . , K, Oi E [0, 'ymax ] In the next section, we study the existence of a symmetric Nash equilibrium for the symmetric network model where all users have the same channel distribution function, transmission and waiting costs. 3.5 Symmetric Nash Equilibrium for the Symmetric Game This section considers the special case of a symmetric multipacket reception network model, where all users have the same continuous channel distribution function, i.e. F3 (\u2022) = F(.) V j E {1, 2, ... , K}, the same transmission and waiting costs. In light of the op- timality of threshold policies (Theorem 3.1), let the users search for their transmission policies in the class of threshold policies. The existence of a symmetric Nash equilibrium for the symmetric transmission game is proved under a mild condition on the continuity of the success probability function (3.13) (Theorem 3.3). It is also showed that the symmetric Nash equilibrium threshold is a non-decreasing function of the number of active users in the network (Theorem 3.4). We now define the two conditions required by Theorems 3.3, 3.4. Let all users except user i deploy the same threshold transmission policy, i.e. uj('yj) = I(-y3 > 0) V j i, for some 0 E [0, -yinaid. The success probability function, given by (3.13) for user i, becomes K Ti(yi3O)=E {F, E E H^> 0) H (1 -^> 0) )1\/)i (-'i,, lAre _ {j})]. (3.24) Lk=1 Ar Di lEtli j The first condition of Theorem 3.3 is that (3.24) is a continuous function of the CSI That is for i E {1,^, K} lim^0) =^0) V-Yi,^0 E [0,-ymax], Yi^Yo (3.25) Writing^0) in the integral form reveals that the continuity of tpi (yi , -YAr_ fil ) with respect to (w.r.t) -yi is a sufficient condition for (3.25). 75 Chapter 3. Game Theoretic Approach for Medium Access Control The second condition of Theorems 3.3, 3.4 is that (ynaax , 0 ) > C \u2014 W,^ (3.26) which implies that T i (-yffia,\u201e 0) > c \u2014 w for all 0 E [0, 'ymax] since W j (-yi , 0) is non-decreasing in 0 and yi by conditions (3.8) and (3.6), respectively. Intuitively, (3.26) means that if the channel state of a user is -ynia\u201e, which is the maximum value in the channel state space, and the user transmits then it will gain a positive expected reward. This condition is not very restrictive due to the fact that -y max can be arbitrarily large. In addition, it might be desirable that the transmission and waiting costs are adjusted so that (3.26) holds in order to eliminate the possibility that the optimal policy for user i is to never transmit. It should be noted that (3.26) is sufficient (but not necessary) to ensure that u(\u2022) = 0 is not optimal. Theorem 3.3. Consider a symmetric multipacket reception random access network of K active users. Assume that (3.25) and (3.26) are satisfied. There exists a Nash equilib- rium profile at which every user deploys the same threshold transmission policy, i.e. there exists a Nash equilibrium profile a* = ,u*K(.) : u1(yi) = > 0*) V i = 1\u201e 2, ... , K for some 0* E [0, -ymax ). Proof. See Section 3.10^ ^ Theorem 3.4. Consider a symmetric multipacket reception random access network of K active users. Assume that (3.25) and (3.26) are satisfied. Let be a transmission threshold such that aK = fuj(ryi) = > C) : i 1, 2, ... , K} is a Nash equilibrium profile. Consider the same network but with K +1 users. There exists > such that a l(+ 1 =^> e) : i = 1, . . . , K +11 is a Nash equilibrium profile for the transmission game of K +1 players. Proof. See Section 3.10.^ ^ 76 Chapter 3. Game Theoretic Approach for Medium Access Control 3.6 Learning Best Responses via Stochastic Approximation The main contribution of the chapter has been to prove the optimality of threshold transmis- sion policies and the existence of a Nash equilibrium at which every user adopts a threshold transmission policy. In this section, we are interested in the secondary problem of esti- mating a Nash equilibrium for the transmission game. Although there exist many learning algorithms in game theory for estimating a Nash equilibrium [54], proving the convergence of these algorithms is very difficult. Here we focus on the best response dynamics algorithm for estimating a Nash equilibrium and exploit our structural result on the optimality of threshold transmission policies. It is known that if the best response dynamics algorithm (Algorithm 3.1 below) converges, then it converges to a Nash equilibrium [54]. However, apart from the two-user case, we have been unable to give useful sufficient conditions for convergence of Algorithm 3.1. To give more insight into the sufficient condition for conver- gence of Algorithm 3.1 below, we also give an explicit characterization of the best response functions for networks with exponentially distributed channel state (i.e. Rayleigh fading channel). Algorithm 3.1 is the deterministic best response dynamics algorithm. At each iteration of Algorithm 3.1, it is assumed that each user can solve the stochastic optimization problem (3.22) for its best response transmission policy. This optimization problem can be solved efficiently using the stochastic gradient ascent method given by (3.33) in Section 3.6.3. Combining (3.33) and Algorithm 3.1, we propose the stochastic best response dynamics algorithm (Algorithm 3.2), that behaves asymptotically identical to Algorithm 3.1. In the stochastic best response dynamic algorithm, each user uses a gradient estimator and a constant step size to estimate its best response transmission policy via (3.33). Algorithm 3.2 can track the slowly time varying best response dynamic adjustment process for a slowly time varying network. 3.6.1 Deterministic Best Response Dynamics Algorithm First, we restate the utility optimization problem that needs to be solved numerically by each user. Specifically, in order to maximize its individual expected reward, user i must solve the optimization problem (3.22): sup G ,(0,,0_,). 92 E [O,ymax ] 77 Chapter 3. Game Theoretic Approach for Medium Access Control In light of Theorem 3.1 the supremum of G,(0,, 0_,) is attainable and it follows from (3.19) that H,(0_,) = arg max^G, (0,, 0_,) = Oi e [0, -ymax] such that: Ot E[0,-y.] { <0 V-y < 0, Wi eY, 0\u2014i) \u2014 c, + w, >0 by > 0, (3.27) where W,(7, (L i ) is given by (3.13) with ui(.) = I(. > 0i) for all j = 1, 2, ... , K. H,(19_,) defined above is the best response function for user i. H,(0_,) gives the threshold that defines the optimal policy for user i in response to the policies of all other users. In this section, we assume (3.7), which implies that the best response of each player is unique (see the remark after the proof of Theorem 1). H,(0_,) is then a well-defined scalar value function mapping from [0, -ymax ]K-1 to [0, -ymax]. In addition, H,(61_,) is non-increasing in every component of 0_, due to (3.8). The best response dynamics algorithm (see [54] for details) for the transmission game, where the users take turn to update their transmission policies, can be written as below. Algorithm 3.1. Best Response Dynamics of the Transmission Game \u2022 Initialization: At batch n^1, select W(n) E [0, -ynna,c ] K \u2022 For n, = 1, 2, .. . For user i = 1, 2, ... ,K B(n) = H i(e(n) 0 ) \u20141). where 0i(n) = (0. 71)) 3* 0. Lemma 3.3. For a symmetric network i.e. all Rayleigh channels are identically distributed (Ai = A Vi = 1, 2, ... , K), the best response function (3.27) for user i has the following representation for Wi (-y, O_i): K Ilf j('Y 1 e--i) =--- E E H (1 \u2014 e -A91 )I(a > k=1 Ar Di 1 0Ar^JEA -i A EJEAr-i 6i _ e-Aa + It-2 Ale-Aa (a \u2014 EjEAr--i \u00b0i) 1 ) (-1) 1 ! (x e 1=1 \u2022) (I(k = 1) + I(k > 2) (3.30) Here a Nry1,3 \u2014 N, N and are the spreading gain and the QoS requirement parameter of the SINR threshold reception model (3.5) respectively. For an asymmetric network, i.e. the Rayleigh fading channels have different probability density functions: Ai A3 V i j, Wi(y, O_i) in (3.27) is given by K Wi(y, 0_ i ) = E E H (1 _ e -A8I)I(a^E ei ) I(k = 1) + I(k = 2) k=1 Ar Di ioAr \u2014 E Ai ej^\u2014 E Ai e; x^JEAr^E e\u2014Aia) + I(k > 3)e EAr - iGAr-i 80 Chapter 3. Game Theoretic Approach for Medium Access Control -Ai (a- K _ e\u201e) x (1 (-1)k-1 E^11^Am ) (3.31) ^ A \u2014^e^ n , EAri jEALC-i mEAk( -i-j Here a = N-y113 \u2014 N, N and 13 are the spreading gain and the QoS requirement parameter of the SINR threshold reception model (3.5) respectively. Proof See Section 3.10.^ ^ We will use Lemma 3.3 to analytically compute the best response functions for a simple three-player transmission game and verify local asymptotic stability of a Nash equilibrium in Section 3.8. 3.6.3 Stochastic Best Response Dynamics Algorithm In Algorithm 3.1, it is assumed that at each iteration, user i can do the mapping (3.28), i.e. it can solve (3.22) for its best response transmission policy. In this section, we propose an algorithm (Algorithm 3.2 below), where (3.28) in Algorithm 3.1 is replaced by a stochastic gradient ascent algorithm that can estimate the best response transmission threshold for any user i, i.e. numerically solve (3.22). As shown below, the asymptotic behaviour of Algorithm 3.2 is the same as that of Algorithm 3.1. It should be noted that each user needs to estimate its optimal transmission policy without knowing the policies, the channel distributions of other users, or the number of users in the network. Hence, the optimization problem (3.22) is a stochastic optimization problem. Because the gradient descent is an unconstrained algorithm, an important condition for the validity of Algorithm 3.2 is that the optimization problem (3.22) without the constraint 0, E [0, -ymax ] actually has a solution in [0, 7\u201e,a),], or equivalently, T i (\u2022, O_\u201e) changes sign from negative to strict positive at some 9i in [0, for all 01, E [0, -ymax ] K-1 (see (3.19)). A sufficient condition for this fact is ^(M, CI) > c \u2014 w,^ (3.32) where^.) is given by (3.13) with u3 (.) = 1(73 > 0i) V j = 1, 2, ... , K. Similarly to the explanation of condition (3.26) for the case of a symmetric network, T i (., .) is monotone, 81 Chapter 3. Game Theoretic Approach for Medium Access Control \u2014 c + w = \u2014c + w < 0 and (3.32) implies that T i (M,^\u2014 c + w > 0 for all CL.i E [0, 'yrnax ]K-1 . Hence, (3.32) is sufficient to guarantee that the solution of the unconstrained version of (3.22) is in [0, limax]\u2022 Algorithm 3.2. Stochastic Best Response Dynamic of the Transmission Game \u2022 Initialization: Outer loop index n = 1, a batch size A > 1, 0, a constant step size 0 < e < 1 \u2022 For n = 1, 2, . . . For user i = 1, 2, ... , K user i updates its transmission threshold once every m transmission time slots, other users keep their policies fixed, i.e. 19_i = 0(A,n)(1\\^(n (A,T1-1.),\\^)k ^)3 i For Inner loop index 1 = 1,2, ... , A Estimate the gradient of Ve ( t , n)G I ' n) (0 1 n) 19_i) via a gradient estimator (see (3.35) and (3.36)) 9 (\/\u00b11,n)^0 (1,n) + si t, ou ,n)G (1,n) (0(1,n) 9\u2014i)^(3.33) Implementation of Gradient Estimator First, the utility function Gi (0i, 0_,) of user i, given by (3.21), can be written as l'max Gi (0i^>^0\u2014i) \u2014^Wi) dFi (7i ) \u2014 wi l'max f=^(-yi, 6\u2014i) \u2014 +^\u2014 wi , ei (3.34) where T z (-yz ,0_,) is given by (3.13). G i (0,,0_,), given by (3.34), is differentiable almost everywhere by the Fundamental Theorem of Calculus [109]. Furthermore, the gradient of G,(0,, _ i ) is Vet Gi(0i, 0\u2014i) = \u2014 (wi(0i, 0_i) \u2014 ci + wi)fi (0i).^(3.35) 82 Chapter 3. Game Theoretic Approach for Medium Access Control Since fi(0,) can be absorbed into the step size, to estimate the gradient of G ,(61\u201e _ i) we only need to obtain an unbiased estimate of which is the probability that the packet from user i is received correctly given its channel state is equal to O. If the network is static during the period of m transmission time slots of the \/-th iteration, the unbiased estimate of l) (BY ) ,0_ i ) is: 41 (1,n) (e,n) _ i) _= Number of ACK(s) -yi = O i , Number of Tx I -y, = Oi As an example, consider a system where SNR represents the channel state of a user. The probability of success of user i when its channel state is Oi can be sampled using a power control mechanism to ensure that the received SNR is equal to Oi and counting the number of ACK(s) that are sent to the user by the station. Hence a user can obtain the unbiased estimate kIi'n) ((en) , 9_ i ) without knowing the channel distributions or transmission policies of other users. 3.6.4 Properties of Stochastic Best Response Dynamics Algorithm Define a local maximizer of the objective function (3.34) as Bz = {Oi :^O_i) = 0, VT9, Gi(Oi, O_i) < 01.^(3.37) Theorem 3.5. For each user i and a fixed outer loop index n, if A > 1\/ei , the sequence ,1 = 1, 2, ... , A, generated by (3.33) converges in probability (i.e. weakly) to the thresh- old level corresponding to a local optimizer Bi of user i's expected reward Gi(0 i , 0_0 , where = ( (a (A,n) )^(49(A,n-1)^.) )3*