Matrix Methods in Queueing and Dynamic Programming by Bernard Fernand Lamond D.Sc, University de Montreal, 1978 M.Sc, University de Montreal, 1980 A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies Faculty of Commerce and Business Administration We accept this thesis as conforming to the required standard The University of British Columbia December 1985 © Bernard Fernand Lamond, 1985 00 In presenting t h i s thesis i n p a r t i a l f u l f i l m e n t of the requirements for an advanced degree at the University of B r i t i s h Columbia, I agree that the Library s h a l l make i t f r e e l y available for reference and study. I further agree that permission for extensive copying of t h i s thesis for scholarly purposes may be granted by the head of my department or by h i s or her representatives. I t i s understood that copying or publication of t h i s thesis for f i n a n c i a l gain s h a l l not be allowed without my written permission. Department of The University of B r i t i s h Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 J-6 (3/81) Abstract We investigate some modern matrix methods for the solution of finite state stochas-tic models with an infinite time horizon. Markov and semi-Markov decision processes and finite queues in tandem with exponential service times are considered. The meth-ods are based on the Drazin generalized inverse and use matrix decomposition. Unlike the related Jordan canonical form, the decompositions considered are nu-merically tractable and use real arithmetic when the original matrix has real entries. The spectral structure of the transition matrix of a Markov chain, deduced from non-negative matrix theory, provides a decomposition from which the limiting and deviation matrices are directly obtained. The matrix decomposition approach to the solution of Markov reward processes provides a new, simple derivation of the Laurent expansion of the resolvent. Many other basic results of dynamic programming are easily derived in a similar fashion and the extension to semi-Markov decision processes is straightforward. Further, numerical algorithms for matrix decomposition can be used efficiently in the policy iteration method, for evaluating the terms of the Laurent series. The problem of finding the stationary distribution of a system with two finite queues in tandem, when the service times have an exponential distribution, can also be expressed in matrix form. Two numerical methods, one iterative and one using matrix decomposition, are reviewed for computing the stationary probabilities. Job-local-balance is used to derive some bounds on the call congestion. A numerical investigation of the bounds is included. It suggests that the bounds are insensitive to the distribution of the service times. ii Table of Contents Abstract ii List of Tables v List of Figures vi Acknowledgements viii Introduction 1 I. Matrix decomposition and Drazin generalized inverse 3 1.1. Review of linear algebra 4 1.2. The Drazin generalized inverse 14 1.3. Matrix decomposition algorithms 20 1.4. Solution of singular systems of equations 36 1.5. Other algorithms for Drazin inversion 42 II. Nonnegative matrix theory 44 2.1. Matrices and directed graphs 45 2.2. Eigenvalue structure of a stochastic matrix 49 2.3. Limiting and fundamental matrices 53 2.4. Geometric convergence in the aperiodic case 59 III. Discrete time Markov decision processes 63 3.1. Markov reward processes 64 3.2. Singular systems of equations 68 3.3. Computation of gain, bias and other terms 77 iii 3.4. Iterative computation of gain and bias 78 3.5. Markov decision processes 82 IV. Continuous time Markov decision processes 91 4.1. Continuous time Markov reward processes . . . 91 4.2. Algebraic structure of the resolvent 94 4.3. Laurent expansion of the discounted rewards 96 4.4. Equivalent Markov decision processes 98 V. Semi-Markov decision processes 102 5.1. Semi-Markov reward processes 103 5.2. Decomposition of the first moment 106 5.3. Series expansion of the discounted rewards 112 5.4. Semi-Markov decision processes 116 5.5. Three special cases 118 VI. Stationary characteristics of tandem queues 124 6.1. Global balance equations 126 6.2. Computation of the stationary probabilities 128 6.3. The matrix decomposition method 134 6.4. Job local balance and bounds 138 6.5. Performance results 144 Conclusion 153 Bibliography 154 iv List o f Tables 6.1.1. Transition rates out of state (ni, na) 127 6.5.1. Numerical values with A = 1, varying /xi and 145 6.5.2. Numerical values with A = fix = ^ — 1 146 6.5.3. Comparison with hyperexponential service times 151 v List of Figures 1.1.1. Directed graphs of Jfc(A) and Z2/c(A) 12 1.3.1. Compute z = A~1b} where A — PLU 21 1.3.2. Operations counts for Gaussian elimination 23 1.3.3. The slow decomposition algorithm 26 1.3.4. Operations count for the slow decomposition algo-rithm 28 1.3.5. The fast decomposition algorithm 33 1.3.6. Storage considerations for the fast algorithm 34 1.4.1. The slow solution algorithm 37 1.4.2. Algorithm to compute Ui = B^x\ 40 1.4.3. The fast solution algorithm 40 2.1.1. The directed graph G[A) 45 2.1.2. Directed graph of a periodic matrix 48 3.2.1. Directed graph of C(2) 75 3.4.1. Power method for gain and bias 79 3.5.1. Policy iteration method for p-optimal policy 89 3.5.2. Policy iteration method for A: discount optimal policy 89 6.2.1. The power method for stationary probabilities 132 6.2.2. Pascal procedures for transition rates 133 6.3.1. Unstable algorithm for stationary probabilities 136 vi 6.3.2. Modified algorithm for stationary probabilities 138 6.4.1. Failure of job local balance 140 6.5.1. Blocking probability vs service rates 147 6.5.2. Blocking probability contour curves 149 vii Acknowledgements Some of the material in Chapters II and IH is adapted from Lamond and Puterman (1985), and some of the material in Chapter VI is adapted from van Dijk and Lamond (1985). I am grateful to my research advisor, Martin L. Puterman, for his encouragement and patience in reading numerous drafts of all parts of this thesis. My work on numerical algorithms has benefited from discussions with A. Laurie Johnston at the beginning, and more recently from comments by Uri Ascher. My interest in queueing has been stimulated by my joint work with Nico van Dijk, and by frequent discussions with Shelby L. Brumelle. Recent comments by Uriel G. Rothblum and Arthur F. Veinott, Jr. have helped clarify some important ideas concerning the use of generalized inverses in Markov de-cision processes. The numerical algorithms discussed in this thesis have been coded in Pascal. The text of the thesis, as well as the tables and most figures were produced with IgX. The figures involving curves were produced with Tell-a-Graf. Part of this research was done while the author was receiving an NSERC postgrad-uate scholarship, and later an SSHRC doctoral fellowship. Generous support from NSERC Grant A-5527 is also gratefully acknowledged. viii To my parents ix Introduction The six chapters of this thesis can be grouped into three distinct parts. The first part, which consists of Chapters I and II, deals with matrix decomposition. The second part contains Chapters III, IV and V. It explores the applications of the matrix decom-position approach to Markov and semi-Markov decision processes. Finally, Chapter VI is concerned with tandem queueing systems. In Chapter I, a brief review of matrix algebra is given, concentrating on the Jordan canonical form. For a singular matrix, the generalized inverse of Drazin (1958) is presented. A significant improvement of the algorithm of Wilkinson (1982) is obtained, for computing the Drazin inverse of a matrix of index 1. Our fast algorithm uses only one third as many arithmetic operations as the recent shuffle algorithm of Anstreicher and Rothblum (1985). In Chapter II, the structure of the Jordan canonical form of a stochastic matrix is derived from nonnegative matrix theory, using results of Varga (1962). The matrix decomposition approach is then applied to study the limiting and fundamental matrices of Kemeny and Snell (1960), and the deviation matrix of Veinott (1969) and (1974). The approach used is inspired from Campbell and Meyer (1979). In Chapter III, the matrix decomposition results are applied to discrete time Markov decision processes. A new proof of many results of Blackwell (1962) and Veinott (1969) and (1974) is given. In particular, a direct derivation of the Laurent expansion of the resolvent, for a small interest rate, is given. The matrix decomposition algorithm of Chapter I is used in a new policy evaluation method. 1 In Chapter IV, the formulation of a continuous time decision process is reviewed, based on the continuous time Markov chain of Cjnlar (1975). Matrix decomposition is used to show the similarity between continuous time and discrete time processes. An equivalence considered by Howard (I960), Veinott (1969), Schweitzer (1971) and Serfozo (1979) is reviewed. In Chapter V , semi-Markov decision processes are considered. Matrix decomposi-tion is used to obtain a new derivation of many results of Denardo (1971). In particular, the decomposition of the first moment of the transition probability function is derived. A new solution algorithm is also obtained. The models of Chapters HI and IV are formulated as special cases. Also, an equivalence used by Ross (1970) is discussed. In Chapter VI, a queueing system with two service stations in series is formulated, using the notation of Disney and Konig (1985). The numerical computation of the stationary probabilities is investigated. The iterative power method and the matrix decomposition method of Wong et al. (1977) are reviewed. Bounds are introduced, based on results in Hordijk and Van Dijk (1981) and the job-local-balance property. A numerical investigation of the bounds is included. This research suggests that matrix decomposition is a powerful, as well as versa-tile, approach to the study and solution of finite state stochastic systems. Not only does it provide simple derivations and efficient algorithms, it also appears to unify the treatment of many models. 2 Chapter I Matrix decomposition and Drazin generalized inverse In this chapter, we present the mathematical basis for the subsequent chapters. The notions of Jordan canonical form and Drazin generalized inverse are reviewed, and some efficient algorithms are developed, to compute a matrix decomposition and use it as a data transformation to produce the solution of a singular system of equations. In Section 1.1, we review the Jordan canonical form and related well known results. Our approach is similar to that of Golub and Wilkinson (1976), and reflects the general approach that will be followed throughout the thesis, although we will not be concerned with actually computing the Jordan canonical form. Further, as this is to be applied to matrices with real entries, we also introduce a modified Jordan canonical form which involves only real arithmetic, but still retains the basic properties of the original Jordan canonical form. This leads to a new, direct proof of the existence of a real decomposition of a matrix, in a form that provides a suitable representation of the Drazin inverse. The idea of replacing a pair of complex conjugate eigenvalues by a 2 x 2 matrix has been exploited in numerical analysis for computing the eigenvalues of a real matrix by the QR method [see e.g., Golub and Van Loan (1983), Section 7.4]. In Section 1.2, we review the definition of the generalized inverse of Drazin (1958). We derive its representation using a matrix decomposition based on the modified Jordan canonical form. This derivation follows very much that of Wilkinson (1982). In Section 1.3, we review the Gaussian elimination algorithm for computing the 3 PLU decomposition of a matrix [see e.g., Wilkinson (1965)]. Then we specialize the method of Wilkinson (1982) for computing the Drazin inverse of a general matrix, to the special case when the matrix has an index of 1, which is the case of interest in Markov decision processes. The main original contribution here is the fast algorithm, which produces a rep-resentation of the Drazin inverse, using essentially the same amount of arithmetic as for the PLU decomposition of a nonsingular matrix. In fact, its operations count is one third that of the recent shuffle algorithm of Anstreicher and Rothblum (1985) and, in the context of dynamic programming, it is essentially the same as for the method of Howard (1960). Our algorithm, however, is well suited for use in the lexicographic policy iteration method of Miller and Veinott (1969), as we will see in Chapter III. In Section 1.4, it is shown how to use the matrix decompositions to efficiently compute the solution vectors. We call these vectors gain and bias, in anticipation of the dynamic programming applications of Chapter DI. Finally, Section 1.5 reviews some other algorithms which have been studied in the literature. 1.1. Review of linear algebra Let i i , Z 2 , . . . , i m be n-vectors with complex entries. If the system of equations a i i i + a 2 i a + . . . + ctmxm — 0 has the unique solution a i = . . . = am = 0, then the vectors are said to be linearly independent. Otherwise, they are linearly dependent and those with a,- ^ 0 can be expressed as a linear combination of the others. A complex n x n matrix A is said to be singular if its columns are linearly dependent. Otherwise, A is said to be nonsingular. It is well known that A is singular iff its determinant, denoted det(.A), is equal to zero. Let 6 be a given n-vector, called the 4 right-hand side. Then the system of linear equations Ax = b (1.1.1) has a unique solution x = A~lb if the matrix A is nonsingular. The matrix A~l is uniquely determined by A and is called its inverse. Hence a nonsingular matrix is also said to be invertible. On the other hand, if A is singular, the system (1.1.1) does not necessarily have a solution for every right-hand side vector b. A vector b for which there is a solution is said to be compatible with A. A system of equations with singular A and compatible b has infinitely many solutions. The algebraic structure of a complex n x n matrix A is described by its eigenvalues and its eigenvectors. A right-hand eigenvector of A is a nonzero column vector x such that Ax = Ax, for some complex number A. This number A is the eigenvaiue of A associated with x. Similarly, a left-hand eigenvector of A is a nonzero row vector y such that yA = Ay. Let A' be the transposed of the matrix A. Then y' is a right-hand eigenvector of A'. For simplicity, we will use the term eigenvector (alone) to denote a right-hand eigenvector. The above definition implies that an eigenvector x is any nonzero solution of the system of linear equations where / denotes the n x n identity matrix. This system has a nonzero solution x iff the matrix (XI — A) is singular, that is, iff Since det(A/ — A) is a polynomial of degree n in A, A has at least one and at most n distinct eigenvalues. Hence Equation (1.1.2) always has a nonzero solution. Let 5 be a nonsingular matrix and 5 - 1 its inverse. Then a matrix B such that (XI-A)x = 0, (1.1.2) det(A7 - A) = 0 . B = S~lAS (1.1.3) 5 is said to be similar to A, and equation (1.1.3) is a similarity transformation. Much of the material in this thesis will be concerned with similarity transformations which transform a given matrix A into a new matrix B with a more convenient structure. An important property of similarity transformations is that they preserve the eigenvalues of a matrix. Indeed, we have so that det(A7— B) = 0 iff det(A7—A) = 0. In the context of linear systems of equations, a similarity transformation can be seen as a change of variables. Let y = S"1! and c = S~1b. Then Ax = b iff By = c. This last property is important. It says that A and 6 are compatible iff B and c are compatible. When a solution exists, then x is a solution of Ax = 6 iff y is a solution of By = c. This fact will be used extensively in our discussion of Markov decision processes. One structure of much interest is that of a diagonal matrix. We denote by D = diag(d,) a matrix with diagonal entries di, i = l , . . . , n . The following well known lemma indicates when a matrix A is similar to a diagonal matrix. Lemma 1.1.1. A n n x n matrix A is similar to a diagonal matrix iff it has n linearly independent eigenvectors. Proof. Let x\,...,xn be n linearly independent eigenvectors associated each with an eigenvalue A,-, i = l , . . . , n . Then the matrix X = (xi,...,xn) has an inverse X - 1 . Hence we have AX = XA, where A = diag(A,), so that X~lAX = A. This proves the " i f part. For the "only i f part, suppose that S~1AS = D, for some matrices S and D = diag(d,-). Then the n columns of S are linearly independent, and AS = SD implies that the i-th column of 5 is an eigenvector of A associated with the eigenvalue di, i = 1,..., n. • For example, the matrix XI - B = XS'^S - S~lAS = {XI - A)S, 6 has two linearly independent eigenvectors X l = ( l ) a n d X 2 = ( - l ) with eigenvalues Ai = 2 and Aa = 4, respectively, so that \1 -l) \0 A) V0.5 -0.5y ' Observe that a matrix satisfying Lemma 1.1.1 may have an eigenvalue with multi-plicity greater than 1, that is, A,- = Ay for some t ^ j. Some matrices have fewer than n linearly independent eigenvectors. Some have only one. Such a matrix is bound to have at least one eigenvalue with multiplicity greater than 1. An example is the Jor-dan submatrix of order n, denoted by J n (A), with all diagonal entries equal to A and subdiagonal entries equal to 1. Al l other entries are zero. For example, Ji(A) = (A) is diagonal and MX) = / A 0 0 0\ 1 A 0 0 0 1 A 0 Vo 0 1 X) It is easy to verify that any eigenvector of J n (A) is a nonzero multiple of en, the n-th unit vector. The following results illustrate how the Jordan submatrix provides the building blocks for a general matrix decomposition. Lemma 1.1.2. Suppose that A is a singular, complex n x n matrix. Suppose also that there are k nonzero vectors x x , . . . , x* such that Ax\ = 0 (with X\ ^ 0) and Axi = Xi-!, t = 2,...,fc, (1-1-4) where 1 < k < n. Then X\,... ,Xfc are linearly independent. Proof. By induction on k. The result is trivially true for k = 1 (because such an X\ exists since A is singular). Now suppose X i , . . . , Xk-i are linearly independent and that Axk = Xk-i for some nonzero vector x*. If Xfc = QTiXi + . . . + QTfc_iXfc_i 7 then Axk = a.\Ax\ + atzAxi + ... + ak-iAxk-i = 0 + ctzXi + . . . 4- ark-iZfc-a # Xk-l because xi,...,Xk-i are linearly independent. Hence Xk may not be a linear com-bination of x i , . . . ,Xk-i in order to be a solution of Equation (1.1.4), and the result follows. • Lemma 1.1.3. Suppose A is a complex n x n matrix, and let 5 = ( z i , . . . , x „ ) , where linearly independent n-vectors. Then A = SJn(X)S~l (1.1.5) iff the vectors z,- satisfy (A - XI)xn = 0 and (A - XI)xi; = x , + i , t = l , . . . , n — 1 . (1.1.6) Proof. This can be seen directly by writing Equation (1.1.5) as AS = SJn(X) and expanding Jn(X) and S from their definition. For example, with n = 3 we have ( X 0 o\ 1 A 0 I , 0 1 X) which is as required. • Lemma 1.1.4. If A satisfies Equation (1.1.5), then A has exactly one linearly inde-pendent eigenvector. Proof. The vector xn is an eigenvector of A because it satisfies Axn = Xxn. Suppose y is another eigenvector of A, with eigenvalue 7. Then Ay = iy implies that SJn(X)S-1y = 1y. 8 But this implies that Jn(X)z = 72, where z = S some nonzero scalar a, so that y — Sz = aSe n dependent. 1 y , so that 7 = A and z = ae n , for = axn. Hence xn and y are linearly • Lemma 1.1.5. Suppose there is a nonsingular matrix S such that A = S where 1 < k < n — 1, with B an (n — fc) x (n — A;) matrix. Then A has at least two linearly independent eigenvectors. Proof. The matrix B has at least one eigenvector. Let z be an arbitrary eigenvector of B. Then the vectors The next two results are given without proof. See Gantmacher (1959) for a complete derivation of the Jordan canonical form. Lemma 1.1.0. A complex n x n matrix A is similar to J n (A) iff it has exactly one linearly independent eigenvector. Theorem 1.1.7. (Jordan canonical form). An n x n complex matrix A has exactly k linearly independent eigenvectors (1 < k < n) iff it is similar to a block-diagonal matrix B — diag(i?,-) such that = J n . (A,) , for i = 1,..., k, and £ £ = 1 n i = **• For example, are linearly independent, and both are eigenvectors of A. • 1.5 -0.5 -0.5 0.5 1.5 -0.5 -1 0 2 has two linearly independent eigenvectors and y 2 = I -1 9 with eigenvalues Ai = 1 and A 2 = 2, respectively. Solving Equation (1.1.6) with 1 3 = y 2 and A 2 = 2, we get so that A 2 has multiplicity 2. Letting x\ = 3/1 and S = ( x i , x 2 , x 3 ) , we get the Jordan canonical form of A as (1 1 1 W i 0 0W0.5 0 0.5 "\ 1 1 - 1 0 2 0 0 0.5 -0.5 . 1 -1 -I J \0 1 2 J \0.S -0.5 0 J Hence we have B\ = J i ( l ) and 2J2 = ^ 2(2). Some observations are in order here. First, Lemma 1.1.1 is a special case of the above theorem with k = n, while Lemma 1.1.6 is the special case with k = 1. Second, our formulation of Theorem 1.1.7 emphasizes the fact that each Jordan submatrix is associated with exactly one eigenvector of the matrix A. Third, even when all the entries of a matrix A are real, it is possible that some of its eigenvalues be complex, in which case the transformation matrix S of Theorem 1.1.7 has some complex entries too. For example, the matrix -;) has two imaginary eigenvalues X\ = 1 and A 2 = —t, where t 2 = — 1. It should be clear, from our derivation of the Jordan canonical form, that if A is real and if all the eigenvalues are real, then S is real as well, because then the columns of S are solutions of linear systems of equations with real coefficients. Also, since all the coefficients of the characteristic polynomial det(AJ — A) are real, the eigenvalues of the real matrix A which are not real are arranged in complex conjugate pairs. Hence if A + = a +1(3 is an eigenvalue of A, so is A - = o — We use this property to derive a modified Jordan canonical form which has all entries real. First, observe that if x + = u+tv is an eigenvector of A with eigenvalue A + = ct+i/3, then x~ = u — iv is also an eigenvector of A, but with eigenvalue A - = a — 4/3. Then suppose that A has k\ real linearly independent eigenvectors x,- with eigenvalues A,-, 10 i = l,...,ki, and A;2 pairs of complex conjugate linearly independent eigenvectors (xtjX^ -), i = ki + 1,..., ki + fc2. By Theorem 1.1.7, for every complex conjugate pair (x*,x^~) there is a corresponding pair of Jordan blocks (J».-(At), J B . (Ar)) . We describe how such a pair of complex conjugate Jordan blocks can be replaced by a completely real submatrix. Consider a complex eigenvalue A = a + */?. We define the 2 x 2 matrices K and A such that and Lemma 1.1.8. K is similar to A. Proof. Because K = XAX~l, where Z-(* -f) and *-. = (-;£f W ) . (,:,„) • Now we define the block Jordan submatrix Jh(A), where A is a square matrix, so that its subdiagonal blocks are equal to I, the identity. Then, for example, Ji(A) = A and MA) = (A 0 0 0\ I A 0 0 0 / A 0 V 0 0 I A) Lemma 1.1.9. Let K be defined as in Equation (1.1.8). Then Jk(K) is similar to i 2 / c ( A ) , where i 2 f c ( A ) " V 0 Ma-10))' Proof. First observe that Jfc(iiC) is similar to Jfc(A), by Lemma 1.1.8. To see this, take 5 = diag(X f), with Xt = X as in Equation (1.1.10). Then Jk(K) = 5J f c (A)5~ 1 . Next we show that Jfc(A) is similar to 7j 2 *;(A). In fact, Jfc(A) = TZi 2 f c (A)2 , _ 1 , where 11 T is a permutation matrix. To see this, let us associate to each matrix a directed graph as follows. There are 2k nodes, one per row, labelled 1 to 2k, and for every nonzero entry, there is a corresponding arc labelled by the value of that entry. The graphs of Jfc(A) and L^kW are shown in Figure 1.1.1, in which A* denotes the complex conjugate of A. One can convince oneself that the graphs are correct by checking them with k = 2. Jfc(A): £ 2*(A): A A d A* A A k + 1 1 k + 2 1 i r h J L A* A* 2k- 1 2k 2k Figure 1.1.1. Directed graphs of Jfc(A) and X2jt(A). We have •MA) -(;:)-A 0 0 0 0 A* 0 0 1 0 A 0 .0 1 0 A* 12 and (A 0 0 0 \ 1 A 0 0 0 0 A* 0 * 0 0 1 X*J Now both graphs in Figure 1.1.1 are identical, except for the node labels. Hence Jfc(A) is obtained by permuting the rows and columns of L^kW accordingly, which completes the proof. • Theorem 1.1.10. (Modified Jordan canonical form). Suppose A is a real n x n ma-trix. Then A has exactly ki linearly independent real eigenvectors and k% linearly independent complex conjugate pairs of eigenvectors iff it is similar to a block diagonal matrix B = diag(B,) such that = Jn,(A,), for i = l,...,ki, and = Jni(K{)t for t = ki + 1,..., ki + A>2, with ki fci+fca + 2 ]T rii = n. *i+i A = Proof. First apply Theorem 1.1.7 to A. We have the real Jordan blocks B{ = J n . (A,) , for i = 1,...,ki, directly. Then organize the complex conjugate pairs as ^2n,(Ai), for t = ki + 1,..., ki + &2. The result follows because by Lemma 1.1.9, J3,- = Jn.(K~i) is similar to Z^n.CA,). • For example, the matrix 2 - 1 0 0 2 0 0 0 3 - 1 2 - 1 2 - 1 2 0 has only one pair of complex conjugate eigenvectors, with eigenvalues 1 + t and 1 — t, each with multiplicity 2. Then A is similar to (1+4 0 0 0 \ 1 1+t 0 0 0 0 l - i 0 0 0 1 l - i / We have the modified Jordan canonical form A = SJ2(K)S~1, where / l 0 0 0 \ / 1 0 0 0' 5 = 13 and with X = UK) = = XJ2(A)X - l and X'1 = Hence the original Jordan canonical form of A is A = SXTL4(I + o r - 1 * - 1 ^ - 1 , where T is the permutation matrix that interchanges rows 2 and 3. The modified Jordan canonical form effectively contains all the information of the original canonical form, but it is defined with real arithmetic only. For, suppose A = S B S ' 1 where B is as in Theorem 1.1.10, then S is a solution of the equation AS = S B , which has real coefficients only. Hence Theorem 1.1.10 will be useful in establishing the existence of a matrix decomposition suitable for the definition and representation of the Drazin inverse of a real matrix. Moreover, Theorem 1.1.10 suggests an easy way to generate test matrices for evaluating the performance of computer algorithms to compute the Drazin solution of singular equations. [This method has its limitations, however. See Golub and Wilkinson (1976), Page 583]. 1.2. The Drazin generalized inverse Let A be a real n x n matrix such that A K = 0, for some positive integer k. Then A is said to be nilpotent. The smallest such integer k > 1 is called the index of nilpotency of A [see e.g., Gantmacher (1959)]. For example, the index of nilpotency of the Jordan submatrix «/n(0) is n, because Jn{0)n = 0 while J n ( 0 ) n - 1 ^ 0. However, «7n(A) is not nilpotent if A ^ 0. Lemma 1.2.1. Suppose A is a real n x n matrix. Then A is nilpotent iff zero is its only eigenvalue. Moreover, the index of nilpotency of a nilpotent matrix A is the dimension 14 of the largest block of its Jordan canonical form. Proof. First, suppose A is similar to some matrix B. Observe that A is nilpotent iff B is nilpotent. Moreover, if A is nilpotent then it has the same index of nilpotency as B. This is because Ak - SBkS~1, for any k > 0, so that Ak = 0 iff Bk - 0. Next, suppose A has m linearly independent eigenvectors, and consider the block-diagonal matrix B of the Jordan canonical form of A of Theorem 1.1.7. If m = 1, then A is similar to Jn(X) and hence A is nilpotent iff A = 0. Moreover, if A is nilpotent, its index of nilpotency is n. If m > 1, the Jordan blocks are J n i ( A i ) , . . . , Jnm{Xm). Again, A is nilpotent iff Xi = . . . = A m = 0, and if A is nilpotent, its index of nilpotency is max{nt- : i = 1,..., m}. D Now consider a general real n x n matrix A. The index of A, denoted ind(A), is the smallest nonnegative integer k such that rank(A f c + 1 ) = rank(A fc) [see e.g., Campbell and Meyer (1979) and Berman and Plemmons (1979)]. This definition is an extension of the above index of nilpotency. For, if A is nilpotent, then ind(A) is the index of nilpotency of A. If A is nonsingular, then ind(A) = 0. If A is singular but not nilpotent, we have the following decomposition. Theorem 1.2.2. Suppose A is a real n x n matrix which is singular but not nilpotent. Then there is a nonsingular matrix S such that A = s~1{o £)5' ( 1 , 2 , 1 ) where B is a non-singular, real (n — m) x (n — m) matrix and AT is a nilpotent m x m matrix, with 1 < m < n — 1. Moreover, the index of A equals the index of nilpotency of N. Proof. One such decomposition is the modified Jordan canonical form of Theo-rem 1.1.10. The matrix B contains all the modified blocks with nonzero eigenvalues, and N has all the Jordan blocks with eigenvalue zero. The result about the index of A follows because rank(A f c + l ) = rank(A fc) iff Nk = 0. • 15 Hence the index of a singular matrix A is the index of nilpotency of its nilpotent component N, which is equal to the dimension of the largest Jordan block with eigen-value zero. Note that if ind(A) = 1, then N = 0, the zero matrix. This special case will be of particular interest in the analysis of Markov processes. Following Drazin (1958), we say that a real n x n matrix AD is a Drazin generalized inverse, or simply a Drazin inverse, of the n x n matrix A if it satisfies the three following conditions: AD A = AAD (1.2.2) Am = Am+lAD for some m > 0 (1.2.3) AD = (AD)2A. (1.2.4) Drazin's definition, however, is valid in a much more general context (i.e., A may be an element of an arbitrary associative ring or semigroup, not necessarily a finite square matrix). While in a general algebraic structure there may be some element A for which no such AD exist, it is the case that every real square matrix has a Drazin inverse. We prove this as in Wilkinson (1982), after some preliminary lemmas. Lemma 1.2.3. Suppose A = S~1BS and that BD exists. Then AD = S~1BDS. (1.2.5) Proof. It is easy to verify that if B and BD satisfy (1.2.2), (1.2.3) and (1.2.4), then so do A and AD given by (1.2.5). We can see (1.2.2) as follows. AD A = S-lBDSS~1BS = S~1BDBS and AAD = S-1BSS~1BDS = S~1BBDS, so that BDB = BBD => S~lBDBS = S~1BBDS, and the result follows. The other conditions are left as an exercise to the reader. • 16 Lemma 1.2.4. If A is nonsingular, then AD = A 1 is the unique matrix that satisfies (1.2.2), (1.2.3) and (1.2.4). Proof. Because X = A-1 is the unique solution of AX = J, which is Equation (1.2.3) with m = 0. The other conditions are trivial. • Lemma 1.2.5. If A is nilpotent, then AD = 0 is the unique solution of (1.2.2), (1.2.3) and (1.2.4). Proof. It will be convenient, for later results, to use the notation N = A and T — AD. We show that X = 0 is the unique solution of Equation (1.2.4). The other conditions are trivial. We have T = T2N = T(T)N = T(T2N)N = T2(T)N'2 = ... = Tk(T)Nk = 0, where k = ind(i\T). • Lemma 1.2.6. Suppose where B is nonsingular and N is nilpotent. Then AD •(V.S) is the unique solution of (1.2.2), (1.2.3) and (1.2.4). Proof. Suppose A° = {z r ) ' and let k = ind(A). Then (1.2.3) with m = k gives (Bk 0 \ (Bk+1X 0 \ o) \ o oy' so that X = B~l is uniquely determined. Next (1.2.2) gives / / YN\ _( I BY \ \ZB TN)~\NZ NT J ' 17 so that YN = BY (1.2.6) and ZB = NZ. (1.2.7) We use (1.2.6) to show that Y = 0. We have B(YNk-1) = {BY)^-1 = (YN)Nk~l = Y(Nk) = Y{0) = 0, so that YNk-1 = 0 because B is nonsingular. Continuing this way, we get YNk~2 = 0, . . . , YN = 0 and finally, Y = 0. Similarly, we can use (1.2.7) to show that Z = 0. Hence Y and Z are uniquely determined by Equation (1.2.2). Finally, Equation (1.2.4) gives / B~l 0 \ _ (B~x 0 \ V o TJ ~ V o r 2 ^ / ' from which T = 0 is uniquely determined as in Lemma 1.2.5. • Theorem 1.2.7. Suppose A is a real n x n matrix. Then its Drazin inverse AD is unique and is given by ' A~x if A is non-singular, 0 if A is nilpotent, 5 - 1 "^^ o o) ^ otherwise, where B is the nonsingular matrix of Equation (1.2.1). Proof. Immediate from Lemmas 1.2.3, 1.2.4, 1.2.5 and 1.2.6. Uniqueness for the non-singular and nilpotent cases has been shown already. There is an apparent difficulty in the other case, because the decomposition (1.2.1) is not unique. Observe, however, that for a given similarity transformation 5, there is a one-to-one correspondance between A and C = (B £). Hence the uniqueness of CD implies that of AD, by Lemma 1.2.3. • As will be seen in Chapter HI, the Drazin inverse AD will be useful for expressing the solution of the resolvent equations which are encountered in dynamic programming. Let the matrix W be a projection onto the null space of AD. 18 Lemma 1.2.8. If A is decomposed as in Equation (1.2.1), then W = S~1(l °i)S' (1-2.8) Proof. Denote the null space of AD by xm\l(AD). By definition, x € nu\l(AD) iff ADx = 0. Also, W is a projection onto nu\l(AD) if Wx = x for x E n\i\l(AD), ADWy = 0 and WADy = 0 for an arbitrary vector y. Now write x Then ADx = 0 iff -»-,t)-d"r-5-,(J 9 that is, iff uj = 0. Hence Wx=x iff ( z r ) ( « ° 2 ) = ( £ ) ' so that y = 0 and T = I, the identity matrix. The condition A D W y = 0 for arbitrary y is equivalent to ADW = 0, that is (B-1 0 \ (X 0 \ _ / 0 0 \ \ 0 OJ \Z I) ~ \0 OJ ' which implies X = 0. Finally, the condition WM^y = 0 is equivalent to WAD = 0, which implies \z i) \ o oj ~~ o j ' so that Z = 0. Hence the result. • The following useful representation for W is immediate. Corollary 1.2.9. W = I — AAD. Proof. 19 1.3. Matrix decomposition algorithms Throughout the thesis, we will be concerned mainly with matrices A such that ind(A) = 1. Then by Theorem 1.2.2, we have In this special case, the Drazin inverse of A is called its group inverse [see Campbell and Meyer (1979) and Berman and Plemmons (1979)], and is denoted by A#. By Theorem 1.2.7, we have We will consider two algorithms for producing the group inverse of a matrix. Both methods consist essentially of two major steps: first produce a similarity transformation to decompose A as in (1.3.1) and second, invert the matrix B as in (1.3.2). They are specializations of the more general algorithm of Wilkinson (1982) to compute the Drazin inverse AD of a general matrix. Suppose ind(A) = A: > 1. Then Wilkinson's method uses k steps to decompose A as in Equation (1.2.1) and one further step to invert B. Although we will encounter some matrices of index A; > 1 in dynamic programming, we will not need Wilkinson's algorithm in all its generality, because the special structure of the problem will make it possible to break it down into a sequence of subproblems of index 1. Hence there is no need, within the scope of this thesis, for a general algorithm. Before actually discussing the algorithms, it is important to realize that, as pointed out by Wilkinson (1982), there is no need for producing the matrix A * (nor J 5 - 1 and W = I — AA#) explicitly, as we are interested in producing the vectors h = A * b and g = Wb, for a given right hand side vector b. For this purpose, the algorithms will produce a representation of B~l, A* and W in factorized form. Then a solution can be obtained by applying that sequence of factors to the right hand side 6 directly. To illustrate the idea, let us consider a nonsingular matrix A. By applying Gaussian elimination with row interchanges [see e.g., Wilkinson (1965)], we obtain an upper 20 (1.3.1) (1.3.2) triangular matrix U such that A = PLU, (1.3.3) where P is a permutation matrix and L is a unit lower triangular matrix (unit means that its diagonal entries are equal to 1). Typically, only U is obtained explicitly. L is represented by a product of elementary row operations and it is needed only to store the n(n — 1) multipliers of the Gaussian elimination. P is represented by (n— 1) integers giving the rows which were interchanged during the elimination process. Now, to solve the system of equations Ax = 6 and obtain x = A~lb, we use the factorization (1.3.3) to rewrite our system as (PLU)x = b. We can then solve the latter system in three steps, as in Figure 1.3.1. Step 1. Solve Pz = b, giving z = P~xb (permute the entries of 6). Step 2. Solve Ly — z, giving y = L~xz (apply the multipliers of the Gaussian elimination to y). Step 3. Solve Ux = y, giving x = U~xy (use back substitution, producing xn, x n _ i , . . . , i i . Figure 1.3.1. Compute x = A~xb, where A = PLU. Hence we do get without producing A U x , L 1 or P x . In fact, P and L were not produced explicitly either. A simple example will clarify this. Let us solve Ax = 6, where z = U~1L-1P~1b = A~xb 21 Using Gaussian elimination with partial pivoting [the pivot elements are chosen as big as possible, see e.g., Wilkinson (1965)], a matrix A L U and a vector NPIV are produced, such that ( 5 - 2 2 \ -0.4 -1 1 and NPIV = -0.2 CL5 1.1 J The upper part of A L U is the upper triangular matrix U itself, and the underscored entries are the multipliers that were used to produce the zeroes where they are stored. The vector NPIV tells that the rows of A were interchanged as follows: first row 1 and row 2, then row 2 and row 3 [see e.g., Johnston (1982), Chapter 2]. The solution x can then be computed as follows, as in Figure 1.3.1. Step 1. z = P~lb = Step 2. y = L~lz = Step 3. x = U~xy : Since our algorithms will use the P L U decomposition, it is worthwhile to look at the cost of producing the PLU factorization of a matrix and computing the solution for a given right hand side. We consider three types of operations. A flop, short for floating point operation, is a pair of arithmetic operations that consists of a multipli-cation/division and an addition/subtraction (for example Y27=i a*7^« c a n D e computed with n flops). A swap is the interchange of two entries of a vector or a matrix (hence interchanging two rows of an n x n matrix requires n swaps). Finally, a comparison is a relational operation (i.e. <,>). The operations counts for Gaussian elimination, as well as for solving a system of linear equations, are given in Figure 1.3.2. (The comparison operations are done when searching for the largest pivot). For n reasonably large, the terms in n 2 are negligible in the Gaussian elimination procedure. We will use those counts as a reference, to compare with the Drazin inverse algorithms. 22 Gaussian elimination of a nonsingular matrix A = PLU: | n 3 flops +n2 swaps +|n 2 comparisons. Permutations Px or P~lx\ n swaps. Triangular products Lx, L~lx, Ux, U~lx: ^n 2 flops. Solution of RHS z = U-xIrlP-xb: n 2 flops +n swaps. Figure 1.3.2. Operations counts for Gaussian elimination. Before developing the algorithms, however, it will be convenient to derive the following results. Lemma 1.3.1. Suppose A is an n x n real matrix such that rank(A) = n — m, with 1 < m < n, and A = s(Bo Bo)s" <'-3-4> for some nonsingular S, where Bu is an (n — m) x (n — m) matrix. Then Bu is nonsingular iff ind(A) = 1. Proof. Assume first that Bu is nonsingular. Then A = (SV)(B» J ) ( 5 V ) - 1 , where This implies that ind(A) = 1, because rank(A 2) = rank(i4), and the "only i f part is proved. For the "if* part, assume that ind(A) = 1. By Equation (1.3.1), there is a nonsingular matrix U and a nonsingular (n — m) x (n — m) matrix C u such that A-u(°o °o)u-'-By Equation (1.3.4), we have then 2 3 Letting V = S 1U and writing Equation (1.3.5) becomes (Bu Bl2\(X Y\_(X Y \ ( C n 0 \ \ 0 0 ) \Z T) ~ \Z TJ \ 0 Oj' Hence we get BtlX + Bl2Z = XCu (1.3.6) BuY + B12T = 0 0 = ZCu. (1.3.7) Now since Cn is nonsingular, Equation (1.3.7) implies that Z = 0. Hence for V to be nonsingular, both X and T must be nonsingular as well. But then Equation (1.3.6) implies that f?n = XCnX-1, and thus flu is nonsingular. • Theorem 1.3.2. Suppose A is an n x n matrix satisfying Equation (1.3.4) with f?n nonsingular. Then A# = s(B» 5 " ^ 5 l 2 ) 5 - 1 (1.3.8) and W = I-AA* = s(l 5 - 1 . (1.3.9) Proof. As in the first part of the proof of Lemma 1.3.1, let v - (I -B^Bii\ \ ° / )' Then so that by Theorem 1.2.7 and Equation 1.2.8, A* = S V ^ ^V-lS-l=s(^jf BnBTiBi2^s-i 24 and W = SV • Both algorithms will use Gaussian elimination to decompose a singular matrix A as in Equation (1.3.4). But rather than transforming the matrix as in Equation (1.3.1), the Drazin inverse will be represented directly using Theorem 1.3.2. Moreover, the case ind(A) > 1 will be flagged automatically when Bu is found to be singular, by verse A* of a matrix A if ind(A) = 1, and return an error condition when ind(A) > 1. A 2 x 2 example of the latter situation is when Then ind(A) = 2 and Bn = 0. Let us call slow algorithm the first, and most obvious, of our algorithms. It is shown in Figure 1.3.3. Steps 1.1 and 1.2 essentially obtain a representation of Equation (1.3.1), and steps 2.1 and 2.2 obtain a representation of (1.3.2). The reason for calling it the slow algorithm will become clear after we analyze the amount of arithmetic it requires. Before that, let us see what the algorithm does. In Step 1.1, Gaussian elimination is used to compute a P L U factorization of A such that the upper triangular matrix U has m zero rows, with rank(A) = n — m, so that Lemma 1.3.1. Hence our algorithms will produce a representation of the group in-Then in Step 1.2, the matrix B = UPL is such that Moreover, B is similar to A, because A = PLU = {PL)U(PL){PL)-1 = (PL)B(PL) - l 25 Step 1.1. Compute P, L and U such that A = PLU (using Gaussian elimination). Step 1.2. Compute B = UPL (applying P and L to the nonzero rows of U). Step 2.1. Compute R, S and T such that Bu = RST (using Gaussian elimination). (Note: R is permutation, S lower, T upper triangular). Step 2.2. If Bu is singular then "Error: ind(A) > 1" else stop (we have all the necessary information). Figure 1.3.3. The slow decomposition algorithm. Now let X = UP, and write That B has the required form can be seen directly: 0 0 The above argument is well known, and is used by Wilkinson (1982) in particular. It is important to note that while Uu is likely to be singular in general, the matrix Bu is always nonsingular, provided that ind(A) = 1, by Lemma 1.3.1. Hence Step 2.1 simply computes the PLU decomposition of the nonsingular, (n — m) x (n — m) matrix Bu, giving a permutation matrix R, a unit lower triangular matrix S and a nonsingular upper triangular matrix T. For example, let us apply the algorithm to a 3 x 3 matrix A such that ind(A) = 1: ( 1 - l °\ A = 0 0 1 1 . \-l 1 Oj • 26 Using Gaussian elimination, we get A = PLU, where P = I, the identity matrix, 1 0 0 \ (1-10 1=1 0 1 0 ) and P = I 0 0 1 - 1 0 1/ \ 0 0 0 Observe that is singular. Then we produce B = UPL = UL: Finally, using Gaussian elimination we get B = RST, where R = I, Hence B - ^ T - i S - i R - ^ ^ This is all we need in order to compute B&, using Theorem 1.3.2, and A* = {PL)B*(PL)~l. We get 0 -1 l \ / 1 —1 1 B* = | -1 -1 2 and A* = I 1 -1 2 0 0 OJ \-l 1 -1 We should keep in mind, however, that in general we will not be interested in ob-taining the numerical values of B& and The factorization will be applied directly to the right hand side b to compute a solution, using the algorithms of the next sec-tion. The computational effort required by our decomposition algorithm is shown in Figure 1.3.4. 27 Step 1.1. | (n 3 - m3) flops +|n(n - m) swaps +2Ln2 comparisons Step 1.2. |(n — m)n2 flops +(n — m)n swaps Step 2.1. |(n - m)3 flops +|(n — m)2 swaps +|(n — m)2 comparisons Figure 1.3.4. Operations connt for the slow decomposition algorithm. The count for Step 1.1 is for the worst case, that is, when the matrix Un happens to be nonsingular. This is the case when all the zero pivots are obtained in the last m stages of the Gaussian elimination process. Then the whole process is the same as for a nonsingular matrix, except that the last m stages do not need to be done. This takes | m 3 flops off the | n 3 flops required for a nonsingular matrix. Also, Step 1.2 can be implemented such that P and L are applied to each one of the (n — m) rows of U in turn. That is, the elements of row 1 of U are permuted (by P), and the multipliers are applied (i.e., L), giving the first row of B. Then the process is repeated for row 2, and so forth. Finally, Step 2.1 is just the regular Gaussian elimination of a nonsingular (n — m) x (n — m) matrix, and Step 2.2 does not require any arithmetic. Now let us consider the case when m is much smaller than n. This is the case when m = 0 (nonsingular A) and m = 1 (unichain Markov process), for instance, and also whenever there are several states per recurrent subchain, in a dynamic programming problem. With m = 0, the total number of flops is £n 3 + | n 3 + | n 3 = | n 3 flops. This is 3.5 times the number of flops required for applying Gaussian elimination to a nonsingular matrix (i.e., | n 3 flops). This fact justifies the name slow algorithm that we have been using. Our second algorithm, which we call the fast algorithm, will use only | n 3 flops when m is small, the same as when applying Gaussian elimination to a nonsingular matrix. It is based on a modified Gaussian elimination procedure in which not only the rows, but also the columns are interchanged. Hence for a singular matrix A such 28 that rank(A) = n — m, the modified procedure obtains the four matrices P, L, U and Q such that A = PLUQ, where P and Q are permutation matrices, with Lu a unit lower triangular matrix and U\\ a nonsingular upper triangular matrix. Specifically, the Gaussian elimination procedure is modified as follows. Suppose the first zero column is encountered after k elimination steps have been performed. For example, with n = 5 and k = 2, we have a matrix A^ with the following pattern: A<2> = fx x 0 x 0 0 0 0 VO 0 x X 0 0 0 x X X X X X X X x) where the nonzero entries are denoted by x. Now let Q^1) be the permutation matrix that exchanges the columns k + 1 and n. We get A ^ Q M = 0 0 0 VO x x 0 0 0 x x x x x x x x x x X 0 0 OJ and the elimination procedure is resumed. Suppose that the second zero column is encountered after the £-th elimination step. Then exchanges the columns 1+1 and n — 1. Similarly, will exchange the t-th zero column with column n — t + 1. At the end, we have A = PLUQ, where Q = Q M Q W ...Q^mK With n = 5 and m = 2, we have U = fx 0 0 0 VO x X 0 0 0 x X X 0 0 X X X 0 0 x\ X X 0 OJ as required. 29 To illustrate the idea, let us apply modified Gaussian elimination to the 3x3 matrix of the last example. We get A = PLUQ, where P = I, (1 0 - l \ / 1 0 o\ 0 1 0 , L=I 0 1 0 0 0 0 J \-l 0 1J and Q interchanges the columns 2 and 3 of A. Here Un = I is indeed nonsingular. This modified Gaussian elimination procedure PLUQ does exactly the same number of flops as the original P L U procedure, and requires only an extra mn swaps to exchange the columns. But the fact that Un is nonsingular will allow us to find a representation for Bnl which requires the PLU factorization of a small m x m matrix, instead of the large (n — m) x (n — m) matrix Bn itself. The idea is based on the following result of Faddeeva (1959), for inverting a matrix by partitioning. The proof is included for the sake of completeness. Lemma 1.3.3. Let X be a nonsingular n x n matrix and Y its inverse, such that X = ( 1 1 y _ / * n Yi2\ \X2i X22 J \Y-2i Y22 J Suppose also that X\ \ is nonsingular. Then Y22 is also nonsingular and %n ~ * u — Yi2Y2~£Y2\. (1.3.11) Proof. Since X\ 1 is nonsingular, then X is nonsingular iff the matrix / I 0\fXn Xl2\ = fXn Xia \ \—X2iXll IJ \X2i X22/ \ 0 X22 — X2\Xlx X\2) is nonsingular too, which is the case iff (X22 — X2\X^i X12) is nonsingular. Now writing the condition XY = 7, we get XuYn+XnYn^I (1.3.12) ^11^12 + *12*22 = 0 (1.3.13) X2\Yn + -X22 J21 = 0 •^21^12 + X22Y22 — I- (1.3.14) 30 Using (1.3.13), we have Y\2 = —XuX\i,Y22, so that (1.3.14) becomes {X22 — X2iX^_"_l X\2)Y22 = I, which implies that F22 is nonsingular. Using (1.3.13) again, we have -X12 = — XuY\2Y22l. Finally, Equation (1.3.11) follows by substituting X\2 into Equation (1.3.12). • For our fast algorithm, we need to construct a matrix X such that Xu = Bu and for which the inverse Y is easily obtained. Then B^1 can be represented using Equation (1.3.11). We proceed as follows. Suppose that A = PLUQ, and let C = QPL. Then write We want B = (PL)-XA(PL) = UQPL = UC, so that r> _ (Un U\2\ I Cu C i 2 \ _ / Bu Bi2\ V 0 0 J\c2l c22J - \ 0 0 J as in Equation (1.3.4), with S = PL. Now define D such that Then D is nonsingular, because Uu is nonsingular in the PLUQ decomposition, and we have We finally obtain our matrix X = DC, and it is easy to verify that X = (^n • * 1 2 j = ( Bu Bi2\ \X2i X22 J \C2i C22 J ' Now its inverse Y = X~l is obtained directly, because Y = C~lD~l = I - 1 P - 1 Q - 1 I > - 1 . (1.3.15) Theorem 1.3.4. Both Bu and Y22 are nonsingular iff ind(A) = 1. Moreover, Bu1 — Yu ~ Yi2Y2~2^Y2i. 31 Proof. Immediate from Lemmas 1.3.1 and 1.3.3. • We now have all the mathematical tools necessary for constructing our fast algo-rithm. Still, if we implement Equation (1.3.15) to produce the whole matrix Y, it will cost | ( n — m)2n flops to compute /_?""*1 plus | n 3 to do the product by L~l. The total effort for such an algorithm would then be | n 3 flops (including the initial PLUQ decomposition and ignoring the terms of degree less than 3 in n). This is even worse than the long algorithm! The key here is that we need only to produce Y ^ , which can be done as follows: (y«)= L~lp~lQ~l (~^ T Ui2) - (1.3.16) The computations are performed as follows. Back substitution is applied to each of the m columns of t7 1 2 in turn. Then the permutation matrices Q~l and P~l are applied. Finally, the multipliers of the Gaussian elimination are applied. We get even more than we need: we have to compute Y12, in addition to F 2 2 , because the matrix L~l is represented in product form (so that L2\ is not known). Even so, the total cost of computing (1.3.16) is \m[n — m)2 + | m n 2 flops, which is negligible when m is small. Hence for m small, the fast algorithm costs only | n 3 flops, which is the cost of the initial PLUQ decomposition. The fast algorithm is shown in Figure 1.3.5. Let us continue our 3 x 3 example. We have (1 0 - l \ / l 0 1 0 1 0 1 and D'1 = I 0 1 0 | , 0 0 1 / \0 0 1 so that 1 0 1 Y = L~lP-1Q-lD~1 = ( 0 0 1 1 1 1 Hence we have y i l = ( o J))' y i 2 = ( i ) ' Y 2 1 = { 1 1 ) a n d y 2 2 = ( l ) 32 Step 1.1. Compute P, L, U and Q such that A = PLUQ (using modified Gaussian elimination, with U\\ nonsingular). Step 1.2. Compute Y22 as in Equation (1.3.16) Step 2.1. Compute R, S and T such that F 2 2 = RST (using Gaussian elimination). (Note: R is permutation, S lower, T upper triangular). Step 2.2. If y 2 2 is singular then "Error: 'md(A) > I s else stop (we have all the necessary information). Figure 1.3.5. The fast decomposition algorithm. Moreover, ( 1 -1 0 -1 0 1 0 0 0 and we have Bnl = Yu — Y\iY_\xY<2\. Indeed, Yxx-YnY£Y_x = ( j o ) " ( l l ) = ( - l - l ) = B " ' The algorithms of the next section, however, will not produce Y\\ and Yu, applying rather the product representation directly to the right hand side 6. The organization of the computations is worth some discussion. It is possible to perform all the computations of the fast algorithm using one real array A L U of dimension n x n , two integer arrays PIVA and PIVB of dimension n and one real array WK of dimension n. Figure 1.3.6 shows how the various submatrices can be stored. The PLUQ decomposition of Step 1.1 can be organized to overwrite the original matrix A as in the nonsingular case, and we get an m x m lower right submatrix of zeroes. We can also implement Step 1.2 so that it produces one column of Y at a time, storing it in the real array W K (which is not shown in the figure). The (n — m) 33 A L U PIVA PIVB n — m U 12 (unused) m '21 R n — m m Step 1.1. A = PLUQ, lower right corner = 0. Step 1.2. Compute Y22 and store it in the lower right corner. Step 2.1. Y22 = RST. Figure 1.3.6. Storage considerations for the fast algorithm. components that belong to Y12 are discarded, and we need only to store the m entries corresponding to 122- Those are conveniently stored in the zero lower right corner of A L U . As we will see in the next section, there is no loss of efficiency in dropping the submatrix Y12, as it will be just as efficient to apply it to a vector using the product representation. Finally, the PLU decomposition of Y22 can be done so that S and T overwrite Y22. The permutation matrices P, Q and R are represented by the integer vectors PIVA and PIVB. The slow algorithm, on the other hand, cannot be organized as efficiently. This is because the submatrices Bu and B12 have to be produced explicitly. While B12 may very well overwrite Ui2, it is not possible for B u to overwrite Lu and Uu. This 34 is because while Un and Un will not be required any more after B is produced, the matrix L will still have to be applied when computing a solution, as will be seen in the next section. Hence not only is the long algorithm inefficient in terms of computational effort, but it also needs more memory than the fast algorithm. One aspect of both algorithms which we have not considered yet is the effect of rounding errors on the Gaussian elimination of a singular matrix. The problem is that, due to rounding errors, it is likely that the entries which would be set to exactly zero when using exact arithmetic will in fact be set to some small, nonzero numbers. The problem is to select a tolerance level, say e, such that an entry will be considered zero if its absolute value is less than €. Some theoretical bounds are known when the elimination is done using the QR method, or singular value decomposition [see Stewart (1984)]. This device is not very reliable, however, when applying Gaussian elimination to a general singular matrix A. In fact, Gaussian elimination with partial pivoting is considered to be numerically unstable when A is singular [see Golub and Van Loan (1983)]. The question, then, is why did we develop our algorithms using Gaussian elimination. The answer is three-fold. First, Gaussian elimination is mathematically well defined and is known to the expected audience of this thesis. Second, the formulation of our algorithm in terms of QR decomposition or singular value decomposition is straightfor-ward. The QR version of the fast algorithm requires | n 3 flops, twice that of Gaussian elimination. Hence our results about the operations count need only be multiplied by a factor of 2 if the QR method is used. Third, the intended application of our algorithms is with a matrix A = I—P, where P is stochastic (see Chapter H). In this case, the rank of A is known in advance of the elimination. Also, Gaussian elimination is known to be stable even without row interchanges [see Harrod and Plemmons (1984) and Funderlic and Plemmons (1981)]. (Of course, row interchanges are necessary for our application, so that the zeroes be at the bottom). 35 1.4. Solution of singular systems of equations Consider a n n x n matrix A of index 1 and an arbitrary vector 6, and let c = 56, where 5 is the transformation matrix of Equation (1.3.1). We are interested in the system of equations Ax = b. Writing y = 5s, we have Hence the system has a solution iff c 2 = 0. Moreover, if the solution exists, it has yi = J 9 - 1 C I , and y 2 is arbitrary. For an arbitrary vector 6, compatible with A or not, we define the gain as the vector g such that by Equation (1.2.8). Hence a vector 6 is compatible with A iff g = 0. Moreover, we define the bias as the vector h such that by Equation (1.3.2). In this context, the gain is the projection of the right-hand side onto the null space of A, while the bias is the unique solution which belongs to the range of A. That is, of all the possible solutions, h is the only one such that Wh = 0. This property of the Drazin inverse solution is crucial for the dynamic programming application, for which other generalized inverses would not be appropriate. The terms gain and bias are well known in dynamic programming and their in-terpretation in this context will be considered in Chapter 3. We will now construct two algorithms for computing the gain and the bias for a given vector 6, using the decompositions of the matrix A obtained with the algorithms of the previous section. where 36 These decompositions have the form of Equation (1.3.4), so that A* and W are given by (1.3.8) and (1.3.9), respectively. Hence we have and = s ( B T i 1 ( ' i + BTtBi*»)>jt (1.4.2) where c = S~lb. In both decomposition algorithms, the transformation matrix is S = PL, with P and L obtained at Step 1.1 by Gaussian elimination. The corresponding solution algorithms will have three major steps: transformation of the right hand side, computation of the transformed solution vectors, and finally, back transformation of the solution vectors. The slow solution algorithm is described in Figure 1.4.1. Step 1.1. Compute c = L~lP-lb. Step 2.1. Compute U\ = B^i-Step 2.2. Compute yx =T-1S-lR-lul. Step 2.3. Compute zx = T~xS-lR~1(ei + yt). Step 3.1. Compute g = PL(~^). Step 3.2. Compute h = PL(Z^). Figure 1.4.1. The slow solution algorithm That it effectively gives the gain and bias vectors as in (1.4.1) and (1.4.2) follows from the fact that J3 U = RST, where R, S and T are the matrices produced by using Gaussian elimination, at Step 2.1 of the long decomposition algorithm. The total 37 number of floating point operations required to compute the gain and the bias is then ^ n 2 + +m(n - m) + 2(n - m) 2 + n 2 . Neglecting the terms of degree less than 2 in n, we get 3.5n2 flops when m is small. The standard solution of a nonsingular system, with g = 0 and h — A~1b, would cost only n 2 flops. It is to be emphasized, however, that it is the decomposition which has the dominant cost, with 3.5n3 flops for the slow algorithm. To illustrate the solution process, let us consider again the matrix The matrices P, R, S, T and B12 were obtained by the slow decomposition algorithm. Applying the slow solution algorithm, we obtain: Let us now proceed to the construction of a solution algorithm that will use the output of the fast decomposition algorithm. This fast solution algorithm will work much like the algorithm of Figure 1.4.1, except that we do not have the matrix Bi2. Moreover, the fast decomposition is based on the idea that A = 0 0 1 , with 6 = Y11 — Yi2Y22 Y2i, 38 but we do not have Yn, F i 2 and F 2 i [although F l 2 had been produced in Equa-tion (1.3.16) and then discarded to save memory storage]. We do have F 2 2 = RST, Where R, S and T were obtained by Gaussian elimination at Step 2.1 of the fast decomposition algorithm. But we still need to derive a way to express Bi2, Yn, F i 2 and F 2 i in terms of P, L, U and Q, which were obtained at Step 1.1. We do this as follows. First, recall that B = UQPL, with -tt' °o)' "ft* Vo)^-{t !)• We do not know Ln and L\2 explicitly, but we do not need them either. It is the structure of L which is important here, and we have Hence the product i ? i 2 x 2 can be done efficiently by first permuting the vector and then multiplying it by Un and U12. Next, the product by Y\2 can be done using Equation (1.3.16). We get y± = F 1 2 x 2 by computing We also get y 2 , and since we do not need it, we just discard it. Similarly, we apply Yn and F 2 i to a vector as follows, using (1.3.15) to get The fact that Yn and F 2 i must be applied simultaneously to the same vector X i is not a problem. It is indeed exactly what we need because by Theorem 1.3.4, we have &iixi — (Yn ~ ^ i 2 F 2 2 1 F 2 i ) x i = ( F 1 1 x 1 ) - F 1 2 F 2 2 1 ( F 2 1 x 1 ) . 39 Step 1. Compute Zi = Y\xX\ and 23 = Y21X1 using Equation (1.4.5). Step 2. Compute v2 = T~lS~1R~1z2. Step 3. Compute tui = Yi 2 t ; 2 using Equation (1.4.4). Step 4. Compute U i = z\ — xo\. Figure 1.4.2. Algorithm to compute M\ = B\~_lx\. Step 1.1. Compute c = L~lP~1b. Step 2.1. Compute ui = B\_C2 using Equation (1.4.3). Step 2.2. Compute y\ = Bnlui as in Figure 1.4.2. Step 2.3. Compute z\ = B\~_l[c\ + J/i) as in Figure 1.4.2. Step 3.1. Compute g = PL("»»). Step 3.2. Compute A = P X ^ 1 ) . Figure 1.4.3. The fast solution algorithm. This leads to the algorithm of Figure 1.4.2 for computing u t = Bn1xi. The matrices R, S and T of Step 2 were obtained by Gaussian elimination of y 2 2 , at Step 2.1 of the fast decomposition algorithm, in Figure 1.3.5. The fast solution algorithm is shown in Figure 1.4.3. Observe that the algorithm of Figure 1.4.2 for applying Bnl to a vector is used twice as a subroutine by the fast solution algorithm. The total number of flops used for this product is [I n2 + 1 ( n _ m ) 2 ] + m 2 + J l n 3 + 1 ( n _ m ) 2 + m ( n _ m ) j = n 3 + (n — m) 2 + mn. 40 Hence the total number of flops required by the fast solution algorithm is 2mn + 1.5m2. When m = 0, the fast solution algorithm takes then 6n 2 flops, which is more work than the slow solution algorithm. We must keep in mind, however, that the fast decomposition algorithm takes only | n 3 flops. Hence, to compute the gain and bias of a singular system by the fast algorithm takes about the same amount of flops as for solving a nonsingular system with 6 right hand sides. It should be clear from Figure 1.4.3 that the only difference between our two solution algorithms is the manner in which the products by 2?i2 and J3f/ are implemented. In the slow algorithm, Bu is used explicitly and the product by Bnl is produced from the PLU decomposition of Bn (i.e., R, S and T). In the fast algorithm, however, the product by Bi2 is done by Equation (1.4.3), and the product by B^1 is done by the algorithm of Figure 1.4.2. Let us apply those products to our numerical example. Recall that P = I and Q interchanges columns (or rows) 2 and 3. Also, Y22 = (1) and R = S = T = (1). Let us do Step 2.1 of the fast solution algorithm. We have c 2 = (4), so that (1 0 - : 0 1 0 0 0 0 L = which is the same value as with the long algorithm. Now we do Step 2.2 to get y\ = 41 *1 t/2 *2 = (4) = r- 15- lij- 1^ 3 = (4) (!) This value of y x is again the same as that we obtained with the long algorithm. Finally, Step 2.3 would work the same way, giving Z\ as before. 1.5. Other algorithms for Drazin inversion Both the long and the fast algorithms of the previous sections are specializations of the general decomposition algorithm of Wilkinson (1982). We implemented them using Gaussian elimination, but we could have used singular value decomposition, or the QR decomposition [see e.g., Stewart (1984)], just as well. In fact, the singular value decomposition version of Wilkinson's method is described in Campbell and Meyer (1979), Algorithm 12.5.1. We used Gaussian elimination because it is more economical than both other decompositions [see e.g., Johnston (1982)], and it is stable when A = I — P, with P stochastic. The shuffle algorithm of Anstreicher and Rothblum (1985) uses a different approach. Instead of producing a factorization of the Drazin inverse in terms of elementary ma-trices, it produces a full matrix from which the Drazin inverse is obtained. More 42 specifically, for a matrix A of index k, the algorithm performs k steps in which Gaus-sian elimination and a "shuffle" operation are used, in which a matrix B is produced, such that AD = Bk+1Ak, at a cost of n 3 flops (if m is small). This is three times the amount of work required for applying Gaussian elimination to a nonsingular matrix. Recall that our slow algorithm is just slightly slower, with a ratio of 3.5, and that our fast algorithm is three times as fast (with a ratio of 1). In the special case of interest, with ind(A) = 1, the gain and bias vectors can be computed as follows, for a given right hand side 6: h = A*b = B2Ab g = Wb = (I- AA*)b = b-Ah. Hence the bias can be computed with 3n 2 flops, and the gain with another n 2 flops, for a total of 4n 2 flops. Recall that the slow solution algorithm of Section 1.4 takes only 3n 2 flops, while the fast algorithm takes 6n 2 flops. Other methods are based on sequences of matrix multiplications and are not compu-tationally attractive. See Campbell and Meyer (1979), Greville (1973), Hartwig (1976) and Anstreicher and Rothblum (1985). 43 Chapter II Nonnegative matrix theory In this chapter, we show how the Perron-Frobenius theory of nonnegative matrices can be used to deduce the eigenvalue structure of the transition matrix of a Markov chain. The matrix decomposition so obtained leads to a simple characterization of the limiting and fundamental matrices, using the Drazin inverse notion of Chapter I. Further, the Jordan canonical form is used to derive the geometric convergence of the power method. First, the ideas of reducibility and periodicity of a matrix are reviewed, in Sec-tion 2.1, and the correspondence between matrices and directed graphs is established. Next, the theory of nonnegative matrices is reviewed in Section 2.2. Our approach is inspired from Varga (1962), but the derivations we provide are consistent with our exposition of the Jordan canonical form in Chapter I. Lemmas 2.2.1, 2.2.2, 2.2.3 and 2.2.8 are given without proof. Although elementary, their derivation is not directly relevant to the material of the thesis. In Section 2.3, the matrix decomposition approach is used in a new derivation of the properties of the limiting matrix, the deviation matrix and the fundamental matrix of a discrete time Markov chain. We show that the probabilistic interpretation of these matrices, using Cesaro limits, requires the boundedness of the powers of the transition matrix. This property is not required for defining the matrices in terms of the Drazin inverse, though. Finally, it is shown, in Section 2.4, that when the transition matrix is aperiodic, its 44 powers converge to the limiting matrix, at a geometric rate. This result is a specializa-tion of a result of Varga (1962). 2.1. Matrices and directed graphs It is often convenient to associate the entries of a complex n x n matrix A with a directed graph G(A) with n nodes, labelled 1,..., n, and with a directed arc for every nonzero entry of A. Hence G(A) has an arc from node t to node j, labelled a,y, iff a,y ^ 0, for * = 1,..., n and j = 1,..., n. For example, the matrix ( 1 0 1 0 2 1 0.2 0.2 0 has the directed graph of Figure 2.1.1. 1 2 1 1 2 ' J 1 " 0.2 L 3 0.2 1 Figure 2.1.1. The directed graph G(A) In the context of queueing and dynamic programming, the directed graph naturally suggests that the nodes represent each a state of a stochastic system, while the arcs represent a possible transition from one state to another. Hence the entries of A are transition rates or probabilities (depending on the specific application). We have already used the directed graph representation of a matrix in Chapter I (se Figure 1.1.1). The first well known result of interest is as follows. 45 Lemma 2.1.1. Let A and B be two complex n x n matrices. Then there is a permu-tation matrix T such that B = TAT-1 iff G(B) is the same labelled graph as G(A), with the node labels permuted. Let N = {1, . . . , n} be the set of all the nodes of a directed graph G. This set N can be partitioned into m equivalence classes 5*, k = 1,..., m, with 1 < m < n, such that two nodes i, j & N are equivalent iff there is a path from i to j and a path from j to t. The equivalence classes Sk, k = l , . . . , m , are called strongly connected components [some authors use this term for the subgraphs (5*, Ek) where Ek is the set of all arcs between the nodes in Sk]. If a graph G has exactly one strongly connected component (i.e., m = 1), then G is said to be strongly connected. There is a similar concept in matrix theory. A complex n x n matrix A is said to be irreducible if there is no permutation matrix T such that A = TBT~l with where Bn and B22 are square submatrices. If such a matrix T exists, then A is said to be reducible. Lemma 2.1.2. A matrix A is irreducible iff its directed graph G(A) is strongly con-nected. Proof. If A is not irreducible then let Si and S2 be the subsets of N corresponding to B n and B22 of Equation (2.1.1). Let * G Sx and j € S2. Since B2i = 0, there is no path going from ; to * and hence G(A) is not strongly connected. To show the converse, suppose there is no path from j to i. Let S2 be the strongly connected component of G(A) that contains t, and let Si = N — S2. Then a matrix B satisfying (2.1.1) can be constructed, and hence A is not irreducible. • Suppose a graph G has m > 1 strongly connected components. Suppose also that there is an arc from some state » G Sk to some state j G N — Sk. Then Sk is a transient 46 Proof. Left to the reader. • (2.1.1) class and all the nodes in Sk are said to be transient. If no such outgoing arcs originate from Sk, then Sk is a recurrent class and all the nodes in Sk are recurrent. Theorem 2.1.3. Suppose A is a complex n x n matrix. Suppose also that G(A) has I transient classes and m recurrent classes. Then there is a permutation matrix T such that A = T ( B 0 £ ) r - , (2.1.2) with B = I Bn B\2 0 B22 V o B x t \ Bit Bu and D = (Dxx 0 0 D22 \ 0 0 \ 0 (2.1.3) where Bn,...,Bu and Dn,...,Dmm are irreducible, square submatrices. Proof. Each connected component of (7(A) has an associated submatrix. The / tran-sient classes correspond to the submatrices Bn,...,Bu, and the m recurrent classes correspond to Dn,...,Dmm. The submatrix D is block diagonal because there are no outgoing arcs from the recurrent classes. The submatrix B is block upper triangular because if both Bij and Bji were nonzero, with 1 < i < ji < £, then all the nodes corre-sponding to the submatrices Bu and B3j would belong to the same strongly connected component. Hence G(A) would have only t — 1 transient classes. This implies that at least one block must be zero, and we can sort the components so that Bij = 0. • This parsing of a graph into strongly connected components can be done with a number of arithmetic operations proportional to the number of arcs in the graph, that is, the number of nonzero entries of the associated matrix. One such algorithm uses the method of depth-first search [see e.g., Aho et al. (1974)]. As we will see in the next section, Theorem 2.1.3 has an important consequence on the eigenvalue structure of a nonnegative matrix. Also of importance is the periodic (or cyclic) structure of a graph and hence, of a matrix. Suppose A is an irreducible, complex n x n matrix with directed graph G(A). 47 Then G(A), and A itself, is said to be periodic with period k, with 2 < k < n, if the set N of its nodes can be partitioned into k subsets Sit...,Sk such that all arcs go from S{ to 5i+i, for »' = 1,..., k — 1, or from Sk to Si. That is, suppose there is an arc from node r to node s. Then either r € and s € , if 1 < » < k — 1, or else r _ Sk and a € <Si. The subsets Si,...,Sk are called periodic classes. If no such partition of N exists, then G(A) is said to be aperiodic. Lemma 2.1.4. An irreducible, complex n x n matrix is periodic with period k iff A; > 2 is the largest, integer for which there is a permutation matrix T such that A = TBT-1, with 0 . . . 0 \ B23 '•. : ( o 0 B = B12 0 0 VBfc! 0 0 0 0 0 Bk-i,k 0 (2.1.4) Proof. Because by definition, the directed graph G(A) has the structure shown in Figure 2.1.2 when A is periodic with period k. In this representation, a "node" Si denotes the set of nodes in 5,-, and an "arc" i?,y denotes the set of arcs going from nodes in 5,- to nodes in S3: There are no arcs going from nodes in Si to other nodes in the same set Si. • Another definition of the period A; of a graph is as the greatest common divisor of the lengths of all the paths going from any node to itself. We feel that the above 48 definition, in terms of periodic classes, is more immediate. Moreover, it will be used directly in the queueing application of Chapter VI. The terminology we use is worth some comments. We refer to Aho et al. (1974) for the strongly connected components of a graph and their computation. Varga (1962) defined a matrix as either irreducible or reducible, and observed that an irreducible matrix has a strongly connected graph. When the matrix represents the transition probabilities of a Markov chain, Karlin (1969) used the terms transient and recurrent to describe the irreducible subchains. Kemeny and Snell (1960) also used the term ergodic to denote a recurrent set. Karlin (1969) also used the terms periodic and aperiodic for an irreducible Markov chain. Kemeny and Snell (1960) used the terms cyclic and reguiar, respectively. The terms cyclic and primitive were used by Varga (1962) when the matrix is nonnegative. The period is then called the cyclic index of a periodic matrix. For a complex, not necessarily nonnegative matrix, Varga (1962) also used the term weakly cyclic. 2.2. Eigenvalue structure of a stochastic matr ix A real n x n matrix A is said to be nonnegative if atJ- > 0, and positive if a,y > 0, for i,j = 1,..., n. A nonnegative n x n matrix P is said to be stochastic if £^y=i Pij = 1, for » = l , . . . , n . Moreover, a nonnegative, irreducible n x n matrix Q is said to be substochastic if Yl^iQij < 1» for i = l , . . . , n , with strict inequality for at least one row. Suppose a stochastic matrix P is reducible, with £ transient classes and m recurrent classes. Then the transient submatrices Bu,..., Bu of Theorem 2.1.3 are substochas-tic, while the recurrent submatrices Du,...,Dmm are stochastic. We will derive some well known results about the eigenvalue structure of an ir-reducible, nonnegative matrix, and use them to obtain the eigenvalue structure of a general stochastic matrix. Recall that the spectral radius of a matrix A, denoted o~(A), 49 is defined as <r(A) = |A|, where A is a dominant eigenvalue of A. That is, if 7 is an eigenvalue of A, then |7| < <r{A). We begin by discussing a few basic results from Varga (1962), concerning the Perron-Frobenius theory of irreducible nonnegative matrices. Suppose A is an irre-ducible, nonnegative n x n matrix and let x > 0 be a nonzero vector. We define the quantity n rx = min {__] (2.2.1) t:x,->0 ~ . 3=1 Then rx = sup{p : Ax > px,p > 0}, and we consider the quantity r = sup{rx : x > 0, x ^ 0}. (2.2.2) Lemma 2.2.1. Suppose A is an irreducible, nonnegative n x n matrix. Then r > 0, and there is a positive eigenvector x such that Ax = rx. Lemma 2.2.2. Suppose A is an irreducible, nonnegative n x n matrix. Then <r(A) = r, i.e., if A is an eigenvalue of A, then |A| < r. Lemma 2.2.3. Suppose A is an irreducible, nonnegative n x n matrix. If x and y are such that Ax — rx and Ay = ry, then x and y are linearly dependent. Theorem 2.2.4. Suppose A is an irreducible, nonnegative n x n matrix. Then there is a nonsingular matrix S such that where B is an (n — 1) x (n — 1) matrix such that o~(B) < r, and r is not an eigenvalue of B. Proof. Such matrices B and S are defined by Theorem 1.1.10. That o-(B) < r follows from Lemma 2.2.2, and Lemma 2.2.3 implies that there is exactly one Jordan block corresponding to the eigenvalue r. We have to show that it has dimension 1. We do this as in Section 1.1, by showing that Equation (1.1.6) has no solution. Suppose x > 0 is the eigenvector of A of Lemma 2.2.1. We need to show that (rI—A)y = 50 x has no solution y. Let w > 0 be a left-hand eigenvector of A, such that w(rl—A) = 0. Such a vector exists by Lemma 2.2.1, because w' is an eigenvector of A', which is also nonnegative. Hence we have 0 = w(rl - A)y = tux > 0, so that the system is not compatible. • Lemma 2.2.5. Suppose P is an irreducible stochastic nxn matrix. Then r = 1. Proof. That 1 is an eigenvalue of P follows from the fact that Pe = c, where e is a vector with all entries equal to 1. Now suppose A is any eigenvalue of P, and let x be an eigenvector associated with A. Let k be such that |x*| > |x,|, for t = 1,..., n. Then Px = Az implies that n n |A| • \xk\ < ]T>fcy|xy| < (X>y)l**l = |z f c | , y=i y=i so that |A| < 1. Hence r = a(P) = 1. • Lemma 2.2.6. Suppose Q is an irreducible substochastic nxn matrix. Then r < 1. Proof. Suppose A is an eigenvalue of Q, with eigenvector z. Suppose also that the rows and columns of Q have been permuted in such a way that | z i | = . . . = |z f c | > \xk+i\ > . . . > | x n | . If k < n, there is at least one i < k for which g,y > 0 for some j > k, because Q is irreducible. Then Qx = Az implies n k n |A| • |z,-| < ^2qi3\xj\ = ^2qij\xi\ + ^2 qij\x3\ 3 = 1 3 = 1 3=k + l n < E*7)i*.-i ^ iz«i> 3 = 1 so that |A| < 1. If k = n, there is at least one t < n such that ]Cy=i Hj < 1- Then Qx = Az implies n n |A| • |x,| < 5>y|zy| = Efty)|*.i < 3 = 1 3 = 1 so that |A| < 1 as well. • 51 Theorem 2.2.7. Suppose P is a stochastic n x n matrix with m recurrent classes. Then there is a nonsingular n x n matrix S such that where I is the m x m identity matrix and Q is an (n — m) x (n — m) matrix such that ^(Q) < 1 and 1 is not an eigenvalue of Q. Proof. By Theorem 2.1.3, every connected class corresponds to an irreducible sub-matrix. The eigenvalues of the transient classes all have modulus less than 1, by Lemma 2.2.6, and each of the m recurrent classes has exactly one Jordan block J\ (1), by Theorem 2.2.4 and Lemma 2.2.5. Hence P is similar to the matrix of Equation (2.2.4). Finally, we will need some further results which refine Theorems 2.2.4 and 2.2.7. Lemma 2.2.8. Suppose A is an aperiodic, irreducible, nonnegative n x n matrix, and suppose r = a(A). If A is an eigenvalue of A then either A = r or |A| < r. Theorem 2.2.9. Suppose A is a periodic, irreducible nonnegative n x n matrix with period k > 2, and suppose r = a(A). Then A has exactly k eigenvalues of modulus r, such that Ay = rexp(iic2j/k)t for j = 1 , e a c h with multiplicity 1. Moreover, if A is any other eigenvalue of A, then |A| < 1. Proof. Let T be the permutation matrix of Lemma 2.1.4, and let A = TBT~l as in Equation (2.1.4). By Lemma 2.2.1, r is an eigenvalue of B and there is a positive vector z such that Bx = rx. Partition x as in Equation (2.1.4), so that It is straightforward to verify that Ay = rexp(t7r2y/A:) is an eigenvalue of B with (2.2.4) • eigenvector / exp(ur2j/k)xi 2 ( j ) _ exp{iir4j/k)x2 \exp(iir2kj / k)xk 52 for j = I,...,k. We will see that each of these k eigenvalues Ay has multiplicity 1, as follows. Con-sider the matrix B of Equation (2.1.4). Then Bk = diag(Cy), with = 1,..., k, where C\ = B12B23•..Bk\ . . . Cfc = Bk\B\2 • • • Bk-i,k-Now each Cy, for j' — l,...,k, is an aperiodic, irreducible nonnegative matrix. By Lemma 2.2.8, A is an eigenvalue of Cy implies that A = <x(Cy) or else |A| < <r(Cy). Now if 7 is an eigenvalue of B, then 7* is an eigenvalue of Bk. This implies that Bk must have r* as an eigenvalue, with multiplicity at least k, and that |A| < r*. We conclude that the multiplicity of r* is exactly k, because each of the k blocks Cy has exactly one eigenvalue A = r*. • Corollary 2.2.10. Suppose P is a stochastic matrix such that every recurrent class is aperiodic. Then the matrix Q of Theorem 2.2.7 has cr(Q) < 1. 2.3. L imi t ing and fundamental matrices In the context of Markov chains and Markov decision processes, Theorem 2.2.7 has some important consequences. Let P be the transition matrix of a Markov chain with n states. Then P is a stochastic nxn matrix, hence it satisfies Equation (2.2.4). We use this fact to provide an alternate proof of the following result of Kemeny and Snell (1960) that was fundamental in Blackwell (1962). Our approach is also different from that of Campbell and Meyer (1979), because our argument is entirely based on the decomposition property. Consider a stochastic n x n matrix P with m recurrent classes. By Theorem 2.1.3, we can write P = T (-Poo -Poi -Po2 • • • Pom \ 0 Pn 0 . . . 0 : P22 • : : ••. 0 V 0 0 PmmJ 53 where T is a permutation matrix. Here, the matrices B and C of Equation (2.1.2) are given by B = Poo and C = (Poi • • • Pom), and the matrix D of Equation (2.1.3) is such that Dkk = Pkk, for A; = 1,..., m. In the following theorem, a matrix P* is defined. Then the same permutation T is applied to the matrix P*, giving the blocks P*Jt for i,j = 1,...,m. Theorem 2.3.1. If P is a stochastic matrix with m recurrent classes, then there is a unique matrix P* such that PP* = P*P = P*P* = P* (2.3.1) and Pkk > 0 for every recurrent class, for k = 1,..., m. Moreover F * = 5 - 1 ( o fjS (2.3.2) and Z = (I - P + P*)~l exists. Proof. Decompose P as in equation (2.2.4), and apply the same similarity transfor-mation S to P*, to obtain \ -*V21 - ^ 2 2 / where Rn is (n — m) x (n — m), i^a is m x m and Ri2 and 722i have the appropriate dimensions. If m = n, then P = 7 = P* and we are done. Otherwise, using (2.2.4) and (2.3.1), we have (because B = I — Q is non-singular): PP* =P* ==> Qi?n = Rn => i ? u = 0 QR12 — R12 —>* 7212 = 0 P*P = P* => R2lQ = Tfei => 7^!= 0 P*P* = P* => R22R22 = Tiaa ==> 7^22 is a projection. Now P^ fc > 0 for all recurrent classes implies that rank(P*) = m. This implies that rank(7?22) = and hence 7^2 = I since the only projection of full rank is the identity. So we have (2.3.2). Also, it trivially follows that I-P + P*=S~l[B °^JS (2.3.3) 54 is indeed non-singular. • We recall that P* is called the limiting matrix of P, while Z is called the funda-mentai matrix [Kemeny and Snell (I960)]. Let A = / - P. By Theorem 2.2.7, the matrix A satisfies Equation (1.3.1) with B = I — Q, and hence its group inverse A# exists and is given by Equation (1.3.2). Then A# is the matrix H of Blackwell (1962), who derived the three following properties using a partial Laurent expansion of the resolvent of P (see Chapter HI). They follow immediately from our representation. Corollary 2.3.2. Let A# and P* be defined as above. Then (0 {I-P)A* = A*{I-P) = I - P \ (ii) A#P* = P*A# = 0 and (ni) {I-P*)A# = A*(I-P*) = A*. Proof. We show only (t), the rest follow in a similar manner and are omitted. • A * is also called the deviation matrix by Veinott (1974), while P* is called the eigen-projection of A at X = 1, by Rothblum (1981). The following representations for P* and Z are useful and are easily derived using (2.2.4) and (2.3.2). Corollary 2.3.3. P* = / - AA* and Z = (A + P * ) _ 1 =A*+P\ Proof. We have 55 by (2.3.2), and A* + P-= S-> (B~l f j S ^ i A + PT1 by (2.3.3). • Corollary 2.3.4. ZP* =P\ Proof. • The probabilistic interpretation of the limiting and deviation matrices is obtained using Cesaro limits. We give a new derivation of some results of Veinott (1974). Consider a sequence {A,- : t = 0,1,2,...} of n x n matrices, and let N-1 BN = Ai, for N = 0,1,2,... «=o If lim#_.oo N~1Bff = A exists, then A is called the Cesaro limit of order one of {Ai}, and we write lim Aft = A (C, 1). AT—•oo Further, if limjv_oo B^ = B (C, 1), then we write O O N=0 We investigate the limits of sums of powers of a matrix. Lemma 2.3.5. Suppose I — Q is nonsingular. Then N ^2Qi = (I-Q)-1(I-QN+1), f o r ; v>o. i=0 Proof. Because N i=0 {i-Q)J2Qi = I - Q N + 1 - (2-3-4) • 56 Lemma 2.3.6. Suppose limjv-^oo QN = 0. Then (/ — Q) is nonsingular and «=0 Proof. For N large enough, the matrix / — QN+1 of Equation (2.3.4) is nonsingular because the entries of QN+1 are small. This implies that / — Q is nonsingular. The result follows by taking limits. • Lemma 2.3.7. Suppose QN is bounded, for N = 0,1,2,..., and I - Q is nonsingular. Then lim QN = 0 (C, 1). Proof. By Lemma 2.3.5, we have t=0 The result follows because QN is bounded. • Lemma 2.3.8. Suppose QN is bounded, for N = 0,1,2,..., and J — Q is nonsingular. Then oo £ Q " = ( / - Q ) - 1 (C , l ) . JV=O Proof. Let N-l i By Lemma 2.3.5, we have x= lim jvr1 E ( / - « ) " 1 ( J - ^ + 1 ) -• oo 1=0 y=o N-l i=0 Define F = lim A " " 1 V / and Z = lim A " 1 V Q*. t=0 i=0 Then Y = I, and Lemma 2.3.7 implies that Z = 0. Hence X = ( / - Q ) - 1 ( K - Q Z ) = ( / - Q ) - 1 , and the result follows. 57 Theorem 2.3.0. Suppose P is a stochastic matrix, and let P* be defined as in Theo-rem 2.3.1. Then lim PN = P* (C, 1). N—*oo Further, if P is aperiodic, then the ordinary limit can be used instead of (C, 1). Proof. Decompose P as in Equation (2.2.4). Then PN = S - I ^ (2.3.6) Since PN is a stochastic matrix, we have that PN is bounded, and hence so is QN. Moreover, I—Q is nonsingular, by Theorem 2.2.7. The result follows from Lemma 2.3.7 and Equation (2.3.2). If P is aperiodic, then c(Q) < 1, by Corollary 2.2.10. This implies that lim QN = 0, AT—oo by Theorem 2.4.2. • Theorem 2.3.10. Suppose P is a stochastic matrix, and let A = I — P. Then f ; ( p " - p * ) = = A#(c , i ) . Further, if P is aperiodic, then the ordinary limit can be used instead of (C, 1). Proof. Using Equations (2.2.4) and (2.3.2), we have PN - P* = S-1 °) S. (2.3.7) The result follows from Lemma 2.3.8, with A * defined by Equation (1.3.2). If P is aperiodic, the result follows from Lemma 2.3.6. • The consequence of Theorem 2.3.9 is that, for a Markov chain with stochastic transition matrix P , the entry 7r,y of the limiting matrix P* is the expected fraction of the time spent in state j, given that the system started in state i. 58 By Theorem 2.3.10, the entry a,y of the deviation matrix A* is the expected difference between the expected number of visits to state j when the system starts in state t, and the expected number of visits to state j when the system starts with the stationary distribution. We conclude this section with a simple numerical example, provided by the transi-tion matrix which has no transient states and only one recurrent class (with period 2). We have that p* _ ( 1 1/2\ (0 0 \ /1 /2 -1 /2 \ _ / 1 /2 1/2 \ and 2.4. Geometric convergence in the aperiodic case In the previous section, we expressed P* and A* as Cesaro limits using sequences of powers of the transition matrix P. The rate of convergence of those limits is only harmonic, however, as the proofs of Theorems 2.3.9 and 2.3.10 indicate. When the transition matrix P is aperiodic, ordinary limits converge. In this section, we show that the rate of convergence is geometric in this case. Consider the limiting matrix P*. By Theorem 2.3.9, if P is aperiodic then lim PN = P*. N-*oo Define the error matrix of the JV-th power as E(N) = p N _p* ( 2 4 1 j From Equation (2.3.7), we have EW = S-*(QZ J ) * (2.4.2) 59 where Q is as in Equation (2.2.4). Specifically, choose Q to be the Jordan canonical form, i.e., Q = diag(Jn,.(A,)), with |Ax| > |A 2 | > . . . > |A f c | , where A; is the number of Jordan blocks. Let 7 = | A i | . By Corollary 2.2.10, we have 7 < 1 for an aperiodic transition matrix P . Lemma 2.4.1. [See e.g., Varga (1962), Equation (1.28)]. Suppose djj^ is the (t,;)-th entry of the matrix ( j n (A) )^ . Then {0 if i < j (i-j)^N~i+3 * ; ' < » < min(n, 7Y + ;) 0 if N + j < i < n where \k) ~ k\(N-k)\ wTheorem 2.4.2. Suppose a(Q) < 1. Then lim^-oo QN = 0. Proof. Because Q^=diag( ( J n , (A , ) ) W ) and, by Lemma 2.4.1, we have lim (J„,(A,-))"=:0 iV —•OO when j A, j < 1. • Corollary 2.4.3. Suppose P is an aperiodic stochastic matrix. Then lim = 0. TV—00 Proof. By applying the theorem to Equation (2.4.2). • The geometric rate of convergence is a direct consequence of Lemma 2.4.1. The idea is more precisely expressed using norms. Following Varga (1962), Section 1.3, we define the Euclidean norm of a real vector x as | |x| | 2 = (x 'x) 1 ' 2 . 60 Moreover, the spectral norm of a real matrix A is defined as \\A\\2 = sup \\Ax\\2 xjtO \\X\\2 Also, by Theorem 1.3 of Varga ( 1962) , we have | | A | | 2 = (<T(A'A)) 1 / 2 . The following result is an immediate consequence of Theorem 3.1 of Varga ( 1962 ) , which is based on Lemma 2 . 4 . 1 . Theorem 2.4.4. Suppose P is an aperiodic stochastic matrix, and let 7 == <r(Q). Then \\E^%_ / - T o o | | £ ( " ) | | 2 ~ T This result implies that for large N, every power of P reduces the error by a factor 7, the subdominant eigenvalue of P. Now suppose P is irreducible. By Lemma 2 .2 .3 , there is a unique row probability vector ir such that rcP = jr. Then ir is the stationary distribution of P , and TTP* = 7r. Corollary 2.4.5. Suppose P is an aperiodic, irreducible stochastic matrix, and let TT(°) be an arbitrary probability vector. Then TT= Urn ir^PN is the stationary distribution of P . Proof. Let = n^PN. By Theorem 2 .3 .9 , the limit exists when P is aperiodic, and is given by IT = jr(0)p*. Since P * P = P * , we have irP = JT, SO that rr is the unique eigenvector of P of Lemma 2 .2 .3 , with eigenvalue 1. Hence IT is the stationary distribution of P . • Lemma 2.4.6. Let = - TT be the vector of errors on the N-th iterate TP K Then e ( " )=e(° ) i ? ( "> , (2 .4 .3) 61 where e(°) = — ir is the initial error. Proof. Since IT = *4°)P*, we have e W = = i r W _ j r = ff(o)(P^_p*). Now icPN = IT implies that ix(PN - P*) = 0, so we get eW=irW(PN-P*)-«{PN-P') = ( 7 r ( ° ) - 7 r ) ( P 7 V - P * ) = e(°)EW, by Equation (2.4.1). • To show that the rate of convergence of is geometric, we use a measure of the error, such that He^lh < € ^ ) . We choose e<"> = | |e< 0) | | 2- | |£<">| | 2 . (2.4.4) Corollary 2.4.7. The error converges to zero geometrically. More precisely, e(N+l) lim — r r r r - = 7, N-*00 e(N) where 7 < 1 is the modulus of the subdominant eigenvalue of P . Proof. By Theorem 2.4.4. • The above results will be extended, in Chapter VI, to the case of a periodic Markov chain. Also, the rate of convergence for the deviation matrix A* will be investigated in Chapter III, in the aperiodic case, when solving the evaluation equations of dynamic programming. 62 Chapter IH Discrete time Markov decision processes In this chapter, we use the decomposition results of the previous chapters to provide new, simple and direct derivations of many important results of dynamic programming. Also, some new algorithms are described, to compute the gain, the bias and the higher order terms required in the policy improvement methods of dynamic programming. With a given stationary policy 5, a Markov decision process evolves according to a Markov chain with stochastic transition matrix Ps. Hence the singular matrix A6 = I — P6 can be decomposed as in Equation (1.3.1). In Section 3.1, the decomposition of A6 is used to give a direct derivation of the Lau-rent expansion of the resolvent [see Veinott (1969) and (1974)], and of the discounted value function, when the interest rate is small. The boundedness of the powers of the transition matrix Ps is not used in this derivation. This property is crucial, however, for the probabilistic interpretation of the terms of the series expansion, especially the gain and the bias terms. Then is Section 3.2, we formulate the problem of finding the terms of a truncated series as a finite system of linear equations. Using matrix decomposition again, we show that the packed vector (of the terms of the truncated series) can be solved using the Drazin inverse of the packed matrix of coefficients. Next, the computation of the terms of the series expansion is considered in Sec-tion 3.3. We show how the matrix decomposition algorithms of Chapter I can be used to compute the gain, the bias and the higher order terms. 63 In Section 3.4, we show that when the transition matrix is aperiodic, the gain and the bias can be computed iteratively using the power method, and that the convergence is geometric. We also show that the error on the approximate gain and bias is related to the residual error by a system of equations of the same type as the evaluation equations. Finally, the application of the Drazin inverse approach to policy improvement al-gorithms is explored in Section 3.4. We provide a new proof of the result of Veinott (1974) that at most n — m terms are needed to compare the value of two policies, with respect to the discount optimality criterion, where n is the number of states and m is the number of recurrent classes. 3.1. Markov reward processes Consider a homogeneous Markov chain {X(t) : t = 0,1,2,...} with finite state space N = { 1 , . . . , n} and stochastic transition matrix P, where P i j - = Prob{X(* + 1) = j | X(t) = i}, x, j = 1,..., n. Suppose also that at the end of every period t, the system earns a reward r,-, where i = X(t). We say that r is the vector of immediate rewards. Such a Markov chain with rewards is called a Markov reward process, and is denoted by the triplet (TV, r, P). Now suppose an arbitrary random variable Y is defined on the Markov chain {X(t)}. We define the conditional expectation operator Ei as Ei(Y) = E{Y | X(0) = »}, for i = 1,..., n. Let p > 0 be the interest rate for one period. Then f3 = (1 + p)~l is the one period discount factor, with 0 < {3 < 1. We are interested in the present value, at time t = 0, of the total reward earned over an infinite horizon. More precisely, we define the expected total discounted reward function v(p) such that oo Vi(p) = Ei(Y_ / ? f c + 1 r x ( f c ) ) , » = 1,..., n. fc=o 64 The expectation can be expressed directly using matrix notation, and we get OO v(p) = /? £ pkPkr = /?(/ - PP)-1 r. (3.1.1) k=o The infinite series in Equation (3.1.1) converges by Lemma 2.3.6, because Um 0kPk = 0, fc—•OO with 0 < p < 1. We are interested in the case when f3 is in the neighbourhood of 1. Following Veinott (1969) and (1974), we replace the discount factor ft by the interest rate p, so that v(p) = (l + P)-1{I-(l+p)-lP}~1r = { ( l + / J ) / - P } - 1 r = {pI + A}~lr (3.1.2) where A = I — P. The matrix Rp = (pi + A)~x is called the resolvent of A. We now provide a direct derivation of the Laurent expansion for the resolvent [Theorem 2 of Veinott (1969) and Theorem 3 of Veinott (1974)] using algebraic properties of the matrix A and its Drazin inverse A#. Theorem 3.1.1. Let A be the non-zero eigenvalue of A with smallest modulus. If 0 < p < |A| then Rp = p~lP* + f ] (-p)\A*)i+\ (3.1.3) *=o Proof. By definition, *„=(,/+.0-=s-(>/+B = s - ( ( „ / + * ) - ' ^ o J 5 65 The first term is p lP*. Now consider the second term. [PI + B]-1 = [BipB-1 + I)}'1 = [I + pB-1)-1 B-1. By hypothesis, v(pB~l) = p/|A| < 1, so that i=0 by Lemma 2.3.6. But so the second term above equals i=0 which gives the desired result. • Corollary 3.1.2. Let tu_i, w0, Wi,..., be vectors such that ty_x = P*r (3.1.4) and wi = (-l)i(A*)t+1r, » = 0,1,2,... (3.1.5) Then oo v(p) = p - 1 tu_ i + wQ + Y_ P^i- (3.1.6) 1=1 Proof. By substituting (3.1.3) into Equation (3.1.2). • This Laurent expansion of the discounted reward function v(p) is fundamental for the notion of discount optimality that will be discussed in Section 3.5. Observe that our derivation uses only the property that ind(A) = 1, that is, that the matrix A can be decomposed as in Equation (1.3.1). A special case of Theorem 3.1.1 is when the matrix A is nonsingular [i.e., ind(i4.) = 0]. This is the case, for example, if all the recurrent subchains of P are substochastic. Then Equation (3.1.3) becomes ^ = E(-P) , 'U- 1) , '+ 1. t=0 66 Theorem 3.1.1 is itself a special case of a result of Rothblum (1981) [his Theorem 3.1], in which a matrix A with ind(jl) = k > 0 has terms in p~x, for » = 0, . . . , A:. When P is a stochastic matrix, the terms and wo have a probabilistic inter-pretation of particular interest. Define the symbols g = W-i = P*r (3.1.7) and h = wQ = A*r. (3.1.8) The term g is called the gain of the process and h is the bias. Theorem 3.1.3. Suppose P is a stochastic matrix. Then 2= lim Pkr (C, 1) k—• OO and h = JT,(Pkr-g) (C7,l). fc=0 Moreover, if P is aperiodic then the result is valid using ordinary limits. Proof. By Theorems 2.3.9 and 2.3.10. • Note that in addition to the fact that ind(A) = 1, this result uses the fact that the powers of P are bounded matrices. Theorem 3.1.3 suggests the following interpretation for the gain and bias terms. The entry <7,- is the expected reward earned by the system per time period, in the long run, given that the system starts in state *. The entry A,-, on the other hand, is the total difference between the expected earnings in one period and the long run average earning, given that the system starts in state », summed over all periods. We conclude this section with a very simple derivation of an equation which is well known for computing the gain and the bias [see Howard (1960) and Blackwell (1962)]. This equation is usually derived using a partial series expansion, but here it is based on the generalized inverse representation for P*. 67 Proposition 3.1.4. Let g and h be defined by (3.1.7) and (3.1.8). Then g and h satisfy h = r-g + Ph. (3.1.9) Proof. We use the fact that P* = I - AA*. By (3.1.7), we have g = {I-AA*)r = r - A(A*r) = r - ( J - P)h, where the last equality follows from (3.1.8). The result follows by rearranging terms. • 3.2. Singular systems of equations Let A be an n x n matrix of index 1 [i.e., decomposition (1.3.1) is valid], and suppose 6 is a given column vector. Recall that the projection matrix W = I — AA* is given by Equation (1.2.8). In the special case A = I — P with P a (stochastic) transition matrix, we have W = P*, the limiting matrix. Lemma 3.2.1. The (singular) system of equations Ax = 6 has a solution iff Wb = 0. Further, if Wb = 0, then for arbitrary y x = A*b + Wy (3.2.1) is a solution of Ax — b. Proof. Let *=*-,fe)-»-*-,(!0-*'-*-*(£)• Then ->--(o SK'(:;)=*-ft) which is equivalent to ( o o ) ( x 2 ) = ( & 2 ) ' This system has a solution iff 62 = 0> which is equivalent to Wb = 0, because " — - • ( ! ! ) " - ' & ) - i - 1 ( £ ) • 68 Now we assume that 62 = 0 and we take x = A#b + Wy, for some arbitrary vector y. We have Ax = AA*b + AWy = 5 - ' ( o ) + 0 = 6-• Theorem 3.2.2. Let {6,-,i = —1,0,1,...} be a sequence of n- vectors and W = I—AA&. Then the system of equations Ax-X = 6_! (3.2.2) x,_! + Axi = 6,- »' = 0,1,2,... (3.2.3) has a solution iff Wb-x = 0. (3.2.4) Furthermore, if (3.2.4) holds, the solution is unique and is given by Xi = A*(b{ - Xi-x) + Wbi+x, i = -1,0,1,2,... (3.2.5) where x_ 2 = 0. Proof. By Lemma 3.2.1, Equation (3.2.2) has a solution iff Wb_x = 0. Now assume (3.2.2) has a solution. We show by induction on i that under (3.2.4), a solution exists and (3.2.5) gives its unique value. The induction hypothesis is that Xfc_i is unique and that (3.2.3) has a solution for i = A;, for some k > — 1. This is trivially true for A; = —1 with x_2 = 0, because (3.2.2) has a solution, by hypothesis. Rewrite (3.2.3) with i=k as Axk = bk - Xk-x-By Lemma 3.2.1, one solution is xfc = A*(bk - x f c_i) + Wyk 69 (3.2.6) where yk is an arbitrary vector. Now apply Lemma 3.2.1 to Equation (3.2.3) with i — k + 1. It has a solution iff W{bk+1-xk)=0. Substituting (3.2.6), we have W(bk+1 - A*(bk - xfc_0 - Wyk) = 0. But WA* = 0 (see Corollary 2.3.2) and W2 = W, so we must have Wyk = Wbk+l and hence xk is uniquely determined by (3.2.5). • We now apply this to the Markov reward process (N,r,P). Corollary 3.2.3. Let A = I — P. Then the terms g, h and tt/,-, i series (3.1.6) are the unique solution of the recurrence Ag = 0 g+ Ah = r tu,_i + Awi = 0 » = 1,2,... with g = tu_i and h = w0. Proof. Apply Theorem 3.2.2 with 6_x = 0, bQ = r and 6,- = 0, for t = 1,2,... Here W = P* so we get g = P*r, h = A*r and assuming ti;,_i = (—l)t~i(A#)%r, Equation (3.2.6) gives to,- = ( - l ) ' ( A # ) ' + 1 r as in Equation (3.1.5). • This last result unifies some previous approaches to the problem. Howard (1960) derived (3.2.7) and (3.2.8) using z-transforms and made h unique by setting one compo-nent to zero in each recurrent class. Blackwell (1962) derived both equations using the series (3.1.1) and some limiting argument as ft /* 1. He also used the extra conditions P*g = P*r 70 = 1,2,..., of the (3.2.7) (3.2.8) (3.2.9) and P*h = 0. We saw that these conditions are necessary and sufficient for (3.2.8) and (3.2.9) with t = 1 to have a solution, respectively. Finally, Veinott (1969) derived (3.2.7), (3.2.8) and (3.2.9) using the series expansion (3.1.6). As will be seen in the next section, truncated recursions as in (3.2.2) and (3.2.3), but with a finite number of terms, are also of interest. They form a finite system of linear equations. Let A be an n x n complex matrix such that ind(A) < 1 (i.e., A is either non-singular or else it satisfies equation (1.3.1)). For k > 1, we define the matrix A(k) as the block Jordan matrix of order A; that has A for its diagonal blocks. For example, A(l) = A and for k = 4, we have A(4) = (A 0 0 0 I A 0 0 0 I A 0 VO 0 / A . We are interested in solving the system of equations: A{k)x(k) = b(k) (3.2.10) where x(k) and 6(A:) are partitioned so that they be conformable with A. For example, with A: = 4 we have x 4 . where xx,..., x 4 and b\,..., 64 are n-vectors. This system can also be written recur-sively as Axx = bx (3.2.11) and x ,_ i + Ax% = 6,-, ii = 2 , . . . , A;. (3.2.12) 71 If the matrix A is non-singular, then (3.2.11) would have the unique solution x = A~lbi and (3.2.12), which can be written as Axi = bi- Xi-i, (3.2.13) would have the unique solution Xi = A~l(bi — z,_j), i = 2, Expanding this iterative solution, we can easily verify that it can be expressed as x(Jfc) = A(k)~1b(k) where A(k)~i is a block lower triangular matrix such that for t > block (», j) is given by ( - l ) , W ( i i - 1 ) , W + 1 . (3.2.14) For example, with k = 4 we have AiA)-1 = ( A~l 0 0 0 -A"2 A~l 0 0 i l " 3 -A~2 A~l 0 V - A - 4 A~3 -A~2 A-1 We now extend these results to the case when A is singular. We assume that ind(A) = 1, so that A& exists, and we define W = I — AA& as before. We show below that an analogous representation for A(k)D with A# replacing A - 1 in (3.2.14) holds, but that it does not give a solution to the system of equations (3.2.10). Theorem 3.2.4. Suppose ind(A) = 1. Then the system (3.2.10) has a solution iff Wbi = 0. In this case, x i , . . . , xk-i are uniquely determined and are given by xi = A*bi + Wb2 (3.2.15) and Xi = A*(bi - x,_i) +Wbi+i, i = 2,...,k-l (3.2.16) while the last component is xk = A#(bk-xk-1) + Wy (3.2.17) 72 for an arbitrary vector y. Proof. Apply Lemma 3.2.1 as in Theorem 3.2.2 to obtain existence and uniqueness up to i = k — 1. For i = k, there is no next equation to satisfy, so the vector y is arbitrary. • This result is readily applied to the Laurent expansion for Markov reward processes. Let us take A = I — P, where P is, as before, the transition matrix of a Markov chain with immediate reward vector r. Corollary 3.2.5. The terms tuy, j = —1,0,...,» — 1 of the series (3.1.6) are uniquely determined by a system of t + 2 equations. Proof. To obtain the terms tu_i, two,..., tu,-, let us define « y + a = u»y, j = - 1 , 0 , . . . , t, and {0 if j = -1 r if j = 0 0 if j = l ,2 , . . . , t . Using equations (3.2.7), (3.2.8) and (3.2.9), we form the system (3.2.10) with k=i+2. For example, for i=2, we have k = 4 and /A 0 0 0 \ / 0 \ / A 0 0 j [ tuo I _ I r I 0 / A 0 I I tot I ~ I 0 I ' VO 0 / AJ \ w2 J \oJ Clearly, the first equation has a solution (because bt = 0) and so Theorem 3.2.4 applies. Hence W-i,..., u;,_i are uniquely determined and are given by (3.2.15) and (3.2.16). The last component u;,- is not unique and is given by (3.2.17). • We will now see how the solution of (3.2.10) can be expressed in terms of the Drazin inverse of the matrix A(k), thus extending equation (3.2.14) to the singular case. Lemma 3.2.6. If A is singular but satisfies (1.3.1), then A(k), k > 1, has index A; and its Drazin inverse A(k)D is a block lower triangular matrix such that its block 73 * > j \ is given by (-i) , w(A#r y + i. (3.2.18) Proof. We use the fact that A can be decomposed as in (1.3.1), where B is non-singular. Let S(k) be the block diagonal matrix that has all diagonal blocks equal to the matrix S of Equation (1.3.1). We define a new matrix C(k) as - l C(k) = S(k)A{k)S{k) so that A{k) = Sik^C^Sik). We argue that there is a permutation matrix Q(k) such that D(k) = Q{k)C(k)Q(k)~l _(B{k) 0 \ where B{k) and J(k) are both block Jordan matrices of order k, with their diagonal blocks respectively equal to B and 0. We show how to produce such a permutation in the simple case k = 2. It will be clear that the same is true for k > 2. We have (3.2.19) C(2) = (B 0 0 0 0 0 0 0 / 0 B 0 V0 I 0 0. Let us now consider a directed graph that has one node associated with each row of C(2), and with an arc going from node i to node j whenever block is non-zero. The graph of C(2) is shown in Figure 3.2.1. In general, the graph of C(k) has the following structure: every node i such that 3 < t < 2k has an arc going to node * — 2 with label / , and every odd-numbered node i has an arc going to itself, with label B. Hence we see that the nodes of the graph can be grouped into two subsets: the odd numbered nodes and the even numbered nodes. The desired permutation is simply to reorder the nodes as (1,3, . . . , 2k — 1,2,4,..., 2&), because the subgraph associated with the odd-numbered nodes is the graph of B(k), 74 B i I q B 1 O 9 I 4 L Figure 3.2.1. Directed graph for C(2) and the subgraph associated with the even-numbered nodes is the graph of J(k). So we have, for k = 2: 0(2) = B 0 0 0' / B O O 0 0 0 0 0 0 / 0 . which is indeed of the form (3.2.19). Now it is easy to verify that J(k)k = 0 and that J(k) is of index k. Then Dmr-pf ;) by definition, as in Lemma 1.2.6. Now B(k)~* is given by Equation (3.2.14), and by a graph argument similar to the above we see that A(k)D has the required shape as in equation (3.2.18). • It is straightforward to verify that A(k)A{k)D is a block diagonal matrix with all 75 diagonal blocks equal to AA*. For k = 4, we have A(4)A(4)D = 0 A* 0 0 0 \ 0 ~{A*f {A*f A* 0 A*) The consequence is that y(k) = A(k)Db(k) is not a solution of (3.2.10) in general, because AA*b{ ^ 6,- for i > 2. Nonetheless, we can use A(k)D to construct a solution. Define a new matrix R(k) whose only non-zero blocks are on the superdiagonal and are all equal to W = J — AA*. For example, R{\) = 0 and with A: = 4 we have R(4) = 0 W 0 0 0 0 W 0 0 0 0 w .0 0 0 0 We show the following result. Theorem 3.2.7. If the system Ax± = 6i has a solution, then the system A(k)x(k) — b(k) has a solution given by x(k) = \A{k)D + R{k)]b{k). (3.2.20) Moreover, xt, x2,... , i fc - i are uniquely determined while xk + Wy is also a solution, for any vector y. Proof. Expanding the matrix product in (3.2.20) gives precisely the solution of The-orem 3.2.4. • Corollary 3.2.8. The terms tw_i,...,u;,- of the series (3.1.6) are given by [A(A:)I> + R(k)]b{k), where A; = t + 2 and 6* is defined as in Corollary 3.2.5. 76 3.3. Computataion of gain, bias and other terms A consequence of Corollary 3.2.3 is that the terms g, h and tu,-, for » = 1,2,..., of the series (3.1.6) can be computed recursively, by solving successive systems of linear equations, all with the same singular matrix of coefficients. One such algorithm is described in Veinott (1969), Page 1651. It requires parsing the transition matrix P into its recurrent subchains and transient states. Gaussian elimination is used to compute the solutions. Another approach is to use the matrix decomposition algorithms of Chapter I. Applying the fast decomposition algorithm of Figure 1.3.5 to the matrix A = I—P, both the gain g and the bias h can be computed directly using the fast solution algorithm of Figure 1.4.3 with the right hand side b = r, where r is the vector of immediate rewards. The terms u;,-, for t = 1,2,..., can be computed as follows. Observe that at Step 3.2 of the fast solution algorithm, the bias is given by where Z\ was computed at Step 2.3. Let z[0^ = z\. Lemma 3.3.1. Let z[l+1^ be given by *<<+1) = -fluM*'\ (3.3.1) where the product by B^1 is computed as in Figure 1.4.2. Then Wi; = PL (*P) , » = 0,1,2,. . . , (3.3.2) where P and L were obtained with the fast decomposition algorithm. Proof. Recall that the fast decomposition algorithm of Figure 1.3.5 factorized the matrix A as A = PLUQ. Then the matrix B, although never explicitly computed, was denned as B = UQPL, so that A = (PL)(B» Bf)(PL)-\ 77 Using Equation (1.3.8), we have A# = (PL)(B» B»B»Bl*yPLr\ (3.3.3) We show by induction that Equation (3.3.2) holds. We know it holds for t = 0, the bias term. By Equation (3.1.5), we have VH+I =(-A*)((-lY(A*)i+1r) =-A#wi) .- = 0,1,2,.. . Hence = -{PL) (B» BTi'BjB^ { p L ) - i { p L ) ^ and the result follows from Equation (3.3.1). • The amount of arithmetic required is about the same for both the method of Veinott (1969) and our fast solution algorithm (when the number of recurrent classes is small), provided that only a small number of terms are required (less than n). We have no computational experience from which to compare their relative performance, however. 3.4. Iterative computation of gain and bias When only the gain and bias terms are required [e.g., when using the average reward criterion as in Howard (1960) and Blackwell (1962)], and when the transition matrix P is large and sparse, it may be appropriate to use an iterative method to compute g and h. We describe such an algorithm based on the powers of the transition matrix, and we show that its rate of convergence is geometric when P is aperiodic, using the results of Chapter II. Also, a stopping criterion is proposed, based on a sensitivity analysis of the evaluation equations. The iterative algorithm is described in Figure 3.4.1. We call it the power method because, as we will see, it produces the same sequence of iterates for the gain as the standard power method for computing an eigenvector. 78 Step 0. Set = r, x*1) = r and N = 1. Step 1. Compute x ^ + 1 ) = r + Px<*). Step 2. Compute gW = x<" + 1) - x<">. Step 3. Compute A<*> = - J t y M . Step 4. UN- WgW - gW-VW < 5 then stop, else add 1 to JV and go to Step 1. Figure 3.4.1. Power method for gain and bias Theorem 3.4.1. Suppose P is aperiodic, and let the sequences glN) and h^N^ be defined as in Figure 3.4.1. Then lim gW = g and lim = h, N-*oo AT—.oo where g = P*r and h = A&r. Proof. From Step 1, we have x[N+l) = £ > V , «'=0 and hence = pNr^ ( 3 4 x ) which converges when P is aperiodic, by Theorem 2.3.9. Further, we have hSN)=Y^Pir-NPNr «=o N-1 = 53 (P» - P*)r - JV(P* - P*)r. (3.4.2) »=o The first term converges to h = A#r, by Theorem 2.3.10, and the second term goes to zero. • 79 Lemma 3.4.2. Suppose P is aperiodic, and let = PN—P*, as in Equation (2.4.1). Then e[N) = gW-g = EWr (3.4.3) and 4") = A<*) - h = -{A* + NI)EWr. (3.4.4) Proof. Equation (3.4.3) is immediate from (3.4.1). To show (3.4.4), recall that 1=0 by Theorem 2.3.10, when P is aperiodic. Hence from (3.4.2), we have N-l oo - h = J 2 ( p i ~ p*)r ~ E( F* ~P*)r- NE^r «=o t=0 oo = - ^ ( P , - - P * ) r - A r ^ ) r i=N oo = ~(__]{Pi -P*)){PN - P*)r - NE^r 1=0 = -A*E^r-NE^rt and the result follows. • To show that the convergence is geometric, let us measure the error with e ^ ) = | | ^ ) | | 2 . | | r | | 2 (3.4.5) and e2N) = (\\A*\\2 + N)e[NK (3.4.6) Then \\e[K)h < «i"> and II 4 " 'II . s 4". Corollary 3.4.3. Suppose P is aperiodic. Then €{N+i) e(N+i) }im 1 = 7 and .lim 2 N-*oo JN) N—*oo AN) 80 where 7 < 1 is the modulus of the subdominant eigenvalue of P . Proof. By Theorem 2.4.4. • Although this result indicates that the convergence is asymptotically geometric, it is to be observed that the factor ||A^||2 + N in Equation (3.4.6) implies that the error on the bias is always greater than the error on the gain. Now that the rate of convergence of the algorithm has been determined, we still have to justify the stopping rule given at Step 4 of Figure 3.4.1, and to specify some guidelines for the choice of the parameter 5. We proceed as follows, using residual errors. Let = AgW (3.4.7) and = gl") + Ah,™ - r. (3.4.8) These can be computed easily from gW) and h,(N), although we will see that their computation is not necessary. Now since g and h satisfy the evaluation equations, the (unknown) errors e[N^ and must satisfy Ae<"> = (3.4.9) and + Aei"\= S^K (3.4.10) Theorem 3.4.4. The gain and bias errors are related to the residual errors by e[N) = A*6W and e2">=A*6<">. IN) Proof. We have that SX ' is compatible with A by construction, from Equation (3.4.7). Hence Theorem 3.2.4 gives eW = A*6W+P*6W (3.4.11) 81 and e[N) = A#5W+P*y, (3.4.12) where y is arbitrary. Now P*^N) = P*gW + p*Ah!N> - P*r = 0 because g = P*g(N) = P*r and P*A = 0. Also, Equation (3.4.4) implies that P*e2N) = -P*{A# + NI)EWr = 0, because P* A* = 0 and P*E^ = 0. Hence P*y = 0, and the result follows. • The consequence of this result is that l|elN)l|a<l|ii*|| a-||*1 (* r )|| a > (3.4.13) so that the error on the gain is bounded by a fixed factor | | A * | | 2 times the residual error. This factor indicates the sensitivity (or conditioning) of the Markov chain with transition matrix P [see Meyer (1980)] to changes in the data. Now observe that, at Step 4 of Figure 4.3.1, we have gm - gi"-D = (/> _ i)gl«-D = -AgC-V = -6i"-l). Hence our stopping criterion simply ensures that the residual error is small enough. By choosing 6 such that | | A#|| 2.$<6say, we see from Equations (3.4.5) and (3.4.6) that both e[N^ < e and € 2 N^ < c, assuming that the problem is reasonably well conditioned (i.e., that | | A * | | 2 is not too big). 3 .5 . Markov decision processes Now consider a system in which an action must be taken before a transition occurs. Let N = { l , . . . , n} be the (finite) state space, and let A,- be the finite set of actions that can be taken when the system is in state t, for * = 1,..., n. 82 The system evolves in time according to the process {X(t) : t = 0,1,2,...}, where X(t) is the state of the system at time t. Let {a{t) : t = 0,1,2,...} be the sequence of selected actions, with o(t) € 4?c(t)-We assume that the system satisfies the Markov property and that the transition probabilities do not depend explicitly on time. Hence we define p& = Prob{X(< + 1) = j | X(t) = i , a(t) = *}, for k G Ai and t, j = 1,..., n. We assume that the system satisfies the Markov property, that is Prob{X(< + 1) = j | X(0),o(0),...,a(0> = Prob{X(f + 1) = j | X{t),a{t)}. Immediately before a transition occurs, the system earns an (expected) reward r^, where k = a(t) and i = X(t). The action a(£) is selected according to a function JT, called a policy, such that a(t) = n(t, X(0), a(0),..., X(t - 1), a(t - 1), X{t)), for t = 0,1,2,... Now let us denote by P the transition probability function, by r the reward function and by n the set of all policies. Then the above model is called a Markov decision process and is denoted by the quadruple (N,r,P,H). For a given policy rr G II and interest rate p > 0, we define the expected total discounted reward function v*(p) such that tf(p) = £r(f> fc+14w)> t- = i,...,n, fc=0 where (3 = (1 + p)~l and E* is the conditional expectation operator under policy jr. That is, for an arbitrary random variable Y defined from {X(t)} and { o ( t ) } t w e n a v e Ei(Y) = E*{Y | X(0) = i}, t = l,...,n, 83 where the expectation is taken under policy TT. The problem is to find a policy IT G II that maximizes in some sense, for » = 1,..., n. Now define a function 8 to be a decision rule if Si G Ai, for i = 1,..., n, and let A = A\ x A2 x . . . x An be the set of all decision rules. Then a policy TT is said to be a Markov policy if ir(t,X(0),a(0),...,X(t))=8i(t), where i = X(t) and 6(t) G A. Moreover, a Markov policy w is said to be stationary if <5(0) = 5(1) = . . . For problems with an infinite time horizon, it has been shown that there is a stationary policy that is optimal [see e.g., Heyman and Sobel (1984), Theorems 4-3 and 4-4] for the criteria we will consider. Hence there is no loss of generality in restricting the problem to stationary policies. For simplicity of notation, we will denote a stationary policy by its decision rule 8, and the set of stationary policies by A. For a fixed, stationary policy 8 G A, the system is a Markov chain with rewards and we obtain the Markov reward process (N,r6,P6), as in Section 3.1, with transition matrix P = P6 and reward vector r = r6. For a given interest rate p > 0 and discount factor P = (1 + p)~l, the expected total discounted reward vs(p) is given by OO ^ ( / 0 = £ / ? , ' + 1 ( p ' ) V ! 1=0 as in Section 3.1. Following Miller and Veinott (1969), we say that a policy 8 is p-optimal if vS{p)>v'l{p) V 7 G A . (3.5.1) It was shown by Blackwell (1962) that there is a policy 6 G A that is p-optimal for every small enough p > 0. Such a policy is said to be Blackwell optimal. Motivated by the series (3.1.6), Veinott (1969) and (1974) defined a policy 6 to be k-discount optimal if lim p~k\v6(p) - v^ip)} > 0 V7 G A. (3.5.2) 84 (Veinott's definition also extends to the case of p < 0, but we will stick to the positive case, which Veinott(1969) calls k+ discount optimality). As special cases, a — 1-discount optimal policy is also said to be gain optimal, while a 0-discount optimal policy is also said to be bias optimal [Denardo (1970)] or nearly optimal [Blackwell (1962)]. The policy iteration method of Howard (1960) and Blackwell (1962) produces a policy 8 that is gain optimal. This method has been extended by Veinott (1966) to find a bias optimal policy and Miller and Veinott (1969) to find a Blackwell optimal policy. Veinott (1969) modified it to find a fc-discount optimal policy, for any k > — 1. More precisely, let tuf, i = — 1,0,1,..., be the terms of the Laurent series (3.1.6), for a policy 8 G A. Then (3.5.2) implies that 8 is A:-discount optimal if k k t = - i i = - i for all small enough p > 0. The extended algorithm evaluates u ; i l f tw*,..., tujt+i *° show that no other policy is lexicographically better, up to wk. Now let n be the number of states of the process. Miller and Veinott (1969) showed that a policy is Blackwell optimal if and only if it is n-discount optimal. This result was extended by Denardo (1971) who showed that an (n — l)-discount optimal policy is Blackwell optimal. Then Veinott (1974), using the fact that rank(A) = n —m, where A — I — P and m is the number of recurrent classes of P, showed that any (n — re-discount optimal policy is Blackwell optimal. We give a new proof of the latter result, using our matrix decomposition approach. As Miller and Veinott (1969), we need the following lemma from Gantmacher (1959), which we state without proof. Lemma 3.5.1. Let Af be a k x A; matrix and L a linear subspace of Rk. If Af'x G L for * = 0,1,...,k - 1, then Af*x G L for i = k, k + 1, (The idea is that for t > A:, M ' x is a linear combination of x, A f x , . . . , A f f c _ 1 x ) . This implies the following. Theorem 3.5.2. Suppose that some policy 8 G A has m recurrent classes (i.e., 1 < m < n). Suppose also that some policy 7 € A is such that to? = twf, for t = — 1,0,..., n — m. Then w~[ = ty* for i > n — m. Proof. First, observe that in the special case m = n, the result is trivially true because g"i = g6, A6 = I - P6 = 0 and {A6)* = 0, so that hs = u;£ = wf = . . . = 0. This implies that h1 = 0. Now by equations (3.5.4) and (3.5.6), u,7 = ( -1)'((A 7) #) V =0 = «;f, » > 0. In the case m < n, both policies have the same gain and bias, i.e. wt_i = w2_x = <7 and IWQ = WQ = n. Equations (3.5.4) and (3.5.6) imply that «,£ = {-iy((A6f)%h, i > 0. (3.5.4) Decompose A6 as in equation (1.3.1), and define «-(£:)-*•? where y,-,i and y,,2 have dimensions (n — m) and m, respectively. Define also , . ( « . ) = „ = 5 ( v , v - ( v 2)^. Hence 2 2 = 0. Multiply Equation (3.5.4) by S to get so that W,i = (-l)'(B- 1) ,'* l l , , t > 0. (3.5.5) y«,2 = 0. / Now apply the same transformation S to A'1, giving A1 = S - 1 f v U v 1 2 V (3-5.6) \ A a i A 2 2 / and define 86 Equation (3.2.9), with i=i+l and A = A1 together with (3.5.6), gives \ A 2 l A 2 2 ) \ E » + 1 , 2 / \Xi,2) By hypothesis, x,-,i = y,,i and x, t2 = 0, for i = 0 n - m s o (£: a C r ) = - ( " ' ) ' —->• Using (3.5.5), we get ( A - n B - 1 - / ) ( B - 1 ) * 2 1 = 0 and X21(B-l)i+l2l=0. (3.5.7) (3.5.8) (3.5.9) Now recall that Z\ is a vector of dimension (n — m). Define a linear subspace L of R"- m such that 1 € L iff ( X n - B - 1 - /)x = 0. Then Equation (3.5.8) implies that {B~l)%Z\ G L, for t = 0, . . . , n — m — 1. By Lemma 3.5.1, it is true also for » > n — m and hence Equation (3.5.8) is also valid for » > n — m. The same argument applies to (3.5.9) as well. We still have to show that it implies that x,-fi = y,-,i and x, i 2 = y,-,2 = 0, for i > n — m. By Corollary 3.2.3, the recurrence (3.5.7) has a unique solution. Equations (3.5.8) and (3.5.9) imply that x t ) 1 = y, ( 1 = (—l)t(B~1)%z1 and x,-,2 = yt,2 = 0 is that solution for every * > 0. • The following example shows that 6 and 7 in Theorem 3.5.2 need not have the same number of recurrent classes. Let policy 6 have the transition matrix P6 and reward vector r6 given by P6 = (\ 0 0 0' 0 0 1 0 0 0 0 1 vo 1 0 0. and Note n = 4 and m = 2. With A = I - P6 we have A* = /o 0 0 0 1/3 0 0 -1/3 1/3 \0 0 -1/3 87 so that Let 7 be a policy with a transition matrix given by and ttfo = P 7 = (1 - 3x x x 0 0 1 0 0 0 \ 0 1 0 with 0 < x < 1/3, and r 7 = r6. Then gn = g6, n 7 = h6, wj = twf and w 7 = u^. Thus 6 and 7 have the same series expansion, but 7 has only one recurrent class if x > 0. We now apply the above result to dynamic programming. Theorem 3 . 5 . 3 . A policy with m recurrent classes is Blackwell optimal iff it is (n—re-discount optimal. Proof. By applying Theorem 3.5.2 to Equation (3.5.3). • It implies, in general, that a policy is Blackwell optimal if and only if it is (n — 1)-discount optimal (because m > 1, always). We will now use this result and the notation of the previous section to formulate a policy iteration algorithm in a unified way. We first recall that with a fixed interest rate pt the policy iteration method for finding a /^-optimal policy can be formulated as in Figure 3.5.1, with (3 = (1 + p)~l. Using the notation of Section 3.2, we can now formulate the policy iteration method for finding a A; discount optimal policy as follows. Let K = k + 3 (we need to evaluate K terms xv-\, two, • • • 1 u>fc+i). The method is shown in Figure 3.5.2. The operator A6(K)D+R6(K) of Step 1, derived from Drazin inverse theory, provides an explicit solution to the policy evaluation problem. It is to be seen whether the latter algorithm could be formulated as a specialization of the method of Newton for solving systems of nonlinear equations, as the former was by Puterman and Brumelle (1979) using the ordinary inverse (A6) 1. 88 Step 0. Select an arbitrary policy 8 G A. Step 1. Policy evaluation. Compute v6 = (ii')~V, where A6 = I- PP6. Step 2. Policy Improvement. Find 7 G A s.t. r 7 + ppiy6 = m a x , e A { r " + pP"v6}. (Choose 7 = 8 if possible). Step 3. If 7 = 8 then stop else set 8 = 7 and go to Step 1. Figure 3.5.1. Policy iteration method for p-optimal policy Step 0. Select an arbitray policy 8 G A. Step 1. Policy evaluation. Compute X6{K) = [A6(K)D + R6{K)]b6{K), as in Corollary 3.2.5. Step 2. Policy improvement. Find 7 G A s.t. 67(/c) + P7(/c)z*(*c) = max^GA{6"(/c) + P"(K)X6(K)} where P 7 ( K ) = I(K) - A 7(AC) and the maximization is done lexicographically. (Choose 7 = 8 if possible). Step 3. If 7 = 6 then stop else set 8 = 7 and go to Step 1. Figure 3.5.2. Policy iteration method for A: discount optimal policy It is important to note, however, that the formulation of Figure 3.5.2 is not the most 89 appropriate for computations. Indeed, we saw in Section 3.2 that the terms u>*, for t = — 1,..., k, can be evaluated successively. Hence it is more efficient to combine Steps 1 and 2 in such a way that only those terms which are really required be computed. There are two ways to do it. One is to successively evaluate x6(i) = tu,_3 and produce A , such that A , = {7 G A , _ ! : 67(») + P V ( » ) > 6"(t) + P"**(»),Vn G A , _ i } where Ao = A , and stop at the smallest i < k for which A,- has only one element. This element is the policy 7 of Step 2. (If A * has more than one element, choose any 1 e A f c ) . Another way is to stop at the smallest i < k for which A,- does not contain 8, and choose 7 G A,- arbitrarily. Then 7 is a better policy than 6, although it is not necessarily the best improvement. This latter approach might require fewer terms to be evaluated than the former. 90 Chapter I V Continuous time Markov decision processes This chapter is mainly a review of known results on continuous time Markov re-ward processes and decision processes. Its purpose is to show the equivalence between continuous time and discrete time Markov decision processes, and to introduce some of the notation that will be used in the next chapter. In Section 4.1, we define the continuous time Markov reward process from the continuous time Markov chain of Cinlar (1975). Then we show, in Section 4.2, that the resolvent has the same algebraic structure as in the discrete time case. Our proof is more detailed than that of Howard (1960) and uses the same approach as in Chapter II. This result is used, in Section 4.3, to obtain a Laurent expansion of the resolvent. The gain and bias terms are also interpreted as limits of the reward functions. Finally, in Section 4.4 the equivalence between continuous time and discrete time processes is reviewed, using a data transformation considered by Howard (1960), Vei-nott (1969), Schweitzer (1971) and Serfozo (1979). 4.1. Continuous time Markov reward processes Let {X(t) : t > 0} be a homogeneous, continuous time Markov chain with state space N = {1 , . . . , n}. Let Sk be the epoch of the A:-th transition, with S0 = 0. Then we have Prob{Sk+l-Sk>t\X{Sk) = i} = e-Xit, i > 0, (4.1.1) where 0 < A,- < oo, for t = 1,..., n. 91 The transition rates are finite (A,- < oo) because the state space is finite [see Cinlar (1975), page 244]. We will also assume that they are positive (A,- > 0). Then T,- = 1/A,- is the expected holding time in state t, for i = 1,..., n, and the time matrix T = diag(r,) is nonsingular. The transition probabilities are defined as P i j = Prob{X(Sk+1)=j\X{Sk)=i}, i , j = l , . . . , n , (4.1.2) with pa = 0 and ]Cy=i Vij = h f ° r * = s o * n a * matrix P = (p,y) is stochastic. For A; = 0,1,2,. . . , let Y(k) = X{Sk). Then the process {Y(k) : A; = 0,1,2,...} is a discrete time Markov chain with transition matrix P, embedded at the transition epochs. The continuous time Markov chain {X(t) : t > 0} is completely determined by the matrix c? = r-1(P-/), (4.1.3) which is called its infinitesimal generator. Assume that X(t) gives the state of a dynamic system. Further, define the earning rate vector r = (r,) as follows. Assume that when the system is in state t, it earns a reward at a rate of u,-,- dollars per unit of time. Further, just before making a transition to state a reward of u,y dollars is earned instantly, for i,j = 1,..., n. From this, the expected earning rates are defined as r, = uu + A,- ]_2 PijUij (4.1.4) [see Howard (1960), Equation (8.16)]. Such a model is a continuous time Markov reward process and is denoted by the triplet (JV,r,Q). Define Z(t) as the total reward earned until time t. With Ei as the conditional expectation operator of Section 3.1, we define the vector V(t) of expected undiscounted 92 rewards by Vi(t) = Ei{Z(t)}, for i = 1,..., n. The function V,(f) is known to be differentiable, and its derivative U j ( t ) has a limit, as t —• oo, for i — 1,..., n. Indeed, define the matrix 11(f) such that *i,it) = Prob{X(0 = ; | X{0) = i}, t > 0. Then Jr,y(£) has a limit as t —* oo [see Heyman and Sobel (1982), Proposition 8-8], and the expected earning rate at time t is n u»(0 = Yl*«(0ryi = ii • • • i»• 3=1 This implies that V(t) = 0(t), because lim = lim - f u,(z)efe is bounded. Let s > 0 be the instantaneous interest rate, and let v(s) be the vector of expected total discounted rewards. That is, «,(s) is the expected total discounted reward of the process over an infinite horizon, when the process starts in state t, for » = 1,... , n. Using the conditional expectation operator Ei, we have vi{8) = Ei{f0° e-*dZ(t)} = / e~9tdVi(t), i = l , . . . , n . Jo The expectation and integration operators can be permuted since the result is finite, because V(t) — 0(t). Conditioning on the first transition, we get «<(«)= / \f e-txUiidx + e-at^pi3(uij + v3is))]\ie-x>t dt, Jo Mo j 9 t i J which can be integrated directly, giving 1 3& ' 3?* 93 Multiplying both sides of Equation (4.1.5) by (1 + sr,), and using matrix notation, we get {I+sT)v(3) = Tr + Pv(s). Collecting terms and premultiplying both sides by T~l, we finally obtain the resolvent equation {8l + T-lA)v(a) = r, (4.1.6) where A = I — P. It is not obvious whether the matrix (sI+T~1A) is nonsingular for any small enough interest rate s. We will show, in the next section, that it is indeed the case, and that the resolvent R9 = (si + T~* A) 1 has exactly the same properties as the resolvent Rp of a discrete time process. Hence all the results of discrete time dynamic programming that are based on the algebraic structure of Rp are also valid for continuous time Markov decision processes. 4.2. Algebraic structure of the resolvent All the results of Chapter HI, for discrete time decision processes, are based on the fact that ind(A) = 1, where A = I - P. We show that mA(T~l A) = 1, where T is the diagonal matrix of expected transition times. Our argument is similar to that of Howard (1960), page 98, but we emphasize the algebraic structure. The result is not true for a general matrix A of index 1. For example, let We have ind(A) = 1, because We have, however, that mc\(T~l A) = 2, because r-M-*-•(; -3)™-'(; s ) ! ) * 94 so that T~lA is similar to J2(0) which has index 2, by Lemma 1.2.1. We now proceed to show that the result is true when A = I—P, with P a stochastic matrix. Lemma 4.2.1. Suppose P is an irreducible, stochastic matrix, where A = I — P. Then T~lA has the eigenvalue 0 with multiplicity 1. Proof. We proceed as in the proof of Theorem 2.2.4. We know that the column vector e is a right hand eigenvector of P with eigenvalue 1, and that there is a positive row vector to that is the corresponding left hand eigenvector of P . This implies that e is also a right hand eigenvector of T~lA with eigenvalue 0, and that wT is the corresponding left hand eigenvector of T~lA. We show that the corresponding Jordan block of T~lA has dimension 1, by proving that Equation (1.1.6) has no solution. That is, we have to show that T~l Ay = e has no solution y. Suppose y is a solution. Then because both wt and e are positive vectors. But wt is an eigenvector of T lA with eigenvalue 0, so that (wT)(T~l A) = 0, which is a contradiction. Hence the result Theorem 4.2.2. Suppose P is a stochastic matrix. Then ind(T XA) = 1, where A — I - P. Proof. Let m be the number of recurrent classes of P. Then by Theorem 2.1.3, there is a permutation matrix S such that (wT)(T~lA)y = {wT)e>0, follows. • A = S~X Aom \ 0 0 V 0 0 A, mm J 95 Here AQO = I — Poo is the nonsingular submatrix corresponding to the transient states of P. Now, applying 5 to the diagonal matrix T~x, we have ( TQQ1 AQQ TQQ1 AQI T ' 0 0 1 i4o2 ••• TQQ1 Aom ^ T~lA = S -1 TfiAn 0 \ n T - 1 A s, where is the diagonal submatrix of T corresponding to the »-th subset, for » = 0 , . . . , m. The matrix T^1 A00 is nonsingular, hence all its eigenvalues are nonzero. Moreover, by Lemma 4.2.1, each T^1 An, for t = 1,..., m, has the eigenvalue 0 with multiplicity 1. This implies that the Jordan canonical form of T~l A has exactly m Jordan submatrices with eigenvalue 0, each of dimension 1. Hence ind(T _ 1i4) = 1. • The consequence of Theorem 4.2.2 is that the resolvent R$ of a continous time Markov process exists and has a Laurent expansion analogous to that of the resolvent Rp of a discrete time process. 4.3. Laurent expansion of the discounted rewards Let C = T~l(I — P) = — Q. Then the resolvent of the continuous time process is R9 = (si + C ) - 1 , and the vector of total discounted rewards is given by v(s) = Rgu. Because ind(C) = 1, we know that C# exists. Moreover, let W = / — C C # . The following results follow immediately from the results of Chapter HI. Theorem 4.3.1. Let A be the nonzero eigenvalue of A with smallest modulus. If 0 < s < |A|, then oo = a - i W + £ ( - 5 ) « ' ( C # ) * + 1 . (4.3.1) «=o Proof. Same as Theorem 3.1.1. • Defining g = Wr, h = C^r and to,- = (—a)'(C#)'+1r, for t = 0,1,. . . , we have oo v(s) = s~lg +h + Y^s'wi- (4.3.2) i=i 96 Theorem 4.3.2. The terms g, h and u;,-, for i = 1,2,..., of the series (4.3.2) are the unique solution of the recurrence Cg = 0 (4.3.3) g + Ch = r (4.3.4) Wi-i +Cwi = 0, x = l , 2 , . . . , (4.3.5) where wa = h. Proof. Same as Corollary 3.2.3. • This implies that the policy iteration method of Chapter III can be applied directly to the continuous time Markov decision processes, using the matrix C instead of A. Further, the terms g and h of the series (4.3.2) can still be interpreted as gain and bias, respectively. We do this as follows. Recall from Section 4.1 that the expected total discounted reward u,(s) is the Laplace transform of the expected earning rate function u,(«), that is, f°° vi(3) = I e 3tVi(i)dt, t = l , . . . , n . Jo We need the following Tauberian theorem. Theorem 4.3.3. [Heyman and Sobel (1982), Property A-8]. Suppose /(a) is the Laplace transform of /(£). Then lim fit) = lim sf(s), provided both limits exist. Corollary 4.3.4. The vector g of Equation (4.3.2) is the limiting earning vector, that is, lim u,(i) = gi, i = 1 n. t—•oo Proof. We know from Section 4.1 that u,(t) has a limit as t -* oo. By Equation (4.3.2), we also have that lim svi(s) = g{. The result follows from the theorem. • 97 Corollary 4.3.5. The vector h of Equation (4.3.2) is the bias vector, that is, lim (V,(t) - gtt) = hi, i = 1,..., n. t—• O O Proof. Let /,(r) = V,(*) - gtt. Then fi{s) = f°° e-'Viit) dt Jo = f°° e-gtVi(t)dt- I™ e-'git Jo Jo Vi{s) gi s s 2 ' where the last equality follows from elementary properties of the Laplace transforms. Hence Urn sfi{s) = hi, «\o by Equation (4.3.2). Also, /,(<) has a limit as t —* oo because it is bounded and u,-(t) converges. The result follows from the theorem. • We note that it is sufficient to use ordinary limits. In the discrete time case, it was necessary to use Gesaro convergence when the transition matrix was periodic. It is not necessary in the continuous time case, because the exponential distribution of the transition times automatically does the time averaging. 4.4. Equivalent Markov decision processes In this section, the continuous time Markov decision process is defined. By exam-ining the reward processes associated to stationary policies, a whole family of discrete time Markov decision processes are shown to be equivalent to a given continuous time process. A continuous time Markov decision process is a system that evolves according to a process {X(t) : t > 0} in which an action is selected every time a transition occurs, and which earns a reward that depends both on the state cf the system and the action selected. 98 More precisely, suppose the fc-th transition occurs at time t = Sk and that the system moves to state t = Yk = X(Sk). Then an action a = ak € A,- is selected and the next transition will occur according to the following probability function: Prob{y f c + 1 = j,Sk+l>t + x\Yk = i, Sk = t,ak = a} = p&e"*?*, (4.4.1) where 0 < A? < oo, p?- > 0, p£ = 0 and n Y^Pij = 0, for a € A,-, t = 1,...,n. 3 = 1 During the interval (Sk,Sk+i), a reward is earned at a rate of u£ dollars per unit of time and an additional amount of u?- dollars is received at time Sfc+i. For a policy TT, define Z*{t) as the total reward earned until time t, and the vec-tor V*(t) such that 17(0 = » ' = l , . - . , n . Then with an interest rate s > 0, the expected total discounted reward vector v*(s) is defined as «?(«)= f°°e-'dVfit), i = l , . . . , n . The objective is to find a policy JT to select the actions so that v*(s) is maximized for all i = 1,..., n. It has been shown that there is a stationary policy that is optimal [see Veinott (1969)]. Hence it is sufficient to consider only the stationary policies. Suppose a fixed stationary policy 8 £ A is used, where A = A i x . . . x A n . Then the action ak = 8i is selected if the fc-th transition is to state t = X{Sk). Hence the system is the continuous time Markov reward process (N, r*,Q*), where and , f - A f if i = j 99 As was pointed out in the previous section, the solution of a continuous time Markov reward process has the same structure as in the discrete time case, and can be computed using the same algorithms. Further, the discount optimality criteria can be denned in just the same manner, from the Laurent series, and a A;-discount optimal policy can be found with the policy improvement methods. This similarity between continuous and discrete time processes was first pointed out by Howard (1960) for the expected total discounted reward criterions as well as the average reward criterion. Howard (1960) also noted that a given continuous time Markov decision process is equivalent to a whole family of discrete time processes, in the sense that for any stationary policy 6 G A , all these processes have the same value. This can be seen as follows. Consider the continuous time Markov reward process (N, r, Q), where r = rs and Q = Q6, and let a be a positive real constant. Then the reward process (N, ar, aQ) is equivalent to the original process, up to a scaling of the time unit. Now by choosing a appropriately, we can define a stochastic matrix P such that aQ = P - I. But then the discrete time Markov reward process (N,ar,P) has the same value as the original continuous time process, because A = I-P = -aQ so that the terms of the series for the discrete time process satisfy the same equations as for the continuous time process (compare Corollary 3.2.3 and Theorem 4.3.2, with C=-Q). If the scalar ot is chosen such that 0<a<l/Xf, V a € A „ i = l , . . . , n , then such a stochastic matrix P6 exists for all policies 6 G A . Hence such an a defines a discrete time Markov decision process which is equivalent to the original continuous time process. 100 This equivalence was suggested by Howard (1960) for the expected total discounted ^ reward criterion and the average reward criterion. Veinott (1969) extended the result to the discount optimality criteria. The same device was suggested by Schweitzer (1971) as a data transformation to guarantee the convergence of the method of successive approximations to find a gain optimal policy (average reward criterion). To accomodate models with pft < 0 (strictly speaking, those are semi-Markov decision processes), the scalar a must be selected such that 0 < (1 -pu)a < 1/A°, Va G A i , t = 1 n. Finally, the equivalence was extended to models with a countably infinite state space by Serfozo (1979). 101 Chapter V Semi-Markov decision processes In this chapter, we apply the matrix decomposition approach to semi-Markov deci-sion processes. We follow the development of Denardo (1971), in which Laplace-Stieltjes transforms are used to derive a Laurent expansion of the expected total discounted re-wards. In Section 5.1, we define a semi-Markov reward process in a way that is compatible with the definition of a semi-Markov decision process in Heyman and Sobel (1984). The material of Section 5.2 is completely new. The matrix decomposition approach is applied to the moments of the transition probability function, and a condition of Denardo (1971) about the entries of the first moment is interpreted algebraically. A new algorithm is also introduced to solve some singular systems of equations involving the first moment. In Section 5.3, many results of Denardo (1971) are reviewed, and their proofs are formulated in terms of matrix decomposition, using the results of Section 5.2. The Laurent expansion of the expected total discounted rewards is obtained, emphasizing the matrix formulation. In Section 5.4, the application of the above results to semi-Markov decision pro-cesses is reviewed, mainly following Denardo (1971). Finally, in Section 5.5 some special cases of semi-Markov decision processes are reviewed. As pointed out in Denardo (1971) and Heyman and Sobel (1984), both 102 discrete time and continuous time Markov decision processes can be seen as special cases. Their formulation as semi-Markov decision processes is a good illustration of the model. Also, the equivalence between the semi-Markov and continuous time Markov decision processes is reviewed, under the average reward criterion. The equivalence was used in Ross (1970), but our approach stresses the fact that the processes are not equivalent with respect to other optimality criteria (e.g., bias optimality, etc.). 5.1. Semi-Markov reward processes Let {Y(k),S(k) : A : = 0,1,2,...} be a two-dimensional process such that Y(k) E N = {1 , . . . ,n}, 5(0) = 0 and, for all x > 0 and A ; = 0,1,2,. . . , Prob{r(A: + 1) = j, S(k + 1) < x + t | y (0), 5(0), . . . , Y(k) = i, S(k) = x) Prob{F(A: + 1) = j, S(k + 1) < x +1 \ Y{k) = i , 5 ( A : ) = x} Qij(t), *>0, »•,; = l , . . . , n . (5.1.1) Then {Y(k) : k = 0,1,2,...} is a discrete time Markov chain with transition matrix P = Q(oo), and 5 ( A : ) is the time of the A>th transition. Let X(k) be the reward earned during the A;-th stage of the process. The probability distribution of X(k) is assumed to depend on Y(k), Y(k + 1) and r(k) = 5 ( A : + 1) — 5(A:). We define the conditional reward functions u f -7(x, y) as the expected value of the cumulative reward during [5(AJ), 5 ( A ; ) + y) if Y(k) = t, Y(k+1) = j and r{k) = x, with 0 < y < i , and ui3(x,x) = E{X(k) | Y(k) = i, Y(k + 1) = j,r(k) = x). Unconditioning, we get the reward function Ri(t) such that W)=JjlJo U ty(z,x)dQ,y(x) + j f + U , y ( t , x) (fQ,y(x)j (5.1.2) is the expected reward earned during the interval [5(A:),min{5(A:) + *,5(A:-r 1)}], 103 before the (fc-f-l)-th transition. A sufficient condition for the integrals to exist in (5.1.2) is if Ui3(x,x) and u,y(t,z) have bounded variation in [0, oo] and [t, oo], respectively. Such a model is called a semi-Markov reward process and is denoted by the triplet (N,R, Q), where R(t) is the vector of expected reward functions of Equation (5.1.2), and Q(t) is the matrix of transition probability functions of Equation (5.1.1). The discrete time and continuous time Markov reward processes of Sections 3.1 and 4.1 are special cases of the semi-Markov reward process. and In the discrete time case, we have directly In the continuous time case, we have <?.-y(0=P.-y(i-«"M). for » # ; ' . and so that by Equation (5.1.2), n r t* f°° 1 '^(0 - E[J (u"'z + Vij)PiAie~XiXdx + J UiitpijXie-XiXdx^ = ( ? + E w w ) ( 1 - - A ' t ) = T+1 ~ e~X,t)> where r,- was defined in Equation (4.1.4). Hence Qij(oo) = Pij and -R,(oo) = r,/A,- is the expected one-stage reward. Now for t > 0, let Z(t) be the total reward earned by the system until time t. We are interested in the vector V(t) of expected total undiscounted rewards until time t, i.e., Vi{t) = Ei{Z{t)}, : = l , . . . ,n . 104 For t > 0, the expectation is finite if the expected number of transitions in [0, t] is finite. A sufficient condition for this in terms of Q(t) is described in the Section 5.2.. Then with interest rate s > 0, we have the total discounted reward t/,(s) defined as v,(s) = / e~Bt dVi(t), for t = 1,..., n. (5.1.3) Jo-Hence v(s) is precisely the Laplace-Stieltjes transform of V(t). Similarly, consider the transforms q(s) and r(s), respectively of Q(t) and R(t). We have qi3is)= f e-'dQtjit) (5.1.4) Jo-and r,(s)= / e—' dRi(t), (5.1.5) JQ-for i,j = 1,..., n. We define the A;-th normalized moments Qk and Rk as Qk= f°°tk dQ(t)/k\ (5.1.6) Jo-and Rk= f tkdR{t)/k\, (5.1.7) Jo-for A; = 0,1,2,. . . , whenever they exist. Note that Qk is a matrix and Rk is a vector. These moments are useful because of the following Abelian theorem. Theorem 5.1.1. [See Heyman and Sobel (1982), Property A-9]. Suppose f(s) is the Laplace-Stieltjes transform of F(t). Then jH e-«1*dF(t) = ( - l ) f c ^ / ( S ) . (5.1.8) Corollary 5.1.2. Suppose Q0, Qi,. • •, Qk exist and are finite, for some k > 0. Then q{s) = QQ - sQi + ... + (s)kQk + o(sk). (5.1.9) Proof. Evaluate Equation (5.1.8) at 3 = 0, with /(s) = <fty(s). By (5.1.6), we have WUo = (^)« for ^ = 0,...,*. The result follows by taking a Taylor expansion of g«y(s) in a positive neighbourhood of s - 0. • 105 Corollary 5.1.3. Suppose Ro,Ri,. ..,Rt exist and are finite, for some I > 0. Then r(a) = Ro - sRi + ... + (-s)eRt + o(a'). (5.1.10) Proof. Same as the previous corollary, but with /(a) = r,(a). • A similar series expansion will be derived in Section 5.3 for the expected total discounted reward function t/(a). But first, some preliminary results on the algebraic structure of the first moment Q\ of the transition probability function are derived. 5.2. Decomposition of the first moment Let us explore the algebraic properties of the moments Q0 and Q\ of the transition probability function Q(t). These properties will be useful for expressing the solution of the singular equations which will be encountered when deriving the Laurent expansion of the discounted reward function v(s). The properties of the zeroth moment Q0 were explored in Chapters II and III, since where P = (pij) is the transition matrix of the Markov chain embedded at the transition epochs. For the rest of this section, we will denote Q0 by P . Also, we assume that P is a stochastic matrix. Let also A = I — P. Then, as in Chapter Hi , A satisfies Equation (1.3.1), and hence A* exists and is given by (1.3.2). Again, we will need the limiting matrix P* =I-AA#. Recall that where the identity matrix is m x m, with m the number of recurrent classes of P . For the rest of this section, we denote the first moment Qi of Q(t) by H, i.e., Qi = H = (hij). By Equation (5.1.6), if p,j > 0 then n.y/p.y is the expected holding (5.2.1) 106 time in state t, conditioned on the next state being j, for i, j = 1,..., n. The following results show that if the matrix H has at least one nonzero entry in each of the recurrent classes of the embedded Markov chain P , then the expected number of transitions in a finite time interval is finite. Lemma 5.2.1. Suppose P is irreducible, and suppose that H 0. Then Proof. Let TT > 0 be the row vector of the stationary probabilities of P . Then where a,- = £ y = i h-ij, for i = 1,..., n. There is at least one * for which a,- > 0, because H ^ 0. Then where f3 = £2y=i wyQfy > > 0, and the result follows. • Lemma 5.2.2. Suppose P has m recurrent classes, and suppose H has at least one nonzero entry in each recurrent class of P . Then Proof. That rank(P*) = m is clear from Equation (5.2.1). We prove the other equality in the case m = 2. There is no added difficulty with m > 2, except for heavier notation. As in Theorem 2.1.3, let T be a permutation matrix such that rank(P*#P*) = rank(P*) = 1 and rank(P) = 1. Now rank(P*#P*) = rank(P*) = m P = T 107 Then O P * P* u •r01 •r02 r = r | o P;, o 13-1, and it is easy to verify that o o p 2 y (° Poi Hi i Pi i PQ2 H22 P22 \ 0 PX\QIIPI\ 0 r - 1. 0 0 P2*2ff22P2v This implies that rank(P*#P*) > m, because rank(P,*,i?,-,Pl*i) = 1, for i = l m, by Lemma 5.2.1. The equality will be proved in the corollary. • Now let us apply the similarity transformation of Equation (5.2.1) to the matrix Q, and write where H22 is an m x m matrix (note that Hu and H22 are not the same here as in the proof of Lemma 5.2.2). Corollary 5.2.3. #22 is nonsingular. Proof. Using Equations (5.2.1) and (5.2.2), we have that Hence rank(if 2 2) = rank(P*#P*) = m, by Lemma 5.2.2. The result follows because H22 is an m x m matrix, thus of full rank. • This corollary has a consequence that will be important for deriving (and evaluat-ing) the Laurent expansion of u(s). Recall from Chapter I that a singular system of equations is said to be compatible if it has a solution. The following theorem shows that we can select the solution x of a compatible system Ax = 6 (5.2.3) so that the system Ay = c- Hx (5.2.4) be compatible as well. 108 Theorem 5.2.4. Suppose A is decomposed as in Equation (1.3.1), with B = I — P u , and H as in (5.2.2), and suppose Then the system (5.2.3) is compatible and has a unique solution x such that (5.2.4) is compatible, given by Moreover, there are infinitely many solutions y to Equation (5.2.4), given by y = 5 " 1 = 5 " 1 ( 5 - 1 ( C l - ~ H " X ^ , (5.2.6) where y 2 is arbitrary. Proof. Recall from Section 1.4 that the system (5.2.3) is compatible iff 62 = 0. Writing (5.2.3) and (5.2.4) in decomposed form, we have 5 z i = 6 x , (5.2.7) Byi =ci - Huxi - H12X2 (5.2.8) and 0 = c 2 - H21X1 - #22*2- (5.2.9) Since B is nonsingular, z i is uniquely determined by (5.2.7), and (5.2.4) is compatible iff z 2 satisfies (5.2.9). But since fT22 is nonsingular, by Corollary 5.2.3, we have that X2 is uniquely determined. Finally, yt is uniquely determined by (5.2.8), and y 2 is arbitrary since it appears in none of the equations. • The decomposition approach is not only a convenient way to express the solution of the above equations, it also provides an efficient algorithm for its computation. We saw, in Chapter I, how the matrix A can be efficiently decomposed in a way that is equivalent to Equation (1.3.1). Moreover, we also saw how to use the decomposition to compute the component X\ of the solution, satisfying (5.2.7). 109 We now show how to compute x2 efficiently, to satisfy (5.2.9). Let d2 = c2 — H2lxi. Then x2 is the solution of the nonsingular system h22x2 = d2. The following lemmas show how to use the decomposition (1.3.1) to compute H2\Xi and H22 efficiently. First, let z2 = H2iXi, and write *=(:;)=(£iK <«•»> Clearly, if we can compute z efficiently, then we have d2 = c2 — z2 immediately. Lemma 5.2.5. Proof. We have 5 I T 5 - ( ? ) = 5 5 - ( *» £ ) s s - ( ? ) = ( £ ) . , = . . • Hence z is obtained by premultiplying (*x) by S - 1 , then by the matrix H and finally, by 5. The system H22x2 = cfo can be solved efficiently using Gaussian elimination, pro-vided that the matrix H22 can be computed efficiently. That this is the case can be seen as follows. Lemma 5.2.6. H22 = (0 I)SHS~l (J), where J is the m x m identity matrix. Proof. Because Sgg-i = [ Hn B \ 2 \ \ B2i H22 J ' by Equation (5.2.2), so we get to / ) « « - • ( ; ) - ( . , / , ( * » £ ) ( ; ) - * „ . • n o This implies that, in order to compute fT 2 2 , we simply apply the transformation S~l to m vectors, then we apply H, then S and finally, m row vectors. The total number of flops is proportional to mn 2 . In fact, this product and the PLU decomposition of f f 2 2 needs to be done once only, because the solution for several right hand vectors 6 and c can be computed from the same decomposition. The above method does not require parsing the Markov chain into its recurrent subsets. The chain structure was used only to derive the properties of the decomposition of the matrix H. A different approach was used in Denardo (1971), where the solution vector x is expressed in terms of P* t A* and, within each subchain, in terms of the coefficient /? of the proof of Lemma 5.2.1. Our method is more economical because it does not require P* and A* to be explicitly computed. Another algorithm that does not require the parsing into its subchains is the method of Federgruen and Spreen (1980). It is based on a modified Gaussian elimination procedure similar to the PLUQ decomposition of Chapter I which we used in the fast decomposition algorithm. Their formulation of the procedure does not include row interchanges, however, and fails to produce the desired elimination for some transition matrices. An example is This matrix A cannot be made upper triangular by Gaussian elimination without row interchanges. Fortunately, their results are still valid because it suffices to relabel the states in order to produce a transition matrix P for which the elimination is valid. In the above example, we get which is already upper triangular. I l l 5.3. Series expansion of the discounted rewards In this section, we return to the original notation Qo,Qi,..., for the moments of the transition probability function. At the root of the Laurent expansion of v(s) of Denardo (1971) is the following renewal equation for V(f), which is obtained by conditioning on the time x of the first transition. Recalling the definition of Q(t) and R(t), we have where the integral includes the possible instantaneous rewards at times 0 and t. In matrix form, we have By the convolution theorem [see Heyman and Sobel (1982), Property A-12a], the Laplace-Stieltjes transforms satisfy v{s) = r(a) + g(s)t/(s), which can be written as This determines the function v(s) uniquely, for small s, because of the following lemma. Lemma 5.3.1. Suppose Qi has at least one nonzero entry per recurrent class of Q0. Then the matrix I — q(s) is nonsingular. Proof. By Equation (5.1.9), we have Decompose A = /— Qo as in Equation (1.3.1) and write Qi as in Equation (5.2.2). We have (5.3.1) (5.3.2) / - q(s) = I - QQ + sQi + o(s). (5.3.3) 112 where B is nonsingular, and recall that H22 is also nonsingular, by Corollary 5.2.3. This implies that for s > 0 small enough, both B + sHxx + o(s) and sJ?22 + o(s) are nonsingular, and the result follows. • We now obtain the Laurent expansion of the discounted rewards v(s) in the special case when all moments exist. This is a special case of Theorem 1 of Denardo (1971). Theorem 5.3.2. Suppose all moments of q{s) and r(s) exist and are finite. Then t/(«) = -V_x + V 0 + sVx + s2V2 + ... s where V,-, for i = —1,0,1,2,..., are uniquely determined by AV-X = 0 and 1-1 AK = (-!)%+.(-l)'"yQ.wVy. * = 0,1,2,., J=-l (5.3.4) (5.3.5) (5.3.6) Proof. Rewrite Equation (5.3.2) using the infinite expansions corresponding to (5.1.9), (5.1.10) and (5.3.4). We have [I-Qo + sQi - s2Q2 + . . V_! + V0 + 8Vl+...] = [R0- sRx + s2R2 -...]. Collecting terms, we get an infinite sequence of linear systems of equations, which can be written as follows, in block matrix form: (5.3.7) where A = I — QQ. The notation of (5.3.7) is equivalent to the sequence of (5.3.5) and (5.3.6). We still have to show that the V,- terms are uniquely determined. We do this by induction on i. The induction hypothesis is that V _ l f . . . , V,-_x are uniquely determined by rows j = — 1 , . . . ,», for some i > —1. We have to show that there is a unique solution Vi for which row t + 1 is a compatible system. 113 / A 0 • • • ( 0 \ Qi A 0 Vo Ro -Q2 Qx A 0 ... Vx -Ri -Q2 Qx A 0 v2 R2 * . \ : J The hypothesis is trivially true for i = — 1, because the system (5.3.5) is homoge-neous. Now with V _ i , . , V , _ i fixed, we can write Equation (5.3.6) as AVi = 6, (5.3.8) where b = (-l)*i2,- + X)y=-i (—l)*~3 Qi-jVj- The next equation, with V i + 1 , can be written as AVi+^c-QM, (5.3.9) where c = ( - l j , ' + 1 J2 , - + l + Ey=_i {-^Y+i~3Qi+i-jVj- By Theorem 5.2.4, there is a unique V,- satisfying (5.3.8) for which (5.3.9) is compatible. This completes the proof. • The result of Theorem 5.3.2 can also be extended to the case when only a finite number of moments are known. Then the terms of the series for v(s) are solution of a truncated system of equations. The following result is Lemma 2 of Denardo (1971). We give a new proof based on matrix decomposition. Lemma 5.3.3. Suppose 6(s) = o(s) and let x(s) be the solution of [I-q(3)}x{s) = b(s). Then x(s) = o(l), i.e., l im,_ 0+ x(s) = 0. Proof. Write J — q(s) as in Equation (5.3.3) and let Then Equation (5.3.10) becomes (Auia) Al2(*)\ («M\_(bi{»)\ \A2l(s) A22(s)j \x2(s)J \b2(s)J> where An(s) = B + sHn +o(s) Ai2(s) = sHx2 +o(s) A2i(s) = sH2i + o(s) 114 (5.3.10) (5.3.11) A22{s) = sH22 + 0(s). Hence A22l{a) = \H22X + o(a _ 1 ). Now let C{s) = An(a) - Ax2{a)A22l{8)A2x(a). Then C(s) = B + o(l), so that C-^s) = fl-1 + o(l). We can write the solution of (5.3.1) directly as Xl{s) = C- 1(3)[6!(S) - i4 l 2 (3)A 2 2 (3)6 2 (s)] = o{s) and = *ni')Ms) ~ ilai(*)xi(s)] = o(l). This completes the proof. • The following result is Theorem 1 of Denardo (1971), giving the truncated series of v(s). Our proof is similar to that of Denardo (1971) but we emphasize the formulation with a block lower triangular matrix. Theorem 5.3.4. Suppose Qk+2 and Rk+x are finite, for some k > 0. Then v(a) = -V.t + V0 + aVi + ... + skVk + o(ak), a where V _ i , . . . , Vk are uniquely determined by the finite system of equations (5.3.12) A Qi -Q2 0 A Qi (v-1 ^ f 0 ^ Vo Ro Vx = -Ri , (5.3.13) J \Vk+i J l(- i ) U l R k + u \{-l)k+1Qk+2 ••. -Q2 Qx A) and for which V)t +i is determined up to a term P*y, where y is arbitrary. Proof. Write v(s) as v(s) = - V _ ! + V 0 + aVx + ... + skVk + ek{a). 3 Then, using (5.1.8) with A: + 1 and (5.1.9) with / = k, Equation (5.3.2) becomes [A + aQx - ... - ( - l ) f c + 2 Q f c + 2 + o( 5 f c + 2 ) ] [ a - 1 V _ t + V 0 + sVx + ... + akVk + ek(a)} = [Ro - aRx + ... + {-a)k+1Rk+1 + o{ak+1)}. (5.3.14) 115 Collecting terms, we get the system / A 0 . . . . Qi A ••. -Q2 Qi *•• * • • • \(-l)kQk+i -Q together with A 0 \ / 0 \ V0 — -Rx (5.3.15) {-l)k+lRk+i + E ( - l ) f c + 1 - y Q f c + 1 _ y V y + o(sk+l). (5.3.16) [I - q(s)ek{s) = s k + i Dividing (5.3.16) by sk, we get [I-q(s)][ek(3)/sk} = o(s), so that ek(s)/sk = o(l) by Lemma 5.3.3. Hence ek(s) = o(sk). The extra equation in (5.3.13) ensures that Vk is uniquely determined and finite. The result on Vk+l follows because AP* =0. • 5.4. Semi-Markov decision processes A semi-Afar/rov decision process is a model in which an action has to be selected each time the system makes a transition, and for which the transition probability function Qf3(t) and the one-period expected reward function R*(t) depend on the selected action a € A,-, for i,j = 1,..., n. Without loss of optimality, and as in Denardo (1971), we consider only the policies 6 which are stationary. Again, we denote by A = A i x . . . x A n the set of stationary policies. Given a fixed policy 8 £ A , the model reduces to the semi-Markov reward process (N, R6, Q6), where N = { 1 , . . . , n}, and = (<?$(')) t,j = l , . . . , n . 116 Correspondingly, the moments Q6k and for k = 0 ,1 , . . . , are defined by Equa-tions (5.1.6) and (5.1.7), and the Laurent expansion of the expected total discounted rewards v(s) is given by Theorem 5.3.4, with moments V * i 1 , . . . , V ^ , provided Qk+2 and Rk+i are finite. Similarly to the discrete time case, a stationary policy 8 G A is said to be s-optimal if v6 {s) > «»(«), V7 G A . Further, if 8 is a-optimal for all sufficiently small a > 0, then 8 is said to be optimal. Theorem 3 of Denardo (1971) gives a sufficient condition for an optimal policy to exist. We do not repeat it here. Moreover, a policy 8 is said to be A:-discount optimal if liminf a-fc[t/(a) - v"(a)] > 0, Vn G A . «-»o+ As noted in Denardo (1971), with A _ 2 = A and A f c = {8 G A f c _ ! : Vks > V f c",Vn G Ak-i}, * = - 1 , 0 , . . . , (5.4.1) every policy in Afc is Ar-discount optimal (provided that the relevant moments exist). Further, if there is an optimal policy 8, then 8 G Aoo* Unlike in the discrete time case, there may be no finite k such that a fc-discount optimal policy is optimal. Of course, if there is only one Ar-discount optimal policy (i.e., if |Afc| = 1), then it is optimal. Policies in A _ i and Ao are called gain optimal and bias optimal, respectively. This is because V _ i and V0 can be interpreted as the gain and bias terms, since from Denardo (1971), page 483, we have Urn ^ = VLi (5.4.2) and lim V(t) - W_x = V 0 {C, 1), (5.4.3) t —»• 00 117 where the Cesaro limit is defined as lim f(t) = g (C, 1) t—*oo if i r* Urn - / f{x)dx = g. t-*oo t JQ Hence V _ i is the average reward per unit of time, in the long run, and VQ is the total difference between the actual expected reward V(t) and the long run average V _ i * . Provided the required moments exist for all policies, the policy iteration method of Figure 3.5.2 can be extended for finding a fc-discount optimal policy of a semi-Markov decision process. But now, in the policy evaluation step (i.e., Step 1) we have to solve the system (5.3.13), as in Theorem 5.3.4. The terms V_i , . . . ,Vfc can be evaluated as in Theorem 5.2.4 and Lemmas 5.2.5 and 5.2.6. Moreover, in the policy improvement step (i.e., Step 2), we take the matrix P 7(/c) as P 7 ( « 0 = Qo -Qx <?2 0 Qo *•• -Qx Qo 0 \ (5.4.4) V(-l)fc+2Qfc+3 ••• Q2 -Qx Q0J Hence the policy iteration method for discrete time Markov decision processes is a special case of the policy iteration method for semi-Markov decision processes, in which the matrix of moments is simpler. 5.5. Three special cases In this section, we consider the discrete time and continuous time Markov decision processes as special cases of the semi-Markov decision process. This was already done in Denardo (1971), but we emphasize the similarity of the reward processes, using the matrix equations. We will see that the series expansion (5.3.4) is identical to (3.1.6), in the discrete time case, and to (4.3.1) in the continuous time case. 118 Moreover, we will also consider the equivalence between an arbitrary semi-Markov decision process and a corresponding continuous time Markov decision process, under the average reward criterion (i.e., gain optimality). This equivalence was also considered in Ross (1970), page 161. Again, we show the similarity of the reward processes using the matrix equations. In a discrete time Markov reward process, the transition function Qij(t) is zero for 0 < t < 1, and jumps to p,y for t > 1, for i,j = l , . . . , n . It is convenient to assume that a reward is earned just before a transition occurs, so that The Laplace-Stieltjes transforms are obtained directly as v(s) = e~"*P and r(a) = e_*r. (5.5.1) The discount factor ft over one time period is precisely /? — e~a. As in Chapter III, let /? = (1 + p)~l, where p is the one period interest rate, giving q(s) = (1 + p)~lP and r(s) = (1 + p)~lr. (5.5.2) It is natural, in the discrete time case, to use z-transforms instead of Laplace-Stieltjes transforms, and (5.5.2) gives precisely the z-transforms of V(t) and R{t), with param-eter p. Let us denote the transforms by q(p) and r(p). With this notation, and using the fact that {l + p)~l = l-p + p2-..., Equation (5.3.2) becomes [I-q(p)]v(p) = r{p), (5.5.3) where q{p) = P-pP + p2P-... and r(p) = r - pr + p2r - ... 119 Collecting terms, we get v(p) = p-1V-1+V0 + pVl + ... where V _ i , V 0 , V i , . . . , are a solution of (A 0 . . . P A 0 . . . -P P A 0 . . . P -P P A 0 with A = / — P, as usual. Now, left multiplying by the matrix (5.5.4) f ° \ V 0 r = —r v2 r J \ J (5.5.5) we get // 0 ... \ / / 0 • • « 0 / / 0 • > 0 0 J / 0 • . / A 0 / V _ x > I A 0 • • • V 0 r 0 I A 0 • • • Vi — 0 0 0 I A 0 v2 0 • • * V : J w (5.5.6) This is identical to the recurrence (3.2.7), (3.2.8) and (3.2.9), with twt- = V,-, for t > - 1 . Hence v(p) has the same expansion as in Equation (3.1.6). In the continuous time case, we have, as in Section 5.1, and *•(<) = ^ ( l - « " A r f ) . •)Pij We obtain the Laplace transforms directly as = (1 + 8Ti)~lPij = (1 - STi + 5 2 T ? -and r,(s) = (1 + ST^Tin = (1 - sn + S 2T? - . . .)r,r,-, 120 where r,- = 1/A,-. Using matrix notation, with T — diag(rf), we have q{s) = P — sTP + s 27 / 2 P - . . . and r(a) = Tr - sT2r + s2T3r from which Equation (5.3.2) leads to ( A 0 . . . TP A 0 . . . -T2P TP A 0 . . . T3P -T^P TP A 0 Left multiplying by the following matrix (5.5.7) (5.5.8) \ fV-i \ / 0 \ Vo Tr Vi -. - T 2 r v2 / V ! J (5.5.9) we get (I 0 . . . T 7. 0 . . . 0 T I 0 0 0 T I 0 I : • . • . • ) (A 0 (V-i \ / 0 \ T A 0 • • • Vo Tr 0 T A 0 • • • Vi = 0 0 0 T A 0 v2 0 V'- •J J I : J (5.5.10) By multiplying every row of the system (5.5.10) by T 1 , we obtain precisely the recur-rence (4.3.3), (4.3.4) and (4.3.5), with to,- = V,-, for i > - 1 . Consider now an arbitrary semi-Markov reward process. Let P = Q0 and H = Qi be the zeroth and first moments of Q(t), respectively, and let T be a diagonal matrix such that He = Te, where e is the column vector with all entries equal to L Then the diagonal entries of T are the expected transition times, that is, T = diag(r,). Lemma 5.5.1. Suppose P is an irreducible, stochastic matrix. Then P*HP* — P*TP*. Proof. Because P* = eir, where rr is the row vector of stationary probabilities of P . Then HP* = Heir = Ten = TP*, 121 and the result follows. • Lemma 5.5.2. Suppose P is a stochastic matrix. Then P*HP* = P*TP*. Proof. Proceed as in the proof of Lemma 5.2.2, noting that P*HP* and P*TP* do not depend on the transient parts of H and T. • Suppose T,- > 0, for t = 1,..., n. Then T is nonsingular. Consider a continuous time reward process with expected transition times r,- and reward rates r,-, for i = 1,..., n., where RQ = TV is the zeroth moment of R[t). Theorem 5.5.3. Let gt be the gain of the semi-Markov process and let ge be the gain of the continuous time process. Then g9 = ge. Proof. Let ha and he be the respective bias vectors. Then Theorem 5.3.4 implies that a a (£)-u) <«•«> and (* y (£)-(£)• ™ Now decompose A as in Equation (1.3.1), and write H = S - X ( H " M s a n d l ^ S - 1 (I" £ 2 >U \.£/21 " 2 2 / \J21 i 2 2 / Lemma 5.5.2 implies that / f 22 = ?2 2- Both systems (5.5.11) and (5.5.12) have the same form as (5.2.3) and (5.2.4), with 6 = 0. This implies that X\ = 0, by Equation (5.2.5), and that g, does not depend on H2\ (and ge does not depend on T^i)- The result follows because H22 = T22- • Corollary 5.5.4. The higher order terms (e.g., bias ht and he, etc.) are not equal in general. Proof. Because they depend on H\\, H12 and #21 > which are not necessarily equal to Tu, T\2 and T21. Also, the higher moments of both Q(t) and R(t) may be different, as well. • 122 The significance of Theorem 5.5.3 is that when only the average reward per unit of time (i.e., the gain) of a semi-Markov reward process is required (like for finding a gain-optimal policy in a semi-Markov decision process), then it is not necessary to compute the decomposition of H as in Lemma 5.2.6. Rather, we can simply solve the continuous time problem, which can be done using the same algorithm as for a discrete time process. 123 Chapter VI Stationary characteristics of tandem queues Consider a queueing system with two service stations in series (i.e., in tandem). New jobs entering the system arrive at station 1. After a job has received service at station 1, it moves on to station 2. Jobs leave the system after completing service at station 2. The waiting rooms are finite. There can be at most Ni jobs at station 1 and 7V2 at station 2 (including the jobs that are being served), with iY\ > 1 and AT2 > 1. The blocking protocol is as follows. Let ni be the number of jobs at station 1, and n 2 at station 2. A new job arriving at station 1 when nx = Ni leaves the system immediately, without requiring any service. A job completing service at station 1 when n 2 = JV2 returns to station 1 instantly, where it is treated as a new arrival. Al l jobs are identical and enter the system according to a Poisson process with parameter A. A job requests a random amount of service with mean T\ at station 1 and r 2 at station 2. These service requirements are mutually independent. They are also independent of those of other jobs and of the arrival process. Each station has a total service capacity (or service rate) of one unit of service per unit of time. The service discipline at station j , for j = 1,2, is determined by the functions 7y and 5y as follows [see Disney and Konig (1985), Section HI.2]. Suppose there are ny jobs at station j, with 1 < ny < Nj. Then 7y(ny,£) is the speed of service given to the job at position t at station j, with 1 < I < ny. When a job leaves station j from position £, the jobs in positions 1+1,..., ny move to positions 124 £, . . . , ny — 1. When a job enters station j with ny jobs already present, it moves into position £ with probability 5y(ny + 1,£), with 1 < £ < ny + 1, and the jobs in positions £ , . . . , ny move to £ + 1,..., ny + 1. The functions satisfy E"=i 7j'(nj>^) = 1 E*=i fy(n/>^) = Three service dis-ciplines of particular interest are First-In-First-Out (FIFO), La3t-In-First-Out (LIFO) preemptive and processor sharing. Their functions can be expressed as follows. FIFO: 7i(ny,l) = l , «,•(»* n,-) = 1; LIFO preemptive: 7y(iy, 1) = 1, 6j{nj> 1) = Processor sharing: £) = 1/ny, 5y(ny,£) = 1/ny, £ = 1,...,ny. When the service requirements are exponentially distributed, all three disciplines pro-duce the same queue length process. In the rest of this chapter, we will be interested in obtaining some stationary char-acteristics of tandem systems. First, in Section 6.1 we review the case in which the service requirements have an exponential distribution. In this case, we write the global balance equations using matrix notation. Next, we use the results of Chapters I and II to analyze two numerical procedures for computing the stationary probabilities of the queue length process. In Sections 6.2 and 6.3, we consider the method of successive approximations, or power method [see e.g., Stewart (1978)], and the matrix decomposition method of Wong et al. (1977). Our original contribution is a proof of convergence of the former, taking into account the periodicity of the transition matrix, and a justification of the latter from the point of view of numerical stability. In Section 6.4, we approach the solution of the stationary probabilities from the point of view of job-local-balance (JLB). The notion of local balance was studied by several authors, including Whittle (1968) and (1969), Kingman (1969), Baskett et al. (1975), Barbour (1976), Chandy et al. (1977), Schassberger (1978) and Hordijk and Van Dijk (1981), (1982), (1983a) and (1983b). 125 With the blocking protocol mentioned above, job-local-balance does not hold for our tandem system. Some modifications based on results in Hordijk and Van Dijk (1981) give two other queueing systems which satisfy JLB. The stationary probabilities of the modified systems are then expressed by a simple, explicit formula which is easy to compute. It is the object of Section 6.5 to provide some new numerical evidence that the call congestion (i.e., the probability that a new job is rejected) of the modified systems provide a lower and an upper bound on the call congestion of the original system. While a formal proof of the validity of the bounds, under certain conditions, can also be found in van Dijk and Lamond (1985), it was decided not to include it here, because the technique of proof bears little relationship with the methods discussed in this thesis. 6.1. Global balance equations Let Nj(t) be the number of jobs at station j at time t, with j = 1,2 and t > 0. Then {Nx(t), N2(t) : t > 0} is the queue length process of the system. Suppose that the service requirements have an exponential distribution, with parameters fix = ( f i ) - 1 and /z2 = ( r 2 ) _ 1 . We assume that the queue length process is stationary, that is, its distribution does not depend on t. Let s = (r»i, n 2 ). If N\(t) = nx and iV2(t) = n 2 , for some t > 0, then we say that the system is in state s at time t. The process {JV1(t),iV2(t) : t > 0} is a continuous time Markov chain [see Chapter IV] with state space S = {s = (ri i ,n 2 ) : 0 < nt < Ni,i — 1,2}. Our model is the same as in Wong et al. (1977), with Ni and iV 2 + 1 seats. The transition rate from a state r € S to another state s G S, denoted by R(r, s), is given in Table 6.1.1. Let us also define B(s) as the total rate out of state a. Then for s = (ni, n?) we have B(s) = Aa0(s) + (iiax{s) -K/JaM3) 126 where a*(s) =0, k = 0,1,2, except for Oo ( n 1 , n 2 ) = l if a i ( n n n 2 ) = l if o-2{nun2) = \ if ni < JVi, ni > 0 and n 2 < J\T2, n 2 > 0. to state condition rate (fi! + l , n 2 ) ( n i < ^ ! ) A (ni - 1 , ^ + 1) (ni > 0) and (r^ < N2) p.\ (n i ,n 2 - 1) (n 2 > 0) p.2 Table 6.1.1. Transition rates out of state (ni ,n 2 ) We are interested in the stationary probabilities p(s), a € S, such that p(a) = lim Prob{system is in state a at time t}. t—*oo Those probabilities are the unique solution of the global balance equations p(s)B(s) = ]Tp(r)/E(r,S), aeS (6.1.1) res together with the normalizing condition 5>(«) = 1. (6.1.2) Now let N = (iVi + 1) x (JV2 + 1). Then (6.1.1) and (6.1.2) give a system of AT + 1 linear equations in N unknowns. We will see, below, that one of the balance equations (6.1.1) is redundant, and that p(s) is uniquely determined by the remaining N linear equations. But let us first look at the problem using matrix notation. Let us assume that the states a €. S are ordered in some way (e.g., lexicographically). Then p can be seen as 127 a row vector, B as a diagonal matrix with diagonal entries B(a), and R as an N x N matrix with entries R(r, s). The global balance equations (6.1.1) can then be written as pB = pR (6.1.3) and the normalizing condition (6.1.2) says that p is a probability vector. We can rewrite (6.1.3) as p(B -R) = 0 (6.1.4) which can be transformed into ir(I-P) = 0 (6.1.5) where IT = pB/(pBe) (6.1.6) and P = B'lR. This property was used in the proof of Lemma 4.2.1, where we used the time matrix T instead of the rate matrix B. Here e is just a column vector with all entries equal to 1, and the denominator of (6.1.6) ensures that TT is also a probability vector. Note that B is a diagonal matrix, so that its inverse B~l is easily computed. We will take advantage of this fact in formulating the power method, in the next section. Observe also that Q = B — R is the infinitesimal generator of the queue length process. Moreover, P is the transition matrix of the embedded Markov chain [Ross (1970), chapter 5], for transitions out of a state, scaled to unit transition times. 6.2. Computation of the stationary probabilities We review one method for solving Equation (6.1.5), the power method. In Sec-tion 6.3, another method for solving (6.1.4), the method of Wong et al. (1977), will be reviewed as well. Several other numerical methods for computing the stationary probabilities of Markov chains are described in Stewart (1978), Kaufman (1983) and Harrod and Plemmons (1984). 128 The principal advantage of the power method, for solving the tandem queue prob-lem, is its simplicity. It is easier to implement than other, faster converging, iterative methods. The method of Wong et al. (1977), on the other hand, takes directly advan-tage of the special structure of the infinitesimal generator Q = B—R of Equation (6.1.4). It effectively reduces the original N x N problem [with N = (Ni + 1){N2 + 1)] to a (2JV*! + 2) x (2Ni + 2) problem. Its implementation, however, is significantly more diffi-cult than the power method, because it requires the computation of a Jordan canonical form. Al l numerical results in this chapter were obtained with the power method. Let us analyze the solution of Equation (6.1.5) by the power method. We will proceed to compute IT, from which we obtain p simply as p = KB~x l{«B-le). The advantage of solving (6.1.5) instead of (6.1.4) is that IT can be seen as the left hand eigenvector of P associated with the eigenvalue 1. As we will see, this eigenvector can be computed iteratively using the power method [see Wilkinson (1965)]. Lemma 6.2.1. The embedded Markov chain (i.e., with transition matrix P) is ergodic (i.e., P is irreducible). Proof. We have to show that when the system is in any state r G 5, it can reach any other state s G S with positive probability, in a finite number of steps. This is the case when A, p*i and /z2 are positive (as we assume), because it is possible, with an appropriate order of arrivals and completions, to reach any state s = (m, n 2) € S from state (0,0) with ni + n 2 arrivals and n 2 service completions at station 1. Similarly, the system can return to state (0,0) with ni service completions at station 1 and ni + n 2 service completions at station 2. • Theorem 6.2.2. There is a unique probability vector that is a solution of equation (6.1.5). Proof. Since the embedded chain is ergodic, this follows from Theorem 5.1.2 of Ke-meny and Snell (1960). [Also from our Lemmas 2.2.1 and 2.2.5]. • 129 We know from Corollary 2.4.5 that the power method converges to the stationary probability vector when P is aperiodic. We proceed to show that for the tandem queue problem, the transition matrix P is periodic, with period 3. Let us classify the states into the following three sets: Sm = {s = (ni ,n 2 ) G S : (nt + 2n2) mod 3 = m} m = 0,1,2. (6.2.1) Lemma 6.2.3. The embedded Markov chain is periodic with period 3. Proof. We show that all the possible transitions out of a state r G Sk are into some states s G Sm, where m = (k + 1) mod 3. Let r = (ni, n 2), then k = (ni + 2n2) mod 3, according to (6.2.1). There are at most three possible transitions, as in Table 6.1.1: if ni < Ni then m = [(ni + 1) + 2n2] mod 3; if ni > 0 and n 2 < iV 2 then m = [(ni — 1) + (2(n2 + 1)] mod 3; if n 2 > 0 then m = [m + 2(n 2 - 1)] mod 3. It is easy to verify that m = (k + 1) mod 3 in all three cases. • Lemma 6.2.4. £ff( s) = I k = 0,1,2. Proof. . Because the embedded Markov chain spends exactly one third of the time in any class Sk- • Following Varga (1962), let us assume that the states are ordered such that So comes first, then Si and then 5'2. We have P = and Then (0 Poi 0 0 0 P l 2 1^20 0 0 = (wo JT 2). (To 0 0> -\ 0 Tt 0 l o 0 130 where To = PoiPuPao, etc. Lemma 6.2.5. Each Tk, A:=0,l,2, is an irreducible, aperiodic transition matrix. Proof. See e.g., Varga (1962), page 44, exercise 4. [See also the proof of our Theo-rem 2.2.9]. • Lemma 6.2.6. For A:=0,l,2, let qk be the stationary probability vector of Tk. Then ""fc = |<7fc-Proof. From equation (6.1.5), we have IT = irP, and hence IT = T T P 3 . But then tffc = ffcJfc, and irk has weight | , by Lemma 6.2.4. • Theorem 6.2.7. For Jfc=0,l,2, let qk0) be a probability vector, and qkn) = q(k0)T£. Then qk = Urn qk0)TZ n—KX> is the stationary distribution of Tk. [See Kemeny and Snell (1960), Theorem 4.1.6 for the proof, and also our Corollary 2.4.5]. Corollary 6.2.8. Let = 7 r ( ° ) P N . Then TT= lim n{n) n-*oo if and only if X > ( O ) 0 0 = g, A: = 0,1,2. (6.2.2) Proof. Suppose that J29€Sk ir^(s) = to*, for some arbitrary numbers wk, A:=0,l,2. Then / 7r<3n) \ /tw09o u>iqi vj2q2\ lim j jr< 3 n + 1) j = I tw2g0 u>oqi wtq2 1 and the result follows from Lemma 6.2.6. • The above results indicate that the stationary probabilities can be computed iter-atively, using the power method [Wilkinson (1965), page 570], which we can formulate as in Figure 6.2.1. 131 Step 0: Take an arbitrary probability vector JT(°) satisfying (6.2.2), set k = 0 and specify e > 0. Step 1: Compute = ir^B~l. Step 2: Compute ir( f c + 1) = p^R. Step 3: If ||jr( f c + 1) - < e for some appropriate norm (e.g., 1, 2, oo) then stop else increment k and go to Step 1. Figure 6.2.1. The power method for stationary probabilities The sequence {7 r ( f c ) , k = 0,1,2} converges to the stationary probability vector TT, by Corollary 6.2.8, and the algorithm is extremely easy to implement, using procedures like those shown in Figure 6.2.2. (Note that there are at most three multiplications for every state, in one iteration). The performance of the power method, then, is determined by the speed at which the sequence ir(n) approaches its limit jr. Let 7* be the modulus of the subdominant eigenvalue of Tk, for k — 0,1,2. By Theorem 6.2.7 and Corollary 2.4.7, we see that each of the subsequences 7 r ( 3 n ) , fl-(3n+1) and n(3n+2) approaches ir geometrically, with parameter 70, 71 and 72, respectively. Hence at each step of the power method, the error is reduced by a factor 7 = (max{7o,7i>'/2}) 1 / 3 in average, when the number of iterations n is large. Another important consideration, when using the power method, is how to select the parameter e for the stopping criterion, at Step 3 of Figure 6.2.1. We show, as in Section 3.4, that if 8 is the accuracy required on TT, then we should select e < where A* is the Drazin inverse of A = I — P. 132 CONST DIM1=100; DIM2=100; (* storage dimensions *) TYPE STATE1=0..DIM1; STATE2=0..DIM2; DISTRIBUTION=ARRAY[STATE1,STATE2] OF REAL; VAR NI:STATE 1; N2:STATE2; (* server capacities *) RATEO,RATE1.RATE2:REAL; (* a r r i v a l , service rates *) P:DISTRIBUTION (* p vector *) FUNCTION OUTRATE(I:STATE1;J:STATE2):REAL; (* Total rate of tra n s i t i o n ont of a state *) VAR RATE:REAL; BEGIN (* OUTRATE *) RATE:=0; IF (KN1) THEN RATE: =RATE+RATEO; IF ((I>0) AND (J<N2)) THEN RATE:=RATE+RATE1; IF (J>0) THEN RATE:=RATE+RATE2; OUTRATE:=RATE END (* OUTRATE * ) ; FUNCTION INFL0W(I:STATE1;J:STATE2; VAR P:DISTRIBUTION):REAL; (* t o t a l flow of transitions into a state *) VAR FLOW:REAL; BEGIN (* INFLOW *) FL0W:=0; IF (I>0) THEN FL0W:=FL0W+RATE0*P[I-1.J]; IF ((KN1) AND (J>0)) THEN FL0W:=FL0W+RATE1*P[I+1. J-l] ; IF (J<N2) THEN FL0W:=FL0W+RATE2*P[I,J+l] ; INFLOW:=FL0W END (* INFLOW * ) ; Figure 6.2.2. Pascal procedures for transition rates We see this as follows. Let it be a computed solution, with error e = # — TT. Define the residual error r = TT(J — P ) . Theorem 6.2.9. e = A*r. Proof. Let u be a row vector such that u,- = 1/JV. Then ir is the unique solution of 133 the evaluation equations of dynamic programming: irA = 0 IT + hA = u, where h stands as the "bias" vector. The result follows immediately from Theo-rem 3.4.4. • Now by taking norms, we have M I < un • WAV Hence at Step 3, when we test for ||r|| < c, we have I H I < | | i i * | | . e < | | i l * | | . 5 / | | i l * | | « * . It is our observation that ||A*||2 < 1000 for a wide range of the parameters A, fil} fi-2, Ni and N%. Interestingly, the rate of convergence varies quite strongly. With an accuracy e = 1 0 - 9 , the number of iterations varied from 50 to 3000 iterations, and it is not obvious how the parameters affect the rate of convergence. For most of the curves we produced, in which every point was evaluated using the latest ir vector available, the average number of iterations was between 20 and 50. We did not observe that the buffer sizes TYi and were a determinant factor of the speed of convergence (in terms of iterations required). This does not support an observation of Stewart (1978) that large waiting rooms imply slow convergence. 6.3. The matr ix decomposition method In this section, we describe a method due to Wong et al. (1977), which exploits the special structure of the infinitesimal generator. The method is related to the matrix decomposition approach of the previous chapters and uses the Jordan canonical form of a certain matrix. We briefly review the method and formulate it in algorithmic form. Certain aspects of the method are clarified and we give a justification of the use of the Jordan canon-ical form, from the point of view of numerical stability. We have no computational experience to report, about its performance. 134 The method of Wong et al. (1977) produces a stationary probability vector by direct solution of a singular system of Ni + 1 equations. Let us use the two parameters p = \f\i2 and a = Hi/p*2. Also, assume that the states are sorted in reverse lexicographic order [i.e., p(0,0),p(l,0),...,p(Ar 1,0),p(0,l),...,p(Ar 1,JV 2)]. Note: the order shown on top of Page 5 of Wong et al. (1977) is incorrect. (The error has no effect on their development, however). Then the matrix Q — B — R of Equation (6.1.4) has a block-tridiagonal structure. We have ( - A x I Q = aL -A2 I where Ax = p + a ctL -A3J \ (6.3.1) V p + a -p a J A 2 = A i + I and A 3 = A 2 — aH, with / the identity matrix, H a diagonal matrix with all ones on the main diagonal except ku =0, and L a matrix with all ones on the subdiagonal and all other entries zero. A i , A 2 , A 3 , /, H and L are all (JVi +1) x [Nx +1) matrices and Q is an N x N matrix, where N = (Ni + 1)(AT2 + 1). [Hence Q has (N<2 + 1) x (N2 + 1) blocks]. Partition the stationary probability vector p as p = (po»Pi,.• • >Ptf3)i where p n = (p(0,n),... ,p(Ni,n)), for n = 0,...,N 2. Let also xn = ( p n , p n + i ) , for n = 0 , . . . ,N 2 -1. Then all the stationary probabilities can be expressed in terms of p 0 , using the (2JVx + 2) x ( 2 ^ + 2) matrix B defined as *=(0/72£)-Indeed, Equation (6.1.4) with Q as in (6.3.1) implies that Pi = Po-^i. 135 (6.3.3) and that xn+i = xNB, for n = 0 , . . . , N2 - 1, (6.3.4) hence = X Q B " , for n = 0 , . . . , N2. (6.3.5) Now consider the (Ni + 1) x (Ni + 1) matrix (6.3.6) The following result is shown in Wong et al. (1977). Theorem 6.3.1. The equation poC = 0 uniquely determines all the entries of p0 in terms of p(0,0). The significance of Theorem 6.3.1 is that it reduces the problem of solving the (Ni + 1)(N2 + 1) equations of (6.1.4) to the much smaller problem of solving only Ni + 1 equations. This suggests an efficient algorithm to compute p, as shown in Figure 6.3.1. Step 1. Let K 0 = (I Ax). Step 2. Compute Yn+i = YNB, for n = 0 , . . . , N2 — 1. Step 3. Compute C = r V 2 ( _ ^ H ) . Step 4. Let p(0,0) = 1 and solve poC = 0. Step 5. Compute pi = po-*4x. Step 6. Compute p n + 2 = Pn+iA2 — apnL, for n = 0, . . . , N2 — 2. Step 7. Scale p so that (6.1.2) is satisfied and stop. Figure 6.3.1. Unstable algorithm for stationary probabilities The matrix C of Equation (6.3.6) is produced at Steps 1, 2 and 3. Then Step 4 uses the result of Theorem 6.3.1. The vector po can be computed by either Gaussian 136 elimination or some other technique for singular equations (see Chapter I). Then the other components of the vector p are computed. Step 5 applies Equation (6.3.3) and Step 6 applies Equation (6.3.4). Finally, Step 7 obtains a probability vector. The algorithm of Figure 6.3.1 is efficient because it can be implemented with only one array of dimension (Nt + 1) x 2(/Yi + 1) and it takes 3(7^ + 1)2(JV2 + 1) flops to compute the matrix C. Then Step 4 takes ^(JVi + l ) 3 flops if Gaussian elimination is used. The amount of arithmetic for Steps 5, 6 and 7 is negligible. By comparison, each iteration of the power method takes 4{N\ + l)[N2 + 1) flops. In the case Nx — N2i the algorithm is then equivalent to about Ni + 1 iterations of the power method. For Nx and N2 not too large (say up to 50), our experience with the power method indicates that the algorithm of Figure 6.3.1 would usually be much faster. This algorithm is not recommended, however, because it is numerically unstable, as preliminary tests have shown. It appears that the iteration with the matrix B (both at Step 2 and at Step 6) is inaccurate. The method suggested by Wong et al. (1977) [Note: the numerical unstability problem was not mentioned by these authors] works directly with the powers of B, in Equations (6.3.5) and (6.3.6). Those are obtained directly from the Jordan canonical form of B. The following result was obtained in Wong et al. (1977). and Disney and Chandramohan (1980). Note that their proof is similar to that of a related well known result concerning the eigenvalues of a symmetric tridiagonal matrix, based on the Sturm sequence property [see Wilkinson (1965)]. Theorem 6.3.2. There is a nonsingular matrix S such that where JM(0) is the Jordan submatrix of order Af with eigenvalue 0, with Af = [{Ni + 2)/2j, and D is an (Ni — Af) x (Nx — Af) diagonal matrix with positive diagonal entries that are distinct of each other. (6.3.7) Assuming that the canonical form (6.3.7) is available, then the powers of B can 137 also be computed efficiently, so that Equations (6.3.5) and (6.3.6) become and o-(/ A1)B(IJ'M)"' d°n,)S-'. This leads to the algorithm of Figure 6.3.2. Step 1. Compute 5, D, and S - 1 as in Equation (6.3.7). Step 2. Compute C as in Equation (6.3.9). Step 3. Let p(0,0) = 1 and solve p0C = 0. Step 4. Compute pi = poA\ and let x0 = (po pi). Step 5. Compute x n as in Equation (6.3.8), for n = 1,..., JV2. Step 6. Scale p so that Equation (6.1.2) is satisfied. Figure 6.3.2. Modified algorithm for stationary probabilities The difficult part of this algorithm is at Step 1. Although the theorem guarantees that the the eigenvalues are distinct, it does not indicate how well separated they are. Very close eigenvalues are known to cause numerical difficulties in computing the eigenvectors [see Golub and Wilkinson (1976)]. We have no computational experience to report, at the moment, concerning this problem. 6.4. Job local balance and bounds In this section, we review briefly the job-local-balance property (JLB) and show that it is not satisfied by the tandem queueing system under consideration. Certain modifications, based on results in Hordijk and Van Dijk (1981), are proposed and the 138 (6.3.8) (6.3.9) modified models are shown to satisfy JLB. It is argued intuitively that these models provide a lower and an upper bound on the call congestion of the original model. If Ni = JV2 = oo then it is well known [Jackson (1957), (1963), Kingman (1969), Kelly (1979)] that for the exponential case and for any service discipline, the stationary probabilities are given by the product form (with c a normalizing constant): p(ni,n 2) = c ( — ) ( — ) n i>0 , n 2>0. Pi M2 In addition, as shown by Barbour (1976), Schassberger (1978), Kelly (1979) and Hordijk and Van Dijk (1982), for a number of service disciplines such as computer sharing or a LIFO-preemptive discipline (but not a FIFO discipline), this product form solution remains valid for the non-exponential case, that is, with non-exponential service re-quirements. This phenomenon is known in the literature as "insensitivity". In different frameworks and terminology, Whittle (1968) and (1969), Kingman (1969), Baskett et al. (1975), Barbour (1976), Chandy et al. (1977), Schassberger (1978), Hordijk and Van Dijk (1982), (1983a) and (1983b), have demonstrated that product form solutions and insensitivity properties are related to a notion of partial balance. Roughly speaking, partial balance says that the global balance equations, which are necessary and sufficient for the stationarity of a probability distribution, are satisfied with a particular decomposition in local equations. More precisely, as analyzed in Hordijk and Van Dijk (1983a), all these terminologies have the interpretation that for each job present in the system and for any system configuration, that job itself satisfies a notion of stationarity or more precisely, job-local-balance (JLB). It is easily seen that with finite buffer sizes, JLB can not be satisfied (for instance, if the first node is saturated whereas the second node is not, then at the first node jobs can enter at a positive rate but only leave at a zero rate so that these rates can not balance). This can also be seen by looking at the transition diagram in Figure 6.4.1. 139 1 0 1 : + + + : n , - l n, n,+l N, Figure 6.4.1. Failure of job local balance The dashed arrows indicate the transitions which are not possible at the boundaries, thus destroying the job local balance. (An arrow is labelled t if it applies to a job at node t, for t = 1,2, and 0 for a job outside the system). The JLB notion, however, would guarantee the simple form of the stationary prob-abilities. Therefore, the purpose of this section is to find modifications of the finite tandem system which do satisfy JLB and which may provide approximations or bounds for stationary characteristics of the original model. More precisely, we concentrate on the stationary probability that an arriving job is blocked from entering the system. In communication engineering this is historically known as the "call congestion". The 140 modifications given are based on results given in Hordijk and Van Dijk (1981) and the intuition that they will provide a lower and an upper bound for the call congestion of the original tandem queue. We define two modifications of the original model, from which we intuitively ex-pect to obtain lower and upper bounds on the call congestion. These modifications, suggestively called the lower and upper bound models, are characterized as follows. Lower bound model A job is blocked from entering the system (rejected) only if the total number of jobs present in the system is equal to Nx + JV2. A job which completes its service at node 1 is always allowed to enter node 2 instantly. Upper bound model In addition to the blocking protocols of the original model, a job is also blocked from entering the system (rejected) if n 2 = AT2 and a job which completes its service at node 2 is prohibited from leaving the system if nt = Ni, in which case it must return in the queue for a new service at node 2. It is intuitively clear that the lower bound model should have a lower blocking probability than the original model, since it makes a more efficient use of the Nx 4 - N2 available places, by sharing them among the two stations. Similarly, the upper bound model would be expected to have more congestion, since it has more stringent conditions for acceptance of a new job and it sometimes gives extra service to a completed job. As shown in the above reference, job-local-balance is closely related to the algo-rithmic computation of the stationarity probabilities. In the special case of a standard Jackson type network, this algorithmic computation simply leads to the product form probabilities. By using a notion equivalent to JLB, it has been shown in Sections 7 and 8 of Hordijk and Van Dijk (1981) that the stationary probabilities of the lower and upper bound models, in the exponential case, are given by Pr, and Pu as follows. 141 With Cj and c2 normalizing constants: ^ ( n 1 , n 3 ) = {^ ^ ' ) " ' ( A r 2 ) if 0 < ni;0 < n 2; n x + n 2 < Ni + JV2 otherwise. otherwise. (6.4.1) Although these probabilities exhibit a product form, it must be realized that the use of the expression "product form" is not fully justified because of the constraints of the admissible regions. Clearly, afterwards, the stationarity can just as well be verified by checking the global balance equations. The JLB analysis has been instrumental, however, in the derivation of the above models as potential candidates. The lower bound model was derived by modifying the original model so that the dotted arrows of Figure 6.4.1 be included. On the other hand, the upper bound model was derived so that the transitions which are not locally balanced be excluded. Furthermore, in addition to an algorithmic computation of the stationary prob-abilities, job-local-balance is associated with the insensitivity property [cf. Barbour (1976), Schassberger (1978) and Hordijk and Van Dijk (1982) and (1983b)]. Under this property, the stationary probabilities depend on the service distributions only through their means. Hence for service disciplines satisfying a job-local-balance condition [see the -K"2-criterion in Hordijk and Van Dijk (1983a)] the probabilities given by (6.4.1) and (6.4.2) are valid as stationary probabilities regardless of the form of the service distri-butions. A computer sharing or LIFO-preemptive discipline satisfies such a condition. Hence, for such disciplines the bounding method may be expected to be insensitive as well, that is to provide bounds that hold for all service distributions. From now on we use the subscript L (U) to indicate a quantity for the lower (upper) bound model, and no subscript for the original model. Now, the stationary probabilities represent the fraction of time spent by the system in each state. Since Poisson arrivals see time averages [see Brumelle (1971) and Wolff (1982)], the call congestions are given 142 by B = P{nx = Ni) BL = PUni +n2 = Ni + N2) Bu = Pu{ni = NxVn2 = N2), (6.4.3) (6.4.4) (6.4.5) where P() represents the stationary probability of an event. Based on the intuitive arguments at the beginning of this section, we may expect that for the exponential case. Moreover, we would also expect the same results in the non-exponential case as well, under service disciplines satisfying JLB. Section 6 of van Dijk and Lamond (1985) contains a formal proof of (6.4.6) for the exponential case and under the condition that A > fi2. It also briefly indicates a similar proof for (6.4.7) with Hi < fi2 and suggests a proof for the general case. In addition to the validity of the bounds, a bounding method must also provide an indication of their accuracy. In Section 6.5, the values of B, Br. and Bu, as given by (6.4.3), (6.4.4) and (6.4.5), will therefore be compared for a wide range of the system parameters. The numerical results also support the validity of (6.4.6) and (6.4.7), in general. As a first illustration, let us briefly consider the case of A = px = p2. Then, one easily verifies that: BL<B (6.4.6) and B< Bu (6.4.7) BL = 2/(Nr+N2 + 2) Bu = {ft + N2)/{NiN2 +Ni+ N2) so that for Nx = N2 = N and with N large we observe that: Bu ~ 2B^. Hence, very accurate bounds in percentage can not be expected. On the other hand, for N large 143 it holds that Bu ~ 0 and the numerical results of section 6.5 will show that even on a percentage basis, Bi, is then very accurate. A converse statement holds with small values of N for the upper bound. A more detailed investigation follows. 6.5. Performance results 6.5.1. Exponential case In the case of exponential service requirements, the blocking probabilities were computed as described in Section 6.2 and compared with the lower and upper bounds, for a wide range of parameter values. Since only the proportions of A, ftx and /z2 are relevant, we chose (without loss of generality) A = 1. Below we will discuss our major observations as illustrated by Tables 6.5.1 and 6.5.2 and Figures 6.5.1 and 6.5.2. Table 6.5.1. This tabulates the blocking probabilities for varying service speeds 0.1, 0.5 and 2.0, as well as buffer sizes (5,5), (5,10) and (10,5), in order to find out the accuracy of the bounds for large blocking probabilities, and simultaneously to observe the dominance of a bottleneck service speed and buffer size. As for the bounds, we found: • For B > 0.9, the bounds are very accurate, within 4% of the correct value in the worst case. • For B from 0.50 to 0.55, the worst case is accurate within 23% and the lower bound is always within 7% accuracy. • For B < 0.02, the upper bound is less than 0.035. Further, the following performance results appeared. • The smallest service rate is strongly dominant; the size of the second rate is hardly of influence (e.g., cf. pi = 0.1, p2 = 0.1; H\ = 0.1, /z2 = 0.5). • The smallest buffer size is strongly dominant; the size of the other buffer size is hardly of influence (e.g., cf. N\ = 5, N2 = 5; N\ = 5, N2 — 10). Moreover, the different blocking levels were characterized by (recall that A = 1): • B > 0.9 when the smallest rate is 0.1; • B w 0.5 — 0.55 when the smallest rate is 0.5; • B drops to less than 0.02 when both rates are 2.0. 144 N2 Pi BL B Bu 5 5 0.1 0.1 0.909184 0.916667 0.947369 0.5 0.900000 0.900026 0.909357 0.5 0.1 0.900000 0.900026 0.909357 0.5 0.549973 0.584101 0.673684 2.0 0.500366 0.508094 0.511811 2.0 0.5 0.500366 0.500853 0.511811 2.0 0.002694 0.017682 0.031250 5 10 0.1 0.1 0.906294 0.909091 0.947369 0.5 0.900000 0.900001 0.909100 0.5 0.1 0.900000 0.900000 0.909357 0.5 0.533333 0.547311 0.670319 2.0 0.500011 0.507937 0.508055 2.0 0.5 0.500011 0.500018 0.504240 2.0 0.000122 0.015905 0.016346 10 5 0.1 0.1 0.906294 0.916667 0.947369 0.1 0.5 0.900000 0.900026 0.909357 0.5 0.1 0.900000 0.900026 0.909100 0.5 0.533333 0.583336 0.670319 2.0 0.500011 0.500591 0.504240 2.0 0.5 0.500011 0.500381 0.508055 2.0 0.000122 0.000744 0.016346 Table 6.5.1. Numerical values with A = 1, varying Hi and / i 2 Table 6.5.2. Throughout our computations it appeared that, as a function of the service speeds, the worst results occured with equal parameters A = / i x = / i 2 . This makes sense since if one of them is significantly smaller, then that one will become the dominant bottleneck value. In Table 6.5.2, we can check the accuracy in this worst case for blocking probabilities varying from 0.6 to almost 0. We varied the buffer sizes from (1,1) up to (40,20). As for the bounds we now found: • With Ni = AT2 and B > 0.10, the accuracy in both directions is within 30%. • For Ny 7^ AT2 and B > 0.10, the accuracy is maintained for the upper bound. 145 N2 BL B{NltN2) B{N2,Ni) Bu 1 1 0.500000 0.600000 0.600000 0.666667 2 1 0.400000 0.533332 0.533334 0.600000 2 2 0.333333 0.422222 0.422222 0.500000 3 1 0.333333 0.512193 0.512196 0.571429 3 2 0.285714 0.379571 0.379579 0.454545 3 3 0.250000 0.325026 0.325026 0.400000 5 1 0.250000 0.501736 0.501743 0.545455 5 2 0.222222 0.347796 0.347809 0.411765 5 3 0.200000 0.280451 0.280480 0.347826 5 4 0.181818 0.244335 0.244373 0.310345 5 5 0.166667 0.222356 0.222356 0.285714 10 10 0.090909 0.124192 0.124192 0.166667 20 10 0.062500 0.084538 0.104327 0.130435 20 20 0.047619 0.064373 0.064373 0.090909 40 10 0.038462 0.046216 0.097846 0.111111 40 20 0.032258 0.034501 0.056293 0.069767 Table 6.5.2. Numerical values with A = Vl = A»2 = 1 • For iVi 7^ N2 and B < 0.10, the accuracy is maintained in the bottleneck direction (that is, for the lower bound if Ni > N2\ for the upper bound if N\ < N2). As in Table 6.5.1, again the dominance of the smallest buffer can be observed [for instance, compare (2,1), (3,1) and (5,1)]. Further, as a particular result of the equality of A, (ii and fi2, the calculations could be restricted by using a symmetry property. This will be discussed below (see Lemma 6.5.1). The blocking probabilities could be classified as by • B < 0.54 if NUN2 > 2. • B < 0.13 UNUN2 > 10. 146 ju2=5.0 M2 = 2.0 ^2=1.0 /u2=0.5 M2=0.2 -| 1 1 1 1 M l 1 2 3 4 5 0 Nl=2, N2=5 Nl=5, N2=5 n Ml ^2=5.0 /u2 = 2.0 ^2=1.0 M2 = 0.5 M2 = 0 2 -1 M l 5 0 -1 Ml 5 Nl=2, N2=2 Nl=5, N2=2 Figure 6.5.1. Blocking probability vs service rates Figure 6.5.1. Here the blocking probabilities are plotted against pi, the service rate at station 1, for several values of the rate at station 2. The solid line represents 147 the blocking probability, the short dashed line is the lower bound and the dashed line is the upper bound. Those curves show the distance between the bounds and the true value. In accord with intuition, the lower bound is more accurate when the bottleneck is at station 2 (i.e., when N2 or /z 2 is small), and the upper bound is more accurate when the bottleneck is at station 1 (i.e., when Nx or u.^ is small). This can be argued by that in the lower bound model the blocking occurs when both stations are full, which is the case most of the time in the original model when the bottleneck is at station 2. When the bottleneck is at station 1, however, most blocking occurs when only the first station is full, because the second station is very seldom saturated, in both the original and upper bound models. Both bounds are most inaccurate when both rates are close to 1 (i.e., when pi « fi2 w A)> where they are both in error by up to 30% approximately. Figure 6 .5.2. In this figure we give contour curves for fixed blocking probabilities as a function of y.\ and fi2. For each value of the blocking probability, the upper curve represents the upper bound, the middle curve is the true value and the lower curve is for the lower bound. Each curve partitions the space of the service rates into two regions: points above the curve have a lower blocking probability, and points below the curve have a higher blocking probability. Hence they indicate how to select pi and fi2 when designing a system with constraints on the call congestion. Moreover, the bottleneck effect is shown more dramatically here than in Figure 6.4.1. Symmetry property. In Table 6.5.2, the bounds are symmetric in Ni and N2, because p.\ = \i2. Moreover, the blocking probabilities for the original model were obtained by evaluating the stationary probabilities only once, because of the following result. Let p (ni ,n 2 ; iV 1 , iV2 ,A , / i 1 , / i 2 ) be the stationary probability that the original model be in state (ni,n 2). We have the following symmetry. 148 M 2 5 - i I1 V 2 3 Nl=2, N2=5 5 M2 4-2-1-Nl=5, N2=5 M2 5n „ M 2 4- 4-2-2 3 Nl=2, N2=2 MI Nl=5, N2=2 MI B = 0.1 B=0.2 - B=0.5 B=0.8 Figure 6.5.2. Blocking probability contonr curves Lemma 6.5.1. p{n1,n2',NuN2,X,fiUfi2) =v{N2-n2iNx - nuN2,NX)n2,nu\) (6.5. 149 Proof. Consider the system in which the empty seats (i.e., gaps) denote the states, instead of the number of jobs. Then whenever a job moves to the right, a gap moves to the left. Hence both systems have the same stationary probabilities. • Table 6.5.1 shows some values with various service rates p\ and ji2> with blocking probabilities ranging from near 0 to near 1. While it seems natural to fix A and vary \t\ and /i2, Lemma 6.5.1 suggests that it would be more economical to fix pi = 1 and vary A and fi2. We did not, however, attempt to produce other tables using this symmetry. The idea of considering the gaps moving in the opposite direction appears to be related to the reversibility concept of Yamazaki et al. (1985), although our result is different. 6.5.2. Hyperexponential case To support the conjectured insensitivity of the bounds, we also computed the block-ing probabilities in the case of hyperexponential service distributions; that is, for service requirements of two successive exponential phases both with parameter 71 at node 1 and similarly with parameter 72 at node 2. Hence, the mean service requirements are Ti = 27,--1, i= 1,2. Since the insensitivity of the product forms is only guaranteed for service disciplines satisfying a notion of JLB, we chose as natural candidate a computer sharing discipline with unit rate. Hence, if only one job is present at a node, it is served at rate 1, whereas if 2 jobs are present they are both served simultaneously each at a rate 0.5. Further, since the state description has to be extended, such as by listing the number of residual phases for each job, we restricted ourselves to the buffer sizes Ni = iNT2 — 2. (The number of states in this case is 36). Similarly to the exponential case, as described in Section 6.2, the blocking proba-bilities were calculated using the power method. Essentially, only the transition matrix of the embedded Markov chain became more complex, particularly because of the re-quired blocking protocol. Although the theory of JLB suggested insensitivity results only for a protocol which prescribes a repetition of a complete service at node 1 when 150 T2 BL Bw BR BE Bu 1 1 0.33333 0.38000 0.39857 0.42222 0.50000 2 2 0.62016 0.64220 0.65539 0.67580 0.72727 3 3 0.74040 0.75419 0.76477 0.77986 0.81818 4 4 0.80352 0.81376 0.82228 0.83404 0.86487 5 5 0.84209 0.85032 0.85737 0.86697 0.89286 2 5 0.80484 0.80957 0.81503 0.82102 0.84615 4 5 0.82684 0.83540 0.84322 0.85283 0.88048 2 10 0.90022 0.90111 0.90223 0.90325 0.91247 2 0.5 0.52381 0.57244 0.57345 0.57759 0.60000 Hyperexponential case Numerical values with A = 1 and N\ = N2 = 2 Table 6.5.3. Comparison with hyperexponential service times blocking at node 2 occurs [see Hordijk and Van Dijk (1983a)], we also investigated the protocol where only the last exponential phase of the service needs to be repeated after a blocking. Note that this latter protocol can also be interpreted as interruption of only the last phase. Hence, with a fixed mean service time rt at node 1 and a service distribution of n successive phases, the expected time of the last phase is ( n7 1 ) _ 1 , which tends to zero if n gets larger. Thus, this latter protocol approximately performs as a "waiting protocol" in which a job only has to wait at the preceding node exactly until the next node gets a vacancy. We will therefore refer to our original and this new blocking protocol as "returning", respectively "waiting", and use the notation: BE' for the blocking probability in the exponential case (as by interruption); BR: for the blocking probability under the returning blocking protocol; Bw' for the blocking probability under the waiting blocking protocol. 151 Even when the bounds are very close (within 1%) we observed (see Table 6.5.3) that Br, < Bw < BR < BE < Bu. (6.5.2) This observation suggests the following conclusions (all w.r.t. computer sharing). 1. The bounds hold for phase type distributions under the returning protocol; that is, they are insensitive bounds, as was expected based upon JLB. 2. The bounds are also insensitive under the waiting protocol; this could have been expected for the upper bound as from (6.5.3) below. For the lower bound, we could give an intuitive argument similar to the exponential case. 3. The blocking protocols are related by This can be argued as follows: BR < BE' in the exponential case the returning protocol can be seen as an in-terruption of the total exponential service. Hence, a blocked job always still has to complete a full exponential service with mean TX. For phase type distributions it may occur that during the first phase of a service the second node becomes saturated and then free again, without interrupting the service at all. The expected service time still needed by a job at node 1 after a saturation at node 2 will therefore be less than rx. Clearly, smaller sojourn times at node 1 imply a less frequent congestion and therefore blocking at node 1. Bw < BR: a repetition of only the last phase requires less time than the repetition of all phases. Hence, as above, we conclude less blocking. Bw < BR < BE- (6.5.3) 152 Conclusion This research was intended to produce the following contributions to the fields of queueing and dynamic programming. First, the matrix decomposition approach suggests a new way to look at the eval-uation of the stationary characteristics of the models. It is hoped that this approach, which is mathematically simple and direct, will contribute to make the models easier to use and understand, and stimulate their development and application. Second, the formulation of probabilistic models in the language of matrix algebra puts in evidence certain computational aspects of the solution. It is hoped that this work has contributed to point out some similarities between the computational prob-lems of applied probability and the standard solution techniques of numerical algebra. Finally, each one of the individual results is, of course, a contribution in its own right. 153 Bibliography [1] Aho, A.V. , Hopcroft, J.E. and Ullman, J.D. (1974). The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Mass. [2] Anstreicher, K . M . and Rothblum, U.G. (1985). Computing the index and Drazin inverse using the shuffle algorithm. Working Paper AR5384, School of Organization and Management, Yale University, New Haven, CT 06520. [3] Barbour, A.D. (1976). Networks of queues and the method of stages. Adv. Appl. Prob. 8, pp. 584-591. [4] Baskett, F., Chandy, M . , Muntz, R. and Palacios, J . (1975). Open, closed and mixed networks of queues with different classes of customers. J.A.C.M. 22, pp. 248-260. [5] Berman, A. and Plemmons, R.J. (1979). Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York. [6] Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Statist. 33, pp. 719-726. [7] Brumelle, S.L. (1971). On the relation between customer and time averages in queues. J. Appl. Prob. 8, pp. 508-520. [8] Campbell, S.L. and Meyer, C D . Jr. (1979). Generalized Inverses of Linear Trans-formations. Pitman, London. [9] Chandy, K . M . , Howard, J.H. and Townsley, D.F. (1977). Product form and local balance in queueing networks. J.A.C.M. 24, pp. 250-263. [10] Cjnlar, E. (1975). Introduction to Stochastic Processes. Prentice-Hall, Englewood Cliffs, N.J . [11] Denardo, E.V. (1970). Computing a bias-optimal policy in a discrete-time Markov decision problem. Operations Research 18, pp. 279-289. [12] Denardo, E.V. (1971). Markov renewal programs with small interest rates. Ann. Math. Statist. 42, pp. 477-496. [13] van Dijk, N . and Lamond, B.F. (1985). Bounds for the call congestion in tandem queues. Working Paper #1078, The University of British Columbia, Faculty of Commerce and Business Administration, 2053 Main Mall, Vancouver, B.C., Canada V6T 1Y8. 154 [14] Disney, R.L. and Chandramohan, J . (1980). A correction note on "Two finite M / M / l queues in tandem: a matrix solution for the steady-state". Opsearch 17, pp. 181-183. [15] Disney, R.L. and Konig, D. (1985). Queueing networks: a survey of their random processes. SIAM Review 27, pp. 335-403. [16] Drazin, M.P. (1958). Pseudo-inverses in associative rings and semigroups. Amer. Math. Monthly 65, pp. 506-514. [17] Faddeeva, V . N . (1959). Computational Methods of Linear Algebra. Dover, New York. [18] Federgruen, A. and Spreen, D. (1980). A new specification of the multichain pol-icy iteration algorithm in undiscounted Markov renewal programs. Management Science 26, pp. 1211-1217. [19] Funderlic, R.E. and Plemmons, R.J. (1981). LU decomposition of M-matrices by elimination without pivoting. Lin. Alg. Appl. 41, pp. 99-110. [20] Gantmacher, F.R. (1959). The Theory of Matrices 1 (English translation). Chelsea, New York. [21] Golub, G.H. and Van Loan, C F . (1983). Matrix Computations. The Johns Hopkins University Press, Baltimore. [22] Golub, G.H. and Wilkinson, J.H. (1976). Ill-conditioned eigensystems and the com-putation of the Jordan canonical form. SIAM Review 18, pp. 578-619. [23] Greville, T.N.E. (1973). The Souriau-Frame algorithm and the Drazin pseudoin-verse. Lin. Alg. Appl. 6, pp. 205-208. [24] Harrod, W.J. and Plemmons, R.J. (1984). Comparison of some direct methods for computing stationary distributions of Markov chains. SIAM J. Stat. Comput. 5, pp. 453-469. [25] Hartwig, R.E. (1976). More on the Souriau-Frame algorithm and the Drazin inverse. SIAM J. Appl. Math. 31, pp. 42-46. [26] Heyman, D.P. and Sobel, M.J . (1982). Stochastic Models in Operations Research, Vbiume /. McGraw-Hill, New York. [27] Heyman, D.P. and Sobel, M.J . (1984). Stochastic Models in Operations Research, Volume II. McGraw-Hill, New York. [28] Hordijk, A. and Van Dijk, N . (1981). Networks of queues with blocking. Perfor-mance 81 (ed. K . J . Kylstra). North-Holland, pp. 51-65. [29] Hordijk, A. and Van Dijk, N . (1982). Stationary probabilities for networks of queues. In Applied Probability—Computer Science: The Interface (eds. L . Disney and T.J . Ott), Birkhauser, Vol. II, pp. 423-451. 155 [30] Hordijk, A. and Van Dijk, N . (1983a). Networks of queues, part I: Job-local-balance and the adjoint process, part H: General routing and service characteristics. Pro-ceedings International Seminar on Modelling and Performance Evaluation Method-ology, INRIA, Vol. I, part I, pp. 79-104, part H, pp. 105-135. To appear in Lecture JVotes in Control and Information Sciences, Springer Verlag, Berlin. [31] Hordijk, A. and Van Dijk, N . (1983b). Adjoint processes, job-local-balance and insensitivity for stochastic networks. To appear in Proceedings 44th Session JSI, Madrid, Spain. [32] Howard, R.A. (1960). Dynamic Programming and Markov Processes. Wiley, New York. [33] Jackson, J.R. (1957). Networks of waiting lines. Operations Research 5, pp. 518-521. [34] Jackson, J.R. (1963). Jobshop-like queueing systems. Management Science 10, pp. 131-142. [35] Johnston, R.L. (1982). JVumericai Methods: a Software Approach. Wiley, New York. [36] Karlin, Samuel. (1969). A First Course in Stochastic Processes. Academic Press, New York. [37] Kaufman, L. (1983). Matrix methods for queueing problems. SIAM J. Sci. Stat. Comput. 4, pp. 525-552. [38] Kelly, F.P. (1979). Reversibility and Stochastic Networks. Wiley. [39] Kemeny, J .G. and Snell, J.L. (1960). Finite Markov Chains. Van Nostrand, New York. , [40] Kingman, J.F.C. (1969). Markov population processes. J. Appl. Prob. 6, pp. 1-18. [41] Lamond, B.F. and Puterman, M.L. (1985). Generalized inverses in discrete time Markov decision processes. Working paper #1069, The University of British Colum-bia, Faculty of Commerce and Business Administration, 2053 Main Mall, Vancou-ver, B.C., Canada V6T 1Y8. Submitted, Math. O.R. [42] Meyer, C D . Jr. (1980). The condition number of a finite Markov chain and per-turbation bounds for the limiting probabilities. SIAM J. Alg. Disc. Meth. 1, pp. 273-283. [43] Miller, B.L. and Veinott, A.F. Jr. (1969). Discrete dynamic programming with a small interest rate. Ann. Math. Statist. 40, pp. 366-370. [44] Puterman, M.L. and Brumelle, S.L. (1979). On the convergence of policy iteration in stationary dynamic programming. Math. O.R. 4, pp. 60-69. [45] Ross, S.M. (1970). Applied Probability Models with Optimization Applications. Holden-Day, San Francisco. [46] Rothblum, U.G. (1981). Resolvent expansions of matrices and applications. Lin. Alg. Appl. 38, pp. 33-49. 156 [47] Schassberger, R. (1978). The insensitivity of stationary probabilities in networks of queues. Adv. Apl. Prob. 10, pp. 906-912. [48] Schweitzer, P.J. (1971). Iterative solution of the functional equations of undis-counted Markov renewal programming. J. Math. Anai. Appl. 34, pp. 495-501. [49] Serfozo, R.F. (1979). An equivalence between continuous and discrete time Markov decision processes. Operations Research 27, pp. 616-620. [50] Stewart, G.W. (1984). Rank degeneracy. SIAM J. Sci. Stat. Comput. 5, pp. 403-413. [51] Stewart, W.J. (1978). A comparison of numerical techniques in Markov modelling. Communications of the A.C.M. 21, pp. 144-152. [52] Varga, R.S. (1962). Matrix Iterative Analysis. Prentice-Hall, Englewood-Cliffs, N.J . [53] Veinott, A .F . Jr. (1966). On finding optimal policies in discrete dynamic program-ming with no discounting. Ann. Math, Statist. 37, pp. 1284-1294. [54] Veinott, A.F . Jr. (1969). Discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40, pp. 1635-1660. [55] Veinott, A.F . Jr. (1974). Markov decision chains. In Studies in Optimization 10, M A A Studies in Mathematics, edited by G.B. Dantzig and B.C. Eaves. [56] Whittle, P. (1968). Equilibrium distributions for an open migration model. J. Appl. Prob. 5, pp. 567-571. [57] Whittle, P.(1969). Nonlinear migration processes. Internal. Statist. Inst. Bull. 42, Bk. 1, pp. 642-647. [58] Wilkinson, J.H. (1965). The Algebraic Eigenvalue Problem. Clarendon Press, Ox-ford. [59] Wilkinson, J .H. (1982). Note on the practical significance of the Drazin inverse. In Campbell, S.L. (ed.), .Recent Applications of Generalized Inverses. Pitman, Boston, pp. 82-99. [60] Wolff, R.W. (1982). Poisson arrivals see time averages. Operations Research 30, pp. 223-231. [61] Wong, B., Giffin, W. and Disney, R.L. (1977)^ Two finite M / M / l queues in tandem: a matrix solution for the steady-state. Opsearch 14, pp. 1-8. [62] Yamazaki, G., Kawashima, T. and Sakasegawa, H. (1985). Reversibility of tandem blocking queueing systems. Management Science 31, pp. 78-83. 157 BERNARD F. LAMOND Public a t i o n s: B.F. Lamond (1985). Matrix Methods in Queueing and Dynamic Programming. Ph.D. d i s s e r t a t i o n , The University of B r i t i s h Columbia. N. van Dijk and B.F. Lamond (1985). Bounds for the c a l l congestion i n tandem queues. Submitted, Operations Research. B.F. Lamond and M.L. Puterman (1985). Generalized inverses i n di s c r e t e time Markov decision processes. Submitted, Mathematics of O.R. B. Lamond and N.F. Stewart (1981). Bregman's balancing method. Transportation Research, Vol. 15B, pp. 239-248. B. Lamond (1980). Balancement de matrices et optimisation d'entropie. Master's t h e s i s , Universite de Montreal.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Matrix methods in queueing and dynamic programming
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Matrix methods in queueing and dynamic programming Lamond, Bernard Fernand 1985
pdf
Page Metadata
Item Metadata
Title | Matrix methods in queueing and dynamic programming |
Creator |
Lamond, Bernard Fernand |
Publisher | University of British Columbia |
Date Issued | 1985 |
Description | We investigate some modern matrix methods for the solution of finite state stochastic models with an infinite time horizon. Markov and semi-Markov decision processes and finite queues in tandem with exponential service times are considered. The methods are based on the Drazin generalized inverse and use matrix decomposition. Unlike the related Jordan canonical form, the decompositions considered are numerically tractable and use real arithmetic when the original matrix has real entries. The spectral structure of the transition matrix of a Markov chain, deduced from non-negative matrix theory, provides a decomposition from which the limiting and deviation matrices are directly obtained. The matrix decomposition approach to the solution of Markov reward processes provides a new, simple derivation of the Laurent expansion of the resolvent. Many other basic results of dynamic programming are easily derived in a similar fashion and the extension to semi-Markov decision processes is straightforward. Further, numerical algorithms for matrix decomposition can be used efficiently in the policy iteration method, for evaluating the terms of the Laurent series. The problem of finding the stationary distribution of a system with two finite queues in tandem, when the service times have an exponential distribution, can also be expressed in matrix form. Two numerical methods, one iterative and one using matrix decomposition, are reviewed for computing the stationary probabilities. Job-local-balance is used to derive some bounds on the call congestion. A numerical investigation of the bounds is included. It suggests that the bounds are insensitive to the distribution of the service times. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-06 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097294 |
URI | http://hdl.handle.net/2429/27124 |
Degree |
Doctor of Philosophy - PhD |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1986_A1 L35.pdf [ 7.38MB ]
- Metadata
- JSON: 831-1.0097294.json
- JSON-LD: 831-1.0097294-ld.json
- RDF/XML (Pretty): 831-1.0097294-rdf.xml
- RDF/JSON: 831-1.0097294-rdf.json
- Turtle: 831-1.0097294-turtle.txt
- N-Triples: 831-1.0097294-rdf-ntriples.txt
- Original Record: 831-1.0097294-source.json
- Full Text
- 831-1.0097294-fulltext.txt
- Citation
- 831-1.0097294.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.831.1-0097294/manifest