"Business, Sauder School of"@en . "DSpace"@en . "UBCV"@en . "Sawaki, Katsushige"@en . "2010-02-25T21:55:50Z"@en . "1977"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "This dissertation applies policy improvement and successive\r\napproximation or value iteration to a general class of Markov decision processes with discounted costs. In particular, a class of Markov decision processes, called piecewise-linear, is studied. Piecewise-linear processes are characterized by the property that the value function of a process observed for one period and then terminated is piecewise-linear if the terminal reward function is piecewise-linear. Partially observable Markov decision processes have this property.\r\nIt is shown that there are e-optimal piecewise-linear value functions and piecewise-constant policies which are simple. Simple means that there are only finitely many pieces, each of which is defined on a convex polyhedral set. Algorithms based on policy improvement and successive approximation are developed to compute simple approximations to an optimal policy and the optimal value function."@en . "https://circle.library.ubc.ca/rest/handle/2429/20920?expand=metadata"@en . "PIECEWISE LINEAR MARKOV DECISION PROCESSES WITH AN APPLICATION TO PARTIALLY OBSERVABLE MARKOV MODELS by KATSUSHIGE SAWAKI Bachelor of Economics, Nanzan U n i v e r s i t y , 1968 Master of Economics, Nanzan U n i v e r s i t y , 1970 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF THE FACULTY OF GRADUATE STUDIES (F a c u l t y of Commerce and Business A d m i n i s t r a t i o n ) We accept t h i s t h e s i s as- conforming to the r e q u i r e d standard THE UNIVERSITY OF BRITISH COLUMBIA June, 1977 (c*) Katsushige Sawaki, 1977. DOCTOR OF PHILOSOPHY i n In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the Head of my Department or by his representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5 Research S u p e r v i s o r : P r o f e s s o r Shelby L. Brumelle. ABSTRACT T h i s d i s s e r t a t i o n a p p l i e s p o l i c y improvement and suc-c e s s i v e approximation or value i t e r a t i o n to a gen e r a l c l a s s o f Markov d e c i s i o n processes with d i s c o u n t e d c o s t s . In p a r t i c u l a r , a c l a s s o f Markov d e c i s i o n p r o c e s s e s , c a l l e d p i e c e w i s e - l i n e a r , i s s t u d i e d . P i e c e w i s e - l i n e a r processes are c h a r a c t e r i z e d by the p r o p e r t y t h a t the value f u n c t i o n o f a process observed f o r one p e r i o d and then terminated i s p i e c e w i s e - l i n e a r i f the t e r m i n a l reward f u n c t i o n i s p i e c e w i s e - l i n e a r . P a r t i a l l y o b servable Markov d e c i s i o n processes have t h i s p r o p e r t y . I t i s shown t h a t there are e-optimal p i e c e w i s e - l i n e a r value f u n c t i o n s and p i e c e w i s e - c o n s t a n t p o l i c i e s which are simple. Simple means t h a t there are o n l y f i n i t e l y many p i e c e s , each of which i s d e f i n e d on a convex p o l y h e d r a l s e t . Algorithms based on p o l i c y improvement and s u c c e s s i v e approximation are' developed to compute simple approximations to an op t i m a l p o l i c y and the op t i m a l value f u n c t i o n . TABLE OF CONTENTS Page ABSTRACT i ACKNOWLEDGEMENT i i i Chapter I INTRODUCTION 1 II MODEL FORMULATION AND ASSUMPTIONS 4 I I I PROPERTIES OF U g AND U* 19 IV THE ALGORITHMS IV.1. The method of Successive Approximation 25 IV.2. The method of P o l i c y Improvement 2 9 V IMPLEMENTATION OF THE ALGORITHMS FOR MODEL 1 V . l . I n t r o d u c t i o n 40 V.2. Subroutine UDELTA (5,V,U gV) . . . . 44 V.3. Subroutine USTAR (V,U*V,6) 4 7 V.4. Successive Approximation 51 V.5. P o l i c y Improvement 54 V.6. Subroutine Q(B,a,V) f o r Model 2 . . 57 BIBLIOGRAPHY 62 - i i -ACKNOWLEDGEMENT I w i s h t o e x p r e s s my s i n c e r e g r a t i t u d e t o P r o f e s s o r S h e l b y B r u m e l l e , C h a i r m a n o f my t h e s i s c o m m i t t e e , f o r h i s i n v a l u a b l e g u i d a n c e and c o n s t a n t e n c o u r a g e m e n t . T h i s work w o u l d n o t have been p o s s i b l e w i t h o u t h i s g u i d a n c e and I am d e e p l y g r a t e f u l . I w o u l d a l s o l i k e t o t h a n k P r o f e s s o r s U. Haussmann and M.L. Puterman f o r t h e i r many h e l p f u l comments and s u g g e s t i o n s . I a l s o w i s h t o e x t e n d my g r a t i t u d e t o P r o f e s s o r L.G. M i t t e n f o r h i s s u g g e s t i o n s and e n c o u r a g e m e n t and t o P r o f e s s o r I . V e r t i n s k y f o r h i s c o n t i n u o u s e n c o u r a g e m e n t and f i n a n c i a l s u p p o r t d u r i n g my d o c t o r a l programme. I a l s o w i s h t o t h a n k T h e r e s a Fong f o r h e r e x c e l l e n t t y p i n g . S p e c i a l t h a n k s go t o my w i f e , Emiko, f o r h e r u n d e r -s t a n d i n g and p a t i e n c e . Chapter I INTRODUCTION The combined t h e o r i e s of dynamic programming and Markov d e c i s i o n processes have been a p p l i e d to many areas i n c l u d i n g i n v e n t o r y , queuing, and machine maintenance problems. T h i s t h e s i s develops a theory f o r a ge n e r a l c l a s s of dynamic programming models as w e l l as algori t h m s which y i e l d p o l i c i e s t h a t are both \"simple\" and e - o p t i m a l . The approach taken i s to c o n s i d e r a dynamic programming problem f o r an a r b i -t r a r y s t a t e Markov d e c i s i o n processes over an i n f i n i t e h o r i z o n . A t p r e s e n t there are no computational a l g o r i t h m s f o r models i n which the s t a t e space i s a continuum. However, al g o r i t h m s f o r some p a r t i a l l y observable Markov models and f i n i t e (at most countable) s t a t e Markov d e c i s i o n processes have been developed. The f o r m u l a t i o n o f our general model i s motivated by c o n s i -d e r a t i o n of the s p e c i a l s t r u c t u r e which the p a r t i a l l y o bservable models possess. The p a r t i a l l y o b s e rvable Markov process, i n t r o d u c e d by Dynkin [17], c o n s i s t s of two s t o c h a s t i c processes, the core process ( Z n , n = 1,2,...}, which cannot d i r e c t l y be observed, and the s i g n a l process {S n, n = 1,2,...} which becomes known a t each d e c i s i o n epoch n = 1,2,... . The core process i s a Markov c h a i n and the s i g n a l process i s p r o b a b i l i s t i c a l l y r e l a t e d to the core process by the c o n d i t i o n a l p r o b a b i l i t y Y- F L \u00C2\u00B0f o b s e r v i n g - 2 -a s i g n a l 0 given t h a t the core process i s i n s t a t e i . Dynkin shows t h a t the s t a t e occupancy p r o b a b i l i t y r e p r e s e n t s a s u f f i -c i e n t s t a t i s t i c f o r the complete p a s t h i s t o r y . Astrom [3] a l s o c o n s i d e r e d a s i m i l a r model wi t h f i n i t e s t a t e s and f i n i t e a c t i o n s over'a f i n i t e h o r i z o n , u s i n g the method of s u c c e s s i v e a p p r o x i -mation to f i n d e-optimal c o s t v e c t o r s , however, i t i s onl y a p p l i c a b l e to problems i n two dimensions. Smallwood and Sohdik [60] have independently obtained s i m i l a r r e s u l t s . L a t e r , Sondik [61] extended t h i s model to the i n f i n i t e - h o r i z o n and i n t r o d u c e d the c l a s s of f i n i t e l y t r a n s i e n t p o l i c i e s . The c o s t f u n c t i o n s of these p o l i c i e s which are used to approximate the c o s t f u n c -t i o n s of a r b i t r a r y s t a t i o n a r y , p o l i c i e s are piecewise l i n e a r w i t h r e s p e c t to the s u f f i c i e n t s t a t i s t i c . White [67] has c o n s i d e r e d a p a r t i a l l y o bservable semi-Markov process with a f i n i t e h o r i z o n where the c o n t r o l l e r knows the times o f the core process t r a n s i t i o n . A o k i [1] a l s o s t u d i e s the p a r t i a l l y observable c o n t r o l problem with f i n i t e s t a t e s , f i n i t e a c t i o n s e t s and f i n i t e h o r i z o n , but does not i n c l u d e an o p e r a t i o n a l a l g o r i t h m . Since i n p a r t i a l l y o bservable models wi t h f i n i t e s t a t e space the s t a t e s o f dynamic programming are p r o b a b i l i t y v e c t o r s i n R N (the N-dimensional r e a l space), i t f o l l o w s a f t e r some m o d i f i c a t i o n t h a t i f the s t a t e space (complete separable m e t r i c space) of our model i s r e p l a c e d by R N, the model then immediately i s reduced to a p a r t i a l l y observable Markov d e c i s i o n model. The s t a t e space o f the system w i l l l a t e r be assumed to be a - 3 -non-empty bounded subset of a separable complete m e t r i c space X so as to ensure t h a t t h i s t h e s i s i n c l u d e s p a r t i a l l y o b s e rvable Markov processes as a s p e c i a l case. In t h i s t h e s i s the concepts of simple p a r t i t i o n s , simple p o l i c i e s and piecewise l i n e a r f u n c t i o n a l s on the a r b i t r a r y s t a t e space are i n t r o d u c e d to e s t a b l i s h an a l g o r i t h m f o r d e t e r -mining an \"e-optimal simple s t a t i o n a r y p o l i c y \" . The i d e a i s based on the \" l i n e a r i t y \" of p a r t i a l l y o bservable Markov processes. In a d d i t i o n to these three concepts, assumptions on the immediate c o s t s and on the c o n t r a c t i o n o p e r a t o r s U are i n t r o d u c e d . Two a a l g o r i t h m s are d i s c u s s e d . The f i r s t of these i s the method of s u c c e s s i v e approximation which i s used f o r approximating the o p t i m a l c o s t V* and f o r f i n d i n g p o l i c i e s whose c o s t f u n c t i o n s approximate V*. The second i s based on the method of p o l i c y improvement. In Chapter II a f o r m u l a t i o n of a dynamic programming problem w i t h an a b s t r a c t s t a t e space and f i n i t e a c t i o n space w i l l be c o n s i d e r e d . Chapter I I I i s a study of o p e r a t o r s used i n the a l g o r i t h m s . In Chapter IV the methods of s u c c e s s i v e approximation and o f p o l i c y improvement w i l l be s t u d i e d . Chapter V e x p l i c i t l y develops the a l g o r i t h m s f o r the two methods i n a more concrete s e t t i n g . - 4 -C h a p t e r I I MODEL FORMULATION AND ASSUMPTIONS I n t h i s c h a p t e r we s h a l l f o r m u l a t e an o p t i m a l c o n t r o l p r o b l e m w i t h d i s c o u n t e d c o s t s and w i t h c o m p l e t e o b s e r v a t i o n o v e r an i n f i n i t e h o r i z o n u n d e r t h e s e t t i n g o f B l a c k w e l l [7]. A l s o , we i n t r o d u c e some d e f i n i t i o n s and a s s u m p t i o n s . A Markov d e c i s i o n p r o c e s s i s s p e c i f i e d by t h e f o l l o w i n g f o u r o b j e c t s : ( i ) t h e s t a t e s p a c e i s a non-empty B o r e l s u b s e t o f a s e p a r a b l e Banach s p a c e X; ( i i ) t h e a c t i o n s e t A i s f i n i t e and a i s an e l e m e n t o f A; ( i i i ) f o r e a c h p a i r (x,a) e fl x A, q ( * | x , a ) i s t h e one s t e p t r a n s i t i o n p r o b a b i l i t y o f t h e ^ s y s t e m on t h e B o r e l s u b s e t s o f fl; ( i v ) t h e i m m e d i a t e c o s t c ( x , a ) i s a bounded B o r e l measur-a b l e f u n c t i o n on fl x A. When t h e s y s t e m i s i n s t a t e x and a c t i o n a i s c h o s e n , t h e n we i n c u r a c o s t c ( x , a ) . We d e f i n e a p o l i c y t o be a s e q u e n c e i& r n = 1,2,...}, where <$n t e l l s us what a c t i o n t o c h o o s e a t t h e n - t h p e r i o d as a B o r e l m e a s u r a b l e f u n c t i o n o f t h e h i s t o r y H = ( x ^ , , . . . , x n ) o f t h e s y s t e m up t o p e r i o d n. . L e t A be a f a m i l y o f p o l i c i e s . A p o l i c y 6 = (6,6,...) w h i c h i s i n d e p e n d e n t o f t i m e n i s c a l l e d s t a t i o n a r y . Our e x p e c t e d d i s c o u n t e d t o t a l c o s t V (x) a t an i n i t i a l s t a t e x u n d e r a - 5 -p o l i c y 6 i s w r i t t e n as 00 (11.1) V 6 ( x ) = E{ I 6 n _ 1 c ( X 6 (X ))|X = x} n=l where n = 1,2,...} i s a Markov c h a i n w i t h p r o b a b i l i t y t r a n s i t i o n f u n c t i o n q(*|x, 6 (x)) . The d i s c o u n t f a c t o r i s denoted by 3 and 0' <_ 6 < 1. The f u n c t i o n V i s c a l l e d the c o s t o f p o l i c y 6. Define the optimal c o s t f u n c t i o n V* by (11.2) V*(x) = i n f V (x) f o r a l l x e SI. 6eA Then, the f o l l o w i n g i s t r u e (see B l a c k w e l l [ 7 ] ) . Theorem I I . 1 . There e x i s t s an o p t i m a l s t a t i o n a r y p o l i c y 6* with 6* V = V*. A l s o , V* s a t i s f i e s (11.3) V*(x) = min{c(x,a) + 3/ f iV*(x 1)q(dx\u00E2\u0080\u00A2|x,a)} f o r a l l x e f t . aeA An e-optimal' c o s t f u n c t i o n V i s one s a t i s f y i n g (11.4) i v * - Vll = sup|V*(x) - V(x) | < e. xefi A p o l i c y 6 such t h a t V = V . s a t i s f y i n g ( I I . 4) i s an e-optimal p o l i c y . L e t B(ft) be the s e t of a l l bounded B o r e l measurable f u n c t i o n s on ft with the sup norm I \u00E2\u0080\u00A2 I as above. Then, B (Q) i s a Banach space - 6 -(see L u s t e r n i k and Sobolev [38, p. 18]). For f i n d i n g an e-optimal p o l i c y and i t s c o s t f u n c t i o n we d e f i n e simple p a r t i t i o n s , simple p o l i c i e s and piecewise (abbre-v i a t e d , h e r e a f t e r , by p.w.) l i n e a r f u n c t i o n s . D e f i n i t i o n I I . 1. A p a r t i t i o n ^ ^ ^ = 1 of fl c X i s c a l l e d simple i f each i s a convex p o l y h e d r a l s e t , where a convex p o l y h e d r a l s e t i s the s o l u t i o n s e t of a f i n i t e system of l i n e a r i n e q u a l i t i e s , i . e . , E. = (x e fl: l..(x) < (or <)d., j = 1,2,...,n.}, l i j ~ J l i = 1,2,...,m, where each I. . d e f i n e d on X i s a l i n e a r f u n c t i o n a l and d. i s a iD 3 r e a l number. Note t h a t we always take l i n e a r f u n c t i o n a l to be bounded. Examples. (1) L e t E^ = fl. Take any l i n e a r f u n c t i o n a l on X and a r e a l 2 2 2 number d^. Then E = (E^, E 2} i s a simple p a r t i t i o n where E^ ='{x e A: 5,-L(x) < d x> and E^ = (x e fl: I (x) > . Furthermore, take another l i n e a r f u n c t i o n a l &2 ^ and a 3 3 3 3 3 r e a l number ^ d^. Then, E = (E-^, E^, E^, E^} i s a simple p a r t i t i o n where E^ = (x e fl: (x) d 2}, E^ = {x e fl: \u00C2\u00A3 1(x) > d1, ^ 2(x) < d 2> and E^ = {x e fl: A (x) >_dx, - 7 -&2 (x) .^ ^2^' a n <^ s o o n \" (2) Let ft = R N (the N-dimensional r e a l space). In d e f i n i t i o n N II.1, l e t I. \u00E2\u0080\u00A2(x) = %. -x where I.- e R and I. .x i s the ID iD iD ID inn e r product o f SL^ and x. Then { E ^ } i s a simple p a r t i -t i o n i n R . Lemma II.1. L e t ? 1 = (E^} and ? 2 = {F..} be two simple p a r t i t i o n s of ft. Then, the product p a r t i t i o n P^ \u00E2\u0080\u00A2 P^ = { E ^ n Fj} i s again simple. Proof: Here we omit E . n F . i f E . n F . = . The s e t s E . R F. 1 D 1 D i D are d i s j o i n t and are convex p o l y h e d r a l s e t s . Hence P^ \u00E2\u0080\u00A2 P^ i s simple. D e f i n i t i o n II.2. A s t a t i o n a r y p o l i c y 5 i s simple with r e s p e c t to a simple p a r t i t i o n ( E ^ ; i \u00E2\u0080\u00A2'..= l,2,...,m} i f 6 (x) = a^ f o r a l l x e E ^ , i = 1,2,...,m. D e f i n i t i o n I I . 3 . A v e c t o r valued f u n c t i o n V on ft c X i s c a l l e d p.w. l i n e a r i f there e x i s t s a simple p a r t i t i o n { E ^ } of ft such t h a t V(x) = (x) f o r a l l x e E ^ , i = l,2,...,m, and each i s the r e s t r i c t i o n to E . of a l i n e a r f u n c t i o n on X. l '.Given a p o l i c y 6, d e f i n e the operator Ug from B(ft) i n t o B(ft) by (II.5) (U 6V) (x) = c(x,<5(x)) + 3/ f iV(x' )q(dx' |x,6 (x) ) . - 8 -I f 5(x) = a f o r each x e fl, then we w r i t e U = U-. a o Define the operator U A from B(fl) i n t o B(fl) by (II.6) (U*V)(x) = min (U V)(x) f o r V e B ( f l ) . a a Although V* i s not n e c e s s a r i l y p.w. l i n e a r and 6* i s not n e c e s s a r i l y simple, we w i l l show f o r a c l a s s of Markov d e c i s i o n processes having the s t r u c t u r e d e s c r i b e d i n the f o l l o w i n g assumption t h a t there are e-optimal c o s t f u n c t i o n s and simple p o l i c i e s . Assumption I ( A . I . ) . (U V)(x) i s p.w. l i n e a r on fl f o r each a, a ~ p r o v i d e d t h a t V i s p.w. l i n e a r on fl. Examples. N N Model 1. L e t X = R and fl be a convex p o l y h e d r a l s e t i n R such t h a t q(*|x,a) i s a p r o b a b i l i t y measure on fl f o r each (x,a) e f l xA. The f o l l o w i n g two assumptions (A,II) and ( A . I l l ) imply ( A . I ) . Assumption I I ( A . I I . ) . For each a e A, the immediate c o s t f u n c t i o n c(\u00C2\u00AB,a) i s the r e s t r i c t i o n to fl of a l i n e a r f u n c t i o n a l on X. Hence f o r each a, t h e r e i s a v e c t o r c such t h a t c(x,a) = c a \u00E2\u0080\u00A2 x f o r x e fl. Assumption ITT ( A . I I I . ) . For each convex p o l y h e d r a l s e t B c fl and each a c t i o n a s A, - 9 -q a(B,x) = / Bx'q(dx'|x,a) i s p.w. l i n e a r i n x w i t h r e s p e c t to a simple p a r t i t i o n PNB) = (E.(a,B), j = l,2,...,nr }. j a, a We w i l l show i n Model 2 t h a t p a r t i a l l y o b s e rvable Markov processes are a s p e c i a l case of Model 1. We next check t h a t (A.I.) i s s a t i s f i e d . L e t a e A be a r b i t r a r y but f i x e d and suppose t h a t V i s p.w. l i n e a r with r e s p e c t to a simple p a r t i t i o n {E., i = l,2,...,m}. Let a m a ^a P = II P (E!) = {E.; j = l , 2 , . . . , r } , the product p a r t i t i o n , i = l 1 3 which i s again simple from Lemma I I . 1 . (U_V) (x) = c a \u00E2\u0080\u00A2 x + B/jjVfx' )q(dx' |x,a) m - c a \u00E2\u0080\u00A2 x + 8 I L V x'q(dx'|x,a) 1=1 E i 1 m = c a \u00E2\u0080\u00A2 x + 3 I V i q a ( E i , x ) i = l m \u00E2\u0080\u00A2 = [ c a + 3 7 V . X a ] x \u00E2\u0080\u00A2 f o r x e E a i = l 1 1 3 3. 3. where ^ \u00C2\u00B1 ^ ' x = q (E^^jx) f o r x e E^(a,E i) and the index I depends on i f o r each a e A. U V i s : l i n e a r on each E.. Hence U V i s a j a p.w. l i n e a r with r e s p e c t to the simple p a r t i t i o n P a = {E a, j = l , 2 , . . . , r } , which s a t i s f i e s ( A . I . ) . T h i s model 1 i s r e a l l y the b a s i c model s t u d i e d i n the theory. - 10 -Model 2. A p a r t i a l l y observable Markov D e c i s i o n Process (Dynkin [17], Smallwood and Sondik [60]). Consider a Markov d e c i s i o n process ( c a l l e d the core process) w i t h s t a t e space {l,2,...,N}, with a c t i o n s e t A, w i t h p r o b a b i l i t y t r a n s i t i o n m a t r i c e s ( P a , a e A}, and with immediate c o s t v e c t o r s {h , a e A}. L e t Z n be the s t a t e a t the n-th t r a n s i t i o n . Assume t h a t the process n = 0,1,2,...} cannot be observed, but a t each t r a n s i t i o n a s i g n a l i s t r a n s m i t t e d to to the d e c i s i o n maker. The s e t of p o s s i b l e s i g n a l s (5) i s assumed to be f i n i t e . For each n, g i v e n t h a t Z = j and t h a t a c t i o n a i s to be implemented, the s i g n a l 9 n i s independent of the h i s t o r y of the s i g n a l s and a c t i o n s {Q^.a^, -^a^, \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 \u00E2\u0080\u00A2 / 9 n - ] _ ' a n - l ^ p r i o r t o the n-th t r a n s i t i o n and has c o n d i t i o n a l p r o b a b i l i t y denoted by y a Q= P [ 0 = 6IZ = j , a ] . 3 N n . n N L e t X = R and ft = {x = (x n , x\u00E2\u0080\u009E, . . . , x.T) : J x. = 1, x. > 0, 1 2 N l l \u00E2\u0080\u0094 V i } . Define the i - t h component of X n to be P [ Z n = i | BQ,aQ,Qlt . . . ^ n - x ^ n - ] / 9 ^ ' 1 = 1,2,...,N. I t can be shown (see Dynkin [17]) t h a t P[Z ^ j , = j I 9. f a r t f 6 . , . . . , 0 ,a ,0 ,,]=P[Z ,, = j | 0 ^,an,X ]. n+1 J l 0 0 1 n n n+1 n+1 J 1 n+1 n n Thus X n r e p r e s e n t s a s u f f i c i e n t s t a t i s t i c f o r the complete pa s t h i s t o r y {9_,a A,...,a n , 0 }. I t f o l l o w s t h a t {x : n = 0,1,2,...} J 0 0 n-1 n n i s a Markov d e c i s i o n process (see Sondik [61]), c a l l e d the - 11 -observed process. I t s immediate c o s t i s c(x,a) = h\" \u00E2\u0080\u00A2 x. I t s a c t i o n s e t i s A. I t s p r o b a b i l i t y t r a n s i t i o n f u n c t i o n i s d e t e r -mined by the f o l l o w i n g c a l c u l a t i o n . For each measurable subset B \u00C2\u00A3 fl, x e fl, and a e A, q(B|x,a) = P [ X n + 1 e B | X n = x, a n = a] = TP[x E B | e = e , x = x, a = a] \u00E2\u0080\u00A2 P[9 = e f x g n+1 n+1 n n n+1 1 n = x, a n = a] = IP[X ^ T E B I B , -i = G , X =x, a =a] \u00E2\u0080\u00A2 \u00C2\u00A3P[6 = k n+1 1 n+1 n n k L n+1 9 3 0 l Z n + l = ^ X n = X ' a ] , P [ Z n + l = j ' X n = X ' a n = a ] = I P t X n + l \u00C2\u00A3 B | 9 n + 1 = 0, X n = x, a n = a] \u00E2\u0080\u00A2 h ^ P = j I Z = i , X = x, a =a]P[Z = il.X = x, a = a] J l n ' n ' n L n 1 n n = IP[X n + 1eB|e n + 1 = e , X n = x, a n = a ] l Y a e [ p a j x i = I P [ X n + 1 B B | 6 n + 1 = e , X n = x, a n = a ] l P a ( 9 ) x fc) where 1 = (1,1,...,1) and P a(9) T(x|9,a) by = [ P a . Y ^ n ] . Define the v e c t o r 1 3 1)9 - 12 -T(x|6,a) = i p a ( e ) X Note t h a t T(X |6 ra) = X , and t h a t n 1 n+1 P [ X n + l \u00C2\u00A3 B | 0 n + l = 6 ' X n = X ' a ] = 1 i f T(x|9,a) e B [o i f T(x|6,a) M (See Sondik [61]). So, q(B|x,a) = I 1 P a ( 6 ) ; 0 e$ a(B,x) where $ a(B,x) ='{0: T ( x J 0 , a ) \u00C2\u00A3 B ) . F i n a l l y , we show t h a t the observed process ^X^} i s a s p e c i a l case of Model 1; i . e . , q a(B,x) = / x*q(dx'|x,a) i s p.w. l i n e a r i n x f o r each convex p o l y h e d r a l s e t B c ft and a c t i o n a e A. Using the p r e v i o u s l y computed q(B|x,a) we have q a(B,x) = / Bx'q(dx'|x,a) = I T[x| 9,a] 1 P a ( 0 ) 0ea (B,x) x 0e$ a(B,x) l P a ( 9 ) x P a (0) x_ ^ p a ( 0 ) x = I a P a ( 0 ) x . 0 e $ a ( B , x ) - 13 -Thus i t i s s u f f i c i e n t to v e r i f y t h a t the s e t v a l u e d f u n c t i o n $ a ( B , \u00C2\u00AB ) : fl -* 2 ^ i s p.w. constant on fl where 2 ^ i s the power s e t of ( H ) . To do t h i s we need the f o l l o w i n g lemma. Lemma II.2. For each s i g n a l 6, a c t i o n a, and s e t B \u00C2\u00A3 fl, d e f i n e E ? ' a = (x e fl: T(x|e,a) eB>. Then f o r any subset of s i g n a l s ty \u00C2\u00A3 (H) , we have $ a(B,x) = ty i f and o n l y i f x e n E ^ ' a n n (E\u00C2\u00AE'a)\u00C2\u00B0. Qety Qety\u00C2\u00B0 B Proof. Note t h a t E g ' a = (x: 6 e $ a(B,x)}. Thus i f x e E g ' a f o r 6 \u00C2\u00A3 ty, then 8 e <\u00C2\u00A3>a(B,x). Consequently, ty \u00C2\u00A3 $ a ( B , x ) . On the other hand, i f x e ( E g ' a ) c f o r 0 e ty\u00C2\u00B0, then 6 $ $ a ( B , x ) . Conse-que n t l y , ty\u00C2\u00B0 \u00C2\u00A3 ( $ a ( B , x ) ) C . I t f o l l o w s t h a t ty = a(B,x). Conversely, suppose t h a t $ (B,x) = ty. Then x e Eq' f o r B 3. c c each 9 e if; and x e (Eg'\")^ f o r each 9 \u00C2\u00A3 ty , which completes the p r o o f . a a L e t E D (^ ) = {x: $ (B, x) = ip}. The above lemma g i v e s an a a e x p l i c i t r e p r e s e n t a t i o n of E D(ifO and q (B,x) i s p.w. l i n e a r a (5) w i t h r e s p e c t to the p a r t i t i o n {E_ ( t y ) : ty e 2 ^ } where i t i s assumed t h a t q a(B,x) = 0 i f E a(ijj) = ty' (empty) f o r a l l ty. Although B t h i s p a r t i t i o n i s not simple, i t can e a s i l y be r e f i n e d to a simple p a r t i t i o n as i n the next paragraph. - 14 -Suppose t h a t B \u00C2\u00A3 ft i s a convex p o l y h e d r a l s e t . Since f o r x -:e ft = (x: \u00C2\u00A3x^ = 1, x. > 0 \u00C2\u00A5i) an i n e q u a l i t y &x <_ b can be r e w r i t t e n as &x - b = {I - b l ) x <_ 0, we can without l o s s of g e n e r a l i t y assume t h a t B has the r e p r e s e n t a t i o n B = (x \u00C2\u00A3 ft : Kx < 0, Lx < 0} T f o r some mat r i c e s K and L where 0^= (0,0,...,0) . With t h i s r e p r e s e n t a t i o n of B, E ? ' a = (x e ft: T(x|9,a) eB> l P a ( 6 ) x l P d ( 6 ) x = {x e ft: KP a(0)x < 0, L P a ( 0 ) x < 0} = (x e ft: K a ( 0 ) x < 0, L a ( 0 ) x < 0} where K a(0) = KP a(0) and L a ( 0 ) = L P a ( 0 ) . So each E g ' a i s a D a p convex p o l y h e d r a l s e t . Each ( E Q ' ) can be r e p r e s e n t e d as a union of d i s j o i n t convex p o l y h e d r a l s e t s . I t f o l l o w s t h a t E a(ijj) i s a union of d i s j o i n t p o l y h e d r a l s e t s , say E a ( i H = u {E. (ty) } , Thus q a(B,x) i s p.w. l i n e a r with r e s p e c t j = 1 3 - (ii) to the simple p a r t i t i o n {E_. (ty) : j = l , 2 , . . . , n ^ , e 2 ^ } . - 15 -Model 3. Information a c q u i s i t i o n i n p a r t i a l l y observable models. Consider a p a r t i a l l y o bservable Markov c h a i n i n model 2. Define an i n f o r m a t i o n s t r u c t u r e as a mapping from the s e t of s t a t e s (unobservable) o f the core process to the s e t o f d i s t i n c -t i v e s i g n a l s 9. The d e c i s i o n maker chooses an i n f o r m a t i o n s t r u c t u r e from the s e t o f a v a i l a b l e s t r u c t u r e s and decides upon an a c t i o n f o r the system. L e t a = (a^, a^) be the p a i r of a c t i o n s , a^ f o r the system c o n t r o l and a 2 f o r i n f o r m a t i o n a c q u i s i t i o n . More p r e c i s e l y , we have P a . (9) = p a+ P a2 I D v i j 39 and N N a \u00C2\u00AE a o (II.7) c(x,a) = I x I P ^ I ' y . ^ h ( i , j , 9 , a n , a ) i = l x j = l 1^6=1 J 0 1 2 where h ( i , j , 9 , a ^ , ) i s the immediate c o s t of the core process when a s t a t e of the core process moves from i to j and a s i g n a l 9 observed under a c t i o n s a^ f o r the system and a 2 f o r the i n f o r -mation s t r u c t u r e , and x = (x^,...,x N) i s the p r o b a b i l i t y v e c t o r w i t h an i n t e r p r e t a t i o n t h a t x^ i s the p r o b a b i l i t y t h a t the core process i s i n s t a t e i . Note t h a t c(x,a) i s l i n e a r i n x. Consider a machine maintenance and r e p a i r model (e.g., Smallwood and Sondik [60]) as an a p p l i c a t i o n of p a r t i a l l y o bservable models. But t h i s model i s a m o d i f i c a t i o n of Smallwood - 16 -S o n d i k 1 s . The machine c o n s i s t s o f two i n t e r n a l components. The s t a t e s o f the core process = i , i = 1,2,3, have the f o l l o w i n g i n t e r p r e t a t i o n . I f i = 1, then both components are broken down, i f i = 2 e i t h e r one i s broken down and i f i = 3 both o f them are working. Assume t h a t the machine produces M f i n i s h e d products a t each p e r i o d and the machine cannot be i n s p e c t e d . The a c t i o n s a\"*\" f o r the machine c o n t r o l are to 2 r e p a i r and not to r e p a i r the machine. The a c t i o n s a f o r i n f o r m a t i o n a c q u i s i t i o n are the numbers of a sample to choose out o f the M f i n i s h e d p r o d u c t s . The s i g n a l s 9 are the number of d e f e c t i v e products i n the sample, which forms the s i g n a l process ( 9 n , n = 1,2,...}. The core process (Z^, n = 1,2,...} i s the unknown s t a t e s o f the components of the machine. L e t = P{Z n = i } , i = 1,2,3 and put x = (x^,x^,x^)\u00E2\u0080\u00A2 Then, the process { ( z n ' 9 n ) ' n = 1,2,...} becomes a p a r t i a l l y observable 1 2 machine maintenance and r e p a i r model wi t h a c t i o n s a = (a ,a ) and immediate c o s t c(x,a) d e f i n e d by ( I I . 7 ) . Model 4. A p a r t i a l l y o bservable semi-Markov model (White [67]). L e t {Z^_, t > 0} be a semi-Markov c h a i n w i t h the f i n i t e s e t o f s t a t e s and l e t t be the time the t r a n s i t i o n o c c u r r e d . n L e t x = t .- t , , n = l , 2 , . . . , wi t h tn = 0. Then n n n-1' 0 {(Z^ ,T ), n = 1,2,...} i s a Markov c h a i n (Ross [54]). L e t t n n Y = (Z^ ,T ) denote the p a r t i a l l y observable c o r e - p r o c e s s , n t n \u00C2\u00A3 J n L e t {0 , n = 1,2,...} be the s i g n a l process where each s i g n a l n i s observed a t the time o f the core process t r a n s i t i o n . The - 17 -c o n t r o l l e r knows t h e t i m e s o f t h e c o r e p r o c e s s t r a n s i t i o n w h i c h t a k e o n l y f i n i t e l y many i n t e g e r v a l u e s , i . e . , T = 1,2,...,M, f o r e a c h n. Then t h e c o r e p r o c e s s {Y , n = 1,2,...} i s a f i n i t e n s t a t e , d i s c r e t e t i m e Markov c h a i n and t h e p a i r o f two s t o c h a s t i c p r o c e s s e s { ( Y n ' 9 n ) ' n = 1*2,...} becomes t h e same p a r t i a l l y o b s e r v a b l e Markov c h a i n as i n model 2, p r o v i d e d t h a t t h e i m m e d i a t e c o s t h. r e p r e s e n t s t h e e x p e c t e d c o s t w i t h r e s p e c t t o t h e x and t h e s e t o f a c t i o n s , a e A, i s f i n i t e . T h i s model n d i f f e r s f r o m W h i t e [67] i n t h a t he a l l o w s T t o be c o u n t a b l e . n M o d e l 5. A c l a s s i c a l l i n e a r e c o n o m i c model (Wa l r a s [66]) L e t x be a p r i c e v e c t o r o f N c o m m o d i t i e s (or N s e c u r i t i e s ) i n t h e m a r k e t and assume t h a t a new p r i c e v e c t o r x' c a n be w r i t t e n a s where Pg i s an N x N m a t r i x d e p e n d i n g on t h e p r e s e n t e c o n o m i c s i t u a t i o n 0 and on an e c o n o m i c a l t e r n a t i v e a. L e t P [ 0 | x , a ] be t h e c o n d i t i o n a l p r o b a b i l i t y o f 0 f o r e c a s t e d , g i v e n x and a. Assume t h a t t h e r e e x i s t s a s i m p l e p a r t i t i o n {E^} o f t h e s e t o f p r i c e v e c t o r s x s u c h t h a t P [ 9 | x , a ] = P a . f o r x e E., ' 0 i I w h i c h i s p.w. c o n s t a n t w i t h r e s p e c t t o {E.}. T h e r e f o r e , t h e - 18 -model belongs to the c l a s s o f model 1, p r o v i d e d (A.II.) i s s a t i s f i e d . - 19 -Chapter I I I PROPERTIES OF U 6 AND U* T h i s chapter i s a study o f the p r o p e r t i e s o f and . Most of these p r o p e r t i e s w i l l be used l a t e r i n the development of a l g o r i t h m s to f i n d e-optimal approximations to V* and 6*. The f o l l o w i n g p r o p e r t i e s are well-known and p r o o f s may be found i n B l a c k w e l l [7], Denardo [10], and Ross [54]. Lemma I I I . 1 . ( i ) Ug and are c o n t r a c t i o n mappings on B(ft) w i t h c o n t r a c t i o n c o e f f i c i e n t 8 < 1. ( i i ) Ug, are monotone; i . e . , i f f , g e B (ft) with f <_ g, then U g f < U^g and U*f < U*g. ( i i i ) V = V i s the unique s o l u t i o n of the o p e r a t o r equation u6v = V. (iv) V* = V i s the unique s o l u t i o n o f the op e r a t o r equation U*V = V. The f o l l o w i n g theorem shows how the s t r u c t u r e i n Assumption I i m p l i e s t h a t U* and p r e s e r v e the p.w. l i n e a r i t y o f value f u n c t i o n s and the s i m p l i c i t y of p o l i c i e s . Theorem I I I . 1 . Suppose t h a t (A.I.) holds and t h a t V i s p.w. l i n e a r . Then - 20 -(i) UgV i s p.w. l i n e a r whenever 6 i s simple; ( i i ) U^V i s p.w. l i n e a r ; and ( i i i ) t h ere e x i s t s a simple p o l i c y 6 such t h a t U^V = U^V. Proof. (i) Suppose t h a t 6 i s simple with r e s p e c t to a simple p a r t i -t i o n {E^}. L e t E^ be an a r b i t r a r y but f i x e d c e l l from the p a r t i t i o n and suppose t h a t 6(x) = a f o r x e E^. Then (U.V) (x) = (U V) (x) f o r x e E. . o a l From (A.I.)* U V i s p.w. l i n e a r f o r each a e A. Hence a ~ UgV i s p.w. l i n e a r on each c e l l E^, and i s consequently p.w. l i n e a r on Q . ( i i f: The f u n c t i o n s U V. are each p.w. l i n e a r by ( A . I . ) . Suppose a i i i ) t h a t U V i s p.w. l i n e a r w i t h r e s p e c t i v e to the simple a p a r t i t i o n P a. L e t P = IT P a. Then P i s f i n e r than each aeA P , and so each U V i s p.w. l i n e a r w i t h r e s p e c t to P. a For each F e P and a e A , there i s some l i n e a r f u n c t i o n a l a such t h a t (U V)(x) = a a ( x ) f o r x e F. a a For each F e P, d e f i n e the s e t s G , b e A F {l,2,...,p}, by G^ = {x: ct^x < a ax, a = 1,2,...,b-l and a^x < a ax, a = b+1,...,p}. r r r r \u00E2\u0080\u0094 r - 21 -Then {G a: a e A} = P F i s a p a r t i t i o n of F and P = II P F FeP i s a p a r t i t i o n o f ft wit h the pr o p e r t y t h a t (U*V)(x) = a a ( x ) i f x E G a e P. a /\ The p o l i c y 6 d e f i n e d by 6(x) = a f o r x e G_ e P s a t i s f i e s r u6v = u*v. C o r o l l a r y . Suppose t h a t (A.l.) h o l d s and t h a t V\u00C2\u00B0 -e B(ft) i s p.w. l i n e a r . (i) Define V n ( x ) = ( U g V n - 1 ) ( x ) , n = 1,2,... . ( i i ) Define V n ( x ) = ( U * V n - 1 ) ( x ) , n = 1,2,... . Then V 1 1 i s p.w. l i n e a r and the s t a t i o n a r y p o l i c y , 6 d e f i n e d by U g V n _ 1 = U ^ V n _ 1 i s simple, n Remark I I I . l . P a r t (i) of the Theorem can be g e n e r a l i z e d as f o l l o w s : i f 5 i s simple w i t h r e s p e c t to a simple p a r t i t i o n P and g(*,a) i s p.w. l i n e a r w i t h r e s p e c t to P f o r each a e A, then g(\u00E2\u0080\u00A2,6(\u00E2\u0080\u00A2)) i s p.w. l i n e a r with r e s p e c t to the p a r t i t i o n P = P 6 . n P a. aeA Remark I I I . 2 . Suppose t h a t i n s t e a d o f Assumption I, we assume t h a t ft i s convex and t h a t f o r each a e A , U V i s concave a whenever V i s concave and non-negative. Then U^V and U*V are non-negative and concave whenever V i s . Although t h i s s t r u c t u r e - 22 -w i l l not be developed f u r t h e r i n t h i s t h e s i s , we note t h a t i t i s somewhat analogous to the p.w. l i n e a r s t r u c t u r e i n ( A . I . ) . We next c o n s i d e r the e f f e c t s o f i t e r a t i n g montone c o n t r a c t i o n mappings such as and , c i t i n g some r e s u l t s o f Denardo [10]. Lemma ITT.2. Suppose t h a t U i s a c o n t r a c t i o n mapping on B(fi) w i t h c o n t r a c t i o n c o e f f i c i e n t 3 < 1. L e t V\u00C2\u00B0 e B ( Q ) be given and d e f i n e the f u n c t i o n s V n, n = 1,2,... by V^Cx) = (UV n - 1) (x) . Then (i) {v n} converges i n norm to the f i x e d p o i n t V of U; i . e . , UV = V\". Now assume t h a t U i s a l s o montone. I n ^ ( i i ) I f V <_ V\u00C2\u00B0, then {Vn} i s m o n o t o n i c a l l y d e c r e a s i n g to V. I o n ^ ( i i i ) I f V 2. V , then {V } i s m o n o t o n i c a l l y i n c r e a s i n g to V. Remark I I I . 3 . The f i x e d p o i n t V need not be p.w. l i n e a r s i n c e the c e l l s i n the l i m i t i n g p a r t i t i o n are not n e c e s s a r i l y f i n i t e i n number nor p o l y h e d r a l . In the remainder of t h i s chapter, U w i l l be a c o n t r a c t i o n mapping w i t h c o n t r a c t i o n c o e f f i c i e n t 3 < 1 and f i x e d p o i n t V. The f u n c t i o n V\u00C2\u00B0 e B(Q) i s assumed to have been given and the f u n c t i o n s V n f o r n = 1,2,... are d e f i n e d by V n = UV n _ 1. By the - 23 -pre v i o u s lemma, (V n> converges to V. The f o l l o w i n g r e s u l t s n A concern the r a t e o f convergence o f {V } to V and e r r o r bounds on IIV1 - V\" . The f o l l o w i n g two lemmas imply t h a t { v n ) converges to V l i n e a r l y (due to Denardo [10]). Lemma I I I . 3 . I v n - V\u00C2\u00AB < 8 l l v n _ 1 - Vll . Proof. il v n - vll = ii u v 1 1 - 1 - uvii < 3 It v n ~ 1 - vll . Lemma III.4, IV n - Vll < j i g - IIVn - UV nl Proof. IV n - Vll < IIVn - UV n\u00C2\u00AB + \u00C2\u00ABuv n - UVll < IIvn - uv n\u00C2\u00BB + 6 \u00C2\u00AB v n - Vll - 24 -The r e s u l t i s obt a i n e d by a r e a r r a n g i n g the l a s t expression, Lemma TIT. 5. L e t V e B(fi). I f IIV - UVll < ( l - 3 ) e , then Proof, IV - Vll < e , I v - vii < ll uv - uvii '+ I UV - vii \u00E2\u0080\u00A2<\u00E2\u0080\u00A2 3II v - vll + ll uv - vll There f o r e H V - V\" < I UV - V\u00C2\u00BB/(l - 3) < e, Theorem I I I . 2 . I f 3 n\u00C2\u00ABV\u00C2\u00B0 - UV\u00C2\u00B0I < (1 - 3 ) e , then IV - VnH < e. Proof. IIVn - UVnH = IIUV n - 1 - U 2 V n 1 l < g | | V n _ 1 - u v n - 1 i < 3 n\u00C2\u00ABV\u00C2\u00B0 - UV\u00C2\u00B0II < (1 - 3)e A p p l y i n g Lemma I I I . 5 . immediately g i v e s us the r e s u l t , - 25 -Chapter IV THE ALGORITHMS S e c t i o n IV.1. The Method of Succ e s s i v e Approximation The method of s u c c e s s i v e approximation i s a w e l l known and popula r method f o r s o l v i n g equations. In the cont e x t o f a s o l u t i o n technique f o r s o l v i n g s t a t i o n a r y Markov/decision processes i t appears i n B l a c k w e l l [ 7 ] . The method i s to s t a r t w ith a c o s t f u n c t i o n V \u00C2\u00B0 , and to i t e r a t e U*, c o n s t r u c t i n g a sequence of c o s t f u n c t i o n s V n = UJtVn-''\", n = 1,2,... . By Lemma I I I . l , U A i s a c o n t r a c t i o n mapping wi t h f i x e d p o i n t V* and by Lemma I I I . 2 , { v n } converges to V*. By Theorem I I I . 2 , n can be chosen s u f f i c i e n t l y l a r g e , so t h a t V n i s an e-optimal c o s t f u n c t i o n . In f a c t by t a k i n g logarithms o f the e x p r e s s i o n i n Theorem I I I . 2 , n > l o g [ ( 1 \" B ) 1 \u00C2\u00A3 ] / l o g S IIV -V II i s adequate. The next theorem p r o v i d e s a means of c o n s t r u c t i n g an e-optimal p o l i c y from an e'-optimal c o s t f u n c t i o n and s p e c i f i e s the r e l a t i o n s h i p between e and e'. The a l g o r i t h m , w i l l f i r s t c o n s t r u c t an e'-optimal c o s t f u n c t i o n . From t h i s c o s t f u n c t i o n , an e-optimal p o l i c y i s c o n s t r u c t e d . - 26 -Theorem IV. 1. L e t V\u00C2\u00B0 e B (SI) be p.w. l i n e a r , d e f i n e V = U*V , n = 1,2,..., and l e t 6 be a p o l i c y s a t i s f y i n g U V n 1 = U*V n 1 . n \u00C2\u00A7n I f II v* - V 1 1\" 1\" < {1~2l)S . then Proof. By Lemma I I I . l U g f o r any s t a t i o n a r y p o l i c y 6 i s a c o n t r a c t i o n mapping and the f i x e d p o i n t i s V , i . e . , V = U^V . Consider || v * - v 6 n n =\u00E2\u0080\u00A2 il u 6 v 6 n - u*v*n n < llu, V 6 n - U- V* I + I IK V* - IK V n 1 l \u00E2\u0080\u0094 0 0 0 0 n n n n + H u ^ v n _ 1 - U*V*\" < g || v 6 n _ v*|| + 31| v * - V n _ 1 l l + g l l v n - 1 - V * l l where we used the e q u a l i t y U * V n _ i = U g V . Ar r a n g i n g the n above i n e q u a l i t y , we o b t a i n (1 - 3 ) H v S n - V*H < 23'IV* - V n 1II < (1 - 8)e f which completes the proof . - 27 -I f the s t a t e space X i s uncountable, or even countably i n f i n i t e , then t h i s procedure i s d i f f i c u l t to implement on a computer. However, i f the Markov d e c i s i o n process has the s t r u c t u r e o f (A. I-.) and V\u00C2\u00B0 i s p.w. l i n e a r , then each V n i s p.w. l i n e a r and each 6 n c o n s t r u c t e d as i n the p r e v i o u s theorem i s simple (by Theorem I I I . l . ) . In t h i s case, the c o s t f u n c t i o n s and p o l i c i e s can be s p e c i f i e d by a f i n i t e number of items - the i n e q u a l i t i e s d e s c r i b i n g each c e l l o f a simple p a r t i t i o n and the c o r r e s p o n d i n g a c t i o n or l i n e a r f u n c t i o n . A l g o r i t h m to f i n d an e-optimal simple p o l i c y . (i) S t a r t with any p.w. l i n e a r f u n c t i o n V \u00C2\u00B0 . ( i i ) Compute V 1 = U*V\u00C2\u00B0. ( i i i ) Choose an i n t e g e r n such t h a t BnHv\u00C2\u00B0 - V1!! < (1 - 3 ) e ' , where e ' = (1 - 3)e/2 3- I.e., choose n l a r g e r than 2 l o g [ d - g ) e ] / l o g g. 23II V -V ll (iv) Compute V 1 1 = U*V n 1 s u c c e s s i v e l y u n t i l n = n. (v) Consequently, we o b t a i n V n such t h a t ||v* - vnll < el . - 28 -(vi) C o n s t r u c t a p o l i c y 6 s a t i s f y i n g u 6 v n = u*v n.. Then 6 i s e-optimal. Remark IV.1. The a l g o r i t h m can be s t a r t e d with V\u00C2\u00B0 = 0. Remark IV.2. The t e r m i n a t i o n c r i t e r i o n , n = n, i n the a l g o r i t h m has the advantage t h a t Hv\u00C2\u00B0 - V^ \"\" i s computed on l y once. However, i t has the disadvantage t h a t n w i l l probably be l a r g e r than necessary, c a u s i n g unnecessary i t e r a t i o n s . An a l t e r n a t i v e would be to compute I V n - V n \"*\"H a t each i t e r a t i o n and stop whenever Hv11 - V n \"'\"U <_ (1 - $)e'/3. Theorem III.2 guarantees t h a t V n i s an e 1 - o p t i m a l c o s t f u n c t i o n . However, the computations of IIVn - V n \"*\"U w i l l , i n g e n e r a l , be expensive. The b e s t procedure i s undoubtedly to check I V n - V n \"*\"U at some, but not a l l , i t e r a t i o n s . For example, n might be computed based on I V\u00C2\u00B0 - V I . Then a t some i t e r a t i o n n near ^ , recompute n based on IIV11 - V n \"*\"H . T h i s i s the procedure suggested i n the next chapter. - 29 -S e c t i o n IV.2. The Method of P o l i c y Improvement Another commonly proposed method f o r s o l v i n g Markov d e c i s i o n problems i s p o l i c y improvement (Howard [26]). P o l i c y improvement i s a c t u a l l y Newton's method a p p l i e d to the convex op e r a t o r equation (I - U^)V = 0 to f i n d the s o l u t i o n V*. Newton's method converges s u p e r - l i n e a r l y i n many s i t u a t i o n s , and t h i s p r o p e r t y i s , m a i n t a i n e d when a p p l i e d to some Markov d e c i s i o n problems (Brumelle & Puterman [8], Puterman & Brumelle [49]). Since the s u c c e s s i v e approximation method converges o n l y l i n e a r l y (Lemma I I I . 3.)., i t i s d e s i r a b l e to adapt the p o l i c y improvement method to our model. Our v e r s i o n of p o l i c y improvement i n c l u d e s the s u c c e s s i v e approximation method as a s p e c i a l case. Given a p o l i c y 6 w i t h c o s t V , an i t e r a t i o n o f p o l i c y 6 6 improvement c o n s i s t s of f i n d i n g a p o l i c y 6' such t h a t U^,V =U*V , 6 ' and then s o l v i n g the l i n e a r equation V = ,V f o r V One method of s o l v i n g the operator equation V = U^V f o r V i s the method of s u c c e s s i v e approximation, i . e . , by i t e r a t i n g . More e x p l i c i t l y , s t a r t w i t h a c o s t f u n c t i o n V\u00C2\u00B0 and i t e r a t e Ug, c o n s t r u c t i n g a sequence of c o s t f u n c t i o n s V n = U^V n L , n = 1,2,... . By Lemma I I I . l , U^ i s a c o n t r a c t i o n mapping wi t h (5 6 a f i x e d p o i n t V , and by Lemma I I I . 2 , {V } converges t o V . By Theorem I I I . 2 , f o r any given e > 0, n can be chosen s u f f i c i e n t l y n 6 l a r g e so t h a t IIV - V I \u00C2\u00A3 e. However, we w i l l show t h a t i t i s 6 not necessary to approximate V a t a l l c l o s e l y i n the p o l i c y improvement a l g o r i t h m . In the remainder o f t h i s s e c t i o n , we d i s c u s s the a l g o r i t h m - 30 -i n g e n e r a l terms and then d i s c u s s the s p e c i f i c p o i n t s of s t a r t i n g the a l g o r i t h m , choosing the parameters (k^} which s p e c i f y the degree of approximation of V i n the n-th i t e r a t i o n , t e r m i n a t i n g the a l g o r i t h m , and a proof t h a t the a l g o r i t h m converges. Since c(x,a) i s bounded, t h e r e e x i s t s a constant M such t h a t c(x,a) \u00C2\u00A3 M Vx,a. . L e t c(x,a) = c(x,a) - M < 0 and d e f i n e V 6 ( x ) = E 6 [ 1c(X n,6(.X n)) |X1 = x] . Then V g ( x ) = V g ( x ) - M / ( l - 8 ) . Hence a m i n i m i z a t i o n of V g i s e q u i v a l e n t to a m i n i m i z a t i o n of over 6 e A . I t i s , t h e r e f o r e , easy to f i n d a p.w. l i n e a r f u n c t i o n V such t h a t U~v' < V. For o \u00E2\u0080\u0094 i n s t a n t , put V = 0 which i s p.w. l i n e a r and s a t i s f i e s U gV <_ V. A l g o r i t h m f o r f i n d i n g an e-optimal p o l i c y under ( A . I . ) . S t a r t w i t h a simple p o l i c y <5\u00C2\u00B0 and a p.w. l i n e a r f u n c t i o n y\u00C2\u00B0 e B(ft) s a t i s f y i n g y\u00C2\u00B0 > U y \u00C2\u00B0 . ~ 6\u00C2\u00B0 An i t e r a t i o n of the a l g o r i t h m i s d e s c r i b e d as f o l l o w s : n-- 0,1,2,... . At the s t a r t of the n-th i t e r a t i o n , we have a simple p o l i c y 5 n and a p.w. l i n e a r f u n c t i o n y 1 1 e B(^) s a t i s -f y i n g y . >-U y . \" 6 k n n (l) Compute U y where the i n t e g e r k i s the number of i t e r a -6 n t i o n s of U which are to be performed. +1 k + ( i i ) Set y = U n y n and:, f i n d a p o l i c y 6 n such t h a t 6 n TT 1 1 + 1 TT n + l U6 n + 1 Y = u * y \u00E2\u0080\u00A2 ( i i i ) I f \\yn - y n + 1 l l <_ (1 - 8) e, then stop with y n e-optimal and 6 n g-optimal. Moreover, V* <_ V ^ n _< y n + 1 . - 31 -(iv) I f II y 1 1 - y n + 1 l l <_ (1 - 3 )e, then increment n by 1 and perform another i t e r a t i o n . To s t a r t , the a l g o r i t h m needs a simple p o l i c y 6 and a p.w. l i n e a r f u n c t i o n y s a t i s f y i n g y >_ U^y. There i s no d i f f i c u l t y i n f i n d i n g a simple p o l i c y ; f o r example, 6(x) = a f o r a l l x e fl i s s a t i s f a c t o r y . F i n d i n g a s a t i s f a c t o r y y i s more d i f f i c u l t u nless the model i s s p e c i f i e d f u r t h e r . For example, on page 12 i n 3. a. T Model 2, q (r,x) = (p ) x. So i f 6(x) = a f o r a l l x c fl, then 6 a a T \u00E2\u0080\u00941 6 V (x) = c [I - 3(p ) ] x f o r a l l x e fl. S e t t i n g y = V p r o v i d e s a s t a r t i n g v e c t o r . I f yn i s a p.w. l i n e a r f u n c t i o n and 6 n i s simple, i t f o l l o w s from Theorem I I I . l . and (A.I.) t h a t y i s p.w. l i n e a r n~r\" 1 and t h a t 6 i s simple. Theorem I I I . l a l s o i m p l i e s t h a t each of the i n t e r m e d i a t e i t e r a t e s U-Ly n, j = 1 , 2 , . . . ,k are p.w. l i n e a r . Consequently, the a l g o r i t h m can s t a r t and the i t e r a t i o n s are w e l l d e f i n e d . The q u e s t i o n of how b e s t to e s t a b l i s h the a p p r o p r i a t e v a l u e s of the parameters -{k ) i n the a l g o r i t h m has not been r e s o l v e d . I f each k = 0 , then the a l g o r i t h m reduces to t h a t o f n ^ s u c c e s s i v e approximation d e s c r i b e d i n the l a s t s e c t i o n and which i s known to converge l i n e a r l y . However, the e f f o r t per i t e r a t i o n i s s m a l l . I f each k = \u00C2\u00B0\u00C2\u00B0, then the method i s known to converge n ^ s u p e r - l i n e a r l y i n many s i t u a t i o n s ( [ 8 ] , [ 4 9 ] ) . However, i n t h i s case the e f f o r t per i t e r a t i o n i s l a r g e . In g e n e r a l , i t seems a p p r o p r i a t e to take k n s m a l l , perhaps even 0 , i n the e a r l y i t e r a t i o n s . However, once the-.ned;ghborhood of V* i s reached, k n 6 n k should be l a r g e enough so t h a t U ^y approximates V i n n g n order to take advantage of the s u p e r - l i n e a r convergence. - 32 -Theorem IV.1. For each i t e r a t i o n , n = 0,1,2,..., i n the p o l i c y improvement a l g o r i t h m , n . \u00E2\u0080\u009E n . IT2 n ^ ^ T T n n n+1 y > U y > U y > .. . > U y = y - 6 n 6 n _ _ 6 n Proof. F i r s t , i t i s t r u e f o r n = 0. Since U y\u00C2\u00AE and s i n c e - 6 U by Theorem I I I . l U . i s monotone, i t f o l l o w s t h a t o k y\u00C2\u00B0 > U n y \u00C2\u00B0 > U 2 n y \u00C2\u00B0 > ... > U \u00C2\u00AEy\u00C2\u00B0 = y 1 > U . y 1 . By d e f i n i t i o n ~ 6\u00C2\u00B0 \" <5\u00C2\u00B0 \" ~ 6\u00C2\u00B0 ~ 6\u00C2\u00B0 6, s a t i s f i e s y = U*y . However, U*y <_ U n y < y , and so not o n l y i s the Theorem e s t a b l i s h e d f o r n = 0, but we have a l s o shown t h a t U , y 1 < y^. S1 Now suppose U y < y . The same argument as i n the f i r s t 6 n paragraph e s t a b l i s h e s the Theorem f o r n and a l s o t h a t U n + i y n + ^ yn+\"*\"- Hence the proof i s completed by i n d u c t i o n . C o r o l l a r y . yn >_ V* f o r n = 1, 2 , . . . . Proof. For an a r b i t r a r y n, y 1 1 _> u Y*1 j i u * y n - Since i s monotone (Lemma I I I . l ) , y _> U^y f o r each j . By Lemma I I I . 2, U*y n decreases m o n o t o n i c a l l y and converges to V* as j -*\u00E2\u0080\u00A2 <=\u00C2\u00B0. Consequently, y n _> V* and the proof i s complete. We next show t h a t i f the a l g o r i t h m terminates then i t w i l l p r o v i d e an e-optimal c o s t f u n c t i o n and an e-optimal p o l i c y . - 33 -Theorem IV 2. I f Hy n - y n + l 1 1 1 (1 - S ) e , then \u00C2\u00AB y n - V*\u00C2\u00AB < e, i . e . , y n i s e-optimal. Moreover, $ n i s a l s o e-optimal and 6, n n v* < v < y Proof. Note t h a t U y n = U*y n and t h a t by the pr e v i o u s c o r o l l a r y 5 N y n > V*. , n _ v*\u00C2\u00AB < \u00C2\u00BBy n - U*ynU + \u00C2\u00BBU^y 1 1 - U*V*I < I y n - u y nll + Blly11 - V*l 6 n < || y n _ u m y n l l + 6 l l y n - V*II f o r m= 1,2, . . . , because yn > U y > U y f o r m = 1,2,... . (Theorem IV.1) 6 n - 6 n Thus ( 1 _ g ) l l y n _ v * l l < H y n - U m v n l = l l y n - y n + 1 H < ( l - & ) e , - \u00C2\u00A7 n and so \"y 1 1 - V*\u00C2\u00BB < e . The l a s t statement i n the Theorem f o l l o w s by Theorem IV.1. The f o l l o w i n g theorem has been shown by Doshi [16] f o r continuous time Markov processes. But our proof i s d i f f e r e n t from and si m p l e r than h i s . Theorem IV.3. L e t V be the c o s t o f any s t a t i o n a r y p o l i c y 5. L e t 8' be a p o l i c y d e f i n e d by U^,V = U*V . - 34 -6 6 6 ' 6 (i) I f Ug,V = V , then V = V , and 6' and 6 are optimal. 6 ' 6 6 ( i i ) V < V . Furthermore, i f f o r some x Q \u00C2\u00A3 ft (U g ,V ).(xQ) < 6 V ( X g ) , then V 6' (x Q) < V 6 ( x 0 ) . Proof. (i) From the d e f i n i t i o n o f 6 ' we have u * v = u*- v = v . Since the op t i m a l c o s t V* i s the unique s o l u t i o n o f , 6 . . . n 6 6 6 5 V = V*. By i n d u c t i o n on n, Ug,V = V s i n c e Ug,V = V But 6 6 1 U n,V \u00E2\u0080\u00A2> V as n -* \u00C2\u00B0\u00C2\u00B0by Lemma I I I . 2. 6 ' 6 6 1 Hence V = V because V i s the unique f i x e d p o i n t o f Ug,. (Lemma I I I . l . ) 6 6 ( i i ) By d e f i n i t i o n o f 6' and V = U,V (Lemma I I I . l . ) 6 \u00E2\u0080\u009E 6 6 Ug,V \u00C2\u00A3 U gV = V By i n d u c t i o n on n - 35 -u n,v 6 < v 6 n <5 6 1 UT,V -> V as n -\u00C2\u00BB\u00E2\u0080\u00A2 \u00C2\u00B0\u00C2\u00B0. 6 \u00E2\u0080\u00A2 6 V < V . 6 6 Suppose ^ u 6 ' v ^ x o ^ < V ^ x 0 ) f o r S O m e X0 e V 6 ' ( x 0 ) = (Ug,V 6')(x ) (From Lemma I I I . l . ) < (Ug.V 6) (x Q) (V 6' < V S) g < V (x Q) (the assumption) Lemma IV.1. L e t {y n} be a sequence generated by the p o l i c y improvement a l g o r i t h m . I f y 1 1 converges p o i n t w i s e to y, then U^y 1 1 converges tQ U*y. Proof. In t h i s proof a l l l i m i t s are wit h r e s p e c t to the p o i n t -wise topology. L e t z n = U y n and z = U y f o r each a, n = 1,2,... . a aJ a a By the monotone convergence theorem, \u00C2\u00A3im(z n) (x) = \u00C2\u00A3im(U y 1 1) (x) a 3. n n = Um{c(x,a) + g/^y n (x') q (dx' | x,a)}' n = c(x,a) + 3/^&im y n (x\" ) q (dx'| x,a) n = c(x,a) + B/ ny(x')q(dx'|x,a) = (U y) (x) a = (z ) (x) f o r each a \u00C2\u00A3 A, x e Q. a To show U^y 1 1 ^ U^y i s e q u i v a l e n t t o showing t h a t m i n ( z n ) ( x ) \ (min z )(x) f o r a l l x e fl. a a Since A i s f i n i t e , (min- z n) (x) = m i n ( z n (x)l and (min.z )'(x) = minfz (x)) . a \u00E2\u0080\u0094 a a a a a a / a Le t x e ft be a r b i t r a r y but f i x e d , and d e f i n e a n = z n ( x ) and a = z (x) a a a a which are j u s t numbers. S i n c e (z^)(x) \ (z )(x) p o i n t w i s e , then a a - 37 -a n ^ a f o r each a e A . a a I t remains to show t h a t min a n ^ min a . I t i s c l e a r t h a t n aeA a a a n min a i s monotone d e c r e a s i n g . Since a ^ a i t f o l l o w s t h a t a ^ a a a min a < min a n f o r n = 1,2,... . a \u00E2\u0080\u0094 a a a Hence mm a < \u00C2\u00A3im mm a . a \u00E2\u0080\u0094 a aeA n a To show the other way suppose t h a t a i s the a c t i o n such t h a t Then Th e r e f o r e mm a = a-aeA a mm a = a- = \u00C2\u00A3im a- > \u00C2\u00A3im mm a a a a \u00E2\u0080\u0094 a aeA n n a \u00C2\u00A3im mm a = mm a a a n a a which completes the proof. Theorem IV.4. Suppose t h a t {y n} i s a sequence o f c o s t s generated - 38 -by the p o l i c y improvement a l g o r i t h m . (i) yn converges po i n t w i s e to y e B(fl). ( i i ) y = U*y, i . e . , y i s o p t i m a l . In other words, the p o l i c y improvement a l g o r i t h m converges. Proof. (i) F i r s t o f a l l we s h a l l show t h a t {yn} i s bounded below. By Theorem IV. 1 we have yn>^. U m yn f o r each m = 1,2,... . n 8 N By Theorem I I I . 2 U y -> V a s m - * 0 0 . Th e r e f o r e y _> V . Since the c o s t c(x,a) i s bounded below, i . e . , |c(x,a) | <_ M f o r a l l x, a, |v^ | _< j r ^ - Hence y n ( x ) >_ ~ ~ f o r a l l x. From Theorem IV. 1 y n i s a d e c r e a s i n g sequence. Hence y 1 1 converges p o i n t w i s e . ( i i ) By a c h o i c e of y^ and Theorem IV.1 we know t h a t (IV.1) yn > U y n > U*y n. To show the other way we have (IV.2) y\" = U m n _ 1 y n ~ 1 (By d e f i n i t i o n o f y n ) 6 1 U^n-lY 1 1\" 1 (Ujy 1 Uy, Vy.e B(fi)) = U*y (By d e f i n i t i o n o f 6 ) Then, from (IV.1.), and (IV.2), we o b t a i n - 39 -From the statement (i) y 1 1 \ y and then, from Lemma IV. 1, U*y n -> U.y. , T h e r e f o r e , we must have u*y = y which completes the pro o f . - 40 -Chapter V IMPLEMENTATION OF THE ALGORITHMS FOR MODEL 1 S e c t i o n 1. I n t r o d u c t i o n In t h i s chapter we s h a l l c o n s i d e r i n a more concre t e s e t t i n g the methods of s u c c e s s i v e approximation and of p o l i c y improvement. To show how each method i s a c t u a l l y handled, we assume i n t h i s chapter t h a t X i s the N-dimensional r e a l space ( i . e . , X = R N) and t h a t ft i s a bounded convex p o l y h e d r a l s e t o f R N. L e t c(x,a) = c \u00E2\u0080\u00A2 x, which i s the i n n e r product of two v e c t o r s a N c , x e R , so t h a t (A.II.) h o l d s . Let A = {1,2,...,p}. We r e p e a t (A.III.) a b i t more e x p l i c i t l y than i n Chapter 2. Assumption I I I (A.III.) For each convex p o l y h e d r a l s e t B \u00C2\u00A3 R1 and each a c t i o n a e A , the f u n c t i o n q (B,x) d e f i n e d by q a(B,x) = /x'qtdx' |x,a) B is,p.w. l i n e a r i n x w i t h r e s p e c t to a simple p a r t i t i o n P a(B) ='{Ej(a,B): j = 1,2,..., ma ^ } . c l a. We w r i t e q (B,x) = q.(B) \u00E2\u0080\u00A2 x when x e E. (a,B). - 41 -Remarks V . l . Note t h a t (A.III.) p l a c e s us i n the context of model 1 i n Chapter I I . R e c a l l from the d i s c u s s i o n there and i n model 2 t h a t under ( A . I I I . ) , U V i s p.w. l i n e a r whenever V i s , a and t h a t p a r t i a l l y observable models s a t i s f y ( A . I I I . ) . Suppose t h a t f i s a p.w. l i n e a r f u n c t i o n , l i n e a r on the c e l l s o f the p a r t i t i o n {E^, i = l,2,...,n}, t h a t f(x) = f ^ \u00E2\u0080\u00A2 x on E^, and t h a t E^ = {x: K 1x < b 1 ; L 1 x <_ d 1}, i = l , 2 , . . . , n . Each b 1 arid d 1 i s an N-dimensional v e c t o r and each K 1 and L 1 i s a matrix with N-dimensional rows. T h i s s i t u a t i o n w i l l be denoted by f ^ { ( f ^ K 1, b 1 ; L 1 , d 1) : i = l,2,...,n} and E \u00C2\u00B1 ^ ( K 1 , b 1 ; L 1 , d 1 ) I f 6 i s a simple p o l i c y with r e s p e c t to the p a r t i t i o n {E^, i = 1,2,...,n}, say 6 (x) = a^ f o r x e E^, then we w i l l , r e p r e s e n t 5 by 6 ^ {(a^; K 1,b 1; L ^ d 1 ) : i = l,2,...,n}. Define a operator o by (K,b; L,d) o (K',b'; L',d') = ( K ) ( b )\u00E2\u0080\u00A2 K 1 b (L ) ( d ) - 42 -I f A and B are m a t r i c e s each having the same number of columns then (_.) i s the matrix whose f i r s t rows are those of A and whose a l a t t e r rows are those of B. T h i s operator forms the i n t e r s e c t i o n o f the convex p o l y h e d r a l s e t s c h a r a c t e r i z e d by (K,b; L,d) and (K',b'; L',d'). T h i s r e p r e s e n t a t i o n of p.w. l i n e a r f u n c t i o n s simple p o l i c i e s , and convex p o l y h e d r a l s e t s i s convenient f o r machine storage. We w i l l normally use the same symbol f o r the p.w. l i n e a r f u n c t i o n (convex p o l y h e d r a l s e t , simple p o l i c y , r e s p e c t i v e l y ) and the a r r a y which r e p r e s e n t s i t . The only aspect of t h i s abuse of n o t a t i o n which i s l i k e l y to cause any c o n f u s i o n concerns convex p o l y h e d r a l s e t s . L e t E ^ (K,b; L,d) be a convex p o l y h e d r a l s e t . The s e t E i s empty i f {x: Kx < b; Lx _< d} = (J). The a r r a y E i s empty i f there are no e n t r i e s i n the a r r a y , as when the a r r a y i s i n i t i a l i z e d . The user of e i t h e r of these methods must s p e c i f y the va l u e s q a(B,x) f o r each convex p o l y h e d r a l s e t B and each x e ft. We assume t h a t t h i s s p e c i f i c a t i o n i s pr o v i d e d by a subroutine, c a l l e d Q, which has as i t s arguments an a c t i o n a, matrices K and L, and v e c t o r s b and d. The a r r a y s K, L, b and d s p e c i f y the convex p o l y h e d r a l s e t B = {x: Kx < b, Lx <^ d}. The sub-r o u t i n e Q has as i t s output an a r r a y {(\y K j,b j; L j , d j ) : j = 1,2,...,m} which c h a r a c t e r i z e s the p.w. l i n e a r f u n c t i o n q (B,*). The sub-- 43 -r o u t i n e Q a p p r o p r i a t e f o r model 2 i s d e s c r i b e d i n d e t a i l i n s e c t i o n 6. S e c t i o n s 2 and 3 d e s c r i b e s u b r o u t i n e s UDELTA and USTAR which r e s p e c t i v e l y compute U^V f o r a given 8 and V, and compute U*V f o r a given V. S e c t i o n s 4 and 5 d e s c r i b e implementations of the methods of s u c c e s s i v e approximation and of p o l i c y improvement. - 44 -S e c t i o n 2. Subroutine UDELTA (6,V,U gV) The i n p u t s to t h i s s ubroutine are a simple p o l i c y 6 which takes the value 6(x) = a^ f o r x e E^, i = l , 2 , . . . , n , and a p.w. l i n e a r f u n c t i o n V which takes the values V(x) = \u00E2\u0080\u00A2 x f o r x e F j , j = 1,2,...,m. L e t P = {E^: i = 1,2,...,n} and PTT = {F . : j = 1,2,...,m}. We l e t V j E i ^ { ( K 1 3 , b 1 3 ; L 1 D , a3-3) , j = 1,2, . . . fn\u00C2\u00B1} and F_. ^ { ( K j k , b j k ; L j k , d j k ) , k = l,2,...,m.}, We a l s o assume t h a t the v e c t o r s c , a = 1,2,...,p, and the d i s c o u n t f a c t o r 3 are a v a i l a b l e i n common. The subroutine outputs the p.w. l i n e a r f u n c t i o n U^V and i s based on the f o l l o w i n g computation. (U 6V) (x) = c(x,6(x)) + 3/ f iV(x')q(dx' |x,5 (x) ) = c \u00C2\u00B0 l X ) -x + 3 I V./ x'q(dx'|x,6(x)) j = l 3 F. m a, f o r -x e E , = C 6 ( X ) \u00E2\u0080\u00A2 x + 3 I V.q- r ( F ,x) j =1 3 Then, u s i n g the n o t a t i o n o f ( A . I I I . ) , - 45 -(U 5V) (x) = [c r + 3 I V L j ] \u00E2\u0080\u00A2 x j = l 3 3*. f o r x e E r n where G ^ i s the \u00C2\u00A3-th c e l l o f the p a r t i t i o n a P r ( F _ . ) . Note t h a t the index I depends on j . Set 1 = 0 . I w i l l count the number of c e l l s i n the p a r t i t i o n f o r U^V. For j = l , 2 , . . . , n c a l l Q(F^,a^), which w i l l r e t u r n w i t h an a r r a y c h a r a c t e r i z i n g the p.w. l i n e a r a r f u n c t i o n q ( F ^ , \u00E2\u0080\u00A2 ) , say q 3 r ( F \u00E2\u0080\u00A2) ^ {{\\; K J S b j \- L j \ d j \u00C2\u00A3 ) , Then f o r j = l,2,...,m and I = 1,2,...,t do the f o l l o w i n g . For r = 1,2, ... ,n form E r n G where G ^ ( K j \u00C2\u00A3 , h 1 % ; iP^d )^ . I f E r n G i s empty, then do the next r . I f E ^ G ^ cj) then r m increment I by 1 and s t o r e E n c as E' Compute c (a ) + 8 I v-^o J=l J and s t o r e as a . The subroutine i s now completed and (U^V) (x) = \u00E2\u0080\u00A2 x f o r x e E ^ , i = l , 2 , . . . , I . I t r e t u r n s with the a r r a y U gV ^ ( i , (ou; E|) , i = 1,2,...,!} as output. - 47 -S e c t i o n 3. Subroutine USTAR (V,U VV,6) Suppose t h a t V i s p.w. l i n e a r with r e s p e c t to a simple p a r t i t i o n ' { E ^ , i = 1,2,...,n}. The subroutine USTAR computes U*V and f i n d s a simple p o l i c y 6 such t h a t U\u00E2\u0080\u009EV = U.V. o The argument of USTAR i s a p.w. l i n e a r f u n c t i o n V % {(V i, E \u00C2\u00B1 ) : i = 1,2,...,n}. An a r r a y d e s c r i b i n g the convex p o l y h e d r a l s e t fl, the d i s c o u n t f a c t o r g and the v e c t o r s c , a = 1,2,...,p should be a v a i l a b l e i n common. The subroutine outputs I and the a r r a y (U*V,6) ^ {(a., E!, a . ) : i = 1,2,...,I}. The f u n c t i o n U +V i s obtained by (U AV) (x) = \u00E2\u0080\u00A2 x f o r x e E^. The p o l i c y 6 d e f i n e d by 6 (x) = a.^ f o r x e E!, i = 1 , 2 I , s a t i s f i e s U.V = U^V. 1 o The paragraph summarizes the procedure i n USTAR. The subro u t i n e f i r s t computes U^V f o r a \u00C2\u00A3 A u s i n g UDELTA. L e t P a be the simple p a r t i t i o n f o r U V. USTAR next forms the product a p a r t i t i o n P = II P a. Then P i s f i n e r than each P a, and so aeA each U V i s p.w. l i n e a r with r e s p e c t to P. For each F \u00C2\u00A3 P and a a e A , there i s some v e c t o r a a such t h a t F (U V) (x) = a\u00E2\u0080\u0094 \u00E2\u0080\u00A2 x f o r x \u00C2\u00A3 F. a F For each F E P, d e f i n e the s e t s G p, b e A, by Gp = {x: oipX < a a x , a = 1, 2 , . . . , b-1 and a ^ x \u00C2\u00A3 a a x , a = b+1, . . . ,p.} . Then ^G3,: a e A} = P F i s a p a r t i t i o n of F and P = II i s a * Fe.P p a r t i t i o n o f ft with the p r o p e r t y t h a t (U*V) (x) = a a \u00E2\u0080\u00A2 x i f x e G a e P. The p o l i c y 6 d e f i n e d by <5 (x) = a f o r x e G_ f( e P) s a t i s f i e s u gv = U^V. We now c o n s i d e r the subroutine i n more d e t a i l . For each a e A , c a l l UDELTA with the arguments V >^ {(V^E.^): i = l , 2 , . . . , n Si c l and 6 ^ {(a,ft)}. T h i s generates the a r r a y s {(a.., D ( j ) ) , j = 1,2., ...,m }. R e c a l l t h a t each of the convex p o l y h e d r a l s e t s Si E^, D a ( j ) , and ft are themselves a r r a y s of the form {(K 1, b 1 ; L 1 , d 1 ) : i = l,2,...,m}. The index I w i l l count the c e l l s i n the p a r t i t i o n f o r Ua.V. Set 1 = 0. Le t R be the s e t of a l l p-dimension v e c t o r s w i t h the i - t h component, r ^ , between 1 and m^ f o r i = 1 , 2 , . . . > p. S y s t e m a t i c a l l y c o n s t r u c t each r e R i n t u r n . Compute the s e t F ^ o D a ( r a ) . The s e t F i s a c e l l o f the product p a r t i t i o n c l = l / 2 f \u00C2\u00AB \u00C2\u00AB * f ] p p a P = H P . I f F i s empty, then c o n s t r u c t the next r e R. a=l Otherwise, f o r each b e A c o n s t r u c t the s e t Gp = F o (K, 0; L, 0 ) b a where K i s a (b-l) x N matrix with rows a - a , a = 1 , 2 , . . . , b -b r b a a and L i s a (p-b) x N matrix w i t h rows a - a , a = b+l,...,p. b a I f G F i s empty, then c o n s t r u c t the s e t G f o r the next b e A. b fob I f G_ 7^ cj), then increment I by 1 and s t o r e a T = a , E l = G , F I r b I F - 49 -and a. = b. When each r e R has been c o n s i d e r e d , the subroutine r e t u r n s . - 50 -( S T A R T a * 1 CALL UDBLTA (a,Q.rl,V0, *, I, (*k,Ek, k*M,~,l) I = 0 - 51 -S e c t i o n 4. Suc c e s s i v e Approximation The user o f the r o u t i n e must supply a d i s c o u n t f a c t o r 8, an o p t i m a l i t y t o l e r a n c e e > 0, a s p e c i f i c a t i o n of the bounded convex p o l y h e d r a l s e t Q, the c o s t v e c t o r s c , and the subroutine Q. I f the user does not supply an i n i t i a l p.w. l i n e a r value f u n c t i o n V , then the r o u t i n e s t a r t s wi th V V { (0,J2) }. As d e s c r i b e d i n Remark IV.2, i f the method of s u c c e s s i v e approximation i t e r a t e s U* u n t i l I u j v \u00C2\u00B0 - u J _ 1 V 0 l l <_ (1 - 8)e'/3 where e' = (1 - 8)e/(26) then the p o l i c y 5 such t h a t U gV n = U*V n i s e-optimal. L e t V n = U*V\u00C2\u00B0. To determine I U*V n - V nU r e q u i r e s a f a i r amount of computation. However, t h i s norm o n l y needs to be computed once by Theorem I I I . 2 . , s i n c e e 1 - o p t i m a l i t y o f the c o s t f u n c t i o n must be achieved w i t h no more than 1 + INT(\u00C2\u00A3) (1-8)e' i t e r a t i o n s , where \u00C2\u00A3 = l o g ( - \u00E2\u0080\u0094 ^ \u00E2\u0080\u0094 \u00E2\u0080\u0094 t r \u00E2\u0080\u0094 ) / l o g Q. However, i t i s l i k e l y V -V t h a t e u o p t i m a l i t y w i l l be achieved i n fewer than 1 + INT ( 5 ) i t e r a t i o n s . So we compromi .e with the f o l l o w i n g procedure which checks f o r e ' - o p t i m a l i t y a t about h a l f of the maximum number of i t e r a t i o n s . Compute HV1 - V\u00C2\u00B0H. L e t J = 1 + INT(\u00C2\u00A3/2). Then check f o r e ' - o p t i m a l i t y a t i t e r a t i o n J + 1. I f , a t t h a t p o i n t , e 1 - o p t i m a l i t y has not been achieved, recompute J u s i n g llv^+^\" - V^U i n p l a c e o f IIV\"*\" - V^H. Check e ' - o p t i m a l i t y next a f t e r J i t e r a t i o n s and continue w i t h t h i s procedure. 52 -START^) Re S - 56 -CALL USJAR ({(**,En), k*\.z.--l),l, 14 - Ak \u00C2\u00A3 Z. UDELTA (6, n,\,Erf, k-U - 57 -S e c t i o n 6. Subroutine Q(B, a, V) f o r Model 2. The i n p u t s to t h i s subroutine are an a c t i o n a e A and a convex p o l y h e d r a l s e t B \u00C2\u00A3 ft repr e s e n t e d by the a r r a y . B = (K, b; L, d}, where (K,b) has m rows and (L,d) has r rows. The s u b r o u t i n e has a v a i l a b l e as i t s data, the a r r a y s {Y^ \" Q : j = 1,2,...,N; 0 = 1,2,...,q; a = 1,2,. . . , p} and ( P ^ j ; i = 1,2,...,N; j = 1,2,...,N, and a = 1,2,...,p}. The a r r a y V = ( i , (X3 ; L3,b3 ; K ^ d 3 ) , j = l , 2 , . . . , l } i s the sub-r o u t i n e output. The a r r a y V c h a r a c t e r i z e s the p.w. l i n e a r a. a. ~i v e c t o r - v a l u e d f u n c t i o n q (B,*) by q (B,x) = A J \u00E2\u0080\u00A2 x f o r x s a t i s -f y i n g L-'x < b3 and K-'x <_ d3. Note t h a t X3 i s a matrix. The subroutine i s based on Lemma II . 2 . , and the compu-t a t i o n p r e c e d i n g the Lemma showing t h a t q a(B,x) = I P a(0) \u00E2\u0080\u00A2 x . 0e$ a(B,x) In t h i s s u b routine, the equation convention f o r d e s c r i b i n g convex p o l y h e d r a l s e t s w i l l be m o d i f i e d s l i g h t l y . Each convex p o l y h e d r a l s e t E c o n s i d e r e d w i l l always be a subset of ft, and hence x e E N w i l l always s a t i s f y \u00C2\u00A3x^ = 1. T h i s e q u a l i t y w i l l always be i m p l i c i t i n any d e s c r i p t i o n o f a convex p o l y h e d r a l s e t , even i f i t i s not e x p l i c i t l y i n c l u d e d i n the l i s t o f i n e q u a l i t i e s . With t h i s convention the s e t B i s re p r e s e n t e d by the a r r a y (K,0_; L;0) where = _. - b^ and L^_. = L^.. - d^ f o r each i and j . The f i r s t time the subroutine i s c a l l e d the mat r i c e s P (0)/ a = 1,2,...,p, 0 = 1,2,...,q, must be computed. R e c a l l from S e c t i o n 3 t h a t p ^ j ( 0 ) = P j i Y j 0 \u00E2\u0080\u00A2 Although the mat r i c e s P a(0) - 58 -c o u l d be i n p u t d i r e c t l y , the q u a n t i t i e s P A ^ and Yjg are more n a t u r a l from the user's p o i n t o f view. Next compute K(6)= K P a ( 9 ) and L ( 9 ) = L P a ( 0 ) f o r each 0 e (H) and s e t E Q = ' { K ( 6 ) , 0 ; L ( 0 ) , 0 } . The a r r a y E charac-Ba t e r i z e s the s e t E\u00E2\u0080\u009E i n Lemma II.2. The index I w i l l count the c e l l s i n the p a r t i t i o n o f V. Set 1 = 0 . L e t J step from 1 through 2^. L e t be the i - t h d i g i t of J i n i t s b i n a r y r e p r e s e n t a t i o n , i . e . , J = f J.2*~, J . e {0,1}. i = l 1 1 Each J r e p r e s e n t s a subset ty of (5) by 9 e ty i f and o n l y i f J f l = 1. Form the a r r a y F = o E.. I f F i s empty, then look 0 {i:J.=l} 1 1 q a a t the next J . Otherwise c a l c u l a t e the matrix R= \u00C2\u00A3,P ( i ) \u00E2\u0080\u00A2 J . . i = i 1 B a. The a r r a y F corresponds to the s e t n E f l' i n Lemma II.2. The ' Ba c \" 9 e ^ se t n (Eg ) i s a union o f convex p o l y h e d r a l s e t s , which we 9 e ^ C e 0 now f i n d . L e t the v e c t o r s k^ _, t = l,2,...,m and l^, t = l , 2 , . . . , r / \ ^ be the rows of K ( 9 ) and L ( 9 ) , r e s p e c t i v e l y . Define pB,a_ E 9 , t \" {x: JTx>0, \u00C2\u00A3:x<0, j = l , 2 , . . . , t-1} l\u00C2\u00A3t\u00C2\u00A3r t D [{x: L(9)x<_0, k x>0, and k.x<0, j = l , 2 , . . . , t-r-1} r7 - 60 -I - 0 7 - 0 r Ir \u00E2\u0080\u00A2 x = 0 uj * 0 f~ 0 (Here p ;S cn arr*J/) t = 0 t ~ t + 1 Comment Whether or not the s e t F = ty i s determined by s o l v i n g a Phase I l i n e a r programme. Since when i i s incremented the onlychange i n the l i n e a r programmes i s to add c o n s t r a i n t s , the L.P. should be s t a r t e d from the pre v i o u s o p t i m a l t a b l e a u and the dual simplex a l g o r i t h m used. S i m i l a r arguments apply t o the f o l l o w i n g loop where G = ty i s t e s t e d . - 61 -A ' - K t = o BIBLIOGRAPHY A o k i , M., O p t i m i z a t i o n o f S t o c h a s t i c Systems, Academic Press, New York, 1967. Astrom, K. J . , I n t r o d u c t i o n to S t o c h a s t i c C o n t r o l Theory, Academic Press, New York, 1970. Astrom, K. J . , Optimal C o n t r o l of Markov Processes with Incomplete State Information, J o u r n a l of Mathematical A n a l y s i s and A p p l i c a t i o n s , 10(1965), pp. 174-205. Bather, ., Optimal D e c i s i o n Procedures f o r F i n i t e Markov Chains, P a r t I: Examples, Advances of A p p l i e d Proba- b i l i t y , 5(1973), pp. 328-339. Bellman, R., I n t r o d u c t i o n to the Mathematical Theory o f C o n t r o l Processes, Academic Press, New York, 1971. B l a c k w e l l , D., D i s c r e t e Dynamic Programming, Annals o f Mathematical S t a t i s t i c s , 33(1962), pp. 719-726. B l a c k w e l l , D., Discounted Dynamic Programming, Annals o f Mathematical S t a t i s t i c s , 36(1965), pp. 226-235. Brumelle, S.L. and Puterman, M.L., On the Convergence o f Newton's Method f o r Operators with Supports, ORC76-12, U n i v e r s i t y o f C a l i f o r n i a , B erkeley, May 1976. Chitgopekar, S., Continuous Time Markovian S e q u e n t i a l C o n t r o l Processes, SIAM J o u r n a l on\" C o n t r o l , 7 (1969), pp. 367-389. Denardo, E., C o n t r a c t i o n Mapping i n the Theory U n d e r l y i n g Dynamic Programming, SIAM Review, 9(1967), pp. 165-177. - 6 3 -11. Denardo, E., Markov Renewal Programs w i t h Small I n t e r e s t Rates, Annals Mathematical S t a t i s t i c s , 42(1971), pp. 477-496. 12. Denardo, E., and M i t t e n , L., Elements of S e q u e n t i a l D e c i s i o n Processes, J o u r n a l of I n d u s t r i a l E n g i n e e r i n g , SVIII(1967), pp. 106-112. 13. Davis, M.H.A., and V a r a i y a , P., Dynamic Programming Condi-t i o n s f o r P a r t i a l l y Observable S t o c h a s t i c Systems, SIAMS J o u r n a l on C o n t r o l , 11(1973), pp. 226-261. 14. Derman, C., On S e q u e n t i a l D e c i s i o n s and Markov Chains, Management Science, 9(1962), pp. 16-24. 15. Derman, C , Denumerable State Markovian D e c i s i o n Process -Average Cost C r i t e r i o n , Annals of Mathematical S t a t i s t i c s , 37(1966), pp. 1545-1553. 16. Doshi, B., Continuous Time C o n t r o l of Markov P r o c e s s i o n on A r b i t r a r y State Space: Discounted Rewards, The Annals \u00E2\u0080\u00A2n of Mathematical S t a t i s t i c s , 4(1976), pp. 1219-1235. 17. Dynkin, E.B., C o n t r o l l e d Random Sequences, Theory of P r o b a b i l i t y and i t s A p p l i c a t i o n s , X(1965), pp. 1-14. 18. F e l l e r , W., An I n t r o d u c t i o n of P r o b a b i l i t y Theory and I t s A p p l i c a t i o n s , Volume 2, Wiley, New York. 1966. 19. Fox, B.L., Markov Renewal Programming by L i n e a r F r a c t i o n a l Programming, SIAM J o u r n a l of A p p l i e d Mathematics, 14(1966), pp. 1418-1432. 20. Fox, B.L., E x i s t e n c e of S t a t i o n a r y Optimal P o l i c i e s f o r Some Markov Renewal Programs, STAM Review, 9(1967), pp. 665-670. - 64 -21. Fox, B.L., F i n i t e - S t a t e Approximation to Denumerable State Dynamic Programs, J o u r n a l of Mathematical A n a l y s i s arid A p p l i c a t i o n s , 34(1971), pp. 665-670. 22. Furukawa, N., Markovian D e c i s i o n Processes with Compact A c t i o n Spaces, Annals Of Mathematical S t a t i s t i c s , 43(1972), pp. 1612-1622. 23. H a r r i s o n , J.M., D i s c r e t e Dynamic Programming with Unbounded Rewards, Annals of Mathematical S t a t i s t i c s , 43(1972), pp. 636-644. 24. Haussmann, U.G., On the Optimal Long-Run C o n t r o l of Markov Renewal Processes, J o u r n a l of Mathematical A n a l y s i s and A p p l i c a t i o n s , 36(1971), pp. 123-140. 25. Hockstra, D., P a r t i a l l y Observable Markov D e c i s i o n Processes w i t h A p p l i c a t i o n s , T e c h n i c a l Report No. 156, Department of Operations Research, S t a n f o r d U n i v e r s i t y , S t a n f o r d , September, 19 73. 26. Howard, R.A., Dynamic Programming and Markov Processes, Wiley, New York, 196 0. 27. J e w e l l , W., Markov Renewal Programming: I I I n f i n i t e Return Models, Operations Research, 11(1963), pp. 938-971. 28. Kakumanu, P., Nondiscounted Continuous Time Markovian D e c i s i o n Process w i t h Countable State Space, SIAM J o u r n a l on C o n t r o l , 10(1972), pp. 210-220. 29. Kakumanu, P., C o n t i n u o u s l y Discounted Markov D e c i s i o n Model with Countable S t a t e and A c t i o n Space, Annals of Mathematical S t a t i s t i c s , 42(1971), pp. 919-926. - 65 -30. Kashyap, R.L., O p t i m i z a t i o n o f S t o c h a s t i c F i n i t e S t a t e S y s tems, I E E E , T r a n s a c t i o n s on A u t o m a t i c C o n t r o l , A C - 1 K 1 9 6 0 ) , pp. 685-692. 31. K r y l o v , N.V., C o n s t r u c t i o n o f an O p t i m a l S t r a t e g y f o r a F i n i t e C o n t r o l l e d C h a i n , T h e o r y o f P r o b a b i l i t y and A p p l i c a t i o n s , 1 0 ( 1 9 6 5 ) , pp. 45-54. 32. K u s h n e r , H.J., I n t r o d u c t i o n t o S t o c h a s t i c C o n t r o l , H o l t , R i n e h a r t and W i n s t o n , New Y o r k , 1971. 33. Lembersky, M.R., On M a x i m a l Rewards and e - O p t i m a l P o l i c i e s i n C o n t i n u o u s Time Markov D e c i s i o n C h a i n s , A n n a l s o f S t a t i s t i c s , 2 (1974), 4pp. 159-169. 34. Lembersky, M.R., P r e f e r r e d R u l e s i n C o n t i n u o u s Time Markov D e c i s i o n P r o c e s s e s , Management S c i e n c e , 2 1 ( 1 9 7 4 ) , pp. 348-357. 35. L i , Yu-ku, I n f o r m a t i o n S t r u c t u r e and O p t i m a l P o l i c y , Computer and I n f o r m a t i o n S c i e n c e R e s e a r c h C e n t r e , The O h i o S t a t e U n i v e r s i t y , O h i o , September, 1970. 36. Lippman, S.A., Semi-Markov D e c i s i o n P r o c e s s e s w i t h Unbounded Rewards, Management S c i e n c e , 1 9 ( 1 9 7 3 ) , pp. 717-731. 37. Lippman, S.A., M a x i m a l A v e r a g e - Reward P o l i c i e s f o r Semi-Markov D e c i s i o n P r o c e s s e s w i t h A r b i t r a r y S t a t e and A c t i o n S p a c e , A n n a l s o f M a t h e m a t i c a l S t a t i s t i c s , 4 2 ( 1 9 7 1 ) , pp. 1717-1726. 38. L u s t e r n i k , L.A. and S o b o l e , V . J . , E l e m e n t o f F u n c t i o n a l A n a l y s i s , H i n d u s t a n P u b l i s h i n g Company, D e l h i , 1961. 39. M a i t r a , A., D i s c o u n t e d DynamicProgramming on Compact M e t r i c S p a c e s , Sankhya, 3 0 A ( 1 9 6 8 ) , pp. 211-216. - 66 -40. M a r t i n - L o f , A., Optimal C o n t r o l of a Continuous-Time Markov Chain with P e r i o d i c T r a n s i t i o n P r o b a b i l i t i e s , Operations Research, 15(1967), pp. 872-881. 41. M i l l e r , B.L., F i n i t e S t ate Continuous Time Markov D e c i s i o n Processes w i t h a F i n i t e P l a n n i n g Horizon, SIAM J o u r n a l on C o n t r o l , 6(1968), pp. 266-280. 42. M i l l e r , B.L., F i n i t e S t a t e Continuous Time Markov D e c i s i o n Processes with an I n f i n i t e P l a n n i n g Horizon, J o u r n a l of Mathematical A n a l y s i s , 22(1968), pp. 552-569. 43. M i l l e r , B.L., and V e i n o t t , A. J r . , D i s c r e t e Dynamic Programming wi t h a Small I n t e r e s t Rate, Annals of Mathematical S t a t i s t i c s , 40(1969), pp. 366-370. 44. Mine, H. and Tabata, Y., On a Set of Optimal P o l i c i e s i n Continuous Time Markovian D e c i s i o n Processes, J o u r n a l of Mathematical A n a l y s i s and A p p l i c a t i o n s , 34(1971), pp. 5 3-66. 45. Mine, H. , and Tabata, Y.,. A New Opfeimal-ity C r i t e r i o n f o r D i s c r e t e Dynamic Programming, J o u r n a l of Mathematical A n a l y s i s and A p p l i c a t i o n s , 37(1972), pp. 118-129. 46. M i t t e n , L.G., \"Composition P r i n c i p l e s f o r S y n t h e s i s of Optimal M u l t i - s t a g e Processes,\" J o u r n a l o f Operations Research, 12(1964), pp. 610-619. 47. Morton, R., Optimal C o n t r o l of S t a t i o n a r y Markov Processes, Advances of A p p l i e d P r o b a b i l i t y , 5(1973), pp. 18-19. 48. Osaki, S. and Mine, H., L i n e a r Programming Algorithms f o r Semi-Markovian D e c i s i o n Processes, J o u r n a l of Mathe- m a t i c a l A n a l y s i s A p p l i c a t i o n s , 22(1968), pp. 356-381. - 67 -49. Puterman, M.L. and Brumelle, S.L., On the Convergence of P o l i c y I t e r a t i o n i n S t a t i o n a r y Dynamic Programming, Working Paper No. 3 92, F a c u l t y of Commerce, U.B.C., June 1976. 50. Raviv, D e c i s i o n Making i n Incompletely known S t o c h a s t i c Systems, I n t e r n a t i o n a l J o u r n a l o f E n g i n e e r i n g Science, 3 (1965), pp. 119-140. 51. R i s h e l , R., Necessary and S u f f i c i e n t Dynamic Programming C o n d i t i o n s f o r Continuous Time S t o c h a s t i c Optimal C o n t r o l , SIAM J o u r n a l on C o n t r o l , 8(1970), pp. 559-571. 52. Rolph, J.E. and Strauch, R., A Countable P o l i c y Set f o r S e q u e n t i a l D e c i s i o n Problems, Annals of Mathematical S t a t i s t i c s , 43(1972), pp. 2078-2082. 53. Ross, S., A r b i t r a r y State Markovian D e c i s i o n Processes, Annals of Mathematical S t a t i s t i c s , 39(1968), pp. 2118-2122. 54. Ross, S., A p p l i e d P r o b a b i l i t y Models wi t h O p t i m i z a t i o n A p p l i c a t i o n s , Holden-Day, San F r a n c i s c o , 1970. 55. Ross, S., Average Cost Semi-Markov D e c i s i o n Processes, J o u r n a l of A p p l i e d P r o b a b i l i t y , Volume 7 (1970), pp. 649-656. 56. Rudemo, M., Doubly S t o c h a s t i c Poisson Processes and Process C o n t r o l , Advances Of A p p l i e d P r o b a b i l i t y , 4(1972), pp. 318-338. 57. Sage and Melsa, System I d e n t i f i c a t i o n , Academic Process, New York, 1971. - 68 -58. Sawaki, K. , and Ichikawa, A., An A l g o r i t h m f o r P a r t i a l l y Observable Markov D e c i s i o n Processes Over I n f i n i t e H o r i z on. B u l l e t i n o f Operations Research S o c i e t y o f Japan. 1976. September. 59; Schweitzer, P.L., I t e r a t i v e S o l u t i o n of the F u n c t i o n a l Equations of Undiscounted Markov Renewal Programming, J o u r n a l of Mathematical A n a l y s i s and A p p l i c a t i o n s , 39(1971), pp. 495-501. 60. Smallwood, R.D., and Sondik, E . J . , The Optimal C o n t r o l of P a r t i a l l y Observable Markov Processes Over a F i n i t e Horizon, Operations Research, 21(1973), pp. 1071-1088. 61. Sondik, E., The Optimal C o n t r o l of P a r t i a l l y Observable Markov Processes Over the I n f i n i t e H o r i zon: Discounted Costs, Department of E n g i n e e r i n g Economic Systems, S t a n f o r d U n i v e r s i t y . S t a n f o r d , June 1971. (Forthcoming i n Operations Research). 62. Stone, L., Necessary and S u f f i c i e n t C o n d i t i o n s f o r Optimal C o n t r o l of Semi-Markov Jump Processes, SIAM J o u r n a l on C o n t r o l , 11(1973), pp. 187-201. 63. Strauch, R.E., Negative Dynamic Programming, Annals of Mathematical S t a t i s t i c s , 37(1966), pp. 871-890. 64. V e i n o t t , A., D i s c r e t e Dynamic Programming with S e n s i t i v e Discount O p t i m a l i t y C r i t e r i a , Annals of Mathematical S t a t i s t i c s , 40(1969), pp. 1635-1660. 65. Wald,A;, S t a t i s t i c a l D e c i s i o n - F u n c t i o n s , John Wiley & Sons, New York, 1950. - 69 -66. Walras, L.> Elements d'economie p o l i t i q u e pure, L. Corbaz, Lausanne, 1874. 67. White, C.C., Procedures f o r the S o l u t i o n of a F i n i t e H orizon, P a r t i a l l y Observable Semi-Markov O p t i m i z a t i o n Problem, Operations Research, 24(1976), pp. 348-358. "@en . "Thesis/Dissertation"@en . "10.14288/1.0094312"@en . "eng"@en . "Business Administration"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use."@en . "Graduate"@en . "Piecewise linear Markov decision processes with an application to partially observable Markov models"@en . "Text"@en . "http://hdl.handle.net/2429/20920"@en .