Multi-period Stochastic Programming by Horand Ingo Gassmann M.S., Oregon State University, 1979 A Thesis Submitted in Par t ia l Fulfillment of the Requirements for the Degree of D O C T O R O F P H I L O S O P H Y in The Faculty of Graduate Studies Faculty of Commerce and Business Adminis t ra t ion We accept this thesis as conforming to the required standard The University of Br i t i sh Columbia •InniKirv 1987 © Horand Ingo Gassmann, 1987 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Commerce The University of British Columbia 1956 Main Mall Vancouver, Canada V6T 1Y3 D a t e 20 April 1987 D E - 6 ( 3 / 8 1 ) Abstract This dissertation presents various aspects of the solution of the linear mult i -per iod stochastic programming problem. Under relatively mi ld assumptions on the structure of the random variables present i n the problem, the value function at every time stage is shown to be joint ly convex in the history of the process, namely the random variables observed so far as well as the decisions taken up to that point. Convexity enables the construction of both upper and lower bounds on the value of the entire problem by suitable discretization of the random variables. These bounds are developed i n Chapter 2, where it is also demonstrated how the bounds can be made arbi trar i ly sharp i f the discretizations are chosen sufficiently fine. The chapter emphasizes computabil i ty of the bounds, but does not concern itself wi th finding the discretizations themselves. The practise commonly followed to obtain a discretization of a random variable is to par t i t ion its support, usually into rectangular subsets. In order to apply the bounds of Chapter 2, one needs to determine the probabil i ty mass and weighted centroid for each element of the part i t ion. This is a hard problem in itself, since i n the continuous case it amounts to a multi-dimensional integration. Chapter 3 describes some Monte -Car lo techniques which can be used for normal distributions. These methods require random sampling, and the two main issues addressed are efficiency and accuracy. It turns out that the opt imal method to use depends somewhat on the probabil i ty mass of the set i n question. Having obtained a suitable discretization, one can then solve the resulting large scale linear program which approximates the original problem. Its constraint mat r ix is highly structured, and Chapter 4 describes one algorithm which attempts to utilize this struc-ture. The algori thm uses the Dantzig-Wolfe decomposition principle, nesting decomposi-tion levels one inside the other. M a n y of the subproblems generated i n the course of this decomposition share the same constraint matrices and can thus be solved simultaneously. Numerical results show that the algori thm may out-perform a linear programming package i i on some simple problems. Chapter 5, finally, combines al l these ideas and applies them to a problem i n forest management. Here it is required to find logging levels in each of several time periods to maximize the expected revenue, computed as the volume cut times an appropriate discount factor. Uncertainty enters into the model i n the form of the risk of forest fires and other environmental hazards, which may destroy a fraction of the existing forest. Several discretizations are used to formulate both upper and lower bound approximations to the original problem. i i i Table of Contents Abstract ii List of Tables • v i List of Figures vii 1. Convexity Results : • • 1 1.1. Introduction 1 1.2. Statement of the problem ; 3 1.3. Convexity of the value function 5 2. Bounds on the expectation of a convex function 11 2.1. Upper bounds on compact sets 11 2.2. Improving the bounds by part i t ioning 17 2.3. A Numerical Example 20 2.4. Generalization to Unbounded Domains : . 21 2.5. Bounds for the multi-stage stochastic program 24 3. Condit ional Probabilities 27 3.1. Introduction 27 3.2. Mul t ivar ia te normal distribution: Simple Monte-Car lo 28 3.3. Deak's Method 31 3.4. Szantai's Procedure ; 32 3.5. A H y b r i d Method 35 3.6. Numerical Results 37 3.7. Condi t ional Expectat ion 41 3.8. Extensions 42 4. A n A l g o r i t h m 48 4.1. Introduction 48 4.2. Decomposit ion and the recourse problem 51 iv 4.3. Nested decomposition 53 4.4. Tr ickl ing down L P problems with multiple right hand sides 56 4.5. Storing Inverses • 60 4.6. Numerical results 62 4.7. Further possible improvements 65 5. A Forestry Applicat ion 69 5.1. Introduction 69 5.2. Determining the transition matr ix 70 5.3. Present value of standing timber 74 5.4. Formulat ion of the problem 76 5.5. Numerical results 78 5.6. Conclusion 83 Bibliography • • 84 v L i s t o f T a b l e s Table I. Results from a sample of size 100 38 Table II. Results from a sample of size 1000 39 Table III. Characteristics of the test problems 63 Table I V . The Two-period Problems 63 Table V . The Three-period Problems 64 Table V I . Annua l fire rates and their cumulative probabili ty distr ibution 71 Table V I I . Discretizations used and their probabilities 73 Table VI I I . Age specific data for the problem 76 Table I X . Results for various scenarios 79 Table X . Results for upper bound problems 80 Table X I . Some seven-period formulations 80 Table X I I . Upper bound problems wi th soft constraints 82 Table X I I I . Lower bound problems with soft constraints 82 vi List of Figures Figure 1. Madansky 's Upper Bound in One Dimension 13 Figure 2. The new bound when m — 2 16 Figure 3. Deak's method in Two Dimensions 31 Figure 4. A sample decision tree 49 Figure 5. Op t ima l basis decomposition i n IR 3 59 Figure 6. Loose cuts and tight cuts 65 Figure 7. Histogram of simulated fire rates 72 vi i Chapter 1. Convexity Results 1.1. Introduction "Real life" opt imizat ion problems are often complicated by the presence of uncertainty. The common cause of this uncertainty stems from the fact that decisions have to be made now for a process which wi l l only be observed i n the future. As a simple example consider the following. The production manager of a plant has to make a production decision today to optimize some objective criterion, but the finished product may only be available some time later, i n a month, let us say. One of the considerations which the manager has to attend to is to satisfy customers' demands for his/her product. But what wi l l that demand be? Short of requiring fixed orders one month i n advance, preferably wi th non-refundable prepayment — to be sure to be sure — there seems to be no way to find out. But such stringent order policies are unenforceable and would only serve to cloud the relations between the firm and its customers. O n the other hand, probabilistic statements about the demand are often possible, based on observed quantities in the past. This information can be incorporated into the formulation of the optimization problem, which thereby is turned into a stochastic pro-gramming problem, and the random elements influence the opt imal decision. Immediately there is a problem: Wha t do we mean by minimizing (or maximizing) a function subject to random variations? Al though other possibilities do exist, the usual approach is to minimize the expected value of the objective function. This makes for difficulties of its own, since finding function values — and derivatives, if they exist — now involves an integration, possibly mul t i -dimensional. The computational difficulties, in fact, are so great that a direct solution is often impossible if the random variables have a continuous distr ibution, and discretization methods have to be used. The problem is compounded if the planning horizon extends over several time periods, 1 as may be necessary, e.g., i n the opt imal management of a renewable resource such as t imber. For the first few periods good estimates of the random variables may be available, but as one gazes further and further into the future, the uncertainty increases. Yet such problems have to be analyzed, and algorithms for their solution need to be developed. This thesis presents various aspects of the solution of the linear mult i -period stochastic programming problem. The problem is introduced formally in Section 2 and important convexity results are stated and proven. Under relatively mi ld assumptions on the structure of the random variables, the value function of the problem at every time stage, that is, the value of the objective function that can be achieved by using the opt imal decisions from that point on, is joint ly convex in the history of the process, namely the random variables observed so far, as well as the decisions taken up to that point. Convexity enables the construction of both upper and lower bounds on the value of the entire problem by suitable discretization of the random variables. These bounds are developed i n Chapter 2 , where it is also demonstrated how the bounds can be made arbi trar i ly sharp i f the discretizations are chosen sufficiently fine. The chapter emphasizes computabil i ty of the bounds, but does not concern itself wi th finding the discretizations themselves. The practise commonly followed to obtain a discretization of a random variable is to par t i t ion its support, usually into rectangular subsets. In order to apply the bounds of Chapter 2 , one needs to determine the probabili ty mass and weighted centroid for each element of the part i t ion. This is a hard problem in itself, since i n the continuous case it amounts once more to a multi-dimensional integration. Chapter 3 describes some Monte-Car lo techniques which can be used for normal distributions. These methods require random sampling, and the two main issues addressed are efficiency and accuracy. It w i l l turn out that the opt imal method to use depends somewhat on the probabil i ty mass of the set i n question. Having obtained a suitable discretization, one can then solve the resulting large scale linear program which approximates the original problem. Its constraint matr ix is highly 2 structured, and Chapter 4 describes one algori thm which attempts to utilize this structure. The algori thm uses the Dantzig-Wolfe decomposition principle, and the chapter shows how various decomposition levels can be nested one inside the other. Another key ingredient of the algori thm is the realization that many of the subproblems generated in the course of this decomposition, share the same constraint matrices and can thus be solved simultaneously. Numerica l results show that the algori thm may out-perform a linear programming package on some simple problems. Chapter 5, finally, combines all these ideas and applies them to a problem i n forest management. Here it is required to find logging levels in each of several time periods to maximize the expected revenue, computed as the volume cut times an appropriate discount factor. Uncertainty enters into the model in the form of the risk of forest fires and other environmental hazards, which may destroy a fraction of the existing forest. Several discretizations are used to formulate both upper and lower bound approximations to the original problem. It develops that the opt imal value as well as the opt imal first period solution are reasonably stable wi th respect to the discretization. In particular, a one-point discretization using only the expected destruction rate suffices to find an implementable harvesting policy. 1.2. S t a t e m e n t o f t he p r o b l e m The multi-stage stochastic linear programming problem considered throughout this thesis is the problem min \ cixi + E ^ 2 min I c 2 x 2 4- E * 3 i i 2 [min(c 3a;3 H h E ^ T u 2 ] . . . ] ^ T _ 1 m i n c r ^ T (1) xt > 0, t = 1 3 where the vectors Xt G l R n ' , t = 1 , . . . , T are the decision variables, the vectors £t G IR," 1 ', t = 2 , . . . , T are random vectors on a probabili ty space ( Q , E , P ) , and all other quantities are known. M a n y problems in sequential decision making under uncertainty can be cast i n this form. Consider for example the following stylized problem, adapted from Smi th [126], which might face the manager of a production plant wi th uncertain demand. Firs t a production level has to be chosen for the first period, subject perhaps to some capacity constraints. Next a random variable is observed, describing the demand during the first period. Based on the outcome of this random variable, the product ion level is set for the next period. Capacity constraints i n period 2 may depend upon events having occurred previously, both first period production as well as the observed demand i n period 1. Another random variable is observed, namely the demand during period 2. This random variable could conceivably depend on the demand in the previous period. Then the production level for the th i rd period is chosen and so on. The objective might be to minimize total expected cost or to maximize expected profit. Similar problems arise i n portfolio theory, the theory of structural designs, agriculture, forestry, etc. (see e.g., Bradley and Crane [27], Kal lberg and Ziemba [80], Mirzoakhmedov and Mikhalev ich [92], Nazareth [96], and many other references found i n [22]). Prob lem (1) was first formulated i n a 1955 paper of Dantzig [30], and many general-izations have since appeared i n the literature. Rockafellar and Wets i n a series of papers ([119, 120, 121, 122]) studied an extension to convex functions and constraints, Olsen [102, 103, 104] and Dempster [37] gave a number of important theoretical results i n the form of necessary conditions which have to hold at an opt imal point. Infinite horizon analogues of P rob lem (1) were studied by Gr ino ld [58, 59] and F l a m [47]. A r k i n and Evstigneev [4] contributed to the theory, and the entire spectrum of known results received comprehen-sive treatment in Dempster [38]. The theoretical aspects of these problems have been researched at great length. Knowledge about computational procedures is considerably more sparse. A n easy 4 special case is the stochastic programming problem wi th simple recourse (see Everi t t and Ziemba [46]), which has At = ( + / , —/) for t > 1, so that the minimizations are t r iv ia l . A n algori thm for the general two-stage problem was developed by V a n Slyke and Wets [131], based on Benders ' decomposition [8]. Strazicky [128] and Birge [10] gave algorithms based on basis factorization and Dantzig-Wolfe [33] decomposition of the dual . A l l these methods are essentially equivalent, as demonstrated in Birge [12, 14]. Algor i thms based on the Lagrangian and augmented Lagrangian functions of the problem have been developed by Rockafellar and Wets [123] and Dempster, G u n n and Merkovsky [39], respectively. Recently Birge [15, 17] extended the decomposition algorithm to the multi-stage case by nesting various levels of decomposition inside each other. Conceptual algorithms were given by Louveaux [86] for a quadratic extension and by Noel and Smeers [100] for the convex case, but their algorithms have not been tested computationally. Prob lem (1) is a stochastic version of the staircase linear program, which was stud-ied by Ho and Manne [68], Wi t t rock [148], Abrahamson [1], and Scott [125]. M a n y of the technical tricks developed i n their work for the deterministic problem are adapted i n Chapter 3 to the stochastic setting. 1.3. Convexity of the value function Following accepted notation i n the field, Qt(%i, • • •, a?t—l j €2, • • • >£t) denotes the value function of the problem at time t, i.e., T~ 1 s.t. Atxt B t + 1 x t + A t + 1 x t + 1 (2) BTXT—I + ATXT — £x xT > 0, r = £,...', T. 5 where X \ , . . . , xt-i, £2, • • • , £t satisfy the remaining conditions of (1) which are not included i n the above problem (2), that is, A\X\ — h\ B2xi + A2x2 = 6 £ 3 * 2 + ^ 3 * 3 = 6 (3) Bt-\xT-2 + At-iXt-i = (t-i XT > 0, T = I, . . . ,t — I. This definition of Qt can be extended to al l of JRni x • • • x I T ' - 1 x IR™2 x • • • x I R m t by setting Qt = + 0 0 if its arguments do not satisfy equations (3) or i f problem (2) is infeasible for some t. Assume that problem (1) has an opt imal solution. Then for each t there exist X \ , . . . , xt-\ • £2 > • • • ? £t such that Qt is finite. In this definition of Qt there is no mention of the random structure on £t- The vector £t is known when problem (2) is solved and is treated as simply an m, t-dimensional point contained i n the convex set S t C IR" 1 ' , which is denned as the smallest closed convex set of probabil i ty measure 1. The probabilistic structure comes into play only when one pulls back to the previous decision stage to evaluate Qt-i by taking the expected value wi th respect to £ f . This expectation is denoted by <j>t-i(xi,... , a : t_ i ,6 , . • • , 6 - 1 ) = E£,|6.•••>£«.-1 Qt(xi, • • • xt-i, (2, • • • ,6-i>6)- (4) P r o p o s i t i o n 1. QT is jointly convex in its arguments, piecewise linear on its effective domain and uniformly Lipschi tz . P r o o f . 1. Let z° = { x l . . . , x ° T _ 1 , i l . . . , ( ° T ) , z1 = and A G [0,1] be given. B y z A denote the point zx = (1 — X)z° + \zl. If QT{z°) = + 0 0 or QT{zl) = + 0 0 , then the inequality <2 T (z A ) < (1 - A ) Q T ( z ° ) + \QT(Z1) holds tr ivial ly. Assume therefore that Q r ( z u ) and Q T ( 2 1 ) are both finite and 6 let x°T and xxT be such that CTXT — QT{Z1) and ATX1t — CT ~ BTX1T_X, for i — 0 or 1. Lineari ty of a l l constraints i n (2) and (3) implies then that xT = (1 — A ) x ^ + A x ^ satisfies ATXJ. = CT ~ BTXT-\I a n d z A satisfies the remaining constraints (3), so that QT{Z*) is finite. Further QT(zx) < cTxT = (1 - A)cTs4 + \cTxlT = (1 - X)QT(z°) + AQTC* 1)-Thus QT is convex. 2. The duality theory of linear programming shows that QT{Z) — min{CTXT '• ATXT — CT ~ BTXT-I} — M A X( 7 R(^T — BTXT-\) '• ITAT < c^} . A n opt imal solution to this latter maximizat ion problem exists for every z i n the effective domain of QT and is attained at a basic solution. Since there are only a finite number of bases, B1,.. . , BL, it suffices to consider a finite number of possible solutions nl = cT(BL)-\ where I = 1,...,L. Thus QT is the maximum of a finite number of linear functions CT(B1)~1(CT — BTXT-I), hence is piecewise linear. 3. F ina l ly let z° and z 1 be contained i n the effective domain of QT and assume without loss of generality that QT{Z1 ) > QT{Z°)- If x* is an opt imal solution i n the defining relation for QT{z°), then QT{Z1 ) — QT(Z°) = m i n {CTX : ATX = CT — JBT^T-I } — min {cyx : ATX = CT — BTXT_1} — vmn{cTX : ATX = CT ~ BTXT_1} — CTX* — min{cj '(a; — x*) : AT{X — x*) = CT — CT — BT{XT_1 — xT_1)} = max W B ' ) - 1 [ ( & ~ (T) - B T ( 4 - I " *T- i ) ]} l — l,...,L < max {cT(Bl)-l{& - CT)} + max {cT{Bl)~l BT{x\,_x - xT^)} for some constant C which depends only on the coefficients of CT, (B1)^1 and BT- 0 Proposition 2. If QT is jointly convex in its arguments and uniformly Lipschitz on its effective domain, then so is <pt-i-7 Proof. 1. Let z% ~ (x\,... ,x]_1,(l,... ,£j_j), i = 0,1 be points in the effective domain of <7Jt_i and let A £ [0,1] be given. Define z A = (1 - A)z° + A z 1 . Then <r3 t_1(zA) = ^ Qt{zXM)Ft{d£t) < f [ ( l - A ) g t ( z ° , 6 ) + AQ t(z 1,6)] JPtK*) = (1 - A) / Qt(z\tt)Ft(dtt) + A / Q T (^ ,6)^Kt) = ( 1 - A ) c > t _ 1 ( 2 0 ) + A ^ _ 1 ( z 1 ) . 2. Moreover, l ^ - ! ^ 1 ) - ^ . ! ^ 0 ) ^ jf Qt{z\it)Ft{dit)- Qt{z\it)Ft{dit) < Js \Qt(z\^)-Qt(z0,Ct)\Ft(d^) • Proposition 3. Qt is convex for al l t and satisfies a uniform Lipschitz condition on its effective domain. Proof. (By backward induction.) Proposi t ion 1 established the claim for t = T. Assume therefore that Qt+i is convex and uniformly Lipschi tz . Then <f)t is convex and uniformly Lipschitz by Proposi t ion 2. Moreover, Qt{x\, • • • ,£t-i,6> • • • ,£t) = m i n ctxt + <j>t{xi,>.. ,xt,£2, - • • ,£t) s.t. Atxt =£t-Btxt-i. (5) xt>0 Let z1 — (x\,... ,x]_1,£l,... = 0,1 and A E [0,1] be given. If problem (5) is infeasible for z° or z 1 , then there is nothing to prove. If both problems are feasible, then as i n the proof of Proposi t ion 1, z A = (1 — A)z° + A z 1 is feasible for the min imiza t ion problem m i n ctxt + (pt(zx,xt) s.t. Atxt = Lit ~ Btx^, xt>0 8 and Qt{zx) < (1 - X)Qt(z°) + A Q ^ z 1 ) . Now assume both z° and z1 are contained i n the effective domain of Qt wi th Qt(zx) > Qt{z°). Take any e > 0 and let x* be an e-optimizer for the minimizat ion problem defining Qt(z°), that is Qt(z°) > ctx* + <fit(z°,x*) - e. If is the global Lipschitz constant of (pt, then Qtiz1) - Qt(z°) < mm{ctx + cf>t(z1^) • Atx = $} - Btx]^} - [ctx* + <pt{z°,x*)] + e = min{c 4 (a; - x*) + (p^z1 ,x) - <pt(z°,x*) : At(x -x*) = et~ & - Bt{*U - x\_x)} + e < m i n { c t ( x - x*) + <i>t{z\x) + C^\\x - x*\x + C J z 1 --4>t{z\z) : At{x -z*) = t l - C0t ~ B t ( x J _ a - x*^)} + e < C^\\zl - z° | | i + mm{ctw + C0||?j;| |i : Atw = Cl - - Bt{x\_x - *?_,)} + e. This last minimizat ion problem can be re-formulated as an L P problem, so that by the same argument as i n Proposi t ion 1 one obtains m i n { c t ™ + CVlMli : Atw = £ - - B^x]^ - x ? _ J < C^z1 - z0]]^ for some constant C\ depending only on the deterministic elements of the problem. Th i s establishes that Qtiz1) - Qt(z°) < ( C V + C O P 1 - + e. Since e > 0 can be chosen arbitrarily, Qt is seen to be globally Lipschitz wi th Lipschi tz constant C = Cj, + C\. D The above results stand some clarification. It has not been proven nor asserted that Qt{z) w i l l be finite for every z i n the appropriate finite-dimensional space I R L on which it is defined, i.e., complete recourse is neither required nor implied. M a n y points in the effective domain of Qt may even be inaccessible, i f for instance the random variable £ ( is discrete. Nevertheless, the effective domain of Qt wi l l be a convex set (by assumption it contains at least one point) , and on it Qt satisfies a global Lipschitz condition. The constraints which 9 guarantee finiteness of Qt can be thought of as induced constraints (see Rockafellar and Wets [121] and Wets [143]) on the preceding decision variables xx,... ,xt-i-Stronger convexity results are proved in Wets [141], which also treats the case of stochastic cost vectors and stochastic constraint matrices. In order to prove the Lipschi tz continuity of the functions Qt and <j)t i n this more general setting, it is now necessary to assume that al l random elements are square-integrable. In part icular, is a convex function of the first-stage decision variable xi, and a solution could i n principle be found by any existing convex programming algori thm. The main difficulty stems from the fact that function evaluations are phenomenally expensive: Mul t i -d imensional integrations are at the outer l imits of capacity for any computer, and what is needed for continuous £t is not one, but T — 1 such integrations, to say nothing of the inner minimizations for t — 2 , . . . , T , which have to be performed as well . The next chapter describes how one could obtain upper and lower bounds for the values of Qt which might require considerably less computational effort. 10 C h a p t e r 2. B o u n d s o n t h e e x p e c t a t i o n o f a c o n v e x f u n c t i o n Since it is usually very hard to evaluate 4>t directly, one might be content wi th esti-mat ing or bounding its value, especially i f the bounds are reasonably good. The problem of finding such bounds is treated in abstract form i n this chapter, following earlier work on this topic by Madansky [89], Per lman [107], Ziemba and But terworth [151], Hausch and Ziemba [64], Huang, Ziemba and Ben-Tal [70], and Ben-Tal and Hochman [9]. To simplify the notation, we fix a point {x\,..., Xt-i, 2^> • • •, £ t - i ) and define the function g : I R m t -* IR by did) = - Q t ( a 5 i , . . . , a J t - i , 6 - - - - i 6 - i ; 6 ) - I1) Now to find (j)t{xi,..., xt-i, £2, • • • , £t- i)> what is required is the evaluation of a mult iple integral of the form 9 = f 9(it)Ft{dtt), (2) where g is a convex function by Proposi t ion 3 and Ft is the probabil i ty dis t r ibut ion of £t,given £2, • • • ,£t-i- A t this stage the index t becomes dispensable and wi l l consequently be suppressed for the remainder of this section. The solution of (2) presents enormous computational difficulties i f m is larger than about 5, particularly when g is difficult to evaluate. This chapter extends an idea of Edmundson [45] and Madansky [89] to find upper bounds for (1), which, together wi th the known lower bounds arising from the use of Jensen's inequality [107], can be used to find bounds on g, which are sharp i n the sense that the new upper bound is the best bound available wi th only first moment information on the probabil i ty distr ibution. The new bounds can be calculated for arbitrary distributions by solving a linear programming problem, where previously the Madansky bounds were computable only when the components £ 1 ? . . . , £ m of £ were independent. See [64, 108] for surveys of this work. 2.1. U p p e r b o u n d s o n c o m p a c t s e t s 11 Let U denote the convex support of £, i.e., the smallest closed convex subset contained i n H m such that JydF = 1. Let £ be the expected value of £ and assume g : JRm —> H is a convex function which is integrable wi th respect to dF, so that the expected value g exists and is finite. Suppose first that U is compact. Upper bounds i n the case m = 1 are discussed i n Huang, Ziemba and Ben-Ta l [70]. Since U is a closed interval [a, 6], one obtains the estimate, see Figure 1, ^ = ^ < 5 < ( L J W i ± i L ^ ) = , . ( 3 ) o — a If both upper and lower bounds are computed simultaneously, as is often the case in the applications, then the information from the lower bound can be used to find an improved upper bound. Instead of integrating the line segment AB used i n (3), one may employ the piecewise linear majorant of g represented by the line segments AC and CB. This can be effected by subtracting from gu the weighted mass of the triangle ABC or, equivalently, the triangle A'BC, namely the integral pg —g rg —g pu% I h(t)F(dt)= / Fr{at<t<bt}dt= / / F(ds)dt, Ja Jo J<} J at where h(t) is the vertical separation between the piecewise linear majorant and the line segment AC, Xt = t/(gu - gl), at = Xta + (1 - A t ) | , and bt = \tb + (1 - A<)£. This s imply represents a change in the order of integration. Of course the inner integral is not part icularly nice, but for a continuous distr ibution i* 1 wi th density / one obtains the estimate / ' F{ds) = f 1 f{s)ds > i i i f f{s){bt - at) > inf f{s){bt - at), Jat Jat s£[at,bt] s£[a,b] and so g<gu = g u - ( g u - gu) < 9U ~ f i n f u / (-)(&, - at)dt = (1 - a)g* + ag1, (4) Jo s€[a,b} 12 where a — \(b — a) inf 3£[ a,b] f{3)- This may not be a very large improvement over the Edmundson-Madansky bound — if f(s) = 0 for some s £ [a, b] there is no improvement at al l — and finding inf s g [ a i ( ,] f(s) may pose another problem. O n the other hand, i f J F is uniform, then a — 0.5, and for unimodal distributions one obtains immediately that i n f s € [ a , 6 ] f(s) = m i n { / ( a ) , / ( & ) } . Occasionally, it may be possible to find a simple expression for inf g g[ a t i ( , t ] f(s) and use this quantity directly. This method generalizes to higher dimensions rather easily, as w i l l be°shown i n Section 2.4. It is easy to extend these result to the case of a random vector (m > 1) wi th indepen-dent components. For U is an m-dimensional rectangle of the form m U = Y[[ai,bi] i=i and independence then yields the following estimate [89]: 1 i / m \ ^ E - E I K *«.-".c)> (5) t i = 0 i m = 0 \j = l J 13 where H ii i ci ~~ £j i ci £ 7 — ai d0 = a,i, d1 — bi, bJ0 = and b[ — -f . bj — cij bj — (jj B y repeated application of formula (4) wi th a continuous distr ibution F, one improves this to 2 2 / 771 \ j ^ E - E I K c J , (6) i1=o im=o y=i y where d20 = 7? = (1 - a*)*?, 7 / = (1 - a ; ) ^ > 7? = a; , a ; = §(6» - O i ) i n f , e [ o . ) t l . ] /,-(*) and is the marginal density of Ci- The remaining notation is as in formula (5). If the Ci are correlated, one has to proceed differently. Let Gr(g) denote the graph of g, i.e., the subset of IR"1"1"1 given by Gr(^) = {{x,y) I x € U,y = g(x)} , and let H be its convex hul l . Then (£,g) E H whenever £ and g both exist, since integration wi th respect to a probabili ty measure can be viewed as simply taking convex combinations. Hence one immediately has the upper bound g < sup {y | (£, t / ) 6 H}. For the case of an m-dimensional rectangle this was pointed out by Madansky [89], whose notation H*(x) = sup {y I (x,y) £ H} we adopt. The following result gives a computable upper bound on g regardless of the shape of U. Theorem 1. Let V be a bounded convex polyhedron containing U and let {y\,... , TJ/} be the extreme points of V . Assume there exists a convex function g from IR f c into IR whose effective domain contains V, such that the restrictions of g and g onto U satisfy the realation g\u = g\\j. Note that U has full measure, such that g(.) = g(.) a.e. wi th respect to the underlying probabil i ty measure P, so it is permissible to identify the two maps and use the same symbol g for both of them. Then g < z = max <j ^ 9i{vi)^i I K > 0, A; = 1, £ \vi = C J> . (7) , i=l i=l i=l 14 P r o o f . Let (x,y) £ H. Then (x,y) is a convex combination of points i n Gr (^ ) , so there exist an integer K, weights ak > 0 and points uk £ U, k = 1 , . . . , K such that K K K ]Taife = l , ^ a k u k = x, ^ ctkg(uk) = y. k=l k=l k=l Since V is convex and U C V , each u{ is a convex combination of vi,... ,vj, where al l vertices can be included if one accepts zero weights for some of them. Thus for every k, there exist /xki > 0, i = 1 , . . . , i " such that L^i^ fci = 1, £ i = i V-kiVi = «fc, i-e., x = Y^i k OLkV'ki'vi- B y convexity of g it follows that K K / I \ K I k=l k=l \i=l J k=l z=l Setting A; = ajt/ifci yields a; = X)• A ^ ; , Aj = 1, A» > 0, and y *i9{v*) < m a x ^ X! I > °> X] A i = l j S = i V i In particular, this holds for (x,y) = ( £ , # * ( £ ) ) (which is an element of H). D R e m a r k s . 1. If U is a polyhedral set, then H*(£) = z. This holds i n particular for m-dimensional rectangles, so Theorem 1 subsumes Madansky's result as a special case. Figure 2illus-trates the upper bound when m — 2. 2. If m = 1, then U — [a, b], so z = m a x f ^ A ^ a ) + A2<7(&) | A x + A 2 = 1, A x a + X2b = £, A i , A 2 > 0}. This implies A], = £ f £ , A 2 = 1 - A a = , whence 2 = ^ 5 ( a ) + f z f fif(0)-This is the usual Edmundson-Madansky bound. 3. W h e n V is a simplex, there is a unique convex combination of the extreme points {v{} which yields £. The maximizat ion in Theorem 1 is therefore t r iv ia l . 4. Birge and Wets [20; Proposi t ion 5.1] give a related result using a family {vx(.), x £ U} of probabil i ty measures on ext U, the set of extreme points of U, such that J svx(ds) = x for all x £ U (8). ext U 15 This condition makes their result more difficult to apply, although it is i n general sharper than Theorem 1. The of Theorem 1 can be interpreted as a probability=measure on ext V={v},..., vj} such that (8) holds at x — £ only. F i g u r e 2. T h e n e w b o u n d w h e n m = 2. 5. If £ has independent components, the estimates obtained from Theorem 1 are not as sharp as the upper bounds i n (5). For example, let g(x,y) = (x + y ) 2 + x + 2y and let £ be uniformly distr ibuted on [ 0 , l ] x [ 0 , l ] . Then (5) gives the upper bounds < \ [#(0,0) + 5 ( 0 , 1 ) + g{l,0) + g{l,l)] = 3, 16 whereas one computes, using Theorem 1: max{3A 2 + 2A 3 + 7A 4 | Aj + A 2 + A 3 + A 4 = 1, A 3 + A 4 = 0.5, A 2 + A 4 = 0.5} = 3.5. The lower bound from Jensen's inequality is g > 5(0.5,0.5) = 2.5, while the exact value can be computed as g — 8/3. 2.2. Improving the bounds by partitioning. B o t h the upper and the lower bound can be sharpened in a number of ways. Let V be parti t ioned into convex polyhedra Vk of positive probabil i ty mass pk = / F(d£), for Vfc k = 1 , . . . , i f such that V = UlLi'^ Sand^W' = 0 if * ^ j. Let further = ± J F(d£) yk be the conditional mean of £ given £ 6 Vk. Theorem 2. If Zk are the upper bounds on R[g(() \ £ £ Vk], found by applying Theorem 1, then K K g(0 < Y,pk9(t) < g < ^ z- (9) (a) k=l (b) (c) k=l (d) Proof. O n each subset Vk we have from Theorem 1 that —k Pk9tt ) < Pk9k < PkZk, where gk is the conditional mean of g(x) given ( £ F l . This quantity can be computed b y p7 Iv 9(0F(d0> s m c e Pk > 0. Because g = Jv g dF = T,kfv«9dF = T,kPk9k, the second and th i rd inequality follow after summing for k from 1 to K. —k Now consider the discrete distr ibution on V which takes on realizations £ wi th probabil i ty pk for fc = 1,...,K. Its mean obviously is £, so by Jensen's inequality 9(0 < E*Li Pk9(£k), proving (a). Similar ly, the upper bound of Theorem 1 describes a probabil i ty dis tr ibut ion on the extreme points vk, i = 1,.. . , /*. of the polyhedron Vk. Let irk be the corresponding probabilit ies. Then one may define a random variable Y on V wi th support on \J{ k vk by P r { y = i t f } = Y, 17 where the summation is over all polyhedra Vk containing vk. This random variable again has mean £, so K K Ik k=l k=l i=l by Theorem 1. D That the bounds can be made arbitrari ly sharp by choosing a sufficiently fine par t i t ion of V is also derivable from first principles: There exist step functions hi < g < h2 such that I h^dF < j gdF< f h2dF< f hidF + e. (10) Jv Jv Jv Jv If a par t i t ion S is chosen so fine that hi and h2 are constant on each element Vk of S, then both the upper and lower bounds are exact when applied to h2 and hi, respectively, i.e., f K K K K Ik . / hidF = < X>J.«7(f*) <9 < Y,PkZk ^ E E ^ f e 2 ^ ) = / h*dF> •*V k=l k-1 k=l k=l i-1 ^ V (11) and so Y^PkZk - ^Pkgitk) < e k k by combining relations (10) and (11). A different approach is presented in Birge and Wets [21]. Their idea is to include more information about the probabili ty distr ibution F itself in the formulation of the linear programming problem (7). To that end they re-interpret (7) as a moment problem of the type described i n [82]. It is clear that 9 = VF(g(t))< sup EF.(g(0), (12) where T is any family of distributions over V which contains F. For example, one could consider the family !FQ of all probabili ty distributions over V with no further restrictions, or the family T\ of distributions which have mean £. 18 Both of these lead to simple maximization problems, viz., sup Ej?'(flr(0) = . m a x 9(vi)> F'eT0 t=l,...,1 and the moment problem over the family T\ is precisely the L P problem (7). Restr ict ing F further w i l l lead to sharper bounds in (12), but also to increased effort in the solution of the moment problem. For constraints of the form / hi dF' <ai or / hi dF' = a{ or / h{ dF' > a{ Jv Jv Jv with hi piecewise linear on V, Birge and Wets [21] demonstrated that the corresponding moment problem can again be re-formulated as an L P problem. If second order information on F is available, that is, i f it is known that < Jv £i£j dF < afj, and i f hfj are piecewise linear functions on V satisfying ^(0>&fc>fcJ(OforalUeF, then F 6 = {F> ' Iv d F ' = Iv i d F ' = 1 Iv K ^ i ) d F ' - a^Sv h^tt)dF' ^ atj } • If {Vk}k<_1 is a part i t ion of V into compact polyhedral sets such that h^ and h^ are al l linear on each subset Vk, then convexity of g implies that there always is a maximiz ing measure JF" £ T2 for (12) wi th support on \JkextVk, the set union of al l the extreme points. For polyhedral Vk this restricted problem can be writ ten as an L P . Another interesting result was recently proven by Frauendorfer [48]. B y Choquet 's theorem [124], the maximizing measure in (12) has all its mass on the extreme points of V. If V is an m-dimensional rectangle, then there is a unique probabil i ty dis tr ibut ion JP on ext V which satisfies all 2 m moment constraints 19 for all subsets / C {1, • • • ,m}. Thus the maximizing measure is in particular independent of the function values at the vertices V{. Final ly , formula (3) can be extended to the multivariate situation to read g<agl + (l -a)gu. Now a = inf 5 grj f(s)fi(U) and l^(U) is the euclidian volume of U or a lower bound for i t . If, as is often the case, U is a rectangle, then /i({7) is simply the product of its dimensions. 2.3. A Numerical Example A n example may serve to illustrate these ideas. Let U = {(^ 1,^ 2) : £1 + £2 — 1} D e the unit circle in IR 2 and assume £ = (£1,^ 2) has a probabil i ty density /(6'6)=4^(7f - t f - t f - 1 ) i f 17. The surface described by / is part of a sphere in IR 3 wi th center ( 0 , 0 , - 1 ) and radius 5/3. The geometry of the problem is thus apparent, while analytic answers are rather hard to obtain. Let 3(^ 1,^ 2) = (£1 + &)2 + £1 + 2£ 2 and consider first the polyhedral set V = [—1,1] x [—1,1]. Obviously £ = 0, and so a lower bound on the expected value g = .4553 is given by Jensen's inequality as gl0 = g(0,0) = 0. 1. To find an upper bound according to Theorem 1, one solves the linear program max Aj — A 2 + A3 + 7A 4 s.t. - A i + A 2 - A 3 + A 4 = 0 - A , - A 2 + A 3 + A 4 = 0 A i + A 2 + A 3 + A 4 = 1, which has value g^ = 4. 2. Combin ing the upper bound g^ wi th the lower bound, one obtains the bound g^ = f f So + iT5o = 3.1220. 20 3. Observing that fv £1 £2/(^1,(2) <^(£i • £2) = 0, Frauendorfer's bound can be computed as g» = + g(-l,l)+ g(l,-l) +flf(l,l)]/4 = 2. 4. Exact second order information is quite hard to obtain, but wi th a moderate amount of work one can find the estimate 0.2 < / £ 2 / ( 6 , 6 M 6 , 6 ) < o . 2 5 . Jv Using the piecewise linear functions hr(() = 2|&| - 1 < < |&| = hf((), one can formulate another linear program, slightly larger than the previous one, which has for its opt imal value the upper bound g% — 1.875. 5. To improve the upper bound, one could also try to shrink the polyhedron V i n order to approximate U more closely. Removing from V the two triangles ((.5,1), (1, .5), (1,1)) and (( — .5, —1), ( —1, —-5), ( — 1, —1)), one obtains the bound g\ — 2.25 from Theorem 1. A certain amount of finesse is required to determine a 'good' polyhedron: Had one re-moved from V the triangles in the other two corners, namely ((1, —.5), (.5, —1), (1, —1)) and (( —1, .5), —.5,1), ( —1,1)), there would have been no improvement i n the upper bound, even though the area of the circumscribing polyhedron has been decreased. 6. The new polyhedron, let us refer to it as V1, is not a rectangle, so Frauendorfer's bound can not be computed, but using the same approximation to second order information as before, one can compute again Birge and Wets ' bound by solving a linear program whose value is 1.625. 7. F ina l ly divide V1 along the line £1 +£2 = 0. The probabili ty mass of each part is equal to .5, and the conditional means can be computed as ± ( . 2 2 3 4 , .2234). Using Jensen's inequality, one has the improved lower bound g[ = 0.04989, and after solving two linear programs wi th 3 constraints and 4 variables each, Theorem 1 gives the upper bound g% = 1.3402, which can be improved to g% = &g% + ±g[ = 1.0570. 2.4. Generalization to Unbounded Domains Throughout this section, assume that the domain U is convex but possibly unbounded and g dF exists. Then for every e > 0 there exists a compact convex U C U such that 21 I fu^jj 9 dF\ < e, Ju^jj dF < e. Here U \ U denotes the complement of U relative to U. The techniques of sections 2 and 3 can then be applied to U i n place of U. If E[ g | U} > 0, this gives the upper bound g = Eug = E[g\U] f_dF + E[g\U\U] f dF<E[g\U] + z Ju Ju\u as well as the lower bound g>(l-e)E[g\U] + E[g\U\U} / dF > (1 - e) E[<7 | U } - e. Ju\u If E[g | U] < 0, the upper and lower bound are, respectively, (1 — e)E[g \ U] + e and E[g\U}-e. Since it is often difficult to find a suitable U for given e, a different method is desirable. Such a method is available under the following assumptions: (i) U is contained i n (the translate of) a pointed cone in I R m . A cone C is pointed i f x £ C, —x £ C implies x = 0. (ii) g satisfies the linear growth condition limf-joo ^ ^ f f i < b~ < oo, for al l x £ U and al l directions d of U, where d ^ 0 is a direction of U, if x + id £ U for a l l x £ U, t > 0. Proposi t ion 3 established that this condition wi l l hold i f g is defined as i n formula (1). Theorem 3. Assume there exists a polyhedron V wi th extreme points {yi,... , V{} and extreme directions {d\,..., dj} such that U C V. If g can be extended to a convex function on V in the manner of Theorem 1 and assumptions (i) and (ii) hold, then / J g < max{^T\ig(vi) + ^ sN\\di\\ I *i>N ^ °>£ Xi = 1 ^ X { V i + X^'^' = ^ ' (13) t=l j= l i i j Proof. We wi l l show for al l (x,y) £ H that y < m a x { ^ A ^ v ; ) + "^T fyj || | A ; , ^ - > 0, ^ A; = 1, ^ A,-^ + ^fXjdj = £}• i j i i j The result then follows from the fact that (£,,g) £ H. 22 Let (x,y) £ H and assume without loss of generality that \\dj\\ = 1 for al l j Then there exist jk > 0 and uk £ U, k = 1,... ,K, such that Now for al l k, there exist > 0, f3jk > 0 such that / J J £ a . f c = l , «fc = £ + Y^f3jkdj. i=l i=l j=l Therefore * = £ £ lkOiikVi + £ £ IkPjkdj. i j i k Note that ^ . jk/3jkdj is a direction of U for every j = 1,..., J , and » j i k j k by convexity of 5 and the triangle inequality. Then set Xi = ^ 7 f c « t f c , for i = 1,... , / it H = £ IkPjk, for j = 1,..., J k and observe that i i=l j=l In particular, then, k l y < m a x ^ Xkg(Vh) + ^ P A^ o v e r a ^ ^ fc' ^ fc s a t i sfy M S ( 1 4). i=i j=i 23 • R e m a r k s 1. If both d and — d are directions of V , then the linear program (13) is unbounded. This situation is ruled out by assumption (i). 2. For bounded V , the set of extreme direction is empty, i.e. J = 0. Theorem 3 then reduces to Theorem 1. 3. In one dimension, V has the form [a, oo) (or ( — 00,6]). Taking d = 1, one obtains max {Xg(a) + fiS \ X > Q,p, > Q,X = l,Xa + p> = £} = g(a) + S{£ — a)-This expression was derived in [70]. 4. As i n section 4, the upper bound can be sharpened through the use of a par t i t ion {V1,..., Vk} of V. Formula (9) st i l l holds. 5. If assumption (i) is violated, e.g. i f U = I R m , the use of subsets may st i l l be possible. Let t = (t\,... ,tm) G I R m and consider the part i t ion of U into the 2 m subsets 171 given by U1 — {x G U \ xx < ti,... ,xm < tm}, U2 = {x G U | xx < ti,... ^Xm-i < tm—i,xm > im}> etc. 2.5 . B o u n d s for the m u l t i - s t a g e s t o c h a s t i c p r o g r a m . Now let U1,. .., U1 be a part i t ion of 5j> into convex sets of positive probabil i ty and let TJT be the discrete random variable on wi th realizations nlT = E [£T | £T G Ul], for i — 1 , . . . , / and probabilities plT = P r { £ r G U1}, i = 1 , . . . Such a discretization wi l l be referred to as a mean discretization. Further define 1 Lr-iixi,... ,xT-i,(2,- • • , £ x - i ) = ^2PTQT{XI,- • • ,ZT-I,6, • • • ,(T-I,VT)-i=l Then it is clear that LT-I is a finite, convex combination of convex, piecewise linear functions QT-I{- ,VT)I hence LT-I 1S convex, piecewise linear. Final ly , Jensen's inequality implies that LT-I(- ) < <7T-I(- )• 24 Now consider the optimization problem arising at the next to last stage, namely m i n CT-IXT-I + LT-I{X\,. • • ,£T-i)6> • • • >£T-I) s.t. AT-IXT-I = £r - i - BT-IXT-2-This can be transformed into a linear program by wri t ing it as min CT-IXT-I + # s.t. AT-I^T-I = £T-I - BT-\XT-2 0 > i y _ T _ ( a ; i , . . . , x T - i , 6 ) • • • ,£r-i), J = 1, •••,</, where are the linear pieces making up LT-I-This is again a stochastic linear programming problem, wi th one stage less than the original problem (1), and its value Q T _ 1 ( . ) gives a lower bound on the value function QT-I{ - ) of the original problem. B y repeated application of this idea one has thus proven Theorem 4. Let n — (n2,... ,?7T) be a mean discretization of £ = ( £ 2 , . . . ,£T)- Then the large-scale linear programming problem m i n clXl + § V\?c2x\" + £ £ pj'p } 3 c , * { " f c s + • • • + £ p}» ...p*'cT4"-'*' <!2 = 1 Al2 = l k 3 = l k2,...,kT S. t. > l i X i = &"! 5 2 a ; i + i 4 2 X ^ =r/ 2 f c 2, fc2 = l , . . . , # 2 B3xl? + Asxk3"ha =Vs"k3, k2 = l,...,K2 k3 = l,...,K3 BTxkT2ri'kT-1 +ATxkT2^kT = n T 2 k T , k t =l,...,Kt, t = 2 , . . . , T xk„...,kt > Q (15) gives a lower bound for the original problem (1.1). D In the same manner, vertex discretizations u>t are defined as discretizations on cir-cumscribing polyhedral sets V1 D Ul, where the realizations are the vertices of V 1 wi th probabilities corresponding to the weights for one of the upper bounds described in the 25 previous section. Frauendorfer's bound is particularly suitable because its weights are inde-pendent of the values of the decision process. If LO = (u; 2, • . . ,^T) is a vertex discretization of £, then the resulting large-scale linear program yields an upper bound on the original problem. The groundwork for a successive refinement procedure has thus been la id . Once a discretization has been determined (Chapter 3 wi l l show how this can be done), one solves problem (15), using any linear programming algorithm, for instance the specialized version of Benders' decomposition described in Chapter 4 which takes account of the specialized problem structure. Then one examines both the upper and lower bound solutions. If the bounds are far off, one can try to improve them by refining the discretizations. This w i l l bound the opt imal vaiue of the problem arbitrari ly closely as was shown i n section 2.3. The opt imal decision variables wi l l converge to the true solution under suitable conditions (see K a i l [76], Robinson and Wets [117], and Dupacova [42]). Instead of solving two LP problems for each part i t ion, the usual procedure starts from an arbitrary mean discretization of £ and finds x'[ to minimize c\X\ + L\(x\) subject to AiXi = bi xx > 0. Then U\{x\) is calculated for the vertex discretization derived from the same part i t ion. If U\(x*x) — Li(x\) is smaller than a prescribed tolerance, the problem is considered solved wi th opt imal solution x*. If the difference between the upper and lower bound is not satisfactory, then one would find the mean discretization based on a refinement of the par t i t ion just used. Various refinement schemes have been proposed i n the literature, see, e.g., Birge and Wallace [19], K a l i [73, 75]. 26 Chapter 3. Conditional Probabilities 3.1. Introduction Chapter 2 has shown the necessity for evaluating probabilities and conditional expec-tations of a random variable £ on each element A1 of a part i t ion { A l } f = 1 of the support E of £. Some increase i n efficiency is possible i f all subsets are treated simultaneously, but for the purpose of the following discussion the computations are separated. Each subset A1 of H is assumed to be a polyhedral set, possibly unbounded. The probabil i ty that lies i n A1 and the conditional mean of £ given that £ £ A1 are denoted, respectively, p% and £ \ If £ is a continuous random vector wi th distr ibution F, then P*= f dF, -i -i i , ( 1 ) £ = Uli--->£m) = — (9l-----9m)i P where q{= I skF{da). The main emphasis in this chapter is on the multivariate normal distr ibution, and several methods are developed in Sections 2-5 to compute p% and q\ under the assumption that the par t i t ion {A1} consists of ra-dimensional rectangles of the form 771 Section 2 describes some t r iv ia l cases and estimation by a simple Monte-Car lo method (see [61]). The variance of the Monte-Car lo estimators is on the order 0 ( l / m ) , and it is desirable to find ways to decrease the variance. Several such variance reduction schemes are presented next. A decomposition technique due to Deak [34, 35] is given i n Section 3, Szantai's Bonferroni-type approach [130] appears in Section 4. Bo th methods were devel-oped for the use i n chance-constrained programming, where there is occasion only to use the estimator px. It is easy to extend these method to yield estimators for the q\ as well . 27 The two techniques are combined in Section 5 into a hybr id method which attempts to exploit the advantages of both. Numerical results are given i n Section 6 to contrast the performance of these methods on a small number of sample problems. Section 7 discusses briefly some of the problems encountered when forming the quotient £k — q\jpx • In Section 8 we describe extensions of the various techniques to general polyhedral sets and a modification of Szantai's method to treat other multivariate distributions. 3.2. Multivariate normal distribution: Simple Monte-Carlo Arguably the most commonly used continuous multivariate distr ibution is the mul t i -variate normal distr ibution [72] whose density / is given by f(z\ — 1 -{Z-1L)tY,-x(Z-II,) /«\ A } ~~ (27r )W2 |S | 1 / 2 ' K ) where m is the dimension of the random vector z, p, its mean and S its covariance matr ix , assumed symmetric and positive definite. The multivariate normal dis tr ibut ion possesses some attractive properties which wi l l be used in the description of some of the methods i n this paper. It is well known, for instance, that if x ~ JV"(^,E) , then y — Cx ~ Af(Cfi,CSC) for an arbitrary matr ix C, the only proviso being that the product Cx be well defined. To simplify some of the presentation we shall assume given an m-dimensional rectangle A — n^-lit 0*'^]' a n ^ a random vector z ~ ^ ( 0 , 2 ) where S is a correlation matr ix , i.e., diag S = ( 1 , 1 , . . . , 1). This does not constitute a loss of generality since z can always be standardized by the linear transformation z \—> y defined by t/j ='(z% ~ \f&ii-We shall denote P = / f(z)dz, qk = / zkf(z)dz, (3) J A J A where / is the normal density as in (2). Before discussing the problem in full generality, we shall give some special cases for which the solution is easy. 1. If m = 1, there are no problems. The probabil i ty p = ^ 7 = J ^ 1 e~z l2dz can be found for example by expanding the integrand into a power series. There are efficient and reliable routines i n almost every mathematical software package which wi l l perform 28 the computation. The numerator gi for the conditional mean is even easier to obtain since the corresponding integral can be solved analytically. Thus q i = i 2TT e - ^ i / 2 _ e - 6 ? / 2 2. For m = 2, the answer is obtained almost as easily. The integral for p can be developed into a power series [6], similar to the one-dimensional case, and commercial software exists which performs the evaluation. In order to calculate qi and similar ly q2 it is possible to exploit the fact that h(t) = te~f I2 permits analytical integration. Thus one may complete the square i n the exponent, exchange the order of integration, and simplify. This gives (details are i n [51]) 1 9i = /2TT - « f / 2 $ - e - b i / 2 U & 2 - 0 " i 2a i - ° " i 2 2 012&1 - °"l2 h - C l 2 a 2 vA - ° - 1 2 h - 0"12^>2 0 - ° " 1 2 - $ a 2 - o- 1 2 oi - $ - f f 1 2 0-2 - 0"l2&l ~ °12 , 0" i2a2 0 - °-l22 a i - 012&2 2 12 where $(.) is the standard normal distr ibution function in one dimension. 3. The last t r iv ia l case occurs when z has independent components. Now X is an identity matr ix , and the problem at hand can be reduced to separate applications of the one-dimensional computation as follows: JA (2TT)W2 9fc and thus m/2 Zk 2TT •''l2dz-»»«* j*oi i 11/ 4 $ ( 6 f c ) - * ( a f c ) ' / 2 < f e < , 6 9fc ^ ( e - 2 / 2 - c - f c 2 / 2 ) 29 Difficulties arise for n > 3 if the components of z are correlated. W h i l e series ex-pansions do exist [42, 43], they converge slowly, and it is usually best to estimate the integrals by resorting to some sampling technique. The simple Monte-Car lo method [61] consists i n generating an independent sample {z1,... ,zN} of size JV from the dis t r ibut ion of z, counting al l instances for which the sample point lies i n the rectangle A, ignoring the others, and forming the estimator where 1^ denotes the indicator function of the rectangle A. It can be shown that p is unbiased, that is, the expected value of (the random variable) p is equal to p. Similar ly one has the estimators which can be shown to be unbiased for qk. Unfortunately, since the estimators are random variables, any performance guarantee can only be formulated in probability, and an ind i -v idual estimator may be far from the true value, even i f its variance is small . Moreover, the variance of p, and similarly of qk, is proportional to the inverse of the sample size, which necessitates a rather large sample i f any meaningful accuracy requirement has to be satisfied. For this reason much effort has been invested i n finding variance reduction schemes. The most popular device is based on antithetic variables, that is, whenever z is a point generated in the sample, one also includes the point — z. A comment should be made here on how to construct the sample points. Since z — Lw ~ Af(0,LLT) whenev er w ~ A^(0, J ) , it suffices to generate m independent uni-variate standard normal deviates w3, i = l , . . . , m and to calculate z3 = Lw3 for some matr ix L such that LLT = S. A n attractive choice for L in this setting is the Cholesky decomposition [127] of S, because it can be computed efficiently and because its tr iangular structure may reduce the computational effort necessary i n calculating z3. 30 3.3. D e a k ' s M e t h o d . Deak [34, 35] describes a more efficient method for finding the conditional probabil-ity p, based on the decomposition z = ALv, (4) where A is chi-distributed wi th m degrees of freedom, and v is uniformly distr ibuted on Sm, the unit sphere i n I R m . Here A can be interpreted as the length of the vector z, v as its direction, L is again an arbitrary matr ix satisfying LLT = S and can be taken as the Cholesky decomposition, just as in the simple Monte-Car lo method. Then <-r2(v) p= f f(z)dz = f dXm(\)dU(v), J A J Sm JrAv) where 7 * 1 ( 1 ; ) T\(v) — m i n { r | r > 0 ,a < rLv < b}, ^2('") = m a x { r | r > 0,a < rLv < b}. For an i l lustrat ion of this idea when m — 2, see Figure 3. r , ( v ) L v Lv F i g u r e 3. D e a k ' s m e t h o d i n T w o D i m e n s i o n s . 31 Sampling is then performed on v only, while the one-dimensional integral dxm{ A) is calculated explicitly. This gives p = ^Yl^i^i where /i-7 = $TI(V>) ^Xm(A). A g a i n one could use simple Monte-Car lo sampling from the uniform distr ibution on Sm, or the method of antithetic variables for some reduction in the variance. Deak presents a different idea which seems to work quite well in practice. Instead of an. independent sample of v, Deak advocates the use of 'orthonormal variates', that is, starting from a random system {y\,... , v3^} of orthonormal directions one computes al l possible linear combinations of the form dj = £ f = 1 ( — l ) n , ^ ( , where nj = 0 or 1 for al l I and i j 1 ^ ii2 i f l\ ^ l2-This method has the advantage of creating a large number of random points, namely ^^ (T)' wi th comparably l i t t le effort. The larger sample size has a dramatic effect on the variance of the estimator p, and Deak reports a "coefficient of efficiency" of up to 1000 [34]. Efficiency is measured as a^to/cr^tk, where cr2 is the approximate variance of the simple Monte-Car lo estimator for a fixed JV, <r\ is the approximate variance of Deak's estimator for the same JV, forming linear combinations of k directions, t0 and tk are the respective computat ion times. This particular measure is used because for both methods the variance is roughly proportional to 1/JV, that is a\ti is more or less independent of the sample size JV. The parameter k can in principle be chosen arbitrari ly from the set { 1 , 2 , . . . , m } , but the m a x i m u m value of 2k (™) occurs when k — |^ 2t +^1 j; a n d the computational complexity increases very fast. Deak reports best results for k = 2, 3, 4. To adapt the method for computing <ji, one observes that / Zif(z)dz = / / ^ ) XLivdXmWdU(v) = / Liv I ^ } XdXm(X)dU(v) J A JSM Jriiv) JS™ Jrj.(v) . . = 0 Liv dXm+i(\)dU{v), JSm Jr1(v) where (3 = T ( I Z i 2^) / V2T ( y ) . Thus qi and p can all be computed simultaneously from the same sample, and the ri(v) need to be determined just once. 3.4. Szantai's Procedure. 32 Szantai [130] uses a completely different approach, based on a Bonferroni-type decom-position of the sample space I R m . It follows that f*=f v>-£ / v>+£ / y>- £ / ^+---+(-ir L v, (6) JA Jnm i JAi i^JAinAj i < j < k J A i n A j n A k Jfj At where A{ — {x \ a{ > Xi or 6; < Xi} and ip is an arbitrary integrand. SinCe dF and J^nA- ^ c a n ^ e calculated easily (we are i n the case m = 1 or m = 2), and using simple Monte-Car lo simulation, one has available three different unbiased estimators for the quantity p, namely (i) p^ — sampling directly from JA dF, (ii) p^2) — calculating explicit ly / dF-Y] I dF = 1 - n + V / ' dF{ Jnm i JAi { Jai and sampling from the rest, (iii) p(3^ — calculating / dF-TfdF + Yf dF = T, dFtj Jn™ { JA< ^.JAinAi t ? j J a < Ja> + ( 2 - n ) ] T f i F i + (" ~ »("»- 2) and sampling from the rest. In the formulas above, F{ and F{j denote the one- and two-dimensional marginal distributions of & and (£{,£;) , respectively. Szantai describes a way to condense the tai l of the expansion in (6) into a single expression which involves only Ij, the number of constraints ai < z\ < h{ which are violated by a given sample point z3. In fact, one obtains N p ( 2 ) = f dF-J2 [_ d F + ^ J T m a x ^ - l } JR"1 j JAi • j (7) 33 The advantage of this method lies i n the fact that all three estimators can be extracted from the same set of sample points. Whi le simple Monte-Car lo wi l l completely ignore a sample point unless it falls wi th in the rectangle A,.there is sti l l some information contained i n the fact that the point lies on the outside. Szantai's method attempts to util ize this information. F ina l ly , the covariance structure of the estimators is determined to form yet another unbiased estimator p ( 4 ) = = A ^ ( l ) + A ^ ( 2 ) + ( l _ A i _ A 2 ) ^ ( 3 ) ) where the weights A i and A 2 are chosen so as to minimize the variance of Szantai reports improvements in efficiency which are of the same order of magnitude as those i n [4]. In order to adapt Szantai's method for calculation of the qk it is necessary to evaluate expressions of the form JA, xkdF, J A n A . xkdF. This can be done by a procedure quite similar to the method presented earlier for m — 2, namely by completing the square i n the exponent of the integrand and explicit ly integrating out one direction. This gives the formula The proof of formida (8) is straightfonvard, but extremely tedious. It is given in an Appendix i n [52]. 34 In the same manner as before one arrives at three unbiased estimators for qk, namely (i) — s a m p l i n g directly from JAzkdF, (ii) — calculating explicit ly / zkdF-Y[ zkdF = yj zkdF (9.1) JB.™ { JAi { JAi and sampling from the rest, (iii) — calculating explicit ly / zkdF - V / zkdF + V / zkdF J B . M . i JAi i^JAinAj = £ / zkdF + (2-m)J2 I zkdF is; JAiHAj • JAi (9.2) and sampling from the rest. Once more the affine combination of least variance is formed. Numerical results appear in Section 6. 3 .5 . A H y b r i d M e t h o d . Deak's method uses a decomposition of the random variable, while Szantai's procedure can be thought of as a decomposition of the sample space I R m . Bo th approaches yield impressive acceleration on the computing time, and it seems natural to try to combine the two, exploit ing all the desirable features simultaneously. The derivation proceeds exactly as i n the previous section, but now Deak's decomposition is to be used i n finding the estimators p^2\ p^ instead of simple Monte-Car lo sampling. Unfortunately this means that Formula (7) is no longer available. Instead one writes W dF(z)- £ / dF(z) + ...+ f dF(z) i<jjAinAJ i<j<kjAinAinA>< JnAi •=i m » m f t = "£(1-1) dF(z) = £ ( l - 1) / dXm(\)dU(v) (10) 1=2 JB< 1=2 JST" J C M = 1 E ^ - 1 ) / dXm(\)dU(v), J S R ' 1 1=2 J C M 35 where Bi = {z'\ a-i < Zi < bi is violated for exactly / indices i}, Ci(v) = {A | a,i < ALiV < bi is violated for exactly / indices i}. In a completely analogous fashion one obtains £ / dF-... + (-ir+* [ dF=f EC'1) f dXm(X)dU(v). i<j<kJAinAjnAk Jf]Ai Js"' 1=3 ^ / JC,(v) (11) In order to estimate the quantities in (10) and (11), a sample is generated from the uniform distr ibut ion on Sm. Since the event L{V = 0 has probabili ty zero, its occurence can be ignored, which makes it possible to determine C/(v) in the following way. Let cri t ical values t£j, for i = l , . . . , m be defined by = r n i n { a ; / . £ j7 j , bi/Liv}, Ui = max{a i /LiV,bi/Liv} and assume without loss of generality that the vectors I and u are arranged in ascending order. Further set l0 = u0 — —oo, / m + i = Um+i = +°° • If X,i,j are such that Z^_x < A < U and Uj__i < A < Uj, then it is not hard to see that exactly m + j — i inequalities are violated, i.e., A 6 Cm+j-i{v)- Moreover, as A decreases, the number of inequalities violated increases or decreases by 1, whenever the next cr i t ical value encountered is from the vector I or u, respectively; none of the inequalities hold i f A > um. F ina l ly \Lv £ A i f and only i f all inequalities hold, i.e., i f lm < A < u%. This suggests the following algorithmic procedure: Step 0: (Initialize.) Determine I, u. Set n v i o l <— m, nh igh <— m, n low <— m. Set upper < hoo, v\ <— 0, v?, <— 0, ZAJ <— 0. Step 1: (Process the next interval.) Set lower 4 - m a x { u n h i g h , / n l o w } , a «- ^Xm(A). Set i2 4 - n v i o l - 1, « 3 - ( n v l ° U l ) . If n v i o l = 0, set v\ *- v 1 + a. if n v i o l > 2, set t/2 «— 1^2 + cn'2, ^ 3 ^ 3 — ais-36 Step 2: (Update.) I f " n h i g h > ' n l o w s e t n v i o 1 <~ n v i o l - 1, n h i g h +- n h i g h - 1, I f " n h i g h = ' n l o w s e t " h i g h <— n h i g h - 1, n l o w <- n l o w - 1, If u n h i g h < ' n l o w s e * n v i ° l *~ n v i o l -f 1, n l o w <— n l o w — 1. Set u p p e r «— l o w e r . If u p p e r > 0, go to 1. Step 3: (Generate another sample point and repeat.) For given sample size N this defines three estimators j=i j=i j=i Similarly, there are the estimators a N a N a N P r 3-3 ' ( 2 ) P r 3-3 A ~(3) P r i - J * # 2^ ^ v " i . 9 i = ] y " 2 . a n d 9i = L i V 3 = 1 j=l j=l where i/j is obtained by the same algori thm using a = J\^^.T dXm+i{A) in place of a. The constant )3 is as in (5). We explicitly remark that the crit ical values I and u have to be calculated just once for each sample point. F r o m these triples of estimators one forms the affine combination of least variance, as before. (The weights \ \ may differ from component to component.) 3 .6 . N u m e r i c a l R e s u l t s The following pages give an overview over the performance of the various methods of sections 2-5 when applied to a small number of low-dimensional problems. Table I reports C P U time, efficiency and accuracy for a sample of size 100, the corre-sponding results for sample size 1000 are in Table II. The C P U time given is the total time required to calculate the estimators p and qk, k = 1, . . . , m , as measured on an A m d a h l 460 V / 8 computer at the University of Br i t i sh Columbia running on M T S . The codes were wri t ten i n F O R T R A N and compiled with the F O R T R A N G compiler. 37 Problem 1 2 3 4 5 6 7 n of d i m 3 3 4 5 10 10 10 V .02831 .19716 .90883 .00667 .96556 .03699 .50779 0.023 0.022 0.031 0.034 0.080 0.103 0.082 S M C 1 1 1 1 1 1 1 .00564 .03284 .12584 .00999 .21413 .01890 .11615 0.023 0.023 0.032 0.035 0.111 0.105 0.086 A V 1.38 3.10 12.9 2.21 33.8 2.99 3.17 .00669 .01184 .03431 .00892 .02319 .00699 .05864 0.028 0.047 0.062 0.039 0.118 0.113 0.121 D M C 1.52 1.89 0.61 5.31 0.71 6.53 1.16 .00343 .01518 .08205 .00551 .22083 .00685 .12030 0.032 0.071 0.091 0.041 0.157 0.147 0.163 D A V 2.65 5.46 10.9 7.15 71.8 15.8 4 .98 .00419 .00523 .00635 .00381 .01189 .00401 .03391 0.099 0.216 0.365 0.208 1.537 1.466 1.606 De 1 4 .27 9.30 44.0 7.78 114.7 27 .7 9.34 .00011 .00389 .00818 .00193 .00322 .00108 .00136 0.147 0.378 0.906 0.387 De 2 9.79 12.3 267.8 36 .0 N . C . f N . C . N . C . .00205 .00311 .00041 .00060 0.126 0.283 1.189 0.659 De 3 5.43 9.90 103.0 39 .8 N . C . N . C . N . C . .00298 .00145 .00334 .00080 0.036 0.027 0.039 0.049 0.275 0.173 0.183 Sz 0.48 2.92 * 0.97 * 0.69 * .00613 .01151 .01811 .00944 .00056 .02153 .03024 0.420 0.680 1.104 1.041 4.854 6.693 5.996 Hy 1 1.63 7.76 1328 . 1.87 > 1 0 6 8.14 60 .8 .00068 .00116 .00011 .00141 .00050 .00069 .00201 0.772 1.270 3.053 3.666 H y 2 2.53 8.79 2177 . 4 .35 N . C . N . C . N . C . .00102 .00150 .00038 .00012 0.545 0.877 4.123 7.176 H y 3 2.00 8.04 3417 . 4.21 N . C . N . C . N . C . .00242 .00180 .00029 .00080 j"Not computed * Zero sample variance T a b l e I . R e s u l t s f r o m a s a m p l e o f size 100 . Problem 1 2 3 4 5 6 7 n of d i m V 3 .02831 3 .19716 4 .90883 5 .00667 10 .96556 10 .03699 10 .50779 S M C 0.016 1 .00299 0.162 1 .01284 0.222 1 .04550 0.252 1 .00517 0.573 1 .03063 0.559 1 .00599 0.565 1 .02550 A V 0.167 1.43 .00269 0.173 2.48 .01184 0.243 9.64 .02337 0.261 2.20 .00283 0.650 34.8 .00844 0.582 2.05 .00599 0.606 3.28 .01521 D M C 0.217 1.84 .00068 0.415 1.52 .00547 0.533 0.50 .01342 0.297 2.63 .00088 0.352 0.64 .05903 0.905 5.51 .00312 0.984 0.92 .03416 D A V 0.257 2.90 .00121 0.648 3.73 .00245 0.833 9.89 .00829 0.319 4.30 .00142 1.333 81.3 .00377 1.240 9.94 .00147 1.395 3.63 .01372 De 1 0.935 3.56 .00063 2.101 5.74 .00054 3.582 27.7 .00180 1.995 4.24 .00029 15.068 79.4 .00077 14.190 11.2 .00036 15.737 5.38 .00242 De 2 1.394 6.61 .00062 3.722 8.69 .00046 8.995 198.2 .00030 3.786 12.3 .00031 N . C . N . C . N . C . De 3 1.200 4.78 .00099 2.766 6.28 .00110 11.816 77.0 .00046 6.498 16.2 .00019 N . C . N . C . N . C . Sz 0.232 0.64 .00317 0.212 0.83 .00586 0.265 27.3 .01002 0.395 0.88 .00160 0.868 * .00049 0.809 0.77 .00498 0.745 23.9 .01006 H y l 4.138 0.93 .00061 6.673 4.89 .00064 10.936 717.5 .00024 10.317 0.97 .00022 46.474 > 10 8 .00050 66.158 3.18 .00024 59.006 34.0 .00095 H y 2 7.598 1.85 .00028 12.545 6.11 .00019 30.425 1566. .00026 36.549 1.46 .00019 N . C . N . C . N . C . H y 3 5.407 1.42 .00077 8.764 5.06 .00059 40.741 1694. .00007 71.789 1.69 .00019 N . C . N . C . N . C . f Not computed * Zero sample variance Table II. Results from a sample of size 1000. To compute the coefficient of efficiency, a weighted average r of the observed variances is found for each method and mult ipl ied by the C P U time. The efficiency coefficient is then the ratio of T-SMCAEI where x denotes the different sampling methods. Accuracy is measured as the Loo - d i s t a n c e of the estimator (px,qx) from its true value (p, q). The abbreviations used in the tables are S M C - Simple Monte Carlo A V - Anti thet ic Variables D M C - Simple Monte Carlo applied to decomposition (4) D A V - Anti thet ic Variables applied to decomposition (4) De k - Deak's method, using k vectors at a time Sz - Szantai's method H y k - Hybr id method, using k vectors at a time. Simple Monte Car lo is expectedly worst for all problems, while the hybr id method is characterized by extremely slow computing times. This may be due i n part to the sorting algori thm used i n determining the vectors I and u of cri t ical values. U p to 15% of total C P U time was spent i n the sorting routine V S R T A of the I M S L library, which uses a quick-sort algori thm. A self-contained merge-insertion algorithm [83] might do better i n some instances. Moreover, the one-dimensional integrations could be accelerated by computing only a table of reference values and interpolating whenever a new value is needed. Szantai's method is comparable in performance to Deak's procedure on sets of large probabil i ty mass, but is markedly inferior on small sets. The reason is that for sets of large probabil i ty mass almost al l of the information is captured in the one- and two-dimensional approximations, and the sampled part of p and qk in (7) and (9), respectively, is practically negligible. Four of the coefficients of efficiency could not be computed because none of the sample points contributed any information. Consequently the sample variance was zero. For small sets, on the other hand, Szantai's method is not much more accurate than simple Monte Car lo , and its efficiency coefficient is lower because there are more computations to be performed. The problems were selected such that exact values could be obtained easily. In par-ticular, this meant choosing independently distributed components and using the 15-digit 40 normal tables of the Nat ional Bureau of Standards [95]. The integration routines used were the single precision I M S L routines M D C H I , M D N O R and M D B N O R for integrating the ci i i -dis tr ibut ion, univariate normal and bivariate normal, respectively. Round-off errors i n the integration routines account for the relatively large errors for Szantai's method and the hybr id method on Problem 5. The contribution of the sampled part for this problem is on the order of 1 0 - 6 , so the sampled part should be accurate to at least five decimal places. O n the other hand, the accuracy of routine M D B N O R is only .00001, which, due to repeated application, contributes an error of .00050 to the conditional probability. (No effort was made to detect and use independence in these test problems.) The estimators qk only require one-dimensional integration and are accurate to at least five digits. Further testing is clearly indicated, but at this stage it seems best to use Deak's method whenever the conditional probabili ty can be expected to be small , and the hybr id method whenever the probabili ty mass is likely to be large — as would ordinari ly be the case i n applications to chance-constrained programming. The best value for the parameter k i n each case seems to be close to m/2. For even larger probabil i ty mass it might be advantageous to dispense with sampling altogether, and concentrate instead on integrating the two-dimensional marginals wi th only a small error. 3.7. Conditi o n a l Expectation Up to now the emphasis has been on determining 'good' estimators p and qfc for the numerator and denominator of Formula (3). The real interest, however, lies i n the ratio qk/p, so it is natural to ask about statistical properties of the quotient qk/p for the various estimators, in particular about its mean and its variance. The first unpleasant surprise is the fact that the quotients are not unbiased [16]. Using a Taylor expansion about the point qk/p, it is not hard to verify that where aoo — v a r(p), cT0fc = cov(p,qk)- The bias can be reduced (but not eliminated) by (12) 41 forming the estimators Qk 1 Var (p )^ - - Cov(p,qk) P (13) P P Expand ing this expression into a Taylor series shows that P It should be noted that i n Formula (12) the true variance and covariance are used, while the quantities appearing in (13) are their estimates obtained from the sample. Fur-ther improvements may be effected by retaining higher order terms i n the Taylor expansion as well as higher order sample moments. The possible gain is not easily assessed, however, and storage and computation requirements would be increased considerably. Approximate formulas for the variance of the different estimators can also be obtained by Taylor expansion, namely Var 9fc o~kk *°~ok h 0"oo — + 0(N-') and Var(f ) = — P o~kk — O-QO + 0{N~2). This variance inflation poses a serious problem, i n particular i f p is small , and cannot be eliminated even i f p is known wi th certainty. O n the other hand, rectangles of small mass contribute l i t t le to the overall bounds in Formula (1), and perhaps the accuracy requirements may be relaxed a l i t t le i n this case. 3 .8 . E x t e n s i o n s It may sometimes be desirable to find conditional information on polyhedral sets other than n-dimensional rectangles. For instance, a decomposition of the sample space into (possibly unbounded) simplices improves the computation of the bounds in (1) and may be preferred i n some situations. 42 V Let therefore a (non-empty) polyhedral set A = {x \ a < Mx < 6} be given, where some or al l components of the vector a may be set to — oo, s imilarly b may contain cer-tain components equal to -foo. We wi l l assume that the rows of M are pairwise l inearly independent and again seek to determine the quantities p = JA dF, qk = JA zkdF', where z ~ A/"(0,S). This problem is very similar to the previous problem on m-dimensional rectangles, since the quantity y = Mx is normally distributed wi th mean 0 and covari-ance matr ix X = M S M T , which may, however, be singular i f the number of constraints exceeds n . Thus S may not have a Cholesky decomposition; instead the Cholesky decom-posit ion of S should be used to form the matr ix ML. F rom then on it is smooth sail ing, al l the techniques and methods of sections 2-5 wi l l go through provided L is replaced by ML i n all formulas. Sampling should sti l l be done from the m-dimensional normal dis t r ibut ion and the uniform distr ibution on Sm where appropriate. Other multivariate distributions for which conditional information may be of interest include the mult i lognormal [72] and mult igamma [112] distributions. A random vector z — ( z x , . . . , zm) has a mult i lognormal distr ibution if and only if the random vector ln z = (ln z\,..., ln zm) has a mult inormal distr ibution. Hence it is possible to work wi th the vector l n z instead, and all the previous results can be used. Another interesting distr ibution is the mult igamma distr ibution which has seen some application i n chance-constrained stochastic programming problems [111]. B y a suitable scaling transformation an arbitrary univariate gamma distr ibution can be reduced to stan-dard form which is defined by the density /(*) = r^rT* 0-1 -z e T(6) Hence we can assume without loss of generality that the multivariate gamma distr i -but ion is such that all the one-dimensional marginals have standard form. We shall write z ~ T(6) i f z has a standard gamma distribution wi th parameter 6. A well-known fact about the standard gamma distr ibution is its additivity,- more precisely, i f z x ~ T(6i), Z2 ~ T(^2)) a n d z\ is independent of z 2 , then z\ + z 2 ~ + 62). 43 Deak's method no longer applies to the computation of the conditional probabili t ies, since the decomposition of Formula ( 4 ) is not available any more, but Szantai's method can s t i l l be used, and it is easily adapted for computation of the qk as well. To that end one needs to develop algorithms for the evaluation of univariate and bivariate gamma distr ibu-tions i n Formula (6). Once this has been done, the quantities p and qk can be estimated by a simple Monte-Car lo method in three different ways and the affine combination of least variance can be formed. The univariate case is easy: there are efficient l ibrary routines available to compute e zdz, and for the conditional expectation one notes that Jo T(0) J0 9T(9) J0 T{0 + 1) Modulo the factor 6, this is the standard gamma distr ibution wi th parameter 9 + 1. For the bivariate distr ibution F{a1,a2) = / J * 1 J^2 f{zt, z2 )dz2dzx, Szantai uses the decomposition of Z\ and z2 into three independent standard gamma distributions sci , x2, x3 wi th suitable parameters #i , 92, 93, respectively, in the form Z\ = X\ + x2 z2 = Xi + x3 and obtains the series expansion [130] o o f(zuz2) = / 0 1 + , 2 ( 2 l ) / , 1 + , 3 ( z 2 ) ^ [G{r,9u92,93)L^-\z1)Ler^-l{z2)} , r = 0 F(z1,z2) = Fei+02(z1)Fei+e3(z2) o o + f0l+e2+i(zi)f9l+e3+iMj2 [B{r,91}B2,93)Ll^{zy)Ll^{z2)] , r=l 4 4 where Fe is the standard gamma distr ibution wi th parameter 0, r(0! +r )r (0 1 +0 2 )r(0! + 0 3 ) G(v,eu92,9s) = H(r,9u92,93) = ^(01)T{91+92+r)T(91+93+r), ( r - 1 ) ! M ^ i ^ ) r fc(0j + 92,r)h(91 + 0 3 , r ) ' wi th fc(0, r) = 0(0 + 1)(0 + 2) • • • (0 + r ) , Z^(.) are the generalized Laguerre polynomials [62] defined by Xftsc) = 1, Zf(a;) = 0 + 1 - a? and the recursion (r + l)L°+1{x) - (2r + 0 + 1 - * ) £ * ( * ) + (r + 0 )£?_ 1 ( a ; ) = 0, r = 1 , 2 , 3 , . . . A conditioning argument can be used to derive expressions for ai a.2 a-i a 2 J J z1f{zuz2)dzxdz2, J j z2f(z1,z2)dz1dz2, 0 0 0 0 based on the fact that the random variables z\ and z2 are conditionally independent given the value of x\. Therefore one obtains Q>\ 0 ,2 0 .1 0,2 ° 0 J J z i f i z i , Z 2 ) d z 2 d z 1 = j j z1f(z1,z2) J fe^x^dxidz^Zi 0 0 0 O O dx 0 ,2 J J J z i f ( z i i z ^ ) f e 1 ( x i ) d z 2 d z 1 d x 1 0 0 o o a a,20 0 0 dx/\b\ O i — X \ <L2~&2 III ^1 + X2^jf02^X2^03^X3^ei^Xl^dx3dx2dx:i 0 0 0 <i\Ab\ a~i — Xx a,2 — * C 2 = J J J xife1{xi)fo2{x2)fe3{x3)dx3dx2dx1 0 0 0 CLiAb\ 0-x—Xx 0,2^X2 i l l X2^X2^9^Xl^03^X3^dX3dX2dXl 0 0 0 45 Si J J J fsi+^(xi)Mx2)fe3{x3)dx3dx2dxi 0 0 0 + 02 J J J f02+i(x2)fe1{xl)fe3(x3)dx3dx2dxi 0 0 0 = 01F'{a1,a2) + 82F"(a1,a2), where F' is the distr ibution of the random vector zl ~ xl + x2 z2 = x\ + xz, x\ ~ Y(QX + 1), and F" is the distr ibution of 4' = s i + ^ 3 , a-2' ~r ( t? 2 + 1). Simi lar ly one can derive the formula f J z2f(z1,z2)dz1dz2 = ^ ^ ' ( a ^ o a ) + ^ ' " ( d , a 2 ) , 0 0 where F'" is the distr ibution of z"' = x i + x 2 4" = • + x 3 " , x 3 " ~ r(6? 3 + 1). The same idea can be used to derive values for expressions of the form ai a2 0 0 J J J z3f(zi,z2,z3)dz3 dz2 dzi, 0 0 0 where f(zi,z2,z3) is the joint density of a trivariate gamma distr ibut ion which can be decomposed into independent standard gamma distributions X{ ~ r(#;) as follows: Z\ — Xl + x 4 + x5 + Xi z2 = x2 + Xi + x6 + x-j (14) 2 3 = X3 + X5 + X 6 + £ 7 . 46 Condi t ioning on the values of x 4 , £ 5 , XQ^ x-j and simplifying, one obtains tti tt2 OO Z3 f(zi, Z2, z?,)dzz dz2 dzx 0 0 0 = 0 3 F 3 ( a i , a 2 , o o ) + #5^5(0 , ! , a 2 , oo ) + 0 6 F 6 ( a i , a 2 , 0 0 ) + 07F7(ai,a2, 0 0 ) , where stands for the cumulative trivariate gamma distr ibution having the same decom-position structure as (14), when X{ is replaced by x\ ~ r(0; -f- 1). The integration in the z 3 -direct ion is over the entire support of the random variable and can thus be suppressed by working with the two-dimensional marginal dis tr ibut ion of (zi,z2). After suitably aggregating the components of z;, this yields the expressions F 3 ( o i , o 2 , c » ) = ^ ( ^ , 0 2 ; 04 +07,01 +05,02 + 0e), ^ 5 ( 0 1 , 0 2 , 0 0 ) = F 5 ' ( a i , o 2 ; 0 4 + 07,0i + 0 5 + 1 , 0 2 + 0 8 ) , F 6 ( 0 i , 0 2 , 0 0 ) = ^ ( 0 1 , 0 2 5 04 + 07,01 + 05,02 + 06 + 1), F 7 ( o i , O 2 , O 0 ) = ^ ( ^ ^ 2 5 0 4 + 07 + 1,01 + 05,02 + 0 6 ) , where, e.g. F 5 is the cumulative distr ibution of z[ = y[ + y'2 z'2 = y[ + y'3, y l - r ( 0 4 + 0 7 ) , y'2 ~ r (0 i + 0 5 + 1 ) , y 3 ~ r ( 0 2 + 0 6 ) . 47 Chapter 4. A n algorithm m n 4.1. Introduction The previous chapters have given important convexity results and bounds for the multistage stochastic linear programming problem n < C i X i + E ^ 2 min I c 2 x 2 + E* 3 (minc 3 a ;3 + • • • + E ^ T m i n c T c c T ) ) > ^ L 1 2 \ X 3 X T / J J s.t. A\Xi = bi B2xi + A2x2 = 6 B 3 x 2 + A 3 x 3 = 6 W BTXT—I ~t~ ATXT — £T xt>0, k = l,...,T. In the above formulation, decision variables xt-i, Xt-2, • • • , xx do not explici t ly con-tribute to the restrictions on x t + x . This Markovian structure of the constraints permits the application of decomposition methods and does not really constitute a loss of gener-ality, as wi l l be explained i n Section 7. We wi l l restrict consideration to discrete random variables £t with a finite number of realizations and corresponding probabilities pkt for t = 2 , . . . , T . If £t does not have this property, one could discretize it as explained i n Chapter 2. This chapter describes one possible algori thm based on Benders' decomposition [2]. Expos i t ion of the algori thm is done under the following simplifying assumptions: (i) The constraint matrices At, Bt and cost coefficients ct are deterministic, (ii) the random right hand sides exhibit period-to-period independence. B o t h assumptions can and wi l l be removed in Section 7. Their main purpose is to simplify notation. Now (1) can be reformulated as the equivalent (deterministic) large scale linear pro-48 gramming problem m i n c l E l + £ pk*c2xk> + £ £ p j ' p j ^ j " * 3 + k2 = l fc2 = l k3 = l s. t. ylia?! B2xx + A 2 ^ 2 2 s 3a!{ 2 + A3X\ + L p2 ... pTT cTxT* k2,...,kT k2,...,kT — S2 5 A=2 = 1, ik^k\ k2 = i,...,K2 k3 h---,K3 < = 2 , . . . , r (2) In other words, one entire set of decision variables is introduced for each node in the decision tree, a sample of which is given in Figure 4. The explicit bounds on the decision variables ensure that problem (2) wi l l never be unbounded. In light of the fact that the problem wi l l be solved on a computer, it is permissible to set them equal to machine-infinity. scenario root(unlabeled) (I.I.I ( 3 .4 ) 3.4.2) Figure 4. A sample decision tree. A similar model was studied by Birge [10, 15], wi th the added restriction that the components of the last-period random vector (T are independent of each other. The present 49 paper follows Birge's approach fairly closely, but the last period is handled in a completely different way to avoid the independence restriction. In addit ion, several enhancements are introduced which increase the efficiency of the algorithm. A quadratic version of the multistage stochastic programming problem was studied by Louveaux [86], and O ' N e i l l [106] and Noel and Smeers [100] extended this to a general convex objective function. The last two references use a dual approach, while Birge [15] and the present paper work directly on the pr imal problem. Since the values of the p r imal variables are the main object of interest in order to implement the solution i n practice, a p r imal method may be better suited, because it dispenses wi th the need to recover the pr imal variables from the opt imal solution after the problem has been solved. The deterministic linear multistage problem has been studied extensively, beginning wi th the works of Dantzig and Wolfe [33], Glassey [56], and Ho and Manne [68]. Other works that contributed to the theory are three recent Stanford P h . D . theses by Abraham-son [1], Wi t t rock [148] and Scott [125]. A number of their results wi l l be used in the main body of the paper. The structure of this chapter is as follows. Section 2 briefly describes Benders' decomposition i n the two-period case. This section is included to fix some of the notation and to present the basic ideas of decomposition which w i l l appear again later on. In Section 3 the algorithm is extended to the multi-stage case by nesting various levels of decomposition. Some of the finer points of the algorithm are presented in the next two sections. Under the assumptions of the model, many linear programs have to be solved which differ only i n their right hand side. One way of doing this is described in Section 4. Section 5 is concerned wi th storage requirements of the algorithm and shows how a matr ix inversion can be avoided at the cost of additional storage. Numerical results are presented i n Section 6, where the algori thm is compared to the earlier one by Birge [15] and to results obtained from an all-purpose LP solver. F ina l ly , 50 Section 7 indicates how one might go about removing some of the assumptions made above and points out possibilities for further improvements in the algori thm. 4.2. Decomposition and the recourse problem Before describing the method i n full generality, it pays to have a brief look at the special case T — 2, which is also referred to as 'stochastic programming wi th recourse' i n the literature. Here the problem is to Uixx + J^pkc2yk AlX = b,B2x + A2yk = (k,h <x< u u l 2 < yk < u2,k = U — 1 ' K min< ^ ^ 'fc=i = m i n {c\x + Q(x)\A\x = b, l\ < x < ux} (3) = m i n {cix + •d^Aix = b,li < x < u i , t ? > Q(x)}, where we have set Q(x) = m i n | £ P k ^ y k \ A 2 y k = £k - B2x,l2 < y < u2J . (4) For a given x, Q(x) can thus be found by solving the linear programming problem (4) or its dual K max £ pk [n-k{Ck ~ B2x) + Xkl2 - VkU2] s.t. TrkA2 + A f c - pk = c2, k = l,...,K (5) Afc, fik > 0 nk unrestricted. It should be noted here that the variables A j , fij i n (5) can always be chosen in such a way that the problem is feasible and that the dual decomposes into K smaller problems max7r f c(£ f c - B2x) + \kl2 - \iku2 S . t . + Afc - fJ-k = c2 (6-fc) Afc, pk>0 •Kk unrestricted. 51 It has been shown in Chapter 1 that Q is convex and piecewise linear. The decomposition method replaces the constraint d > Q(x) in (3) by a partial description which is updated as more information becomes available. If the dual is unbounded, then a feasible direction (cr*, A*,/x*) and an index k can be found for which o~*(£k — B2x) + A * / 2 — P-*u2 > 0. Since unboundedness of the dual implies infeasibility of the pr imal , the constraint <T*(£* - B2X) + A * / 2 - fM*U2 < 0 should be added to the constraints i n (3) in an attempt to induce feasibility. If on the other hand the dual has an opt imal solution ( 7 r * , A * , / / * , . . . ,TT'K, A^ , / i# - ) , then convexity of Q implies that K Q(x) >YJPk Hit' - B2x) + \ll2-nlu2] for al l x, wi th equality holding at x. Thus i f a finite collection of dual feasible vectors (^i»M>/4i--->^JoAJo/ij<-)» * '= 1>---iJ a n d directions {(r3k{j), A £ ( j ) , / z J f c ( j ) ) , j = 1,...,J has been found, then the LP problem m i n CiX\ + i ? s.t. AiXi =b (7.1) Ep f c 4^«i + # > T,pkHtk +Kh -n[u2}, » = I , . . . , J (7.2) k=l k=l <ri{j)B2Xl > a i ^ ^ + \jk{j)l2 - a{{j)u2, j = l,...,J (7.3) ll < X < T i l is a relaxation of (3) such that every opt imal solution (x*,d*) satisfies 1 ? * < Q(x*). Final ly , i f (a*,t?*), ( 7 r * , A * , / i * , 7 T 2 , A ^ / i j , • •. , 7 r ^ ) ^loA-'/r) a r e opt imal for (3) and (5), respectively, with K = ^2pkTrl(e ~ B2x*) + Xkl2 - vku2, k=i 52 then x* is an opt imal solution to the first stage of the stochastic recourse problem and can be extended to a full solution by a suitable choice of where k = 1 , . . . , K. This is a specialized version, due to Van Slyke and Wets [131], of Benders' decompo-sition method. The algorithm is referred to in the literature as the L-shaped method and can be summarized as follows. Step 0. Solve the master problem min c\x s. t. A\x = b li < x < U l . If infeasible, STOP. Set J = J = 0, i?* = —oo. Let the optimal solution be x*. Step 1. Solve the subproblems k min ciy s. t. A2yk = £ f c — B2X* h <yk < u2. If any of the subproblems is infeasible, add a feasibility cut (7.3) to the master problem. Set J <— J + 1 and go to 2. Else let itifc be the optimal value of the kth subproblem. If d* > £ ^ = 1 wl, STOP: x* is optimal. Add an optimality cut (7.2) to the master problem. Set I <— I -\- 1 and go to 2. Step 2. Solve the augmented master problem (7). If infeasible, STOP. Let the optimal solution be (s*,i?*) and go to 1. 4.3. Nested decomposition The approach for T > 2 is essentially the same. A first application of Benders ' decomposition leads to the same master problem (3) as before and to the subproblems 53 ^2 I k3 ^21^3 i i ^3 fc2,...,fcr m m c2x22 + £ P3 c 3 £ 3 H h £ P3 • • • PT CTXT k3 = l k3,...,kx s.t.A2x22 = ^ 2 - B 2 X l B3xk22 +Asxk"k3 k3 = l,...,K3 (8-k2) BTxT2-'kT-1 +ATxkT2'-'kT =(kT2'-'hT, kt = l,...,Kt, t = 3 , . . . , T lt < xk2'-'kt <ut. Each of these subproblems is of course another multi-stage stochastic programming prob-lem wi th T — 1 time periods and (by now) known first period right hand side £22 — B2X\. In particular, the coefficient structure for ( 8 - ^2 ) is the same as for (2) , and so Benders ' decomposition can be applied once more. This leads to K2 second period master prob-lems and a number of T — 2 stage subproblems. B y repeated application of the same idea one decomposes the original problem (2) into a collection of ordinary linear programming problems, al l of which have the form m i n ct x*1 + ti*' s.t. Atx^ — KZ Pt+i^kBt+ix^ k=i > £ Pt+ikUt+i + xUt+i - »iut+i), » = 1 , . . o-i{j)Btx*> > ..,J(ht) h < x*' < ut (9) Here ht is the history (k2, •.., kt) of a particular scenario and h't is the history of its ancestor, namely (fc2, • • • ,kt-i). (If / = J = 0 we add the constraint = 0 to ensure nontr ivial i ty of (9). This constraint w i l l never be lifted for t = T.) Aside from any cuts generated during the solution process, al l problems at the same time stage share identical constraint matrices and cost coefficients, differing only i n the right hand side. This fact w i l l be exploited i n the t r ickl ing down procedure described i n the next section. 54 There are two ways in which the various problems communicate wi th each other. P r i m a l solutions are passed down from a problem (k2,- •. , & t - i ) a*. stage t — 1 to al l its descendant problems (k2,..., kt-\, kt), kt = l,...,Kt at stage t in form of a new input vector x k 2 ' w h i c h modifies the right hand side. Dua l solutions are passed up from stage t + 1 to the corresponding stage t ancestor problem in form of cuts. It remains to determine a sequencing protocol to fix the order in which the subproblems are solved. As i n the deterministic case studied by Scott [125], there are a number of possible strategies. The simplest one is Wit t rock 's [148] fast forward-fast back method where starting from stage 1 the problems at all stages are solved i n sequence, passing p r imal information down the decision tree — a chain i n the deterministic case. Once period T is reached the direction is reversed and cuts are created backward i n time. W h e n stage 1 is reached again, the cycle repeats. Termination occurs (after a finite number of steps) when no new cuts are generated during one cycle. The "tactical conditions" of [125] apply, stating that a period, and a fortiori a par-ticular subproblem, have to be re-solved only after new information has become available, either i n form of a cut being passed by the successor problems or by a new right hand side generated from the solution of the ancestor problem. Moreover, the decision is arbi trary at each stage -— except of course at t = 1 and t = T — in which direction information is to be passed. To keep the overhead to a min imum, we wi l l be looking at three modifications of the fast forward-fast back method which incorporate the above remarks and choose a sequence through the decision tree i n such a way that all problems at stage t are solved before a decision is made whether to move to stage t + 1 or to stage t — 1. Once stage t is solved, we can either (i) explore stage t — 1 first, moving to t + 1 only after a backward move is impossible because no new cuts are generated or (ii) explore t +1 first, moving back in time whenever all current solutions are opt imal and no new right hand sides are forthcoming or 55 (iii) alternate the two approaches, switching from one to the other whenever a move in the current direction — up or down — becomes impossible. Since the number of subproblems and hence the amount of work at each stage increases i n t, method (i) can be thought of as a "path of least resistance", that is, the next stage chosen is always the one which requires the least amount of work. Version (ii) explici t ly solves al l subproblems to optimali ty before passing a cut back to the ancestor problem. Version (iii) is the original fast-forward-fast-back method of [33], designed to exchange information between al l stages as fast as possible. In all cases we start wi th stage 1 and stop whenever the process of creating cuts has terminated. Other sequences are possible, but their implementation is complicated by the presence of many scenarios at the same stage, which requires a large amount of memory to keep track of which subproblems have been solved and which ones have not, i f jumping from stage to stage is permitted. In particular, the "largest discrepancy attack" of [125], which works quite well i n the deterministic case, appears a bit cumbersome. A very important observation at this point is the fact that so long" as I = J — 0, al l problems at one stage are identical except for the right hand sides. This is true for stage T throughout the algori thm and is true of the other periods unt i l cuts are being added. A n y efficient implementation of the algorithm wi l l have to make use of this fact. Several methods exist i n the literature for solving an L P problem wi th multiple right hand sides, see e.g. [50], [132], [146]. The sifting method of Garstka and Rutenberg [50], which was used i n Birge's algori thm [15], assumes independence of the components of the random vector of right hand sides and is thus not applicable here, but the bunching method of [146] is. 4.4. Trickling down L P problems with multiple right hand sides The basic theme exploited in the bunching method is the obvious fact that many subproblems at the same stage may share the same bases, both at the op t imum and at intermediate steps i n the pivot sequence of the simplex method. A brief description of a 56 possible implementation of the procedure follows. The version given here is designed to minimize storage requirements. To start the procedure an arbitrary right hand side is chosen from among the collection of problems that have to be solved at the current stage. This problem is referred to as the paradigm problem for the time being. If it is opt imal — or i f a phase I method shows it to be infeasible — the problem is marked ' D O N E ' , its status — opt imal or infeasible — is recorded, and a new paradigm is chosen. Otherwise, one simplex pivot is performed on the paradigm problem. A l l unclassified problems are then inspected to see, if the same pivot step can be performed on them. Since the cost coefficients are deterministic, the entering column can be used, and it is only necessary to check whether the leaving variable w i l l be the same as for the paradigm problem. If yes, the pivot step is taken, noting that the new basis is known already from the paradigm problem. If on the other hand the pivot step is not legal, we make a note of this fact together wi th the last legal basis and come back to this problem later on. If we revisit problems i n reverse order to their breaking off from the pivot sequence of the paradigm problem, it is clear that al l the information required to store at any one time lies on a chain of bases, each one simplex pivot away from the other. A hypothetical example wi th ten right hand sides may illustrate this. A t the start al l ten problems share the same basis, e.g., the al l artificial basis. Selecting problem 1 as the paradigm, we perform one pivot step. Let us assume that this is a legal pivot for problems 2, 3, 4, 5, 6, 7, 8, but not for problem 9, and that the current basis is opt imal for problem 10. A t the next pivot step we may find that problem 2 is opt imal , problem 6 is infeasible, and that the pivot step is not legal for problems 3 and 5. Now problem 1 might be opt imal , so that a new paradigm problem has to be chosen. Let us say we choose problem 4 and find its optimal solution i n two more pivot steps which are specific to the paradigm problem and illegal for all the others. This entire chain of events could be summarized i n the following list. 57 Level Parad igm Problems wi th same basis Don' t pivot Classified 0 1 2,3,4,5,6,7,8,9,10 — — 1 1 2, 3, 4, 5, 6, 7, 8 9 10 (opt.) 2 1 4, 7, 8 3, 5 2 (opt.), 6 (inf.) 3 4 7, 8 — 1 (opt.) 4 4 — 7, 8 — 5 — — — 4 (opt.) A t this point there are no further current problems, so we move back up i n the chain and find problems 7 and 8 at the next to last pivot step. The corresponding basis is retrieved, and the pivot chain is altered to this form: Level Paradigm Problems wi th same basis Don' t pivot Classified 0 1 2, 3, 4, 5, 6, 7, 8, 9, 10 — — 1 1 2, 3, 4, 5, 6, 7, 8 9 10 (opt.) 2 1 4, 7, 8 3, 5 2 (opt.), 6 (inf.) 3 7 8 — — Once problems 7 and 8 have been classified as either optimal or infeasible, one moves up the chain again, finding problems 3 and 5 at level 2, and finally one deals wi th problem 9. It is apparent from the example that the right choice of paradigm problem may greatly influence the performance of the algorithm. In order to minimize the total number of pivot steps, "unnecessary" pivots have to be eliminated. How can this be done? It is known that the set of a l l possible bases induces a part i t ion of the space of right hand sides into disjoint convex polyhedral cones with adjacent subsets differing by one pivot step. Figure 5 shows a simple three-dimensional example. Since the right hand sides arise out of a probabili ty distr ibution of a random vector, one may reasonably expect to see them concentrated near the mean. Thus i f the first paradigm is chosen close to the mean, its pivot path and opt imal basis may be shared by some problems, and a number of other bases may be close by, so that these problems wi l l 58 Figure 5. Optimal basis decomposition in IR3. split off far down the pivot chain for the paradigm problem. Thus it seems to make sense to order the problems by increasing distance from the mean i n some norm, for example the Li, L2, L o o - n o r m or the norm induced by the covariance matr ix of the right hand sides and to always choose the paradigm problem as close to the mean as possible. Some prel iminary work on this question has been done by Wallace [138]. The algori thm can be accelerated further if the first paradigm is not chosen "cold" -which may necessitate phase I steps for some or all of the problems. Instead one problem is 59 4. solved explici t ly beforehand so that an opt imal basis is known. Because the cost coefficients are deterministic, this basis wi l l be dual feasible for every possible right hand side, and so dual pivot steps can be used in the tr ickling down procedure. Computa t ional results indicate that this leads to an expected reduction in the C P U time of about 10% over the version using pr imal pivots. 4.5. Storing Inverses More can be done. If a subproblem is revisited immediately after a cut has been placed i n i t , then the previously optimal basis — which sti l l satisfies al l constraints save the last one — can be augmented by including the slack variable for the newly created cut among the basic variables. This basis is infeasible, but the only possible leaving variable has been identified, and dual pivot steps can be used to move towards an opt imal basis. In many instances the opt imal basis is reached in one step, and the variable entering the basis is another slack variable for a different cut. It is further possible to store the previous basis inverse, which may save an explicit matr ix inversion. Let us write the subproblem i n generic terms as m m qy s.t. My — r, and suppose that an opt imal basis B and its inverse B 1 have been identified. Then the cut fy + s = d is generated to form the augmented subproblem m m qy s.t. My = r fy + s d. Then it is easy to augment B to a starting basis B given by B = B 0 / B 1 60 where fs denotes the components of / corresponding to the columns basic i n B. The inverse is readily computed to be B'1 = B~l 0 / B B - 1 1 Since B~l is known, B~x can be computed without re-inverting. However, B~x ordinar i ly is not stored explicitly. What is available instead is a starting basis Ba = LQUO, where Lo and Uo are lower and upper triangular matrices, respectively, together wi th some updates Cfc corresponding to the pivot sequence leading from B0 to B. The starting basis is modified to account for the augmenting row ( / 1) by setting L°-i fBoU0 Uo = Uo 0 0 1 and Bo — LQUQ-Now we turn to the updates. The most popular form of updates are the so-called eta-vectors [28], where each update Ck consists of the column m/ f c of M corresponding to the entering variable at step k, together wi th a pointer to the leaving row. Thus the pointer being unchanged. This updating scheme is also called the product form of the inverse, since B — BoC\C2 • • • Ck-Another device is the Schur complement form of the update. Bisschop and Meeraus demonstrated i n a 1977 paper [23] (see also [54]) that al l the information needed to recon-struct the basis inverse k pivot steps after an inversion can be compressed into a square mat r ix of size no more than k x k, given by the formula Ck = —IkB0~1Vk, where Ik is made up of unit row vectors corresponding to the leaving rows, more precisely, if the ^ h vector of the starting basis was replaced at pivot step j , then the j t h row of Ik consists of the unit vector e^.. The matr ix Vk contains the entering columns i n order of their entry into the basis. In the augmented problem, the same pivot sequence can be used 61 to move from the starting basis BQ to the current basis B, which implies Ik = (Ik 0 ) , since the slack variable s never leaves the basis, and Vk = (/j* Thus Gk = -[h 0] B - 1 i 0" JB»B0 1 Jlk. = ~hB01Vk — Ck for all 4 .6 . N u m e r i c a l r e su l t s The algori thm was coded i n F O R T R A N I V , relying heavily on the linear program-ming code L P M - 1 by Pfefferkorn and Toml in [109] for the L P routines, and Birge's code N D S P [16], which is an experimental code using the algori thm described i n [15]. The program used eta-vectors for the basis updates, although a modification for Schur-complements is available. This version employs the package Q R M O D , which was wri t ten by Michae l Saunders at Stanford University. There are several reasons why eta-vectors are more suited for the small experimental problems which I was working wi th . Fi rs t , the problems are small and sparse enough, such that storage was not a prime concern. Second, Schur-complement updates most certainly require much more overhead and computation to find the update than eta-vectors do. F ina l ly , i n the tr ickl ing down procedure one needs to store not one update, but an entire chain of them. For Schur-complement updates, this leads to storage requirements of 0(fc 3 ) , i f the chain is of length k, while eta-vectors are added at the end of a list , such that al l updates Cj are t r ivial ly contained i n Ck whenever 0 < j ' < k. The resulting program was run on a set of 12 test problems, also supplied by John Birge, which are stochastic versions of the problems appearing in Ho and Loute [67]. Table III shows the general characteristics of the problems, such as the number of periods, number of right hand sides at each stage, the size of each stage, the size of the equivalent linear programming problem (2) and the optimal value of each problem. Computa t ional results for the two-period problems are recorded i n Table I V . The first figure gives C P U time in seconds, excluding input /output , the second figure shows the 62 Name # per # rhs Size Equiv . LP Value SC205/2 2 1-8 13x14,22x22 187x188 -60.4 SCRS8/2 2 1-8 28x37,28x38 252x341 123.4 SCSD8/2 2 1-8 10x70, 20x140 170x910 16.0 SCAGR7/2 2 1-8 15x20, 38x40 319x340 -832903.5 SCTAPl/2 2 1-8 30x38, 60x76 510x646 240.5 SCFXMl/2 2 1-8 92x114, 148x225 1276x1914 2877.6 SC205/3 3 1-2-4 13x14, 11x11, 11x11 123x124 -60.4 SCRS8/3 3 1-1-8 28x37, 28x27, 28x38 290x378 123.4 SCSD8/3 3 1-2-4 10x70, 10x70, 10x70 110x770 16.0 SCAGR7/3 3 1-2-4 15x20, 19x20, 19x20 205x220 -832903.5 SCTAP1/3 3 1-2-4 30x38, 30x38, 30x38 330x418 240.5 SCFXM1/3 3 1-2-4 92x114, 82x99, 66x126 784x1320 2877.6 T a b l e III. C h a r a c t e r i s t i c s o f the test p r o b l e m s . number of pivot steps. The total number of subproblems solved is also supplied, except for the MINOS runs, where there is only one problem. Name N D S P M S L i P M I N O S M I N O S on mean SC205/2 S C R S 8 / 2 S C S D 8 / 2 S C A G R 7 / 2 S C T A P l / 2 S C F X M l / 2 0.074/24/36 0.121/26/18 0.113/11/18 1.585/230/90 0.933/64/18 >120/>3000/? 0.037/24/28 0.047/23/18 0.039/11/18 0.328/109/81 0.231/62/18 31.6/1702/693 0.524/84 1.120/148 3.530/359 3.480/411 6.250/438 87.9/2615 0.063/19 0.107/21 0.193/13 0.202/62 0.279/58 3.720/419 T a b l e I V . T h e T w o - p e r i o d P r o b l e m s . The column headed ' M S L i P ' gives the result of the algorithm, using dual steps i n the tr ickl ing down procedure. For comparison the same problems were also solved wi th Birge's code N D S P and wi th a commercial L P solver. We used M I N O S , Version 4.0 [94] on both the equivalent L P problem and the much smaller expected value formulation, where at each stage the right hand side vectors are replaced by their mean. A l l problems were run Name M S L i P (i) M S L i P (ii) M S L i P (iii) M I N O S SC205/3 S C R S 8 / 3 S C S D 8 / 3 S C A G R 7 / 3 S C T A P l / 3 S C F X M l / 3 0.041/30/29 0.048/23/20 0.034/11/16 0.257/135/79 0.233/91/43 61.9/5000/706* 0.040/30/29 0.047/23/20 0.034/11/16 0.482/215/170 0.186/70/45 38.0/2880/693 0.040/30/29 0.047/23/20 0.034/11/16 0.419/203/119 0.184/75/37 60.6/4689/1010 0.292/53 1.170/148 1.790/226 1.760/269 2.680/256 27.7/1363 * - Terminated after 5000 iterations with objective value 2873.54 T a b l e V . T h e T h r e e - p e r i o d P r o b l e m s . on the U B C A m d a h l 5850 computer under M T S , using the F O R T R A N H compiler wi th full opt imizat ion. This is the setting under which the M I N O S object code was produced, and since a M I N O S source code was not available, it seemed the appropriate choice. Table V gives the results for the corresponding three-period equivalents. The col-umn for the expected value formulation is identical to the case T = 2 and was therefore omitted. Miss ing also is the column with results from Birge's code N D S P , which gave nonsensical answers on five of the six problems. Instead there are columns for each of the three sequencing protocols described in Section 3, using once more the dual version of the tr ickl ing down procedure. (Note that for two-period problems all sequencing protocols are identical.) It is not easy to interpret these figures. It seems that nested decomposition may be a viable option for solving stochastic linear programs, sometimes running considerably faster than the simplex method, especially for small problems. Increasing the number of periods sometimes works, due to the smaller size of the corresponding subproblems, but the choice of sequencing protocols is inconclusive. Final ly , the large problem S C F X M l / 3 clearly shows up the limitations of the method. Just what goes wrong in this case I do not know. The bad performance may be due to the effect of round-off errors, as suggested i n Ho [66], but that would have to be studied more closely. Nevertheless, the method sometimes works, and Avhen it works, it works quite well. 64 Figure 6. Loose cuts and tight cuts. 4.7. Further possible improvements In this section we indicate how the presented algori thm might be enhanced further by exploit ing sti l l more of the information generated. F ina l ly we discuss how the simplifying assumptions of Section 1 can be removed. The fact that several subproblems at the same stage may share an opt imal basis has been alluded to before. Since i n such instances the optimal dual variables coincide as well , it may happen that feasibility and optimali ty cuts for different problems differ only i n the right hand side. This behavior has indeed been observed for some of the test problems described in the last section. What this means is that the tr ickling down procedure could be employed even after the generating of cuts has begun. Even i f the dual variables are different, a cut generated for one of the subproblems may also be a cut for another, albeit not the best one possible. This is probably best explained i n a picture (see Figure 6). Here Q depicts the actual recourse function defined by (4), Q its current approximation 65 by the cuts generated so far. Every dual feasible solution to the descendant problems leads to a hyperplane which gives a lower bound on Q. This bound is exact for dual opt imal solutions, which therefore describe supporting hyperplanes. O n the other hand, any hyperplane which separates x*, the current min imum, and the graph of Q constitutes a valid cut and could be used as such. More research is needed to determine i f the gain i n terms of lowered computational effort due to the use of the trickling-down procedure can overcome the loss of information incurred when working wi th loose bounds. So far cuts have been placed into the various scenarios one by one. A recent paper by Birge and Louveaux [18] describes how more information could be generated i f instead of the ancestor problem requesting one aggregate cut from its descendants, each descendant problem offers a cut derived from information about its dual variables. Opt imal i ty cuts thus take the form •n-t+1 R ~ f c 2 , - i* : t , , g f c 2 , _ t + l tki i i t + 1 , ( 7r fc i Ht+lXt +Vt >TTk. ^t+l + Akj 't-rl-M*,- ut+l-Numerical results i n [6] suggest that this device may reduce computational effort to solve the stochastic recourse problem in the case T — 2. The price to pay for the improved information is the increased size of the ancestor problem and the larger storage requirement to hold this information. The same approach could be used for the multistage problem, although the presence of many different scenarios for which cuts have to be generated may lead to excessive storage requirements, even if slack cuts are detected and purged and only the active cuts are kept around. Moreover, the subproblems (9) wi l l be larger and therefore harder to solve. Pre l iminary work showed an increase in computation times of between 10 and 30 percent on some very small problems (where the number of added rows wi l l admittedly have a greater impact on the problem size than on larger problems). Now we turn our attention to the simplifying assumptions made in the introduct ion, indicat ing how one might modify the algori thm in each case. The easiest to dispense wi th is the period-to-period independence of the random vec-tors. A t no point i n the algorithm is any explicit assumption made about the shape of the 66 right hand sides in the various subproblems, so it is simply a matter of storing al l possible right hand sides and a pointer array which for each ancestor problem gives the number of its descendants. The remainder of the algorithm wi l l go through unchanged. The case of stochastic cost vectors is not much harder. In principle there is no problem, but the rules for pivot selection i n the tr ickling down procedure wi l l be complicated by the fact that we can no longer guarantee dual feasibility for all subproblems once an opt imal basis has been identified for the paradigm problem. Consequently, the dual version of the tr ickl ing down procedure loses some of its advantage and pr imal pivot steps come under consideration once more. The constraint matrices could be stochastic as well. If At, Bt depend only on previous random elements £2 > • • • > (,t-i, the situation can st i l l be salvaged. A l l problems wi th the same ancestor st i l l have the same constraint matr ix, up to cuts generated along the way, and the tr ickl ing down procedure can st i l l be used provided its scope is restricted to the descendants of a particular problem, and not as was done in the original version of the algori thm, to all the subproblems at one stage. No other randomness should reasonably be expected for the matrices Bt-i, and even the case of truly random constraint matrices At is hard to conceive i n any meaningful practical problem. Should this occur, however, then each subproblem has to be considered separately and the tr ickl ing down procedure can no longer be used. Nested decomposition is st i l l applicable, in the sense that sequential solu-t ion of al l subproblems (9) according to one of the three sequencing protocols of Section 6 st i l l yields an opt imal solution of (2), but it wi l l probably be better i n this setting to solve the LP problem directly. F ina l ly , a word about non-Markovian problems. For example, the three-stage problem m i n cxxx + c2a:2 + c3x3 s.t. AiXX = 61 B2xx + A2x2 — £2 Cxxx + B2x2 + A3x3 = £3 could be transformed to the form used i n the algorithm by augmenting the state space and 67 introducing auxil iary columns y2 as well as additional rows at the second stage. This gives min c i x j + c2x2 + c3x3 s.t. Axxi = bx B2xi + A2x2 — £2 CiXi + y2 - 0 B2x2+y2 + A3X3 = c]3-This trick is interesting only if C\ contains many zero rows and should normally be used only if dependence does not extend across too many time periods. It was actually employed i n formulating the three-period version of problem SCRS8. 68 Chapter 5. A Forestry Applicat ion. 5.1 . Introduction This chapter applies the ideas developed so far to a problem that arises i n forest management. The objective is to determine a harvesting sequence which maximizes the expected total volume of wood logged from the forest over some time periods to come. Uncertainty enters the model in the form of forest fires and other environmental hazards which may destroy a port ion of the standing timber. Following the work of Reed and Er r ico [114-116] and G u n n [60], the entire forest is differentiated into smaller units by the introduction of age classes comprising all the area which is grown wi th trees of ages between given bounds, regardless to location in the forest. As trees grow older, areas move from one age class to another, wi th new trees of age zero replacing trees that have been logged off. This model is referred to as the Type II model in the forestry literature. Another approach often taken, see, e.g., Lembersky and Johnson [85], Mar t e l l [91], and V a n Wagner [133, 134], is the so-called Type I model, which differentiates the forest into contiguous subunits of smaller area, referred to as stands, which are considered to be indivisible. Neither formulation is entirely free of difficulties: The Type II model contains impl ic i t homogeneity assumptions about growth rates and logging costs which wi l l rarely be satis-fied i n practice. The Type I model, on the other hand, has to assume that it is possible to identify indivisible stands, which wi l l then be logged off or destroyed i n their entirety. The Type II model yields quite naturally to LP formulations, while the Type I model is essentially an integer programming problem. The data used in this chapter describe one particular forest of t imber near Fort Nel -son, B . C . , comprising a total area of nearly 100,000 hectares. The model is idealized i n the sense that it assumes a single species stand of pure spruce (pinea glauca Moench Voss), uniform quality of timber and land, uniform accessibility and logging cost throughout the forest, and constant technology. The forest is discretized into a number of different age 69 classes, and the biological yield is assumed to be a known function of the age of the trees. The transit ion pattern between the various age classes is simple, a tree either moves on to the next age class at the end of each period or it gets logged off or destroyed, i n which case a new tree grows i n its place i n age class 1. The width of an age class is assumed equal to the length of one time period, taken to be twenty years so as to keep the size of the problem manageable. This leads to eight age classes including one absorbing state into which all trees are lumped together which have age greater than 140 years. 5.2. Determining the transition matrix Suppose the state of the forest at the beginning of a time period can be described by the vector s = (si,... , s j ) T , where S{ denotes the area of the forest covered wi th t imber in age class i. Then an area x = (xx,... , x i ) T is logged, leaving the remainder susceptible to forest fires which destroy a random proportion (px,... ,pi)T of the t imber left standing. Assuming instant re-forestation of the cleared area — either natural ly or by human intervention — the state s' of the forest at the beginning of the next period is given by s' = P[s - x] + Qx wi th Pi I -Pi 0 P2 0 PI-1 PI P2 0 Q = l l o 0 . . . l - p / _ ! I-pi. The loss rates pi are random variables, and historic data are available for their dis t r ibut ion. Ordinar i ly one would expect the pi to have different distributions: Younger trees are more likely to survive a forest fire than older trees. Unfortunately the information available from the B . C . Forest Service as given in [115] (see Table VI ) aggregates al l these figures into one single variable p describing the annual proportion of forest land destroyed by fire. In other words, one assumes p\ — p2 = ... = pi = p. This is a restrictive assumption, and more work should be done to disaggregate the loss rates. 70 p F(P) P F(P) P F(P) 0.0 0.47 0.0018 0.68 0.00777 0.85 0.0001 0.50 0.0020 0.71 0.0104 0.88 0.0002 0.56 0.00237 0.74 0.0118 0.91 0.00037 0.59 0.00266 0.76 0.0148 0.94 0.0012 0.62 0.00315 0.79 0.0266 0.97 0.0017 0.65 0.00546 0.82 0.0315 1.00 Table V I . Annual fire rates and their cumulative probability distribution. A further technical assumption made in this chapter is that p is stationary over time and independent from period to period. This is not an unreasonable assumption, but it neglects the possibility of factors such as improved fire-fighting equipment and early warning systems, introduction or eradication of noxious pests and tree diseases, cl imatic shifts and the like. Information about the incidence of forest fires is available in form of yearly data, such that the distr ibution of loss rates for a twenty-year period has to be computed using the yearly information. To that end, let pi,... ,y>20 be the loss rates for year 1 through year 20, so that the loss rate p over the entire period can be calculated from 20 Setting 1 — p = q, the 'survival rate', one has Because the qi are independently identically distributed and possess a mean and vari-ance, one might suspect that X)2£.2 lnc/; is approximately normal. If the time horizon is long enough, this is guaranteed by the central l imit theorem. O n the other hand, the dis t r ibut ion of qi is most irregular and has a discontinuity at 0. A sample of size 20 might not be large enough to smooth out this effect. In order to settle the question, 10000 sets of yearly loss rates were randomly generated on a computer and the resulting l n g plotted. 71 As evidenced i n Figure 7, the distr ibution of the sample is decidedly skewed, and a normal dis t r ibut ion does not fit the data al l that well. Twenty observations are obviously not sufficient to smooth out the rather non-symmetric distr ibution with the large atom at 0. 1500 -1250 -1000 750 500 250 02 .04 .06 .08 .10 .12 .14 .16 .18 .20 F i g u r e 7. H i s t o g r a m o f s i m u l a t e d f ire r a t e s . Instead the true population distr ibution was assumed to be equal to the sample distr i-but ion from the simulation study, which seems defensible in light of the large sample size and the fact that the one-year distributions are also based on a sample of historical data. 72 The sample distr ibution was then discretized i n various ways, as shown in Table V I I . The mean discretizations give an upper bound on the value of the problem — we are maximiz-ing — the vertex discretizations give a lower bound. Several deterministic versions were also tried, fixing the loss rate at one particular value, assumed constant over the t ime, and calculating the opt imal solution for this value. A . M e a n d i s c r e t i z a t i o n s ( U p p e r b o u n d ) 1 point: 1 P TT .06258 1.000 2 point: 2A P TT .08612 .4616 .04240 .5384 2B P TT .07961 .6 .03705 .4 2C P TT .08942 .4 .04469 .6 3 point: 3A P TT .09339 .3333 .06013 .3333 .03423 .3334 3B P TT .08942 .4 .05278 .2 .03705 .4 4 point: 4 P TT .10499 .1847 .07354 .2769 .05278 .2912 .03017 .2472 5 point: 5 P TT .10349 .2 .07536 .2 .05997 .2 .04633 .2 .02777 .2 B . V e r t e x d i s c r e t i z a t i o n s ( L o w e r b o u n d ) 2 point: 2 P TT 0.000 .6912 .20268 .3088 3 point: 3 P TT 0.000 .1736 .06258 .7488 .20268 .0776 4 point: 4 P TT 0.000 .1736 .06258 .5128 .08612 .2837 .20268 .0299 T a b l e V I I . D i s c r e t i z a t i o n s used a n d t h e i r p r o b a b i l i t i e s . 73 5 . 3 . P r e s e n t v a l u e o f s t a n d i n g t i m b e r Fini te time horizons quite frequently lead to erratic behavior in the last period. This is not hard to understand: If the world ends tomorrow, one might as well cut down al l the forest while one can st i l l derive some profit from i t . There are several ways to circumvent this undesirable last period effect. Reed and Err ico [115] solved various deterministic scenarios wi th 35 time periods and simply discarded the final five periods from their model. This is justifiable on theoretical grounds because of ' turnpike theorems' which ensure the existence of an opt imal infinite horizon solution, the ' turnpike' , wi th the property that finite horizon ' t r ips ' — provided they are far enough apart in time — wi l l optimally proceed to the turnpike as rapidly as possible and follow the turnpike unt i l the last moment before they have to turn. This approach permits the specification of arbitrary starting and endpoints. Since it is inconceivable to solve a stochastic problem over such a long time horizon, and also because I saw no way to justify the assumptions to hold statically over a period of seven centuries, I decided to proceed differently. The stochastic problems considered all have seven time periods and I attempted to impose a steady-state solution at the end, i.e., force an entry into the turnpike after seven periods. For some of the test problems it turned out that this was not enough time to reach the turnpike and infeasibility resulted. The advantage of this method is that it does not require discounting, that is, future harvests are just as valuable as harvests today and future constraint violations are penalized as heavily as infeasibilities in the first period. It has long been recognized [29, 63], that discounting i n renewable resources leads to non-conservative harvesting policies and lower long-run sustained yield, because the biomass is wiped out early and never permitted to recover. This is part icularly true of slow-growing classes such as trees. The alternative is to assign a value to timber left standing at the end of the problem horizon, which does require a non-zero discount rate, for otherwise the present value of standing timber is infinite. Ordinary accounting rules work with interest rates of 10 and even 15 percent per year, but in problems with 20-year periods such high interest rates 74 simply cannot be used. Reed and Err ico experimented wi th values of 2, 3, and 4 percent annually, I decided to use ini t ia l ly a rate of 0.5% in order to include max imum information about the future into the stochastic solution. The rate of 0.5% per year leads to a one-period discount factor of about .905, which is reasonably close to the accounting rules for yearly rates. For the computation of present values of standing timber one proceeds as follows. Let Vi,... , vs be the values of timber i n each of the eight age classes and y\,. . . , t/g be the biological yields. Then for each age class, there are two possible actions to be considered. One can either cut down the trees, realizing an immediate yield of yi plus anticipated yield avx once the logged off area has been reforested. The alternative is to postpone logging, i n which case a proportion q of the area wi l l s t i l l be standing one time period later, representing a value aqvi+i, the rest wi l l have been destroyed and have value apv\. Vi = m a x { y ; + avx,E[aqvi+i + apv]]} — m a x { y i + avi,a(l - p)vi+1 + apvi}. In part icular, since vg = v&, the optimal strategy i n age class 8 wi l l always be to harvest, unless the constraints of the problem specifically forebid it . Because of the structure of the yields yi, the opt imal solution is then for some i0 which describes the opt imal harvesting age. Once i0 is known, the defining equations for i = 1 , . . . ,io — 1, together wi th i>;0 = yif) + c*v\ form a non-singular system of equations that can be solved for the In the present case one finds io = 5. The corresponding values Vi can now be read from Table VI I I , along wi th the yields y; and the in i t i a l sizes of the various age classes. Thus if i > io i f i < io 75 Age class Y i e l d (m3/ha) Area at year 0 Value of standing timber 1 0 2.004 320.3417 2 0 1.404 356.1874 3 16 0.125 398.4370 4 107 0.241 448.2349 5 217 9.768 506.9294 6 275 16.385 564.9294 7 298 2.815 587.9294 8 306 61.995 595.9294 Table VIII. Age specific data for the problem. 5.4. Formulation of the problem W i t h the present values just computed, one has the following mathematical formula-t ion for the problem: T max > a. y xt + a. v ST+I t=i s.t. X\ {P-Q)x1-Pa1 + s2 %2 — s2 (P-Q)x2-Ps2 +S3 %3 — -53 XT — ST < 0 (P - Q)XT - PsT + 3T+l = 0 a j t > 0 , t = i , . . . , r where the xt are the decision variables, st are the state variables describing the age dis-t r ibut ion of the forest at time t, Si is the original state of the forest, assumed known, y is the vector of biological yields, also assumed to be known and constant over time, and v is the vector of present values of timber left standing, discounted back to time period 1. The primes are used to indicate row vectors. Addi t iona l constraints are placed into the model to ensure a controlled flow of t imber. 76 < si = 0 < o = 0 < o It might, for instance, be desirable on economic terms to avoid large fluctuations in the harvest from one period to the next. Wi thout further constraints, the opt imal solution might well be to cut down the entire forest and wait for 100 years or so before the entire area is logged off again. For small stands of timber this may constitute a defensible strategy, but for an entire forest of some 1000 square kilometres it probably does not. Possible harvest flow constraints could be of three types. a) Absolute l imits of the form a < y'xt < b, t = 1 , . . . , T. b) Absolute l imits on the change of the harvest a < y'{xt - xt-i) < b, t = l,...,T. c) Relative l imits on the change of the harvest a y'xt-i < y'xt < b y'xt-i-Following Reed and Err ico , the constraints employed are exclusively of type c), al-though i n principle any type and even a mixture of them could be accommodated. One could further envisage, as in G u n n [60], that flow constraints are separated into various categories of wood, corresponding to different uses i n the wood industry. Immature trees that are too small for a sawmill might for example st i l l be usable i n a pu lpmi l l . The entire complex of a wood-using industry wi th its capacity constraints and input requirements has been ignored in the present formulation. The solution method employed differs greatly from Reed and Err ico 's approach, who did a 'scenario analysis' of (1) for some 35 t ime periods. Each scenario uses a typical realization p to define fixed matrices P and Q. The resulting deterministic L P problem was then solved by an L P solver. W i t h eight age classes per time period, each problem has 34 * 18 + 8 = 620 constraints and 35 * 16 = 560 decision variables, but any connection between the scenarios is lost. Since their results showed period 1 harvest decreasing from over 400,000 m 3 to less than half that amount i f the risk of forest fire is increased from 77 0 to 0.02 annually, this connection is an important point of consideration. Moreover, the planning horizon of 700 years appears excessively long i f all one is interested i n is a harvest policy to be implemented during the first period. The analysis presented below wi l l attempt to address this latter point specifically. To this end, the long planning horizon is replaced by a much shorter one of seven time periods. Now the problem can be recast i n the form considered in Chapter 4 by setting A i = / , Bt = XQ - P) P ac 0 -be 0 0 -I 1 -I c 0 , for t = 2 , . . . , T . (Here J denotes the identity matrix.) The most important difference to the problem described i n Chapter 4 consists i n the stochastic 75-matrices, but as explained before, the algorithm itself is not affected, since the 73-matrices are only used in the generation of new right hand sides which differ from problem to problem anyway. 5.5. Numerical results The model was then input into program M S L i P and the results reported in Tables I X , X , and X I were obtained. These tables show the opt imal objective function values for the problem formulated in the previous section, the total first period harvest y'xi, and the components of the solution vector x\, namely the areas harvested i n age classes 3 through 8. (Age classes 1 and 2 can be ignored, since their yield is zero.) The number of rows for the equivalent linear program of Chapter 1 are also given. The C P U times were measured on a V A X 11/780 computer at the Technical Universi ty of Nova Scotia, using the V M S F O R T R A N compiler wi th full optimization; times are i n seconds and refer to the solution phase only, not counting time for input /output . Several deterministic versions were tried first, fixing the loss rate at one part icular value, assumed constant over time, which resulted in the opt imal solutions given in Ta-ble I X . This scenario analysis demonstrably does not work on this problem: The op t imum fluctuates wildly, and without specific probabilistic information on how to assemble the 78 different scenarios into one unifying picture, one is likely to arrive at a harvesting schedule which is quite far from the true opt imum. Real izat ion p 0.00 0.05 0.06258 0.10 0.15 0.20 0.20268 Objective value 46487.1 41800.5 41133.1 39166.4 36589.0 34049.9 33914.7 Harvest 1 7060.28 6427.93 6269.46 5848.12 5217.94 4555.94 4520.82 Age class 3 — — — — — — — Age class 4 — — — — — — — Age class 5 — — — — — — — Age class 6 — — — — — — — Age class 7 — — — — — — — Age class 8 23.0728 21.0063 20.4884 19.1115 17.0521 14.8887 14.7739 C P U time 24.32 26.09 25.93 24.35 22.78 22.16 21.61 Number of rows 124 124 124 124 124 124 124 T a b l e I X . R e s u l t s for v a r i o u s s cena r io s . Table X shows various mean discretizations which were solved as four-stage problems by using a one-point discretization for periods 4-7 and collapsing them all into one stage. The first three stages were discretized using the realizations and probabilities given i n Table V I I . The mean discretizations al l give an upper bound for the opt imal value, and they prove remarkably consistent. Not only the opt imal value, but also the opt imal first stage decision variables stabilize, and an implementable first period solution suggests itself. Almos t a l l the information about the probabili ty distr ibution of the loss rates is contained i n the mean, and consequently the expected value of perfect information appears insignificant. Table X I shows results for some vertex discretizations. The node structure gives the number of subproblems for each ancestor problem at the various time stages. The real-izations themselves are taken from the vertex discretizations of Table V I I . These solutions suggest a somewhat different picture. The lower bound on the opt imal value of the prob-79 Discretization I A 2C 2A 2B 3A 3B 4 5 Objective value 41133.1 40961.7 41008.7 40937.3 40908.7 40933.0 40897.2 40870.3 Harvest 1 6269.46 6150.00 6215.82 6112,43 6181.97 6113.54 6153.82 6122.42 Age class 3 Age class 4 Age class 5 Age class 6 Age class 7 Age class 8 20.4884 20.0980 20.3131 19.9752 20.2025 19.9789 20.1105 20.0079 C P U time 25.93 91.24 82.15 96.79 208.64 208.36 256.18 634.72 Scenarios 1 15 15 15 41 41 85 156 Number of rows 124 700 700 700 2176 2176 4984 9556 T a b l e X . R e s u l t s for u p p e r b o u n d p r o b l e m s . Node structure 1.222.222 1.322.222 1.332.222 1.432.222 1.222.222* Objective value 38825.6 39683.3 39784.8 39528.4 40914.3 Harvest 1 4762.86 4762.86 4762.86 4762.86 6134.65 Age class 3 — — - — — — Age class 4 — — — — — Age class 5 — — — — — Age class 6 — — — — — Age class 7 — — — — — Age class 8 15.5649 15.5649 15.5649 15.5649 20.0479 C P U time 350.91 404.48 547.17 744.46 425.54 Scenarios 127 191 283 377 127 Number of rows 2284 3436 5092 6784 2284 * Upper bound problem using discretization 2C T a b l e X I . S o m e s e v e n - p e r i o d f o r m u l a t i o n s . 80 lem is reasonably close to its upper bound, but the opt imal decision variables are quite far off. The opt imal harvest i n period 1 is identical for al l four problems tested, but this' value is about 25% below the opt imal harvest for the upper bound problems. Th i s is caused by feasibility problems if all seven periods exhibit bad years, namely pi is at its lowest observed value 0.79732 for al l periods. The flow constraints are so tight in this case that there is only one feasible solution. Indeed at the end of the seventh period there wi l l be no escapement whatsoever, all the harvestable forest has been destroyed despite the imposi t ion of a value for trees left standing. This solution is forced by the modell ing constraints no matter how small the probabili ty of observing seven lean years in succession, and illustrates what Madansky [90] called the 'fat solution' to a stochastic programming problem, namely a solution which is feasible for all possible future events. In an attempt to rectify the situation a linear penalty term was included which allows violations of the flow constraints, i.e., recourse, but imposes a penalty on the objective function for doing so. After some experimentation I set the penalty parameter at 50, so that each unit of violation wi l l decrease the objective function by 50 units. This number was chosen completely arbitrarily, but it is well known that a sufficiently large penalty parameter wi l l force exact solutions. Some more fine-tuning of the method revealed that the exact penalty parameter lies somewhere between 100 and 150. Tables X I I and XI I I show both upper and lower bound solutions wi th the soft con-straints. Now it would appear that both upper and lower bounds are converging towards a joint l imi t , w i th the lower bounds showing much the larger fluctuations. The last column i n Table X I I I gives the value of the lower bound problem after the first period upper bound solution xs = 19.66 has been implemented. It is worth pointing out that the solution times for these problems appear to increase slower than the number of rows for the equivalent L P problem, which is directly propor-t ional to the number of scenarios. This result is very encouraging for the viabi l i ty of the algori thm, since the average efficiency of the simplex method has been shown by Adle r , Megiddo and Todd [2] and Borgwardt [25, 26] to be on the order of m2 (the constraint 81 Node structure 1.111.111 1.222.222 1.322.222 1.332.222 1.333.222 1.333.322 Objective value 41132.0 40914.3 40897.9 40864.2 40835.8 40703.1 Harvest 1 6271.71 6134.65 6143.54 6105.57 6103.90 6036.34 Age class 3 — — — — — — Age class 4 — — — — — — Age class 5 — — — — — — Age class 6 — — — — — — Age class 7 — — — — — — Age class 8 20.4958 20.0479 20.0769 19.9528 19.9474 19.7266 C P U time 19.86 266.45 347.81 515.79 818.40 1140.57 Scenarios 1 127 191 283 418 607 Number of rows 124 2284 3436 5092 7522 10924 T a b l e X I I . U p p e r b o u n d p r o b l e m s w i t h soft c o n s t r a i n t s . Node structure 1.222.222 1.322.222 1.422.222 1.332.222 1.442.222 1.333.222 1.333.322 1.333.322 Objective value 38944.8 39142.6 39161.1 39356.9 39556.4 39664.4 39847.8 38591.4 Harvest 1 5158.89 5174.37 5174.37 5442.45 5500.55 5551.81 5584.02 6036.34 Age class 3 Age class 4 Age class 5 — — — — — - — — — Age class 6 Age class 7 Age class 8 16.8591 16.9097 16.9097 17.5858 17.9757 18.1432 18.2484 19.7266* C P U time 257.52 409.29 519.37 473.74 857.40 3344.61 1326.49 711.23 Scenarios 127 191 253 283 501 418 607 607 Number of rows 2284 3436 4552 5092 9016 7522 10924 10924 * - First period harvest fixed at this level T a b l e XIII . L o w e r b o u n d p r o b l e m s w i t h soft c o n s t r a i n t s . matrices of our forestry problem are essentially square). 82 There may, however, be numerical problems, as evidenced in the sixth column of Ta-ble X I I I . Numerical problems strike unpredictably, and it is not obvious what could be done about them. Ho [66] reported that decomposition methods are generally susceptible to truncation errors. To guard against this, all computations are done i n extended preci-sion and every matr ix is re-inverted after any L P subproblem has been solved. Wi thou t this precautionary measure, a tightly constrained feasible problem ( S C F X M 1 of the test problem set in Chapter 4) was determined to be infeasible by the algori thm. Another cause for slowdown is seen on large problems which come close to the memory l imitat ions of the code itself. In this case, much shuffling of variables takes place to purge information no longer needed (in form of slack cuts which are detected and removed from the L P bases regularly), so that newly generated cuts can be placed properly. 5.6. Conclusion M u c h work remains to be done before this forestry model could possibly be imple-mented. Better stochastic information, in particular a breakdown of the loss rate dis-tributions for ind iv idual age classes, would be desirable. It might also be interesting to incorporate the possibility of expanding or receding forests into the model. This could be done by allowing for a separate random variable r describing the probabil i ty that a cleared area is re-forested (in the present formulation r = 0). A more realistic model would further include a rudimentary representation of the wood working industry, as demonstrated by G u n n in the deterministic case. Some of the homogeneity assumptions could be removed by a refined age class differentiation, which might distinguish between qualitatively different stands or even different species of trees growing in the same forest. The transition matrices P and Q could be set up to account for that rather easily, at the expense of a larger problem size. 83 B i b l i o g r a p h y [1]. P . G . Abrahamson, " A nested decomposition approach for solving staircase linear pro-grams", Technical Report S O L 83-4, Systems Opt imizat ion Laboratory, (Stanford, C A , 1983). [2]. I. Adler , N . Megiddo and M . J . Todd, "New results on the average behavior of simplex algorithms", AMS Bulletin (New Series) 11 (1984) 378-382. [3]. J . H . Ahrens and U . Dieter, "Computer methods for sampling from G a m m a , Beta , Poisson and Normal distributions", Computing 12 (1974) 223-246. [4]. V . I . A r k i n and I .V . Evstigneev, Stochastic Models of Control and Economic Dynamics (Nauka, Moscow, 1979) (in Russian). English translation by E . A . Medova-Dempster and M . A . H . Dempster (Academic, Orlando (to appear)). [5]. E . M . L . Beale, " O n minimiz ing a convex function subject to linear inequalities", Jour-nal of the Royal Statistical Society, Series B 17 (1955) 173-184. [6]. E . M . L . Beale, G . B . Dantzig and R . D . Watson, " A first order approach to a class of multi- t ime-period stochastic programming problems", Mathematical Programming Study #27 (1986) 103-117. [7]. E . M . L . Beale, J . Forrest and C . Taylor, "Mult i - t ime-per iod stochastic programming", in : M . A . H . Dempster, ed., Stochastic Programming (Academic Press, London , 1980) pp. 387-402. [8]. J . F . Benders, "Part i t ioning procedures for solving mixed-variables programming prob-lems", Numerische Mathematik 4 (1962) 238-252. [9]. A . Ben-Tal and E . Hochman, "More bounds on the expectation of a convex function of a random variable", Journal of Applied Probability 9 (1972) 803-812. [10]. J . R . Birge, "Solution methods for stochastic dynamic linear programs", Technical Report S O L 80-29, Systems Opt imizat ion Laboratory, (Stanford, C A , 1980). [11]. J . R . Birge, "The value of the stochastic solution i n stochastic linear programs wi th fixed recourse", Mathematical Programming 24 (1982) 314-325. [12]. J . R . Birge, "The relationship between the L-shaped method and dual basis factor-ization for stochastic linear programming", Technical Report 82-15, Department of Industrial and Operations Engineering, University of Mich igan ( A n n Arbor , 1982). [13]. J . R . Birge, "Aggregation bounds in stochastic linear programming", Technical Report 83-1 , Department of Industrial and Operations Engineering, University of Mich igan ( A n n Arbor , 1983). 84 J . R . Birge, " A Dantzig-Wolfe decomposition variant equivalent to basis factorization", Mathematical Programming Study #24 (1985) 43-64. J . R . Birge, "Decomposition and parti t ioning methods for multistage stochastic linear programs", Operations Research 33 (1985) 989-1007. J . R . Birge, " N D S P User's manual" , i n : J . Edwards, ed., Documentation for the ADO/ SDS collection of stochastic programming codes, Working paper W P - 8 5 - 0 2 , Interna-t ional Institute for Appl i ed Systems Analysis (Laxenburg, Aus t r i a , 1985). J . R . Birge, " A n L-shaped method computer code for multistage linear programs", i n : Y . Ermoliev and R . Wets, eds., Numerical Methods in Stochastic Optimization (Springer Lecture Notes (to appear)). J . R . Birge and F . V . Louveaux, " A multicut algorithm for two-stage linear programs", Technical Report #85-15, Department of Industrial and Operations Engineering, The Universi ty of Michigan ( A n n Arbor , 1985). J . R . Birge and S .W. Wallace, "Refining bounds for stochastic linear programs wi th linearly transformed random variables", Operations Research Letters 5 (1986) 73-77. J . R . Birge and R . J . - B . Wets, "Designing approximation schemes for stochastic opti-mizat ion problems, in particular for stochastic programs with recourse", Math. Pro-gramming Study 27 (1986) 54-102. J . R . Birge and R . Wets, Generating upper bounds on the expected value of a convex function wi th applications to stochastic programming, Working paper, Department of Industrial and Operations Engineering, The University of Mich igan ( A n n A r b o r , 1985). P. Birge, " A research bibliography in stochastic programming", i n : Y u . Ermol iev and R . J . - B . Wets, eds., Numerical Methods in Stochastic Optimization (Springer Lecture Notes (to appear)). J . Bisschop and A . Meeraus, " M a t r i x augmentation and part i t ioning i n the updat ing of the basis inverse", Mathematical Programming 13 (1977) 241-254. R . G . B land , "New finite pivoting rules for the simplex method", Mathematics of Operations Research 2 (1977) 103-109. K . - H . Borgwardt , "Some distribution-independent results about the asymptotic or-der of the average number of pivot steps of the simplex method", Mathematics of Operations Research 7 (1982) 441-462. K . - H . Borgwardt, "The average number of pivot steps required by the simplex method is po lynomia l" , Zeitschrift fur Operations Research, Serie A-B 26 (1982) 157-177. 85 S.P. Bradley and D . B . Crane, " A dynamic model for bond portfolio management", Management Science 19 (1972) 139-151. V . Chvata l , Linear Programming (W.H.Freeman & Co. New York , 1983). C . W . Clark , Mathematical Bioeconomics (John Wi ley & Sons, New York , 1976). G . B . Dantzig , "Linear programming under uncertainty", Management Science 2(1955) 197-206. G . B . Dantzig , Linear Programming and Extensions (Princeton Universi ty Press, Prince-ton, New Jersey, 1963). G . B . Dantzig and A . Madansky, " O n the solution of two-stage linear programs under uncertainty", Proceedings of the Fourth Berkeley Symposium on Mathemat ica l Statis-tics and Probabil i ty, Vol.1 (University of California Press, Berkeley, 1961) pp. 165-176. G . B . Dantz ig and P. Wolfe, "Decomposition principle for linear programs", Operations Research 8 (1960) 101-111. I. Deak, "Computa t ion of multiple normal probabilities", i n : P. K a i l and A . Prekopa, eds., Recent Results in Stochastic Programming (Springer Verlag, New York , 1980) pp. 107-120. I. Deak, "Three digit multiple normal probabilit ies", Numerische Mathematik 35 (1980) 369-380. I. Deak, "Procedures for the computation of normal distr ibution function values and probabilities of rectangles", Working Paper No. M O / 4 8 , Computer and Automat ion Institute, The Hungarian Academy of Sciences (Budapest, 1983). M . A . H . Dempster, "The expected value of perfect information in the opt imal evolu-t ion of stochastic problems", i n : M . Ara to , D . Vermes and A . V . Balakr ishnan, eds., Stochastic Differential Systems (Springer Verlag, Ber l in , 1981). M . A . H . Dempster, " O n stochastic programming: II. Dynamic problems under risk," Working paper, School of Business Adminis t ra t ion , Dalhousie Universi ty (Halifax, Nova Scotia, 1986). M . A . H . Dempster, E . A . G u n n and R . R . Merkovsky, "Lagrangian dual decomposition methods for solving stochastic programmes wi th recourse", Working paper, School of Business Adminis t ra t ion , Dalhousie University (Halifax, Nova Scotia, (in prepara-tion)). T . G . Donnelly, "Bivariate normal distr ibution, algorithm 462", Communications of the ACM 16 (1973) 638. 86 J . Dupacova, " M i n i m a x approach to stochastic linear programming and the moment problem. Recent results", Zeitschrift fur angewandte Mathematik und Mechanik 58 (1978) 466-467. J . Dupacova, "Stabil i ty in stochastic programming wi th recourse. Contaminated dis-tr ibutions", Mathematical Programming Study # 2 7 ( 1 9 8 6 ) 133-144. J . E . Dut t , " A representation of multivariate normal probabil i ty integrals by integral transforms", Biometrica 60 (1973) 637-645. C . K . Eagleson, "Polynomial expansion of bivariate distr ibutions", Annals of Mathe-matical Statistics 35 (1964) 1208-1215. H . P . Edmundson, "Bounds on the expectation of a convex function of a random vari-able", the R A N D Corporat ion, P-382, A p r i l 9, 1957. R . Everi t t and W . T . Ziemba, "Two-period stochastic programs wi th simple recourse", Operations Research 27(1979) 485-502. S.D. F l a m , "Dual i ty in stochastic infinite horizon models", Working paper, C h r . Miche l -sen Institute (Fantoft, Norway, 1983). K . Frauendorfer, "Solving S L P recourse problems wi th arbitrary multivariate distr i-butions — the dependent case", Working Paper, Institut fur Operations Research, Universitat Zur ich , 1986. K . Frauendorfer and P. K a i l , " A solution method for S L P recourse problems wi th arbitrary multivariate distributions — the independent case", Working Paper, Institut fur Operations Research, Universitat Zurich, 1986. S. Garstka and D . Rutenberg, "Computat ion in discrete stochastic programs wi th recourse", Operations Research 21 (1973) 112-122. H.I . Gassmann, "Means of a bivariate normal distr ibution over a rectangle: A n efficient a lgor i thm", Working paper No. 971, Faculty of Commerce, The University of B r i t i s h Columbia , Vancouver, 1983. H.I . Gassmann, "Condi t ional probabili ty and conditional expectation of a random vector", Working paper No. 1095, Faculty of Commerce, The Universi ty of Br i t i sh Co lumbia (Vancouver, Canada, 1985). H . Gassmann and W . T . Ziemba, " A tight upper bound for the expectation of a convex function of a multivariate random variable", Mathematical Programming Study #27 (1986) 39-53. 87 P . E . G i l l , W . Murray , M . A . Saunders and M . H . Wright , "Sparse matr ix methods in opt imizat ion" , Technical Report S O L 82-17, Systems Opt imizat ion Laboratory, (Stanford, C A , 1982). C R . Glassey, "Dynamic linear programs for production scheduling", Operations Re-search 19 (1971) 45-56. C R . Glassey, "Nested decomposition and multi-stage linear programs", Management Science 20 (1973) 282-292. R . Gr ino ld , "Fini te horizon approximations of infinite horizon linear programs", Math-ematical Programming 12 (1977) 1-17. R . Gr ino ld , " A new approach to multistage stochastic linear programs", Mathematical Programming Study #6 (1976) 19-29. R . Gr ino ld , "Infinite horizon stochastic programs", Working Paper, School of Business Adminis t ra t ion , University of California (Berkeley, 1983). E . A . G u n n , " A model of long-term forest management and a decompostion approach to its solution", Working paper, Department of Industrial Engineering, Technical U n i -versity of Nova Scotia (Halifax, Nova Scotia, 1986). J . M . Hammersley and D . C . Handscomb, Monte Carlo Methods, (Methuen, London , 1964). Handbook of Tables for Mathematics, Fourth Edition (The Chemical Rubber C o m -pany, Cleveland, Ohio, 1970). C M . Harvey, "Value functions for infinite-period planning", Management Science 32 (1986) 1123-1139. D . B . Hausch and W . T . Ziemba, "Bounds on the value of information i n uncertain decision problems II", Stochastics 10 (1983) 181-217. J . Ho, "Nested decomposition for large scale linear programs wi th staircase structure", Technical Report S O L 74-4, Systems Opt imizat ion Laboratory, (Stanford, C A , 1974). J . K . Ho, "Convergence behavior of decomposition algorithms for linear programs", Operations Research Letters 2 (1984) 91-94. J . K . Ho and E . Loute, " A set of staircase linear programming test problems", Math-ematical Programming 20 (1981) 245-250. J . K . Ho and A . S . Manne, "Nested decomposition for dynamic models", Mathematical Programming 6 (1974) 121-140. 88 C . C . Huang, I. Vertinsky and W . T . Ziemba, "Sharp bounds on the value of perfect information", Operations Research 25 (1977) 128-139. C . C . Huang, W . T . Ziemba and A . Ben-Tal , "Bounds on the expectation of a con-vex function of a random variable: W i t h applications to stochastic programming", Operations Research 25 (1977) 315-325. K . N . Johnson and H . L . Scheurman, "Techniques for prescribing opt imal t imber har-vest and investment under different objectives: Discussion and synthesis", Forest Sci-ence Monograph 18. N . L . Johnson and S. K o t z , Distributions in Statistics, Vol.4 (John Wi ley & Sons, New York , 1972). P. K a i l , "Approximations to stochastic programs wi th complete fixed recourse", Nu-merische Mathematik 22 (1974) 333-339. P. K a i l , Stochastic Linear Programming (Springer Verlag, Ber l in , 1976). P. K a i l , "Computat ional methods for solving two-stage stochastic linear programming problems", Zeitschrift fur angewandte Mathematik und Physik 30 (1979) 261-271. P. K a i l , " O n Approximations and stability i n stochastic programming", Work ing pa-per, Institut fur Operations Research, Universitat Zurich, 1984. P. K a i l , "Approximat ion to optimization problems: A n elementary review", Mathe-matics of Operations Research 11 (1986) 9-18. P. K a i l , K . Frauendorfer and A . Ruszczynski , "Approximations i n stochastic program-ming wi th recourse", Working paper, Institut fur Operations Research, Universitat Zur ich , 1984. P. K a i l and D . Stoyan, "Solving stochastic programming problems wi th recourse i n -cluding error bounds", Mathematische Operationsforschung und Statistik, Series Op-timization 13 (1982) 431-447. J . G . Ka l lbe rg and W . T . Ziemba, " A n algori thm for portfolio revision: Theory, com-putational algori thm and empirical results", i n : R . A . Schultz, ed., Applications of Management Science, V o l . I ( J A I Press, Greenwich, Connecticut, 1981) pp. 267-292. E . P . C . K a o and M . Queyranne, "Aggregation i n a two-stage stochastic program for manpower planning in the service sector", i n : A . J . Swersey and E . G . Ignall , eds., TIMS studies in the Management Sciences, V o l . 22 (1980) pp.205-225. J . Kemperman , "The general moment problem. A geometric approach", Annals of Mathematical Statistics 39 (1968) 93-112. 89 D . E . Kmxth , The Art of Computer Programming, Vol .3 , (Addison-Wesley, Reading, Massachusetts, 1974). M . I . K u s y and W . T . Ziemba, " A bank asset and l iabi l i ty management model" , Op-erations Research 34, No. 3 (1986) 356-376. M . R . Lembersky and K . N . Johnson, "Opt imal policies for managed stands: A n infinite horizon Markov decision approach", Forest Science 21 (1975) 109-122. F . V . Louveaux, " A solution method for multistage stochastic programs wi th applica-t ion to an energy investment problem", Operations Research 28 (1980) 889-902. D . Ludwig and J . M . Varah, "Opt imal harvesting of a randomly fluctuating resource II: Numerical methods and results", SIAM Journal of Applied Mathematics 37 (1979) 185-205. K . S . Lyon and R . A . Sedjo, " A n opt imal control model to estimate the regional long-term supply of t imber", Forest Science 29 (1983) 798-812. A . Madansky, "Bounds on the expectation of a convex function of a multivariate random variable", Ann. Math .Stat. 30 (1959) 743-746. A . Madansky, "Inequalities for stochastic linear programming problems", Management Science 6 (1960) 197-204. D . L . Mar t e l l , "The opt imal rotation of a flammable forest stand", Canadian Journal of Forest Research 10 (1980) 30-34. F . Mirzoakhmedov and M . V . Mikhalevich , "Model l ing of opt imal cultivated land dis-t r ibut ion by means of a multistage stochastic programming problem"(in Russian), Ekonomicheskye i matematicheskye metody 18 (1982) 918-922. A . M . M o o d , R . Greybi l l and C . Boes, Introduction to the Theory of Statistics, 3rd edition ( M c G r a w - H i l l , New York, 1974). B . A . Mur tagh and M . A . Saunders, " M I N O S / A U G M E N T E D User's M a n u a l " , Tech-nical Report S O L 80-14, Systems Opt imizat ion Laboratory, (Stanford, C A , 1980). Nat ional Bureau of Standards, Tables of Probability Functions, V o l . II (Washington, 1942). L . Nazareth, " A land management model using Dantzig-Wolfe decomposition", Man-agement Science 26 (1980) 510-523. L . Nazareth, "Variants on Dantzig-Wolfe decomposition wi th applications to mul -tistage problems", Working paper W P - 8 3 - 6 1 , International Institute for App l i ed Systems Analysis (Laxenburg, Aus t r ia , 1983). 90 L . Nazareth and R . Wets, "Algori thms for stochastic programs: The case of non-stochastic tenders", Mathematical Programming Study #28 (1986) 1-28. H . Niederreiter, "Mult id imensional numerical integration using pseudorandom num-bers", Mathematical Programming Study # 2 7 ( 1 9 8 6 ) 17-38. M . - C . Noel and Y . Smeers, "Nested decomposition of multistage nonlinear programs wi th recourse", C O R E Working Paper #8505, Universite Catholique de Louvain (Lou-vain, Belg ium, 1985). E . Nurminsk i , "Decomposition algori thm based on the pr imal-dual approximat ion", Working Paper W P - 8 2 - 4 6 , International Institute for Appl i ed Systems Analysis (Lax-enburg, Aus t r i a , 1982). P. Olsen, "Multistage stochastic programming wi th recourse: The equivalent deter-minist ic problem", SIAM Journal of Control and Optimization 14 (1976) 495-517. P. Olsen, "When is a multistage stochastic programming problem well defined?", SIAM Journal of Control and Optimization 14 (1976) 518-527. P. Olsen, "Multistage stochastic programming wi th recourse as mathematical pro-gramming in an L p - space" , SIAM Journal of Control and Optimization 14 (1976) 528-537. P. Olsen, "Discrete approximations to stochastic programming problems", Mathemat-ical Programming Study #6 (1976) 111-124. R . P . O ' N e i l l , "Nested decomposition of multistage convex programs", SIAM Journal of Control and Optimization 14 (1976) 409-418. M . D . Per lman, "Jensen's inequality for a convex vector-valued function on an infinite dimensional space", Journal of Multivariate Analysis 4 (1974) 52-65. J . Pfanzagl, "Convexity and conditional expectations", Annals of Probability 2 (1974) 490-494. C E . Pfefferkorn and J . A . Toml in , "Design of a linear programming system for I L L I A C I V " , Technical Report S O L 76-8, Systems Opt imizat ion Laboratory, (Stanford, C A , 1976). A . Prekopa, "Network planning using two-stage programming under uncertainty", i n : P. K a i l and A . Prekopa, eds., Recent Results in Stochastic Programming (Springer Verlag, 1980) pp. 215-237. 91 A . Prekopa, S. Ganczer, I. Deak and K . Pa ty i , "The S T A B I L stochastic programming model and its experimental application to the electrical energy sector of the Hungar-ian economy", i n : M . A . H . Dempster, ed., Stochastic Programming (Academic Press, London, 1980) pp. 369-385. A . Prekopa and T . Szantai, " A new multivariate gamma distr ibution and its fitt ing to empirical data", Water Resources Research 14 (1978) 19-24. A . N . Rae, "Stochastic programming, ut i l i ty and sequential decision problems i n farm management", American Journal of Agricultural Economics 53 (1971) 625-638. W . J . Reed and D . Err ico , "Op t ima l harvest scheduling at the forest level i n the pres-ence of risk of fire", Canadian Journal of Forest Research (to appear). W . J . Reed and D . Err ico , "Harvest scheduling i n the risk of fire — further results", Canadian Journal of Forest Research (to appear). W . J . Reed and D . Err ico , " A new look at the model II harvest scheduling problem", Work ing paper, Department of Forestry, University of V ic to r i a (Vic tor ia , B . C . , 1986). S . M . Robinson and R . J . - B . Wets, "Stabil i ty in two-stage programming", Work ing Paper, Mathematics Research Center, University of Wisconsin (Madison, 1985). R . T . Rockafellar, Convex Analysis (Princeton University Press, Princeton, New Jersey, 1970). R . T . Rockafellar and R . Wets, "Nonanticipativi ty and L1-martingales i n stochastic opt imizat ion problems", Mathematical Programming Study #6 (1976) 170-187. R . T . Rockafellar and R . Wets, "Stochastic convex programming: Basic dual i ty" , Pa-cific Journal of Mathematics 62 (1976) 173-195. R . T . Rockafellar and R . Wets, "Stochastic convex programming: Relatively cpm-plete recourse and induced feasibility", SIAM Journal of Control and Optimization 14 (1976) 574-589. R . T . Rockafellar and R . Wets, "The optimal recourse problem i n discrete time: L1-multipliers for inequality constraints", SIAM Journal of Control and Optimization 16 (1978) 16-30. R . T . Rockafellar and R . J . - B . Wets, " A Lagrangian finite generation technique for solving linear-quadratic problems in stochastic programming", Mathematical Program-ming Study #28 (1986). W . R u d i n , Functional Analysis ( M c G r a w - H i l l , New York, 1973). 92 D . M . Scott, " A dynamic programming approach to time staged convex programs", Technical Report S O L 85-3, Systems Opt imizat ion Laboratory, (Stanford, C A , 1985). S .B . Smi th , "Planning transistor production by linear programming", Operations Re-search 13 (1965) 132-139. J . Stoer, Numerische Mathematik / (Springer Verlag, Ber l in , 1972). B . Strazicky, "Some results concerning an algori thm for the discrete recourse problem", in : M . A . H . Dempster, ed., Stochastic Programming (Academic Press, London, 1980) pp. 263-274. R . Swaroop, J . D . Brownlow, G . R . Ashworth and W . R . Winter , "Bivariate normal , conditional and rectangular probabilities: A computer program wi th applications", N A S A Technical Memorandum #81350, 1980. T . Szantai, "Evaluat ion of a special mul t igamma distr ibution function", Mathematical Programming Study #27 (1986 ) 1-16. R . V a n Slyke and R . J . - B . Wets, "L-shaped linear programs wi th application to opt imal control and stochastic opt imizat ion", SIAM Journal of Applied Mathematics 1 7(1969) 638-663. D . Walkup and R . J . - B . Wets, "Lif t ing projections of convex polyhedra", Pacific Jour-nal of Mathematics 28 (1969) 465-475. C . E . Van Wagner, "Age class distr ibution and the forest fire cycle", Canadian Journal of Forest Research 8 (1978) 220-227. C . E . Van Wagner, "Simulat ing the effect of forest fires on long-term annual t imber supply", Canadian Journal of Forest Research 13 (1983) 451-457. S .W. Wallace, "Solving stochastic programs wi th network recourse", Networks 16 (1986) 295-317. S .W. Wallace, " O n network structured stochastic optimizat ion problems", Report #842555-8, Chr . Michelsen Institute (Fantoft, Norway, 1985). S .W. Wallace, " A piecewise linear upper bound on the network recourse function", Report #862330-3, Chr . Michelsen Institute (Fantoft, Norway, 1986). S .W. Wallace, personal communication. R . Wets, "Programming under uncertainty: The equivalent convex program", SIAM Journal of Applied Mathematics 14 (1966) 89-105. R . Wets, "Programming under uncertainty: The solution set", SIAM Journal of Ap-plied Mathematics 14 (1966) 1143-1151. 93 R . Wets, "Stochastic programs with recourse: A basic theorem for multistage prob-lems", Zeitschrift der Wahrscheinlichkeit und verwandter Gebiete 21 (1972) 201-206. R . Wets, "Stochastic programs with fixed recourse: The equivalent deterministic pro-gram", SIAM Review 16 (1974) 309-339. R . Wets, "Induced constraints for stochastic optimization problems", i n : Techniques of Optimization, A . Balakrishnan, ed. (Academic Press, London, 1974) pp. 433-443. R . Wets, "Solving stochastic programs wi th simple recourse", Stochastics 10 (1983) 219-242. R . Wets," Stochastic programming: Solution techniques and approximation schemes", i n : A . Bachem, M . Grotschel and B . Kor te , eds., Mathematical Programming: The State of the Art (Springer Verlag, 1983). R . J . - B . Wets, "Large scale linear programming techniques in stochastic program-ming" , i n : Y u . Ermoliev and R . J . - B . Wets, eds., Numerical Methods in Stochastic Optimization (Springer Lecture Notes (to appear)). A . C . Wi l l i ams , "Approximat ion formulas for stochastic linear programming", SIAM Journal of Applied Mathematics 14 (1966) 668-677. R . J . Wi t t rock , "Advances i n a nested decomposition algori thm for solving staircase linear programs", Technical Report S O L 83-2, Systems Opt imizat ion Laboratory, (Stanford, C A , 1983). W . T . Ziemba, "Computat ional algorithms for convex stochastic programs wi th simple recourse", Operations Research 18 (1980) 414-431. W . T . Ziemba, "Stochastic programs wi th simple recourse", i n : P . L . Hammer and G . Zoutendijk, eds., Mathematical Programming: Theory and Practice (Nor th-Hol land , Amsterdam, 1974). W . T . Ziemba and J . E . Butterworth, "Bounds on the value of information i n uncertain decision problems", Stochastics 1 (1975) 361-378. 94
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Multi-period stochastic programming
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Multi-period stochastic programming Gassmann, Horand Ingo 1987
pdf
Page Metadata
Item Metadata
Title | Multi-period stochastic programming |
Creator |
Gassmann, Horand Ingo |
Publisher | University of British Columbia |
Date Issued | 1987 |
Description | This dissertation presents various aspects of the solution of the linear multi-period stochastic programming problem. Under relatively mild assumptions on the structure of the random variables present in the problem, the value function at every time stage is shown to be jointly convex in the history of the process, namely the random variables observed so far as well as the decisions taken up to that point. Convexity enables the construction of both upper and lower bounds on the value of the entire problem by suitable discretization of the random variables. These bounds are developed in Chapter 2, where it is also demonstrated how the bounds can be made arbitrarily sharp if the discretizations are chosen sufficiently fine. The chapter emphasizes computability of the bounds, but does not concern itself with finding the discretizations themselves. The practise commonly followed to obtain a discretization of a random variable is to partition its support, usually into rectangular subsets. In order to apply the bounds of Chapter 2, one needs to determine the probability mass and weighted centroid for each element of the partition. This is a hard problem in itself, since in the continuous case it amounts to a multi-dimensional integration. Chapter 3 describes some Monte-Carlo techniques which can be used for normal distributions. These methods require random sampling, and the two main issues addressed are efficiency and accuracy. It turns out that the optimal method to use depends somewhat on the probability mass of the set in question. Having obtained a suitable discretization, one can then solve the resulting large scale linear program which approximates the original problem. Its constraint matrix is highly structured, and Chapter 4 describes one algorithm which attempts to utilize this structure. The algorithm uses the Dantzig-Wolfe decomposition principle, nesting decomposition levels one inside the other. Many of the subproblems generated in the course of this decomposition share the same constraint matrices and can thus be solved simultaneously. Numerical results show that the algorithm may out-perform a linear programming package on some simple problems. Chapter 5, finally, combines all these ideas and applies them to a problem in forest management. Here it is required to find logging levels in each of several time periods to maximize the expected revenue, computed as the volume cut times an appropriate discount factor. Uncertainty enters into the model in the form of the risk of forest fires and other environmental hazards, which may destroy a fraction of the existing forest. Several discretizations are used to formulate both upper and lower bound approximations to the original problem. |
Subject |
Forest management -- Linear programming Linear programming Stochastic programming |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2010-08-12 |
Provider | Vancouver : University of British Columbia Library |
Rights | For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use. |
DOI | 10.14288/1.0097388 |
URI | http://hdl.handle.net/2429/27304 |
Degree |
Doctor of Philosophy - PhD |
Program |
Business Administration |
Affiliation |
Business, Sauder School of |
Degree Grantor | University of British Columbia |
Campus |
UBCV |
Scholarly Level | Graduate |
AggregatedSourceRepository | DSpace |
Download
- Media
- 831-UBC_1987_A1 G37.pdf [ 5.91MB ]
- Metadata
- JSON: 831-1.0097388.json
- JSON-LD: 831-1.0097388-ld.json
- RDF/XML (Pretty): 831-1.0097388-rdf.xml
- RDF/JSON: 831-1.0097388-rdf.json
- Turtle: 831-1.0097388-turtle.txt
- N-Triples: 831-1.0097388-rdf-ntriples.txt
- Original Record: 831-1.0097388-source.json
- Full Text
- 831-1.0097388-fulltext.txt
- Citation
- 831-1.0097388.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0097388/manifest