RELIABILITY AND MAINTENANCE FOR INTERRELATED SYSTEMS

by

MICHAEL JOHN ARMSTRONG

BSc, Royal Military College of Canada, 1985
MBA, University of Ottawa, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, FACULTY OF COMMERCE AND BUSINESS ADMINISTRATION

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
June 1996
© Michael J Armstrong, 1996

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Abstract

In this thesis we consider three topics in reliability and maintenance. Their common feature is the explicit consideration of interdependence, either amongst components within a machine, amongst stages in a logistics chain, or amongst machines sharing maintenance facilities.

Chapter 1 examines measures of the contribution of a component towards a system's reliability; these measures include structural importance, marginal reliability importance, joint reliability importance, and link importance. We review these different measures and examine how they relate to each other. We show that some structural results previously derived for independent components continue to hold for associated components. We then propose extensions to each importance measure to cover components which have two failure modes rather than the conventional one, and show that these extensions largely retain the properties of the original definitions.

Chapter 2 examines a system containing one machine subject to random failure, for which we wish to determine a maintenance policy and a spare ordering policy. We consider the solvability and desirability of jointly optimizing these two traditionally separate policies. We discuss why the general form of the problem is intractable and show how it might be approached using bounds and heuristic policies; more importantly, we examine several special cases which are more tractable. Particular attention is paid to systems which contain only one spare at a time; we show that these systems have some convexity properties which facilitate cost minimization. We subsequently consider systems which allow more than one spare, but make use of other restrictions or assumptions to simplify analysis.

Chapter 3 examines how to coordinate maintenance for machines sharing repair facilities, a task which is commonly called the machine repairman problem. Existing studies of this problem make the restrictive assumption that repairs are performed only after machine failure; our work extends the machine repairman model by allowing preventive repair. We derive analytical results for a single critical age repair policy, and then use numerical studies to investigate other heuristic policies which can improve the system's cost effectiveness.
Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Foreword

Introduction
Terminology

Importance of Components
  1.01 Chapter Abstract
  1.02 Introduction
  1.03 Notation and Assumptions
  1.04 Measures of Importance and Their Relationship
  1.05 Numerical Example: The Sign of JRI
  1.06 Systems of Dual Failure Mode Components
  1.07 Importance of Components for DFM Models
  1.08 Future Research
  1.09 Concluding Remarks

Maintenance and Inventory for a Single Machine
  2.01 Chapter Abstract
  2.02 Introduction
  2.03 Model Description & Notation
  2.04 Single Spare with Basic Structure
  2.05 Early Ordering with Replacement on Receipt
  2.06 Single Spare With Returns
  2.07 Single Spare Extended Cost Structure
  2.08 Scheduled versus Unscheduled Orders
  2.09 Single Spare, Random Lead Time
  2.10 Single Spare, Two Random Lead Times
  2.11 Multiple Spares without Stock-outs
  2.12 The General Problem
  2.13 Future Research
  2.14 Concluding Remarks

The Machine Repairman Problem with Preventive Maintenance
  3.01 Chapter Abstract
  3.02 Introduction
  3.03 Basic Model and Notation
  3.04 The Simple Single Age Repair Policy
  3.05 The Joint Single Age Repair Policy
  3.06 Better Heuristic Policies
  3.07 Service Constraints
  3.08 Discrete Time Model
  3.09 Performance Study
  3.10 Simulation Study
  3.11 Future Research
  3.12 Concluding Remarks

Bibliography
Appendix A
Appendix B
Appendix C
Appendix D

List of Tables

Table 1-1: Calculated JRI with Cov{x2,x5} = 0.00
Table 1-2: Calculated JRI with Cov{x2,x5} = +0.02
Table 1-3: Calculated JRI with Cov{x2,x5} = -0.02
Table 2-1: Example Solutions
Table 2-2: Results Summary
Table 3-1: Two machines, low failure variance (24 cases)
Table 3-2: Three machines, low failure variance (12 cases)
Table 3-3: Two machines, high failure variance (24 cases)
Table 3-4: Three machines, high failure variance (12 cases)
Table 3-6: Mean Improvement by Number of Machines
Table 3-7: Mean Improvement by Other Factor

List of Figures

Figure 1-1: Structure of three component system
Figure 1-2: Minimum Path Representation of an Arbitrary System
Figure 1-3: Components Re-Ordered
Figure 1-4: New Modules Formed
Figure 1-5: Structure of Five Component System
Figure 2-1: Timing of Events
Figure 2-2: Possible locations of KKT solution points
Figure 2-3: Timing of Events with Random Lead Times
Figure 2-4: Relations Amongst Single Spare Models
Figure 3-1: A CQN with four machines and two repair shops
Figure 3-2: Single Age Policy
Figure 3-3: Double Age Policy
Figure 3-4: WUA Policy
Figure 3-5: Average Age Policy
Figure 3-6: Constant Cost Policy
Figure 3-7: Typical Optimal Policies
Figure B-1: Interior KKT Point
Figure B-2: Diagonal Boundary KKT Points
Figure B-3: Interior KKT Point Again
Figure B-4: Interior and Diagonal KKT Points
Figure B-5: Lower Boundary and Diagonal KKT Points

Acknowledgments

I wish to thank my thesis supervisor Derek Atkins for the encouragement and patience he showed me during my stay here at UBC; working with him helped to make the long and often frustrating task of writing a thesis a much more agreeable experience.
Thanks are also due to Hong Chen, Bertrand Clarke, Moren Levesque, Ernie Love, Tom McCormick, Maurice Queyranne, Marty Puterman, and Daning Sun for their collective help over the past four years. During this research the author received funding from the Department of National Defence and from Natural Sciences and Engineering Research Council grant OGP 0004743.

Foreword

Some of the material contained in this thesis has already been accepted for journal publication. The following describes the publication status at the time of this writing: (a) a paper [Ar95] containing an earlier version of the work in section 1.04 on joint reliability importance was published in IEEE Transactions on Reliability in 1995; and, (b) a paper [AA96] containing an earlier version of the work in section 2.04 was published in IIE Transactions in 1996.

Note to the Reader: The paper [AA96] was coauthored by myself and Michael Armstrong. Michael was listed as the first author, and he was in fact the principal author who derived most of the results and wrote most of the text. My contribution was in the role of advisor, in that I assisted him with some of the more difficult parts of some proofs and helped him to edit his work into an appropriate style for our field of research. I therefore think it appropriate that this work be included in his thesis.

Derek Atkins
Professor of Management Science

Introduction

In this thesis we consider several problems in the field of reliability and maintenance. Much of the classic work in this area analyzes a single component or machine in isolation; this simplifies reliability analysis and maintenance optimization, but obviously does not reflect the reality of many applications in which a machine is part of and interacts with a larger system. A feature common to all of our work herein is an explicit consideration of system interdependence, whether it be amongst different components within a system, amongst successive stages in a logistics chain, or amongst several machines sharing common maintenance facilities.

Chapter 1 considers how to evaluate the contribution of individual components to the functioning of the overall system. There currently exist several measures of the importance of a component within a system in determining the system's reliability; these measures include structural importance, marginal reliability importance, joint reliability importance, and link importance. We review these different measures and examine how they relate to each other. We show that structural results previously obtained for joint reliability importance of independent components continue to hold for associated components. We then propose extensions to each importance measure to cover reliability models in which the components have two failure modes rather than the conventional one, and show that these extensions have properties similar to those in the single failure mode model. The chapter concludes with some numerical examples which illustrate our results.

Our main contributions in this first chapter are as follows:
(a) we show that structural results previously obtained for joint reliability importance continue to hold when component failures are not independent of each other;
(b) we extend the various measures of importance to dual failure mode systems; and,
(c) we show how the various measures of importance relate to each other.

Chapter 2 examines the interaction between the maintenance and inventory functions of a system.
We consider a system containing one machine subject to random  2  failure, for which we wish to determine an age replacement maintenance policy and a spare ordering inventory policy. Our goal is to evaluate the solvability and desirability of jointly optimizing these two traditionally separate policies. We discuss why the general form of the problem is largely intractable and then proceed to examine several special cases which are somewhat easier to analyze. Special attention is paid to systems which are only allowed to contain a single spare; we show that for these systems our problem has some convexity properties which make it amenable to cost minimization. We subsequently consider systems which allow more than one spare, but make use of other restrictions or assumptions to somewhat simplify the analysis. For the general form of the problem considered, we are only able to suggest some heuristic solution procedures; exact optimality appears to be intractable, at least using our style of analysis. When we assume that we are to use a single age maintenance policy and can either borrow a spare or instantly expedite one, then we are able to derive the cost function and specify some of its properties. When we restrict our inventory policy to having a single spare in stock or on order at any point in time, then we are able to specify the optimal policy form, derive the cost function, and show how to determine some of the optimal values of the decision variables for this function. When we further restrict the lead time to be deterministic, we are able to determine the full set of optimal decision variables. We find that the objective function is pseudo-convex in each decision variable and any economically viable Karush-Kuhn-Tucker point is a global minimum. A numerical study shows that joint optimization of maintenance and inventory decisions can be worthwhile in comparison to traditional separate optimization. The cost difference between joint optimal and separate optimal solutions can be arbitrarily large in extreme cases, while for the numerical data we consider, joint optimization yields an average improvement of 3% over the best form of separate optimization with one tenth of our cases having a difference greater than 10%. We also note the effects of allowing spares to be returned unused by an  3  improving machine, of requiring that spares be used upon receipt, and of allowing the lead time to depend upon the order type (i.e. scheduled or unscheduled). Our main contributions in this second chapter are as follows: (a)  we enhance the existing single-spare optimization literature by introducing the  notions of breakage costs, return of spares, service constraints, and double age policies; (b)  we combine and generalize previous work in single-spare optimization with  regards to such analytical properties as pseudo-convexity of the cost function; and, (c)  we illustrate the importance and difficulties of joint optimization of maintenance  and inventory. Chapter 3 examines the maintenance of machines sharing common repair facilities, a situation often referred to as the machine repairman problem. Although this has been a subject of research for quite some time, it seems that all existing studies have made the reasonable but restrictive assumption that repairs are made only after machine failure, i.e. preventive maintenance is ignored. Our work extends the machine repairman model by allowing for preventive repair under the assumption that the machines have an increasing hazard function. 
We derive analytical results for the case where the form of the repair policy is restricted to a single critical age, and then investigate other heuristic policies which can improve the system's cost effectiveness. Numerical studies are then used to compare the performance of our assorted policies for small systems. Our work shows that for the type of system we consider, single age policies are quite tractable to analyze and optimize; but their performance is sub-optimal and can result in meaningful inefficiencies. We can improve on this situation using heuristic policies; we find that these tend to be most worthwhile in systems with more machines, low shortage costs, low failure variance, and a high repair rate. Our conclusion from this chapter is that although we are unable to describe the truly optimal policy structure, the heuristics we examine seem to perform well enough to make consideration of more complicated policies unnecessary.  4  Our main contributions in this third chapter are as follows: (a)  we introduce the use of preventive repair into the study of the machine repairman  problem; (b)  we derive some basic properties of the traditional single critical age repair policy as  used in this context; and, (c)  we introduce and evaluate several heuristics that improve upon the performance of  single age policies. Terminology In discussing our work we will make extensive use of terminology from the reliability literature and to a lesser extent from the inventory literature. In this section we briefly introduce many of these terms; for more detailed explanations, refer to [BP75], [SS90], and [Co87] for reliability terminology and [GRZ93] for inventory terminology. For surveys of the existing maintenance literature, see [CP91], [VF89], and [SA85]; for the inventory literature [GRZ93] provides a broad survey. A conventional or binary-state machine (or component, module, system, etc.) has two possible states: working (or good) and failed (or bad). The reliability of a machine is the probability that it is working. A system of n components is often represented by a probabilistic network or graph, in which each edge of the graph corresponds to a component. The probability of being able to trace a path of working components (also called a path-set) through the graph amongst a set of k nodes is the k-terminal reliability of the system; in particular, if k = n then we talk of the all terminal reliability, while if we are only interested in paths between two particular nodes (the source and sink) so that k = 2 we speak of the s-t reliability of the system. A path which contains no extra edges is referred to as a minimum path. A cut in the graph (i.e. a set of failed components, also called a cut-set) that prevents the required paths from being traced results in system failure; if this cut contains no extra edges, then it is a minimum cut.  5  A module is any convenient self-contained group of components (or sub-graph) that can be represented by (condensed into) a single edge of the graph. Components are said to be in parallel if any one working is sufficient to allow a system (or module) to work, while they are in series if all are required to be working for the system to be working. The usage of the term parallel-series system seems to vary from paper to paper. 
In this thesis we use it to mean a system which can be sequentially decomposed into modules, sub-modules, and so on, so that as each module is decomposed the result is a set of sub-modules all either in parallel or in series with each other. This definition is more general than that of [BP75] but less than that of [Co87]. A component is irrelevant if its functioning or failure never has any effect on system operation; otherwise it is relevant. A binary-state system is coherent if all of its components are relevant and system reliability is a non-decreasing function of component reliability. The reliability function expresses the reliability of a system in terms of the reliability of its components, assuming that the behaviour of the components is statistically independent; it is a «-th order polynomial. The structure function expresses the state of the system in terms of the states of the components, regardless of any statistical dependence. The lifetime of a stochastically failing component can be described by a (cumulative) distribution function and a corresponding density function (which we assume exists). The survival function is equal to one minus the distribution function. The hazard rate or failure rate of a component is the ratio of the density function to the survival function, and can conveniently be thought of as the instantaneous conditional probability of component failure as it ages. If the failure rate is (strictly) monotonically increasing, we say that the component has a (strictly) increasing failure rate or is (strictly) IFR. Similarly, if the failure rate is monotonically decreasing the component is said to be DFR, while if the failure rate is constant then the component is CFR. Corrective maintenance is performed on a component after it has already failed, while preventive maintenance occurs prior to failure. If a repair fully restores a component to its original condition (i.e.  6  makes it good-as-new) then we describe that as perfect repair. Conversely, minimal repair only returns a component to its condition just before failure (i.e. makes it bad-asold). Inventory systems can be classified according to how often they review stock levels and make ordering decisions. A system is said to use continuous review if stock levels are reviewed constantly so that orders may be placed at any time; a system which performs the review and ordering functions only at fixed intervals (i.e. periodically) is said to use periodic review. A n (5-1,5) inventory policy is a continuous review policy which places an order for one new unit whenever it receives a demand for one, so as to always have a total of s units in stock or on order; this is also called a one-for-one ordering policy.  7  Chapter 1 Importance of Components 1.01 Chapter Abstract Each component in a system plays a role in determining the reliability of that system; some components are more important in this regard than are others. The existing measures of this importance include structural importance, marginal reliability importance, joint reliability importance, and link importance. In this chapter we discuss these measures and examine how they relate to each other. We show that structural results previously derived for joint reliability importance of independent components continue to hold for associated components. 
We then propose extensions to each importance measure to cover reliability models in which the components have two failure modes instead of the conventional one, and show that these extensions have properties similar to those of the original versions. Throughout our discussion we present some numerical examples which illustrate our results. 1.02 Introduction Consider a system, such as the radar system in an aircraft, which is composed of stochastically failing components. There are many questions that one could ask about such a system during its design, evaluation, and operation. How reliable is it? Why does it fail? If we make a particular change to one of the components, how does that affect the system as a whole? Given limited resources (e.g. a fixed engineering budget), how can we best improve the system's reliability (i.e. which components should be upgraded first)? In answering such questions, the concept of the importance of components naturally arises. For example, if in a given system the failure of one component is more likely to lead to the system's failure than is the failure of another component, then we would intuitively consider thefirstcomponent to be more important in some sense than the second. Similarly, more complicated questions of cost effectiveness and design priorities hinge on having some measure of each component's contribution or worth to the  8  system's reliable performance. For example, an aircraft's radar system has many components, each of which could potentially be made more reliable through redesign efforts, etc. The decision as to which of these components should receive priority for engineering improvements clearly would be made easier by having some method of ranking the components with respect to their contribution to system reliability. To address such questions we have at our disposal several notions of importance that have been previously defined by other authors. Two of these, structural importance and reliability importance, have been in use for some time and are well described by Barlow & Proschan in [BP75]. More recently, Page & Perry [PP94] defined a measure they called link importance, and Hong & Lie [HL93] introduced the concept ofjoint  reliability importance. There are several closely related topics which are not addressed herein. We do not consider methods of comparing the reliability of entire networks, as was done by [AB84], nor do we address the question of determining the optimal configuration of components within a network, as in [BFJP63] and [PM94]. For the reliability measures we do discuss, we make only passing reference to the computational issues involved in actually calculating their values, though we acknowledge that this is in itself an important issue; see [Co87] and [DR86] for discussions on this topic. We begin our paper by reviewing the original definitions of these measures and examining how they relate to each other. We then show that structural results previously derived for joint reliability importance of independent components continue to hold for associated components. We propose extensions to all of these measures to cover components which have two possible failure modes rather than the conventional one, and show that the extended measures largely retain their original properties. We conclude with some numerical examples.  9  1.03 Notation and Assumptions We consider a system of n components. 
The state of each component / = l,...,n is represented by indicator variable x ; for components with one failure mode, t  e {1,0},  where 1 indicates that the component is working, and 0 indicates that it is failed. We use x to represent the vector of all component indicator variables, so x = (xj,...,x ). The n  probability that component i is working isp = Pr[x, = 1], while the probability that t  component / is failed is q •, = 1 - p . l  t  The vector of all component reliabilities is/? =  (Ph—'Pn)- S(x) is the structure function for the system: S(x) = 1 if the system is working, 0 otherwise. R(p) is the reliability function for the system: R(p) = ?T[S(X)=1] = E[S(x)], so it gives the probability that the system is working. Sometimes we wish to evaluate one of these functions conditional on one or more component states being known; thus we write for example, S(lj, x) to indicate the structure function for a system in which XJ = 1 (i.e. component / always works). Similarly we use R(0j, x) = ~Pr[S(x)=\ | Xj =0] (i.e. it is a conditional probability), or we can condition on the states of two variables as in R(0j, lj, x) = Vr[S(x)=\ | XJ =0, XJ =1]. We will introduce other notation later as required for our discussion. We assume that we have a coherent system which can be modeled as a probabilistic network. For convenience only two-terminal (or s-t) reliability is discussed, but our results can also be shown to apply to all-terminal and ^-terminal reliability using similar proofs with appropriately revised notation. The discussion is phrased in terms of a system and its components, though one could equally well speak of a graph and its edges or a set and its elements. We initially assume that our components have the conventional single failure mode, though later in the chapter this assumption will be relaxed. In the course of our discussion we will refer to a small example system for which we calculate each of the various importance measures discussed. Figure 1-1 shows the structure of the system. For convenience in calculation it is assumed that all three  10  components shown are independent and identical, with a probability of working p = 0.8 and probability of being failed q = 0.2. Figure 1-1:  Structure of three component system (b) Graph version  (a) Component version  t  The structure function for this system is X2 + x^ X3 - x-[~X2 X3  S(x) =  and the reliability function is  R(P) = P\Pi +P\P3 -P1P2P3 = P -P 2  2  3  where the latter simplification is due to all components being identical. Note that by symmetry, components 2 and 3 must have the same importance value and ordering for all measures. The reliability of the system for the given component reliabilities is R(p) = 2(0.8) - (0.8) = 0.768 2  3  In other words, at any given time there is about a 77% chance that the system is functional. 1.04 Measures of Importance and Their Relationship Structural importance (SI) as defined in chapter 1 of [BP75] is one of the simplest measures of the importance of a component, as it ignores any consideration of the individual reliability of components and instead concentrates on the topological structure of the system. Definition of SI from [BP75]: SI/ = 2 ("- ) X [ S(lj,x) - S(0j,x) ] where the summation is over all x such that Xj = 1. _  1  11  A system of n components has 2" different state condition combinations, of which half have Xf= 1; hence the 2"(" ) term in the definition. 
_1  Intuitively, SI, is the proportion of possible states of the rest of the system in which the functioning of component /' makes the difference between system failure and system functioning. Because SI ignores component reliabilities, its value is easier to calculate than other measures, and it can be used even if these reliabilities are unknown or subject to change e.g. during the system design process. On the other hand, it can not distinguish between components which occupy similar structural positions but have drastically different reliabilities. In the example system of Figure 1-1, SIj = 1/4 [(0-0) + (1-0) + (1-0) + (1-0)] = 0.75; a similar computation gives S I = [0.25, so that SIj > S I and S I = SI . 2  2  2  3  The other measure of importance described in [BP75] is the reliability importance of a component (also known as marginal reliability importance or MRI). Definition of MRI from [BP75]: MRI/  = E[S(l/,x)-S(0/,x)] = R(lj,p) - R(0j,p) = d/dpj R(p) for independent components.  Note that the definition uses conditional expectations; i.e. E[ S(lj,x) ] is equivalent to E[S(X)\XJ=1].  M R I is essentially the rate at which the system reliability changes with respect to changes in the reliability of a given component. An improvement in the reliability of the component with the highest MRI value causes the greatest increase in reliability of the overall system. This information can be used to determine which components should be improved first in order to make the largest improvement in the system. In certain cases, the structure of the system will indicate which components have the highest MRI; examples from [BP75] include pure parallel and pure series systems, while more complicated examples include the k-out-of-r-of-n linear systems described in [PS91]. For our example system, M R I = [(0.8+0.8-0.64) - (0+0-0)] = 0.96 and similarly M R I = 0.16, X  2  so M R I t > M R I and M R I = M R I . 2  2  3  12  M R I gives a more informative measure of the importance of a component than does SI, as M R I considers both the topology of the system and the reliability of the components in its determination. As noted in [BP75], SI; = MRI, if the components are independent with reliabilities all equal to 1/2. This means that under these conditions both measures give the same relative comparisons for any two components; i.e. when all components are independent with/?, = p = 0.5 for all /' we will have: (a) SI, > Sly o  MRI, > M R I , ;  (b) SI, = Sly <=> MRI, = M R L . If the components are not independent or if the reliabilities are not identically one half, then these two measures need not agree; we could have SI,- > Sly but MRI, < M R L . In general M R I provides more information than does SI, but of course it requires more information to calculate as well. The link importance (LI) is a measure defined fairly recently by Page & Perry [PP94]. Whereas SI and M R I both provide numerical values for each component and hence determine a complete ordering (possibly including ties) of the components' importance, L I does not compute a value but provides only a partial ordering. There are actually two definitions in [PP94] for LI; we repeat them below, referring to the first one as LI proper and to the second one aspathset-cutset importance (PCI). 
Definition of LI from [PP94]:  LI, > L L (read, "component /' is at least as important  in terms of LI as component j ") if both of the following are true for any assignment of probabilities such that the reliabilities of all the components are independent and equal:  (&)R(Pj,p)ZR(O p); h  (b)R(\j,p)<R(l ). hP  If both LI; > LI, and LI, < L L , then the two components are said to be equivalent in L I and so we write LI; « L L . The intuitive notion behind L I is that one component is more important than another if that first component makes the greater difference to system reliability both by its  13  functioning and by its failure. Note that the definition effectively assumes that all components are identical, and gives an ordering only if both inequalities hold true for any common reliability; M R I on the other hand is calculated using a specific set of reliabilities, which need not be identical. For the example system,  > L I and L I * L I . 2  2  3  What we refer to as PCI is defined only in terms of the network topology and ignores component reliabilities; in this way it is similar to SI. The idea behind PCI is to examine the size of the path sets and cut sets that two components are part of; the component which belongs to the smaller sets will tend to be more important to system operation than the one which belongs to larger sets because a component's state is relatively less influential in a larger set. Definition of PCI from [PP94]:  PCI, > PCI, (read, "component z is at least as  important in terms of PCI as component j ") if all four of the following are true: (a) { the number of components in a minimal size path set when x, = 1 } < { the number of components in a minimal size path set when Xj• = 1 }. If equality holds, then also require that {the number of minimal size path sets when x,- = 1 } > {the number of minimal size path sets when Xj=l }; (b) {the number of components in a minimal size cut set when JC, = 1 } > {the number of components in a minimal size cut set when Xj= I }. If equality holds, then also require that {the number of minimal size cut sets when x, = 1 } < { the number of minimal size cut sets when Xj = 1 }; (c) {the number of components in a minimal size path set when x •, = 0 } > {the number l  of components in a minimal size path set when Xj = 0 }. If equality holds, then also require that {the number of minimal size path sets when x, = 0 } < {the number of minimal size path sets when Xj = 0 }; (d) {the number of components in a minimal size cut set when x, = 0 } < {the number of components in a minimal size cut set when x. = 0 }. If equality holds, then also require  14  that {the number of minimal size cut sets when x, = 0 } > { the number of minimal size cut sets when x, = 0 }. If both PCI, > PCI, and PCI, < PCI,, then PCI, * PCI,. A minimal size path set is simply the path set which has the fewest components in it. In our example, PCIj > P C I and 2  PCI * PCI . 2  3  The previous definition for LI was intended for systems of independent components; we can remove this requirement and revise the definition in terms of expectations as follows. Definition 1-1:  LI, > LI, (read, "component / is at least as important in terms of L I  as component j ") if both of the following are true for any assignment of probabilities such that the marginal probabilities for each component's functioning are all equal:  ( ) E[S(Qj,x)]>E[S(O x)]; a  (b)  h  E[S(lj,x)]<E[S(l x)]. h  If both LI, > L L and LI, < L L , then LI, « L L . 
Both LI and PCI are limited by being partial orderings, in that there are components that cannot be categorized by them; e.g. we could have two components such that neither LI, > L L nor LI, < L L nor LI, « L L . As illustrated by the examples in [PP94], LI is a stricter or more demanding test of relative importance than is PCI; that is, the use of LI results in fewer equivalencies or "ties" than PCI, but is more likely to be unable to express an ordering at all. It is therefore difficult to say which would be most useful in general; [PP94] does offer the following observation. Observation from [PP94]: L I implies the following about PCI: (a) LI, > LI,  PCI, > PCI,;  (b) L I , ~ L L  PCI,«PCL.  Page & Perry do not discuss how LI relates to SI or MRI, so we do that here. By comparing the respective definitions, we can see that the orderings determined by LI will be duplicated by SI, and also by MRI if the components all have identical reliabilities.  15  Proposition 1-1:  L I implies the following about SI and MRI:  (a) L I , > L L  => SI, > SI,;  (b) L I , « L L  => SI, = Sly ;  (c) LI, > Lly => MRI, > MRIy when all components are identical; (d) LI, « Lly => MRI, = MRIy when all components are identical. Proof: Consider part (c) first. LI, > Lly by definition means that both E[ S(0j, x) ] > E[ S(O x) ] and E[ S(lj, x) ] < E[ S(lj,x) ]. From this it follows that h  E[ S(l x) ] + E[ S(0j, x)]>E[ S(lj, x)] + E[ S(O x) ] h  h  E[ S(l x) ] - E[ S(0 x)]>E[ S(lj, x)]-E[ S(0j,x) ] h  h  MRI, > MRIy . For part (d), LI, « LI, means that both: LI, > Lly , hence MRI, > MRIy; and also LI, < Lly, hence MRI, < MRIy . For both M R I inequalities to be true, we must have MRI, = MRIy. For parts (a) & (b), set p = 1/2 and q = 1/2 to have M R I - SI; then (a) follows from (c) and (b) follows from (d). QED We conjecture that PCI has a similar relationship with SI but not with MRI, i.e. that PCI implies the following about SI: (a) PCI, > P C L ^> SI, > S I ; y  (b) PCI,«PCIy ^> SI, = Sly. We have been unable to prove this conjecture, however. Aside from the observations above, the various importance measures need not agree on the ordering they provide. In terms of being able to discriminate amongst the importance of different components, we can rank these measures roughly in order of decreasing informativeness but increasing ease of calculation as follows: MRI, LI, PCI, SI. In actual application, we might expect to see SI, PCI or LI to be used during the initial  16  design of a new system, when component reliabilities are as yet unknown (these measures do not use component reliability in their computation). M R I would be most appropriate for detailed analysis of an existing system, when presumably information about component reliabilities and covariances is available. The last measure of importance we discuss is joint reliability importance (JRI) which was introduced by Hong & Lie in [HL93]. JRI is quite different from previous measures: whereas SI, LI, PCI, and M R I all evaluate the effect of one component on the system reliability, JRI evaluates the effect of one component on another component's importance. This makes JRI potentially more useful for making the kind of trade-off decisions that commonly arise during system design, since it gives some indication of how components interact in determining the system's reliability. Intuitively, JRIy is the rate of increase in the importance of component /' as we increase the reliability of component j (and vice versa). 
Definition from [HL93]: JRIy  = R(\  h  For independent components:  lj,p) + R(0j,0j,p) - R(Pi, ljj>) - R(lj,0j,p) = dVdpjdpj Rip).  This definition of JRI and the structural results derived in [HL93] are restricted to the case where component failures are assumed to be independent of each other. Unfortunately, in many real world systems the component failures are not independent. For example, all of the components in an aircraft radar system would presumably be subjected to many of the same stresses: the vibration from the engines, the shock of landing, the irregularities in the power supplied, etc. It is not unreasonable to therefore expect that there may be positive correlations amongst the components (i.e. if one circuit has failed due to a power surge, another is likely to have failed too). Less commonly, the independence assumption may be violated by the presence of negative correlations. The following definition extends the concept of JRI to more general systems where the components need not be independent.  17  Definition 1-2:  JRIy  = E[ Sfl lj,x) + S(0j,0j,x) - S(0j, \j,x) -S(\j,0j,x) ] p  When components are independent this simplifies to the original definition, since then E[5fxj] = R(p), and so for example E[ S(\j, 1px) ] = R(\j, 1px). We can interpret this definition as follows. If we could guarantee that two components in a system would always work, we would expect about twice the improvement in system reliability that we would get from a guarantee for only one of the components. If this is actually the case, then JRI is zero. If there is more improvement ("synergy") then JRI is positive. If there is less improvement ("diminishing returns") then JRI is negative. For the example system, JRI  1 2  = [1 + 0 - 0 - 0.8] = 0.2, which confirms that the reliability of component 2 becomes  more critical when component 1 is improved (and vice versa); conversely, JRI23 = [0.8 + 0 - 0.8 - 0.8] = -0.8, which shows that component 2 becomes less critical when component 3 is improved (and vice versa). The next lemma shows that JRI can be represented in terms of M R I in a modified system in which one component is in effect guaranteed either to be working or to be failed. Lemma 1-1: (a) JRIy = M R L O ^ - M R L ^ ) (b) JRIy = M R I / l y ) - MRI/0/) where we use MRI^ly) to indicate that MRI, is to be calculated while fixing xj = 1, etc. Proof: MRL(1/) - MRI (0 ) ;  /  = E{S(l ,lpx)-S(0 ,l ,x) i  i  j  }-E{S(l ,0 ,x)-S(0 ,0 ,x) i  j  i  j  }  = E{S(Jj,Jj,x) + S(0 0px) - S(0j,lj,x) - S(lj,0 x)} it  jt  = JRI . y  This proves part (a). The choice of /' and j are arbitrary, so the proof of part (b) is identical.  18  QED This lemma suggests that the JRI of two components describes the relative importance of one component when the other is functioning. In particular: (a) a positive JRI indicates that one component becomes more important when the other is functioning (the "synergy" case), as with components 1 and 2 in our example system; (b) a negative JRI indicates that one component becomes less important when the other is functioning ("diminishing returns"), as with components 2 and 3 in the example; and, (c) a zero JRI indicates that one component's importance is unchanged by the functioning of the other. The following shows that the range of JRI is limited and that it is possible to find the sign of JRI without calculation in two special cases. 
Lemma 1-2: JRI is bounded as follows: (a) for any two components /' and j, JRI e [-1,1]; y  (b) if two components /' and j are in parallel, then JRIy <0; (c) if two components /' and j are in series, the JRIy > 0. Proof: Part (a). The structure function S(x) only assumes values of 0 or 1, thus  E{S(x)}z[0, 1] The system is coherent, thus E{S(lj,x) }>E{S(0j,x) } Now MRIy = E{S(lj,x) - S(0j,x) } e [0, 1] So JRIy = MRIj-Cly) - MRI,{0j) e [-1, 1]. Part (b). If /' and j are in parallel and j is working (i.e. Xj = 1), then /' is irrelevant and MRI,-(Ly) = 0. Using this with Lemma 1-1 gives: J R I = MRLXly) - MRI,-(0y) = 0 - MRL/Oj) < 0 y  Part (c). If /" and j are in series and j is failed (i.e. xj = 0), then /' is irrelevant and MRIjCOy')^- Using this with Lemma 1-1 gives: JRI = MRIj(li) - MRLXOv) = MRIj(ly) - 0 > 0 ;  19  QED More generally, there are four possible situations for the structural relationship of two components /' and j: (i) i and j appear together on at least one minimum path, but not on any minimum cut; (ii) / and j appear together on at least one minimum cut, but not on any minimum path; (iii) /' and j appear together on at least one minimum path and on at least one minimum cut; or, (iv) / and j do not appear together on any minimum path nor on any minimum cut. The last case (iv) is not of interest here, because it implies that at least one component is irrelevant and hence the system is not coherent. A special case of (i) is where components are in series, while a special case of (ii) is where components are in parallel; the previous lemma gives the sign of JRI for these two special cases. The following theorems show that the sign of JRI can be readily determined more generally for both cases (i) and (ii). Theorem 1-1: The following two conditions are equivalent: (a) there is no minimum path set containing both /' and j; (b) JRIy < 0 for all possible assignments of probabilities. Proof: (We have two different ways of proving Theorems 1-1 and 1-2, so we will show one type of proof here for Theorem 1-1 and the other type subsequently for Theorem 1-2. Since the theorems are in a sense duals of each other, the proofs can easily be adapted from one theorem to the other by interchanging paths for cuts, etc.) Assume (a) holds. Begin with an arbitrary system, and choose any two components i, j within that system. Create the minimum path representation of the system. This will be a collection of paths; the paths are in parallel with each other, and each path consists of one or more components in series. Some of these paths contain component /', some contain component j, and some contain neither; by hypothesis, no path contains both components. Any component which appears on multiple paths can be  20  thought of as multiple components (compj, comp ,—) with identical reliabilities 2  Pcom 2 =•••)  a  P  n  d  perfect correlations (Cor{x  ,  compl  x  }  comp2  (p  i  comp  = 1). See Figure 1-2 for a  pictorial example. Figure 1-2: Minimum Path Representation of an Arbitrary System  HD-0-#-0-|  -o-#-o-o— -0-0-0-#—  -o-o-o-oL-O-O-O—  Q  other  Since each path is a simple series, the order of components does not affect the value of the structure function; so arbitrarily re-arrange the components so that / and j are the last components on any path they appear on. See Figure 1-3. 
Figure 1-3: Components Re-Ordered  r O O O  -0-0-0-%— -0-0-0-#—  L-O-O-OCombine components into arbitrary modules as follows: (i) all occurrences of /' are recombined into a single /' component; (ii) all occurrences ofj are recombined into a single j component; (iii) all other components on paths containing /' are combined to form module 1;  21  (iv) all other components on paths containing7 are combined to form module 2; and, (v) all other components on paths containing neither /' nor j are combined to form a third module. See Figure 1-4 for the resulting structure. Figure 1-4: New Modules Formed  1 y  2 3  The structure function of this system of components and modules has the form S(x Xj,Y) = YJXJ + Y xj + Y - Y x Y Xj - Y Y it  2  3  1  i  2  m  3  -Y XjY + Y x Y x Y 2  3  1  i  2  j  2  where 7/ is the indicator random variable for the functioning of module /'. So now the expression for JRI is: JRIy = E{S(l lj, Y) + S(0j, Oj, Y) - S(l Oj, Y) - S(0 lp) } h  it  h  = E{(Yj + Y + Y - YjY - YjY - Y Y + YfaYj) + (Yj) 2  3  2  3  2  3  - (Yj + Y - Yj Y3)-<Y2 + Y - Y Y^ } 3  = E{YjY Y 2  3  - YjY2} =  3  2  E{YjY Y }-E{YjY }<0 2  3  2  This shows that (a) => (b) holds for any pair of components, regardless of any covariances (or for any assignment of probabilities in general). Get (b) => (a) by reversing the order of the steps. QED Theorem 1-2: The following two conditions are equivalent: (a) there is no minimum cut set containing both /' and j; (b) TRIjj > 0 for all possible assignments of probabilities.  22  Proof: The proof of this theorem makes use of the properties of supermodularity, as described in [Lo83]. A supermodular set function MQ is one which satisfies the following inequality for any sets C and D: M(C uD)+ M(C n D) >M(C) + M(D). Let A], A ,  A denote k random events. For every subset C = {i(l),  2  k  of the index set {1, ...Aj( j}.  i(c) }  k), define the joint probability functionMfQ = Pr{A^jj nA^ j 2  n  ThenM(C) is a supermodular function.  c  First show that (a) => (b). Define the following random events. Aj  the event that all minimum cut sets containing component z are functional  A2  the event that all minimum cut sets containing component j are functional  A  the event that all minimum cut sets containing neither /' nor j are functional  3  By hypothesis, there is no minimum cut set containing both components z and j. Also define C ={ 1,3 } and D = {2,3}. By definition, J R I is y  = E{S(li,lj,x) + S(0 0j,x)-S(0i,lj,x) -S(lj,0j,x)} it  = Pr{S(l lj,x)=l} + Pr{S(0i, Oj,x)=l} - Pr{S(0j, l x)=1} - Pr{S(lj, Oj,x)=1} it  jt  = Pr{A } + Pr{Aj nA 3  2  nA )-Pr{Aj 3  nA )-Pr{A 3  nA }  2  3  = M(C nD)+ M(C uD) - M(C) - M(D) > 0 where the inequality follows from the supermodularity of the joint probability function. Next show that not-(a) => not-(b). Suppose there exists a minimum cut set C containing both i and j. Since C is minimal, there exists a path set V (respectively, Vj) with V nC = {i} (respectively, Vj n t  t  C = {j}). If we define the component reliabilities  according to p^ = 0 Vk e C\{i,  and Pk = I Vk g C, then we get J R I as follows: y  = E{ S(l \ x) h  h  + S(0j,0j,x) - S(0j,lj,x) -S(lj,0j,x) }  j)  23  = P r { S ( l , l x ) = l } +J>r{S(0j,0j,x)=\} -Pr{5(0 l -x)=l} - Pr{S(l;, O ^ l } /  >  /;  y  = 1 + 0 - 1 - 1 = -1 <0 QED We note that the qualitative analysis of networks done by Granot & Veinott in another context in [GV85] produces what seems to be related results regarding the sign of qualitative changes. 
Also note that the corresponding theorems in [HL93] showed that (a) => (b) assuming independence, whereas now we say (a) <=> (b) in general. The JRI values in our example system coincide with the theorem results: JRI 12 > 0 and JRI23 < 0. We can further show that JRI is strictly non-zero in many cases. Proposition 1-2:  Suppose that 0 < p  m  < 1 and -1 < Cox{x , xj < 1 for all m  components m,n, then: (a) under the conditions of Theorem 1-1 part (a), J R I < 0; y  (b) under the conditions of Theorem 1-2 part (a), J R I > 0; y  (c) in a system with a parallel-series structure, J R I * 0 for all pairs ij. y  Proof: Part (a):  Examine the last line of the proof for Theorem 1-1. Our hypothesis  permits us to state that E{ YjY Y }<E{ YjY } and so J R I < 0. 2  3  2  y  Part (b):  Follows the same type of reasoning as in part (a) but using Theorem 1-2.  Part (c):  Recall that in a series-parallel network, each module can be broken down  into sub-modules which are either all in series or all in parallel within that module. This can be repeated with sub-sub-modules, etc., until all modules have been broken down into components. Begin with the system as a single module linking s and t. Start breaking-down the modules until components / and j first appear in different sub-modules. If these two submodules are in series with each other, then /' and j cannot appear on the same minimum cut; apply part (b) above. If the sub-modules are in parallel, then /' and j cannot appear on  24  the same minimum path; apply part (a) above. Of course these relative locations of i and j are preserved in any subsequent decompositions. QED The above results describe the intuitive idea that in most real systems every component's importance is affected by every other component's reliability to some degree. The extra conditions are not very restrictive; in a sense, they only rule out "trivial" components (e.g. p  m  = 1 implies that a component is always working; in the unlikely event  that this were true in a real system, it would not be of interest to reliability analysis in the conventional sense). Note that even when none of these propositions hold, JRI is exactly zero only for particular combinations of component reliabilities (typically, at a combination ofPffi  values where the JRI is crossing over from positive to negative, or vice versa). There seems to be no direct relationship in general between JRI and measures of  importance other than MRI; e.g. J R I > 0 implies nothing about LI, and L L in general. y  1.05 Numerical Example: The Sign of JRI To illustrate the relationship between JRI's sign and a system's topology, in Tables 1-1, 1-2, and 1-3, we show selected JRI values for a system which has the same bridge structure as the first example in [HL93]; this structure is shown in Figure 1-5. Figure 1-5: Structure of Five Component System  25  The reliabilities of all the components are assumed to have the same value p , m  and all  covariance terms are zero except for Cov{x2,x^} which is specified above each table. Note that by symmetry, J R I i 2 = J R I 4 5 and J R I 1 3 = J R I 4 3 . 
Table 1-1: Calculated J R I with Cov{x ,x } = 0.00 2  .1 .972 -.018 .072  Pm  JRI,2  JRI JRI,,  14  .5 .500 -.250 .000  .3 .784 -.126 .084  5  .7 .216 -.294 -.084  .9 .028 -.162 -.072  .95 .007 -.090 -.043  .9 .028 -.126 -.056  .95 .007 -.052 -.025  .9 .028 -.198 -.088  .95 .007 -.128 -.061  Table 1-2: Calculated J R I with Cov{x ,x } = +0.02 2  .1 .972 -.014 .056  Pm  JRI,2  JRI  14  JRJ.,3  .5 .500 -.230 .000  .3' .784 -.114 .076  5  .7 .216 -.266 -.076  Table 1-3: Calculated J R I with Cov{x ,x } = -0.02 2  .1 .972 -.022 .088  Pm JRI12  JRI JRI  14  13  .5 .500 -.270 .000  .3 .784 -.138 .092  5  .7 .216 -.322 -.092  As an example of how these table entries are calculated, consider J R I i 2 when p  m  =  0.5. Then we have:  JRI  12  = E[S(\j,\j,x) + S(0i,0j,x) -S(0j,\j,x) -S(lj,0j,x) ] = (1) + ( 0.5 ) - ( 0.5 + 0.5 - 0.5 ) - ( 0.5 + 0.5 - 0.5 ) 2  2  2  3  2  2  3  = 0.500 Note that the inclusion of non-zero covariances changes the values calculated but not the sign of the values. We see that the signs of these values agree with those predicted by our results. Component pair (1,2) has no minimum cut in common, and so J R I i  2  is  positive. Component pair (1,4) has no minimum path in common, and so JRI14 is negative. Component pair (1,3) has both a minimum cut and a minimum path in common; our theorems do not indicate what the sign will be, and as the table shows J R I 1 3 can be  26  either positive or negative depending on the p values. The fact that J R I ^ can have either t  sign indicates that the relationship between components 1 and 3 depends on their reliabilities; in other words, the priorities for component upgrading are different if the current reliabilities are all quite low than if they are all very high. This example obviously is simple enough that the conclusions of the above paragraph could easily be obtained without recourse to our theorems. Imagine, however, a much more complicated system being analyzed to determine the effect of multiple failures on system performance. In this kind of study, it would be important for reliability engineers to understand not only the effect of individual component failures on the system reliability, but also the effect on the criticality of other components in the system. The engineers can use Theorems 1-1 and 1-2 to get an initial guide as to how the components interact, and later calculate the actual JRI and M R I values to understand more precisely the relationships. 1.06 Systems of Dual Failure Mode Components Our discussion so far has made the conventional assumption that a component has only a single failure mode and so can occupy one of two mutually exclusive states, i.e. working or failed; such components are often referred to as binary or binary-state components. In the remainder of this chapter, we consider models which allow for two failure modes and hence three mutually exclusive states: working, failed open, or failed short; we refer to these as dual failure mode (DFM) components. D F M models arise naturally in the study of flows through electrical circuits and pipelines, where the way in which a component fails can be quite important. A D F M component is working (Xj = 1) if it functions as designed (e.g. it can regulate the flow of electricity through itself); it is failed open (x =fo) if there is no flow through it at all (e.g. an open electrical circuit); and t  it is failed short (x = fs) if it allows an infinite or uncontrolled flow through itself (e.g. a t  shorted electrical circuit). 
We indicate the probability that a component / is failed short by  qsj = Pr[x, =fs], and the probability that component /' is failed open by qo = Pr[Xj =fo\ i  27  Similarly, an entire system is said to be failed short if there is any path set consisting exclusively of shorted components, it is failed open if there is at least one cut set of open components, and it is working if it is neither failed short nor failed open (i.e. there exists no shorted path and no open cut). Defined in this way, the three system states are also mutually exclusive. Note that there are other ways to define system performance for D F M components; see [SSYY93] and [SS90] for discussions on this point. We shall see that at both the system and component levels, the failed open state is equivalent and has similar behaviour to a conventional failure in the binary-state model, while the failed short state is quite different in its effect. Note that the reliability function R(p) and the structure function S(x) have the same interpretation with D F M components as was the case with binary components, though their form becomes more complicated. Page & Perry [PP87] showed that one could calculate the reliability of a D F M system using two binary-state systems with the same topology. Theorem for the Calculation of D F M reliability from [PP87]:  R(p)=R(p")-R(p') where R(p") = Pr[no open cut] is calculated usingp" = p + qs and q " = qo , while R(p') t  t  t  i  = Pr[short path] is calculated using p( = qs and q ' = p + qo . R(p") and R(p') are t  t  t  t  calculated using the binary-state reliability function corresponding to a system with the same network topology as the original D F M system, but with the probabilities adjusted as shown. Note that R(p") is a reliability in the conventional sense, whereas R(p') is effectively a form of unreliability figure for the D F M system. The preceding definition is used in [PP87] for systems of independent components, and so is given in terms of reliabilities. The following version is in terms of expectations and allows for failures of components to be associated. Definition 1-3:  D F M reliability can be calculated as follows:  E[S(x)] ^ E[ S(x") ] - E[ S(x>) ]  28  where Ef^x")] = Pr[ no open cut] is calculated using a binary-state system in which 1/' = lj ufsj and 0," = fo , while E[S(x')] = Pr[ short path ] uses 1/ = fs and 0/ = l u fo . t  t  z  i  We can use this definition and its "state translations" to convert some other terms of D F M reliability into binary-state terms as follows. Lemma 1-3: (a) E[ S(l ,x) ] = E[ S(ir,x") ] - E[ S(0/,x') ]; t  (b) E[ S(fsj,x) ] = E[ S(lf,x") ] - E[ S(l/,x') ]; (c) E[ S(fo x) ] = E[ S(0,",x") ] - E[ S(0/,x') ]. h  Proof: These equivalencies are obtained by application of Definition 1-3 as follows. (a) E[ S(\ x) ] = E[ SaWs") - S({l,},x') ] = E[ 5(l,",x") - S(0/,x') ] h  (b) E[ S(f ,x) ] = E[ S({fSj},x") - SOfSi)^ Si  (c) E[ S(fo x) ] = E[ S({f },x") - SdfOi)^ b  0i  ] = E[ SOAx")  - Sil^  ]  ] = E[ 5(0/'^") - 5(0/^-) ]  QED These equivalencies will be useful in subsequent calculations and derivations. 1.07 Importance of Components for DFM Models Binary components have only two states, working and failed; thus when we speak of improving the reliability of a component, it is obvious that we are increasing the probability of it being in the working state while reducing the probability of it being in the failed state. In discussing D F M components, we must be more explicit in defining what we mean by "improving the reliability". 
From now on when we speak of improving a component (making it more likely to be in the working state), we will specify whether it is improving with respect to open failures (hence decreasing the probability of being failed open, without changing the probability of being failed short), or the with respect to short failures (hence decreasing the probability of being failed short, without changing the probability of being failed open). We will ignore the possibility of simultaneously improving a component with respect to  29  both types of failures, though such a feature could be handled at the expense of more complicated notation and formulae. We now develop importance measures for our D F M components. We begin by redefining marginal reliability importance to handle the two types of failures. Definition 1-4: (a) M R I O ,  For systems of DFM  components:  = E[ S(lj,x) - S(fo x) ] h  = R(lj,p) - R(foj,p) if components are independent; (b) M R I S 7 -  E E f ^ ) - ^ ) ]  = R(lj,p) - R(fsj,p) if components are independent. M R I O is the M R I corresponding to reduction in the probability of an open failure, while M R I S is the M R I for reduction in the probability of a short failure. For the Figure 1-1 example, assume that there is an equal likelihood of either type of failure, so that qo = 0.1, and qs = 0.1. Then M R I O ^ = 0.99 and M R I 0 = 0.09, so M R I O i > M R I 0 and M R I 0 2  = M R I 0 ; while M R I S , = 0.19 and M R I S 3  2  2  = 0.09, so M R I S i > M R I S  2  and M R I S  2  2  =  M R I S 3 . Here we see that although the probability of each component failure type (open or short) is the same, their effect on importance is quite different. For example, M R I O j is much higher than M R I S j because an open failure in component 1 always causes system failure, whereas a short failure often allows the system to keep functioning (depending on the other component states). We can redefine structural importance in a similar fashion. Definition 1-5: (a) S I O j = 3-( ) £ [ S(\j,x) - S(foj,x) ] where the summation is over x such that x\ = 1; n_1  (b) SIS/ = 3-0 " ) £ [ S(lj,x) - S(fsj,x) ] where the summation is over x such that Xj = 1. 1  1  For the example, SIO, = 0.888 and S I 0 = 0.333, so S I O i > S I 0 and S I 0 = S I 0 ; 2  2  2  3  while SISj = 0.555 and SIS = 0.333, so SISj > SIS and SIS = SIS . 2  2  2  3  As with SI and MRI, there can be a direct equivalency between SIO and M R I O , and between SIS and M R I S . This occurs when all components are independent and all  30  probabilities equal 1/3, i.e. when we havep = qs = qoj = 1/3 for all /', then clearly the t  t  following relationships will hold: (a)  SIO, > SIO,  o  MRIO, > MRIOy;  (b)  SIO,- = SIO,-  »  MRIO, = MRIO, ;  (c)  SIS,- ^ SIS,-  o  MRIS, > M R I S , ;  (d)  SIS, = SIS,  o  MRIS, = MRIS . y  We next construct our JRI equivalents, again in two variations. Definition 1-6: (a) JRIOy  = E[ SQiAjjc) + Sifotfoj.x) - Sifo \j,x) -S(ljfoj,x) ] u  = R(l Aj,p) i  +  R<fo Joj,p)-R{fojAj,p)-R(l /oj,p) i  i  when components are independent; (b) IRIS/,-  = E[ S(li,lj,x) + S(f f ,x)  - S(f ,lj,x) -S(lj/ j,x) ]  Sil Sj  =  Si  S  R(\^ ,p)+Rifs Js ,p)-Rifs^ ,p)-R j  i  j  j  when components are independent. For the system in our example, J R I 0  1 2  = +0.1 and J R I 0  2 3  = -0.9; J R I S = -0.9 and 12  J R I S = +0.1. 23  Both M R I and JRI for binary components can also be expressed as derivatives of the reliability function when component failures are independent. This is also true for the D F M equivalents. 
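Before turning to the derivative characterization, note that Definitions 1-4 and 1-5 can be checked directly by enumerating all 3^n component states. The sketch below does this in Python; the structure is an assumption (component 1 in series with the parallel pair {2,3}) chosen because, with p_i = 0.8 and qo_i = qs_i = 0.1, it reproduces the MRIO and MRIS values quoted above for the example, but it is not a claim about what Figure 1-1 actually depicts.

```python
from itertools import product

WORK, FO, FS = "1", "fo", "fs"

# Assumed structure: component 1 in series with the parallel pair {2, 3}.
PATHS = [{1, 2}, {1, 3}]
CUTS = [{1}, {2, 3}]

def S(x):
    """1 if the DFM system works: no all-open minimal cut and no all-short minimal path."""
    failed_open = any(all(x[i] == FO for i in cut) for cut in CUTS)
    failed_short = any(all(x[i] == FS for i in path) for path in PATHS)
    return int(not failed_open and not failed_short)

def expectation(probs, fixed):
    """E[S(x)] with independent three-state components; probs[i] = (p_i, qo_i, qs_i)."""
    free = [i for i in sorted(probs) if i not in fixed]
    total = 0.0
    for states in product([WORK, FO, FS], repeat=len(free)):
        x = dict(fixed)
        pr = 1.0
        for i, s in zip(free, states):
            x[i] = s
            p, qo, qs = probs[i]
            pr *= {WORK: p, FO: qo, FS: qs}[s]
        total += pr * S(x)
    return total

def mrio(probs, i):
    return expectation(probs, {i: WORK}) - expectation(probs, {i: FO})

def mris(probs, i):
    return expectation(probs, {i: WORK}) - expectation(probs, {i: FS})

if __name__ == "__main__":
    probs = {i: (0.8, 0.1, 0.1) for i in (1, 2, 3)}
    print([round(mrio(probs, i), 2) for i in (1, 2, 3)])   # 0.99, 0.09, 0.09 for this structure
    print([round(mris(probs, i), 2) for i in (1, 2, 3)])   # 0.19, 0.09, 0.09 for this structure
```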
Lemma 1-4: When all components are independent, we have: (a) MRIO/  = d/dpj Rip)  = - d/dqoj R(p);  (b) MRISy  = d/dpj R(p)  = - d/dqsj R(p);  (c) JRIOy  = dVdpj Rip)  = dVdqoj Rip);  (d) JRISy  = dVdpj Rip)  = BVdqsj R(p).  2  2  2  2  Proof: Part (a).  Decompose the reliability function (as described in e.g. [BP75]) and then  make the substitution of 1-pj-qsj in place of qoj.  31  Rip)  =Pi*R(hP) + qoj*R(foj,p) + = Pi*Ri\ ) hP  qsfRfap)  + (l-pi-qSiYRffo^p) + q *R(f p) Si  Sh  Now take the derivative with respect to pj.  d/dpjRip)  = l*R(l p) + (0-l-0)*R(/b p) + 0*R(fii p) i  i  i  = R(l;,p)-R(fo ,p) i  = MRIO, To get the second equality, follow the same procedure above but instead substitute \-qof qsj in place ofpj and take derivatives with respect to qoj. Part (b):  This follows a parallel argument, using l-pj-qoj in place of qsj.  Rip)  =Pi*R(hP) + qsfRffsiJ)) + qofRVow) = Pi*R(\i,p) + (X-Pi-qojfRtfsiJ,) + qoi*R(fo p) h  d/dpjRip)  =l*i?(l^ + (0-l-0)*%p) + 0 * % p )  =  R(\ ,p)-Rifs p) i  = MRIS  u  Z  To get the second equality, follow the same procedure above but instead substitute l-qoj qsj in place of pj and take derivatives with respect to qsj. Part (c):  Start with the derivative from part (a) and repeat the decomposition,  substitution, and differentiation.  d/dpjRip)  =R(\ p)-R(fo p) h  h  = j*R(li,ljj>) + P  {l-pj^SjyRiliSojrf+qsfRQiSsjp)  -pj*R(foj,lj,p) - (\-pj-qSj)*R(fojjbj,p) - qsj*R(fojJsj,p)  52 /  dp? Rip) = l*Rilj,ljj>) + (  0 - 1 - 0 ) ^ ( 1 , / ^ ) +  0*R(ljfsj,p)  - l*R(foj,\j,p) - (p-l-0)*R(foiJbjj>) - 0*RifoiJSj,p)  = /2(i i^) H-/?^/*/^^)-^Ci/^=^) /s  = JRIOy,  32  To get the second equality, follow the same procedure above but instead substitute \-qojqsj in place ofp\ and take derivatives with respect to qo\. Part (d):  This follows a parallel argument to that in (c) above.  dld R{p)  =R(\ p)-R(fs p)  Pi  h  h  =pfR(\ \j,p) h  + (l-pj-qOjYRilifsjp) +  ofR(\aoj,p)  q  -pj*R(f ,lj,p) - (\-pj-qoj)*R(f J j,p) - qoj*R(fsjJoj,p) Sj  Si  S  5 2 / dp? R(p)= l*R(h,lj,p) + (0-l-0)*tf(l//s/,p) + 0*R(lifoj,p) - \*R(f ,\j,p) - (0-\-0)*R(f J ,p) - 0*R(fsjfoj,p) Si  =  Si  Sj  R(\ A ,p)+R(fs js ,p)-R(fs^ ,p)^ i  j  i  j  j  = mSjj To get the second equality, follow the same procedure above but instead substitute l-qojqsj in place ofpj and take derivatives with respect to qsj. These same results can be obtained by first converting DFM reliability into its equivalent binary component terms (via Lemma 1-2), and then taking the appropriate derivatives. QED It is easy to show that the definitions for MRI and JRI with DFM components have direct equivalents in terms of binary state components. Proposition 1-3: (a) MRIO,  = MRI/', i.e. MRI/ calculated using x";  (b) MRIS/  = MRI/', i.e. MRI/ calculated using *';  (c) JRIOy  = JRI"//, i.e. JRI/ calculated using x";  (d) JRISy  = -JRI'//, i.e. -JRI/ calculated using x' (note the negative sign).  Proof: These equivalencies are obtained by application of Lemma 1-3 term-by-term to Definitions 1-4 & 1-5.  
33  (a) MRIO,-  = E[ S(l x) ] - E[ S(fo , x) ] h  t  = E[ S(V,x") - S(PM ] - E[ S(0/',x") - 5(0/^0 = E[ S(lj",x") - SWJC")  (b) MRISy  ]=  1  MRIy"  = E[ S(l,,x) ] - E[ S(fs x) ] h  = E[ S(l,",x") -  ] - E[ S(l,",x") -  ]  = E[ -S(0/,x') + 5(l/,x') ] = MRIy'  (c) JRiOy  = E[ SO/.l/.x) -  ly,x) - S(lifoj,x) + S(f jOj,x) ] 0i  = E[ S ( l " , l / , x " ) - 5(0/,0/,x') ] - E[ S(0/',l/,x") - S(0 \0J,x') ] ;  i  - E[ S(l/',0/,x") - SWfiJjc ) ] + E[ S(0r,0j",x") - S(0 \0jjc') ] 1  i  = E[ S(\i",lj",x") ] - E[ S(0i\lj",x") ] - E[ .S(l ",0/',x") ] + E[,S(O ",0/ ,x")] ,  ;  (d) JRISy  i  = E [ S(lj,lj,x) - S(fsj,\j,x) - S(\i/Sj,x) + S(f Jsj,x) ] Si  = E[  S(i/',iy,x") - 5(o/,oy,x') ] - E [ S(i/',iy,x") - S O A O / V O ] - E [ S ( l / \ l / , x " ) - 5(0/,l/^ ) ] + E [ 5 ( l / ' , l / , x " ) - 5(1/,1/^') ] 1  = -E[  5(o/,oy,x') ] + E [ S(i/,oy,x') ] + E [ 5(o/,iy,x0 ]  - E[  5(i/,iy,x') ]  = -JRiy  We offer below a D F M definition of LI in which we follow the idea behind the original version: to be ordered, one component must be at least as important as another component regardless of which state both are assumed to be in. Definition 1-7:  LID, > LEDy (read, "component / is at least as important in terms of  LID as component j ") if all of the following are true for any assignment of probabilities such that the marginal probabilities for each component are all identical:  34  (a) E[ Stfoj, x) ] > E[ S(fo x) ]; h  ()  E[S(f ,x)]>E[S(fs x)];  (b)  E[S(lj,x)]<E[S(l x)].  a  Sj  h  h  If both LID, > LED, and LTD, < LID,, then LID, « LEDy. For the example system LED2«LED , but LID cannot give an ordering for components 1 3  and 2. Defined in this way, LID has similar implications to those that L I had in binary models. Proposition 1-4:  LID implies the following about SIO, SIS, MRIO and MRIS:  (a) LID, > LIDy => SIO, > SlOy and SIS, > SISy ; (b) LID, « LIDy => SIO, = SlOy and SIS, = SISy ; (c) LED, > LEDy => MRIO, > MRIOy and MRIS, > MRISy for identical components; (d) LID, « LIDy => MRIO, = MRIOy and MRIS, = MRISy for identical components. Proof: First consider part (c). LED, > LEDy by definition means that both E[ SifOj, x) ] > E[ S(fo ,x) ] and E[ S(lj, x) ] < E[ S ( l ] . t  From this it follows that  E[ S(l x) ] + E[ S(fo x)]  > E[ S(lj, x)]+E[ S(fo x) ]  E[ SQije) ] - E[ S(fo x)]  > E[ S(lj, x)]-E[ S(f ,x) ]  h  p  h  h  0j  MRIO, > MRIOy Similarly, E[ S(fs x)]>E[ Sifs x) ] and E[ S(\ x) ] < E[ S(l x) ], and so p  E[S(\ x) ] + E[ S(fsj, x)] h  E[S(l x) ] - E[S(fs h  h  x)]  b  p  h  >E[S(lj, x)]+E[S(£s x)  ]  h  >E[S(lj, x)]-E[S(f ,x) Sj  ]  MRIS, > MRISy. For part (d), LED, « LEDy means that both: LED, > LEDy, hence MRIO, > MRIOy and LED; < LEDy, hence MRIO, < MRIOy. If these inequalities for MRIO are both true, then we must have MRIO,- = MRIOy. The same logic shows that we must have MRIS,- = MRISy.  35  By settingp = 1/3, qs = 1/3, and qo = 1/3, we have MRIO = SIO and MRIS = SIS; then (a) follows from (c) and (b) follows from (d). QED Since the original PCI was defined strictly in terms of the network topology, in principle it can be used as is. Unfortunately, it seems that it no longer is as sensible a measure as before; for example, a short minimum path is beneficial when one is only worried about conventional (open) failures, but it is a liability when dealing with short failures. We do not define a new version herein, as there is no obvious way to adapt its rationale to D F M systems. We still have P C I T > P C I and P C I ~ P C I for the example 2  2  3  system. 
In Theorems 1-1 and 1-2 we derived some structural properties for the JRI of binary components. Below we present the equivalent results for our D F M model. Proposition 1-5:  The following three conditions are equivalent:  (a) there is no minimum path set containing both /' and j; (b) JRIOy < 0 for all possible assignments of probabilities; (c) JRISy > 0 for all possible assignments of probabilities. Proof: Proposition 1-3 showed that JRIO = JRI" and -JRIS = JRI'. Substitute these into part (b) of Theorem 1-1 as follows. JRIOy < 0 o JRIy" < 0 JRISy > 0 ^ -JRIy' > 0 O JRIy' < 0 QED Proposition 1-6:  The following three conditions are equivalent:  (a) there is no minimum cut set containing both /' and j; (b) JRIjj- > 0 for all possible assignments of probabilities; (c) JRIS/i < 0 for all possible assignments of probabilities.  36  Proof: Proposition 1-3 showed that JRIO = JRI" and -IRIS = JRI'. Substitute these into part (b) of Theorem 1-2 as follows. JRIOy > 0 o JRIy" > JRISy < 0 ^> -JRIy' < 0 O JRIy' > 0 QED These two results make intuitive sense. JRIO behaves just like JRI does because these both deal with the same kind of failures (i.e. those due to open cuts); for example, if we had two components in parallel (hence no common minimum path) we would expect that reducing the risk of an open failure in one would tend to make the other less important with respect to causing a system open failure. On the other hand, JRIS behaves in the opposite fashion: in the same situation, making one component less likely to fail short would tend to make the other one more critical with respect to system short failures. This is the reason for the negative sign in part (d) of Proposition 1-3. Our result in Proposition 1-2 concerning the strictness of the inequalities clearly continues to hold for these D F M equivalents. Looking back we can see that all of the D F M importance measures (except LID, which was unable to give a ranking) ranked component 1 as being more important than components 2 and 3 in the example system in Figure 1-1; this is due in part to the simple structure of the system and our choice of probabilities. Note that the strength of this ranking varied considerably depending on the measure used. Consider for example the M R I measures: we see that component 1 is much more important than component 2 regarding open failures (MRIOi = 0.99 and M R I 0 = 0.09), but is only slightly more 2  important regarding short failures (MRIST = 0.19 and M R I S = 0.09). This means that if 2  we wanted to improve the reliability of a component, we would get much better gains from reducing the chance of an open failure in component 1 than from either reducing the  37  chance of an short failure in component 1 or reducing the chance of either failure in component 2. 1.08 Future Research This chapter has dealt with components which are either binary-state or D F M three-state. An obvious goal would be to extend these notions of component importance to systems of general multi-state components, or perhaps even continuous-state components. The problem in doing so is that there seem to be many differing ways to define such a system's reliability (orperformability, see [SS90]), so any importance measures and properties may depend heavily on the definition used. Technically, PCI can be used with components with any number of states as it only considers network topology; however, the usefulness of the PCI ordering may depend on the nature of the system (e.g. 
PCI considers a long s-t path to be undesirable, but as we saw with the failed short state of D F M components this is not always appropriate). We should note that we have completely ignored one important issue, which is the actual calculation of the measures we have discussed. As other authors have discussed (see e.g. [Co87]) some of these measures involve calculations which are NP-Complete (e.g. the work required to calculate SI grows exponentially in the number of components), and many require information on component reliabilities and covariances that may not be readily available in practice. For example, M R I calculations require the knowledge of every component's reliability as well as the covariance between every pair of components, whereas in reality component reliabilities are only estimates and most covariances are completely unknown. Addressing these problems is beyond the scope of our work herein, and so we leave them for future research to deal with. 1.09 Concluding Remarks In this chapter, we reviewed several existing measures of reliability importance for binary components and discussed how they are related. We then showed that structural results previously derived in [HL93] for JRI of independent components continue to hold  38  for correlated components. We proposed extensions to all of these measures to cover D F M components, and showed that their original properties largely continue to hold.  3?  Chapter 2 Maintenance and Inventory for a Single Machine 2.01 Chapter Abstract We consider a system containing a single machine subject to random failure, for which we wish to determine a maintenance policy and a spare ordering policy. We examine the solvability and desirability of jointly optimizing these two traditionally separate policies. We discuss why the general form of the problem is largely intractable and then proceed to examine several special cases which are easier to analyze. Special attention is paid to systems which are only allowed to contain a single spare; we show that for these systems our problem has some convexity properties which make it amenable to cost minimization. We subsequently consider systems which allow more than one spare, but make use of other restrictions or assumptions to somewhat simplify analysis. 2.02 Introduction Consider a system containing one machine subject to random failure, for which we want to determine a maintenance (age replacement) policy and an inventory (spare ordering) policy that will permit the system to operate at minimum cost. The traditional approach to finding a maintenance policy is to assume that spares are always available as needed, and then select a replacement policy which minimizes the expected maintenance costs (i.e. ignoring any costs due to inventory). Similarly, the traditional approach to inventory policy is to make some assumption about the distribution of demand, and then select an ordering policy which minimizes the expected inventory costs while ignoring the maintenance costs. The selection of these two policies in sequence ignores the effect that each can have on the other. In this paper we consider the problem of joint optimization of replacement and ordering policies, paying particular attention to solvability (is it tractable?) and desirability (is it worthwhile?). This work is motivated in part by the author's previous work experience in aircraft maintenance. 
At his place of work, maintenance decisions and inventory decisions were  40  made by two distinct groups of people, and a significant amount of the local manager's time was spent resolving problems between these two groups. It was also a feature of the organization that on occasion there were arbitrary limits placed on the maximum level of inventory that could be stocked. Sometimes this was due to a physical space limit; e.g., when deploying to a remote location, only one of any large spare would be taken. More generally, the limits arose from policy decisions made by a higher level headquarters; e.g., due to limits in the purchasing budget or a production shortfall at the manufacturer, no site was permitted to hold more than one of a certain high value spare, and an additional spare could only be obtained in exchange for a failed part. Thus a single spare system could arise due to a spare rationing policy that was beyond the local manager's control. One of the difficulties in jointly considering the maintenance and inventory policies for a machine is caused by the interaction between maintenance demand and inventory supply. In traditional maintenance analysis a spare is assumed to always be available; thus the performance of the inventory policy has ho effect on when maintenance may be carried out. Similarly, in traditional inventory analysis the demand is assumed to arise from an exogenous process; this process continues to make demands regardless of the past success (or lack thereof) of the inventory policy. In a reality, however, maintenance decisions will necessarily be influenced by whether or not a spare is currently in stock; similarly, whenever there is an inventory stock-out that prevents us from running the machine, the demand process will stop until a spare is delivered. Thus we must explicitly consider the effect each sub-system has on the other if we are to develop optimal policies. There is a large body of literature dealing with maintenance issues in general; refer to [VF89] for a survey of papers dealing with the maintenance of a single machine. The earliest work that considered the joint optimization of maintenance and inventory policies dealt with Markovian systems; see for example Ross [Ro69], Derman & Lieberman [DL67], and Allan & D'Esopo [AD68]. Kao in [Ka73] works with semi-markov processes, but does not consider inventory issues. Falkner [Fa68] worked with a non-  41  Markovian (i.e. deteriorating) machine but handled inventory quite differently from our work: he used a periodic inventory policy with immediate replenishment at the beginning of each period. His approach allowed a wide variety of costs and structural features. He showed that the optimal replacement time was a function of the stock on hand and the time until next replenishment, and that an optimal order quantity could be calculated by an iterative procedure. Falkner's work in [Fa69] which dealt with inventory only, and in [Fa70] which dealt with deterministic machines, was also along these lines. Park & Park [PP85] similarly dealt with systems using periodic review inventory policies. There are also a number of papers which have studied variations on (5-1,5) continuous review inventory policies that are partially relevant to our problem: see for example Feeney & Sherbrooke [FS66], Posner & Yansouni [PY72], Smith [Sm77], and Schmidt & Nahmias [SN85]. 
We will later in this paper make use of some results by Atkins & Katircioglu [AK95] and by Schultz [Sc89], who studied delayed (5-1,5) inventory policies. There have been quite a few papers dealing with single spare models, one of the earliest of which was [N074] by Nakagawa & Osaki. They considered a system like ours but without any breakage cost; consequently, age replacement as such was unnecessary. Subsequent papers, e.g. [K078a], [K078b], [K078c], [K078d], [K079], [K088], [KH81], [OKY81] [Os77], [PP86a], [PP86b], [SL92], [Sr92], [SS89], [T078a], and [T078b], have added other costs and/or maintenance actions to the model but have continued to neglect a breakage cost, and so their results could not be directly compared to classic age replacement theory either. We might classify these papers based on structural concepts. The earlier papers tended to assume deterministic lead times, while later work allowed for random lead times and minimal repairs. The following allowed for random lead times: [KH81], [PP86a], [PP86b], [Sr92], and [SS89]. A distinction between order types (scheduled versus unscheduled) was made in [T078b], [K078a], [K078b], [K078c], [K078d], [K079], [K088], [KH81], [Os77], [SL92], [Sr92], [SS89], and [T078a]. Minimal repairs were  42  included in some form in [SL92], [Sr92], [K088], and [PP86a]. Many papers required the use of a replacement upon receipt maintenance policy, i.e. always install a spare as soon as it arrives: this assumption was made in [K078a], [K078b], [K078c], [K078d], [K079], [KH81], [Os77], [SL92], [Sr92], [SS89], and [T078a]. We note that most of these papers quite naturally have tended to emphasize the case where the machine has an increasing failure rate; consequently, they have neglected to consider the option of returning an unused spare component to its origin if the machine has a decreasing failure rate. Mine & Kawai [MK77] examined a similar system which did have a breakage cost but no holding or explicit shortage charges; their order lead time was random and their objective used discounted total cost rather than average cost rate. They showed that under reasonable conditions a unique optimal ordering time and replacement time existed, and they derived some properties of these times. [OKY81] also used discounted total cost as their objective. [SLT92] considered a variation in which policies were based upon the cumulative number of failures since installation rather than age. [N078] considered a variation in which the main decision was whether to repair or replace, using the estimated cost or estimated time to repair as the decision criteria. The most general form of the problem that we might consider in this chapter would allow the lead times for receipt of spares to be random, with a distribution that might depend on whether the order was scheduled or not, and would allow inventory to contain as many spares in stock or on order as desired. We might hope to develop an equation using renewal principles that looks something like the following, which we could then analyze to find jointly optimal maintenance and inventory policies. (2-1) , E \ maintenance costs per cycle] + E \ inventory costs per cycle] EI cost rate of policy I = — = —= E [ length of cycle] r  43  This equation determines the long run expected cost rate of operating the system by dividing the expected cost per renewal cycle by the expected length of such a cycle. 
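When the expectations in this ratio are hard to write down in closed form, they can at least be estimated by renewal-reward simulation. The sketch below is a minimal illustration for an assumed and highly simplified special case (exponential lifetimes, replacement only at failure, a spare ordered at the start of each cycle with deterministic lead time); it is not the general model analyzed in this chapter, and all parameter values are illustrative.

```python
import random

def simulate_cost_rate(n_cycles=200_000, c=500.0, b=950.0, h=1.0, k=300.0,
                       L=1.0, mean_life=5.0, seed=1):
    rng = random.Random(seed)
    cost = time = 0.0
    for _ in range(n_cycles):
        x = rng.expovariate(1.0 / mean_life)   # machine lifetime
        shortage = max(0.0, L - x)             # machine dead, spare not yet delivered
        holding = max(0.0, x - L)              # spare waiting on the shelf
        cost += c + b + k * shortage + h * holding
        time += x + shortage                   # cycle ends when the replacement occurs
    return cost / time                         # estimates E[cost per cycle] / E[cycle length]

if __name__ == "__main__":
    print(round(simulate_cost_rate(), 2))
```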
This requires that we be able to determine clear renewal points and easily describe the system's behaviour between these points. In the general problem such renewal points can be specified, but the behaviour of the system between these points is complicated to the point of being intractable for exact analysis, in part because of the previously mentioned interaction between maintenance demand and inventory supply. In this chapter our approach is to make some restrictive assumptions that allow us to define and solve simpler versions of the problem, particularly for the case where the system contains only one spare at a time. Typically, such restrictions facilitate analysis by partially removing the maintenance-inventory interaction, or by restricting the choice of allowable policies. We begin this chapter by more precisely describing our modeling assumptions and notation. We then consider systems which are allowed to contain at most one spare, first requiring that the order lead time be deterministic and then allowing it to be random; our most detailed analytical results are obtained in these sections. Next to be examined are systems which allow more than one spare, but make use of other restrictions or assumptions to somewhat simplify analysis. In each case, we proceed by defining a renewal system, specify an equation which describes its expected cost rate, and then derive some of the properties of the equation. We conclude with a suggested approach to tackling the general problem in actual practice. Proofs for our results are contained in Appendix B. 2.03 Model Description & Notation We examine a system which contains one machine subject to random failure. We can shut down the installed machine at any time, and we can then replace it provided that we have a new spare machine in stock. Between placement of an order and receipt of a spare there is a random lead time with distribution G(t), density g(t), and mean L > 0. The replacement action itself is assumed to be perfect (i.e. it returns the machine to good-as-  44  new status) and requires a random time with mean R. The stochastic machine lifetimes are assumed to follow a distribution function F(t) which has finite mean ju, density f(t), and hazard rate z(t). We say that F(t) is IFR (increasing failure rate) when z(t) is nondecreasing, and strictly IFR when z(t) is strictly increasing; similarly D F R or strictly D F R indicates z(t) is non-increasing or strictly decreasing, respectively. In most of this chapter it is assumed that the machine is IFR; we will indicate whenever we depart from this assumption. We use t to represent the scheduled replacement time of the component, and r  t to denote the scheduled time at which we order a spare. 0  Throughout this chapter we assume that the failure distribution F(t) is known, whereas in practice it is typically unknown and must be estimated. This estimation process is a non-trivial problem in itself, and can become quite complicated if it is performed while trying to operate the system efficiently, i.e. if we are learning as we go. This is an interesting and important problem but is beyond the scope of the work herein. Our system incurs a cost cv(t) whenever a part is replaced, where c is a cost coefficient and v(t) is a convex function of the age of the machine, i.e. 
the cost of a replacement depends on the machine age; this is the cost of a preventive replacement, and includes the cost of the part itself, labour charges for the actual installation, and the administrative cost of placing an order (which is assumed to be negligible, so that ordering of individual spares is appropriate). The simplest case is where v(t) = 1 for all t, so that the replacement cost is a constant; allowing v(t) to be convex permits consideration of more general cost expressions, e.g. a fixed parts cost plus a age-varying labour charge. The system incurs a breakage or failure cost b if the replacement is done after failure; this is the additional cost of a corrective replacement, and can include charges for extra labour, repair of damage to the rest of the system, etc. If the system is unavailable, i.e. while awaiting delivery of a needed spare, a shortage penalty cost at rate k per unit  45  time is incurred; this can represent missed profits, the cost of renting a temporary replacement, etc. A cost rate of h per unit time is incurred while holding a spare in inventory; this represents the opportunity cost of capital, depreciation of the spare, etc. An operating cost is also incurred at rate aq(t) whenever the system is operational, where a is a cost coefficient and q(t) is a non-decreasing function of machine age, i.e. the machine gets more expensive to operate as it ages. This term is used to represent direct costs of operation such as for fuel use, etc., as well as many of the other costs that previous authors have included in their models. One such cost is that of declining revenues due to deteriorating production quality or yield, as discussed in [MB95]. Another is the decreasing expected salvage value of the installed part, as in e.g. [KH81]. Probably the most important is the increasing expected cost of minimal repairs (which return the machine to bad-as-old status) as in [K088]. There are several different ways to model minimal repairs; see [Be93] for a thorough discussion of the topic. Our work takes a fairly general approach, in that we assume that there are two kinds of failure, minor and major. A minor failure is always rectified by a minimal repair, which is instantaneous and does not incur the breakage penalty b. A major failure does incur the breakage penalty and can only be rectified by replacing the failed machine with a new one. Thus the term aq(t) in our model is an aggregate of whichever of these other costs may be relevant to the system being modeled. We will introduce other cost factors and decision variables as required later in this chapter. All costs are assumed to be non-negative unless otherwise indicated. 2.04 Single Spare with Basic Structure In this section our system is constrained to allow only one spare to be in stock or on order at any time, and by requiring that the order lead time for both scheduled and unscheduled orders be deterministic and equal to L; we naturally require  0<t <t -L<oo o  r  . A single spare is obviously a severe restriction on inventory policies, though it may be approximately true for real systems due to storage space limitations, and in some cases it  46  none the less leads to an optimal policy. We further restrict our model structure by assuming v(t) = 1 (so replacement cost is a constant value c), q(t) = 0 (no operating cost), and R = 0 (replacement is instantaneous). This restricted system gives us the most tractable model that we consider and hence permits our most detailed results. 
Previous papers have analyzed many variations of this restricted system, but we believe our choice of cost structure is less well examined and generally more interesting. We cover both of the major costs generally considered in the maintenance literature (replacement and breakage) as well as the two major costs widely considered in the inventory literature (holding and shortage). Our results therefore can be directly compared to those of classic models of age replacement (such as in chapter 3 of [JMR67]) which largely ignore inventory issues.

Our performance objective is to minimize the system's expected cost per unit time (expected cost rate). Since replacement times are regeneration points of the system, this cost rate can be obtained by looking at one cycle of the system; that is, from the time of one replacement until the time of the next. The expected cost rate thus equals the expected sum of costs incurred in one cycle divided by the expected length of a cycle. During our discussion, it may be helpful for the reader to refer to the time-lines shown in Figure 2-1. These show the possible orders of events that can occur in the system during each renewal cycle.

Figure 2-1: Timing of Events. [Figure: four time-lines showing the possible orderings of the events 0, t_o, t_o + L, failure (X), and t_r within a renewal cycle: (a) machine survives until scheduled replacement; (b) machine fails after spare arrival; (c) machine fails after spare is ordered; (d) machine fails before spare is scheduled for ordering. The legend distinguishes machine operation, spare on order, spare in stock, and shortage periods.]

Let us begin by looking at the traditional methods of making the replacement and ordering decisions sequentially. The traditional maintenance approach is concerned with the replacement cost c and the breakage cost b; the idea is to balance the cost of using too many parts against the increasing expected costs due to operation and breakage. Spares are assumed to be always available. What we must decide is when to replace the part. A replacement is made on failure or possibly as a preventive measure before failure at some planned time t_r. The form of our optimal policy is an age replacement policy defined by this single number t_r.

In the derivations which follow we make use of the relation that

    mu = integral_0^inf t f(t) dt = integral_0^inf F̄(t) dt        (2-2)

where F̄(t) = 1 - F(t) denotes the survival function. The expected cost rate for this policy sums the expected cost of parts plus the expected cost of breakage, both per cycle,

    c + b F(t_r)                                                    (2-3)

and divides by the expected cycle length

    integral_0^{t_r} x f(x) dx + integral_{t_r}^inf t_r f(x) dx = integral_0^{t_r} F̄(x) dx        (2-4)

Combining these gives an expression for the expected cost rate which we call the Replacement Cost Function RCF.

    RCF(t_r) = [ c + b F(t_r) ] / integral_0^{t_r} F̄(x) dx         (2-5)

Since F(t) is IFR, RCF is a pseudo-convex (and unimodal) function of t_r with a unique minimum, as was shown in chapter 4 of [JMR67] (and as we show more generally in Theorem 2-1 which follows). It is not necessarily convex. The principal benefit of pseudo-convexity in this context is that any local minimum of the function is also a global minimum; furthermore, if the pseudo-convexity is strict then the minimum is a unique point (otherwise there may be an interval over which the function takes the same minimum value). Refer to [ADSZ88] or to [BSS93] chapters 3 and 4 for more details on the subject of convexity.
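In practice RCF can be evaluated and minimized numerically; because it is pseudo-convex under IFR, a coarse grid or line search locates the global minimum. The following Python sketch is a minimal illustration, assuming a Weibull lifetime with unit scale and illustrative cost values (not parameters taken from this thesis).

```python
import math

def F(t, shape=3.0, scale=1.0):
    """Assumed Weibull lifetime distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def integral(f, a, b, n=1000):
    """Simple trapezoidal rule on [a, b]."""
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def rcf(t, c=500.0, b=950.0):
    """RCF(t) = [c + b*F(t)] / integral_0^t [1 - F(x)] dx."""
    return (c + b * F(t)) / integral(lambda x: 1.0 - F(x), 0.0, t)

if __name__ == "__main__":
    grid = [0.05 + 0.005 * i for i in range(600)]     # candidate replacement ages
    best = min(grid, key=rcf)
    print(round(best, 3), round(rcf(best), 2))        # candidate t_r and its cost rate
```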
The optimal t_r will be one of t_r' or infinity as follows, where z(t) is the hazard rate:

(a) t_r = t_r', with 0 < t_r' < infinity,   if b z(t_r') - RCF(t_r') = 0;
(b) t_r = infinity,                          if b z(infinity) - RCF(infinity) <= 0.

Condition (a) says that the optimal scheduled replacement age is that at which the marginal expected cost rate of operating equals the average expected cost rate. In the event that no such t_r' exists, i.e. if b z(infinity) < RCF(infinity), then it is optimal to replace only at failure, so we set t_r = infinity. To actually solve for t_r, we can use (a) and (b) above in conjunction with some form of line search procedure (see [BSS93] chapter 7 for examples).

Now consider the traditional inventory approach, which is concerned with balancing the shortage cost rate k incurred while a demand is unmet against the cost rate h of holding a spare in stock. The decision to be made is when to order a spare so as to minimize these costs. In our system, we cannot order a spare if there is already one in stock or on order. The order instant t_o will therefore lie between the time the last spare was installed (time zero) and a time L before the next definitely will be needed (at t_r); thus 0 <= t_o <= t_r - L.

In general for this model, the expected shortage cost per cycle is

    k integral_{t_o}^{t_o+L} F(x) dx                                (2-7)

and the expected holding cost per cycle is

    h integral_{t_o+L}^{t_r} F̄(x) dx                               (2-8)

The expected cycle length is not quite the same as for RCF, due to the possibility of a stock-out prolonging the cycle. It is equal to the expected part life plus the expected shortage time:

    integral_0^{t_r} F̄(x) dx + integral_{t_o}^{t_o+L} F(x) dx      (2-9)

Combining these terms yields the Ordering Cost Function OCF.

    OCF(t_o) = [ k integral_{t_o}^{t_o+L} F(x) dx + h integral_{t_o+L}^{t_r} F̄(x) dx ]
               / [ integral_0^{t_r} F̄(x) dx + integral_{t_o}^{t_o+L} F(x) dx ]        (2-10)

Since F(t) is IFR, OCF is a pseudo-convex function of t_o and so has a single minimum (see Theorem 2-2 below for details).

The OCF equation assumes that the inventory manager knows and makes use of all relevant information about the distribution of demand, but typically this is not the case. We would commonly expect the manager to make a simplifying assumption about the form of the demand distribution, while taking account of whatever data was available about historical demand rates. Probably the simplest assumption is to treat demand as a Poisson process, with mean time between demands equal to mu. This ignores the effect of preventive replacements and simplifies the ordering decision so that orders are only placed at demand epochs, i.e. choose t_o to be either 0 or t_r - L (whichever is least costly). This also simplifies implementation because the calculation only requires the average component lifetime without preventive replacements. Other approximations are possible; for example, one could assume a Poisson demand distribution with a mean inter-demand time mu* < mu which reflects historical demand from an existing maintenance operation. We will return to this issue later in the chapter. To illustrate such a simplified case we offer what we call the Simplified Ordering Factor SOF, which chooses the ordering policy in complete disregard of maintenance demand (beyond using mu).

    SOF = k L - h mu                                                (2-11)

The first term on the right-hand side represents the cost of a shortage each cycle, while the second is the cost of holding a spare throughout an average cycle. If SOF is negative, one should order on demand; if positive, one should order immediately after replacement; if zero, then either option is equally good.

Now we consider the joint optimization of maintenance and inventory, for which an operating policy is described by the pair (t_o, t_r).
If the machine fails before t_r, it is replaced immediately if a spare is available, or else as soon as a spare arrives; otherwise it is replaced at t_r. If the machine fails before t_o, an order for a spare is placed immediately; otherwise an order is placed at t_o. We want to find a feasible (t_o, t_r) pair that minimizes the expected cost rate for the entire system.

The expected cost function for the joint problem is built using (2-9) for the expected cycle length, (2-3) for the expected replacement and breakage costs per cycle, (2-7) for the expected shortage cost per cycle, and (2-8) for the expected holding cost per cycle. Combining these terms gives the expected operating cost per unit time, which we call the Joint Cost Function JCF.

    JCF(t_o, t_r) = [ c + k integral_{t_o}^{t_o+L} F(x) dx + h integral_{t_o+L}^{t_r} F̄(x) dx + b F(t_r) ]
                    / [ integral_0^{t_r} F̄(x) dx + integral_{t_o}^{t_o+L} F(x) dx ]        (2-12)

This function has some useful properties.

Theorem 2-1: JCF is unimodal and pseudo-convex in t_r; hence for a given t_o, JCF(t_o, t_r) has no interior (t_o + L < t_r < infinity) maximum with respect to t_r and at most one interior minimum at t_r', and the optimal t_r will be one of t_o + L, infinity, or t_r' as follows:

(a) t_r = t_o + L                  if h + b z(t_o + L) - JCF(t_o, t_o + L) >= 0;
(b) t_o + L < t_r' < infinity      if h + b z(t_r') - JCF(t_o, t_r') = 0;
(c) t_r = infinity                 if h + b z(infinity) - JCF(t_o, infinity) <= 0.

If there is no t_r', then JCF is monotone in t_r. Case (a) indicates that the installed machine should be replaced immediately upon receipt of a spare (at t_o + L); case (c) implies that the machine should be replaced only at failure. Note that case (b) is simply saying that optimality occurs when the marginal expected cost rate of running equals the average expected cost rate overall.

Theorem 2-2: JCF is unimodal and pseudo-convex in t_o; hence for a given t_r, JCF(t_o, t_r) has no interior (0 < t_o < t_r - L) maximum and at most one interior minimum point t_o', and so the optimal t_o will be one of 0, t_r - L, or t_o' as follows:

(a) t_o = 0                 if k - h F̄(L) / F(L) - JCF(0, t_r) >= 0;
(b) 0 < t_o' < t_r - L      if k - h F̄(t_o' + L) / [ F(t_o' + L) - F(t_o') ] - JCF(t_o', t_r) = 0;
(c) t_o = t_r - L           if k - h F̄(t_r) / [ F(t_r) - F(t_r - L) ] - JCF(t_r - L, t_r) <= 0.

If there is no t_o', then JCF is monotone in t_o. Case (a) indicates that a spare should be ordered as soon as possible (at time 0), while case (c) implies that one should order only on demand (at t_r - L).

Proposition 2-1: A change in cost coefficients will tend to change the optimal policy values as follows:

(a) for any given t_o, the optimal t_r is non-decreasing in the shortage cost rate k, and non-increasing in the holding cost rate h and the breakage cost b;
(b) for any given t_r, the optimal t_o is non-decreasing in c, b, and h, and non-increasing in k.

Part (a) implies that a previously optimal replacement policy of "replace at failure" remains unchanged if k or c increases, or if b or h decrease. Conversely, a previously optimal policy of "replace on receipt" remains unchanged if k or c decreases, or if b or h increase. Part (b) implies that a previously optimal ordering policy of "just in time" remains unchanged if c or b or h increase, or if k decreases. Conversely, a previously optimal policy of "always stock" remains unchanged if c or b or h decrease, or if k increases.
For t to be increasing in the maintenance costs (b and c) is not surprising; a later 0  order lengthens the expected cycle length, hence allowing the amortization of these costs over a longer period of time. The effect of the inventory costs (k and h) on t is readily r  explained by examining the derivatives of the cost function, but does seem somewhat counter-intuitive from the perspective of a maintenance manager: if the penalty for not  53  having a spare at time of failure is high, the machine is replaced less often, which would seem to risk more failures; if holding costs are high, it is replaced more often, which would seem to require more inventory be carried. The rationale for this behaviour becomes clear when we take the inventory manager's perspective: if the shortage cost is high, we wait longer to replace so that we have more time to get a spare in stock; if the holding cost is high, we replace sooner so as to get the spare off our shelf. The jointly optimal pair (t , t ) can be calculated for a given problem instance 0  r  using non-linear programming (NLP) methods, which typically search for Karush-KuhnTucker (KKT) points. The N L P problem statement is to minimize JCF(t , t ) subject to 0  r  0 <t <t - L. Figure 2-2 shows the feasible region for the N L P problem in which the four 0  r  possible types of K K T point locations are shown. Figure 2-2:  to/K  Possible locations of K K T solution points.  tr = to + L  We know that the K K T conditions are necessary for a point to be a minimum (see chapter 4 of [BSS93]), but they are not sufficient for global optimality in all problem instances because although JCF is pseudo-convex in each variable individually it is not in general pseudo-convex in both variables simultaneously, as the following example shows.  54  Let F(t) = 1 - e \ L = 0.6, c = 900, b  Jointly Pseudo-Convex Counter-Example:  f  = 1000, k = 1000, h = 10. Then there are two K K T points on the diagonal boundary, one of which is a minimum while the other is a saddle point. In the next theorem we shall give a sufficient condition for global optimality; but first, the following definition is provided to aid in discussion. Definition 2-1:  We say that a feasible policy is viable if it makes economic sense to  operate the system under that policy. If the policy is not viable, then we would be better off not operating the system at all; it would be cheaper to constantly incur the shortage cost. For the problem at hand, a viable solution (t , t ) is one for which JCF(t , t ) < k. 0  r  0  r  The next theorem makes use of this fact to characterize a globally optimal solution. Theorem 2-3:  If (t , t ) is a K K T point of JCF and is viable, then (t , t ) is a 0  r  0  r  global minimum. In particular, any K K T point with t + L < t is always viable and hence 0  r  always a global minimum. Refer again to Figure 2-2. The diagonal boundary line represents the constraint t  r  -L >t , while the lower (horizontal) boundary line represents t > 0. Each circle 0  0  represents one of the four possible types of K K T points, and the arrows shows the improving direction associated with each. Theorem 2-3 indicates that a K K T point located in the interior (labeled 1) or on the lower boundary (2) is always a global minimum, while a point on the t = t + L diagonal boundary (3 or 4) might instead be a r  0  local minimum or a saddle point. 
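Because JCF need not be jointly pseudo-convex, a KKT point found by a generic NLP solver may need to be compared against other candidates. A blunt but safe complement is a coarse grid scan of the feasible region 0 <= t_o <= t_r - L. The Python sketch below illustrates this; the Weibull lifetime and all cost values are illustrative assumptions, and no attempt is made to reproduce the counter-example above.

```python
import math

def F(t, shape=3.0, scale=1.0):
    """Assumed Weibull lifetime distribution."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def integral(f, a, b, n=300):
    if b <= a:
        return 0.0
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def jcf(to, tr, c=500.0, b=950.0, h=1.0, k=300.0, L=0.25):
    short = integral(F, to, to + L)                        # expected shortage time
    hold = integral(lambda x: 1.0 - F(x), to + L, tr)      # expected holding time
    life = integral(lambda x: 1.0 - F(x), 0.0, tr)         # expected truncated lifetime
    return (c + k * short + h * hold + b * F(tr)) / (life + short)

if __name__ == "__main__":
    L = 0.25
    candidates = []
    for i in range(1, 121):
        tr = L + 0.025 * i                                 # replacement age grid
        for j in range(41):
            to = (tr - L) * j / 40.0                       # ordering time grid
            candidates.append((jcf(to, tr, L=L), to, tr))
    print(tuple(round(v, 3) for v in min(candidates)))     # (cost rate, t_o, t_r)
```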
When the minimum occurs at an interior point, combining Theorems 2-1 and 2-2 yields the relation  bz(t ' ) = k - h r  F(t ') F(t '+L) 0  0  F(t >) a  (2-13)  55  To compute an optimal pair (t , t ), rather than using generic N L P software we 0  r  could take advantage of the preceding theorems and corollaries to customize both the K K T conditions and a search algorithm. We do not know whether the potential benefits of using such a customized N L P approach in practice would be worth the effort of the customization, but in Appendix A we describe this idea and outline how such an algorithm might proceed (we did not actually use it for the numerical studies in this chapter). Having discussed both separate and joint optimization, we can now compare our joint approach to the more traditional approach in which RCF is used to choose t , and r  then another function is used to determine t . Note that 0  JCF = [OCF] +  [RCF]-  (2-14)  0  The multiplier for RCF adjusts for the longer cycle time due to the possibility of a stockout. The comparison of joint to sequential optimization involves two types of cost rate increases. The first arises in using RCF to calculate t , followed by JCF to calculate t r  0  given t (call this combination JCF:RCF). Here we lose by using RCF, which ignores r  inventory considerations in determining maintenance policy. Optimizing JCF.RCF will determine the best possible sequential policy from the system's point of view. The second cost increase comes from using some method of determining the inventory policy (given a maintenance policy) that is more myopic, easier to calculate, and/or easier to implement. Consider the following examples: (a)  One can ignore maintenance costs (due to b and c) in determining inventory policy,  by using OCF to calculate t given t (call this combination OCF.RCF). Optimizing 0  r  56  OCF.RCF will determine the best possible sequential policy from the inventory manager's point of view; (b)  If the inventory manager does not want to keep track of the age of the machine, a  policy of (5-1,5) form can be adopted (with 5 < 1 ) . XJCF.RCF is the same as JCF:RCF except that we restrict the choice of t to either extreme of 0 or t - L . Optimizing 0  r  XJCF.RCF Will determine the best possible sequential (5-1,5) policy from the system's point of view; (c)  One can ignore both maintenance costs and the aging of the machine by using  XOCF.RCF, which is the same as OCF.RCF except that we restrict the choice of t to 0  either extreme of 0 or t - L . Optimizing XOCF.RCF will determine the best possible r  sequential (5-1,5) policy from the inventory manager's point of view; or, (d)  Inventory optimization can ignore completely the existence of planned  maintenance: its costs, the aging of the installed machine, and the use of preventative replacement. This allows the use of a simplified approximation such as SOF to determine t given t (call this combination SOF:RCF). This is the simplest method for an inventory 0  r  manager to calculate and operate. All of these inventory methods can be thought of as variations on a continuous time (5-1,5) type of inventory policy. JCF and OCF imply that we use a ( - 1 , 0 ) policy until t = t when we switch to a ( 0 , 1 ) policy. Using XJCF, XOCF, or SOF to choose an 0  endpoint for ordering is equivalent to choosing only a ( - 1 , 0 ) or a ( 0 , 1 ) policy with no switching. 
Since the optimal t is generally not an endpoint, a standard (5-1,5) continuous 0  time policy is generally not the optimal form of ordering policy for the simple system we consider; see [ A K 9 5 ] for more about the sub-optimality of standard (5-1,5) policies. We can also think of these different methods in terms of the cooperation and information exchange that can or must occur between the maintenance manager and the inventory manager. Joint optimization implies that the two managers must cooperate in making the two decisions together for the benefit of the overall system, in spite of the  57  natural incentives for each manager to make their respective decisions so as to obtain maximum benefits for their own sub-system. Separate optimization does not require such cooperation, but merely requires some flow of information from the maintenance manager to the inventory manager; the maintenance manager would not use any information about the inventory process (i.e. the lead time and the cost structure). The use of JCF.RCF requires that the inventory manager have access to and make use of full information about maintenance costs, the preventive maintenance schedule, and the machine's lifetime distribution. The inventory manager needs to know the preventive maintenance schedule and the machine's lifetime distribution in order to use OCF.RCF, while only the mean of the lifetime distribution needs be known in order to use SOF.RCF. We naturally expect that the use of less information will tend to make the inventory decision process simpler but result in higher costs. While it is difficult to make generalizations about the size of these cost increases, we can offer the following. Proposition 2-2:  Sequential optimization can be arbitrarily worse than joint  optimization; i.e. the ratio JCF:RCF IJCF can be arbitrarily large. Proposition 2-3:  Using OCF.RCF instead of JCF.RCF to find t always gives an 0  earlier ordering time; that is, (t from OCF.RCF) < (t from JCF.RCF). 0  Proposition 2-4:  0  Not only does SOF ignore interior points (t : 0 < t < t - L) which 0  Q  r  may be optimal, but it is possible for SOF to choose the wrong endpoint (t = 0 or t = t 0  0  r  L). Our model in this section does not explicitly deal with the issue of how much could be saved if we could keep multiple spares in stock or on order, but it does offer one indicator in the optimal value found for the ordering time. If the optimal t = 0, we would 0  suspect that further cost reductions could be obtained if we allowed for advance ordering of spares (i.e. relax the constraint t > 0 to t >-L) and/or multiple spares in stock. 0  0  58  Conversely, an optimal t > 0 indicates that nothing would be gained by relaxing the 0  ordering constraint and that having more than one spare in stock would be sub-optimal. In our modeling of the problem so far, we have taken as our objective the minimization of the expected cost rate; that is, the cost rate calculated over calendar time. We believe that in general this is the most reasonable objective. Some other authors, however, have also considered the maximization of the cost effectiveness, which they define as the expected proportion of time that the system is functioning (i.e. not short) divided by the expected cost rate during system functioning. This is equivalent to minimizing the expected cost rate calculated over operating time (rather than calendar time). 
If we were to use this as our objective, then our equation for JCF would become slightly simpler, in that there would be one less term in the denominator.  (2-15)  JCF\{t ,t ) 0  r  0  One could easily show that all of the results previously derived for the original objective (2-12) would continue to hold for this new objective (2-15). Our modeling effort so far has also assumed that the system manager has reasonable estimates for all of the required cost coefficients; this may not be true, especially for the shortage cost rate k and the breakage cost b which often should include such intangibles as loss of customer good will. It might therefore be more appropriate to specify constraints on the solution that enforce certain levels of performance, instead of or in addition to the use of cost parameters; these constraints are often called service constraints. This technique is often used in inventory modeling, where the same problem can arise, but to our knowledge has not previously been considered in the literature for the single spare model. For example, in place of specifying a breakage cost, we might require  59  that the expected proportion of machines which fail in use be less than some maximum limit. Similarly, instead of specifying a shortage cost we might set a minimum on the expected proportion of demands for spares which are filled from stock (i.e. a minimum fill rate) or a maximum on the expected wait to receive a back-ordered spare. We can add these service constraints to the N L P minimization problem as follows. Minimize:  JCF(t ,  t)  0  r  Subject to:  t >0  t >t + L  o  r  F(t )  < maximum expected failure proportion  r  F(t  0  0  (2-16)  + L) > minimum expected fill rate  (2-17)  t„ + L  j F(x)dx  < maximum expected wait  (2-18)  Specified in this way, service constraints are easily handled in our search for an optimal solution pair (t , t ). We say this because each of them only acts on one of the 0  r  two decision variables; in terms of the solution space illustrated in Figure 2-1, each constraint is either a horizontal or a vertical line. We can therefore use the results of Theorems 2-1 and 2-2 to show that Theorem 2-3 still holds, and so our search procedure should proceed as easily with service constraints as without. Note that if a service constraint is used instead of the shortage cost k, i.e. so that k = 0, then the optimal ordering time will be as late as is feasible, i.e. the optimal t will be a boundary solution. 0  We next turn to numerical studies on two small sets of test problems to illustrate the kind of performance that can result from sequential versus joint optimization. We did not use any service constraints. First we tried a small parameter set obtained from an existing business; the goals were simply to see what could happen, and to do a "reality check" on our thinking (i.e. do the cost functions behave as expected?). Table 2-1 shows the cost coefficients and calculated optima for 12 combinations of distributions and costs.  60  These 12 cases were derived in the following manner. Case 1 uses informed estimates for cost coefficients and for L and u. for a computer used by a business consultant based in a major city (i.e. in a climate controlled office with ready access to spare parts); the failing part is a hard disk drive. The failure distribution F(t) is arbitrarily assumed to be of Weibull form F(t) = 1 - e-^ with shape parameter iq = 3, u. = 2000 days, L = 1 day. 
n  Cases 2 and 3 use the same situation but with the inventory cost coefficients doubled and halved. Cases 4-6 are duplicates of 1-3 but with r) = 1.5. Cases 7-12 are for the same consultant when doing business in a remote northern community (harsher environment, no local source of spares); u. = 1000 days, L = 3 days. All cases use c = 500 and b =950. The columns in the table show calculated optimal times and expected cost rates of joint optimization, compared to sequential optimization using each of JCF, OCF, and SOF to find t . The columns labeled "%A" show the percentage increase in the expected cost D  rate over the calculated joint optimum; i.e. % A = [ sequential cost - joint cost ] / [joint cost ]. Calculations were done using the Maple mathematics software package. Note that the calculated t and t values for all the sequential methods were obtained by simple 0  r  univariate optimization, so the proven pseudo-convexity properties of JCF are sufficient to guarantee that these are in fact optima. Table 2-1:  Example Solutions  #  L  h  k  1  1  2 3  RCF  Joint Optimum  Changing Parameters  'r  JCF 0.621  2.0  1239 1238 1151 1150  0.5  1294 1293  300  1.0  600 150  2.0  3.0  300  1.0  1  3.0  600  1  3.0  150  4  1  1.5  5  1  0.5  'r  JCF.RCF  OCF.RCF  SOF.RCF  %A  %A  %A  1360  1359  0.6 1359  0.6 1359  0.6  0.665 1360  1359  2.4 1359  2.4 1359  2.4  0.596 1360  1359  0.3 1359  0.3 1359  0.3  2030 2029  0.877 2536  2535  0.6 2535  1.001 2536  2535  1.8 2535  0.6 2535 1.8 2535  0.6  1718 1717  0.1 2535  0.1 2535  1.8 0.1  6  1  1.5 1.5  7  3  3.0  300  1.0  542  539  1.405  680  654  4.7  653  4.7  677  4.8  8  3  3.0  600  2.0  475  472  680  654  13.0  653  13.0  677  13.0  2249 2248  0.813 2536  2535  9  3  3.0  150  0.5  596  593  1.597 1.285  680  656  1.5  653  1.5  677  1.6  10  3  1.5  300  1.0  851  641  2.222 1268  641  1.1  635  1.1 1265  3.9  11 12  3 3  1.5 1.5  600  2.0  562  559  2.866 1268  638  4.8  635  4.8 1265  9.1  150  0.5  1058  648  1.862  648  0.3  635  0.3 1265  1.9  1268  61  The results in Table 2-1 suggest that in most cases although the t and t values 0  r  may differ noticeably between joint and sequential optimization, the actual cost rates are tend to be quite similar. This is particularly true of the sequential methods, which tend to give results very similar to each other. The cost differences tended to be larger for the cases where the inventory costs were large and the lead time relatively long (i.e. where inventory considerations were most relevant); the largest difference was observed (in case 8) where both of these were true. Note that as predicted by Proposition 2-3, OCF:RCF always gives an earlier ordering time than does JCF:RCF. The next step in our numerical work was intended to explore the behaviour of our heuristics over a wider range of coefficients, and to try to characterize when the cost differences would or would not be large. We again chose a failure distribution F(t) of Weibull form with shape parameter r\ = 3, scale parameter A= 1; so // « 0 . 9 . For all cases c = 1 while the other parameters were varied as follows: L = { 0.5, 0.05, 0.005 }, the ratio of b/c = { 10, 1, 0.1 }, the ratio of h/c = { 5, 0.5, 0.05 },and the ratio ofk/h = { 10000, 1000, 100, 10, 1 }. In total this gave 135 different cases or combinations of parameters. In Table 2-2 we summarize the joint versus sequential differences for these cases by giving the mean and other statistics of the cost rate differences relative to the joint optima. 
Table 2-2:  Results Summary  Sequential Procedure  OCF.RCF  JCF:RCF  XOCF.RCF  XJCF.RCF  SOF.RCF  2.6 6.5  3.3 7.5  4.3 9.5  4.6 9.8  10.9  28.0  29.3  52.6  52.6  52.6  # Cases with increase < 1% # Cases with increase 1-10%  110  106  94  93  92  9  10  20  20  21  # Cases with increase > 10%  16  19  21  22  22  % Cost Increase - Mean % Cost Increase - Standard Deviation % Cost Increase - Largest  5.1  As expected, the sequential combination JCF.RCF gives the closest results to joint optimization. On average the differences were quite small, but they varied greatly from case to case: the vast majority of cases had negligible differences, but about one in ten had  62  a difference >10%. Moving across the table we see that both the frequency of large differences and their magnitude increase somewhat as we use less and less information about maintenance in optimizing inventory. The methods increasingly behave in a "hit-ormiss" fashion, tending to give results that are either very good or very bad; there did not seem to be any pattern as to which method did better in any given instance (except as discussed in the next paragraph). OCF.RCF generally does better than XJCF.RCF, which implies that it is more important for inventory to allow scheduled orders (due to part aging) than to account for maintenance costs. XOCF:RCF, which ignores both of those factors, generally performs worse than both of these methods. The simplistic SOF.RCF is the worst performer on all counts, though of course it is also the easiest to use. What Table 2-2 does not show is under what circumstances the large differences occurred. Our numerical work did not reveal any simple algebraic rules for this, but it did reveal a pattern that we can describe qualitatively and which we think makes good intuitive sense. Large differences most commonly occurred when all four cost coefficients were in some sense balanced; we might think in terms of the potential costs that could be incurred over the life of the part (i.e. the life-cycle costs). When maintenance costs were much larger than inventory costs, they drove the optimization process and made the joint versus sequential issue irrelevant at the system level. Similarly, if any one life-cycle cost was very large (or very small), its presence (respectively, absence) made the optimization process much more straightforward. It was when all four life-cycle costs were operationally significant that it became important to trade them off in an optimal manner; and so then joint optimization showed a material advantage. A comment about the ordering lead time can also be made. The primary effect of increasing lead time relative to the part life is to increase the potential shortage cost incurred; so indirectly the lead time also plays a role in creating or avoiding the balanced life-cycle cost structure described above. The lead time had a more direct effect when it was very large; e.g. L = 0.5 was more than half the expected life of the part. In these  63  circumstances, it was more difficult to find a viable solution (because the part was often failing before spare delivery), and so working around this lead time problem came to dominate the solution process. The most obvious limitation of these numerical results is that they are based upon a particular choice of failure distribution. We expect that the qualitative issues we describe would hold in general, though the size and frequency of the cost differences would vary. 
Consequently, we feel able to conclude that reasonable gains can be made using joint optimization, and that if sequential optimization is unavoidable one can benefit from using as much maintenance information as is available when making the subsequent inventory decision.

2.05 Early Ordering with Replacement on Receipt

The previous section restricted 0 ≤ t_o, but some other authors, e.g. Thomas & Osaki in [TO78a] and [TO78b], have allowed -L ≤ t_o ≤ 0 as an alternative, so that a spare could be ordered before the start of the current cycle. That this could be appropriate would be indicated by a solution to JCF in which the optimal t_o = 0 (a boundary solution); conversely, t_o > 0 indicates that there is no value in having a spare arrive any earlier. Such a boundary solution will often arise when the order lead times are relatively long compared to the machine life (e.g. if L > μ). To keep the model tractable these authors have also restricted the maintenance policy to performing replacement on receipt, i.e. always install a new spare as soon as it arrives, regardless of the installed machine's functioning or age (hence fix t_r = t_o + L). This is obviously a severe restriction on maintenance flexibility, but it does allow for considerable simplification in the analysis. The replacement on receipt restriction enables us to know the exact spare arrival times and cycle lengths in advance; without this restriction, the timing uncertainty of future cycles makes the analysis intractable. [TO78b] also tried to analyze the model without this restriction, but erroneously treated the future arrival time of a spare as being independent of the choice of ordering time. Replacement on receipt also implies that the system may have multiple spares on order but never any stock on hand, and hence no holding costs will be incurred.

Simplifying JCF to fit the new assumptions above yields the following equation JED for the expected cost rate, with -L ≤ t_o ≤ 0 or equivalently 0 ≤ t_o + L ≤ L.

JED(t_o) = [ c + b F(t_o + L) + k \int_0^{t_o + L} F(x) dx ] / ( t_o + L )        (2-19)

This equation need not be pseudo-convex over its entire range, but fortunately it is over the range of interest to us.

Theorem 2-4: Over its viable range (i.e. all t_o : JED(t_o) < k), JED(t_o) is pseudo-convex in t_o with a unique minimum at:
(a)  -L < t_o < 0    if    b f(t_o + L) + k F(t_o + L) - JED(t_o) = 0 ;
(b)  t_o = 0         if    b f(L) + k F(L) - JED(0) < 0 .

Note that (b) implies there is no benefit to ordering early. For example, let c = $10, b = $20, h = $2, k = $100, L = 1, and F(t) = 1 - e^{-t^3}, so that the delivery lead time is longer than the average machine life (L > μ ≈ 0.9). Optimization of JCF from the previous section with these parameters gives its best solution of JCF(0.00, 0.80) = $41.03. If we restrict maintenance to replacement on receipt but allow early ordering of spares, then we can optimize JED in this section to find a solution of JED(-0.48) = $27.65. In this example it is optimal to order the spare one renewal cycle in advance of its use, but in general the best ordering time may be several periods in advance.

2.06 Single Spare With Returns

The previous section considered policies in support of a machine which was deteriorating with age, i.e. was IFR, but in this section we consider instead a machine whose reliability is improving with age, i.e. is DFR.
There is clearly no requirement for preventive replacement of the machine, but there is still a need to provide a spare for  65  eventual use as a corrective replacement once the original fails. Furthermore, since over time the installed part is less and less likely to fail, it may be beneficial to return a spare from our stock to its source once the risk of failure has sufficiently declined. Thus to run such a system at minimal cost, we must determine an optimal policy for the order and return of spares. Why should we be interested in such a system? First, while one may not always have the option of returning a spare to its source, this option is certainly not unheard of; within a large corporation for example, a regional repair shop could ship back an unneeded spare to the company's central warehouse. Second, while it is true that the reliability of most machines deteriorates with age, there are examples where the reverse is true, at least during the early part of the operating life when so-called infant mortality may be a problem such as for components with a bathtub-shaped hazard rate curve. To our knowledge this option of returning an unused spare has not previously appeared in the literature for the single spare model. We need to add some notation to model this situation. We use t to represent the s  scheduled time at which to return a spare from stock to the external source; we require 0 < t <t - L < oo. A restocking cost r, representing shipping and handling etc., will be 0  s  incurred i f a spare is returned unused to its source. The form of our spare policy is as follows. At time zero, the newly installed machine begins operation. If it fails before t , 0  then immediately place an order for a new one. If the machine is still operating at t , then 0  place an order for a spare at that time. Once a spare arrives, use it to replace the installed machine if it has previously failed; otherwise hold the spare in stock until it is needed. If the installed machine has not failed by time t , then return (send back) the spare to its s  source. If the installed machine fails after t , then immediately order a new spare and s  install it upon arrival. Our decision problem is to find a pair of values (t , t ) that 0  s  minimizes the expected cost rate. We continue to assume that we can deal with only one spare at a time, and that the order lead time is deterministic.  66  We note as an aside that our order and return policy is somewhat related to an continuous review inventory system using an (s-1, s) ordering policy, except that our s changes twice over time. We set s = 0 initially, then at t switch to s = 1, and finally at t 0  s  switch back to s = 0. For this model, the expected shortage (or stock-out) time per cycle is (2-20) t  a  \Lf(t)dt+  + L  co  \{t  • a  + L - t)f(t)dt  t„ + L  + J Lf(t)dt  = j F(t)dt +  LF(t ) s  and the expected holding time per cycle is  )(t-t -L)f(t)dt  + {t -t -L)F{t )=  0  s  0  'jF(t)dt  s  t„ + L  (2-21)  t + L 0  The cycle length is the sum of the machine lifetime plus any shortage time, so the expected cycle length is LI + j F(t)dt + LF(t ) s  .  (2-22)  to  The replacement cost c will be incurred every cycle, while the restocking cost r will be paid only if we return the spare, i.e. only if the installed machine fails after the scheduled return time. Combining these terms with our other cost parameters gives an equation for the system's expected cost rate, which we call ORS.  
t, + L  _  t,  c + k j F(t)dt + kLF(t ) s  ORS{t ,t ) 0  s  =  *  _  _  + h J F(t)dt +  — ju + J F(t)dt +  rF{t ) s  ^  (2-23)  LF(t ) s  to  This function has some useful properties (recall that 0 < t < t - L < QO). 0  s  67  Theorem 2-5: For any given t , ORS is pseudo-concave in t ; hence the optimal ordering s  0  time will be either t = 0 (order immediately) or t = oo (order on failure). 0  0  Theorem 2-6: For any given t , ORS is pseudo-convex in t ; hence there will be a unique 0  s  optimal return time at:  (a) t = t + L s  0  (b) t +L<  t' <oo  0  s  (c) ^ = oo  if  r + kL - h /z(t + L) - ORS(t ,t +L)L  if  r + kL - h/z(t' ) - ORS(t ,t' )L  if  r + kL  0  0  s  0  s  - /i/z(oo) - ORS(t ,cc)L 0  0  <0 ;  = 0; >0.  Case (a) implies that return upon receipt is optimal, whereas case (c) implies that it is optimal to keep the spare until failure. These results allow us to readily determine an optimal spare policy (t , t ). 0  We  s  proceed as follows, noting that if t = oo then we must also have ^ = 00: 0  Step 1: compute the value of ORS(oo, co), i.e. Oitf(oo,oo) =  (2-24)  fj. + L  Step 2: fix t = 0 and optimize ORS over t to find ORS(0, t' ); 0  s  s  Step 3: compare ORS(oo, 00) with ORS(0, t' ) and choose the least costly policy. s  In effect, we need to perform only a univariate optimization plus one extra function evaluation of ORS in order to determine our optimal policy. Note that we allow t = t + L, i.e. the spare can be returned as soon as it arrives. s  0  This "sneaky" policy avoids incurring any holding costs, while ensuring that a spare is "on the way" during the initial period of machine operation (when the risk of failure is highest). If the restocking cost r is sufficiently high (as we would expect in most applications) then this "sneaky" policy will not be optimal. The following describes other properties of ORS. Proposition 2-5: solution as follows:  A change in the cost coefficients will tend to change the optimal  68  (a) for a given t , the optimal return time t is non-decreasing in the restocking cost r and 0  s  the shortage cost rate k, and non-increasing in the holding cost rate h and the replacement cost c; (b) for a given t , the optimal ordering time t is non-decreasing in c, r, and h, and nons  0  increasing ink. This proposition reflects what we would intuitively expect to happen if the system costs were to change. For example, when the holding cost h increases we tend to avoid holding a spare by ordering later and returning earlier; conversely, when the shortage cost k increases we tend to avoid the risk of shortage by ordering a spare earlier and holding it longer. Now suppose that we begin at time zero with a spare already in stock, so that we don't need to place an order but have only to decide when (if ever) to return this spare. This obviously would not happen on an ongoing basis, but could apply for example upon start-up of a new system or when a replacement is performed before failure for reasons of technological change. The mathematics which describe this situation are similar but slightly simpler than those for our original model. The expected shortage time per cycle, the expected holding time per cycle, and the expected cycle length are respectively  0  The replacement cost c will still be incurred every cycle, while the restocking cost r will be paid if the installed component fails after the spare has been returned. Combining these terms with our other cost parameters gives an equation for the system's expected cost rate, which we call RS. 
RS(t_s) = [ c + k L \bar{F}(t_s) + h \int_0^{t_s} \bar{F}(x) dx + r \bar{F}(t_s) ] / [ μ + L \bar{F}(t_s) ]        (2-25)

Not surprisingly, this equation RS has similar properties to ORS.

Theorem 2-7: RS is pseudo-convex in t_s; hence there will be a unique optimal return time, either t_s = 0 (return immediately), 0 < t_s' < ∞, or t_s = ∞ (keep until needed at failure).
(a)  t_s = 0         if    r + kL - h / z(0) - RS(0) L < 0 ;
(b)  0 < t_s' < ∞    if    r + kL - h / z(t_s') - RS(t_s') L = 0 ;
(c)  t_s = ∞         if    r + kL - h / z(∞) - RS(∞) L > 0 .

Proposition 2-6: The optimal return time t_s is non-decreasing in r and k, and non-increasing in h and c.

Next we present a hypothetical numerical example for our ORS model and compare the cost of our (t_o, t_s) policy to the cost of using a simpler policy that does not allow returns (in effect, t_s = ∞). Suppose we have a machine which fails according to a DFR Weibull distribution with shape parameter η = 0.5 and scale parameter λ = 1, so μ = 2 years. Let the lead time L = 0.05 years, the cost of replacement c = $1000, the restocking cost r = $100, the holding cost rate h = $200 / year, and the shortage cost rate k = $10,000 / year. If we use a policy which does not allow returns, then our options are to order either immediately and hold until failure, or to not order until failure. If we order immediately, we will incur a holding cost until failure but no restocking cost. Setting t_o = 0 and t_s = ∞ in (2-23) gives the expected cost rate.

[ 1000 + 10000 \int_0^{0.05} (1 - e^{-t^{0.5}}) dt + 10000 (0.05)(0) + 200 \int_{0.05}^{∞} e^{-t^{0.5}} dt + 100 (0) ] / [ 2 + \int_0^{0.05} (1 - e^{-t^{0.5}}) dt + 0 ] ≈ 728        (2-26)

If instead we do not order until failure, then both the holding and the restocking costs are avoided, but we expect to pay more in shortage costs. Setting t_o = ∞ and t_s = ∞ in (2-23) gives the expected cost rate for this policy.

[ 1000 + 10000 (0.05) + 10000 (0.05)(0) + 200 (0) + 100 (0) ] / [ 2 + 0.05 + 0 ] ≈ 732        (2-27)

We would of course choose the less expensive of the two and so order immediately and hold until failure, at a cost rate of $728 / year.

If we now allow the return of spares, then we may also choose to order immediately and return at some time t_s. Optimizing ORS(0, t_s) over t_s gives an optimal t_s' = 2.0 years, at a cost rate of

[ 1000 + 10000 \int_0^{0.05} (1 - e^{-t^{0.5}}) dt + 10000 (0.05) e^{-2.00^{0.5}} + 200 \int_{0.05}^{2.00} e^{-t^{0.5}} dt + 100 e^{-2.00^{0.5}} ] / [ 2 + \int_0^{0.05} (1 - e^{-t^{0.5}}) dt + (0.05) e^{-2.00^{0.5}} ] ≈ 679        (2-28)

So in this particular example, we can expect to save approximately 7% by returning an unused spare (at $679 / year) rather than holding it until failure (at $728 / year).

In formulating both versions of this model, we use the minimization of the expected cost rate per unit total time as our objective. This is a common objective and is arguably the best one. In some circumstances, however, the minimization of the expected cost rate per unit operating time (equivalently, the maximization of the so-called cost effectiveness) may be a more reasonable objective. We therefore note in passing that all of the results in this section still apply to this alternate objective, albeit in a slightly modified (and simplified) form.

2.07 Single Spare Extended Cost Structure

In Section 2.04 of this chapter we considered a model with a simplified cost structure so as to keep our analysis relatively straightforward. In this section we show that the same results apply to a somewhat more general cost structure; we retain the restriction of a single spare with deterministic order lead times.
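Before adding these generalizations, we note that the return-of-spares example above is easy to check numerically. The short sketch below assumes the ORS form of (2-20)-(2-23) as reconstructed in Section 2.06, with the same parameter values; it is a sanity check rather than the thesis's original calculation, and should land near the $728, $732 and $679 figures quoted there.

```python
from math import exp, sqrt
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Section 2.06 example: DFR Weibull, shape 0.5 and scale 1 (so mu = 2 years),
# L = 0.05, c = 1000, r = 100, h = 200 per year, k = 10000 per year.
L, c, r, h, k, mu = 0.05, 1000.0, 100.0, 200.0, 10000.0, 2.0
F = lambda t: 1.0 - exp(-sqrt(t))        # failure distribution
Fbar = lambda t: exp(-sqrt(t))           # survival function

def ors(t_o, t_s):
    """Expected cost rate of the order-at-t_o / return-at-t_s policy, following
    my reconstruction of (2-20)-(2-23)."""
    lead_shortage = quad(F, t_o, t_o + L)[0]            # stock-out during the lead time
    shortage = lead_shortage + L * Fbar(t_s)            # plus L after a post-return failure
    holding = quad(Fbar, t_o + L, t_s, limit=200)[0]
    cycle = mu + lead_shortage + L * Fbar(t_s)
    return (c + k * shortage + h * holding + r * Fbar(t_s)) / cycle

horizon = 200.0                                   # numerical stand-in for "never return"
print(ors(0.0, horizon))                          # order immediately, hold until failure (~728)
print((c + k * L) / (mu + L))                     # order only on failure, as in (2-24)   (~732)
best = minimize_scalar(lambda ts: ors(0.0, ts), bounds=(L, 10.0), method="bounded")
print(best.x, best.fun)                           # return age near 2.0 years, cost near 679
```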
First allow the replacement cost cv(t) to be a non-increasing convex function of the age of the installed machine. The expected replacement cost per cycle then becomes  cC(t )  = c | | v ( x ) / ( x ) & + v(f )F(/ )J  r  r  (2-29)  r  We also allow for an increasing operating cost at rate aq(t), which will add an expected cost per cycle of t,  _  aA(t ) = ajq(x)F(x)dx  (2-30)  r  0  Finally we allow the replacement action to take a random amount of time, with a mean of R. This will increase the expected cycle length by R and add an additional shortage cost kR per cycle. Adding these three features to the equation JCF gives the extended cost  rate JOD. (2-31)  t„ + L  t  r  _  cC(t ) + k J F(x)dx + kR + h J F(x)dx + bF(t ) r  JOD{t ,t ) 0  r  r  =  + oA(t ) r  ^ J F(x)dx + J F(x)dx + R o  t„  This extended function has the same properties as does JCF, so Theorems 2-1,22, 2-3, and 2-4 still apply (with appropriate additions in notation). For this reason, the proofs in Appendix B for these theorems are written using the extended cost function JOD rather than the simpler JCF. 2.08 Scheduled versus Unscheduled Orders In practice there can be a difference in the lead times and delivery costs between orders which are scheduled (i.e. placed at t ) and those which are unscheduled (i.e. placed 0  72  on failure before t ). Unscheduled orders may involve larger administrative costs to place, 0  and since they are of a more urgent nature they may be delivered by a faster but more expensive method of transport (e.g. by air instead of by land). This feature has been included in many previous works on this subject, including [Os77], [T078a], [K078a], [K078b], [K078d], [K081], [KH81], [SS89], and [Sr92]. Allowing for multiple lead times and delivery costs in our model involves a fairly straightforward addition to our notation, but can complicate the analysis considerably. That a two number (t , t ) policy is generally not optimal can be argued as follows. If the 0  r  installed machine fails well before t , then we will order immediately, even though this 0  will involve paying some premium for expedited delivery; and if the machine fails after t , 0  an order will have already been placed and so we can only wait for its arrival (assuming of course that we cannot retroactively hasten its delivery). If however the machine fails just slightly before t , then depending on the relative costs and delivery speeds of the two 0  order types we may wish to wait until t to place a less expensive scheduled order rather 0  than immediately making a more costly emergency order; thus an optimal policy form will now require more decision variables in order to specify this waiting procedure. Whatever analysis we do for a system like this, we must be clear in our assumptions about what types of orders can or must be placed and their characteristics. Given a particular set of such assumptions, we can adapt our equations from the previous sections to distinguish amongst order types.. Suppose that lead times remain deterministic but are now distinct: scheduled orders still take L units of time to arrive, but unscheduled orders (i.e. on failure) require only L <L units of time to arrive and incur a cost premium c u(t), where c is a cost X  x  x  coefficient and u(t) is a non-decreasing function of the component age (e.g. u(t) - 1 for all t implies a constant cost premium). We will use a double age (t , t ) ordering policy as x  0  follows. 
If the installed machine fails before the expediting age t then immediately place x  an unscheduled order. If failure occurs after t but before t then wait and let the x  0  73  scheduled order occur at t . If there has been no failure by t then place a scheduled 0  0  order at that time. Preventive replacement will occur as before at t if failure has not r  occurred by then. Thus the system policy is specified by the three values (t  x  ,t ,t ), 0  r  where 0 < t < t < t - L < oo). Note that our use of two critical ages for ordering seems x  0  r  not to have appeared in the single spare literature before. The expected shortage time per cycle will now be  K(t ,t ) x  = R + °\F(t)dt  a  - F(t )[(t x  0  + L) - (t  +4)]  + L) - (t  +4)]  x  (2-32)  The expected cycle time also changes to  _  t„ + L  JF(t)dt  + R + JF(t)dt  - F(t )[(t x  a  x  0  (2-33) =  'jF(t)dt  +  K(t ,t ) x  0  0  The expected ordering cost premium is (2-34)  c \u(t)f(t)dt x  o  The following equation describes the expected cost rate JTD(t  x  ,t ,t ) 0  r  (2-35)  cC(t ) r  + bF(t )  + aA(t )  r  + kK(t ,t )  r  x  0  + h JF(t)dt to + L  JF(t)dt  +  +  c \u(t)f(t)dt x  0  K(t ,t ) x  0  0  Note that settingL = L, t =t , x  of the previous section.  x  0  and c = 0 transforms JTD back into the JOD equation x  74  Solving for optimal values of t , t and t is fairly straight forward, as the x  0  r  following suggests. Theorem 2-8: JTD(t , t , t ) is pseudo-convex in each of t , t and t , so that there will x  0  r  x  0  r  be a unique minimum for each variable (given values for the other two) as described below: (a)  Given t and t , the optimal t is at: x  0  (i) t =t + L r  i f h + bz(t +L) + aq{t + L) + cv\t +  Q  o  (ii) t +L<t < 0  r  0  L)-JTD(t ,t ,t +L)>Q\  o  x  o  0  oo if h + bz(t ) + aq(t ) + cv\t ) - JTD(t ,t ,t ) = 0;  r  r  (iii) t = co  r  r  x  i f h + bz(oo) + aq(cc) + cv'(°o) -  r  o  r  JTD(t ,t ,oo)<0. x  o  Case (i) implies that a spare should be installed upon receipt, while (iii) implies that replacements should be performed only at failure; (b)  Given t and t , the optimal t is at: x  r  (i) t = t 0  if k - h F(t ) - F(t + L)  JTD(t ,t ,t )  if k - h ° F(t ) - F(t + L)  JTD(t ,t ,t )  F  x  0  0  x  (  t  x  L  x  (ii) t <t < x  x  t -L  0  0  (  t  L  x  (iii) t = t -L ° a  )  x  > 0  r  x  a  = 0  r  0  i f k - h =—  r  x  x  F  r  r  )  JTD(t ,t x  - L,t ) < 0  r  r  F(t )-F(t ) x  r  Case (i) implies that a spare should be ordered as soon as possible, while (iii) implies that a spare should be ordered just in time for scheduled replacement; (c)  Given t and t , the optimal t is at: 0  (i) t =0  r  x  if  x  k + JTD(0, t ,t ) > 0 Q  (t (ii) 0 < t < t ° x  K  )  x  Q  if  X  c k  (t + L) - (t + L ) 0  r  +L)-(L )  0  x  x  + jTD(t ,t ,t ) - 0 y*>o,r) x  a  r  75  t  (iii)f =  - k + JTD(t ,t ,t )  if  0  x  0  0  <0  r  Case (i) implies that unscheduled orders should never be placed, while (iii) implies that a spare should always be ordered as soon as the installed machine fails. Note that a two number policy (t , t ) is equivalent to fixing t =t , which will not 0  r  x  0  generally lead to an optimal solution for this problem. As an example, let c = $ 10, b = $ 20, h = $ 2, k = $ 100, L = 0.05, L = 0.05, c = $ 5, u(t) = 1, and F(t) = 1 - *r' , so that 3  x  x  unscheduled orders require the same lead time as scheduled orders but are more expensive (assume other costs are zero). 
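Because the renewal cycle under this policy has several branches, a small Monte Carlo sketch can provide an independent check on cost-rate figures such as the ones quoted next. The simulation below encodes the (t_x, t_o, t_r) policy exactly as described above, with the example parameters just listed (and R = 0 assumed); it is illustrative only and is not the Maple computation used for the thesis's numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Example parameters from the text: c = 10, b = 20, h = 2, k = 100,
# L = L_x = 0.05, c_x = 5, u(t) = 1 and F(t) = 1 - exp(-t**3); a = 0, R = 0 assumed.
c, b, h, k = 10.0, 20.0, 2.0, 100.0
L, L_x, c_x = 0.05, 0.05, 5.0

def cost_rate(t_x, t_o, t_r, n_cycles=100_000):
    """Renewal-reward estimate of the cost rate of a (t_x, t_o, t_r) policy.

    The branches mirror the policy statement above: expedite if failure occurs
    before t_x, otherwise rely on the scheduled order placed at t_o; replace
    preventively at t_r if the machine is still running, and on spare arrival
    otherwise."""
    lifetimes = rng.weibull(3.0, n_cycles)          # samples of F(t) = 1 - exp(-t**3)
    total_cost = total_time = 0.0
    for x in lifetimes:
        if x < t_x:                                 # unscheduled (expedited) order
            arrival, premium = x + L_x, c_x
        else:                                       # scheduled order placed at t_o
            arrival, premium = t_o + L, 0.0
        stop = min(x, t_r)                          # machine stops at failure or at t_r
        end = max(stop, arrival)                    # cycle ends when the spare is installed
        shortage = max(0.0, arrival - stop)         # machine down, waiting for the spare
        holding = max(0.0, stop - arrival)          # spare in stock, machine still running
        total_cost += c + premium + k * shortage + h * holding + (b if x < t_r else 0.0)
        total_time += end
    return total_cost / total_time

print(cost_rate(0.39, 0.39, 0.62))   # single age ordering; compare with the $25.78 quoted below
print(cost_rate(0.32, 0.39, 0.62))   # double age ordering; compare with the $25.67 quoted below
```

Because each cycle is generated directly from the policy statement, the estimate does not rely on the algebra of (2-32) through (2-35).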
Optimization of JTD with these parameters but fixing t_x = t_o (a single age ordering policy) gives its best solution of JTD(0.39, 0.39, 0.62) = $25.78. If we remove the restriction on t_x and so use a double age ordering policy, then we can optimize JTD to find a lower cost solution of JTD(0.32, 0.39, 0.62) = $25.67. Our treatment here generalizes much of the past work dealing with deterministic lead times, so that [Os77], [TO78a], [KO78a], [KO78b], and [KO78d] can be viewed as special cases of the work herein.

2.09 Single Spare, Random Lead Time

In this section we again assume that scheduled and unscheduled orders are identical in both cost and lead time, but now the lead time is allowed to be random with arbitrary distribution G(t); this was the case in [KH81], [PP86a], [PP86b], [SS89], and [Sr92]. In addition to making the formulae somewhat complicated, the randomness also has an effect on the optimal form of the age replacement policy: a conventional single critical age is not optimal. Due to the randomness of its delivery time, an ordered spare may not arrive in time for the scheduled preventive replacement at t_r; so a policy must indicate what to do when this occurs. The new maintenance policy is a double age replacement policy (t_r, t_w): if the machine survives until age t_r, replace it if a spare is available at that time or as soon as possible thereafter, but shut down the machine at the wear-out age t_w > t_r even if the spare has still not arrived (thus incurring shortage costs until it does arrive). Note that the single age replacement policy from the previous sections is a special case of this in which t_w = t_r. Since scheduled and unscheduled orders are identical, the inventory policy remains a single critical ordering age t_o, so our optimal system policy has the three decision variables (t_o, t_r, t_w), where 0 ≤ t_o ≤ t_r ≤ t_w < ∞. Note that our use of two critical ages for replacement seems not to have appeared in the single spare literature before.

To derive the expected cost rate for this system, we need to consider two random events: the time of failure of the installed machine, and the time of arrival of an ordered spare. The system behaviour, and hence the costs incurred, depend on when each of these events occurs, and in particular which one occurs first. Figure 2-3 shows the possible orders of events that can occur in the system during each renewal cycle.

Figure 2-3: Timing of Events with Random Lead Times
[timeline diagrams not reproduced; each panel shows the relative order of the ordering age t_o, replacement age t_r, wear-out age t_w, the failure time X, and the spare arrival S]
(a) Machine survives until scheduled replacement and spare arrives before then
(b) Machine fails before scheduled replacement but after spare arrival
(c) Machine fails after spare is ordered but before it arrives
(d) Machine fails before spare is scheduled for ordering
(e) Spare arrives after scheduled replacement and machine survives until then
The expected replacement cost per cycle is c times (2-36)  C(t ,t ,t ) a  r  = ]v(t)f(t)dt  w  + F(t )G(t r  - t )v(t )  r  0  + F(tJG(t  r  w  -  t )v(t ) Q  w  0  + G(t  ~ t ))v(t)f(t)dt  w  + J  + F(tSj°v(t  0  jv(t)f(t)dtg(l)dl  tr-'o  +l)g(l)dl  0  + J v(t  0  'r  +1)  t -t„  \f(t)dtg(l)dl  l+ t  t  a  The probability of breakage each cycle is B(t ,t ,t ) 0  r  = F(t ) + 'jG(t - t )f(t)dt  w  r  (2-37)  0  tr  The expected operating cost per cycle is a multiplied by  A(t ,t ,t ) 0  r  = G(t - t )jq(x)F(x)dx  w  r  + G(t  0  w  - t )] a  q(x)F(x)dx • (2-38)  +  J t -t„ r  J q(x)F(t)dx  g(l)dl  o  The expected holding time per cycle H(t , t , tyj is 0  t-t„  J J [' -  r  t,-t  0  - l]g(l)dlf{t)dt  + F(t ) \{t r  r  -t Q  l]g(l)dl  (2-39)  78  The expected shortage time per cycle K(t , t^J is 0  (2-40)  CO  R + LF(t )  00  + J \{t  a  + I - t]g(l)dlf  0  to  w  a  l-t ]g(l)dl w  CO  + | j G(l)dlf(t)dt to  +  w  00  w  a  + ~F(t ) \ [t t -t„  t  = R + LF(t )  (t)dt  '-'o  + F(t )  JG(l)dl  w  t-t  t -t„  0  w  The expected length of a cycle is  R + \(t  + L)f(t)dt  + F(t )  J(/  r  0  +l)g(l)dl  + t F(t )G(t r  r  - O  r  t -t„  o  r  t  t,  r  +jtG(t  - t )f(t)dt  + J \(t  Q  +  + l)g(l)dlf(t)dt  a  t„  = )l(t)dt  CO  .  (2-41)  t-t„  K(t ,t ) 0  r  0  where the last term is K(t , ty) but fixing t = t . 0  MI  r  The sum of the six cost terms divided by the expected cycle length gives the expected cost rateJOR(t , 0  t ,t ). r  w  (2-42)  cC(t ,t ,t ) a  r  w  + bB(t ,t ,t ) 0  r  + aA(t ,t ,t )  w  0  JF(t)dt  +  r  + hH(t ,t )  w  a  r  +  kK(t ,t ) a  w  K(t ,t ) 0  r  o We delay until the next section a discussion of the properties of this equation. 2.10 Single Spare, Two Random Lead Times This section combines the results of the previous two sections, in that unscheduled orders may be more expensive and faster on average than scheduled orders and that both of these lead times may be random. This is the most general single-spare model that we  79  consider in this chapter, and is similar to the ones considered in [KH81], [PP86a], and [PP86b]. To run such a system requires the use of a double age maintenance policy (t , r  ) combined with a double age ordering policy (t , t ). x  number policy (t ,t ,t , x  0  Thus the system will use a four  0  t ), where 0 < t < t < t < t < <x>.  r  w  x  0  r  w  For this model the expected costs per cycle of replacement, breakage, operating, and holding are the same as those in the previous section. The expected ordering cost is (2-43) The expected shortage time per cycle changes to (2-44) K(t ,t ,t ) x  +J  a  = R + L F(t )  w  x  + ][t  x  + I ~ t]g(l)dlf(t)dt  = R + \F(t)dt + j JG(l)dlf(t)dt  + LF(t )  + F(t )  + F(t ) w  t  o  + L - t  t]f(t)dt  \ [t  w  - [  a  +L -  0  Q  x  +  -  l-t ]g(l)dl w  L ]F(t ) x  x  JG(l)dl  The average cycle time is simply (2-45) where the last term is K(t ,t ,t^) but fixing t = t . x  0  w  r  Combining these yields the cost function JTR(t ,t , x  0  t ,t ) r  w  shown below.  80  (2-46)  cC(t ,t ,t ) 0  r  + bB(t ,t ,t )  w  a  r  + aA(t ,t ,t„)  w  a  + hH(t ,t )  r  B  + kK(t,,t ,t )  r  a  +  w  c ]u(t)f(t)dt x  o_ i,  j~F(t)dt  +  K(t ,t ,t ) x  0  r  o  Note that setting L = L, t =t , and c = 0 transforms JTR back into the JOR x  x  0  x  equation of the previous section. This means that both equations have the properties described below. 
Theorem 2-9: JTR(t ,t ,t ,t )\s x  0  r  pseudo-convex in each of t , t and t ; thus there  w  x  r  w  exists a unique optimum for each variable (given values for the others) as described below: (a)  For all t ,t , and t , the optimal t is: x  (i) t = t w  (ii)  0  if  r  w  if  oo  cv\t ) + bz(t ) + aq(t ) - k > 0 ; r  r  w  w  (iii) t =  w  r  cvXt ) + bz(t ) + aq(t )-k=0;  r <* <ooif r  r  w  w  cv'(co) + bz(<x>)  + aq(<x>) - k < 0 .  Case (iii) implies that the machine should not be shut down unless a spare is available to perform an immediate replacement. Case (i) means we shut down the machine when the scheduled replacement age is reached, regardless of whether a spare is available. It is worthwhile noting that the expressions above do not include the other decision variables , nor do they include such other factors as the lead time distribution G(l) or mean repair time R; this indicates that the optimal wear-out age is independent of all of these other parameters. Later in Chapter 3 we will encounter this same property in another context; (b)  Given t ,t , and t , the optimal t is: x  (i) t =t + L r  0  w  r  i f cv\t + L) + bz(t + L) + aq(t +L) + h- JTR(t ,t ,t  0  0  0  0  x  0  0  + L, t ) > 0;  (ii) t + L < t < oo if cv'(t ) + bz(t ) + aq(t ) + h - JTR(t , t , t , t ) = 0; 0  (iii) t = t  r  r  w  r  r  r  x  0  r  w  i f cv'(t ) + bz(t ) + aq(t ) + h - JTR(t , t , t , t )<0. w  w  w  x  0  w  w  Case (i) implies that a spare should be installed upon receipt, while (iii) implies that replacements should be performed only at failure.  w  81  (c)  Given t , t , and t , the optimal t is: 0  (i) t =0  r  if  x  (ii) 0 < t < t if x  Q  *  W  (in) t =t x  w  0  0  x  _£^l  j R(0,  c  k  +  J^I_*1 (f + Z) - (t + L ) x  ifE^ilA  _  Q  r  k  x  0  r  w  x  < 0  JTR(t ,t ,t ,t ) 0  k+  > 0  w  + 77K(/ , / , / f ) = o  C  0  t ,t ,t )  T  0  r  w  Case (i) implies that unscheduled orders should never be placed, while (iii) implies that a spare should always be ordered as soon as the installed machine fails. We have been unable to derive an equivalent optimality result for t ; we 0  conjecture that the behaviour of JTR with respect to t depends on the particular choice of 0  lead time distribution G(l). This means that in practice we need to make careful use of non-linear programming techniques to search for a good value for the ordering time, as there may be multiple local minima. Given the results above, one can use a procedure like the following to determine a good and hopefully optimal set of parameters (t ,t ,t , t ). x  0  r  w  Step 1: Initially set t =t = t =t = 0; x  0  r  w  Step 2: Find the optimal t via N L P line search methods, using Theorem 2-9 to verify w  optimality; Step 3: Find a t < t via N L P line search methods, using Theorem 2-9 to verify r  w  optimality (given the other current values of t , t and t ); x  Step 4: Find a t <t -L 0  0  w  via N L P line search methods (this is more difficult, as JTR may  r  not be pseudo-convex in t , i.e. there may be multiple local minima); 0  Step 5: Find a t < t via N L P line search methods, using Theorem 2-9 to verify x  0  optimality (given the other current values of t , t and t ); 0  Step 6: Calculate JTR(t ,t ,t , t ); x  0  r  w  r  w  82  Step 7: Repeat steps 3 through 6 to find improved solutions, until satisfied that no material improvement in JTR(t ,t ,t , t ) is possible (using standard stopping rules for non-linear x  0  r  w  search). 
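A minimal skeleton of this coordinate search is sketched below. The JTR expression itself is not re-implemented; jtr is a placeholder callable standing in for equation (2-46), and the stopping rule is a simple relative-improvement test consistent with Step 7.

```python
from scipy.optimize import minimize_scalar

def line_search(f, lo, hi):
    """Bounded univariate minimization used for each coordinate step."""
    res = minimize_scalar(f, bounds=(lo, hi + 1e-9), method="bounded")
    return res.x, res.fun

def coordinate_search(jtr, t_max=100.0, tol=1e-4, max_rounds=25):
    """Heuristic coordinate search for (t_x, t_o, t_r, t_w) following Steps 1-7.

    `jtr` is a placeholder callable jtr(t_x, t_o, t_r, t_w) standing in for the
    cost rate (2-46); it is not re-derived here.  Each line search keeps the
    ordering 0 <= t_x <= t_o <= t_r <= t_w assumed in this section."""
    t_x = t_o = t_r = 0.0                                       # Step 1
    # Step 2: the optimal wear-out age does not depend on the other variables.
    t_w, best = line_search(lambda tw: jtr(0.0, 0.0, 0.0, tw), 0.0, t_max)
    for _ in range(max_rounds):
        # Step 3: replacement age, given the current t_x, t_o and t_w.
        t_r, _ = line_search(lambda tr: jtr(t_x, t_o, tr, t_w), t_o, t_w)
        # Step 4: ordering age; JTR may not be pseudo-convex in t_o, so a real
        # application should repeat this step from several starting points.
        t_o, _ = line_search(lambda to: jtr(min(t_x, to), to, t_r, t_w), 0.0, t_r)
        # Step 5: expediting age, given the others.
        t_x, _ = line_search(lambda tx: jtr(tx, t_o, t_r, t_w), 0.0, t_o)
        current = jtr(t_x, t_o, t_r, t_w)                       # Step 6
        if best - current < tol * max(abs(best), 1e-12):        # Step 7 stopping rule
            break
        best = current
    return (t_x, t_o, t_r, t_w), current
```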
Instead of using this coordinate search procedure, after finding t in step 2 one can use w  standard N L P algorithms to search the interval [0, t ] for good values of t , t and t . w  x  Q  r  Once a solution has been found, we can say something about how it would change if a cost coefficient were to be changed. Proposition 2-7:  A change in the cost coefficients will tend to change the optimal  solution as described below: (a) the optimal wear-out age t is non-decreasing in the shortage cost rate k, and nonw  increasing in the operating cost rate a, the breakage cost b, and the replacement cost c. It is constant in the holding cost rate h and the expediting cost c ; x  (b) for given t , t and t , the optimal replacement age t is non-decreasing in h, and nonx  Q  w  r  increasing in k and c ; and, x  (c) for given t , t and t , the optimal expediting age t is non-decreasing in k, and nonx  0  w  x  increasing in c, b, a, h, and c . x  The work in this section generalizes the results of [PP86b], [PP86a], and [KH81], except that [PP86a] makes different use of minimal repairs in its system. To some extent it also generalizes some of our work in previous sections, as illustrated in Figure 2-4 where the different models are listed in decreasing generality. Figure 2-4:  Relations Amongst Single Spare Models T w o Random Leadtimes (2.10)  i  '  One R a n d o m Leadtime (2.09) 1  1  1 T w o Deterministic Leadtimes (2.08) 1  One Deterministic Leadtime (2.04) & (2.07)  83  2.11 Multiple Spares without Stock-outs In this section we allow our system to order and hold multiple spares, but without the full complexity of the general problem that we introduced at the beginning of the chapter. To our knowledge this multiple spare model has not been investigated before in the context of continuous review ordering policies, though Falkner considered similar issues in the context of a periodic review stocking policy; see [Fa68], [Fa69], and [Fa70]. We avoid the interaction problem between maintenance demand and inventory supply by considering two special cases: the borrowed spare case and the instantly expedited delivery case. Each special case eliminates the possibility of a stock-out, which means that inventory deficiencies will not stop the maintenance demand process. We revert in this section to our earlier assumptions that the lead time is deterministic and identical for both scheduled and unscheduled orders, that replacements are instantaneous (R = 0), and that the only costs are simply replacement c, breakage b, operating aq(t), holding h, and shortage k. Suppose a point in time has been reached at which the maintenance manager wants (at t ) or needs (on failure) to perform a replacement, but there is no spare in stock. It is r  sometimes the case that a spare can be obtained by borrowing it from an alternative source; for example, it is not unheard of for airlines which are not direct competitors to loan spare parts back and forth amongst themselves, so as to avoid canceling flights due to shortages. This works as follows. To perform a replacement with no spare in stock, borrow a spare from an alternative source and immediately install it in our system. When our spare arrives from the regular source, give it to the alternative source; in the meantime, we incur a penalty cost at rate k as a rental fee to the alternative source for the privilege of borrowing. In our analysis, this borrowing arrangement is modeled as follows. 
Since a spare can be borrowed to avoid stock-out we will assume that maintenance uses a single age policy. Note that this is not necessarily optimal, as it might be better to delay a preventive  84  replacement instead of paying a premium rate to borrow a spare; but this would indicate the use of a double age maintenance policy which creates the kind of maintenanceinventory interaction that we are trying to avoid. Given the use of a single age maintenance policy, we can use the delayed (s-1, s) inventory policy developed by Atkins and Katircioglu [AK95] for EFR demand processes. This gives the following equation for the expected cost rate, CB(t , s, t ). 0  r  (2-47)  QO  c + bF(t )  + aj q(x)F(x)dx  r  +  (x)dx  F(t )  r us  a  .  0 L  co  0  L  0  + hj  (x) ax J  L  (x)dx  + F(tJ .  |  F(x)dx  0  The first three terms in the numerator are the replacement, breakage, and operating cost per cycle or machine-life. The first square-bracketed term gives the holding and shortage costs of the policy assuming that the currently installed machine lasts at least until we place the next order, while the second bracketed term gives those costs assuming that the installed machine fails before the next order is due to be placed. Here ®  trjtgjls  (0  represents the distribution function for the time until the 5-th replacement; that is, the sum of 5-1 lifetimes of future machines each with distribution F(t) truncated at t , plus the r  length of the current cycle which would have the same distribution except that we delay ordering by a time t . Thus <t>, , 0  (t) is an 5-fold convolution. Similarly ^  (<>  (0  represents the corresponding density function. The decision problem is to find values for t , 5, and t that minimize the expected cost rate for the system. To help with part of this, 0  r  we can adapt the following result from [AK95].  85  Inventory Solution Procedure from [AK95]: Find t and s by the following procedure: 0  (a) first find s: <D, , ( Z ) < r>0>  t  (2-48)  h+k  (b) second, find t : O (L) 0  < «W,(£)  =  ts  (2-49)  k  A +  Lemma 2-1: For a given t , the 7 and s values from the above procedure are optimal in r  0  CB(t ,s,t ). 0  r  This means that given an age replacement policy t (perhaps sub-optimal) we can at least r  in principle find an optimal inventory policy to support it. Instead of borrowing a spare, it may be that in the event of stock-out a spare part can be delivered very rapidly by paying a premium cost. If this delivery time is short enough relative to the machine life so as to be negligible, then we may treat this as an instantaneous delivery. This is similar to the lost sales model often used in other inventory work; in fact, we extend the results from [Sc89] for this version of our model. To account for the difference in stock-out penalties we need to make one change from the previous borrowed spare equation: we now use c to represent the fixed cost premium (not a cost x  rate) for expediting. This gives the following equation CE(t ,s,t ) for the expected cost 0  r  rate. (2-50) i  oo  r  c + bF(J ) + afq(x)F(x)dx  + F(t )  r  +F(t ) a  c  x  a>,„.,_,(£) + 0  a  c <& (L) x  Aj<&r,.o.*-i(x)dx L  j F(x)dx  ttoS  + hJQ,,.t.,s(x)dx  86  As with the borrowed spare case, we would like to characterize the optimal inventory policy for this instantly expedited case. We conjecture that this could be done in a manner such as the following. 
Given t , find optimal t and s by the following r  0  procedure: (a) first find 5: =  < — < =r  (b) second, find t : 0  (2-51)  = —  (2-52)  Note that whereas in the borrowed spare case we compared a cost ratio to a distribution function, here we compare a cost ratio to a hazard function. Schultz in [Sc89] proved a result equivalent to part (b) for the special case where s=l, but we have been unable to prove our conjecture for the general case of any s. Finding the actual solution in an application of either the borrowed spare case or the expedited spare case is unfortunately not a trivial exercise, as the convolutions of truncated lifetime distributions are not easy to calculate either analytically or numerically. We can improve things somewhat if we restrict our maintenance policy to replacement at failure, so that t = QO. Aside from making our maintenance policy trivial, this assumption r  also means that the machine lifetime distributions are untruncated, so that at least for some choices of distribution functions (e.g. the normal distribution) the convolutions become much more tractable. With t = QO the equation CB(t ,s,t ) for the borrowed spare case would become r  CB(t , s, 00) as follows. 0  0  r  87  (2-53) CO  + F(t ) k j ® (x)dx 0  L  kj .  ts  _  0  T  oo  L  c + b + ajq(x)F(x)dx  + hj  0  L  Q>t„,s  (x)dx J  «)  <D  0  (*)<&  +  hj  0  O >- l 0  (x)dx  L  Lemma 2-1 still applies to this simpler equation. As an example, let c = $ 10, b = $ 20, a = $ 0, k = $ 100, h = $ 2, Z, = 0.05, and F $ = 1 - e- . Then the critical ratio h I [h + k] = 0.02, which using (2-48) and (2-49) /3  leads to optimal values s = 1 and t = 0.22. Use these with (2-53) to find CB(0.22, 1, oo) 0  = $35. 2.12 The General Problem We now briefly return to the general problem which allows the lead times for receipt of spares to be random (with the distribution possibly dependent on the type of order) and the number of spares in stock or on order is unconstrained. This general problem appears to be intractable, in part because of the maintenance-inventory interaction. We conjecture that an optimal policy form would have four critical parameters: s,t ,t , 0  r  and t . Maintenance would use a double age replacement policy, w  whereby it would shut down the installed machine when its age was greater than t if a r  spare was in stock, or in any event when the age reached t . Inventory would use a w  delayed (5-1,5) policy, where s would be the target inventory level, and new orders would be placed after a delay of t or at the time of the next demand, whichever occurred first. 0  The complication is that optimal t and t would need to be dynamic functions of the 0  r  inventory-on-order (represented by a vector of demand times) rather than static numbers. One way to approach this problem would be to define some bounds on the minimum cost solution, and then within these bounds use heuristics to search for a good solution for a particular problem instance. We could start by trying the solution methods  88  given in the previous sections for more restricted versions (e.g. single spare) of the problem. Any of these which have the appropriate structural assumptions (e.g. lead time being deterministic or random) will give feasible policies with known cost rates and hence can provide candidate solutions and act as upper bounds on the unknown optimum. 
For example, the following could be used to start the solution procedure in applications where the lead time is deterministic: Step 1: restrict the system to a single spare with a non-negative ordering time, and then develop an initial solution by optimizing JOD(t , t ); 0  r  Step 2: if the calculated ordering time t > 0, then this indicates that a single spare is in 0  fact optimal, so stop with the optimal solution; Step 3: if t = 0, then next restrict the maintenance policy to replacement on receipt and 0  optimize JED(t ). Compare this cost rate to that from step 1 and use whichever policy 0  gives the best results. 2.13 Future Research Obviously many applications of the joint maintenance and inventory optimization problem will require policies of a more general form than those contained herein, i.e. most real inventory systems must handle multiple spares with the possibility of suffering stockouts. The general form of the problem unfortunately appears to be intractable, at least for the method of analysis we use in this chapter. Thus we recommend this more difficult but important problem for future work. Another avenue for further development is the merger of the work dealing with systems of one machine and one spare to handle the entire life-cycle of a machine. Ideally this might result in a model which can deal with the classic bathtub-shaped hazard rate curve, wherein we could return a spare after the initial burn-in phase, take no action during the primary service life, and then order a replacement at some point in the machine's wear-out phase.  89  2.14 Concluding Remarks For the general form of the problem we considered, we have only been able to suggest some heuristic solution procedures; exact optimality appears to be intractable to our style of analysis. When we assumed that we were to use a single age maintenance policy and could either borrow a spare or instantly expedite one, then we were able to write down the cost function and specify some of its properties. When we restricted our inventory policy to having a single spare in stock or on order with general lead times, then we were able to specify the optimal policy form, write down the cost function, and determine some of this function's optimality conditions. When we further restricted the lead times to be deterministic, we were able to derive a full set of optimality conditions. We found that the objective function is pseudoconvex in each decision variable and any economically viable K K T point is a global minimum. We also showed that joint optimization can be worthwhile in comparison to traditional separate optimization. Separate optimization can be arbitrarily bad in extreme cases, while for the numerical data we considered, joint optimization gave an average improvement of 3% over the best form of sequential optimization with one tenth of the cases having a difference greater than 10%. Overall, our work enhances the existing single-spare optimization literature by generalizing and combining the work of previous authors, and by introducing the notions of breakage costs, return of spares, service constraints, and double age policies.  90  Chapter 3 The Machine Repairman Problem with Preventive Maintenance 3.01 Chapter Abstract Although the machine repairman problem has been a subject of research for quite some time, it seems that all existing studies have made the reasonable but restrictive assumption that repairs are made only after machine failure, i.e. preventive maintenance is ignored. 
Our work extends the machine repairman model by allowing for preventive repair under the assumption that the machines have an increasing hazard function. We derive analytical results for the case where we restrict the form of the repair policy to a single critical age, and then investigate other heuristic policies which can improve the system's cost effectiveness. We conclude by comparing the performance of different policies using two numerical studies. 3.02 Introduction Consider a system consisting of n repair shops which support m machines subject to stochastic failure. Whenever a machine breaks down, it is repaired by a repair shop; each repair keeps a shop busy for a random amount of time, during which it cannot service other broken machines. The problem of efficiently operating such a system is commonly referred to as the machine repairman problem (MRP), or more generally the machine interference problem. This type of problem can arise not only in maintenance and repair situations, but also in situations where workers tend several automatic machines or a computer processes jobs from different input sources. In our system, we will also allow the system manager to have the option of shutting down an old but still operational machine before it breaks, so as to effect a preventive repair. To take advantage of this ability, the manager must decide when to best perform preventive shut downs. This decision is typically based upon the analysis of a single machine as described in e.g. Barlow & Proschan [BP75] or Jorgenson et al [JMR67], which would prescribe a single critical age for the preventive repair policy.  91  Such a policy, while optimal for a single machine, is not in general optimal for a system of many machines sharing common repair facilities; this is the issue that we explore herein. There is a fairly extensive literature regarding the MRP. The work by Cho & Parlar [CP91] provides a review of multiple machine reliability and maintenance in general, while Stecke & Aronson [SA85] provide a comprehensive review of the machine interference literature in particular. Agnihothri [Ag89] considers the relationships amongst system parameters that hold true for any MRP. Most of the other existing M R P papers (e.g. [Cr75], [Go77], [Wi77], [GI81], [Le84]) examine the performance characteristics of a system under a variety of assumptions regarding failure distributions, number of repair shops, costs incurred, etc., but seem to always assume that repair will occur only after failure i.e. that there is no preventive maintenance. One exception is the work by Albright [A180], which allows for some preventive maintenance in that the rate of failure for the machines can be reduced by paying a cost which represents oiling, tuning, and other minor adjustments which do not take the machine out of service. We also recently became aware of some work by Moinzadeh & Berke [MB95] which studies a system that appears quite similar to ours, except that the machines in their system get less productive as they age but never actually fail; hence their mathematical model and method of analysis are quite different from ours. They are essentially working on an enhancement to what is sometimes referred to as a tool wear model; see e.g. [JY92]. We do not discuss the kind of economic interdependence (versus stochastic interdependence) amongst machines that leads to what are often called opportunistic maintenance models; this topic has a large body of research of its own (for an example, see [Oz88]). 
In our work, we make use of the GI/M/nfinitepopulation queuing results developed for the M R P by Bunday and Scraton [BS80] and later generalized by Sztrik in [Sz85] and [Sz88]. Bunday and Scraton showed that the steady state probabilities for the number of machines operating could be calculated using only the means of the failure and  92  repair distributions. Although they did not explicitly consider it, their formulae can be used to analyze certain kinds of preventive maintenance policies. This work is motivated in part by the author's previous experience in aircraft maintenance. At his place of work, each of a group of aircraft underwent a periodic inspection by a single maintenance shop after having flown a predetermined number of hours. The rate of flying was sufficiently high that the shop was most often busy; whenever one aircraft had been completed, the next was generally ready to be brought in. However, it occasionally happened that the next aircraft still had a considerable number of hours left in its rated life; this prompted the question of whether it was best to bring it in anyway, or to leave the shop empty until this life was exhausted. Conversely, it often happened that an airplane would reach its rated life before the shop was finished with the previous one; the question then was whether to allow it to fly some more or to require it to sit idle until the shop was ready for it. It was never clear in either situation what the optimal decisions might be. We begin our discussion by describing more precisely our continuous time system model. We then examine single age repair policies and their properties, followed by more general policies; we shall see that single age repair is tractable to analyze, but is generally sub-optimal. We next compare the performance of all of these policies in a numerical study using discrete time systems; we switch to discrete time here because it allows us to readily calculate the optimal policies for comparison purposes. There follows a second numerical study, which uses simulation to evaluate some of our heuristics in the continuous time model. We conclude with a summary of our results and a discussion of desirable extensions to our work. Proofs for our results are given in Appendix C. 3.03 Basic Model and Notation Our continuous time system consists of m identical machines serviced by n identical repair shops, n <m. The operating life of each machine has distribution F(t), density f(t), and hazard function z(t). We assume that z(t) is strictly increasing in t, so that  93  the machines have a strictly increasing failure rate (are EFR). A machine can be repaired either before (preventive repair) or after (corrective repair) it has failed; for both forms of repair, the repair duration has a negative exponential distribution T(t) with density y(t) and mean R. Repairs are assumed to be perfect, and any shop can fix any machine. There are four costs charged to our system. A shortage cost is incurred at rate k per unit time per machine unavailable (while waiting for or undergoing repair). We incur a repair cost c every time we perform a repair (corrective or preventive), and in addition pay a breakage cost b for every machine which fails in use (i.e. this is a cost premium for corrective repair). While a machine is running, it incurs an operating cost at rate aq(t), where a is a cost coefficient and q(t) is non-negative non-decreasing function of the machine age; this cost can represent such things as e.g. 
increased fuel consumption, quality losses, and some types of minimal repair. The sunk cost of having the repair shops is common to all policies and so is ignored in this analysis. Our objective is to determine a preventive maintenance policy which will minimize the long run expected cost C(policy) of operating the system.

We may think of this system as a finite population, n-server queue in which service times are independent and identically distributed (iid) Markovian. The arrivals to this queue are iid IFR if the decision to do preventive maintenance is based only on the age of that particular machine; but in general we will want to take into account the state of the whole system in making such a decision, so that arrivals will not necessarily be independent of each other or of the service times.

More generally, we may describe our system as a simple form of closed queuing network (CQN). This particular CQN has two queuing nodes (sometimes referred to as stations): an n-server repair node whose service times are iid negative exponential, and an m-server "running" node whose service times are IFR and depend on the maintenance policy chosen. A fixed number m of machines circulates between these two nodes; whenever a machine finishes service at one node, it next enters the queue at the other node. Since the machines are identical and repairs have a negative exponential distribution, we may assume that the queuing discipline for each station is either first-come first-served (FCFS) or last-come first-served preempt-resume (LCFS-PR) according to our needs; we could also treat the running node as having an infinite number of servers, if that were useful. Our goal is to find a maintenance policy which leads to a minimum cost steady state for this CQN. See Figure 3-1 for an illustration of such a CQN.

Figure 3-1: A CQN with four machines and two repair shops.
[diagram not reproduced: the four machines circulate between the "Repair Shops" node and the "Running Machines" node]

CQNs like this, but with all queuing stations having independent negative exponential service distributions, have been extensively studied, as in e.g. [SY87] and [SY88]. Unfortunately our machine running times are not negative exponential and need not even be independent, so there seem to be few previous results from the CQN literature that apply to this problem.

3.04 The Simple Single Age Repair Policy

The most straightforward form of preventive maintenance policy is an age repair policy. Let us assume for the moment that the system consists of only one machine plus one repair shop. We will specify a single critical age t_1, such that if the machine reaches age t_1 without having failed it will be shut down at that time for preventive maintenance. We can easily model this small system as a renewal process, in that whenever the machine finishes repair it is as good as new and the system is renewed. Accordingly, we can calculate the cost rate of a policy by taking the expected cost per renewal period and dividing by the expected length of a renewal period (as done in e.g. [BP75] and [JMR67]) to create the equation below.

C_1(t) = [ c + kR + b F(t) + a \int_0^t q(x) \bar{F}(x) dx ] / [ R + \int_0^t \bar{F}(x) dx ]        (3-1)

Here we have the sum of the repair cost, the expected shortage cost per cycle, the expected failure cost, and the expected operating cost, divided by the sum of the mean time to repair plus the expected machine life.
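As a concrete illustration of (3-1), the sketch below evaluates C_1(t) for a hypothetical IFR Weibull machine and locates the single critical age by a bounded univariate search; all parameter values are invented for the example, and the first-order check anticipates the condition given in Lemma 3-1 below.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

# Hypothetical data: IFR Weibull lifetimes (shape 2, scale 1), mean repair time R,
# repair cost c, breakage premium b, shortage rate k, operating cost rate a*q(t).
c, b, k, a, R = 50.0, 150.0, 200.0, 5.0, 0.1
F = lambda t: 1.0 - np.exp(-t**2)        # failure distribution
Fbar = lambda t: np.exp(-t**2)           # survival function
q = lambda t: t                          # non-decreasing operating-cost function
z = lambda t: 2.0 * t                    # hazard function of this Weibull

def C1(t):
    """Equation (3-1): expected cost per renewal cycle divided by expected cycle length."""
    operating = quad(lambda x: q(x) * Fbar(x), 0.0, t)[0]
    cycle = R + quad(Fbar, 0.0, t)[0]
    return (c + k * R + b * F(t) + a * operating) / cycle

res = minimize_scalar(C1, bounds=(1e-3, 10.0), method="bounded")
t1, cost_rate = res.x, res.fun
# First-order check (anticipating Lemma 3-1 below): at an interior optimum,
# b*z(t1) + a*q(t1) should equal C1(t1).
print(t1, cost_rate, b * z(t1) + a * q(t1))
```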
Lemma 3-1: C_1(t) is pseudo-convex in t; thus it will have a unique minimum at t_1, 0 ≤ t_1 ≤ ∞, as follows:
(a) t_1 = 0 if bz(0) + aq(0) − C_1(0) > 0;
(b) 0 < t_1 < ∞ if bz(t_1) + aq(t_1) − C_1(t_1) = 0;
(c) t_1 = ∞ if bz(∞) + aq(∞) − C_1(∞) < 0.

Case (a) implies that the system is not viable and therefore should not operate, while case (c) means that preventive replacement should never be performed.

This critical age policy is clearly optimal for a single machine or whenever n ≥ m, and would be the policy used if one were to naively apply the age repair principles discussed in e.g. [BP75] for a single machine to a system where several machines share a smaller number of repair shops. Suppose we were to calculate a t_1 from the above formula and use it in a system of m machines. We might naively expect that the cost rate would be mC_1(t_1), but it will generally be higher than this due to interference at the repair shops; i.e., a machine will sometimes be idle waiting for repair because all the shops are busy. The rate given by mC_1(t_1) is instead a lower bound for the actual cost rate resulting from the use of t_1 (or indeed of any feasible policy). In the next section we show how to determine the actual cost of single age policies for systems of multiple machines.

3.05 The Joint Single Age Repair Policy

If there are more machines than shops in the system, it is harder to derive a cost rate equation based upon renewal concepts directly, so we take a different approach. Bunday and Scraton in [BS80] derived a set of integro-differential equations which they solved to find the steady state distribution of the system, i.e. the probability that i of the m machines in the system are inoperative. For an age t repair policy, the probability of having i machines inoperative is (using [BS80])

prob_i(t) = \frac{m!}{(m-i)!\,i!} \left( \frac{R}{\int_0^t \bar{F}(x)\,dx} \right)^i prob_0(t)   for 0 ≤ i ≤ n    (3-2)

prob_i(t) = \frac{m!}{(m-i)!\,n!\,n^{i-n}} \left( \frac{R}{\int_0^t \bar{F}(x)\,dx} \right)^i prob_0(t)   for n ≤ i ≤ m    (3-3)

where

\sum_{i=0}^{m} prob_i(t) = 1    (3-4)

which, after solving for prob_0(t), become respectively

prob_i(t) = \frac{\frac{m!}{(m-i)!\,i!} R^i \left( \int_0^t \bar{F}(x)\,dx \right)^{m-i}}{\sum_{j=0}^{n} \frac{m!}{(m-j)!\,j!} R^j \left( \int_0^t \bar{F}(x)\,dx \right)^{m-j} + \sum_{j=n+1}^{m} \frac{m!}{(m-j)!\,n!\,n^{j-n}} R^j \left( \int_0^t \bar{F}(x)\,dx \right)^{m-j}}   for 0 ≤ i ≤ n    (3-5)

prob_i(t) = \frac{\frac{m!}{(m-i)!\,n!\,n^{i-n}} R^i \left( \int_0^t \bar{F}(x)\,dx \right)^{m-i}}{\sum_{j=0}^{n} \frac{m!}{(m-j)!\,j!} R^j \left( \int_0^t \bar{F}(x)\,dx \right)^{m-j} + \sum_{j=n+1}^{m} \frac{m!}{(m-j)!\,n!\,n^{j-n}} R^j \left( \int_0^t \bar{F}(x)\,dx \right)^{m-j}}   for n ≤ i ≤ m    (3-6)

while the cost rate cost_i(t) for each case i is

cost_i(t) = ik + [m - i]\,rate(t)

where

rate(t) = \frac{c + bF(t) + a \int_0^t q(x)\,\bar{F}(x)\,dx}{\int_0^t \bar{F}(x)\,dx}    (3-7)

is the cost rate per machine running. Taking the summation of the products of the costs and their probabilities gives us an equation for our expected cost rate:

C_m(t) = \sum_{i=0}^{m} cost_i(t)\,prob_i(t) = \sum_{i=0}^{m} [ik + (m-i)\,rate(t)]\,prob_i(t).    (3-8)

This reduces to (3-1) when m = 1 and n = 1. We can also re-arrange terms to get

C_m(t) = rate(t) \sum_{i=0}^{m} (m-i)\,prob_i(t) + k \sum_{i=0}^{m} i\,prob_i(t) = rate(t)\,E[MU(t)] + k\,E[MD(t)]    (3-9)

where MU(t) is the number of machines up (running) and MD(t) is the number of machines down (not running). It will later be useful to have this cost rate in the form of a fraction. Define y_i as

y_i = \frac{m!}{(m-i)!\,i!} R^i \left( \int_0^t \bar{F}(x)\,dx \right)^{m-i}   for 0 ≤ i ≤ n    (3-10)

y_i = \frac{m!}{(m-i)!\,n!\,n^{i-n}} R^i \left( \int_0^t \bar{F}(x)\,dx \right)^{m-i}   for n ≤ i ≤ m    (3-11)

so that it represents the numerator of (3-5) or (3-6) depending on the value of i. Then

C_m(t) = \frac{\sum_{i=0}^{m} [ik + (m-i)\,rate(t)]\,y_i}{\sum_{j=0}^{m} y_j}.    (3-12)

We can use any of (3-8), (3-9), or (3-12) to calculate the actual cost rate resulting from the use of t_1 in our m machine system.
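As a further illustrative aside, the form (3-12) is also easy to compute, since the weights y_i involve only the mean repair time R and the integral of the survival function. The sketch below continues the earlier example (the same assumed Weibull lifetime and arbitrary coefficients, which are illustrative only) and locates the minimizing age of C_m(t) by grid search.

    import numpy as np
    from math import factorial
    from scipy.integrate import quad

    # Illustrative parameters (not from the thesis): m machines, n repair shops.
    m, n = 3, 1
    eta, lam = 4.0, 1.0
    c, k, b, a, R = 0.0, 8.0, 1.0, 0.0, 0.10

    Fbar = lambda t: np.exp(-(t / lam) ** eta)
    F    = lambda t: 1.0 - Fbar(t)
    q    = lambda t: t

    def Cm(t):
        """Joint single age cost rate C_m(t), via the weights y_i of (3-10)-(3-12)."""
        T, _  = quad(Fbar, 0.0, t)                   # expected operating time per cycle
        op, _ = quad(lambda x: q(x) * Fbar(x), 0.0, t)
        rate  = (c + b * F(t) + a * op) / T          # cost rate per running machine, (3-7)
        y = []
        for i in range(m + 1):                       # weights (3-10) and (3-11)
            if i <= n:
                coef = factorial(m) / (factorial(m - i) * factorial(i))
            else:
                coef = factorial(m) / (factorial(m - i) * factorial(n) * n ** (i - n))
            y.append(coef * R ** i * T ** (m - i))
        y = np.array(y)
        cost_i = np.array([i * k + (m - i) * rate for i in range(m + 1)])
        return float(np.dot(cost_i, y) / y.sum())    # equation (3-12)

    ages = np.linspace(0.05, 3.0, 300)
    tm = ages[int(np.argmin([Cm(t) for t in ages]))]
    print(f"t_m approx {tm:.3f}, C_m(t_m) approx {Cm(tm):.4f}")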
Equations (3-8) to (3-12) can also be used to develop a better age repair policy, which we call joint single age repair. We will specify a single age t_m, such that any machine that reaches age t_m (i.e. without having already failed) will be shut down for preventive maintenance. The ages t_m and t_1 are generally not equal, as t_1 is obtained by optimizing (3-1), while t_m results from optimizing (3-8). This latter equation has several useful properties.

Proposition 3-1: An interior minimum (i.e. dC_m(t_m)/dt = 0) occurs at t_m such that

C_m(t_m) = C_{m-1}(t_m) + bz(t_m) + aq(t_m),   where C_0 = 0    (3-13)

or equivalently,

\frac{bz(t_m) + aq(t_m) - rate(t_m)}{k - rate(t_m)} = \frac{Var[MU(t_m)]}{E[MU(t_m)]}    (3-14)

Lemma 3-2: C_2(t) is pseudo-convex in t with a unique minimum at t_2 as follows:
(a) t_2 = t_1 if C_1(t_1) + bz(t_1) + aq(t_1) − C_2(t_1) > 0;
(b) t_1 < t_2 < ∞ if C_1(t_2) + bz(t_2) + aq(t_2) − C_2(t_2) = 0;
(c) t_2 = ∞ if C_1(∞) + bz(∞) + aq(∞) − C_2(∞) < 0.

Proposition 3-2: For any given t, the expected number of machines down is increasing in the number of machines m at a non-decreasing rate, i.e. we have
(a) E[MD(t)_{m+1}] ≥ E[MD(t)_m]; and,
(b) { E[MD(t)_{m+2}] − E[MD(t)_{m+1}] } ≥ { E[MD(t)_{m+1}] − E[MD(t)_m] }.

Theorem 3-1: Over its viable range, i.e. for all t: C_m(t) < mk, we have that C_m(t) is pseudo-convex in t with a global minimum at t_m as follows:
(a) t_m = t_{m-1} if C_{m-1}(t_{m-1}) + bz(t_{m-1}) + aq(t_{m-1}) − C_m(t_{m-1}) > 0;
(b) t_{m-1} < t_m < ∞ if C_{m-1}(t_m) + bz(t_m) + aq(t_m) − C_m(t_m) = 0;
(c) t_m = ∞ if C_{m-1}(∞) + bz(∞) + aq(∞) − C_m(∞) < 0.

Case (c) above implies that preventive maintenance should not be performed.

Proposition 3-3: C_m(t) and its optimal age t_m have the following properties:
(a) the optimal age t_m is increasing in k and c, and decreasing in b and a;
(b) both the expected number of machines down (in or awaiting repair) and the expected number of machines idle (awaiting repair) are increasing in R and decreasing in t and n; and,
(c) for any given t, the ratio C_m(t) / mC_1(t) is increasing in m and decreasing in n.

To actually calculate t_m, we can use non-linear programming (NLP) methods to search C_m(t) over the interval [0, ∞], using Proposition 3-1 to identify the optimal solution. If we happened to know the optimal ages for smaller or larger systems, for example t_{m-1} and t_{m+1}, then we could narrow down the search interval considerably, since t_{m-1} < t_m < t_{m+1}.

The use of t_m from the above analysis rather than t_1 from traditional maintenance procedures reduces our operating cost rate without an increase in policy complexity (we discuss later the size of this reduction). In fact, within the class of single age policies, t_m gives an optimal policy. As we shall see in the following sections, reducing the cost rate further requires the use of more complicated policies.

3.06 Better Heuristic Policies

So far the decision to shut down a working machine for preventive maintenance has been based only upon the age of that particular machine. Next we examine policies which consider other state information. In doing so, it may be helpful to illustrate the policy forms in a diagram which represents a system of two machines, in which each axis indicates the age of one of the machines.
Our MRP then consists of finding a boundary for this diagram to indicate the feasible area for our policy, such that whenever we contact the boundary we shut down the oldest machine. For example, Figure 3-2 shows the shape of a single age policy.

Figure 3-2: Single Age Policy. [Diagram omitted: axes show Age 1 and Age 2; the operating region is the square bounded by t_m on each axis.]

The current availability of the repair shops is probably the most obvious additional information to consider. With only one machine, whenever we shut down, a shop is immediately available; but this is not always true with multiple machines. We expect to get interference, in that the machines may be delayed in a queue waiting to be repaired by a shop; thus we might specify a maintenance policy that explicitly considers whether the shops are busy in deciding whether to shut down a machine. Our goal in this is to avoid shutting down a machine if it is just going to sit idle awaiting service. The following is useful in determining such a policy.

Theorem 3-2: Whenever all the repair shops are busy, it is optimal to shut down a functioning machine as soon as its age equals the wear-out age t_w, and not before. The optimal wear-out age t_w can be obtained as follows:
(a) t_w = 0 if bz(0) + aq(0) − k > 0;
(b) 0 < t_w < ∞ if bz(t_w) + aq(t_w) − k = 0;
(c) t_w = ∞ if bz(∞) + aq(∞) − k < 0.

Case (a) implies that the system is not viable, while case (c) means that we should always wait until a shop is available before shutting down. Note that searching for t_w is fairly straightforward, as both z(t) and q(t) are monotonically increasing in t; as well, t_w is independent of m, n, and R. This wear-out age is similar in principle to that discussed in the previous chapter for Theorem 2-9.

A simple way to use this wear-out age is to specify a repair policy which uses age t_L whenever at least one repair shop is available, and the wear-out age t_w otherwise. Thus we get our first heuristic.

The Double Age Policy: Shut down the oldest machine if its age is > t_L and a shop is available, or if its age is > t_w and all shops are busy.

We could also refer to this as a "can-must" policy, in that we can shut down any time after t_L if a shop is free, but we must shut down by t_w regardless. Unfortunately, while we can easily determine the optimal value of the upper age limit t_w, we do not know of any exact analytical way to find an optimal value for t_L, so the use of the double age policy would imply some numerical optimization via dynamic programming, or some kind of search using simulation to evaluate candidate ages. Figure 3-3 shows the form of the double age policy for a system of two machines and one repair shop. The boundary extensions near the vertical and horizontal axes apply when one machine has age = 0, i.e. it is either undergoing or has just completed a repair.

Note that a single age policy is a special case of the double age policy in which we arbitrarily set t_w = t_L. Another special case of the double age policy is obtained when we arbitrarily fix t_w = ∞. This gives a slightly simpler heuristic.

The Wait Until Available (WUA) Policy: Shut down the oldest machine at the first point that its age is > t_L and a repair shop is available.

So if a machine has reached its critical age t_L but the repair shop is busy, wait until the shop is available before shutting down. This seems like a sensible policy that one might expect to actually see in use in a real system. Figure 3-4 illustrates the shape of this policy.
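Since z(t) and q(t) are both monotone, the wear-out age of Theorem 3-2 is easy to obtain by bisection. The short sketch below is an illustration only, re-using the same assumed Weibull hazard and example cost values as the earlier sketches.

    import numpy as np

    # Illustrative Weibull hazard and costs (same assumed example values as before).
    eta, lam = 4.0, 1.0
    b, a, k = 1.0, 0.0, 8.0
    z = lambda t: (eta / lam) * (t / lam) ** (eta - 1.0)   # Weibull hazard rate z(t)
    q = lambda t: t                                        # hypothetical operating-cost function

    def wear_out_age(lo=1e-6, hi=100.0, tol=1e-8):
        """Solve b*z(t) + a*q(t) = k by bisection (case (b) of Theorem 3-2)."""
        g = lambda t: b * z(t) + a * q(t) - k
        if g(lo) > 0:
            return 0.0          # case (a): system not viable
        if g(hi) < 0:
            return np.inf       # case (c): never shut down while all shops are busy
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
        return 0.5 * (lo + hi)

    print(f"t_w approx {wear_out_age():.3f}")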
Figure 3-3: Double Age Policy. [Diagram omitted: axes show Age 1 and Age 2; the operating region is bounded by t_L and t_w, with extensions along each axis.]

Figure 3-4: WUA Policy. [Diagram omitted: axes show Age 1 and Age 2; the operating region is bounded by t_L.]

We could of course generalize the double age policy to an (n+1)-age policy (t_L(i), t_w), which uses a shut down age t_L(i) whenever i shops are available, i = 1, ..., n, and age t_w whenever no shops are available. As noted in [MB95], this form of policy would be more complicated to optimize and implement, and so we do not pursue it further in this chapter.

As an aside, let us suppose that the decision to shut down each machine and send it for repair is not made by a central manager who knows the state of the entire system, but by a local operator who has limited knowledge about the rest of the system. If the operator has no information beyond the age of that particular machine, then clearly the optimal policy for the local maintenance decision will be the joint single age t_m policy. If it is also known whether or not at least one repair shop is available, then the optimal policy for the local maintenance decision is the double age (t_L, t_w) policy, since this policy takes into account all the information available to the local operator. If the status of each repair shop is also known, then the optimal policy for the local maintenance decision must be an (n+1)-age (t_L(i), t_w) policy, as again it makes use of all the available information in making the shut down decision.

Next we want to look beyond the availability of repair shops to consider the ages of the machines other than the one that we are considering shutting down. In effect we would like to coordinate the future shut down decisions of all the machines in the system. A policy which explicitly considered all of the possible combinations of all machine ages in making a shut down decision would obviously be quite complicated to optimize and to implement. A simpler way to handle these ages is through a single statistic that summarizes the state of the system, such as the mean age.

The Average Age Policy: Shut down the oldest machine when the average age of the machines exceeds some chosen value; i.e. when [mean age] > t_a.

As with the double age policy, finding the best value of t_a requires numerical optimization. See Figure 3-5 for an illustration of this policy.

The last heuristic policy we consider is based upon the short term cost of machine operation, rather than the machine ages as such. This myopic policy looks ahead at the expected cost over the next short period of time, assuming that a preventive shut down is not done; if this cost exceeds some chosen limit, then we shut down a machine now rather than incur this cost. Thus we get the following policy.

The Constant Cost Policy: Shut down the oldest machine when the expected short term operating cost exceeds some chosen critical value, i.e. when [cost of operating for a further period Δt] > cost_max.

If we let Δt → 0, then the shut down decision will be based upon the instantaneous expected cost rate. Figure 3-6 shows the approximate shape of this policy form.

Figure 3-5: Average Age Policy. [Diagram omitted: axes show Age 1 and Age 2; the operating region lies below the line where the average age equals t_a.]

Figure 3-6: Constant Cost Policy. [Diagram omitted: axes show Age 1 and Age 2 with the operating region indicated.]

Unfortunately, we are unable to specify analytically the cost of any of these heuristic policies (let alone determine the overall optimum). Thus later in the chapter we turn to numerical studies to approach this problem.
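Before moving on, it may help to state the shut down rules of this section in a schematic form. The functions below are an illustration only, not an implementation used in our studies: each maps the vector of current machine ages and the number of free repair shops to a decision for the oldest running machine. The threshold values t_m, t_L, t_w, t_a, and cost_max are assumed to come from the searches described above, and the constant cost rule is written here under one plausible reading of the short term expected cost, namely (b z(t) + a q(t)) Δt.

    # Each rule returns True when the oldest running machine should be sent for
    # preventive repair. The arguments are kept uniform even where some are unused.

    def single_age(ages, free_shops, t_m):
        return max(ages) >= t_m

    def double_age(ages, free_shops, t_L, t_w):
        # "Can-must": may shut down after t_L if a shop is free, must shut down by t_w.
        oldest = max(ages)
        return (free_shops > 0 and oldest >= t_L) or oldest >= t_w

    def wait_until_available(ages, free_shops, t_L):
        return free_shops > 0 and max(ages) >= t_L

    def average_age(ages, free_shops, t_a):
        return sum(ages) / len(ages) >= t_a

    def constant_cost(ages, free_shops, cost_max, b, z, a, q, dt=1e-3):
        # Myopic rule: shut down if the assumed expected cost of running the oldest
        # machine for a further short interval dt exceeds cost_max.
        oldest = max(ages)
        expected_cost = (b * z(oldest) + a * q(oldest)) * dt
        return expected_cost >= cost_max

Writing the rules this way makes plain that the double age policy nests both the single age rule (t_w = t_L) and the WUA rule (t_w = ∞), which is why it can never do worse than either of them once its two ages are well chosen.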
3.07 Service Constraints As was mentioned in Chapter 2, in some situations a system manager may be unable to easily determine values for some of the cost coefficients, particularly the breakage cost b or the shortage cost rate k; in this case it might be more appropriate to  105  impose service constraints rather than specifying explicit costs. This technique is commonly used in inventory modeling where the same problem arises. Thus instead of including a breakage cost in the cost equation, we would require that on average no more than a certain proportion of machines would fail before shut down; instead of the shortage cost, we could require that our system have on average at least a certain number of machines working, at least a certain number of machines working with a specified probability, or at most a certain expected length of downtime per machine per cycle. Alternatively, one could include both the cost terms and these service constraints in the decision process. Thus we could rephrase our optimization problem as a constrained problem; the example below shows this for a single age policy. Minimize:  0,(0  m  = Z  +  [  i k  ( - /)rafe(0]/™*l(0 m  (3-15)  1=0  Subject to: (3-16) F(t) < maximum expected proportion of machines which fail before shut down m  ^  (m - i)prob (t)  > minimum required average # machines up  t  (3-17)  1=0 m-U  ^  prob (t) > minimum required probability of having at least U machines up t  (3-18)  i=0  m  r l  £ i  prob^t)\F(x)dx -  • JT (m -  < maximum average down time per machine per cycle (3-19)  i)prob (t) t  .=0  The addition of such constraints obviously makes the solution process more difficult in general, but for a single age policy only univariate optimization is required so such constraints should not pose any major problems.  106  3.08 Discrete Time Model So far in this chapter we have concentrated on deriving analytical results for our system of machines, but we now turn to numerical studies so as to examine the effectiveness of the various policies that we have introduced. For simplicity, we consider systems with a small number of machines for which the operating cost rate and the repair cost are negligible (i.e. a - 0, c = 0). Our first numerical study uses a discrete time formulation of our system model. Switching to discrete time allows us to numerically solve for the optimal policy, thus providing a standard to compare our heuristics against. It is also an interesting version of the problem in its own right; see [ST80] for an analysis of a similar but simpler model. In the following paragraphs we describe the discrete time model that is related to our original continuous time one; in some sense, each is an approximation of the other. In this model, machine age is measure in discrete periods of length At, with a maximum age allowed of T such periods; hence each machine can be in any of (T+l) states, and the whole system of m machines has (T+Vf states in total. The objective, as 1  with the continuous time model, is to minimize the long run expected cost. We must decide at the beginning of each period what action to take during the period, with the results of that action becoming known at the end of the period. If a machine is operating at the beginning of the period and we do not shut it down, then it has a probability fail; of failing at the end of the current period, where / indicates the number of periods it has already run and fail  i+1  > fail; (this models an IFR lifetime distribution); due to our age  limit, we always have fail = 1. 
The machine is assumed to operate throughout the current T  period regardless. If we do shut down an operating machine, or if a machine is already broken before the start of the period, then that machine will be unavailable throughout the period. A repair shop can try to repair one machine each period; there is a constant probability rep of it being fixed at period's end (this mimics a CFR repair time  107  distribution). For each machine that breaks at the end of a period, we incur a cost broke; for each machine not operating in a period, we incur a shortage cost short. This model can be formulated as a discrete time finite state infinite horizon dynamic program, and then solved using linear programming to find the optimal solution for the particular T, At, and cost parameters chosen. We refer the reader to chapters 8 and 9 of [Pu94] for the details of using a linear program in this manner. In dynamic program form, the state of the system is the vector of machine ages; the actions we can take in each state are either "do nothing" (represented below by N) or "do a preventive shut down" (represented by M). We give the linear program below for the case of two machines and one repair shop. The formulation for a system of more than two machines would add more subscripts to the terms in an obvious way. A system with more than one repair shop would require more decision variables to represent the additional decisions possible in each period (e.g. L could represent the action "do a preventive shut down for two machines"). Minimize (3-20)  CostRate Subject to M  u  + N  ij  -H[Mi,J\M _ ]M a  b  +Pr[/,7|/Y ,J/Y ] = 0  atb  a  a i  V»,y(3-21)  a,b  (3-22)  M  tJ  > 0, N  tj  > 0, M N Uj  iJ  =0  V  i,j  where  T  = maximum age of a machine, measured in periods of length At = machine ages at start of period, 0 < / < T, 0 <j < T steady state probability of being in state ij and doing a shut down  108  Ny  = steady state probability of being in state ij and not shutting down  costfAjj)  = one period cost of being in state ij and taking action A e {M, N)  Pr[i,j\A ]  = transition probability of ending the period in state ij given that we  ab  started in state ab and took action A Here, (3-20) gives the expected cost of operating the system in the steady state. The set of constraints defined by (3-21) are sometimes referred to as balance constraints, in that they require the rate of transition into any state be equal the rate of transition out of that state. (3-22) simply requires that the steady state probabilities sum to one. Solution of the L P gives the steady state probabilities of being in a state and taking a particular action when using the optimal policy. A decision variable which is positive indicates that that state-action pair is used in the optimal policy, while a decision variable which is zero indicates that that state-action pair is not used. Hence the pattern of positive values indicates the form of the optimal policy. One advantage of using an LP formulation is that it is easy to add additional constraints to the model. We in fact do this for each of the heuristics, to force the solution to have the appropriate form (e.g. a single shut down age for the single age policies). One disadvantage is the relatively large size of the state space. The full L P has (T+l)  m+2  rows  and (n+\)(T+\) columns, though these can each be reduced by about half by combining m  "symmetric" states together, e.g. 
machine ages 2,3 and 3,2 are identical in their behaviour and so can be combined into a single state. Combining states in this way makes the L P smaller but denser. 3.09 Performance Study For our numerical study, we use T= 30 time periods each of length At in our model; thus, a machine age can never be exceed 3 OA*. We generate failure probabilities faili with  109  F{(i + I) At) -  M  FJiAt)  = —r^Fim—  except that fail  30  -  (3 23)  s= 1. We choose F(t) to be a Weibull distribution with scale parameter X  = 1 in all cases, and shape parameter rj = 1.5 for "high variance" cases and r\ = 4.0 for "low variance" cases. For the high variance cases, each period At = 0.10 units of time, and for the low variance cases each period At = 0.05 units of time. We fix broke = 1 and vary rep and short according to the failure distribution: low variance uses rep = {0.2, 0.3, 0.4, 0.5} and short= {0.05, 0.1, 0.2, 0.4, 0.8, 1.6}; high variance cases use rep = {0.4, 0.5, 0.6, 0.8} and short = {0.2, 0.4, 0.8}. We test these parameters on systems of 2 and 3 machines, each with one repair shop. The parameter ranges give 24 combinations for the low variance case and 12 for the high; in total we have 72 different combinations of costs, repair rates, failure variance, and number of machines. Solution of the linear program gives the optimal policy and its cost rate. To test the heuristic policies, we add extra constraints to the model which force the solution to have the appropriate structure, and use a simple search procedure to find the critical ages that give the best results for each heuristic in each case. We use A M P L mathematical modeling software to specify the linear program, and then C P L E X optimization software to solve it. Appendix D shows an example of such an A M P L model specification. The largest L P in this study has approximately 15,000 rows and 100,000 non-zero coefficients. The tables which follow show the cost differences amongst the various policies in our discrete time study. The overall difference is that between the naive lower bound given by mCjftj), which is what we might expect the cost rate to be if we used the single machine optimal age, and the feasible policy C (tj), which is what the cost rate will m  actually be if we use the single machine optimal age. This overall difference can be divided into two parts:  110  (a) the first gap is that between C (t ) and the calculated optimum. This represents the m  }  amount we can potentially recover using other heuristics in place of the simple single age policy. We might also think of this as the value of having more information and coordination available in making maintenance decisions; and, (b) the second gap is that between the calculated optimum and mC](tj). This represents the cost premium due to running a system with multiple machines but only one shop (i.e. the unavoidable cost increase). This can also be thought of as the looseness of our naive lower bound mC](tj). Tables 3-1 to 3-4 show the average and range of performances of each policy applied to our data set. Each table summarizes the results obtained when we vary the cost parameters as mentioned above while keeping the number of machines and the failure distribution fixed. The values listed are the percentages above the optimal expected cost rate, i.e. % = [heuristic cost - optimal cost] / [optimal cost]; the Naive column refers to the lower bound mCj(tj), and shows negative percentages because its cost rate is below the feasible optimum. 
W U A with t = tj represents the use of the wait until available L  heuristic except that we fix t = tj rather than searching for an optimal r^; a similar L  simplification is also tried for the double age policy. Table 3-1:  Two machines, low failure variance (24 cases)  Policy Naive  WUA Joint Simple Single Age Single Age  Min Max Mean  -0.8%  0.6%  0.6%  -10.5%  6.5%  -4.2%  4.1%  WUA  Double Age, Double Age  Constant Cost  0.2%  0.3%  1.2%  0.3% 0.7%  0.9%  5.0%  0.7%  0.5%  0.5%  1.9%  0.3% 5.6%  0.4%  5.8%  0.4% 5.6%  3.9%  1.1%  0.9%  Table 3-2:  Three machines, low failure variance (12 cases)  Policy Naive  WUA Simple Joint Single Age Single Age tf=t,  WUA  Average Age  Double Age, Double Age tT. . tj  Average Constant Age Cost  Min  -2.0%  1.5%  1.4%  0.8%  0.5%  0.6%  0.4%  0.1%  0.9%  Max  -20.5%  18.1%  18.1%  12.5%  11.7%  2.2%  9.1%  8.3%  2.3%  1.5%  1.4%  1.1% 0.7%  2.1% 0.9%  6.5%  -9.3%  Mean  3.5%  Ill  Table 3-3:  Two machines, high failure variance (24 cases)  Policy Naive  WUA Simple Joint Single Age Single Age t =t,  WUA  Double Age, Double Age tr. - 1 ,  {  Average Age  Constant Cost  Min  -0.6%  0.0%  0.0%  0.0%  0.0%  0.0%  0.0%  0.1%  0.0%  Max  -7.0%  2.7%  2.1%  0.1%  0.1%  0.1%  0.1%  0.4%  0.4%  Mean  -3.1%  1.2%  1.0%  0.0%  0.0%  0.0%  0.0%  0.2%  0.2%  Table 3-4: Policy Min Max Mean  Naive -1.4%  Three machines, high failure variance (12 cases) Simple Joint Single Age Single Age  WUA  WUA  tj=t  Double Age  tr.  }  0.1%  0.0%  0.0%  -14.4%  0.1% 5.7%  4.2%  -6.7%  2.8%  2.0%  0.2% 0.1%  -U  Double Age  Average Constant Age Cost  0.0%  0.0% 0.2%  0.0% 0.0%  0.0% 0.5%  0.1% 0.5%  0.0%  0.1%  0.0%  0.2%  0.3%  Figure 3-7 illustrates typical shapes observed for optimal policies for a two machine system. The extensions along the vertical axis and horizontal axis were a common feature to all the trial cases. On the other hand, the principle shape of the boundary ranged between being almost square (especially when the failure variance was high) to almost diagonal (when the failure variance was low). We might describe this shape as a squashed square, in which the degree of squashing varied according to the case. For the three machine cases, the three dimensional equivalents were observed; that is, the optimal policy resembled a squashed cube. Figure 3-7: Typical Optimal Policies Age 2  Age 2  Age 1  112  By considering the above summaries as well as the individual solutions (not shown), we can develop some assessment of the performance of our policies. First of all note that the overall difference (i.e. Simple versus Naive) is increasing in m and decreasing in the failure variance. The optimal policy cost rate seems on average to lie roughly in the middle of this overall difference. The gap between C (tj) and the optimal repair policies m  (i.e. the recoverable cost increase) appears to be increasing in the repair rate, roughly concave in the shortage cost, decreasing in the failure variance, and increasing in m. The gap between the optimal repair policy and mCj(t ) (i.e. the unrecoverable cost increase) }  appears to be decreasing in the repair rate, roughly convex in the shortage cost, decreasing in the failure variance, and increasing in m. In general, it seems that both of the single age policies are outperformed on average by all of the other heuristic policies considered, while within this latter set the differences on average are fairly small. 
To put it another way, it seems to be more important to use more than one critical age than it is to get the choice of critical age(s) exactly right. For example, the use of the joint single age t does not provide much of a m  cost rate reduction compared to the simple single age tj. The double age policy performs best both on average and in most of the examples evaluated. This is not surprising, as it is a generalization of all of the other heuristics tested except for average age and constant cost, and thus will do at least as well in each case considered. When the relative shortage cost is moderate to high, the W U A policies are much better than the single age policies and do almost as well as the respective double age policies. Conversely, when the shortage cost is very low the single age policies perform better than W U A policies and are almost as good as the double age policies. A s for the optimal ages themselves, we observe that they are ordered t (3 machines)< t (two L  machines) < tj <t <t 2  3  < t in each case, with all the t values quite close to each other w  except that the t values tend to be much larger. That tj <t <t w  L  2  3  follows from Theorem  113  3-1, and by comparing their respective optimality criteria one can see that t < t also is to t  w  be expected. The average age policy does almost as well overall as the double age policy, but with more variation in results. We can explain this by thinking of the shapes of these policies on our diagrams. Average age yields its best results when the optimal policy shape is almost a straight diagonal line, just like the shape of the average age policy in Figure 3-5. This occurs when the failure variance is low, the shortage cost is high, and the repair rate is high; in these situations it out-performs the double age policy. Conversely, the double age policy does best under the opposite conditions, when the optimal policy shape is almost a square as in Figure 3-3. The constant cost policy is generally the worst of the heuristic policies, but is still better than single age policies; it is at its best when shortage costs are high. 3.10 Simulation Study This section describes a simulation study which tests the performance of some of our heuristics for the original continuous time model. The data set we use has m = {2, 4, 6}, n = 1, R = {0.05, 0.10}, c = 0, a = 0, and b = 1. We again use two Weibull failure distributions: with shape parameter r\ = 4.0 we let k = {2, 16}, and with r\ = 1.5 we let k = {4, 8}. For each combination of parameters we evaluate the simple single age and joint single age policies, the naive lower bound, and the double age policy with t = tj . We L  choose to test the latter heuristic because it performed fairly well in our discrete time study and yet requires no numerical searching for parameter values. We evaluate the single age policies and the naive lower bound analytically using (3-8), and use SIGMA simulation software to evaluate the double age policy. The simulation was run for a simulated clock time of 10,000 for the 2 machine systems, and 5,000 for the 4 and 6 machine systems; thus each run consisted of at least 20,000 machine life-cycles. Each parameter combination was run twice using different random number seeds for the two runs; thus the study consists of 48 simulation runs in total. The table below gives the  114  mean % above our naive lower bound i.e. A % = [heuristic cost - naive cost] / [naive cost] for the three policies, broken down by number of machines and failure variance. 
Each entry in Table 3-5 is the mean of 4 simulation runs. We are comparing the performance to the lower bound (naive cost) rather than the optimum because the optimum is unknown; from our discrete time study, we know that this lower bound can be fairly loose. Table 3-5:  Policy  m=2 m=\ m=6  Three Policies Compared to Lower Bound  Low Variance n = 4.0 Simple Joint 5.1% 27.0% 52.7%  4.9% 24.1% 43.5%  High Variance r[ = 1.5 Double, tj = t] 1.5% 9.2% 21.7%  Simple 2.5% 13.1% 25.7%  Joint 2.3% 10.7% 18.7%  Double, tj =t] 1.5% 7.2% 14.9%  The most obvious feature of these results is their magnitude. The two machine values are similar to what we found with the discrete time case, but as we increase the number of machines the percentages climb quite sharply. This is in part due to the fact that the repair shop is getting busier servicing more machines; for our test set, six machines give close to 100% utilization. As with the discrete time results, the high variance (rj = 1.5) cases have smaller differences than do the low variance cases (r\ = 4.0). We can see that the joint single age begins to provide a material cost saving as m increases; more noticeably, the double age heuristic provides quite a large saving, reducing by roughly half the difference between the simple single age cost and the naive cost. This is an important point, because in our discrete time results we saw that the optimal cost rate tended to be about half-way between the naive and the simple single age costs. The implication here is that the double age policy may be near optimal once the number of machines gets moderately large. The trends in optimal age values in our discrete time results lend support to this conjecture. One of our main interests is to see how our double age heuristic performs compared to the single age policies. Tables 3-6 and 3-7 show the amount by which the  115  double age heuristic improves on the cost of the joint single age policy, i.e. we consider A % = [joint cost - double cost] / [joint cost]. Table 3-6 shows the average improvements broken down by the number of machines; each entry is the mean of 16 simulation runs. We can see that the cost savings are increasing in the number of machines. Table 3-6:  Mean Improvement by Number of Machines  Mean A 3.0% m=2 7.5 % m=4 9.2 % m=6 Table 3-7 shows the average improvements broken down by the other three factors that were varied; each entry is the mean of 24 simulation runs. We can see that the cost savings tend to be larger when each factor is relatively low. Table 3-7: Level \ Factor Low High  Mean Improvement by Other Factor Failure Variance Shortage Cost Rate Mean Repair Time 7.6 % 10.7% 8.5 % 2.5 % 4.7 % 5.6 %  These two tables indicate that the cost savings due to using the double age heuristic in place of the joint single age policy tend to be higher when the failure variance, shortage cost rate, and mean repair time were relatively small while the number of machines is relatively large. Thus our results here for simulation in continuous time are in general agreement with those we obtained in our discrete time study. To check whether these results might be due to the randomness in simulation, we performed an analysis of variance (ANOVA) test. We used an fixed effect additive model which considered the main effects of our four factors along with their first-order interactions. 
Calculation of the F-statistics indicated that all of the main effects were statistically significant (P < 0.001), as were most of the interactions: the interactions between failure variance and repair time, and between shortage cost and repair time, were  116  not significant (both P = 0.09). A check of the marginal means showed that even where the interaction effects were statistically significant, there were no counter-intuitive results (e.g when failure variance and shortage cost were both low, the cost savings was larger than when either or both factors were high). Thus we can reasonably conclude that the improvements seen in Tables 3-6 and 3-7 are not just due to random behaviour of the simulation. 3.11 Future Research There are several directions in which we would like to be able to extend the work in this chapter. One possibility is to drop our requirement for the machines to be identical, so as to allow each to have a different failure rate or cost structure. Possibly one could build on the results of Sztrik in [Sz85] for such an extension. In our model we allow for multiple repair shops, but we take their number n as given and so ignore their cost. If we included the shop costs and considered n as another decision variable, then we could address questions of capacity planning, i.e. what is the optimal number of repair shops to build and operate? If both the shops and the machines are heterogeneous then this would raise another issue commonly considered in the M R P , the optimal assignment of machines to shops. We would also like to allow the repair of machines to be deterministic in duration, as this might better model maintenance of complex machines where a repair action may actually be a relatively standardized set of inspections and adjustments. Equivalent numerical work could readily be done using the same dynamic programming and simulation methods to compare heuristics. The difficulty here is that any analytical work would need to take a different approach in order to be tractable, as to our knowledge there are no existing results for finite population GI/D/n queues. One possible approach would use our discrete time framework as the basis for a semi-markov model, which could handle repair times with general distributions.  117  3.12 Concluding Remarks Our analytical and numerical work shows clearly that for the type of system we consider, single age policies are quite tractable to analyze and optimize; but their performance is sub-optimal and can result in potentially meaningful inefficiencies. We can improve on this situation using heuristic policies; we find that these tend to be most worthwhile (i.e. give the largest cost savings) in systems with more machines, low shortage costs, low failure variance, and a fast repairs. Finally, although we are unable to describe the truly optimal policy structure, the heuristics we examine seem to perform well enough to make consideration of more complicated policies unnecessary, at least for the small systems we studied. 
For actual applications, we suggest the following guidelines: (a) if machine lifetimes have a high variance, then it does not greatly matter what policy is chosen, so make the choice based upon other factors (such as administrative convenience, labour considerations, etc.); (b) if lifetime variance is fairly low and shortage costs are moderate or low relative to breakage costs, then the double age policy would likely be the best policy type, with  -  tj being a reasonable choice for initial consideration; or, (c) if lifetime variance is fairly low and shortage costs are high relative to breakage costs, then the average age policy would likely be the best choice, with t = tj/m being a good a  value for initial consideration. Note that these recommendations are based upon our numerical studies of relatively small systems using only two example failure distributions, and so of course their applicability to general systems is a matter of conjecture.  118  Bibliography [Ag89] Agnihothri SR, "Interrelationships between performance measures for the machine-repairman problem", Naval Research Logistics 36, 265-271, 1989 [A180] Albright SC, "Optimal maintenance-repair policies for the machine repair problem", Naval Research Logistics 27', 17-27, 1980 [AD68] Allan SG, D'Esopo D A , "An ordering policy for repairable stock items", Operations Research 16, 669-674, 1968 [AB84] Agrawal A, Barlow RE, "Survey of network reliability and domination theory", Operations Research 32, 478-492, 1984 [Ar95] Armstrong MJ, "Joint reliability-importance of components", IEEE Transactions on Reliability 44, 408-412, 1995 [AA96] Armstrong MJ, Atkins DR, "Joint optimization of maintenance and inventory policies for a simple system", HE Transactions 28, 415-424 ,1996 [AK95] Atkins DR, Katircioglu K, "A new optimal policy for a classical inventory problem with unit demand: delayed (s-l,s)", Working Paper, Faculty of Commerce, University of British Columbia, Vancouver, 1995 [ADSZ88] Avriel M , Diewert WE, Schaible S, Zang I, Generalized Concavity, Plenum Press, New York, 1988 [BP75] Barlow RE, Proschan F, Statistical Theory of Reliability and Life Testing, Holt Reinhart Winston Inc, 1975 [BHP63] Barlow RE, Hunter L C , Proschan F, "Optimum redundancy when components are subject to two kinds of failure", Journal of SIAM11, 64-73, 1963 [BSS93] Bazaraa M S , Sherali HD, Shetty C M , Nonlinear Programming Theory and Algorithms (2nd Ed), John Wiley and Sons, Toronto, 1993 [Be93] Beichelt F, " A unifying treatment of replacement policies with minimal repair", Naval Research Logistics AO, 51-67, Feb 1993 [BS80] Bunday B D and Scraton RE, "The G/M/r machine interference model", European Journal of Operational Research 4, 399-402, 1980 [CP91] Cho DI and Parlar M , " A survey of maintenance models for multi-unit systems", European Journal of Operational Research 51, 1-23, 1991  119  [Co87] Colbourn CJ, The Combinatorics of Network Reliability, Oxford University Press, 1987 [Cr75] Crabill TB, "Optimal control of a maintenance system with variable service rates", Operations Research 22, 736-745, 1974 [DL67] Derman C, Lieberman GJ, " A markovian decision model for a joint replacement and stocking problem", Management Science 13, 609-617, 1967 [DR86] Dhillon BS, Rayapati SN, " A method to evaluate reliability of three-state device networks", Microelectronics and Reliability 26, 535-554, 1986 [Fa68] Falkner C H , "Jointly optimal inventory and maintenance policies for stochastically failing equipment", Operations Research 16, 587-601, 
1968 [Fa69] Falkner C H , "Optimal spares for stochastically failing equipment", Naval Research Logistics 16, 287-295, 1969 [Fa70] Falkner C H , "Jointly optimal deterministic inventory and replacement policies", Management Science, vol 16, 622-635, May 1970 [FS66] Feeney GJ, Sherbrooke CC, "The (s-l,s) inventory policy under compound poisson demand", Management Science 12, 391-411, 1966 [Go77] Goheen L C , "On the optimal operating policy for the machine repair problem when failure and repair times have Erlang distributions", Operations Research 25, 484-492, 1977 [GV85] Granot F, Veinott AF, "Substitutes, Complements and Ripples in Network Flows", Mathematics of Operations Research 10, 471-497, Aug 1985 [GRZ93] Graves SC, Rinnooy Kan A H G , Zipkin PH, editors, Handbooks in OR and MS vol 4: Logistics of Production and Inventory, Elsevier Science, 1993 [GI81] Grose D, Ince JF, "The machine repairman problem with heterogeneous populations", Operations Research 29, 532-549, 1981 [HL93] Hong JS, Lie C H , "Joint reliability importance of two edges in an undirected network", IEEE Transactions on Reliability 42, 17-23, 1993 [JY92] Jeang A, Yang K, "Optimal tool replacement with non-decreasing tool wear", International Journal of Production Research 30, 299-314, 1992 [JMR67] Jorgenson DW, McCall JJ, Radner R, Optimal Replacement Policy, Rand McNally, Chicago, 1967  120  [K078a] Kaio N , Osaki S, "Optimum ordering policies with lead time for an operating unit in preventive maintenance", IEEE Transactions on Reliability 27, 270-271, Oct 1978 [K078b] Kaio N , Osaki S, "Optimum ordering policies when order costs depend on time", Revue Francaise d'Automatique, Informatique, et Recherche Operationelle Operations Research 12, 93-99, Feb 1978 [K078c] Kaio N , Osaki S, "Optimum inspection ordering policies with salvage cost"', Microelectronics and Reliability 18, 253-257, 1978 [K078d] Kaio N , Osaki S, "Optimum ordering policies with two kinds of lead times and non-linear ordering costs", International Journal of Systems Science 9, 265-272, 1978 [K079] Kaio N , Osaki S, "Discrete time ordering policies", IEEE Transactions on Reliability 28, 405-406, Dec 1979 [K081] Kaio N , Osaki S, "Optimum planned maintenance policies with lead time", IEEE Transactions on Reliability 30, 79, Oct 1981 [K088] Kaio N , Osaki S, "Optimum planned policies with minimal repair", Microelectronics andReliability 28, 287-293, 1988 [KH81] Kalpakam S, Hameed M A S , "Optimum ordering policies with random lead times", Microelectronics and Reliability 21, 737-741, 1981 [Ka73] 1973  Kao E, "Optimal replacement rules", Operations Research 21, 1231-1249,  [Le84] Lehtonen T, "On the optimal policies of an exponential machine repairman problem", Naval Research Logistics 31, 173 -181, 1984 [Lo83] Lovasz L, "Submodular functions and convexity", Mathematical Programming: the State of the Art, Bonn 1982, (edited by Bachem et al), Springer Verlag, 1983 [MK77] Mine H , Kawai H, "Optimal ordering and replacement for a 1-unit system", IEEE Transactions on Reliability 26, 273-276,1977 [MB95] Moinzadeh K and Berke E, "Analysis of maintenance policies for m machines with deteriorating performance", Working Paper, School of Business, University of Washington, Seattle, Aug 1995  121  [N074] Nakagawa T, Osaki S, "Optimum replacement policies with delay", Journal of Applied Probability 11, 102-110, Mar 1974 [N075] Nakagawa T, Osaki S, "A note on age replacement", IEEE Transactions on Reliability 24, 92-94, Apr 1975 [N078] Nakagawa T, Osaki S, 
"Optimum ordering policies with lead time for an operating unit", Revue Francaise dAutomatique, Informatique et Recherche Operationnelle - Operations Research 12, 383-393, Nov 1978 [OKY81] Osaki S, Kaio N , Yamada S, "A summary of optimal ordering policies", IEEE Transactions on Reliability 30, 272-277, Aug 1981 [Os77] Osaki S, "An ordering policy with lead time", International Journal of Systems Science 8, 1091-1095, 1977 [Oz88] Ozekici S, "Optimal periodic replacement of multi-component reliability systems", Operations Research 36, 542-552, 1988 [PP94] Page L B , Perry JE, "Reliability polynomials and link importance in networks", IEEE Transactions on Reliability 43, 51-58, 1994 [PP87] Page L B , Perry JE, "Reliability of networks and three state devices", Microelectronics and Reliability 27, 175-178, 1987 [PS91] Papastavridis SG, Sfakianakis M E , "Optimal arrangement and importance of the components in a consecutive k-out-of-r-from-n:f system", IEEE Transactions on Reliability 40, 277-279, Aug 1991 [PP86a] Park KS, Park Y T , "Ordering policies under minimal repair", IEEE Transactions on Reliability 35, 82-84, 1986 [PP85] Park Y T , Park KS, "Optimal stocking for replacement with minimal repair", Microelectronics and Reliability 25, 147-155, 1985 [PP86b] Park Y T , Park KS, "Generalized spare ordering policies with random lead time", European Journal of Operational Research 23, 320-330, 1986 [PM94] Pham, Malone, "Optimal design of a system with competing failure modes", IEEE Transactions on Reliability 43, 251-2xx, 1994 [PY72] Posner M J M , Yansouni B, "A class of inventory models with customer impatience", Naval Research Logistics 19, 483-492, 1972  122  [Pu94] Puterman M L , Markov Decision Processes, John Wiley and Sons, New York, 1994 [Ro69] Ross SM, "A markovian replacement model with a generalization to include stocking", Management Science 15, 702-715, 1969 [SSYY93] Satoh N , Sasaki M , Yuge T, Yanagi S, "Reliability of 3-state device systems with simultaneous failures, IEEE Transactions on Reliability 42, 408-412, 1993 [SN85] Schmidt CP, Nahmias S, "(s-l,s) policies for perishable inventory", Management Science 31, 719-728, 1985 [Sc89] Schultz CR, "Replenishment delays for expensive slow-moving items", Management Science 35, 1454-1462, 1989 [SS90] Shaked M , Shanthikumar JG, "Reliability and maintainability", from Handbooks in OR and MS vol 2: Stochastic Models, Elsevier Science, 1990 [SLT92] Sheu SH, Liou CT, Tseng BC, "Optimal ordering policies and optimal number of minimal repairs before replacement", Microelectronics and Reliability 32, 995-1002, Jul 1992 [SL92] Sheu SH, Liou CT, "An ordering policy with age-dependent minimal repair and age-dependent random repair costs", Microelectronics and Reliability 32, 11051113, Aug 1992 [Sm77] Smith SA, "Optimal inventories for an (s-l,s) system with no backorders", Management Science 23, 522-528, 1977 [Sr92] Sridharan V , "Optimum planned policies with minimal repair and random lead times", Microelectronics and Reliability 32, 905-909, Jul 1992 [SA85] Stecke K E , Aronson JE, "Review of operator/machine interference models", International Journal of Production Research 23, 129-151, 1985 [ST80] Stengos D, Thomas L C , "The blast furnaces problem", European Journal of Operational Research 4, 330-336, 1980 [SS89] Subramanian R, Sridharan V , "Optimal ordering policies with random lead times and salvage cost", Microelectronics and Reliability 29, 523-527, 1989 [SY87] Shantikumar JG, Yao DD, Stochastic monotonicity of the queue lengths 
in closed queuing networks, Operations Research 35, 583-588, 1987  123  [SY88] Shantikumar JG, Yao DD, Second order properties of the throughput of a closed queuing network, Mathematics of Operations Research 13, 524-534, 1988 [Sz85] Sztrik J, "On the finite source G/M/r queue", European Journal of Operational Research 20, 261-268, 1985 [Sz88] Sztrik J, "The G/M/r/FJJO machine interference model with statedependent speeds", Journal of the Operational Research Society 39, 201-207, 1988 [T078a] Thomas L C , Osaki S, " A note on ordering policy", IEEE Transactions on Reliability 27, 380-381, Dec 1978 [T078b] Thomas L C , Osaki S, "An optimal ordering policy for a spare unit with lead time", European Journal of Operational Research 2, 409-419, 1978 [VF89] Valdez-Flores C, Feldman R M , " A survey of preventive maintenance models for stochastically deteriorating single-unit systems", Naval Research Logistics 36, 419-446, 1989 [Wi77] Winston W, "Optimal control of discrete and continuous time maintenance systems with variable service rates", Operations Research 25, 259-268, 1977  124  Appendix A This appendix describes the specific K K T conditions that apply to the optimization of equation JOD in Chapter 2 and outlines how one might use a customized algorithm to search for an optimal solution (t , t ). a  r  An interior solution (both derivatives zero) will occur at a point where (2-13) is true, so we could direct our search amongst such points. To start with, the particular K K T conditions for the joint optimality of JOD are shown below. General K K T Conditions: aJOD(t^ t )/dt r  0  -u+v= 0  ajODft^ tj /dt - v = o u(t )=0 v(t -t -L)=0 u>0 ,v>0 K^O ,t >t + L r  o  r  o  r  Q  We can write these conditions out more specifically for each of the 4 categories of optimal solutions, using our earlier results to simplify the derivative terms: (a) interior solutions, i.e. where t >0 and t > t + L o  r  Q  (A-l) F(t  cv' (t ) + aq(t ) + h + bz(t ) = k - h r  r  a  r  F(t  +  Q  + L) )-F(t )  L  =  JOD(t ,t ) a  r  0  (b) t boundary solutions, i.e. where t = 0 and t >t + L a  a  cv'(t )  + aq(t ) + h + bz(t )  r  k - h  r  F(t  a  F(t  a  +  r  + L)  =  >  L)-F(t )  r  a  J0D(t ,t ) o  r  (A-3)  JOD(t ,t ) a  (A-2)  r  Q  (c) t boundary solutions, i.e. where t >0 and t = t + L r  0  cv'(t )  + aq(t ) + h + bz(t )  r  aj0D/dt  r  o  = -aj0D/dt  r  r  >  r  a  JOD(t ,t ) 0  r  (A-4) (A-5)  125  (d)  (0, L) boundary solutions, i.e. where t = 0 and t = L 0  cv'(t ) + aq(t ) + h + bz(t ) r  r  r  >  r  JOD(t ,t ) 0  (A-6)  r  ajOD/dt >-ajOD/dt 0  (A-7)  r  We could substitute these optimality conditions into an existing N L P package to simplify the evaluation of candidate solutions. Alternatively, they could be used directly in the following procedure, which acts as a front end to a standard N L P package; it attempts to find the "easy" solutions using the above conditions, but defaults to the N L P code if it runs into "difficulties". It proceeds in basically 4 steps as follows: Step 1: Start at (0,L) and check to see if (A-6) and (A-7) hold as per case (d) above. Step 2: Look for a point on the t = 0 boundary such that (A-2) is true. Check this point to see if the unconstrained optimum has t < 0, and if so the optimality conditions under case (b) will be met. Q  Q  Step 3: Search along the (increasing) curve defined by (2-14) to try to find a feasible point where (A-l) holds and so the optimality conditions under case (a) will be met. 
Step 4: If difficulties are encountered with any of the above steps, or the t = t + L boundary is run into during the search steps, then switch over to the standard N L P code to finish solving the problem. r  Q  Of course, if we somehow knew in advance that the optimum was an interior solution of case (a), we could greatly simplify our search procedure to cover only the portion in Step 3 above and not refer to an N L P package at all. An outline of an algorithm follows below. Note that for convenience of notation, we define:  Gr(t ) r  EE cv'(t ) + aq(t ) + h + b z(t ) r  Go(t ) = k - h 0  r  F(t  r  Q  F(t  0  + L) +L)-F(t ) 0  [A-8]  [A-9]  126  Algorithm 2-1: 1 2  Set t = 0,t = L,Z = Go(0) If Gr(L) >JOD(0,L) and aJOD(t^ t ) /d t > -dJOD(t^ t ) /d t then Goto 15  3  WGo(0) >Gr(L) then  4 5 6 7 8 9 10 11 12 13 14 15  Find by Line Search t : Gr(t ) = Z KGr(t ) >JOD(0, t) then Find by Line Search t : Gr(t ) = JOD(0, tj Goto 15 SetZ = Z + 0.5 [ J O D f t t)-Z] Find by Line Search t : Go(tJ = Z Find by Line Search t : Gr(t ) = Z lft <t + L then set t = t + L IfJODft, t ) = Z then Goto 15 If* > t + L then Goto 8 Call N L P Routine with (t , t ) as initial feasible point Stop with (t , t ) optimal  o  r  r  r  0  r  r  r  r  r  r  0  r  r  0  r  r  0  r  r  Q  0  0  r  r  The "0.5" in line 8 determines the step size for the search; any value in the interval (0,1) could be substituted instead. Also note that we should skip immediately to line 14 if any of the line searches fail. Go and Gr are known to be monotonic and do not involve integration in their calculation, so their line searches proceed fairly quickly. The evaluation of JOD and its derivatives can be time consuming if F(t) is not analytically integrable (as with a Weibull distribution, for example) but instead must be evaluated numerically. Thus the line search in step 6 would likely be the slowest part of the algorithm (though it is executed only once), while step 12 would be the slowest step that is repeated each pass through the search loop. We did not use this algorithm in our numerical study, but instead used a simple coordinate search algorithm with line searches done using a bisection procedure (refer to [BSS93] chapters 8 and 9 for explanations of these methods).  127  Appendix B For convenience, in what follows the numerator and denominator of an equation are sometimes represented by N and D respectively. We also use the results from chapter 3 of[BP75] that  F(t) is IFR  d_ dt F(t)  <=>  > 0  F(t) d_ dt F(t + x)  <=>  > 0  The inequalities would be strict for a strictly IFR distribution, while they would be reversed for a D F R distribution. Most of the proofs in this appendix use the following line of reasoning. Begin by taking the first derivative of a cost function with respect to a decision variable. When this derivative equals zero, the function is at an extreme point (either a maximum or a minimum). Next take the second derivative of the cost function, evaluated at such an extreme point. If that second derivative is always positive, that indicates that the extreme point is always a minimum, and hence must be the global minimum of the cost function with respect to that decision variable. Refer to chapter 3 of [BSS93] for more information on non-linear optimality properties. Proof of Theorem 2-1: For this theorem and the next two, we actually consider the more general function JOD rather than JCF. 
Take dJOD I dt and rearrange terms to get r  JL JOD = ot r  D  {cv' (t ) + h + bz{t ) + aq(t )r  r  r  JOD(t  0  ,t,)}  (B-l)  When this is zero, take derivatives (d I dt) again to get  -^JOD Ot r  = ^lil{cv"(t ) D r  + bz'{t )+aq'(t )} r  > 0  r  (B-2)  where the inequality is strict if F(t) is strictly EFR, if q(t) is strictly increasing, or if v(t) is strictly convex. This means that any zero of d JOD Id t will be a minimum, and hence there can be only one minimum. So JOD is unimodal and pseudo-convex in t . Note that for (B-l) to equal zero we must have the bracketed term equal to zero, which occurs when r  r  [cV (t ) + h + bz(t ) + aq(t )] = r  r  r  JOD(t ,t ) a  r  (B-3)  128 Proof of Theorem 2-2: Take dJOD I dt and rearrange terms to get a  (B-4) F{t  + L)  Q  dt„  D  F{t )  - F{t  0  J0D(t ,t ) o  + L)]  Q  r  When this is zero, take derivatives (d I dt ) again to get a  *  J  O  D  -  H  t  D  °  +  -F(t  + L)  0  ^  L)  > ( 0  - F(t  a  + L)]  > 0 (B-5)  This means that any zero of d JOD Id t will be a minimum, and hence there can be only one minimum. So JOD is unimodal and pseudo-convex in t . Note that for (B-4) to equal zero we must have the bracketed term equal to zero, which occurs when 0  a  F{t + L) h-r= [F{t )-F{t +L)] a  k -  0  JOD(t ,t ) a  (B-6)  r  0  Proof of Proposition 2-1: Proven simply by examining the signs of the derivatives of d JOD/dt and OJ0D/dt by each of the cost coefficients. If the second derivative is positive, this implies that the first derivative is increasing in that cost coefficient and hence that the minimum of the function is decreasing; the opposite results follows if the second derivative is negative. (B-7) r  d  o  2  dt.ca  JOD  F(t ) D  q{t )  r  r  2  JF(x)dx+  J F(x)dx + R  - j  q(x)F{x)dx  > 0  (B-8)  ' dt db  JOD= ^ D  d  F  z  r  D  2  \{  'Fit,)  j F(x)dx + J F(x)dx + R  K  r  ,  \  D  [{  F(t ) r  F{x)  K  r  ' \  129  * dt„3h  a dk  J O D - ^ D F  JOD  r  -f-  ct.dc  J F(x)dx + J" F(x)dx + R -  l  j F(x)dx \ > 0 (B-9) t„ + L  t„ + L  F(t ) D r  -R - j F(x)dx  2  JOD = ^1  { . v  D  M  -  D  v ( t r  (B-10)  < 0  )} < 0  (B-H)  Note that this and the corresponding part of Proposition 2-7 are always true only with the additional assumption that v(t) is non-increasing; the convexity assumption is sufficient for all of our other results.  Sl.Sa  j T J  0  D  « £ ( ^ ± ^ ± £ ( 0  F(r  + g  ^)  +  {  _  (^)|-|.  / r  F  (B-12)  J )  v ( x ) /  }  <  (B-13)  0  ( ) x  < f c  - v(r )F(r )J r  r  < 0 (B-14)  (B-15)  " JOD = - ^ dt„dh 2  F(t d JOD = dt„dk 1  a  +  L  )  +  F  F{t  Q  M  { [F(t  D  a  +  + L) + F(t ) D - R D Q  2  + L)  L) -  F(t )]  D  J F(x)dx  < 0  t„ + L  0  t„ + L  j F(x)dxi  >  (B-16)  Proof of Theorem 2-3: It is helpful to refer to Figure B - l while reading through this rather long proof, which proceeds in three main steps. Step 1 shows that there can be at most one local minimum point in the interior or on the lower boundary of the feasible region, and that it will always be viable; step 2 shows that there can be at most one viable local minimum point on the diagonal boundary; and step 3 shows that these two types of minima are mutually exclusive, so that a viable local minimum must be a global minimum.  130  Step 1: consider points (t , t ) such that t > t + L. From Theorems 2-1 and 2-2 it follows that any K K T point with t > t + L is a local minimum. Also by Theorem 2-1, there cannot exist more than one local minimum on the lower boundary (i.e. with t = 0). 
a  r  r  r  0  a  0  Figure B-1:  Interior K K T point to + L = tr  Interior M i n i m u m  Suppose (t \ V ) is an interior minimum, like the point indicated in Figure B - l . From (B-4), we see that g  d  r  J0D(t ,t ) o  has the same sign as  r  k - h  F{t  Q  F{t  0  (B-17)  + L)  + L) -  -  F{t )}  and by Theorem 2-1, we know that JOD(t ',t )> these facts shows that 0  k - h  [F{t '+L)-F{t ')] 0  JOD(t ,t ) 0  r  0  r  J0D(t \t ) o  0  r  JOD(t ', t ') V t * V . Combining 0  <o  r  yt.  r  * t.  r  ( B - l 8)  and so -g-JOD{t \t ) 0  r  < 0  V7 * / '  Next make use of the above together with Theorem 2-2 to show that  (B-19)  131  JOD{t ,t ) a  < 0  r  .&o  Vt  0  < t *, V t 0  (B-20)  r  \  So there cannot exist a local minimum (t , t ) with 0 < t < t '; or in terms of Figure B - l , there can not be another minimum "below" the given interior minimum, including on the lower boundary where t = 0. Conversely, there cannot exist another interior local minimum "above" the given one, as we could then use our argument to rule out the existence of the first one. Thus we conclude that there can be at most one local minimum with t > t + L. a  r  a  0  a  r  a  Next we show that any K K T point with t > t + L must be viable. Since we are at a K K T point, either we are at an interior minimum with t > 0 so that d JOD(t , t ) I dt = 0; or else we are at a lower boundary minimum with t = 0 so that d JOD(t , t ) I dt > 0; either way, from equation (B-4) we have that r  a  a  a  a  k-h  F  (  F(t )-F(t  t  0  0  °  +  L  )  + L)  JOD(t ,t ) 0  r  0  r  r  > 0  a  a  (B-21)  Since the fraction inside the square brackets is non-negative for any t , it must be that a  k>J0D(t ,t ) o  (B-22)  r  where the inequality is strict when F(t) is strictly IFR. This completes step 1 of the proof. Step 2: consider points such that t = t + L (i.e. a K K T point on the diagonal boundary). This step proceeds in the following way. First note that a K K T point with t = t + L is either a local minimum or a saddle point. We show that if such a point is viable then it must be a local minimum, whereas if it is a saddle point then it cannot be viable. Next show that there cannot be more than one viable diagonal minimum. r  Q  r  0  Consider JOD restricted to points on the diagonal boundary, i.e. let J(t ) = JOD(t , t + L), so as to create a univariate function in t . In this representation a K K T point for JOD corresponds to either a univariate minimum (if the K K T point is a local minimum) or else a univariate maximum (if the K K T point is a saddle point). (B-23) 0  a  a  t +L  t. 
+ L  0  cC(t  0  JM  0  + L) + bF(t  0  + L) + a j q(x)F(x)dx  •  + k J F(x)dx 4  J F(x)dx  +L +R  + kR  132  Take dJftJ I dt and rearrange terms to get Q  (B-24)  cv'(t  + L)F(t  0  dt„  D  + L) + bf(t  a  +k[F{t  + L) _  0  + L) - F{t )] + aq(t + L)F(t  a  a  0  +Q-  a  F(t )J(t )\ 0  0  Note that when this is zero, it must be that  J{t )  = = F(t )  0  Q  cv'(/  0  + L)F(t  0  + k[F{t  a  + L) + bf(t  + L)  0  + L) - F(t )] a  + aq(t + L)F(t 0  a  (B-25)  + L)  When (B-24) is zero, take derivatives (dJ(tJ 18Q again to get  cv" (t  + L)F(t  Q  J  _L_ +bf'(t  cv' (t  o  0  0  a  + L)  + )-f(t )]  0  +aq'0 + L)F(t [+f(t )J(t )  + L)f(t  a  + L) + k[f{t  0  D  +L)-  a  L  e  (B-26)  + L) - aq(t + L)f(t 0  0  + L)  0  Replace the J(t ) term in (B-26) by its equivalent from (B-25) to get 0  (B-27)  cv" (t  + L)F(t  a  + L) - cv' (t  0  +bf'{t  0  + L)f(t  Q  +L) + k[f{t  0  + L)  +L)-f(t )]  0  e  1_  + L) J = - +aq'(t + L)F(t + L) - aq(t + L)f(t D cv'(t + L)F(t + L)+bf(t + L) 0  0  0  0  + k[F{t  Q  and then rearrange terms to get  D  +L)-  0  0  F{t )] + aq(t + L)F(t a  0  a  + L)  133  (B-28)  v"(t  + L)F(t  0  + v'(t 1  + L)F(t )  0  F{t )D 0  +a  +  0  +b[f'(t  v'(t  0  + L)F(t  0  J =  + L)F(t )-  0  + L)F{t  0  + L) - q(t  + q(t  e  + L)F(t  0  +  + L)f(t  0  e  +  L))F(t ) 0  L)f(t ) 0  L) - f{t ))F(t )  +  a  +!)/(/„)]  0  0  0  L)F(t )  0  (q'(t [+k[(f(t  +  0  L)f(t )  +f(t  0  + L)f(t  0  0  0  +  (F(t  0  +  L) -  F(t ))f(t }\ 0  o)  and then (B29)  c[v" (t + L) - v' (t + L)z(t 0  J =  0  f'(t + L) + b F(t D + a[q'(t  + L)  0  F(t  0  +L  0  0  0  + z(t  a  + L) - q(t  + k[z{t +  + L) + v' (t  e  0  0  + L)z(t )]} 0  + L)z(t ) 0  + L)z(t  0  + L) + q(t  0  + L)z(t )] 0  L)-z(t )] 0  Note that because z(t) is increasing in t, the k term is always positive; furthermore, in many cases the entire second derivative will be positive and hence a zero of the first derivative will be a minimum. Unfortunately, while this is often true it is not always the case, so next we show a sufficient condition for when it will be true. In what follows we assume that we are at a t : dJ(tJ I dt = 0 (one of the points indicated on Figure B-2) which corresponds to a K K T point of JOD. a  o  134  Figure B-2:  Diagonal Boundary K K T Points to + L = tr  to  non-viable local minimum  cf  s a d d l e point viable local minimum  tr  Suppose that J(tJ < k, i.e. it is viable. Substitute k in place of J(tJ in (B25) to get cv'(t + L)F(t + L) + bf(t + L) r / \ / \i + k[F{t + L) - F(t )] + aq(t + L)F(t Q  0  0  e  0  Divide both sides by F(t + L) +  a  (B-30)  F  0  +  a  + L) and rearrange terms to see that this is equivalent to  a  cv' (t  1 t < (t )k L)\  Q  /3z(X + Z ) + a?00  + L)  < k  (B-31)  We know that the hazard rate is increasing, so its derivative must be positive, i.e. d — z(t o  F(t + L)f'(t + L) = — — 0  + L) + f(t _  0  °  a  + I)/(f  0  0  0  + L) ->0  (B-32)  + l)  F{t  a  Work with the numerator (which must also be positive) as follows. /('.  + L)f(t  + Z ) > -F{t  f(t  + L)f(t  + L)F(t )  0  0  0  > -f<(t  0  +  L  [/('„  )  >-f'(t  0  0  - f(t  a  + Z)F(*  {t  + )  0  0  + Z ) F ( U - f(t  0  +L)F(t )-f(t  0  - f(t )F(t 0  0  + Z)^  + L)f(t )F(t  a  a  (B-3 3)  L  0  + L)f(t )F{t  0  + L)F(t ) 0  + L)r  0  + Z)l  +L)f(t ) 0  0  + Z)  (B-34)  135  f_{t + L) F(t +L)  -/'(*, + L)F{t )/(/. 
+ f(t + L)F(t ) - f(t )F(t  0  a  0  a  +  L  +L)>-  H  h  +  L  0  z(t  ) +  Ah  L)f(t ) + L)  0  0  0  0  L)z(t )  +  0  —  )  + L) -  (B-37)  z(t ) a  We also know that q(t) is increasing, so its derivative must be positive, i.e.  JL (t L)>0 &o q  (B-38)  o+  Work with this as follows.  q'(t  a  + L) - q(t + L)z(t + L) + q(t + L)z(t ) > -q(t + L)z(t + L) + q(t + L)z(t ) 0  a  a  0  a  _ q'Oo + L) - q(t  0  a  0  a  + L)z(t + L) + q(t + L) - z(t ) 0  D  + L)z(t ) Q  <  +  Q )  a  Substitute (B-37) and (B-40) into (B-31) to get  cv'(t  0  +  F(t  L)-b  + L)  a  + z(t  a  + L)z(t ) 0  <k  z(t + L) - z(t ) q' (t + L) - q(t + L)z(t + L) + q(t -a z(t + L) - z(t ) a  a  a  0  a  0  a  (B-41)  + L)z(t ) 0  0  Re-arranging terms yields  - c v ' (*„ + L)[z(t  0  +b  F(t  a  [+a[q'(t  0  + L)  +L)-  + z(t  + L) - q(t  a  0  z(t )] + k[z(t 0  0  + L) -  z(t )]} a  + L)z(t )  > 0  0  + L)z(t  0  + L) + q(t •+ L)z(t )] J 0  0  (B-42)  136  Referring back to (B-29), we see that this implies that the second derivative must be positive, and therefore t must be a minimum of J(t ) and in turn (t , t +L) must be a minimum of JOD. This gives the result that if a K K T point is viable, it must be a minimum. Q  0  a  a  By reversing the preceding line of reasoning, we can derive the result that if a K K T point is not a minimum (i.e. is a saddle point) of JOD, then it cannot be viable. Next suppose that we are at a K K T point (t ,t + L) that is not viable, so that J(tJ > k and b\J(tJ I dt = 0. Then substitution of k into (B-31) shows that Q  a  a  (B-43)  cv' (t + L) + bz(t + L) + aq(t + L) > k a  0  a  Since z(t), q(t), and v'(t) are all increasing in t, we know that the left hand side of this expression is increasing V t>t , and so this relation will remain true V t>t . Hence if there exists a t that is viable, it must be earlier i.e. at some t < t (refer again to Figure B 2). D  a  Q  a  Now suppose we find a t that is viable and hence is a local minimum of J(t ). For there to be some other local minimum on the diagonal, we must have local maximum in between them. But a local maximum of J(t ) is a saddle point of JOD, which we already know is never viable. By the preceding argument, any point beyond a saddle point can not be viable either. Thus there can be no other viable local minimum on the diagonal. This result completes step 2 of the proof. a  0  D  Step 3: if an interior or lower boundary minima exists then there cannot be any viable diagonal minimum, and vice versa. First, suppose (tj, V ) is an interior minimum, like the point indicated in Figure B-3. r  Figure B-3:  Interior K K T Point Again to + L = tr  to  0  0  Interior M i n i m u m  tr  From (B-l), we see that  137  JOD(t ,t ) a  dt  r  has the same sign as  (B-44)  r  {[A + c v ' ( 0 + bz(t ) + aq(t )] r  JOD(t ,t )}  r  0  r  and by Theorem 2-2, we know that JOD(t , t ') > JOD(t ', t ') V t * tj . Combining these facts shows that 0  [h + cv'(t )  r  0  + bz{t ) + aq(t ) - JOD(t ,//)]<  r  r  r  r  0  a  Q  ' V/„ * t  (B-45)  1 Q  and so JOD(t ,t ') a  r  < 0  (B-46)  Vt„ * t  n  Next make use of the above together with Theorem 2-1 to show that (B-47)  < 0  -£-JOD(t ,t,) 0  So there cannot exist another local minimum (t , t ) with 0 < t < t '; or in terms of Figure B-3, there can not be another minimum "to the left o f the given interior minimum, including on the diagonal boundary where t = t + L. 
Conversely, there cannot exist an interior local minimum (tj, t}) "to the right o f a diagonal boundary minimum (t- L, Q with t < tj, as we could then use our argument to rule out the existence of the latter. Thus there cannot exist both an interior minimum (tj, t^) and a diagonal boundary minimum with t < t'. 0  r  r  r  r  Q  r  r  Continue to suppose that an interior minimum {t \ tj) exists, and let (t ", t") be a feasible point such that t" > tj and t" > t^, i.e. to the upper right of the interior minimum, as shown in Figure B-4. 0  0  138  Figure B-4:  Interior and Diagonal K K T Points  Consider the curve where dJOD(t*, t*) I dt = dJOD(t*, t*) I dt : a  r  (a) if (t ", t") lies to the upper left of this curve, i.e. t" > t* for some t*, then Theorem 2-2 shows that dJOD(t ", t") fdt >0 and so (t ", tf) cannot be a K K T point; or, a  0  o  0  (b) if (t ", t") lies to the lower right of this curve, i.e. t" > t* for some t*, then Theorem 2-1 shows that dJ0D(t ", t") I dt > 0 and so (t ", t ") cannot be a K K T point. Q  o  r  0  r  These two cases together show that there cannot exist both an interior minimum (tj, t*) and a diagonal boundary minimum with t > tj. So we have the result that there cannot exist both an interior minimum and a diagonal boundary minimum. r  Next suppose that (0, V ) is a lower boundary minimum, like the point indicated in Figure B-5; and further suppose that there also exists a minimum on the diagonal boundary at it"- L, t") The argument built on (B-44) to (B-47) can be used to show that there can be no diagonal boundary minimum to the left, i.e. with t" < V , so we need only worry about the existence of a diagonal minimum to the right, i.e. with t" > V . r  r  r  Figure B-5:  Lower Boundary and Diagonal K K T Points  139  By Theorem 2-2, dJOD(t ' - L, tj) I dt > 0. Since (t "~ L, t ") is a K K T point, d JOD(t " - L, t ") I dt < 0. Hence there exists some t* : 8JOD(t* - L, t*) 18t = 0, t ' < t* < t". And then again by Theorem 2-2 it must be that dJOD(0, t*) I dt < 0. N o w again use the argument built on (B-44) to (B-47), this time to show that dJOD(0, t*) I dt < 0. But since dJOD(0, tj) /dt = 0 then by Theorem 2-1 it must be that dJOD(0, t*) I d t >0. This contradiction shows that there cannot exist a diagonal minimum to the right, i.e. with t" > V . Thus we have the result that there cannot exist both a lower boundary minimum and a diagonal boundary minimum. This result completes step 3 of the proof. r  r  r  0  r  r  0  a  r  0  r  r  o  r  These three steps allow us to conclude that a viable K K T point is a global minimum. In effect it seems that while JOD is not in general pseudo-convex, it is quasiconnected or arc-wise quasi-convex (see chapter 9 of [ADSZ88]). Note that if we have found a non-viable K K T point with t = t + L , then it could be a global minimum, a local minimum, or a saddle point; if there also exists a K K T point which is viable, then it lies at some t* < t . r  a  r  Also note that if h > k then there is no K K T point with t + L< t , as h > k implies that it is cheaper to go short than it is to ever hold stock. a  r  Proof of Proposition 2-2: We show this with an example. Let F(t) be the uniform distribution on [0.5,1.5]; L = 0.4; c = 5, b =5; k= 100 h, R = 0, a = 0, v(t) = 1. Arbitrarily choose t = 0.1 and t = 0.5 to get JCF = 10 regardless of h. (B-48) r  0.5  0.5  5 + 100/2 j F{x)dx + hj F(x)dx + 5F(0. 5) JCF(0,0.5)  0.1  0.5 0.5_  0.5  j F(x)dx + j" F(x)dx . 
0  5 + 0+ 0+0 = 10 0.5 + 0  0.1  Thus regardless of h, when using the joint approach the optimum JCF < 10. For the sequential approach, first find the sequential t at dRCF /dt = 0. This gives a sequential optimum t = 0.9; substitute this value into JCF to get the following (B-49) r  0.9  t +L 0  5 + 100/1 J F(x)dx JCF: RCF =  0.9  J F(x)dx  r  + h J F(x)ax  +5F{0. 9)  t„ + L  7 + 0. 4h  t + L  0. 9  a  + J F(x)dx  140  Letting h —» QO causes the sequential optimal JCF.RCF -> oo, while the joint optimal JCF < 10. So the ratio ofJCF.RCF /JCF -> oo also. Proof of Proposition 2-3: Note that OCF is equivalent to JCF with c = b = a = 0. By Proposition 2-1 the optimal t is increasing in c, b, and a, so a decrease in one cost will lead to a lower (or at least no greater) new optimal t . Setting c = b = a = 0 makes JCF equivalent to OCF and lowers the new optimal t , hence 0  0  a  (t optimal for OCF.RCF) < (^optimal for JCF.RCF). a  Proof of Proposition 2-4: We show this with an example. Let F(t) be the uniform distribution on [0.5,1.5], so u=7; L=0.4; c=b=5, k=2, h=0.5, R = 0, a = 0, v(t) = 1. From the proof of Theorem 2-4, we know that the sequential optimum has t = 0.9. Use this in evaluating OCF at both endpoints to find that r  OCF(0)=  =0.500 OCF(0.5)= =0.178. 0.82 + 0 0.82 + 0.08  0 +  0  4  1  0  1  (B-50), (B-51)  6+ 0  Clearly t = 0.5 is the best endpoint to choose. Now evaluate SOF: SOF = kL - hju = 2(0. 4) - 0. 5(1) = 0. 3  .  (B-52)  SOFs positive value indicates that the other endpoint t = 0 should be chosen. Proof of Theorem 2-4: For this proof instead of using JED(t ) which corresponds to JCF(t , t ) we use a more general form corresponding to JOD(t , t ), except that the time required to perform the replacement is required to be a deterministic value (rather than a random variable) equal to R. Q  0  r  0  r  Take 8JEDI dt and rearrange terms to get 0  cv' (t + L) + bz(t + L) + aq(t + L) d F(t + L) [kF{t +L)JED(t )] — JED = —— <3r D + F{t + L) a  0  0  0  0  0  (B-53)  a  When this is zero, take derivatives (d I dt ) again to get 0  cv"(t  0  dt:  JED =  F(t  D  + L) D  +  +L) + bz'(t  0  [k - JED(t )]f{t 0  [F(t  e  + L)  0  + L)]  2  a  + L) + aq'(t  + L) (B-54)  141  which is non-negative when F(t) is strictly IFR and t is viable. This means that any viable zero of 8 JED(tJ Id t will be a minimum, and hence there can be only one minimum (refer to the proof of Theorem 2-3 to see more details of a similar argument). So JED is unimodal and pseudo-convex in t . 0  0  a  Proof of Theorem 2-5: Take the derivative of ORS with respect to t and rearrange terms to get (B-55) a  dt  \  D  Q  F(t  a  + L) -  ORS(t ,t ) 0  F(t )  (B-56)  When this equals zero, the second derivative will be  *  0  R  J l ^  S  +  L  )  D  -  F  W  F(t_ + L) F(t )  h  s  a  Q  Q  F{t ) F(t + L) 0  < 0  a  This means that any zero of 8 ORS 18 t will be a maximum, and hence there can be only one maximum. The desired global minimum then must occur at one of the endpoints, either at t = 0 or t = oo. Of course if F(t) were IFR, then the second derivative would be non-negative, indicating the opposite result that any zero of 8 ORS 181 would be a minimum. a  0  Proof of Theorem 2-6: Take the derivative of ORS with respect to t to get s  a  K  o  ,  S  D  )  s  \  Lz(t ) s  L  +  ORS(t ,t )\. 0  s  (B-57)  When this equals zero, the second derivative will be  h D  a  dt  Lz(t )  s  > 0  (B-58)  s  This means that any zero of 8 ORS 18 t will be a minimum, and hence there can be only one minimum. 
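Several of the preceding and following results establish that a cost rate is unimodal (pseudo-convex) in a single decision variable, so that any stationary point found by a search is the global minimum. For a given problem instance this structure is easy to confirm numerically. The sketch below is purely illustrative and was not used in our numerical studies: the function toy_cost is an arbitrary stand-in with assumed values (it is not the JED or ORS of Chapter 2, whose exact expressions are defined there), and is_unimodal simply checks for a single descending-then-ascending pattern over a grid.

import math

def is_unimodal(g, lo, hi, steps=1000):
    """True if g decreases (possibly not at all) and then increases on [lo, hi]."""
    xs = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    vals = [g(x) for x in xs]
    diffs = [b - a for a, b in zip(vals, vals[1:])]
    seen_rise = False
    for d in diffs:
        if d >= 0:
            seen_rise = True          # once the function starts rising ...
        elif seen_rise:
            return False              # ... it should never fall again
    return True

# stand-in cost rate with assumed Weibull(2, 100) failures and made-up costs:
# a cycle cost divided by an expected cycle length, plus a linear holding-type term
Fbar = lambda t: math.exp(-(t / 100.0) ** 2)
def integral_Fbar(t, steps=400):
    h = t / steps
    return h * (0.5 + sum(Fbar(k * h) for k in range(1, steps)) + 0.5 * Fbar(t))
def toy_cost(t):
    return (25.0 + 60.0 * (1.0 - Fbar(t))) / (integral_Fbar(t) + 5.0) + 0.05 * t

print(is_unimodal(toy_cost, 1.0, 300.0))    # True for this instance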
s  Proof of Proposition 2-5: Proven simply by examining the signs of the derivatives of 8 ORS/3 t and 8 ORS/d t , by each of the four cost coefficients. s  a cc s  0  D  1  J  (B-59)  142  ORS =  -Lf(t ) D  D  s  J F(x)dx  2  > 0  (B-60)  (B-61)  dt.dr  ORS  t„ + L  •Lfjt.) D  H + j F(x)dx + LF(t ) s  2  +R  -  F(t.)  < 0  t„ + L  dt,dk  ORS  H + J F(x)dx + LF(t )  LfjtJ D  s  2  dt dc D  dt„dr  \F(t  n  J F(x)dx - LF(t ) + L) - Fit)]  (B-62)  < 0  to + L  - R  s  ORS  +R  r  ,  (B-63)  D  ORS  (B-64) (B-65)  F{t + L) D F(t +L)F(t )  J F(x)dx  0  0  < 0  f„ + L  0  (B-66)  t„ + L  M + j F(x)dx + LF(t ) s  dt„dk  D  1  t  0  +R > 0  + L  - j F(x)dx - LF(t ) s  - R  143  Proof of Theorem 2-7:  Take the derivative of RS with respect to t to get s  When this equals zero, the second derivative will be  4  JB(0  Si.'  £Ai>Li-_*_Uo  =  D  \St,  (B-68)  Lz{t,)\  This means that any zero of d RS Id t will be a minimum, and hence there can be only one minimum. s  Proof of Proposition 2-6: Proven simply by examining the signs of the derivatives of d RSId t by each of the four cost coefficients. s  RS = ~  0 2  2  Lf(t )  =  D  s  d'&  d  k  \  RS = ~ J LF  RS =  TS)  L F  J  T S )  {[»  Proof of Theorem 2-8: A. JTD = ot r  > 0  (B-69)  _ _D  s  a ah  dt.dr  °{-\}  L f (  i  Lz(t )  p  >  J  J  + LFM  + R}-  s  + LF(t )  + R]-  s  Q  <0  LF(t ) s  (B-71)  -R}<0  (B-72)  First consider t . Take d I dt and rearrange terms to get r  r  [h + cv* (t ) + bz(t ) + aq(t ) - JTD(t ,t ,t )} r  D  r  r  x  0  r  (B-73)  When this is zero, take derivatives (dl dt ) again to get r  J^-jTD= ^l{cv"(t ) F  r  + bz'(t ) r  +  aq'(t )}>0 r  (B-74)  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTD(t , t , ty) is unimodal and pseudo-convex in t . x  r  0  144  Take 8 18t and rearrange terms to get a  (B-75)  dt„  D  F(t )~  F(t  x  JTD(t ,t ,t ) x  +L)  0  a  r  When this is zero, take derivatives (818t ) again to get a  JTD = ^  0 2  0t„  ~  F  (  K  f(t  +  a  +  D  L)F(t )  F(t )-F(t x  > 0  x  (B-76)  + L)]  2  0  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTD(t , t , t ) is unimodal and pseudo-convex in t . x  r  0  a  Take 818t and rearrange terms to get x  (B-77) d  [t + L -t a  JTD  -  x  L ]f(t ) x  x  k+  D  c u(t ) x  + JTD(t ,  x  x  t„ + L - t - L r  t ,t )\ 0  r  r  When this is zero, take derivatives (818t ) again to get x  d  1 J  m  =  f(U  [ c « ' ( * . ) [ * . + L - t - L ] + c u(t )\ x  x  x  [t„+L-t -  D  x  ^  x  (B-78)  q  L]  x  x  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTD(t , t , t ) is unimodal and pseudo-convex in t . x  r  x  Proof of Theorem 2-9: d ot...  0  J T R  =  F(t )G(J w  First consider t . Take 818t and rearrange terms to get w  t) r  w  0  C T  ,  (  }  +  w  fe(  }  +  (  }  _  k  (B-79)  ]  D  When this is zero, take derivatives (818t ) again to get w  JTR  F(t )G(t w  w  D  t) 0  ^  )  +  ^  {  K  )  +  a  q  ,  {  K  )  } >  Q  (B-80)  145  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTR(t , t ,t ,t ) is unimodal and pseudo-convex in t . . 
• • x  r  w  Q  w  Next consider t . Take d I dt and rearrange terms to get r  JLj at  ( r)G(t  F TR  =  r  )+  t  r  b  z  (  t  +  )  a  q  {  t  ) +  h  _ JTR)  (B.81)  D  r  When this is zero, take derivatives (d I dt ) again to get r  J? at  J  T  R=  F(t )G(J r  - t)  r  Q  { c v  „^  } +  ^  fel  )  +  a  q  ,  M  } ^ 0  (B-82)  L>  r  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTR(t , t ,t ,t )is unimodal and pseudo-convex in t . x  r  w  0  r  Take d I dt and rearrange terms to get x  _d_  J  T  R =  [t. + L - t -  L ]f(t ) | \[t  x  a  x  D  x  x  0  +  c,"x(f.) L-t -L ] x  + JTR\ (B-83) J  K  x  When this is zero, take derivatives (d I dt ) again to get x  d  2  dt  J  T  R =  f(t ) x  D  2 x  \c u'(t )[t x  \  x  0  +L-t -  L ] + c u(t )  x  x  x  (B-84)  x  [t +L-t -L ] 0  x  x  where the inequality is strict when F(t) is strictly IFR. This means that any zero of the first derivative will be a minimum, and hence there can be only one minimum; hence JTR(t , t , t , t ) is unimodal and pseudo-convex in t . x  r  w  0  x  Proof of Proposition 2-7: Part (a): proven by examining the signs of the derivatives of d JTRId t (shown in the proof of the previous theorem) by each of the cost coefficients. In this particular case, simple inspection of (B-79) gives the desired results. w  Part (b): examine the signs of the derivatives of d JTRId t by each of the cost coefficients as follows. r  146  ?  j r a  f(oog-',){,,_*(',.uk  =  dt dh  D  r  *  ^  =  [  FMGO,-,.)  ( B  .  8 5 )  D  (.±  Z)  0  [  J , Z )  (  0  /  (  0  (  4  |  o  s  w  J  J 0  Part (c): examine the signs of the derivatives of 6 JTRId t by each of the cost coefficients as follows. x  ^  j  m  =  [ h  L -  +  t  x  - L  ] f (  x  t  ) (  x  i  K(t ,t ,t )}  +  x  o  w  0  (  B  8  8  )  (B-89) d  _[t.+L-t,  1  dt,dc,  ~L ]f(t ) x  \  x  D  j J>.+L-,,-L,]f(O<C( „, ,<„)} D dt dc TR  t  r  2  JTR =  [t  Q  + L-t  -  x  ^  a $)  U  x  n  +  , 1 _ | „  i  J  /  (  ,  )  , ,  >_0  D  K  x  Q  J  lr  I  D  0  90)  1I  ,t ,t ) 4 ]/(',)\ \B(t "^Q> " "> x  (  ^_  >0  x  d  (t ) ^ _  _  [  > o  l w  D  (B-91)  x  d  2  [t +  JTR = ^  Q  L-t  x  -  -  4 ]/(/,)<jA(t°',t ,t ) ' x  i  x  v  0  r  w r ;  M /  a ca  D  \  a &i  D  1 D J  x  x  D  y> 0  (B-92)  J  Proof of Lemma 2-1: In CZ?ff , 5, f ), r and 5 appear only in the shortage and holding cost terms; the other terms are constant with respect to these two decision variables. Thus for a given t we can optimize the expected cost rate with respect to t and s by optimizing the expected inventory cost per spare. We can do this using the results of [AK95], which is effectively what we re-state in this proposition. 0  r  r  o  Q  147  Appendix C Proof of Lemma 3-1: Take the derivative of Cj(t) with respect to t and rearrange terms to get  4 C (0 = ^pdt D x  {bz(t) + aq(t) - C, (/)}•  (C-l)  When this equals zero (i.e. when the term in braces equals zero), we find that the second derivative of Cj(t) with respect to t is  at  (C-2)  D  where the inequality is strict if F(t) is strictly IFR. Thus any interior extreme point t of Ci(t) is a unique minimum, i.e. at tj : 0 < tj < oo. If the term in braces in (C-l) never equals zero, then its sign indicates the optimal decision: if it is negative at t = oo, then repair only after failure; if it is positive at t = 0, then do not run the system (it is not viable). 
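To illustrate Lemma 3-1 numerically, the following sketch (purely illustrative; it was not used in our numerical studies) evaluates the single-machine cost rate in the simplified form that appears below in (C-13), rate(t) = [c + b·F(t) + a∫_0^t q(x)F̄(x)dx] / ∫_0^t F̄(x)dx. The Weibull parameters, the cost values c, b, a, and the choice q(x) = x are all assumptions, and the repair time R is omitted, so that C_1(t) reduces to this rate. At the minimizing critical age the printed values of the cost rate and of b·z(t) + a·q(t) nearly coincide, which is exactly the stationarity condition in the braces of (C-1).

import math

beta, eta = 2.5, 100.0             # assumed Weibull shape and scale (IFR since beta > 1)
c, b, a = 10.0, 50.0, 0.02         # assumed preventive, failure and operating cost coefficients
q = lambda t: t                    # assumed increasing operating-cost rate q(t)

F    = lambda t: 1.0 - math.exp(-(t / eta) ** beta)         # failure distribution
Fbar = lambda t: math.exp(-(t / eta) ** beta)               # survival function
f    = lambda t: (beta / eta) * (t / eta) ** (beta - 1) * Fbar(t)
z    = lambda t: f(t) / Fbar(t)                             # hazard rate

def integrate(g, hi, steps=500):
    """Simple trapezoidal rule on [0, hi]."""
    h = hi / steps
    return h * (0.5 * g(0.0) + sum(g(k * h) for k in range(1, steps)) + 0.5 * g(hi))

def cost_rate(t):
    num = c + b * F(t) + a * integrate(lambda x: q(x) * Fbar(x), t)
    den = integrate(Fbar, t)
    return num / den

# crude grid search for the minimizing critical age t1
ts = [1.0 + 0.5 * k for k in range(400)]
t1 = min(ts, key=cost_rate)
print("t1 ~", t1,
      " C1(t1) ~", round(cost_rate(t1), 4),
      " b*z(t1) + a*q(t1) ~", round(b * z(t1) + a * q(t1), 4))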
Proof of Proposition 3-1: Take the derivative of (3-10) by t to get  d  a  mK  , x  ~  F) N' -D' DD m  N  D' N' D_m l \D' m  m  c.(0  (C-3)  which will equal zero when the term in braces equals zero. First work with the term D', m  m!  a  R F(t)[m (  (' — - /] j F(x)dx (C-4)  m m-1  m!  — R'F(t)[m-i]  ^,[m - ; ] ! » ! »  JF(x)dx  148  . m-l-i  dt "  = mF(t){  ^  - 1- i]!/ !  [m m-l  I  J 0  f' — j F(x)dx  [m-l]!  (C-5)  . m-l-i  Vo  mF(t)D _  I D .  m  (C-6)  x  a* Similarly transform the N' term m  (C-7) . m-i-1  N  =Y  m  —  i?' F(t)[m  - i]\ f F(x)dx  J  0 ( t  m-l  +  [/* + (m - / W e ( f )1  AW !  I  — U ' F ( f ) [ » i - / ] y F(x)dxj  —/?'[ J F(x)<ftJ  +  \m-i-l  [ik + (m - i)rate(t)]  +0  [(m - z)rate'(f)]  m-l  m tZZxlrn-iy. n\n'-  JF(x)dx  n  [(m - i)rate< (z)] + 0  (C-8) R' j~F(x)dx — N = mF(t)Y— dt 'ti [m - 1 - i ] 11 I [m-l]!  [ik + (m -  l  m  \u-l-l  m - l  +m  +m  J F(x)dx  - 1- / ] ! « ! «  ?-\ [tn — 1 ] ! f^[m  -/]!/!  ^' J F(x)dx  -  i]\n\n''  n  [ik + (m -  i)rate(t)]  J [-rate(t)(m  - i) + (bz(t)  + aq(t))(m  - /)]  J  Vo f  j = n +1[m  o -\m-i-1  i)rate(t)  t  _  j F(x)dx  y i - i - l  (m - i)[-rate(t)  + (bz(t)  +  aq(t))]  149  (C-9) (  a  t  j F(x)dx  H\™ - i -  m  z  +m  \  \IH-|-1  ,..  m  [ik + (m - 1 - i)rate(t)] [ft + (jh - 1 - i)rate(t)]  fJ . m-i'-l  +m m-l  + m^ ( 0  d_ N  a  m  1  [m-l]!  [m -/]!«!  i=n+ \  = mF{t)N _,  .  — R n  l  JF(x)dx  + mF(t)[bz(t)  m  m-i-l  (bz(t) + oq(t))  + aq(t)]D _,  (C-10)  m  Substitution of (C-6) and (C-10) back into (C-3) gives  4c (*) m  =  a  g  '  f  (  f  m  - '  { C - , ( 0 + bz(t) + aq(t) -  (C-ll)  C (t)} m  D  m  which leads to the result of (3-13). To get (3-14), begin with (3-9) and take the derivative by t to get (C-12)  — C ( 0 = rate' (t) m  a  »-  Z 1 = 0 i  p i rob  rate(t)^ i prob\ (t) + k^i  (o  i=0  prob\ (t)  =0  (C-13)  where  rate' (t) =  1  | j F ( x ) d x [ & / ( 0 + aq(t)F(t)] '  F(t) c + bF(t) + aj q(x)F(x)dx  t  J" F(x)dx . 0  - 2  J  150  rate' (t) =  [bz(t) + aq(t) -  rate(t)]F(t)  (C-14)  j F(x)dx  and  m  -i +  Z  J  P  r o b  i  F(0 (C-15)  ; =0  prob\ (0 = prob {t)  " t  t  _ 0  Substitute (C-14) and (C-15) into (C-12) and rearrange terms to get (C-16)  m  [bz(t) + aq(t) -- rate(t)]  ™~ Z '  P  1 = U  F{ty  >  -[rate(t) - k]^Ti prob^t) - 7 +  -c.(0 dt  i(0  m  m  1  r o b  J=°  =0  ^JProbjO) J.  " t  m  JF(x)ax _0  where the double summation in the numerator equals Var[# machines up]. Thus we have the result that (C-12) equals zero if and only if (3-14) is true. Proof of Lemma 3-2: Take the derivative of C (t) with respect to t as shown in ( C - l 1). 2  ic (t) 2  =  2F  (OA D.  { () Cl  t  + bz(t) + aq(t) -  C (t)} 2  (C-19)  This equals zero only when the term in braces equals zero. We know that [bz(t) + aq(t)] is increasing V t, and from Lemma 3-1 we know that Cj(t) is increasing V t > tj, so together [Cj(t) + bz(t) + aq(t)] is increasing V t>tj. Thus for any potential minimum point t > tj 2  151  (C-20)  a  2  0(0  =  ^ ' {j- [0(0 + bz(t)  2 F  ) D  + aq(t)) - o j > 0  V * > /, 2  which means there can be at most one minimum point t > tj. We also know from part (a) that Cj(t) < [bz(t) + aq(t)] V t<t and clearly we must have C (t) > 2Cj(t); together these imply that 2  h  A  a  C  l  "  (0 2 v  =  2  F  J  (  )  { C , (0  A  2  + bz(t)  + aq(t)  - C  2  (t)}  D.  '  <  2  F  (  '  (C-21) )  {C, (t) + C (t) - C {t)} < 0  A  x  2  V/ < t  x  A  which means there cannot be a minimum t <t . Thus any interior extreme point of C (t) must be a unique minimum, i.e. at t > tj. 
Note that if the term in braces in (C-19) never equals zero, then its sign indicates the optimal decision: if it is negative at t = oo, then repair only after failure; if it is positive at t = 0, then do not run the system (it is not viable). 2  1  2  2  Proof of Proposition 3-2: Begin by letting m = n, so that there is one repair shop for each machine. Clearly there will be no interference or delay waiting for an available shop, so each machine will be down a proportion of time q . m  Next add one extra machine to the system, for a total of m+l. Since the machines are identical and the repair times are Markovian, let us give the original m machines priority for entering the repair shop, i.e. they can preempt the repair of machine m+l. This prioritization does not affect the system performance overall, though it does affect the performance of individual machines. The original m machines will still be down a proportion of time q , but machine m+l must occasionally wait for a shop and therefore will be down a proportion of time q j>q . m  m+  m  In a similar fashion add another machine to the system (total of m+2), giving it the lowest priority for repair (so that it can be preempted by any of the other m+l). Then machine m+2 will face a larger expected wait for repairs and will be down a proportion of time q >q >q . m+2  m+1  m  Now note that  E[ MD(t) ] E[MD(t) E[MD(t) m  m+1  m+2  = q + q +...+ q ] = qj + q +...+ q + q ]= qj+ q +...+ q + q }  2  m  2  2  m  m  m+1  m+1  + q  m+2  152  Subtraction gives the desired results that the expected number of machines down is increasing in m E[ MD(t)  ] - E[ MD(t)  m+1  ]= q  m  (C-22)  m+1  at a non-decreasing rate { E[MD(t)  ] - E[MD(t)  m+2  m+1  Proof of Theorem 3-1: rearrange terms to get 4 C  m+2  a  (0 =  {  m +  ]} - {E[MD(t)  m+1  F  m  ] }= q  m+2  m+1  Take the derivative of C (t) with respect lot and m+2  l^ ^ D,  2  ] - E[MD(t)  (C-23) - q >0  {  D  Cm+1  (0 + bz(t) + aq(t) - C  m+2  (f)}  (C-24)  When this equals zero (i.e. when the term in braces equals zero), we find that the second derivative of C (t) with respect to t is (C-25) m+2  (m + 2 ) F ( Q Z ) . 2  U  m + 2 V* /  (^ + i)F.(QA, C (t)  -c (t)  —  (0 + aq' (t)  m+2  + bz(t) + aq(t)  m  + 1  m+x  which by substitution of terms from the braces in the first derivative (C-24) becomes (C-26)  (m + 2  L  'm + 2 \  l  )  ~  (m + 1 ) F ( Q D ,  2)F(t)D  m+1  D.m + 2  ^m l +  +/3z' (?) + aq' (r)  C (0 m  -  -c (0  C (0 + M+1  C (t) m+2  B+1  We need this to be positive; we already know that z'(t) and q'(t) are both positive, so we need only worry about the term in square brackets. Proposition 3-1.1 shows that { E[MD(t)  m+2  ] - E[MD(t)  m+1  ]} - {E[MD(t)  m+1  Combine this with (3-9) to find that for any viable t  ] - E[MD(t)  m  ]} > 0  (C-27)  153  (C-28)  C (t) m  ~  2C (0 m+1  +  C (0 = m+2  rate(t) m -  m+l - 2rate(t) m + 1 - ^ /' prob  im  i=0  +  rafe(r)  1  m+2 J^i prob  m + 2-  prob (t) im  prob, (t) m+l  =0  m+2 + k ^ i prob  (f)  lm+2  + k^i i=0  m+l - 2k^i  (f)  tm + 1  prob (t) i =0  (t)  t  m+2  (C-29)  0(0  - 2C. (/) + + I  C (0 m+2  = [* -  mfe(0]  + (m)rate(t) i=0  "m+l - 2[* - rate(t)] £ z pro/3.  m + 1  2(m + \)rate(t)  (0  . '=0  "m+2  + [* -  mfe(0] Z '  ^..m+2  (0  + (m + 2)rate(t)  O(0-2O+i(0 + O+ (0 = 2  [* - rate(t)]{E[MD(t) ]  - 2E[MD(t) ]  m  m+l  + E[MD(t) ]} m+2  >0  (C-30)  This result can be combined with (C-26) to show that the second derivative must be positive. Thus any viable interior extreme point t of C (t) is a unique minimum, i.e. 
at t 0<t <cc. If the term in braces in (C-24) never equals zero, then its sign indicates the optimal decision: if it is negative at t = oo, then repair only after failure; if it is positive at t = 0, then do not run the system (it is not viable). m  m  m  That t j < t braces in (C-24) equals m+  m+2  follows from a similar line of reasoning. At t , m+1  (O + l Cm + 1 ) + M ' m + 1 ) + "<7('m+, ) " 0 + 2 ('m+l )}  the term in  (C-31)  or ( 0 + l('m+l)  +  O + l ('m+l )  On ('m + l )  0+2('m + l)l  (C-32)  154  which by the same argument of (C-27) to (C-30) must be negative. Since C (t) is pseudoconvex, it must be that t j <t . m  m+  m+2  Proof of Proposition 3-3: Part (a): Begin with the derivative shown in (C-24), and substitute in (3-8). We then get the stated results by finding the signs of the respective partial derivatives of C' (t) with respect to the cost coefficients k, c, a, and b. m  d  2  dtdk  C (t) =  m D.  m  1=0  l*=o  m-l  Z ( / « - 1cm(t)  *  CtCC  =  m  F  (  '  )  D  i)prob _ (t) im  x  i=0  " '  (C-34)  < 0  m  Z (m -  i)prob (t) im  i=0  m-l  Z {m - 1 d  2  c (t)  i)prob _,(t)-  J F(x)dx  mF(t)D _ m  F(t)  im  0  x  (C-35)  m  +z(t) - Z ( « - i)prob  Uu  (t)  F  (  t  )  t  '~°  J F(x)dx (C-36)  m-l  <?  2  c.(0 =  mF(t)D,  F(r)  m-l  Z ( w - 1-  i)prob _,(t) im  0 m  Z(w l  o  1  =0  i)prob (t) im  , /(O F(0  > 0  155  (C-37) \q(x)F(x)dx  m-l  X (m - 1 - i)prob _ (t) J L _ _ Lm  x  i=0  d  1  dtda  c.(0  jV(x)<fc  ™F(t)D _ m  x  D_  \q(x)F(x)dx | F ( X ) C * C  1 = 0  (C-38)  m-l  J q(x)F(x)dx £ ( m - 1 -  *  c.(o--- ' ?(  )J>  -  _o  i)prob _ _ (t) t  m  x  =0  > 0  m  J F(x)dx Part (b): First show that these expectations are decreasing in We u s e ^ as defined for (3-10), so as to express the expected number of machines down or idle as (C-39), (C-40)  Z(»-i)^  E[# machines down ] =  E[# machines waiting ] = — 11 yj  ±y, 7=0  Take the derivative of the first equation above with respect to t to get  156  4  Since the denominator is always positive and  y\ =  y (m-i)F(t) J F(x)dx  (C-42)  t  then the sign of the derivative in (C-41) is the same as  sign of <  5>. Z i=0  - j)j  (m  Z  y  i  7=0  sign of z Z w i=0  ,1  to  ~ J)J - (  (  m  - )j]  m  - Oyt  =0  <  f  Z Jy*  (C-43)  0  j=0  Since the derivative is always negative, we have the result that the expected number of machines down is decreasing in t. A similar derivation provides the same conclusion for the number of machines awaiting repair, i.e. for (C-40) instead of (C-39). For R and n, it should be clear that the stated relations hold for any given age t policy; increasing the number of shops n or decreasing the average repair time R could never lead to an increase in the expected number of machines awaiting maintenance. Part (c): Our conclusion that ratio = C (t) I mCjft) is increasing in m and decreasing in n is based upon the simple notion that C (t)> C .j(t) + Cj(t) for given n. Hence m  m  m  ratio. =  mC (t)  >  "  m  1  1  mC (t) x  x  >••• >  mC (t)  m  (C-44)  x  Proof of Theorem 3-2: Consider a multi-machine system, in which the repair shops are all busy. We wish to know until what age t we should wait before we shut down a functioning machine, whose current age is T. We can consider each machine individuallyfromnow until whenever the first shop becomes available, because until then there is nothing linking them; further, we need only consider a machine's expected costs from now until that shop becomesfree,because our actions do not affect costs beyond that point. 
This myopic expected cost K(t_w) is given by

F̄(T)·K(t_w) = ∫_0^Δ { ∫_T^{T+y} [ b + k(T + y − x) + a∫_T^x q(v)dv ] f(x)dx + a∫_T^{T+y} q(v)dv · F̄(T + y) } ψ(y)dy
  + ∫_Δ^∞ ∫_T^{T+Δ} [ b + k(T + y − x) + a∫_T^x q(v)dv ] f(x)dx ψ(y)dy
  + ∫_Δ^∞ [ k(y − Δ) + a∫_T^{T+Δ} q(v)dv ] F̄(T + Δ) ψ(y)dy      (C-45)

where Δ = t_w − T and ψ(y) is the density of the time until the first shop becomes free. Take the derivative with respect to t_w and cancel terms to get

d/dt_w K(t_w) = F̄(t_w) Γ(Δ) [ b z(t_w) + a q(t_w) − k ]      (C-46)

where Γ(Δ) = ∫_Δ^∞ ψ(y)dy. When this is zero take derivatives again to get

d²/dt_w² K(t_w) = F̄(t_w) Γ(Δ) [ b z'(t_w) + a q'(t_w) ] > 0      (C-47)

This is our desired result. That K(t_w) is pseudo-convex follows from z(t) and q(t) being increasing. Note that this result means that the optimal t_w is independent of R and of our choice of maintenance policy; we should actually find t_w first, and then somehow choose the rest of the maintenance policy so that any other shut-down times are less than t_w.

Appendix D

AMPL Formulation of Discrete Time Model

Following is the AMPL code which describes the linear programming formulation of the discrete time model used by the first numerical study in Chapter 3. The particular formulation shown is for a system of 2 machines and 1 repair shop for which we wish to solve for the optimal policy. This formulation also serves as the basis for the testing of heuristic policies, in that each heuristic involves adding more constraints to force the solution to have the required form.

# failmodl.txt
# M J Armstrong 8 July 1995
# AMPL LP model for discrete time MDP two machines one shop general solution
# requires data file or manual entry to specify size, fail vector, short, rep
# specifies coefficients by columns rather than rows

parameter size integer, >0;           # number of time slices for failure distribution

set AGES := -1..(size+1);             # -1 for machine broken, positive for age
                                      # last one is dummy

parameter fail {0..size} >=0, <= 1;   # failure distribution

parameter short >0;                   # relative shortage cost

parameter rep >0, <=1;                # repair probability per period

minimize cost;                        # objective

subject to Balance {i in AGES, j in AGES : j <= i}: to_come = 0;
subject to Normalization : to_come = 1;

variable M {i in 0..size, j in 0..size : j <= i} >= 0, <=1,
    # stationary probability for doing preventive maintenance
    objective cost (  short * rep * (1-fail[j])
                    + (1+short) * rep * fail[j]
                    + short * (1-rep) * (1-fail[j])
                    + (1+short) * (1-rep) * fail[j] ),
    coefficient Normalization    1,
    coefficient Balance[i,j]     1,
    coefficient Balance[j+1,0]   - rep * (1-fail[j]),
    coefficient Balance[0,-1]    - rep * fail[j],
    coefficient Balance[j+1,-1]  - (1-rep) * (1-fail[j]),
    coefficient Balance[-1,-1]   - (1-rep) * fail[j];

variable N {i in 0..size, j in 0..size : j <= i} >= 0,
    # stationary probability for not doing preventive maintenance
    objective cost (  0
                    + fail[i] * (1-fail[j])
                    + (1-fail[i]) * fail[j]
                    + 2 * fail[i] * fail[j] ),
    coefficient Normalization     1,
    coefficient Balance[i,j]      1,
    coefficient Balance[i+1,j+1]  - (1-fail[i]) * (1-fail[j]),
    coefficient Balance[j+1,-1]   - fail[i] * (1-fail[j]),
    coefficient Balance[i+1,-1]   - (1-fail[i]) * fail[j],
    coefficient Balance[-1,-1]    - fail[i] * fail[j];

variable M1 {i in 0..size} >= 0, <=1,          # stands for M[i,-1]
    objective cost (2*short),
    coefficient Normalization    1,
    coefficient Balance[i,-1]    1,
    coefficient Balance[0,-1]    - rep,
    coefficient Balance[-1,-1]   - (1-rep);
variable N1 {i in 0..size} >= 0, <=1,          # stands for N[i,-1]
    objective cost (fail[i] + short),
    coefficient Normalization    1,
    coefficient Balance[i,-1]    1,
    coefficient Balance[i+1,0]   - (1-fail[i]) * rep,
    coefficient Balance[i+1,-1]  - (1-fail[i]) * (1-rep),
    coefficient Balance[0,-1]    - fail[i] * rep,
    coefficient Balance[-1,-1]   - fail[i] * (1-rep);

variable N11 >= 0, <=1,                        # stands for N[-1,-1] (no M[-1,-1] needed)
    objective cost (2*short),
    coefficient Normalization    1,
    coefficient Balance[-1,-1]   1,
    coefficient Balance[0,-1]    - rep,
    coefficient Balance[-1,-1]   - (1-rep);

# end of model
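As a rough, independent check on solutions of this LP, one can simulate the same two-machine, one-shop system under a fixed critical-age preventive-maintenance rule and estimate its average cost per period. The sketch below is purely illustrative and was not part of our numerical studies: the parameter values in fail, short, rep, and t_pm are made up, and the per-period event ordering (failure cost 1 per failure, shortage cost short per machine-period down, repair completion with probability rep) is one consistent reading of the objective and Balance coefficients above rather than the exact mechanics used in Chapter 3.

import random

def simulate(fail, short, rep, t_pm, periods=100_000, seed=1):
    size = len(fail) - 1          # ages run 0..size
    ages = [0, 0]                 # machine ages; -1 means down (in shop or waiting)
    total_cost = 0.0
    random.seed(seed)
    for _ in range(periods):
        # preventive maintenance: pull any working machine at or past the critical age
        for m in range(2):
            if ages[m] >= 0 and ages[m] >= t_pm:
                ages[m] = -1                      # joins the repair queue
        # shortage cost for every machine down after the PM decision
        down_now = [m for m in range(2) if ages[m] < 0]
        total_cost += short * len(down_now)
        # failures of the machines still running
        for m in range(2):
            if ages[m] >= 0:
                if random.random() < fail[min(ages[m], size)]:
                    total_cost += 1.0             # failure cost (normalized to 1)
                    ages[m] = -1
                else:
                    ages[m] += 1
        # the single shop works on a machine that was already down; with
        # probability rep it comes back as good as new next period
        if down_now and random.random() < rep:
            ages[down_now[0]] = 0
    return total_cost / periods

# illustrative data: five age classes with an increasing failure probability
fail = [0.05, 0.10, 0.20, 0.40, 1.00]
for t_pm in range(1, 6):
    print(t_pm, round(simulate(fail, short=0.3, rep=0.7, t_pm=t_pm), 4))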

Cite

Citation Scheme:

        

Citations by CSL (citeproc-js)

Usage Statistics

Share

Embed

Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                        
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            src="{[{embed.src}]}"
                            data-item="{[{embed.item}]}"
                            data-collection="{[{embed.collection}]}"
                            data-metadata="{[{embed.showMetadata}]}"
                            data-width="{[{embed.width}]}"
                            async >
                            </script>
                            </div>
                        
                    
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.831.1-0087743/manifest

Comment

Related Items