"Non UBC"@en . "DSpace"@en . "Haukaas, T. (Ed.) (2015). Proceedings of the 12th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP12), Vancouver, Canada, July 12-15."@en . "International Conference on Applications of Statistics and Probability (12th : 2015 : Vancouver, B.C.)"@en . "Memarzadeh, Milad"@en . "Pozzi, Matteo"@en . "Kolter, J. Zico"@en . "2015-05-25T13:36:12Z"@en . "2015-07"@en . "Identifying optimal management policies for systems made up by similar components is\na challenging task, due to dependence in the components\u00E2\u0080\u0099 behavior. In this setting, observations\ncollected on one component are also relevant for learning the behavior of others. Probabilistic graphical\nmodels allow for consistent inference using all available data, taking dependence among components\ninto account, while optimizing system operation. In this paper we propose a framework for\nmanagement of systems made by similar components based on hierarchical Bayesian modeling, called\nMultiple Uncertain Partially Observable Markov Decision Processes (MU-POMDP), that overcomes\nsome limitations of a previously proposed approaches. We describe a detailed numerical algorithm to\nlearn the system parameters within this framework and we investigate its performance with an example\nof management of a wind farm (i.e., the system) made up by turbines of the same type (i.e., the\ncomponents)."@en . "https://circle.library.ubc.ca/rest/handle/2429/53395?expand=metadata"@en . "12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12 Vancouver, Canada, July 12-15, 2015 1 Hierarchical Modeling of Systems with Similar Components Milad Memarzadeh Doctoral Candidate, Dept. of Civil and Env. Eng., Carnegie Mellon University, Pittsburgh, USA Matteo Pozzi Assistant Professor, Dept. of Civil and Env., Carnegie Mellon University, Pittsburgh, USA J. Zico Kolter Assistant Professor, Dept. 
of Computer Science, Carnegie Mellon University, Pittsburgh, USA

ABSTRACT: Identifying optimal management policies for systems made up of similar components is a challenging task, due to dependence in the components’ behavior. In this setting, observations collected on one component are also relevant for learning the behavior of others. Probabilistic graphical models allow for consistent inference using all available data, taking dependence among components into account, while optimizing system operation. In this paper we propose a framework for management of systems made up of similar components based on hierarchical Bayesian modeling, called Multiple Uncertain Partially Observable Markov Decision Processes (MU-POMDP), that overcomes some limitations of previously proposed approaches. We describe a detailed numerical algorithm to learn the system parameters within this framework and we investigate its performance with an example of management of a wind farm (i.e., the system) made up of turbines of the same type (i.e., the components).

1. INTRODUCTION

Optimal policies for operation and maintenance of systems can be identified under the assumption of independent components, or of components governed by the same model. However, it is challenging to represent systems made up of different but similar components. In this case, component parameters should be treated as a set of interdependent random variables, and inference performed so that observations collected on any component are also relevant, to a certain degree, in the management of the others. Many infrastructure systems can be modeled using this framework. The Markov Decision Process (MDP) (Bertsekas 1996, Sutton and Barto 1998) and the Partially Observable MDP (POMDP) (Smallwood and Sondik 1973, Sondik 1978) are classic frameworks for sequential operation.
Specifically, a POMDP assumes that the evolution of the component state follows the Markov property, but allows for partial and/or imperfect observation of that state. One limitation of these frameworks is that the stochastic models of the state evolution (i.e., the “transition probabilities”) and, in the case of POMDP, of the observations (i.e., the “emission probabilities”) are assumed to be known without uncertainty and are not updated as data are processed. However, in many applications, these models are uncertain, and different for each individual component. Ross et al. (2011) have proposed the Bayes-Adaptive POMDP (BA-POMDP) framework, which treats models as random variables and updates their distribution as data are analyzed during the management process. However, exact algorithms in this framework have high computational complexity and easily become intractable for systems with a large number of components and states. Approximate methods for optimal management under model uncertainty have been proposed by Jaulmes et al. (2005a,b) and, recently, by ourselves (Memarzadeh et al. 2013, 2014a). The latter method, called PLUS (Planning and Learning in Uncertain dynamic Systems), is structured in two phases: learning and planning. In the learning phase, it makes use of Markov Chain Monte Carlo (MCMC) Gibbs sampling (Carter and Kohn 1994). The planning phase is based on an approximation which neglects the exploratory value of future learning of the model, to make the algorithm efficient and tractable. Details about PLUS can be found in Memarzadeh et al. (2014a).
When applied to the management of a set of components, the learning approach of PLUS can be implemented in two alternative ways: independent models can be tuned for each component, or a single model can be updated by processing data from all components. While these two implementations are suitable for some applications, they are not suitable for systems made up of components that we assume to be similar, but not identical. Following a preliminary publication on systems with similar components (Memarzadeh et al. 2014b), in this paper we generalize PLUS to a framework we call Multiple Uncertain POMDP (MU-POMDP), which is based on hierarchical Bayesian modeling. This framework allows specific levels of similarity among components to be assumed, and observations collected at system level to be processed consistently. The motivation for this research is to improve the operation and maintenance of wind farms. MDP and POMDP have been applied to management in this field (Byon et al. 2010, Byon and Ding 2010, Nielsen and Sorensen 2012). Turbines can be understood as components, and their deterioration modeled as a stochastic process, depending on the actions taken. Epistemic uncertainties on the deterioration of different components can be assumed to be dependent, as turbines are instances of the same structural model. On the other hand, differences in component construction and position lead to slight or significant dissimilarities among deterioration processes. Similar considerations can be made for the monitoring systems providing observations about the condition state. For these reasons, MU-POMDP can be a suitable framework for this application. The following parts of this paper present the MU-POMDP framework (Section 2) and compare its performance with that of PLUS on a numerical example (Section 3), before drawing conclusions (Section 4).

2.
THE MU-POMDP SCHEME

The MU-POMDP scheme targets the management of a set of components, each defined by a POMDP whose transition and emission probabilities are modeled as dependent random variables defined by hyper-parameters. Figure 1 shows the graphical model of the MU-POMDP, where circles define random variables, squares decision variables, diamonds utility variables, dots fixed parameters, and arrows define dependence among variables. In that graph, $S$ indicates the component state, $A$ the maintenance action, $Z$ the available observation and $R$ the monetary reward (or loss, if the value is negative). The subscript “$k,t$” refers a variable to component $k$ at time $t$. Variables $Z$ are shaded to indicate that they can be directly observed. We consider a $K$-component system. For each $k = 1, \ldots, K$, model parameter $\mathbf{T}_k$ indicates the transition probability of component $k$, and model parameter $\mathbf{O}_k$ indicates the corresponding emission probability.

Figure 1. MU-POMDP scheme for a two-component system represented as a probabilistic graphical model (panels: POMDP Comp. 1 and POMDP Comp. 2).

Additional layers model the prior distributions: hyper-parameters are marked as $\alpha_T$, $\boldsymbol{\beta}_T$, $\alpha_O$ and $\boldsymbol{\beta}_O$ in Figure 1. The first two define the dependence in the transitions, while the latter two define that in the emissions.
While model parameters are different for each component, hyper-parameters are common to the entire system. Parameters $\boldsymbol{\eta}_T$, $\boldsymbol{\eta}_O$, $\lambda_T$ and $\lambda_O$ define the distribution of those hyper-parameters. As apparent in Figure 1, the model parameters of different components (e.g., $\mathbf{T}_1$ and $\mathbf{T}_2$) are not marginally independent, because of the common hyper-parameter parents. Consequently, observations on any component, by affecting the knowledge of the hyper-parameters, affect in turn the model parameters of other components as well. The overall management task is to select actions to minimize the expected sum of the discounted losses (Bertsekas 1996). As in the PLUS approach, MU-POMDP is composed of two phases: learning and planning. The planning phase is the same as in PLUS (Memarzadeh et al. 2014a), as the samples produced by the learning phase are fully compatible with PLUS planning, without the need for any adjustment. The learning phase represents the posterior distribution of all variables, conditional on all observations $Z$ and actions $A$ observed up to the present time. In principle, once each distribution in the probabilistic graphical model is analytically defined, posterior distributions could be computed theoretically. However, exact learning is generally not feasible for the graph presented in Figure 1, and approximate methods need to be adopted.

2.1. MCMC updating scheme

Extending the approach used in PLUS, learning in MU-POMDP is based on Gibbs sampling, which is an effective implementation of MCMC. Figure 2 reports a scheme of the inference process. In the figure, the upper bar indicates a collection of variables, from the beginning of the management process up to a specific time. For example, $\bar{S}_{k,t}$
indicates the state trajectory of component $k$ from the beginning of the management process up to time $t$. The superscript $(j)$ refers to the $j$-th sample generated by the MCMC algorithm. At component level, the sampling of states and model parameters is identical to that adopted by PLUS. At system level, the hyper-parameters are generated conditional on the sampled model parameters for all components. This task can be accomplished, if needed, by using the Metropolis-Hastings (M-H) approach (MacKay 2003). In summary, Figure 2 can be read as a recipe for generating samples from the joint posterior distribution: after initialization, state trajectories and model parameters are sampled for each component, then hyper-parameters are sampled at system level, and these steps are alternated. The burn-in phase is discarded and the remaining part of the random walk is kept.

Figure 2. The proposed MCMC sampling approach (Gibbs sampling of states and model parameters at component level; M-H sampling of hyper-parameters at system level).

2.2. Probabilistic models and inference

The graphical model in Figure 1 requires a specific assignment of marginal and conditional distributions for every random variable. A feasible set of distributions for MU-POMDP, inspired by Kemp et al. (2007), is defined as follows:

$$
\begin{aligned}
\alpha_T &\sim \mathrm{Exponential}(\lambda_T), &\quad \alpha_O &\sim \mathrm{Exponential}(\lambda_O) \\
\boldsymbol{\beta}_T &\sim \mathrm{Dirichlet}(\boldsymbol{\eta}_T), &\quad \boldsymbol{\beta}_O &\sim \mathrm{Dirichlet}(\boldsymbol{\eta}_O) \\
\mathbf{T}_k &\sim \mathrm{Dirichlet}(\alpha_T \boldsymbol{\beta}_T), &\quad \mathbf{O}_k &\sim \mathrm{Dirichlet}(\alpha_O \boldsymbol{\beta}_O) \\
\mathbf{T}_i &\perp \mathbf{T}_j \mid \alpha_T, \boldsymbol{\beta}_T, &\quad \mathbf{O}_i &\perp \mathbf{O}_j \mid \alpha_O, \boldsymbol{\beta}_O \\
S_{k,t} \mid S_{k,t-1}, A_{k,t} &\sim \mathrm{Multinomial}(\mathbf{T}_k) \\
Z_{k,t} \mid S_{k,t}, A_{k,t} &\sim \mathrm{Multinomial}(\mathbf{O}_k)
\end{aligned}
\tag{1}
$$

As in the PLUS approach, state trajectories are sampled using forward filtering backward sampling (FFBS) (Fruhwirth-Schnatter 2006). The Dirichlet distribution on model parameters is appropriate in this context because it is the conjugate prior for the multinomial distribution, which facilitates the implementation of the Gibbs approach: samples of model parameters can be easily generated, as in the PLUS algorithm. It is worth clarifying the role of hyper-parameters $\alpha$ and $\boldsymbol{\beta}$ in the definition of the prior distribution of model parameters. Matrix $\boldsymbol{\beta}$ defines the expected value of the corresponding model parameters, while scalar variable $\alpha$ affects the uncertainty of model parameters: a high value of $\alpha$ induces a low variance.

Figure 3. M-H algorithm for sampling hyper-parameters on transition probabilities (outline recoverable from the figure: initialize the hyper-parameters; at each iteration sample a proposal, compute the p-ratio and the q-ratio, and accept or reject the proposal based on their product; output the hyper-parameters).

Samples of hyper-parameters $\alpha_T$ and $\boldsymbol{\beta}_T$
can be generated following the recipe reported in Figure 3, based on the M-H algorithm (the corresponding procedure for $\alpha_O$ and $\boldsymbol{\beta}_O$ being identical, with obvious changes in the input variables). Input variables are the parameters defining the prior distribution ($\lambda_T$ and $\boldsymbol{\eta}_T$) and the transition probabilities for all components ($\mathbf{T}$), which, following the Gibbs scheme, are obtained by sampling. Inputs $\sigma_T$ and $c_T$ control the step size of the M-H proposal distribution in the direction of $\alpha_T$ and $\boldsymbol{\beta}_T$ respectively, while $J$ is the length of the Markov chain. In Figure 3, $\mathrm{Dirichlet}(\mathbf{x}; \mathbf{y})$ is the value assumed at $\mathbf{x}$ by the Dirichlet density with parameters $\mathbf{y}$. $P$ indicates the un-normalized joint distribution of hyper-parameters and model parameters that, following Eq. (1), reads:

$$
P(\mathbf{T}, \alpha_T, \boldsymbol{\beta}_T, \lambda_T, \boldsymbol{\eta}_T) = \lambda_T \exp(-\lambda_T \alpha_T) \times \mathrm{Dirichlet}(\boldsymbol{\beta}_T; \boldsymbol{\eta}_T) \times \prod_{k=1}^{K} \mathrm{Dirichlet}(\mathbf{T}_k; \alpha_T \boldsymbol{\beta}_T)
\tag{2}
$$

At any stage of the management process, this procedure provides samples of model parameters, hyper-parameters and states, to be used in the planning phase.

3. NUMERICAL VALIDATION

3.1. PLUS as alternative processing approach

We compare MU-POMDP performance with PLUS on a numerical example similar to that presented in Memarzadeh et al. (2014a).
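Looking back at the hyper-parameter update of Section 2.2, the following is a minimal sketch of one possible M-H sampler targeting Eq. (2). It is not the authors' implementation: the proposal distributions (a log-normal random walk on $\alpha_T$ with step `sigma`, and a Dirichlet centered on the current $\boldsymbol{\beta}_T$ with concentration `c`) are assumptions, the function names are hypothetical, and for brevity the sketch handles a single row of the transition matrix with strictly positive entries.

```python
import math
import random


def log_dirichlet(x, a):
    # Log-density of Dirichlet(a) evaluated at the probability vector x.
    return (math.lgamma(sum(a)) - sum(math.lgamma(ai) for ai in a)
            + sum((ai - 1.0) * math.log(xi) for xi, ai in zip(x, a)))


def sample_dirichlet(a, rng):
    # Draw from Dirichlet(a) by normalizing independent Gamma(a_i, 1) draws.
    g = [rng.gammavariate(ai, 1.0) for ai in a]
    total = sum(g)
    return [gi / total for gi in g]


def log_joint(rows, alpha, beta, lam, eta):
    # Log of Eq. (2) for one matrix row; rows[k] is that row for component k.
    conc = [alpha * b for b in beta]
    return (math.log(lam) - lam * alpha + log_dirichlet(beta, eta)
            + sum(log_dirichlet(r, conc) for r in rows))


def mh_hyper(rows, lam, eta, sigma, c, n_steps, rng):
    # Start from the prior means of alpha and beta (an arbitrary choice).
    alpha = 1.0 / lam
    beta = [e / sum(eta) for e in eta]
    lp = log_joint(rows, alpha, beta, lam, eta)
    chain = []
    for _ in range(n_steps):
        # Assumed proposals: log-normal walk on alpha, Dirichlet(c * beta) on beta.
        alpha_new = alpha * math.exp(sigma * rng.gauss(0.0, 1.0))
        beta_new = sample_dirichlet([c * b for b in beta], rng)
        if min(beta_new) > 0.0:  # skip proposals that underflowed to zero
            lp_new = log_joint(rows, alpha_new, beta_new, lam, eta)
            # Log q-ratio: reverse proposal density over forward proposal density.
            log_q = (math.log(alpha_new / alpha)
                     + log_dirichlet(beta, [c * b for b in beta_new])
                     - log_dirichlet(beta_new, [c * b for b in beta]))
            # Accept with probability min(1, p-ratio * q-ratio).
            if math.log(1.0 - rng.random()) < lp_new - lp + log_q:
                alpha, beta, lp = alpha_new, beta_new, lp_new
        chain.append((alpha, list(beta)))
    return chain


# Synthetic rows for five components (illustrative values only).
rng = random.Random(0)
true_beta = [0.57, 0.28, 0.15]
rows = [sample_dirichlet([200.0 * b for b in true_beta], rng) for _ in range(5)]
chain = mh_hyper(rows, lam=1.0 / 1000.0, eta=[28.5, 14.0, 7.5],
                 sigma=0.3, c=500.0, n_steps=2000, rng=rng)
```

In practice the early part of `chain` would be discarded as burn-in, and `sigma` and `c` tuned so that the acceptance rate is neither close to zero nor close to one.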
Figure 4 shows the graphical model of the PLUS algorithm used for comparison, which models transition and emission probabilities as identical for all components. For practical applications of PLUS, the additional layer of hyper-parameters is unnecessary and the prior of model parameters can be defined directly. However, in order to achieve a fair comparison between MU-POMDP and PLUS, we make use of the same arrangement of layers and the same values for $\lambda$ and $\boldsymbol{\eta}$, and of the same conditional distributions of hyper-parameters and model parameters, so as to obtain the same marginal distribution for model parameters across the two approaches.

3.2. Parameters for numerical investigation

For the purpose of validation, we consider a wind farm made up of 5 turbines of the same type placed in similar environmental conditions. The condition of each turbine is discretized into three possible states, where $s = 1$ refers to an intact structure, $s = 2$ to a damaged one, and $s = 3$ to the failure of the turbine. The agent receives observations from a set of four possible observations, where $z = 1$ suggests that the turbine is undamaged, $z = 2$ and $z = 3$ indicate two symptoms of damage, and $z = 4$ indicates the failure of the turbine. Three actions are available: Do-Nothing (DN), Repair (RE), and Visual Inspection (VI). When the agent chooses DN, the condition state of the turbine degrades owing to fatigue and aging, potentially causing a structural failure and a significant economic loss.
In turn, the agent can perform a costly intervention (i.e., RE) to avoid failure and improve the condition state of the turbines. VI provides a better measure of the condition state of the turbine (which evolves according to the degradation model, as for DN). Each time step is assumed to be six months, and the agent takes one action per turbine at each time step.

Figure 4. Probabilistic graphical models of PLUS.

Parameters of priors for hyper-parameters are fixed as follows:

$$
\lambda_T = \lambda_O = \frac{1}{1000}
$$

$$
\boldsymbol{\eta}_{T,\mathrm{DN}} = \boldsymbol{\eta}_{T,\mathrm{VI}} = \kappa \times \begin{bmatrix} 0.57 & 0.28 & 0.15 \\ 0 & 0.67 & 0.33 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
\boldsymbol{\eta}_{T,\mathrm{RE}} = \kappa \times \begin{bmatrix} 0.67 & 0.33 & 0 \\ 0.67 & 0.33 & 0 \\ 0.67 & 0.33 & 0 \end{bmatrix}
$$

$$
\boldsymbol{\eta}_{O,\mathrm{DN}} = \boldsymbol{\eta}_{O,\mathrm{RE}} = \kappa \times \begin{bmatrix} 0.57 & 0.28 & 0.15 & 0 \\ 0.15 & 0.57 & 0.28 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad
\boldsymbol{\eta}_{O,\mathrm{VI}} = \kappa \times \begin{bmatrix} 0.67 & 0.33 & 0 & 0 \\ 0.33 & 0 & 0.67 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

where subscripts report the action symbol, and $\kappa$ controls the skewness of the prior and has been fixed to 50, so that the corresponding coefficient of variation of the samples is about 0.26. Parameter $\lambda$ controls the correlation among the model parameters across components: as $\lambda$ decreases, the correlation increases, and it is about 75% given the values reported above. Entries in square brackets define the expected value of transition and emission probabilities: for example, the expected value of the probability that an undamaged turbine becomes damaged under DN is 28%. The costs for repair, visual inspection and down-time due to failure are assumed to be USD 10,000, USD 500, and USD 50,000, respectively. The discount factor is assumed to be $\gamma = 0.95$.
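To make the role of these priors concrete, the sketch below draws one farm from the hierarchy of Eq. (1) for the DN transition model, using the values of $\lambda$ and $\kappa$ given above. It is an illustration under stated assumptions rather than the authors' code: the helper names are hypothetical, each row of the matrix is treated as a separate Dirichlet, and entries with zero prior mean are kept at exactly zero by sampling only on the positive support.

```python
import random

KAPPA = 50.0            # prior concentration reported in the text
LAM = 1.0 / 1000.0      # rate of the Exponential prior on alpha
MEAN_T_DN = [[0.57, 0.28, 0.15],
             [0.00, 0.67, 0.33],
             [0.00, 0.00, 1.00]]


def dirichlet_on_support(params, rng):
    # Sample a Dirichlet over the strictly positive parameters only;
    # entries whose parameter is zero stay at probability zero.
    idx = [i for i, p in enumerate(params) if p > 0.0]
    g = [rng.gammavariate(params[i], 1.0) for i in idx]
    total = sum(g)
    out = [0.0] * len(params)
    for i, gi in zip(idx, g):
        out[i] = gi / total
    return out


def sample_farm(n_turbines, rng):
    # alpha ~ Exponential(rate LAM): a large alpha means similar turbines.
    alpha = rng.expovariate(LAM)
    # beta ~ Dirichlet(eta), row by row, with eta = KAPPA * prior mean.
    beta = [dirichlet_on_support([KAPPA * m for m in row], rng)
            for row in MEAN_T_DN]
    # T_k ~ Dirichlet(alpha * beta), drawn independently for each turbine.
    farm = [[dirichlet_on_support([alpha * b for b in row], rng)
             for row in beta]
            for _ in range(n_turbines)]
    return alpha, beta, farm


alpha, beta, farm = sample_farm(5, random.Random(1))
for k, T in enumerate(farm, start=1):
    print('turbine', k, [round(p, 3) for p in T[0]])
```

Printing the first row for each turbine shows the intended behavior of the hierarchy: the rows differ from turbine to turbine but scatter around the shared `beta`, more tightly as `alpha` grows.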
The initial belief state for all turbines is defined as $\mathbf{b}_0 = [0.8 \;\; 0.2 \;\; 0]$, which means that the agent believes that, at the beginning of the process, the turbines are in the intact state with 80% probability and in the damaged state with 20% probability.

3.3. Scheme for numerical validation

To investigate the performance of MU-POMDP and compare it with PLUS, we simulate the response of a system characterized by model $\boldsymbol{\Theta}^* = \{\boldsymbol{\theta}_1^*, \boldsymbol{\theta}_2^*, \ldots, \boldsymbol{\theta}_K^*\}$, where $\boldsymbol{\theta}_k^* = \{\mathbf{T}_k^*, \mathbf{O}_k^*\}$ defines the transition and emission probabilities for component $k$. The comparison is performed in terms of effectiveness of learning and planning. For learning, we evaluate the accuracy in inferring the condition state (i.e., the accuracy of beliefs). At time step $t$, the probability distribution of states for the entire farm is defined as $P(\mathbf{s}_{t+1} \mid \boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t)$, where $\mathbf{s}_t = \{s_{1,t}, s_{2,t}, \ldots, s_{K,t}\}$, $\mathbf{Z}_t = \{Z_{1,t}, Z_{2,t}, \ldots, Z_{K,t}\}$, and $\mathbf{A}_t = \{A_{1,t}, A_{2,t}, \ldots, A_{K,t}\}$. The corresponding distribution when the actual models are not known is $P(\mathbf{s}_{t+1} \mid \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F})$, where framework $\mathcal{F}$ indicates MU-POMDP or PLUS. This latter distribution can be approximated via Monte Carlo:

$$
P(\mathbf{s}_{t+1} \mid \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F}) \cong \frac{1}{N} \sum_{j=1}^{N} P(\mathbf{s}_{t+1} \mid \mathbf{Z}_t, \mathbf{A}_t, \boldsymbol{\Theta}^{(j)})
\tag{3}
$$

where the $N$ samples $\{\boldsymbol{\Theta}^{(j)}\}_{j=1}^{N} \sim p(\boldsymbol{\Theta} \mid \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F})$ are generated according to the posterior distribution within framework $\mathcal{F}$, using the scheme outlined in Figure 2. The error in the inference can be measured by the Kullback-Leibler (KL) divergence (Cover and Thomas 2006), which is a measure of the information lost when one distribution is used to approximate another:

$$
\varepsilon(\boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F}) = \mathrm{KL}\left( P(\mathbf{s}_{t+1} \mid \boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t) \,\|\, P(\mathbf{s}_{t+1} \mid \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F}) \right)
\tag{4}
$$

Function $\varepsilon(\boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F})$ depends on the specific framework (MU-POMDP or PLUS) and on the realization of model, actions, and observations. Although its expected value can be taken (as done in Memarzadeh et al. 2014b), in this paper we validate the effectiveness of MU-POMDP vs PLUS on a specific realization.
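The two ingredients of this error measure, the Monte Carlo average of Eq. (3) and the KL divergence of Eq. (4), can be sketched in a few lines. The function names are hypothetical and the numbers are made up for illustration; for brevity the example uses the three-state distribution of a single turbine rather than the joint distribution over the whole farm.

```python
import math


def kl_divergence(p, q):
    # KL(p || q) in nats; terms with p_i = 0 contribute nothing.
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # q excludes an outcome that p allows
            total += pi * math.log(pi / qi)
    return total


def monte_carlo_belief(per_sample_beliefs):
    # Eq. (3): average the state distributions obtained under each
    # posterior sample of the model parameters.
    n = len(per_sample_beliefs)
    size = len(per_sample_beliefs[0])
    return [sum(b[i] for b in per_sample_beliefs) / n for i in range(size)]


# Illustrative numbers only: belief under the true model, and beliefs
# computed under two posterior samples of the model.
true_belief = [0.75, 0.20, 0.05]
sampled = [[0.80, 0.20, 0.00], [0.60, 0.30, 0.10]]
approx_belief = monte_carlo_belief(sampled)
error = kl_divergence(true_belief, approx_belief)
```

The direction of the divergence follows Eq. (4): the belief under the true model comes first, so the error penalizes an approximate belief that assigns low probability to states the true model considers likely.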
To do so, we have sampled the farm model $\boldsymbol{\Theta}^*$ from the MU-POMDP priors outlined above, and actions $\mathbf{A}_t$ and observations $\mathbf{Z}_t$ consequently. For $t = 60$ and 5 turbines, we estimate via sampling that $\varepsilon(\boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F} = \mathrm{PLUS})$ = 5.47%, and $\varepsilon(\boldsymbol{\Theta}^*, \mathbf{Z}_t, \mathbf{A}_t, \mathcal{F} = \text{MU-POMDP})$ = 2.93%. As expected, since data are simulated from the MU-POMDP framework, the latter is more effective in identifying the correct beliefs. Figure 5 reports examples of the inference process, plotting samples for one entry in the transition matrix under DN, for components 1 and 2. The red star shows the value used for simulating the data, while the green points show the samples generated from the prior distribution under MU-POMDP (a), the posterior at $t = 60$ under MU-POMDP (b), and the posterior for PLUS (c). It is worth noting that variables are dependent under MU-POMDP and identical under PLUS (and that the generating model assumes similar but not identical values).

Figure 5. Examples of samples of model parameter (green dots) and exact value (red star) for MU-POMDP (a-b) and PLUS (c). Panel titles: (a) Prior, $\mathcal{F}$ = MU-POMDP; (b) Posterior, $\mathcal{F}$ = MU-POMDP; (c) Posterior, $\mathcal{F}$ = PLUS. Axes, in the figure's own notation: $P(S_{k,1} = 2 \mid S_{k-1,1} = 1, a_{k-1,1} = \mathrm{DN}, \mathcal{F})$ and $P(S_{k,2} = 2 \mid S_{k-1,2} = 1, a_{k-1,2} = \mathrm{DN}, \mathcal{F})$, both from 0 to 1. Legend: Samples, True Model.

In light of this, it is worth describing in detail how the different agents use the collected observations to infer the model parameters.
According to the MU-POMDP formulation, observations can be partitioned into two subsets. As shown in Figure 1, observations $\mathbf{Z}_k$, collected on component $k$, are particularly useful to infer model parameters $\boldsymbol{\theta}_k = \{\mathbf{T}_k, \mathbf{O}_k\}$, and we call them “direct measures”. On the other hand, observations $\mathbf{Z}_{j \neq k}$, collected on all components except the $k$-th one, are also useful for inferring $\boldsymbol{\theta}_k$, but only via the hyper-parameters $\alpha$ and $\boldsymbol{\beta}$, and we call them “indirect measures”. In the limit for $K$ and $t$ going to infinity, the set of indirect measures is equivalent to a perfect observation of the hyper-parameters. This, however, would not allow a perfect prediction of $\boldsymbol{\theta}_k$. On the other hand, for $t$ going to infinity and if all actions are sufficiently explored, the set of direct measures corresponds to observing $\boldsymbol{\theta}_k$ directly. Figure 4 shows that PLUS does not apply the distinction between direct and indirect measures: it puts all measures on the same level for the sake of inferring $\boldsymbol{\theta}$. We can quantify the effect of learning by measuring the KL divergence at different times: Table 1 reports the KL divergence for transition and emission probabilities, computed with a formula similar to Eq. (4). It can be noted how KL decays with time, and how that of MU-POMDP is less than that of PLUS.

Table 1. Comparison of KL of MU-POMDP and PLUS for transition and emission models

                        t = 0     t = 60
MU-POMDP: transition    0.0215    0.0348
PLUS: transition        0.0158    0.0161
MU-POMDP: emission      0.0039    0.0120
PLUS: emission          0.0037    0.0053

3.4.
Simulation of the planning phase

In the last numerical campaign, we investigate the economic impact of adopting MU-POMDP, showing how the more accurate learning algorithm, which accounts for discrepancies among the component models, allows for a more effective planning phase. Figure 6 shows the cost of operation and maintenance for the farm (i.e., the negative reward) as a function of the time step for (i) an agent with perfect knowledge of the actual model parameters (black line), (ii) an agent following MU-POMDP (red line), and (iii) an agent following PLUS (blue line). Estimates are based on 16 independent simulations in the time domain. Agent (i) represents a lower limit for the cost, as she has no uncertainty on the model parameters. For the specific example, the benefit of adopting MU-POMDP instead of PLUS can be quantified as about 2 K$ per turbine per time step, a value similar to the gap between perfect knowledge of the model and MU-POMDP. Of course, these values depend on the specific application, and are not constant in time. The coefficient of variation of the results is around 0.2.

Figure 6. Average cost of O&M per time and turbine for three agents. (Horizontal axis: time step t, from 0 to 100; vertical axis: cost per time step and turbine [K$], from 0 to 30; legend: True Model, PLUS, MU-POMDP.)

4. CONCLUSIONS

We have proposed the MU-POMDP framework using a hierarchical Bayesian modeling approach. It extends our previously proposed PLUS algorithm by allowing the selection of a level of similarity between the components of the system. The computational complexity of the MU-POMDP framework is higher than that of PLUS, as it requires an extra layer of hyper-parameters to model the dependence among components, and it scales linearly with the number of components. Specifically, MU-POMDP makes use of the M-H scheme, and this calls for numerical calibration, e.g.
in the selection of appropriate proposal distributions and burn-in phases. The practical efficiency of this framework will depend on its detailed numerical implementation. However, we have shown on a simple application that it has the potential for significant improvement with respect to other formulations. Furthermore, applications to systems with high costs for operation and maintenance, such as wind farms, easily justify the adoption of accurate and computationally complex frameworks.

ACKNOWLEDGEMENTS

This work is supported in part by the Pennsylvania Infrastructure Technology Alliance, a partnership of Carnegie Mellon, Lehigh University and the Commonwealth of Pennsylvania’s Department of Community and Economic Development (DCED), via grant PITA YR16 31571.1.9.1042204. The authors also acknowledge Kevin Wigell and EverPower Wind Holdings for their collaboration.

5. REFERENCES

Bertsekas, D.P. (1996). “Dynamic programming and optimal control” Athena Scientific, Belmont, MA.
Byon, E., Ntaimo, L., and Ding, Y. (2010). “Optimal maintenance strategies for wind turbine system under stochastic weather conditions” IEEE Transactions on Reliability, 59(2), 393-404.
Byon, E., and Ding, Y. (2010). “Season-dependent condition-based maintenance for a wind turbine using a partially observed Markov decision process” IEEE Transactions on Power Systems, 25(4), 1823-1834.
Carter, C.K., and Kohn, R. (1994). “On Gibbs sampling for state space models” Biometrika, 81(3), 541-553.
Cover, T.M., and Thomas, J.A. (2006). “Elements of information theory” John Wiley & Sons, Inc.
Fruhwirth-Schnatter, S. (2006). “Finite mixture and Markov switching models” Springer, New York.
Jaulmes, R., Pineau, J., and Precup, D. (2005a).
\u201CActive learning in partially observable Markov decision processes\u201D European Conf. on Machine Learning, Porto, Portugal, 601-608. Jaulmes, R., Pineau, J., and Precup, D. (2005b). \u201CLearning in non-stationary partially observable Markov decision processes\u201D Eur. Conf. on Machine Learning Workshop on Reinf. Learning in Non-Stationary Environments, Porto, Portugal. Jensen, F.V., and Nielsen, T.D. (2007). \u201CBayesian networks and decision graphs\u201D Springer. Kemp, C., Perfors, A., and Tenenbaum, J.B. (2007). \u201CLearning overhypotheses with hierarchical Bayesian models\u201D Developmental Science, 10(3), 307-331. MacKay, D.J.C. (2003). \u201CInformation theory, inference, and learning algorithms\u201D Cambridge University Press, Cambridge, UK. Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2013). \u201CProbabilistic learning and planning for optimal management of wind farms\u201D Proc. 9th Int. Workshop on Structural Health Monitoring, Stanford, CA, 2720-2728. Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2014a). \u201COptimal planning and learning in uncertain environments for the management of wind farms\u201D ASCE J. of Computing in Civil Engineering, DOI: 10.1061/(ASCE)CP.1943-5487.0000390. Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2014b). \u201CManaging systems made up by similar components: A probabilistic framework for the maintenance of wind farms\u201D Proc. 6th World Conf. on Structural Control and Monitoring, Barcelona, Spain. Nielsen, J.S., and Sorensen, J.D. (2012). \u201CMaintenance optimization for offshore wind turbines using POMDP\u201D Proc. 16th Conf. of Int. Federation for Information Processing on Reliability and Optimization of Structural Systems, 175-182. 
Ross, S., Pineau, J., Chaib-draa, B., and Kreitmann, P. (2011). \u201CA Bayesian approach for learning and planning in partially observable Markov decision processes\u201D J. Machine Learning Research, 12, 1729-1770. Sutton, R.S., and Barto, A.G. (1998). \u201CReinforcement learning: An introduction\u201D MIT Press, Cambridge, MA. Smallwood, R.D., and Sondik, E.J. (1973). \u201CThe optimal control of partially observable Markov processes over a finite horizon\u201D Operations Research, 21(5), 1071-1088. Sondik, E.J. (1978). \u201CThe optimal control of partially observable Markov processes over the infinite horizon\u201D Operations Research, 26(2), 282-304. "@en . "This collection contains the proceedings of ICASP12, the 12th International Conference on Applications of Statistics and Probability in Civil Engineering held in Vancouver, Canada on July 12-15, 2015. Abstracts were peer-reviewed and authors of accepted abstracts were invited to submit full papers. Full papers were also peer reviewed. The editor for this collection is Professor Terje Haukaas, Department of Civil Engineering, UBC Vancouver."@en . "Conference Paper"@en . "10.14288/1.0076244"@en . "eng"@en . "Unreviewed"@en . "Vancouver : University of British Columbia Library"@en . "Attribution-NonCommercial-NoDerivs 2.5 Canada"@en . "http://creativecommons.org/licenses/by-nc-nd/2.5/ca/"@en . "Faculty"@en . "Researcher"@en . "Hierarchical modeling of systems with similar components"@en . "Text"@en . "http://hdl.handle.net/2429/53395"@en .