12th International Conference on Applications of Statistics and Probability in Civil Engineering, ICASP12
Vancouver, Canada, July 12-15, 2015

Hierarchical Modeling of Systems with Similar Components

Milad Memarzadeh
Doctoral Candidate, Dept. of Civil and Env. Eng., Carnegie Mellon University, Pittsburgh, USA

Matteo Pozzi
Assistant Professor, Dept. of Civil and Env. Eng., Carnegie Mellon University, Pittsburgh, USA

J. Zico Kolter
Assistant Professor, Dept. of Computer Science, Carnegie Mellon University, Pittsburgh, USA

ABSTRACT: Identifying optimal management policies for systems made up of similar components is a challenging task, due to dependence in the components' behavior. In this setting, observations collected on one component are also relevant for learning the behavior of others. Probabilistic graphical models allow for consistent inference using all available data, taking dependence among components into account, while optimizing system operation. In this paper we propose a framework for management of systems made up of similar components based on hierarchical Bayesian modeling, called Multiple Uncertain Partially Observable Markov Decision Processes (MU-POMDP), that overcomes some limitations of previously proposed approaches. We describe a detailed numerical algorithm to learn the system parameters within this framework and we investigate its performance with an example of management of a wind farm (i.e., the system) made up of turbines of the same type (i.e., the components).

1. INTRODUCTION

Optimal policies for operation and maintenance of systems can be identified under the assumption of independent components, or of components governed by the same model. However, it is challenging to represent systems made up of different but similar components.
In this case, component parameters should be treated as a set of interdependent random variables, and inference performed so that observations collected on any component are also relevant, to a certain degree, in the management of the others. Many infrastructure systems can be modeled using this framework.

Markov Decision Process (MDP) (Bertsekas 1996, Sutton and Barto 1998) and Partially Observable MDP (POMDP) (Smallwood and Sondik 1973, Sondik 1978) are classic frameworks for sequential operation. Specifically, POMDP assumes that the evolution of the component state follows the Markov property, but allows for partial and/or imperfect observation of that state. One limitation of these frameworks is that the stochastic modeling of the state evolution (i.e. the "transition probabilities") and, in the case of POMDP, of the observations (i.e. the "emission probabilities") is not affected by uncertainty, nor is it updated by data processing. However, in many applications, these models are uncertain, and different for each individual component. Ross et al. (2011) have proposed the Bayes-Adaptive POMDP (BA-POMDP) framework, which allows models to be treated as random variables and their distribution to be updated as data are analyzed during the management process. However, exact algorithms in this framework have high computational complexity and easily become intractable for systems with a high number of components and states.

Approximate methods for optimal management under model uncertainty have been proposed by Jaulmes et al. (2005a,b) and, recently, by ourselves (Memarzadeh et al. 2013, 2014a). The latter method, called PLUS (Planning and Learning in Uncertain dynamic Systems), is structured in two phases: learning and planning. In the learning phase, it makes use of Markov Chain Monte Carlo (MCMC) Gibbs sampling (Carter and Kohn 1994).
The planning phase is based on an approximation which neglects the exploratory value of future learning of the model, to make the algorithm efficient and tractable. Details about PLUS can be found in Memarzadeh et al. (2014a).

When applied to the management of a set of components, the learning approach of PLUS can be implemented in two alternative ways: independent models can be tuned for each component, or a single model can be updated by processing data from all components. While these two implementations are suitable for some applications, they are not suitable for systems made up of components that we assume to be similar, but not identical. Following a preliminary publication on systems with similar components (Memarzadeh et al. 2014b), in this paper we generalize PLUS to a framework we call Multiple Uncertain POMDP (MU-POMDP), which is based on hierarchical Bayesian modeling. This framework allows assuming specific levels of similarity among components, and consistently processing observations collected at system level.

The motivation for this research is to improve operation and maintenance of wind farms. MDP and POMDP have been applied to management in this field (Byon et al. 2010, Byon and Ding 2010, Nielsen and Sorensen 2012). Turbines can be understood as components, and their deterioration modeled as a stochastic process, depending on the actions taken. Epistemic uncertainties on the deterioration of different components can be assumed to be dependent, as turbines are instances of the same structural model. On the other hand, differences in component construction and position lead to slight or significant dissimilarities among deterioration processes. Similar considerations can be made on the monitoring systems providing observations about the condition state. For these reasons, MU-POMDP can be a suitable framework for this application.
The following parts of this paper present the MU-POMDP framework (Section 2), and compare its performance with that of PLUS on a numerical example (Section 3) before drawing conclusions (Section 4).

2. THE MU-POMDP SCHEME

The MU-POMDP scheme targets the management of a set of components, each defined by a POMDP whose transition and emission probabilities are modeled as dependent random variables defined by hyper-parameters. Figure 1 shows the graphical model of the MU-POMDP, where circles define random variables, squares decision variables, diamonds utility variables, dots fixed parameters, and arrows define dependence among variables. In that graph, $S$ indicates the component state, $A$ the maintenance action, $Z$ the available observation and $R$ the monetary reward (or loss, if the value is negative). Subscript "$k,t$" refers the variable to component $k$ at time $t$. Variables $Z$ are shaded to indicate that they can be directly observed. We consider a $K$-component system. For each $k = 1, \dots, K$, model parameter $T_k$ indicates the transition probability of component $k$, and model parameter $O_k$ indicates the corresponding emission probability.

Figure 1. MU-POMDP scheme for a two-component system represented as a probabilistic graphical model.

Additional layers model the prior distributions: hyper-parameters are marked as $\alpha_T$, $\beta_T$, $\alpha_O$ and $\beta_O$ in Figure 1. The first two define the dependence in the transitions, while the latter two define that of the emissions. While model parameters are different for each component, hyper-parameters are common to the entire system. Parameters $\lambda_T$ and $\mu_T$, $\lambda_O$ and $\mu_O$ define the distribution of those hyper-parameters. As apparent in Figure 1, the model parameters (e.g. $T_1$ and $T_2$) of different components are not marginally independent, because of the common hyper-parameter parents.
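The marginal dependence between components can be checked with a small simulation. The sketch below is an illustration, not the authors' implementation: it assumes the Exponential and Dirichlet hierarchy that Section 2.2 specifies, looks at a single row of the transition matrix (one state-action pair), and borrows the prior values of Section 3.2 (rate 1/1000, scale 50, expected row [0.57, 0.28, 0.15]). The function names and the sample size are hypothetical choices made for this example.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw one probability vector from a Dirichlet via normalized Gamma draws."""
    while True:
        draws = [rng.gammavariate(c, 1.0) for c in concentration]
        total = sum(draws)
        if total > 0.0:  # guard against underflow for very small concentrations
            return [d / total for d in draws]


def sample_two_components(rng, rate, prior_conc):
    """Sample the shared hyper-parameters, then one transition row per component."""
    alpha = rng.expovariate(rate)              # scalar hyper-parameter (assumed Exponential)
    beta = sample_dirichlet(prior_conc, rng)   # expected row, shared by both components
    conc = [alpha * b for b in beta]
    return sample_dirichlet(conc, rng), sample_dirichlet(conc, rng)


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)
prior_conc = [50 * m for m in (0.57, 0.28, 0.15)]
pairs = [sample_two_components(rng, 1 / 1000, prior_conc) for _ in range(4000)]

# Entry 1 of the row is the probability of moving from intact to damaged.
corr = pearson([row_1[1] for row_1, _ in pairs], [row_2[1] for _, row_2 in pairs])
print(f"prior correlation between the two components: {corr:.2f}")
```

Because both rows are drawn around the same sampled expected row, the estimated correlation should come out clearly positive, of the same order as the correlation of about 75% quoted in Section 3.2. Sampling the two rows from independent hyper-parameters instead would drive it towards zero.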
Consequently, observations on any component, by affecting the knowledge of the hyper-parameters, affect in turn the model parameters of other components as well.

The overall management task is to select actions to minimize the expected sum of the discounted losses (Bertsekas 1996). As in the PLUS approach, MU-POMDP is composed of two phases: learning and planning. The planning phase is the same as in PLUS (Memarzadeh et al. 2014a), as the samples produced by the learning phase are fully compatible with PLUS planning, without the need for any adjustment. The learning phase represents the posterior distribution of all variables, conditional on all observations $Z$ and actions $A$ observed up to the present time. In principle, once each distribution in the probabilistic graphical model is analytically defined, posterior distributions could be computed exactly. However, exact learning is generally not feasible for the graph presented in Figure 1, and approximate methods need to be adopted.

2.1. MCMC updating scheme

Extending the approach used in PLUS, learning in MU-POMDP is based on Gibbs sampling, which is an effective implementation of MCMC. Figure 2 reports a scheme of the inference process. In the figure, the upper bar indicates a collection of variables, from the beginning of the management process up to a specific time. For example, $\bar{S}_{k,t}$ indicates the state trajectory of component $k$ up to time $t$. The superscript $(i)$ refers to the $i$-th sample generated by the MCMC algorithm. At component level, the sampling of states and model parameters is identical to that adopted by PLUS. At system level, the hyper-parameters are generated conditional on the sampled model parameters for all components. This task can be accomplished, if needed, by using the Metropolis-Hastings (M-H) approach (MacKay 2003).
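A minimal sketch of a system-level M-H update, under stated assumptions, may help fix ideas. It targets the un-normalized density implied by an exponential prior on the scalar hyper-parameter, a Dirichlet prior on the expected row, and Dirichlet-distributed component rows (the hierarchy detailed in Section 2.2), for one row of the transition matrix. The proposal distributions (a log-normal random walk on the scalar and a Dirichlet centred on the current expected row), the names `step_alpha` and `precision_beta`, and the toy rows are assumptions of this example and are not taken from the paper.

```python
import math
import random


def log_dirichlet(x, conc):
    """Log-density of Dirichlet(conc) at probability vector x (all entries > 0)."""
    return (math.lgamma(sum(conc)) - sum(math.lgamma(c) for c in conc)
            + sum((c - 1.0) * math.log(v) for c, v in zip(conc, x)))


def sample_dirichlet(conc, rng):
    """Draw one probability vector from a Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(c, 1.0) for c in conc]
    total = sum(draws)
    return [d / total for d in draws]


def log_target(alpha, beta, rows, rate, prior_conc):
    """Log of the un-normalized joint density of hyper-parameters and sampled rows."""
    value = math.log(rate) - rate * alpha + log_dirichlet(beta, prior_conc)
    conc = [alpha * b for b in beta]
    return value + sum(log_dirichlet(row, conc) for row in rows)


def mh_hyper(rows, rate, prior_conc, n_steps, step_alpha, precision_beta, rng):
    """Metropolis-Hastings chain over (alpha, beta) given one row per component."""
    alpha = 1.0 / rate                                   # start at the prior mean
    beta = [c / sum(prior_conc) for c in prior_conc]
    current = log_target(alpha, beta, rows, rate, prior_conc)
    chain = []
    for _ in range(n_steps):
        alpha_new = alpha * math.exp(step_alpha * rng.gauss(0.0, 1.0))
        beta_new = sample_dirichlet([precision_beta * b for b in beta], rng)
        if min(beta_new) > 0.0:  # otherwise treat the proposal as rejected
            proposed = log_target(alpha_new, beta_new, rows, rate, prior_conc)
            # Hastings correction (q-ratio) for the two asymmetric proposals.
            log_q = (math.log(alpha_new / alpha)
                     + log_dirichlet(beta, [precision_beta * b for b in beta_new])
                     - log_dirichlet(beta_new, [precision_beta * b for b in beta]))
            log_accept = proposed - current + log_q
            if log_accept >= 0.0 or rng.random() < math.exp(log_accept):
                alpha, beta, current = alpha_new, beta_new, proposed
        chain.append((alpha, list(beta)))
    return chain


# Toy input: one sampled transition row for each of five components.
rows = [[0.60, 0.27, 0.13], [0.55, 0.30, 0.15], [0.58, 0.26, 0.16],
        [0.52, 0.31, 0.17], [0.61, 0.27, 0.12]]
chain = mh_hyper(rows, rate=0.01, prior_conc=[1.0, 1.0, 1.0], n_steps=3000,
                 step_alpha=0.3, precision_beta=200.0, rng=random.Random(1))
kept = chain[1000:]  # discard the burn-in phase
beta_mean = [sum(b[i] for _, b in kept) / len(kept) for i in range(3)]
print("posterior mean of the expected row:", [round(v, 3) for v in beta_mean])
```

In a full Gibbs sweep this update would alternate with the component-level sampling of states and model parameters, with `rows` replaced at every sweep by the latest sampled model parameters. Working with log-densities avoids numerical underflow when the product over components becomes very small.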
In summary, Figure 2 can be read as a recipe for generating samples from the joint posterior distribution: after initialization, state trajectories and model parameters are sampled for each component, then hyper-parameters are sampled at system level, and these steps are alternated. The burn-in phase is discarded and the remaining part of the random walk is kept.

Figure 2. The proposed MCMC sampling approach (Gibbs sampling at component level for states and model parameters; M-H sampling at system level for hyper-parameters).

2.2. Probabilistic models and inference

The graphical model in Figure 1 requires a specific assignment of marginal and conditional distributions for every random variable. A feasible set of distributions for MU-POMDP, inspired by Kemp et al. (2007), is defined as follows:

$$
\begin{aligned}
\alpha_T &\sim \mathrm{Exponential}(\lambda_T), &\quad \alpha_O &\sim \mathrm{Exponential}(\lambda_O) \\
\beta_T &\sim \mathrm{Dirichlet}(\mu_T), &\quad \beta_O &\sim \mathrm{Dirichlet}(\mu_O) \\
T_k &\sim \mathrm{Dirichlet}(\alpha_T \beta_T), &\quad O_k &\sim \mathrm{Dirichlet}(\alpha_O \beta_O) \\
T_i &\perp T_j \mid \alpha_T, \beta_T, &\quad O_i &\perp O_j \mid \alpha_O, \beta_O \\
S_{k,t} \mid S_{k,t-1}, A_{k,t} &\sim \mathrm{Multinomial}(T_k) \\
Z_{k,t} \mid S_{k,t}, A_{k,t} &\sim \mathrm{Multinomial}(O_k)
\end{aligned}
\tag{1}
$$

As in the PLUS approach, state trajectories are sampled using forward filtering backward sampling (FFBS) (Fruhwirth-Schnatter 2006). The Dirichlet distribution on model parameters is appropriate in this context, because it is the conjugate prior for the multinomial distribution, and this facilitates the implementation of the Gibbs approach. Because of this, samples of model parameters can be easily generated, as in the PLUS algorithm.

It is worth clarifying the role of hyper-parameters $\alpha$ and $\beta$ in the definition of the prior distribution of model parameters. Matrix $\beta$ defines the expected value of the corresponding model parameters, while scalar variable $\alpha$ affects the uncertainty of model parameters: a high value of $\alpha$ induces a low variance.

Figure 3. M-H algorithm for sampling hyper-parameters on transition probabilities. (Pseudo-code outline: read the inputs; initialize the hyper-parameters; at each iteration, sample a proposal, compute the p-ratio and the q-ratio, set accept = p-ratio × q-ratio, and sample to accept or reject the proposal; output the hyper-parameters.)

Samples of hyper-parameters $\alpha_T$ and $\beta_T$ can be generated following the recipe reported in Figure 3, based on the M-H algorithm (the corresponding procedure for $\alpha_O$ and $\beta_O$ being identical, with obvious changes in the input variables). Input variables are the parameters defining the prior distribution ($\lambda_T$ and $\mu_T$) and the transition probabilities for all components ($T$) which, following Gibbs, are obtained by sampling. Inputs $\sigma_\alpha$ and $\sigma_\beta$ control the step size of the M-H proposal distribution in the direction of $\alpha_T$ and $\beta_T$ respectively, while $J$ is the length of the Markov chain. In Figure 3, $\mathrm{Dirichlet}(\mathbf{x}; \mathbf{y})$ is the value assumed by the Dirichlet distribution with parameters $\mathbf{y}$ at $\mathbf{x}$. $f$ indicates the un-normalized joint distribution of hyper-parameters and model parameters that, following Eq. (1), reads:

$$
f(T, \alpha_T, \beta_T, \lambda_T, \mu_T) = \lambda_T \exp(-\lambda_T \alpha_T) \times \mathrm{Dirichlet}(\beta_T; \mu_T) \times \prod_{k=1}^{K} \mathrm{Dirichlet}(T_k; \alpha_T \beta_T)
\tag{2}
$$

At any stage during the management process, this procedure provides samples of model parameters, hyper-parameters and states, to be used in the planning phase.

3. NUMERICAL VALIDATION

3.1. PLUS as alternative processing approach

We compare MU-POMDP performance with PLUS on a numerical example similar to that presented in Memarzadeh et al. (2014a). Figure 4 shows the graphical model for the PLUS algorithm used for comparison, which models transition and emission for all components as identical. It should be noted that, for practical applications of PLUS, the additional layer of hyper-parameters is unnecessary and the prior of model parameters can be defined directly. However, in order to achieve a fair comparison between MU-POMDP and PLUS, we make use of the same arrangement of layers and values for $\lambda$, $\mu$, and for the conditional distributions of hyper-parameters and model parameters, to get the same marginal distribution for model parameters across the two approaches.

3.2. Parameters for numerical investigation

For the purpose of validation, we consider a wind farm made up of 5 turbines of the same type placed in similar environmental conditions. The condition of each turbine is discretized into three possible states, where $s = 1$ refers to an intact structure, $s = 2$ to a damaged one, and $s = 3$ to the failure of the turbine. The agent receives observations from a set of four possible observations, where $z = 1$ suggests that the turbine is undamaged, $z = 2$ and $z = 3$ indicate two symptoms of damage, and $z = 4$ indicates the failure of the turbine. Three actions are available: Do-Nothing (DN), Repair (RE), and Visual Inspection (VI). When the agent chooses DN, the condition state of the turbine degrades owing to fatigue and aging, potentially causing a structural failure and a relevant economic loss. In turn, the agent can perform a costly intervention (i.e., RE) to avoid failure and improve the condition state of the turbines. VI better measures the condition state of the turbine (which evolves according to the degradation model, as for DN). Each time step is assumed to be six months, and the agent takes one action per turbine at each time step.

Figure 4. Probabilistic graphical models of PLUS.

Parameters of priors for hyper-parameters are fixed as follows:

$$
\lambda_T = \lambda_O = \frac{1}{1000}
$$

$$
\mu_{T,\mathrm{DN},\mathrm{VI}} = c \times \begin{bmatrix} 0.57 & 0.28 & 0.15 \\ 0 & 0.67 & 0.33 \\ 0 & 0 & 1 \end{bmatrix},
\qquad
\mu_{T,\mathrm{RE}} = c \times \begin{bmatrix} 0.67 & 0.33 & 0 \\ 0.67 & 0.33 & 0 \\ 0.67 & 0.33 & 0 \end{bmatrix}
$$

$$
\mu_{O,\mathrm{DN},\mathrm{RE}} = c \times \begin{bmatrix} 0.57 & 0.28 & 0.15 & 0 \\ 0.15 & 0.57 & 0.28 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},
\qquad
\mu_{O,\mathrm{VI}} = c \times \begin{bmatrix} 0.67 & 0.33 & 0 & 0 \\ 0.33 & 0 & 0.67 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
$$

where subscripts report the action symbol, and $c$ controls the skewness of the prior and has been fixed to 50, so that the corresponding coefficient of variation of the samples is about 0.26. Parameter $\lambda$ controls the correlation among the model parameters across components: as $\lambda$ decreases, the correlation increases, and it is about 75% given the values reported above. Entries in square brackets define the expected value of transition and emission probabilities: for example, the expected value of the probability that the undamaged turbine becomes damaged under DN is 28%. The costs for repair, visual inspection and down-time due to failure are assumed to be US $10,000, $500, and $50,000, respectively. The discount factor is assumed to be $\gamma = 0.95$. The initial belief state for all turbines is defined as $b_0 = [0.8 \;\; 0.2 \;\; 0]$, which means that the agent believes that, at the beginning of the process, the turbines are in the intact state with 80% probability and in the damaged state with 20% probability.

3.3. Scheme for numerical validation

To investigate the performance of MU-POMDP and compare it with PLUS, we simulate the response of a system characterized by model $\Theta^* = \{\theta_1^*, \theta_2^*, \dots, \theta_K^*\}$, where $\theta_k^* = (T_k^*, O_k^*)$ defines transition and emission probabilities for component $k$. Comparison is performed in terms of effectiveness of learning and planning.

For learning, we evaluate the accuracy in inferring the condition state (i.e. the accuracy of beliefs). At time step $t$, the probability distribution of states for the entire farm is defined as $P(\mathbf{s}_{t+1} \mid \Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t)$, where $\mathbf{s}_t = \{s_{1,t}, s_{2,t}, \dots, s_{K,t}\}$, $\bar{\mathbf{z}}_t = \{\bar{Z}_{1,t}, \bar{Z}_{2,t}, \dots, \bar{Z}_{K,t}\}$, and $\bar{\mathbf{a}}_t = \{\bar{A}_{1,t}, \bar{A}_{2,t}, \dots, \bar{A}_{K,t}\}$. Corresponding distributions not knowing the actual models are $P(\mathbf{s}_{t+1} \mid \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F})$, where framework $\mathcal{F}$ indicates MU-POMDP or PLUS. This latter distribution can be approximated via Monte Carlo:

$$
P(\mathbf{s}_{t+1} \mid \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F}) \approx \frac{1}{N} \sum_{i=1}^{N} P(\mathbf{s}_{t+1} \mid \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \Theta^{(i)})
\tag{3}
$$

where samples $\Theta^{(i)} \sim P(\Theta \mid \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F})$ are generated according to the posterior distribution within framework $\mathcal{F}$, using the scheme outlined in Figure 2. Error in the inference can be measured by the Kullback-Leibler (KL) divergence (Cover and Thomas 2006), which is a measure of the information lost when one distribution is used to approximate another:

$$
e(\Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F}) = \mathrm{KL}\left[ P(\mathbf{s}_{t+1} \mid \Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t),\; P(\mathbf{s}_{t+1} \mid \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F}) \right]
\tag{4}
$$

Function $e(\Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F})$ depends on the specific framework (MU-POMDP or PLUS) and on the realization of model, actions and observations. Although its expected value can be taken (as done in Memarzadeh et al. 2014b), in this paper we validate the effectiveness of MU-POMDP vs PLUS on a specific realization. To do so, we have sampled farm model $\Theta^*$ from the MU-POMDP priors outlined above, and actions $\bar{\mathbf{a}}_t$ and observations $\bar{\mathbf{z}}_t$ consequently. For $t = 60$ and 5 turbines, we estimate via sampling that $e(\Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F} = \mathrm{PLUS})$ = 5.47%, and $e(\Theta^*, \bar{\mathbf{z}}_t, \bar{\mathbf{a}}_t, \mathcal{F} = \mathrm{MU\text{-}POMDP})$ = 2.93%. As expected, since data are simulated from the MU-POMDP framework, the latter is more effective in identifying the correct beliefs.

Figure 5 reports examples of the inference process, plotting samples for one entry in the transition matrix under DN, for components 1 and 2. The red star shows the value used for simulating the data, while the green points show the samples generated from the prior distribution under MU-POMDP (a), the posterior at $t = 60$ (b), and the posterior for PLUS (c). It is worth noting that variables are dependent under MU-POMDP and identical under PLUS (and that the generating model assumes similar but not identical values).

Figure 5. Examples of samples of model parameter (green dots) and exact value (red star) for MU-POMDP (a-b) and PLUS (c).
[Panels: (a) Prior, F = MU-POMDP; (b) Posterior, F = MU-POMDP; (c) Posterior, F = PLUS. Axes, from 0 to 1: P(S_{k,1} = 2 | S_{k-1,1} = 1, a_{k-1,1} = DN, F) and P(S_{k,2} = 2 | S_{k-1,2} = 1, a_{k-1,2} = DN, F). Legend: Samples, True Model.]

In light of this, it is worth describing in detail how different agents consider the collected observations, for the sake of inferring the model parameters. According to the MU-POMDP formulation, observations can be partitioned into two subsets. As shown in Figure 1, observations $\bar{\mathbf{z}}_k$, collected on component $k$, are particularly useful to infer model parameters $\theta_k = (T_k, O_k)$, and we can call them "direct measures". On the other hand, observations $\bar{\mathbf{z}}_{-k}$, collected on all components except the $k$-th one, are also useful for inferring $\theta_k$, but only via the hyper-parameters $\alpha$ and $\beta$, and we call them "indirect measures". In the limit for $K$ and $t$ going to infinity, the set of indirect measures is equivalent to a perfect observation of the hyper-parameters. This, however, would not allow getting a perfect prediction of $\theta_k$. On the other hand, for $t$ going to infinity and if all actions are sufficiently explored, the set of direct measures corresponds to observing $\theta_k$ directly. Figure 4 shows that PLUS does not apply the distinction between direct and indirect measures: it puts all measures on the same level, for the sake of inferring $\theta$.

We can quantify the effect of learning by measuring the KL at different times: Table 1 reports the KL for transition and emission probabilities, computed with a formula similar to Eq. (4). It can be noted how KL decays with time, and how that of MU-POMDP is less than that of PLUS.

Table 1. Comparison of KL of MU-POMDP and PLUS for transition and emission models

                         t = 0     t = 60
MU-POMDP: transition     0.0215    0.0348
PLUS: transition         0.0158    0.0161
MU-POMDP: emission       0.0039    0.0120
PLUS: emission           0.0037    0.0053

3.4. Simulation of the planning phase

In the last numerical campaign, we investigate the economic impact of adopting MU-POMDP, showing how the more accurate learning algorithm, which accounts for discrepancies in the component models, allows for a more effective planning phase. Figure 6 shows the cost of operation and maintenance for the farm (i.e. the negative reward) as a function of the time step for (i) an agent with perfect knowledge about the actual model parameters (black line), (ii) an agent following MU-POMDP (red line), and (iii) an agent following PLUS (blue line). Estimates are based on 16 independent simulations in the time domain. Agent (i) represents a lower limit for the cost, as she has no uncertainty on the model parameters. For the specific example, the benefit of adopting MU-POMDP instead of PLUS can be quantified as about 2K$ per turbine per time step, which is a value similar to the gap between perfect knowledge of the model and MU-POMDP. Of course, these values depend on the specific application, and are not constant in time. The coefficient of variation of the results is around 0.2.

Figure 6. Average cost of O&M per time and turbine for three agents. (Axes: time step t, from 0 to 100; cost per time step and turbine [K$]. Curves: True Model, PLUS, MU-POMDP.)

4. CONCLUSIONS

We have proposed the MU-POMDP framework using a hierarchical Bayesian modeling approach. It extends our previously proposed PLUS algorithm by allowing the selection of a level of similarity between components of the system. The computational complexity of the MU-POMDP framework is higher than that of PLUS, as it requires an extra layer of hyper-parameters to model the dependence among components, and it is linear in the number of components.
Specifically, MU-POMDP makes use of the M-H scheme, and this requires numerical calibration, e.g. in the selection of appropriate proposal distributions and burn-in phases. The practical efficiency of this framework will depend on its detailed numerical implementation. However, we have shown on a simple application that it has the potential for significant improvement with respect to other formulations. Furthermore, applications to systems with high costs for operation and maintenance, such as wind farms, easily justify the adoption of accurate and computationally complex frameworks.

ACKNOWLEDGEMENTS

This work is supported in part by the Pennsylvania Infrastructure Technology Alliance, a partnership of Carnegie Mellon, Lehigh University and the Commonwealth of Pennsylvania's Department of Community and Economic Development (DCED), via grant PITA YR16 31571.1.9.1042204. The authors also acknowledge Kevin Wigell and EverPower Wind Holdings for their collaboration.

5. REFERENCES

Bertsekas, D.P. (1996). "Dynamic programming and optimal control" Athena Scientific, Belmont, MA.

Byon, E., Ntaimo, L., and Ding, Y. (2010). "Optimal maintenance strategies for wind turbine system under stochastic weather conditions" IEEE Transactions on Reliability, 59(2), 393-404.

Byon, E., and Ding, Y. (2010). "Season-dependent condition-based maintenance for a wind turbine using a partially observed Markov decision process" IEEE Transactions on Power Systems, 25(4), 1823-1834.

Carter, C.K., and Kohn, R. (1994). "On Gibbs sampling for state space models" Biometrika, 81(3), 541-553.

Cover, T.M., and Thomas, J.A. (2006). "Elements of information theory" John Wiley & Sons, Inc.

Fruhwirth-Schnatter, S. (2006). "Finite mixture and Markov switching models" Springer, New York.

Jaulmes, R., Pineau, J., and Precup, D. (2005a). "Active learning in partially observable Markov decision processes" European Conf. on Machine Learning, Porto, Portugal, 601-608.

Jaulmes, R., Pineau, J., and Precup, D. (2005b). "Learning in non-stationary partially observable Markov decision processes" Eur. Conf. on Machine Learning Workshop on Reinf. Learning in Non-Stationary Environments, Porto, Portugal.

Jensen, F.V., and Nielsen, T.D. (2007). "Bayesian networks and decision graphs" Springer.

Kemp, C., Perfors, A., and Tenenbaum, J.B. (2007). "Learning overhypotheses with hierarchical Bayesian models" Developmental Science, 10(3), 307-331.

MacKay, D.J.C. (2003). "Information theory, inference, and learning algorithms" Cambridge University Press, Cambridge, UK.

Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2013). "Probabilistic learning and planning for optimal management of wind farms" Proc. 9th Int. Workshop on Structural Health Monitoring, Stanford, CA, 2720-2728.

Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2014a). "Optimal planning and learning in uncertain environments for the management of wind farms" ASCE J. of Computing in Civil Engineering, DOI: 10.1061/(ASCE)CP.1943-5487.0000390.

Memarzadeh, M., Pozzi, M., and Kolter, J.Z. (2014b). "Managing systems made up by similar components: A probabilistic framework for the maintenance of wind farms" Proc. 6th World Conf. on Structural Control and Monitoring, Barcelona, Spain.

Nielsen, J.S., and Sorensen, J.D. (2012). "Maintenance optimization for offshore wind turbines using POMDP" Proc. 16th Conf. of Int. Federation for Information Processing on Reliability and Optimization of Structural Systems, 175-182.

Ross, S., Pineau, J., Chaib-draa, B., and Kreitmann, P. (2011). "A Bayesian approach for learning and planning in partially observable Markov decision processes" J. Machine Learning Research, 12, 1729-1770.

Sutton, R.S., and Barto, A.G. (1998). "Reinforcement learning: An introduction" MIT Press, Cambridge, MA.

Smallwood, R.D., and Sondik, E.J. (1973). "The optimal control of partially observable Markov processes over a finite horizon" Operations Research, 21(5), 1071-1088.

Sondik, E.J. (1978). "The optimal control of partially observable Markov processes over the infinite horizon" Operations Research, 26(2), 282-304.
Item Metadata
Title | Hierarchical modeling of systems with similar components |
Creator | Memarzadeh, Milad; Pozzi, Matteo; Kolter, J. Zico |
Contributor | International Conference on Applications of Statistics and Probability (12th : 2015 : Vancouver, B.C.) |
Date Issued | 2015-07 |
Genre | Conference Paper |
Type | Text |
Language | eng |
Notes | This collection contains the proceedings of ICASP12, the 12th International Conference on Applications of Statistics and Probability in Civil Engineering held in Vancouver, Canada on July 12-15, 2015. Abstracts were peer-reviewed and authors of accepted abstracts were invited to submit full papers. Also full papers were peer reviewed. The editor for this collection is Professor Terje Haukaas, Department of Civil Engineering, UBC Vancouver. |
Date Available | 2015-05-25 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0076244 |
URI | http://hdl.handle.net/2429/53395 |
Affiliation | Non UBC |
Citation | Haukaas, T. (Ed.) (2015). Proceedings of the 12th International Conference on Applications of Statistics and Probability in Civil Engineering (ICASP12), Vancouver, Canada, July 12-15. |
Peer Review Status | Unreviewed |
Scholarly Level | Faculty Researcher |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |