UBC Theses and Dissertations
Learning and planning in structured worlds
Dearden, Richard W.
This thesis is concerned with the problem of how to make decisions in an uncertain world. We use a model of uncertainty based on Markov decision problems, and develop a number of algorithms for decision-making both for the planning problem, in which the model is known in advance, and for the reinforcement learning problem, in which the decision-making agent does not know the model and must learn to make good decisions by trial and error.

The basis for much of this work is the use of structured representations of problems. If a problem is represented in a structured way, we can compute or learn plans that take advantage of this structure for computational gains, because the structure allows us to perform abstraction. Rather than reasoning individually about each situation in which a decision must be made, abstraction allows us to group situations together and reason about a whole set of them in a single step. Our approach to abstraction has the additional advantage that we can dynamically change the level of abstraction: splitting a group of situations in two if they must be reasoned about separately to find an acceptable plan, or merging two groups together if they no longer need to be distinguished. We present two planning algorithms and one learning algorithm that use this approach.

A second idea we present in this thesis is a novel approach to the exploration problem in reinforcement learning. The problem is to select actions to perform given that we would like good performance both now and in the future. We can select the current best action, but this may prevent us from discovering that another action is better; or we can take an exploratory action, but we then risk performing poorly now as a result. Our Bayesian approach makes this tradeoff explicit by representing our uncertainty about the values of states and using this measure of uncertainty to estimate the value of the information we could gain by performing each action.
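The split-and-merge idea above can be illustrated with a toy state-abstraction loop: states are grouped into blocks, values are computed per block, and a block is split when its member states disagree on their one-step Bellman backup. This is a minimal sketch under simplifying assumptions; the toy MDP, all names, and the splitting criterion are illustrative, not the thesis's actual algorithms.

```python
# Toy deterministic MDP: state -> {action: (next_state, reward)}.
# Hypothetical example for illustration only.
MDP = {
    's0': {'a': ('s1', 0.0)},
    's1': {'a': ('s2', 0.0)},
    's2': {'a': ('s2', 1.0)},   # absorbing, rewarding state
}
GAMMA = 0.9

def backup(state, block_value, blocks):
    """One-step Bellman backup for a state, using block-level values."""
    def block_of(s):
        return next(b for b, members in blocks.items() if s in members)
    return max(r + GAMMA * block_value[block_of(ns)]
               for ns, r in MDP[state].values())

def refine(blocks, block_value):
    """Split any block whose member states disagree on their backup."""
    new_blocks = {}
    for name, members in blocks.items():
        groups = {}
        for s in members:
            key = round(backup(s, block_value, blocks), 6)
            groups.setdefault(key, []).append(s)
        for i, g in enumerate(groups.values()):
            new_blocks[f'{name}.{i}' if len(groups) > 1 else name] = g
    return new_blocks

# Start fully abstract: one block covering all states, value 0.
blocks = {'B': ['s0', 's1', 's2']}
values = {'B': 0.0}
blocks = refine(blocks, values)   # s2 separates from s0 and s1
```

A full implementation would iterate refinement and value computation until the partition stabilizes, and could merge blocks whose values re-converge; this sketch shows only a single splitting step.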
We present both model-free and model-based reinforcement learning algorithms that make use of this exploration technique. Finally, we show how these ideas fit together to produce a reinforcement learning algorithm that uses structure to represent both the problem being solved and the plan it learns, and that selects actions using our Bayesian approach to exploration.
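The value-of-information idea behind this Bayesian exploration can be sketched with Gaussian beliefs over action values: an action is worth exploring if learning its true value might change which action looks best. The myopic Monte Carlo estimate below, and all distributions and numbers in it, are illustrative assumptions, not the thesis's exact method.

```python
import random

random.seed(0)  # fixed seed so the Monte Carlo estimate is reproducible

def vpi(action, posteriors, n_samples=10_000):
    """Monte Carlo estimate of the value of perfect information for `action`.

    posteriors: {action: (mean, std)} Gaussian beliefs over Q-values
    (a hypothetical belief representation chosen for this sketch).
    Information has value only when it would change the chosen action.
    """
    means = {a: m for a, (m, _) in posteriors.items()}
    best = max(means, key=means.get)
    second = max((a for a in means if a != best), key=means.get)
    mu, sigma = posteriors[action]
    total = 0.0
    for _ in range(n_samples):
        q = random.gauss(mu, sigma)
        if action == best:
            # Learning the best action is worse than believed is valuable:
            # we would switch to the runner-up.
            total += max(means[second] - q, 0.0)
        else:
            # Learning a non-best action beats the incumbent is valuable.
            total += max(q - means[best], 0.0)
    return total / n_samples

# Choose the action maximizing mean value plus its exploration bonus.
posteriors = {'left': (1.0, 0.1), 'right': (0.9, 1.5)}
scores = {a: posteriors[a][0] + vpi(a, posteriors) for a in posteriors}
```

Here the uncertain action 'right' earns a large exploration bonus despite its lower mean, so it is selected; once its posterior narrows, the bonus shrinks and exploitation takes over.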