UBC Theses and Dissertations


Taking advantage of common assumptions in policy optimization and reinforcement learning

Lavington, Jonathan Wilder

Abstract

This work considers training conditional probability distributions, called policies, in simulated environments via gradient-based optimization methods. It begins by investigating the effects that complex model classes have on settings where a policy is learned by imitating expert data gathered through repeated environmental interaction. Next, it discusses how to build a gradient-based optimizer tailored specifically to policy optimization settings where querying gradient information is expensive. We then consider policy optimization settings that contain imperfect expert demonstrations, and design an algorithm that uses additional information available during training to improve both the policy's performance at test time and the efficiency of learning. Lastly, we consider how to generate behavioral data that satisfies hard constraints by combining learned inference artifacts with a special variant of sequential Monte Carlo.
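The imitation setting described above, learning a policy from expert data, can be illustrated with a minimal behavioral-cloning sketch. This is a hedged toy example, not code from the thesis: the environment (4 discrete states, 2 actions), the hypothetical deterministic expert, and the softmax policy class are all assumptions made for illustration. The policy's logits are fit by gradient descent on the negative log-likelihood of the expert's actions.

```python
import numpy as np

# Toy setup (assumed for illustration): 4 discrete states, 2 actions.
# The hypothetical expert picks action 0 in states 0-1 and action 1 in states 2-3.
rng = np.random.default_rng(0)
n_states, n_actions = 4, 2

# Expert dataset of (state, action) pairs gathered from interaction.
states = rng.integers(0, n_states, size=500)
actions = (states >= 2).astype(int)

W = np.zeros((n_states, n_actions))  # policy logits, one row per state

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

lr = 0.5
for _ in range(200):
    probs = softmax(W[states])                        # (N, n_actions)
    grad = probs.copy()
    grad[np.arange(len(actions)), actions] -= 1.0     # d(NLL)/d(logits)
    # Accumulate per-state gradients, then take an averaged descent step.
    G = np.zeros_like(W)
    np.add.at(G, states, grad)
    W -= lr * G / len(states)

policy = softmax(W)                  # learned conditional distribution pi(a|s)
learned_actions = policy.argmax(axis=1)
```

After training, the cloned policy concentrates probability on the expert's action in each state; the same maximum-likelihood recipe extends to the richer model classes whose effects the thesis investigates.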


Rights: Attribution-NoDerivatives 4.0 International