UBC Theses and Dissertations



Scalable deep reinforcement learning for physics-based motion control
Berseth, Glen


This thesis studies the broad problem of learning robust control policies for difficult physics-based motion control tasks such as locomotion and navigation. A number of avenues are explored to assist in learning such control. In particular, are there underlying structures in the motor-learning system that enable learning solutions to complex tasks? How are animals able to learn new skills so efficiently? Animals may be learning and using implicit models of their environment to assist in planning and exploration. These potential structures motivate the design of learning systems, and in this thesis we study their effectiveness on physically simulated and robotic motor-control tasks.

Five contributions that build on motion control using deep reinforcement learning are presented. First, a case study on the motion control problem of brachiation, the swinging locomotion of gibbons through trees, is presented. This work compares parametric and non-parametric models for reinforcement learning; the difficulty of this control problem motivates separating control into multiple levels. Second, a hierarchical decomposition is presented that enables efficient learning by operating across multiple time scales on a complex locomotion and navigation task. At the lower level, reinforcement learning is used to acquire a high-frequency joint-actuation policy for bipedal, footstep-directed walking; a higher-level policy is then learned that provides footstep plans to the lower level in order to navigate through the environment. Third, improved action-exploration methods are investigated: an explicit action-value function is constructed from a learned model, and this function is used to compute actions that increase the value of future states. Fourth, a new algorithm is designed to progressively learn and integrate new skills, producing a robust, multi-skilled physics-based controller. This algorithm combines the skills of experts and then applies transfer learning methods to initialize and accelerate the learning of new skills. In the last chapter, the importance of good benchmarks for improving reinforcement learning research is discussed: just as the computer vision community has benefited from large, carefully processed collections of data, reinforcement learning needs well-constructed and interesting environments to drive progress.
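The two-time-scale decomposition summarized above can be sketched in code. This is a minimal illustrative skeleton, not the architecture from the thesis: the class names, the linear stand-in policies, the dimensions, and the toy dynamics update are all assumptions chosen only to show how a low-frequency footstep planner drives a high-frequency joint-actuation policy.

```python
import numpy as np


class LowLevelPolicy:
    """High-frequency joint-actuation policy (a hypothetical linear
    stand-in for the learned walking controller)."""

    def __init__(self, obs_dim, act_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(act_dim, obs_dim))

    def act(self, state, footstep_goal):
        # Condition the joint-level action on the current footstep target.
        obs = np.concatenate([state, footstep_goal])
        return np.tanh(self.W @ obs)  # bounded "torques" in [-1, 1]


class HighLevelPolicy:
    """Low-frequency policy that emits (x, y) footstep targets."""

    def __init__(self, obs_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(2, obs_dim))

    def plan(self, state):
        return self.W @ state


def run_hierarchy(high, low, state, steps=60, replan_every=20):
    """Tick the low level every step; re-query the planner only
    every `replan_every` steps (the two time scales)."""
    actions, goal = [], high.plan(state)
    for t in range(steps):
        if t % replan_every == 0:
            goal = high.plan(state)
        a = low.act(state, goal)
        actions.append(a)
        # Toy stand-in for the simulator's state transition.
        state = 0.9 * state + 0.1 * np.pad(a, (0, len(state) - len(a)))
    return actions
```

The key design point, as in the thesis, is that the high-level policy's action space is a footstep plan rather than raw joint commands, which shortens its effective horizon and makes the navigation task easier to learn.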




Attribution-NonCommercial-NoDerivatives 4.0 International