UBC Theses and Dissertations
Developing locomotion skills with deep reinforcement learning
Peng, Xue Bin
While physics-based models for passive phenomena such as cloth and fluids have been widely adopted in computer animation, physics-based character simulation remains a challenging problem. One of the major hurdles for character simulation is that of control, the modeling of a character's behaviour in response to its goals and environment. This challenge is further compounded by the high-dimensional and complex dynamics that often arise from these systems. A popular approach to mitigating these challenges is to build reduced models that capture important properties for a particular task. These models often rely on significant human insight, yet may still overlook important information. In this thesis, we explore the application of deep reinforcement learning (DeepRL) to develop control policies that operate directly on high-dimensional, low-level representations, thereby reducing the need for manual feature engineering and enabling characters to perform more challenging tasks in complex environments. We start by presenting a DeepRL framework for developing policies that allow characters to traverse irregular terrain with agility. The policies are represented using a mixture of experts model, which selects from a small collection of parameterized controllers. Our method is demonstrated on planar characters of varying morphologies and on different classes of terrain. Through the learning process, the networks develop appropriate strategies for traveling across various irregular environments without requiring extensive feature engineering. Next, we explore the effects of different action parameterizations on the performance of RL policies. We compare policies trained using low-level actions, such as torques, target velocities, target angles, and muscle activations. Performance is evaluated using a motion imitation benchmark.
For our particular task, the choice of higher-level actions that incorporate local feedback, such as target angles, leads to significant improvements in performance and learning speed. Finally, we describe a hierarchical reinforcement learning framework for controlling the motion of a simulated 3D biped. By training each level of the hierarchy to operate at different spatial and temporal scales, the character is able to perform a variety of locomotion tasks that require a balance between short-term and long-term planning. Some of the tasks include soccer dribbling, path following, and navigation across dynamic obstacles.
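To make the idea of an action that "incorporates local feedback" concrete, the following minimal sketch contrasts it with a raw torque action. It assumes that a target-angle action is tracked by a proportional-derivative (PD) controller on a single joint with unit inertia. The PD form, the gains `kp` and `kd`, the time step, and the function names are illustrative assumptions and are not taken from the thesis.

```python
def pd_torque(target_angle, angle, velocity, kp=50.0, kd=10.0):
    """Convert a target-angle action into a joint torque (assumed PD law).

    The feedback terms depend on the current joint state, so the same
    action yields different torques as the joint moves.
    """
    return kp * (target_angle - angle) - kd * velocity


def simulate_joint(target_angle, steps=2000, dt=0.001, inertia=1.0):
    """Integrate a single hypothetical joint driven by the PD controller."""
    angle, velocity = 0.0, 0.0
    for _ in range(steps):
        torque = pd_torque(target_angle, angle, velocity)
        # Semi-implicit Euler: update velocity first, then position.
        velocity += (torque / inertia) * dt
        angle += velocity * dt
    return angle


if __name__ == "__main__":
    # The policy outputs one target angle; the controller does the rest.
    print(simulate_joint(0.5))
```

With a torque action, the policy would have to produce the stabilizing feedback itself at every step. With a target-angle action, a single constant output is enough to drive the joint toward the desired pose, which is one plausible reading of why such actions could be easier to learn.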
Attribution-NonCommercial-NoDerivatives 4.0 International