Open Collections
UBC Theses and Dissertations
Scalable deep reinforcement learning for physics-based motion control
Berseth, Glen
Abstract
This thesis studies the broad problem of learning robust control policies for difficult physics-based motion control tasks such as locomotion and navigation. A number of avenues are explored to assist in learning such control. In particular, are there underlying structures in the motor-learning system that enable learning solutions to complex tasks? How are animals able to learn new skills so efficiently? Animals may be learning and using implicit models of their environment to assist in planning and exploration. These potential structures motivate the design of learning systems, and in this thesis we study their effectiveness on physically simulated and robotic motor-control tasks. Five contributions that build on motion control using deep reinforcement learning are presented.
First, a case study on the motion control problem of brachiation, the movement of gibbons through trees, is presented. This work compares parametric and non-parametric models for reinforcement learning. The difficulty of this motion control problem motivates separating the control problem into multiple levels.
Second, a hierarchical decomposition is presented that enables efficient learning by operating across multiple time scales for a complex locomotion and navigation task. First, reinforcement learning is used to acquire a low-level, high-frequency policy for joint actuation, used for bipedal footstep-directed walking. Subsequently, an additional policy is learned that provides directed footstep plans to the first level of control in order to navigate through the environment.
Third, improved action exploration methods are investigated. An explicit action-value function is constructed using the learned model. Using this action-value function, we can compute actions that increase the value of future states.
Fourth, a new algorithm is designed to progressively learn and integrate new skills, producing a robust and multi-skilled physics-based controller. This algorithm combines the skills of experts and then applies transfer learning methods to initialize and accelerate the learning of new skills.
In the last chapter, the importance of good benchmarks for improving reinforcement learning research is discussed. The computer vision community has benefited from large, carefully processed collections of data; similarly, reinforcement learning needs well-constructed and interesting environments to drive progress.
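The two-level hierarchical decomposition described in the abstract (a high-level policy that emits footstep plans at a low rate, and a low-level, high-frequency policy that turns each plan into joint actuation) can be sketched as follows. This is a minimal illustration only, not the thesis's implementation; all class names, method names, and the stub policies are hypothetical.

```python
# Hypothetical sketch of a two-level hierarchical controller.
# A high-level policy picks a footstep target at a low rate; a
# low-level policy runs several control ticks per target. The
# policies here are stubs standing in for learned networks.

class HighLevelPolicy:
    """Stub footstep planner: steps at most 0.5 m toward the goal."""

    def plan_footstep(self, nav_state):
        gx, gy = nav_state["goal"]
        x, y = nav_state["position"]
        dx, dy = gx - x, gy - y
        dist = max((dx * dx + dy * dy) ** 0.5, 1e-9)
        step = min(0.5, dist)
        return (x + step * dx / dist, y + step * dy / dist)

class LowLevelPolicy:
    """Stub joint controller: a learned policy would map
    (joint state, footstep target) to joint torques."""

    def torques(self, joint_state, footstep_target):
        return [0.0] * len(joint_state)

def control_cycle(env_state, high, low, steps_per_plan=10):
    """One high-level planning cycle: a single footstep target,
    followed by several low-level control ticks toward it."""
    target = high.plan_footstep(env_state["nav"])
    torque_log = []
    for _ in range(steps_per_plan):
        torque_log.append(low.torques(env_state["joints"], target))
    return target, torque_log
```

The design point the sketch illustrates is the time-scale separation: the footstep planner runs once per cycle while the joint-level controller runs `steps_per_plan` times, so each level can be trained at its own rate.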
Item Metadata
Title: Scalable deep reinforcement learning for physics-based motion control
Creator: Berseth, Glen
Publisher: University of British Columbia
Date Issued: 2019
Genre:
Type:
Language: eng
Date Available: 2019-04-10
Provider: Vancouver : University of British Columbia Library
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International
DOI: 10.14288/1.0378079
URI:
Degree:
Program:
Affiliation:
Degree Grantor: University of British Columbia
Graduation Date: 2019-05
Campus:
Scholarly Level: Graduate
Rights URI:
Aggregated Source Repository: DSpace
Item Citations and Data
Permanent URL (DOI): https://doi.org/10.14288/1.0378079
Rights: Attribution-NonCommercial-NoDerivatives 4.0 International