UBC Theses and Dissertations
Reinforcement learning using sensorimotor traces
Li, Jingxian
The skilled motions of humans and animals are the result of learning good solutions to difficult sensorimotor control problems. This thesis explores new models for using reinforcement learning to acquire motion skills, with potential applications to computer animation and robotics. Reinforcement learning offers a principled methodology for tackling control problems. However, it is difficult to apply in the high-dimensional settings that we wish to explore, where the body can have many degrees of freedom, the environment can have significant complexity, and the sensory representations available for perceiving the state of the body and the environment can contain further redundancies. In this context, the challenges include: a state space that cannot be fully explored; the need to model how the state of the body and the perceived state of the environment evolve together over time; and the need for solutions that work with only a small number of sensorimotor experiences. Our contribution is a reinforcement learning method that implicitly represents the current state of the body and the environment using sensorimotor traces. A distance metric is defined between the ongoing sensorimotor trace and previously experienced sensorimotor traces, and this metric is used to model the current state as a weighted mixture of past experiences. Sensorimotor traces play multiple roles in our method: they provide an embodied representation of the state (and therefore also of the value function and the optimal actions), and they provide an embodied model of the system dynamics. In our implementation, we focus specifically on learning steering behaviors for a vehicle driving along straight roads, along winding roads, and through intersections. The vehicle is equipped with a set of distance sensors. We apply value iteration using off-policy experiences to produce control policies capable of steering the vehicle in a wide range of circumstances. We provide an experimental analysis of the effect of various design choices. In the future, we expect that similar ideas can be applied to other high-dimensional systems, such as bipedal systems capable of walking over variable terrain, likewise driven by control policies based on sensorimotor traces.
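The two central ideas in the abstract, modeling the current state as a weighted mixture of past traces and running value iteration over stored off-policy experiences, can be illustrated with a minimal Python sketch. This is not the thesis implementation. The abstract does not specify the distance metric, the weighting scheme, or the data layout, so the Euclidean distance, the Gaussian kernel, the `bandwidth` parameter, the dictionary keys (`trace`, `action`, `reward`, `next_trace`), and all function names below are assumptions made purely for illustration.

```python
import math


def trace_distance(a, b):
    # Assumed metric: Euclidean distance between two equal-length traces.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def trace_weights(current, stored, bandwidth=1.0):
    """Normalized weights of stored traces relative to the current trace."""
    # Assumed kernel: Gaussian in the trace distance.
    logits = [-(trace_distance(current, s) ** 2) / (2.0 * bandwidth ** 2)
              for s in stored]
    top = max(logits)  # subtract the max before exponentiating for stability
    raw = [math.exp(v - top) for v in logits]
    total = sum(raw)
    return [r / total for r in raw]


def greedy_value(trace, experiences, q, bandwidth=1.0):
    """Best per-action weighted mixture of the stored value estimates."""
    best = -math.inf
    for action in {e["action"] for e in experiences}:
        idx = [i for i, e in enumerate(experiences) if e["action"] == action]
        w = trace_weights(trace, [experiences[i]["trace"] for i in idx], bandwidth)
        best = max(best, sum(wi * q[i] for wi, i in zip(w, idx)))
    return best


def value_iteration(experiences, gamma=0.9, sweeps=50, bandwidth=1.0):
    """Repeated backups over a fixed set of off-policy experiences."""
    q = [0.0] * len(experiences)
    for _ in range(sweeps):
        # Each stored experience is backed up through its successor trace.
        q = [e["reward"] + gamma * greedy_value(e["next_trace"], experiences, q, bandwidth)
             for e in experiences]
    return q
```

In this sketch each stored experience carries its own value estimate, and the value at a new trace is read off as a mixture of the estimates of nearby stored traces, so the same set of traces serves as state representation, value function, and dynamics model. A real system would need a metric suited to the sensor readings and a more efficient neighbor search than the exhaustive loop used here.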
Attribution-NonCommercial-NoDerivatives 4.0 International