UBC Theses and Dissertations
Reinforcement learning of a feedforward controller with soft actor-critic for a reaching task
Srungarapu, Venkata Praneeth
Learning to control movement is a complicated process, yet humans seamlessly control a wide variety of complex movements. Motor theory suggests that humans begin motor learning by learning to act in a feedforward manner. However, it is still unclear how humans learn feedforward control strategies. We hypothesize that this mechanism is governed by the success (reinforcement) or failure (penalty) of the task. Taking this as inspiration, we investigate how a feedforward controller can be learned using reinforcement learning (RL). Additionally, we investigate how factors such as task difficulty and noise in the motor system relate to human motor control. To this end, we built a one-dimensional, muscle-based biomechanical model of a reaching task. The model contains an actuator driven by an agonist-antagonist muscle pair and a target to reach. We then trained an end-to-end, RL-based feedforward controller that estimates muscle control signals while taking the difficulty level of the reaching task and the noise level into account. To design the learning-based controller, we adapted the model-free RL algorithm Soft Actor-Critic (SAC). During training, we observed that the SAC-based feedforward controller learned to prepare muscle co-activation to reach a target in kinematic space using a minimum number of controller predictions. Moreover, we found that the controller learned to estimate high-amplitude muscle activations as a way to adapt to the noise in the motor system. Finally, we conducted an information analysis similar to Fitts' analysis to determine how task difficulty and noise affected the controller, by finding the relationship between the number of controller predictions, the task difficulty, and the amount of noise.
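The reaching setup described above can be sketched as a small simulation environment. This is a minimal sketch under stated assumptions, not the thesis's model: the class name `ReachingEnv1D`, the point-mass dynamics, the co-activation-dependent damping term, the multiplicative (signal-dependent) noise, the reward, and all parameter values are hypothetical. Each call to `step` plays the role of one feedforward controller prediction (a pair of agonist and antagonist activations held for a fixed number of simulation steps), so `n_predictions` counts the predictions used in one reach. An SAC agent (not shown) would be the component choosing these activation pairs.

```python
import numpy as np


class ReachingEnv1D:
    """Hypothetical 1-D reaching task: a point mass driven by an
    agonist/antagonist muscle pair, with a target of width W at distance D.
    All names and parameter values are illustrative assumptions."""

    def __init__(self, distance=0.2, width=0.02, noise_std=0.0,
                 mass=1.0, max_force=10.0, damping=2.0,
                 dt=0.01, substeps=10, seed=None):
        self.distance = distance      # D: target centre (m), assumed
        self.width = width            # W: target width (m), assumed
        self.noise_std = noise_std    # level of signal-dependent noise
        self.mass = mass
        self.max_force = max_force
        self.damping = damping        # damping gain scaled by co-activation (assumed)
        self.dt = dt
        self.substeps = substeps      # simulation steps per controller prediction
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.x, self.v = 0.0, 0.0
        self.n_predictions = 0
        return np.array([self.x, self.v])

    def _noisy(self, a):
        # Multiplicative noise: larger commands receive larger perturbations.
        a = a * (1.0 + self.noise_std * self.rng.standard_normal(a.shape))
        return np.clip(a, 0.0, 1.0)

    def in_target(self):
        return abs(self.x - self.distance) <= self.width / 2

    def step(self, activation):
        """Apply one feedforward prediction: (agonist, antagonist) in [0, 1]."""
        a = np.clip(np.asarray(activation, dtype=float), 0.0, 1.0)
        a_ag, a_ant = self._noisy(a)
        self.n_predictions += 1
        for _ in range(self.substeps):
            # Net muscle force minus damping that grows with co-activation.
            force = (self.max_force * (a_ag - a_ant)
                     - self.damping * (a_ag + a_ant) * self.v)
            self.v += force / self.mass * self.dt   # semi-implicit Euler
            self.x += self.v * self.dt
        done = bool(self.in_target() and abs(self.v) < 0.05)
        reward = 1.0 if done else -1.0   # hypothetical: penalise each extra prediction
        return np.array([self.x, self.v]), reward, done


# Usage: two hand-picked predictions, a burst towards the target, then braking.
env = ReachingEnv1D(distance=0.2, width=0.02, noise_std=0.1, seed=0)
obs = env.reset()
obs, r, done = env.step([0.8, 0.2])
obs, r, done = env.step([0.3, 0.6])
print(env.n_predictions, obs, done)
```

Modelling co-activation as extra damping is only one simple way to let co-activation stabilise the reach; the muscle model used in the thesis may differ.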
Our analysis shows that, with the amount of noise held constant, the number of controller predictions increases exponentially with task difficulty. With the index of difficulty (ID) held constant, a linear relationship exists between the number of controller predictions and the amount of noise. Additionally, we found that the effect of target width is more dominant than that of target distance, which confirms Welford's observation.
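The relationships summarized above can be estimated with ordinary least-squares fits. The sketch below is hypothetical: it assumes the Shannon formulation of Fitts' index of difficulty, ID = log2(D/W + 1), where D is target distance and W is target width; it fits the exponential trend by regressing log N on ID, fits the noise trend with a straight line, and compares D and W using a two-term model that gives log2 D and log2 W separate coefficients. The function names are illustrative, and the thesis may have used a different ID formulation or fitting procedure.

```python
import numpy as np


def index_of_difficulty(distance, width):
    """Fitts' ID in bits, Shannon formulation (an assumption)."""
    return np.log2(np.asarray(distance, float) / np.asarray(width, float) + 1.0)


def fit_exponential(id_bits, n_predictions):
    """Fit N = c * exp(k * ID) via least squares on log N; returns (c, k)."""
    k, log_c = np.polyfit(np.asarray(id_bits, float),
                          np.log(np.asarray(n_predictions, float)), 1)
    return np.exp(log_c), k


def fit_linear(noise_levels, n_predictions):
    """Fit N = a + b * sigma by ordinary least squares; returns (a, b)."""
    b, a = np.polyfit(np.asarray(noise_levels, float),
                      np.asarray(n_predictions, float), 1)
    return a, b


def fit_distance_width(distance, width, n_predictions):
    """Hypothetical separable fit N = a + b_d*log2(D) - b_w*log2(W).

    Returns (a, b_d, b_w); a larger |b_w| than |b_d| suggests that target
    width affects N more strongly than distance."""
    d = np.log2(np.asarray(distance, float))
    w = np.log2(np.asarray(width, float))
    X = np.column_stack([np.ones_like(d), d, -w])
    (a, b_d, b_w), *_ = np.linalg.lstsq(X, np.asarray(n_predictions, float), rcond=None)
    return a, b_d, b_w
```

Fitting the exponential on log N keeps the problem linear; a nonlinear fit on N itself would weight large prediction counts more heavily and could give somewhat different coefficients.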
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)