UBC Undergraduate Research

Applying Network Resets in Model-Based Reinforcement Learning with Deep Neural Networks in Constrained Data Settings

Akins, Seth Lincoln

Abstract

Artificial intelligence and reinforcement learning are becoming increasingly prevalent in society, and applications and research involving deep neural networks have grown dramatically. These methods often require large amounts of data to train to the desired level of effectiveness, and collecting this data can be highly expensive, impractical, or outright impossible. Advancing algorithms that achieve high performance from small amounts of data is therefore essential; however, such algorithms often lose the ability to generalize to unseen data or eventually lose their ability to learn from new data. Maintaining network plasticity while training on small data samples is a difficult problem. We propose ShrinkZero, an adaptation of the EfficientZero algorithm. EfficientZero achieved state-of-the-art performance on Atari while consuming only 100k frames of gameplay. ShrinkZero adds frequent network resets, increases the number of layers in the network, and modifies the rollout lengths based on the time since the last reset. These features were shown to work well in Bigger Better Faster (BBF), a model-free algorithm that also achieved state-of-the-art performance on Atari with under 100k gameplay frames. ShrinkZero consistently underperforms EfficientZero and fails to leverage the resets that worked well in BBF. It achieves higher performance than both algorithms on only one Atari game and is frequently outperformed by humans by a large margin. Further work is needed to leverage these resets in a model-based setting.
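To make the reset-and-anneal idea concrete, the sketch below (Python/PyTorch) shows one plausible way to combine shrink-and-perturb network resets, as used in BBF, with a rollout length that depends on the number of gradient steps since the last reset. All names and constants here (shrink_and_perturb, RESET_INTERVAL, SHRINK_ALPHA, the rollout bounds) are illustrative assumptions for exposition, not the implementation or hyperparameters reported in the thesis.

```python
import copy
import torch
import torch.nn as nn

# Assumed constants for illustration only; not values from the thesis.
RESET_INTERVAL = 20_000            # gradient steps between resets (assumed)
SHRINK_ALPHA = 0.5                 # fraction of trained weights retained (assumed)
MAX_ROLLOUT, MIN_ROLLOUT = 10, 5   # assumed rollout-length bounds


def shrink_and_perturb(net: nn.Module, alpha: float = SHRINK_ALPHA) -> None:
    """Interpolate each parameter toward a freshly initialised copy,
    restoring plasticity while retaining part of what was learned."""
    fresh = copy.deepcopy(net)
    for p in fresh.parameters():            # re-initialise the copy
        if p.dim() > 1:
            nn.init.xavier_uniform_(p)
        else:
            nn.init.zeros_(p)
    with torch.no_grad():
        for p, q in zip(net.parameters(), fresh.parameters()):
            p.mul_(alpha).add_((1.0 - alpha) * q)


def rollout_length(steps_since_reset: int, anneal_steps: int = RESET_INTERVAL) -> int:
    """Anneal the unroll length from MAX_ROLLOUT down to MIN_ROLLOUT
    as training progresses after each reset."""
    frac = min(steps_since_reset / anneal_steps, 1.0)
    return round(MAX_ROLLOUT - frac * (MAX_ROLLOUT - MIN_ROLLOUT))


if __name__ == "__main__":
    net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
    for step in range(1, 60_001):
        # ... one gradient update on `net` would happen here ...
        k = rollout_length(step % RESET_INTERVAL)  # unroll length for this update
        if step % RESET_INTERVAL == 0:
            shrink_and_perturb(net)                # periodic plasticity reset
```

In a full agent, shrink_and_perturb would be applied to the learned model and value/policy heads on the reset schedule, and rollout_length would determine how many steps the dynamics model is unrolled when constructing training targets; the exact networks reset and the direction of the annealing are design choices not specified in this abstract.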


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International