UBC Theses and Dissertations

Exploring algorithmic reasoning and memorization in transformers: challenges and insights

Mahdavi, Seyed Mohammad Sadegh

Abstract

In this thesis, we investigate the ability of neural networks, particularly Transformers, to reason and memorize. First, we focus on graph neural networks and Transformers, and analyze their performance on algorithmic reasoning tasks. We show that while these models can achieve high accuracy on data drawn from the same distribution as their training data, their performance drops significantly when faced with new, out-of-distribution data. We further show that even high benchmark scores can be misleading, and that the true reasoning capability of these models remains limited. We identify several challenges involved in achieving true reasoning ability and generalization to new data, and we propose solutions to some of them, including fixing input representation issues, using hybrid models, and enlarging the training dataset. We also examine the expressivity of Transformers, providing a theoretical analysis of their ability to memorize data points. The results show a linear relationship between a Transformer's memory capacity and both the number of its attention heads and the input's context size.
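As a rough illustration of the stated scaling (a minimal sketch, not code from the thesis), the result can be read as memorization capacity growing on the order of the product of the number of attention heads and the context size. The function name, the proportionality constant c, and the example values below are all hypothetical.

```python
def estimated_capacity(num_heads: int, context_size: int, c: float = 1.0) -> float:
    """Hypothetical estimate of how many data points a multi-head attention
    layer could memorize, assuming capacity grows linearly in both the number
    of heads and the context size (i.e., on the order of c * H * n)."""
    return c * num_heads * context_size

# Doubling either the number of heads or the context size doubles the estimate.
print(estimated_capacity(8, 128))    # 1024.0
print(estimated_capacity(16, 128))   # 2048.0
print(estimated_capacity(8, 256))    # 2048.0
```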

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International