UBC Theses and Dissertations
Exploring algorithmic reasoning and memorization in transformers: challenges and insights
Mahdavi, Seyed Mohammad Sadegh
Abstract
In this thesis, we investigate the ability of neural networks, particularly Transformers, to reason and memorize. First, we focus on graph neural networks and Transformers and analyze their performance on algorithmic reasoning tasks. We show that while models can achieve high accuracy on data from the same distribution as their training data, their performance drops significantly when faced with new, out-of-distribution data. We further show that even high benchmark numbers can be misleading and that the true reasoning capability of these models remains limited. We identify several challenges involved in achieving true reasoning abilities and generalization to new data, and we propose solutions to some of them, including fixing input representation issues, using hybrid models, and enlarging the training dataset. We also examine the expressivity of Transformers, providing a theoretical analysis of their ability to memorize data points. The results show a linear relationship between a Transformer's memorization capacity and both the number of its attention heads and the input's context size.
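The memorization claim can be probed with a small, self-contained experiment. The sketch below is not code from the thesis; the model sizes, dataset sizes, and training budget are arbitrary assumptions chosen only to illustrate the idea: train a single Transformer encoder layer to fit random sequence-to-label pairs and observe how much it can memorize as the number of attention heads grows (the same probe can be repeated while varying the context length).

```python
# Illustrative sketch only (not from the thesis): probe memorization capacity
# of a one-layer Transformer on random data as the number of heads varies.
# All sizes and the training budget below are arbitrary assumptions.
import torch
import torch.nn as nn

def memorized_fraction(num_heads, n_points=512, ctx=16, d_model=64,
                       vocab=100, n_classes=10, steps=300):
    torch.manual_seed(0)
    # Random token sequences paired with random labels: a pure memorization task.
    x = torch.randint(0, vocab, (n_points, ctx))
    y = torch.randint(0, n_classes, (n_points,))

    embed = nn.Embedding(vocab, d_model)
    layer = nn.TransformerEncoderLayer(d_model, nhead=num_heads,
                                       dim_feedforward=128, dropout=0.0,
                                       batch_first=True)
    readout = nn.Linear(d_model, n_classes)
    params = [*embed.parameters(), *layer.parameters(), *readout.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)

    for _ in range(steps):
        logits = readout(layer(embed(x)).mean(dim=1))  # mean-pool over the context
        loss = nn.functional.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        preds = readout(layer(embed(x)).mean(dim=1)).argmax(dim=1)
    return (preds == y).float().mean().item()

if __name__ == "__main__":
    for h in (1, 2, 4, 8):  # d_model must be divisible by the head count
        print(f"heads={h}: fit fraction={memorized_fraction(h):.2f}")
```

Under the linear-capacity result stated in the abstract, the number of points such a model can fit should grow roughly linearly with the head count (and with the context length), so for a fixed dataset the fit fraction should rise as heads are added until the data is fully memorized.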
Item Metadata

Title | Exploring algorithmic reasoning and memorization in transformers : challenges and insights
Creator | Mahdavi, Seyed Mohammad Sadegh
Supervisor |
Publisher | University of British Columbia
Date Issued | 2023
Genre |
Type |
Language | eng
Date Available | 2023-08-23
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0435551
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2023-11
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace