UBC Theses and Dissertations
Exploring algorithmic reasoning and memorization in transformers: challenges and insights
Mahdavi, Seyed Mohammad Sadegh
Abstract
In this thesis, we investigate the ability of neural networks, particularly Transformers, to reason and memorize. First, we focus on graph neural networks and Transformers and analyze their performance on algorithmic reasoning tasks. We show that while models can achieve high accuracy on data from the same distribution as their training data, their performance drops significantly when faced with new, out-of-distribution data. We further show that even high benchmark numbers can be misleading and that the true reasoning capability of these models remains limited. We identify several challenges involved in achieving true reasoning abilities and generalization to new data, and we propose solutions to some of them, including fixing input representation issues, using hybrid models, and enlarging the training dataset. We also examine the expressivity of Transformers, providing a theoretical analysis of their ability to memorize data points. The results show a linear relationship between a Transformer's memorization capacity and both the number of its attention heads and the input's context size.
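The memorization claim can be probed with a small, self-contained experiment. The sketch below is not code from the thesis; the model sizes, dataset sizes, and training budget are arbitrary assumptions chosen only to illustrate the idea: train a single Transformer encoder layer to fit random sequence-to-label pairs and observe how much it can memorize as the number of attention heads grows (the same probe can be repeated while varying the context length).

```python
# Illustrative sketch only (not from the thesis): probe memorization capacity
# of a one-layer Transformer on random data as the number of heads varies.
# All sizes and the training budget below are arbitrary assumptions.
import torch
import torch.nn as nn

def memorized_fraction(num_heads, n_points=512, ctx=16, d_model=64,
                       vocab=100, n_classes=10, steps=300):
    torch.manual_seed(0)
    # Random token sequences paired with random labels: a pure memorization task.
    x = torch.randint(0, vocab, (n_points, ctx))
    y = torch.randint(0, n_classes, (n_points,))

    embed = nn.Embedding(vocab, d_model)
    layer = nn.TransformerEncoderLayer(d_model, nhead=num_heads,
                                       dim_feedforward=128, dropout=0.0,
                                       batch_first=True)
    readout = nn.Linear(d_model, n_classes)
    params = [*embed.parameters(), *layer.parameters(), *readout.parameters()]
    opt = torch.optim.Adam(params, lr=1e-3)

    for _ in range(steps):
        logits = readout(layer(embed(x)).mean(dim=1))  # mean-pool over the context
        loss = nn.functional.cross_entropy(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        preds = readout(layer(embed(x)).mean(dim=1)).argmax(dim=1)
    return (preds == y).float().mean().item()

if __name__ == "__main__":
    for h in (1, 2, 4, 8):  # d_model must be divisible by the head count
        print(f"heads={h}: fit fraction={memorized_fraction(h):.2f}")
```

Under the linear-capacity result stated in the abstract, the number of points such a model can fit should grow roughly linearly with the head count (and with the context length), so for a fixed dataset the fit fraction should rise as heads are added until the data is fully memorized.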
Item Metadata

Title | Exploring algorithmic reasoning and memorization in transformers : challenges and insights
Creator | Mahdavi, Seyed Mohammad Sadegh
Supervisor |
Publisher | University of British Columbia
Date Issued | 2023
Genre |
Type |
Language | eng
Date Available | 2023-08-23
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0435551
URI |
Degree |
Program |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2023-11
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace