Open Collections
UBC Theses and Dissertations
Learning dynamics of deep learning -- force analysis of deep neural networks
Ren, Yi (Joshua)
Abstract
This thesis investigates the learning dynamics of deep learning systems through a local, physics-inspired analytical lens. Motivated by the need for fine-grained insights into model behavior, we begin with the step-wise influence that a single training example exerts on a specific observing example during learning. Central to our approach is the proposed AKG decomposition, which dissects this influence into three interpretable components: similarity (K), normalization (A), and prediction gap (G); a minimal numerical sketch follows the abstract. This decomposition enables an analogy with classical force analysis: the force originates from G, is shaped by A and K, and is ultimately applied to the target object, e.g., the model's confidence, outputs, hidden representations, or parameters. Building on this foundation, we gradually scale the analysis from individual interactions to cumulative effects over time, akin to tracking an object's motion under multiple forces. We apply this framework to four problems.
- Supervised classification: We study the learning trajectories of examples of varying difficulty and reveal a "zig-zag" pattern that emerges during optimization. Our analysis explains this behavior and inspires a novel knowledge distillation method, Filter-KD, which improves the supervision signal for student models.
- Large language model (LLM) finetuning: We extend the framework to account for the autoregressive nature of LLMs and the presence of negative gradients. This unified perspective explains behaviors across finetuning methods such as SFT, DPO, and GRPO, and highlights the critical role of negative gradients. In particular, we identify the "squeezing effect," a counterintuitive phenomenon caused by improperly applied gradient ascent (illustrated by the second sketch below).
- Representation learning: We explore the dynamics of hidden features, revealing how the energy and direction of adaptation influence feature drift. Our analysis yields a provable pattern of feature adaptation in a head-probing-then-finetuning pipeline, offering insights and inspiring several practical strategies.
- Simplicity bias and compositional learning: Revisiting foundational questions about why structured representations are learned faster, we apply our framework to a compositional learning setting. Our findings align with principles such as Occam's Razor and the idea of "compression for AGI," offering a novel dynamical explanation rooted in compression and learning speed.
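As a concrete illustration of the AKG decomposition, the sketch below works through its simplest instantiation: a linear softmax classifier trained with cross-entropy, where one SGD step on an updating example x_u changes the log-probabilities at an observing example x_o by approximately -eta * A(x_o) K(x_o, x_u) G(x_u). Here A is the Jacobian of log-softmax at x_o (normalization), K is the empirical neural tangent kernel between the two inputs (which collapses to (x_o . x_u) * I for a linear model), and G is the prediction gap p_u - y_u. This is a minimal sketch under those assumptions, not code from the thesis; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy linear model: logits z = W @ x, cross-entropy loss.
d, c, eta = 5, 3, 0.1
W = rng.normal(size=(c, d)) * 0.1
x_u, y_u = rng.normal(size=d), 1      # updating (training) example
x_o = rng.normal(size=d)              # observing example

# G: prediction gap at the updating example (grad of CE wrt logits).
p_u = softmax(W @ x_u)
G = p_u - np.eye(c)[y_u]

# K: empirical NTK between x_o and x_u. For a linear model,
# grad_W z_k = e_k x^T, so K = (x_o . x_u) * I_c.
K = (x_o @ x_u) * np.eye(c)

# A: Jacobian of log-softmax at the observing example, A = I - 1 p_o^T,
# mapping a change in logits to a change in log-probabilities.
p_o = softmax(W @ x_o)
A = np.eye(c) - np.outer(np.ones(c), p_o)

# Predicted first-order change in log-probs at x_o after one SGD step on x_u.
pred = -eta * A @ K @ G

# Actual change after the SGD step W <- W - eta * grad.
grad = np.outer(G, x_u)               # grad of CE wrt W at (x_u, y_u)
W_new = W - eta * grad
actual = np.log(softmax(W_new @ x_o)) - np.log(p_o)

print("predicted:", np.round(pred, 6))
print("actual:   ", np.round(actual, 6))
```

The two printouts agree up to O(eta^2); the first-order decomposition becomes exact in the small-learning-rate limit.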
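The squeezing effect mentioned in the LLM finetuning part can likewise be seen in a toy setting. The sketch below makes the simplifying assumption that gradient ascent acts directly on a logit vector rather than through shared model parameters: unlearning an already-unlikely token lowers its logit while raising every other logit in proportion to its current probability, so the most likely token absorbs most of the released mass and the distribution grows more peaked. The numbers are illustrative only.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy illustration: gradient *ascent* on an already-unlikely token y_neg,
# applied directly to the logits (a simplifying assumption).
z = np.array([3.0, 1.0, 0.0, -1.0])   # peaked initial distribution
y_neg, eta = 3, 0.5

for step in range(5):
    p = softmax(z)
    print(f"step {step}: p[y_neg]={p[y_neg]:.4f}  p_max={p.max():.4f}")
    # grad of CE loss -log p[y_neg] wrt logits is (p - e_{y_neg});
    # gradient ascent moves z along +(p - e_{y_neg}).
    g = p.copy()
    g[y_neg] -= 1.0
    z = z + eta * g
```

Running this shows p_max climbing step after step even though the update never targets the dominant token: the probability mass pushed off y_neg is "squeezed" onto the mode.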
Item Metadata
Title | Learning dynamics of deep learning -- force analysis of deep neural networks
Creator | Ren, Yi (Joshua)
Supervisor |
Publisher | University of British Columbia
Date Issued | 2025
Description | (see Abstract above)
Genre |
Type |
Language | eng
Date Available | 2025-09-24
Provider | Vancouver : University of British Columbia Library
Rights | Attribution-NonCommercial-NoDerivatives 4.0 International
DOI | 10.14288/1.0450240
URI |
Degree (Theses) |
Program (Theses) |
Affiliation |
Degree Grantor | University of British Columbia
Graduation Date | 2025-11
Campus |
Scholarly Level | Graduate
Rights URI |
Aggregated Source Repository | DSpace