UBC Theses and Dissertations


Learning dynamics of deep learning -- force analysis of deep neural networks

Ren, Yi (Joshua)

Abstract

This thesis investigates the learning dynamics of deep learning systems through a local, physics-inspired analytical lens. Motivated by the need for fine-grained insight into model behavior, we begin with the step-wise influence that a single training example exerts on a specific observing example during learning. Central to our approach is the proposed AKG decomposition, which dissects this influence into three interpretable components: similarity (K), normalization (A), and prediction gap (G). This decomposition enables an analogy with classical force analysis: the force originates from G, is shaped by A and K, and is ultimately applied to a target object such as the model's confidence, output, hidden representations, or parameters. Building on this foundation, we gradually scale the analysis from individual interactions to cumulative effects over time, akin to tracking an object's motion under multiple forces. We apply the framework to the following problems.

Supervised classification: We study the learning trajectories of examples of varying difficulty and reveal a "zig-zag" pattern that emerges during optimization. Our analysis explains this behavior and inspires a novel knowledge distillation method, Filter-KD, which improves the supervision signal for student models.

Large language model (LLM) finetuning: We extend the framework to account for the autoregressive nature of LLMs and the presence of negative gradients. This unified perspective explains behaviors across finetuning methods such as SFT, DPO, and GRPO, and highlights the critical role of negative gradients. In particular, we identify the "squeezing effect": a counterintuitive phenomenon caused by improperly applied gradient ascent.

Representation learning: We explore the dynamics of hidden features, revealing how adaptation energy and direction influence feature drift. Our analysis yields a provable pattern of feature adaptation in a head-probing-then-finetuning pipeline, offering insights and inspiring several practical strategies.

Simplicity bias and compositional learning: Revisiting foundational questions about why structured representations are learned faster, we apply our framework to a compositional learning setting. Our findings align with principles such as Occam's Razor and the idea of "compression for AGI," offering a novel dynamical explanation rooted in compression and learning speed.
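The abstract does not spell out the decomposition itself. As a rough illustration, a step-wise influence of this kind is often written in empirical neural-tangent-kernel style as follows; the notation below (learning rate \eta, logits z, parameters \theta, predictive distribution \pi) is an assumption for exposition and may differ from the thesis's own:

\Delta \log \pi_t(y \mid x_o) \;\approx\; -\eta \, A_t(x_o) \, K_t(x_o, x_u) \, G_t(x_u) \;+\; O(\eta^2),

where A_t(x_o) = \nabla_z \log \pi_t(\cdot \mid x_o) is the normalization term (the log-softmax Jacobian at the observing example), K_t(x_o, x_u) = \nabla_\theta z_t(x_o) \, \nabla_\theta z_t(x_u)^\top is the kernel-style similarity between the observing example x_o and the updated training example x_u, and G_t(x_u) = \nabla_z \mathcal{L}(x_u) is the prediction gap (for cross-entropy, the model's distribution at x_u minus the one-hot label).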
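As a sanity check of this first-order picture, here is a minimal, self-contained sketch comparing the predicted A-K-G influence with the actual change after one SGD step. It assumes PyTorch and a toy linear softmax classifier; the model, variable names, and learning rate are illustrative assumptions, not the thesis's code.

    import torch
    import torch.nn.functional as F

    # Toy check of the A-K-G structure on a linear softmax classifier.
    # Everything here (model, eta, x_u, y_u, x_o) is illustrative, not from the thesis.
    torch.manual_seed(0)
    C, D, eta = 3, 5, 1e-2                      # classes, input dim, learning rate
    W = torch.randn(C, D, requires_grad=True)   # the only parameter tensor

    def param_jacobian(x):
        # J[c] = d z_c / d vec(W): one row of parameter gradients per logit.
        J = torch.zeros(C, W.numel())
        for c in range(C):
            (g,) = torch.autograd.grad((W @ x)[c], W)
            J[c] = g.flatten()
        return J

    x_u, y_u = torch.randn(D), torch.tensor(1)  # training example and its label
    x_o = torch.randn(D)                        # observing example

    p_u = torch.softmax(W @ x_u, dim=0).detach()
    p_o = torch.softmax(W @ x_o, dim=0).detach()

    G = p_u - torch.eye(C)[y_u]                      # prediction gap: cross-entropy residual
    K = param_jacobian(x_o) @ param_jacobian(x_u).T  # kernel similarity, C x C
    A = torch.eye(C) - p_o                           # d log-softmax / d z at x_o (broadcast over rows)

    predicted = -eta * (A @ K @ G)                   # predicted change in log p(. | x_o)

    # Compare with the change produced by one actual SGD step on (x_u, y_u).
    loss = F.cross_entropy((W @ x_u).unsqueeze(0), y_u.unsqueeze(0))
    (grad_W,) = torch.autograd.grad(loss, W)
    with torch.no_grad():
        before = torch.log_softmax(W @ x_o, dim=0)
        W -= eta * grad_W
        after = torch.log_softmax(W @ x_o, dim=0)
    print(predicted)       # should approximately equal after - before for small eta
    print(after - before)

For this linear model the kernel reduces to (x_o . x_u) times the identity, so the predicted and measured changes agree up to O(eta^2); the decomposition's value in the thesis comes from applying the same bookkeeping to deep, nonlinear networks.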


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International