UBC Theses and Dissertations

Neural networks through the lens of partial differential equations: from discrete layers to continuous dynamics
Zakariaei, Niloufar

Abstract

This thesis studies deep neural networks through the lens of partial differential equations (PDEs), showing how continuous-time and operator-based perspectives can improve the stability, interpretability, and scalability of modern learning systems. Rather than treating neural networks as purely discrete stacks of layers, this work views them as discretizations of underlying dynamical systems, enabling architectural and algorithmic design guided by physical principles.

We first develop the Reaction–Diffusion Neural Network (RDNN), an implicit–explicit (IMEX) architecture in which each layer combines an implicit diffusion step with an explicit reaction update. This construction yields unconditional stability, helping mitigate exploding and vanishing gradients, while also improving robustness to noise and domain shifts. It further provides an interpretable framework in which learned parameters correspond to physically meaningful quantities such as diffusion and reaction coefficients.

We then extend this idea by introducing the Advection–Diffusion–Reaction Neural Network (ADRNN), which incorporates a learnable transport mechanism to capture directional and long-range interactions. By combining advection, diffusion, and reaction within each layer, the resulting architecture unifies transport, smoothing, and nonlinear feature transformation in a single framework. Empirical results show that this approach improves stability and produces sharper long-horizon forecasts in spatio-temporal prediction tasks.

Finally, we address the challenge of efficient training at high resolution through a multiscale optimization framework inspired by multigrid methods. To overcome the unstable gradient behavior of standard convolutions under mesh refinement, we introduce Mesh-Free Convolutions (MFCs), a class of operators derived from differential operator theory whose behavior remains stable across discretizations. Coupled with a coarse-to-fine training strategy, this framework substantially reduces computational cost while maintaining or improving predictive accuracy.

Overall, this thesis demonstrates that PDE-based structure can serve not only as a tool for interpreting neural networks, but also as a principled foundation for designing models and training methods that are more stable, physically grounded, and computationally efficient.
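As an illustration of the layer structure the abstract describes (not the thesis' exact notation), a generic IMEX step for reaction–diffusion dynamics u_t = κΔu + f(u) can be written as

\[
\bigl(I - h\,\kappa\,\Delta_h\bigr)\,u^{n+1} \;=\; u^{n} + h\,f\!\bigl(u^{n};\,\theta_n\bigr),
\]

where h is a step size, Δ_h a discrete diffusion operator, κ a diffusivity, and f(·; θ_n) a learned pointwise reaction; all symbols here are illustrative placeholders. Treating diffusion implicitly makes the update stable for any step size, while the explicit reaction keeps each layer inexpensive; an advection–diffusion–reaction variant would additionally include a transport term with a learnable velocity field.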

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International