UBC Theses and Dissertations

Bridging control and reinforcement learning with partial model knowledge

Wang, Shuyuan

Abstract

This thesis develops a series of complementary approaches that bridge control theory and reinforcement learning through systematic exploitation of partial model knowledge. Control theory leverages known system structure to deliver precise solutions but struggles with unknown dynamics, whereas reinforcement learning is flexible yet typically suffers from poor sample efficiency. The proposed methods integrate the strengths of both paradigms by combining model-based control, where knowledge is available, with learning-based adaptation for the unknown components.

We first consider linear systems and introduce Partial Knowledge Least Squares Policy Iteration (PLSPI), which decomposes the system dynamics into known and unknown components (A = A1 + A2, B = B1 + B2). This formulation enables a principled interpolation between optimal control and reinforcement learning, improving sample efficiency while retaining robustness to modeling errors. We then provide a theoretical analysis explaining when and why PLSPI achieves better convergence than standard LSPI: through a spectral analysis of the value function estimator, we show that the estimator norm in PLSPI can be smaller under certain conditions, leading to reduced variance and improved convergence behavior. Experiments across different partial-knowledge configurations further illustrate how the choice of known model components influences learning performance.

We then extend the partial-knowledge paradigm to nonlinear systems through a hybrid architecture that combines structured control modules with neural network policies. Unlike PLSPI, this framework explicitly separates the roles of known and unknown dynamics within the policy itself, enabling effective nonlinear control with better sample efficiency than black-box deep reinforcement learning.
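To make the decomposition concrete, the following is a minimal illustrative sketch (not code from the thesis) of the known/unknown split the abstract describes for a linear system x_{t+1} = A x_t + B u_t, with A = A1 + A2 and B = B1 + B2. All matrices here are hypothetical example values; (A1, B1) stand for the part a model-based controller can exploit, and (A2, B2) for the residual a learner must account for.

```python
# Hypothetical illustration of the partial-knowledge decomposition
# A = A1 + A2, B = B1 + B2 from the abstract.  Example numbers only.

def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def add_vec(a, b):
    return [x + y for x, y in zip(a, b)]

# True system (assumed unknown in full to the controller).
A = [[1.0, 0.1], [0.0, 1.0]]
B = [[0.0], [0.1]]

# Known part available to the model-based component ...
A1 = [[1.0, 0.0], [0.0, 1.0]]
B1 = [[0.0], [0.0]]
# ... and the residual left to learning: A2 = A - A1, B2 = B - B1.
A2 = [[A[i][j] - A1[i][j] for j in range(2)] for i in range(2)]
B2 = [[B[i][j] - B1[i][j] for j in range(1)] for i in range(2)]

def step(x, u):
    """One step of the true dynamics, written as known part plus residual."""
    known = add_vec(mat_vec(A1, x), mat_vec(B1, u))
    residual = add_vec(mat_vec(A2, x), mat_vec(B2, u))
    return add_vec(known, residual)

# Stepping via the split reproduces the full dynamics exactly.
print(step([1.0, 0.0], [1.0]))
```

The point of the split is that the known terms (A1, B1) can be handled with standard optimal-control machinery, leaving the learner a smaller residual to estimate from data.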
Finally, we develop DiLQR, a framework that makes the iterative Linear Quadratic Regulator (iLQR), a numerical nonlinear controller, fully differentiable via implicit differentiation. The proposed method computes exact gradients while accounting for all parameter dependencies, and introduces a forward algorithm with O(T) complexity, yielding substantial gains in computational efficiency and learning performance. Overall, this thesis presents principled methods for leveraging structural knowledge without sacrificing adaptability, with applications in robotics, autonomous systems, and industrial process control where sample efficiency and reliability are critical.
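The implicit-differentiation idea underlying DiLQR can be illustrated on a much simpler problem. The sketch below is not the thesis's DiLQR code; it shows the general technique on a scalar example: an iterative solver computes x* with x*^2 = theta, and rather than differentiating through the solver's iterations, the implicit function theorem applied to F(x, theta) = x^2 - theta = 0 gives dx*/dtheta = 1 / (2 x*) directly at the solution.

```python
# Generic illustration (assumed, not from the thesis) of implicit
# differentiation through an iterative solver: gradients are taken at
# the fixed point instead of unrolling the iterations.

def solve(theta, iters=50):
    """Babylonian iteration converging to x* = sqrt(theta)."""
    x = theta if theta > 1 else 1.0
    for _ in range(iters):
        x = 0.5 * (x + theta / x)
    return x

def grad_implicit(theta):
    """dx*/dtheta via the implicit function theorem on x^2 - theta = 0:
    dF/dx = 2x, dF/dtheta = -1, so dx*/dtheta = 1 / (2 x*)."""
    x_star = solve(theta)
    return 1.0 / (2.0 * x_star)

print(solve(2.0), grad_implicit(2.0))
```

For iLQR the same principle applies at the converged trajectory, which is what allows exact gradients without storing or back-propagating through every solver iteration; the O(T) forward algorithm mentioned in the abstract refers to the per-time-step cost of that gradient computation.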


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International