UBC Theses and Dissertations


Prior knowledge-driven model pre-training and fine-tuning for ophthalmic image analysis
Huang, Yijin

Abstract

Deep learning has become a cornerstone in ophthalmic image analysis, enabling automated disease detection, grading, and segmentation across modalities such as fundus photography, OCT angiography (OCTA), and fundus fluorescein angiography (FFA). However, its deployment faces major challenges: limited annotated datasets, the need for high-resolution inputs to capture subtle lesions, and the inefficiency of conventional fine-tuning strategies for large pre-trained models (LPMs). This thesis addresses these challenges by developing and evaluating novel pre-training and fine-tuning frameworks tailored to ophthalmic imaging.

We first establish a strong and reproducible baseline for ophthalmic classification and segmentation using Vision Transformers (ViTs). A systematic grid search identifies a robust training pipeline that generalizes across diverse datasets, highlighting how sensitive performance is to choices such as input resolution, sampling, and optimization strategy.

Building on this baseline, we introduce SSiT (Saliency-guided Self-supervised Image Transformer), a prior knowledge-driven pre-training framework that integrates saliency maps into contrastive learning. By guiding models to focus on diagnostically relevant retinal regions, SSiT learns transferable and clinically meaningful representations. Experiments on multiple fundus, OCTA, and FFA datasets show that while conventional self-supervised methods underperform ImageNet pre-training, SSiT consistently surpasses it, confirming the value of embedding clinical priors into self-supervised learning.

To address the prohibitive memory cost of fine-tuning LPMs on high-resolution medical images, we propose FPT+ (Fine-grained Prompt Tuning Plus), a parameter- and memory-efficient fine-tuning method. FPT+ combines an asymmetric input design, fine-grained prompts with fusion modules, important token selection, and feature preloading to achieve efficient adaptation. Across fundus and OCTA datasets, FPT+ outperforms state-of-the-art parameter-efficient tuning methods, reducing Graphics Processing Unit (GPU) memory consumption by over 95% while maintaining or improving classification accuracy.

The contributions of this thesis are threefold: (1) establishing a standardized baseline pipeline for ophthalmic image analysis with ViTs; (2) introducing SSiT, which demonstrates the importance of saliency-driven pre-training for learning clinically relevant representations; and (3) proposing FPT+, a memory- and parameter-efficient fine-tuning strategy that enables high-resolution transfer learning on resource-limited hardware. Together, these advances provide a unified framework for efficient and generalizable ophthalmic AI, with potential applications in clinical decision support and large-scale screening.
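For concreteness, the following is a minimal PyTorch sketch of one way saliency maps can be folded into a contrastive objective, in the spirit of SSiT: patch tokens are pooled with saliency-derived weights before a standard InfoNCE loss, so agreement between augmented views is driven by diagnostically relevant regions. The function names, the weighting scheme, and the use of a single saliency map for both views are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn.functional as F

def saliency_pool(tokens, saliency):
    # tokens: (B, N, D) ViT patch embeddings for one augmented view
    # saliency: (B, N) non-negative per-patch scores, e.g., a fundus
    # saliency map average-pooled onto the patch grid (assumed given)
    w = saliency / saliency.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return (tokens * w.unsqueeze(-1)).sum(dim=1)  # (B, D) weighted pooling

def info_nce(z1, z2, tau=0.2):
    # standard contrastive loss: matching views within a batch are positives
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# usage: tokens_a/tokens_b would come from a ViT over two augmentations;
# one saliency map is reused for both views here for brevity
B, N, D = 8, 196, 768
tokens_a, tokens_b = torch.randn(B, N, D), torch.randn(B, N, D)
saliency = torch.rand(B, N)
loss = info_nce(saliency_pool(tokens_a, saliency),
                saliency_pool(tokens_b, saliency))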
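The memory savings behind FPT+-style fine-tuning can be sketched similarly: the large pre-trained model is frozen and run once offline, only its most informative tokens are cached (the "important token selection" and "feature preloading" ideas), and training touches nothing but a small prompt-and-fusion head. The class names, dimensions, and selection rule below are assumptions for illustration, not the method as implemented in the thesis.

import torch
import torch.nn as nn

def select_important_tokens(tokens, scores, k=64):
    # keep the k highest-scoring tokens per image (scored, e.g., by
    # attention or saliency) before writing features to the cache
    idx = scores.topk(k, dim=1).indices                       # (B, k)
    return torch.gather(tokens, 1,
                        idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

class PromptFusionHead(nn.Module):
    # the only trainable part: learnable prompts attend over cached tokens
    def __init__(self, dim=768, n_prompts=16, n_classes=5):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, cached_tokens):                         # (B, k, D)
        q = self.prompts.unsqueeze(0).expand(cached_tokens.size(0), -1, -1)
        fused, _ = self.fusion(q, cached_tokens, cached_tokens)
        return self.classifier(fused.mean(dim=1))

# offline: frozen LPM -> tokens and scores -> select_important_tokens -> cache
# online: only the head receives gradients, so GPU memory scales with its
# few parameters rather than with the high-resolution backbone
head = PromptFusionHead()
cached = torch.randn(8, 64, 768)   # preloaded features for one batch
logits = head(cached)              # (8, 5), e.g., five disease grades

Because no activations of the frozen backbone are kept for backpropagation, the design choice of caching selected tokens is what makes the large memory reduction plausible on resource-limited hardware.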


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International