UBC Theses and Dissertations


Prior knowledge-driven model pre-training and fine-tuning for ophthalmic image analysis
Huang, Yijin

Abstract

Deep learning has become a cornerstone in ophthalmic image analysis, enabling automated disease detection, grading, and segmentation across modalities such as fundus photography, OCT angiography (OCTA), and fundus fluorescein angiography (FFA). However, its deployment faces major challenges: limited annotated datasets, the need for high-resolution inputs to capture subtle lesions, and the inefficiency of conventional fine-tuning strategies for large pre-trained models (LPMs). This thesis addresses these challenges by developing and evaluating novel pre-training and fine-tuning frameworks tailored to ophthalmic imaging.

We first establish a strong and reproducible baseline for ophthalmic classification and segmentation using Vision Transformers (ViTs). A systematic grid search identifies a robust training pipeline that generalizes across diverse datasets, highlighting how sensitive performance is to choices such as input resolution, sampling, and optimization strategy.

Building on this baseline, we introduce SSiT (Saliency-guided Self-supervised Image Transformer), a prior knowledge-driven pre-training framework that integrates saliency maps into contrastive learning. By guiding models to focus on diagnostically relevant retinal regions, SSiT learns transferable and clinically meaningful representations. Experiments on multiple fundus, OCTA, and FFA datasets show that while conventional self-supervised methods underperform ImageNet pre-training, SSiT consistently surpasses it, confirming the value of embedding clinical priors into self-supervised learning.

To address the prohibitive memory cost of fine-tuning LPMs on high-resolution medical images, we propose FPT+ (Fine-grained Prompt Tuning Plus), a parameter- and memory-efficient fine-tuning method. FPT+ combines an asymmetric input design, fine-grained prompts with fusion modules, important token selection, and feature preloading to achieve efficient adaptation. Across fundus and OCTA datasets, FPT+ outperforms state-of-the-art parameter-efficient tuning methods, reducing Graphics Processing Unit (GPU) memory consumption by over 95% while maintaining or improving classification accuracy.

The contributions of this thesis are threefold: (1) establishing a standardized baseline pipeline for ophthalmic image analysis with ViTs; (2) introducing SSiT, which demonstrates the importance of saliency-driven pre-training for learning clinically relevant representations; and (3) proposing FPT+, a memory- and parameter-efficient fine-tuning strategy that enables high-resolution transfer learning on resource-limited hardware. Together, these advances provide a unified framework for efficient and generalizable ophthalmic AI, with potential applications in clinical decision support and large-scale screening.
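For concreteness, the following is a minimal PyTorch sketch of one way saliency maps can be folded into a contrastive objective, in the spirit of SSiT: patch tokens are pooled with saliency-derived weights before a standard InfoNCE loss, so agreement between augmented views is driven by diagnostically relevant regions. The function names, the weighting scheme, and the use of a single saliency map for both views are illustrative assumptions, not the thesis's actual implementation.

import torch
import torch.nn.functional as F

def saliency_pool(tokens, saliency):
    # tokens: (B, N, D) ViT patch embeddings for one augmented view
    # saliency: (B, N) non-negative per-patch scores, e.g., a fundus
    # saliency map average-pooled onto the patch grid (assumed given)
    w = saliency / saliency.sum(dim=1, keepdim=True).clamp_min(1e-8)
    return (tokens * w.unsqueeze(-1)).sum(dim=1)  # (B, D) weighted pooling

def info_nce(z1, z2, tau=0.2):
    # standard contrastive loss: matching views within a batch are positives
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# usage: tokens_a/tokens_b would come from a ViT over two augmentations;
# one saliency map is reused for both views here for brevity
B, N, D = 8, 196, 768
tokens_a, tokens_b = torch.randn(B, N, D), torch.randn(B, N, D)
saliency = torch.rand(B, N)
loss = info_nce(saliency_pool(tokens_a, saliency),
                saliency_pool(tokens_b, saliency))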
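The memory savings behind FPT+-style fine-tuning can be sketched similarly: the large pre-trained model is frozen and run once offline, only its most informative tokens are cached (the "important token selection" and "feature preloading" ideas), and training touches nothing but a small prompt-and-fusion head. The class names, dimensions, and selection rule below are assumptions for illustration, not the method as implemented in the thesis.

import torch
import torch.nn as nn

def select_important_tokens(tokens, scores, k=64):
    # keep the k highest-scoring tokens per image (scored, e.g., by
    # attention or saliency) before writing features to the cache
    idx = scores.topk(k, dim=1).indices                       # (B, k)
    return torch.gather(tokens, 1,
                        idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))

class PromptFusionHead(nn.Module):
    # the only trainable part: learnable prompts attend over cached tokens
    def __init__(self, dim=768, n_prompts=16, n_classes=5):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.fusion = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, cached_tokens):                         # (B, k, D)
        q = self.prompts.unsqueeze(0).expand(cached_tokens.size(0), -1, -1)
        fused, _ = self.fusion(q, cached_tokens, cached_tokens)
        return self.classifier(fused.mean(dim=1))

# offline: frozen LPM -> tokens and scores -> select_important_tokens -> cache
# online: only the head receives gradients, so GPU memory scales with its
# few parameters rather than with the high-resolution backbone
head = PromptFusionHead()
cached = torch.randn(8, 64, 768)   # preloaded features for one batch
logits = head(cached)              # (8, 5), e.g., five disease grades

Because no activations of the frozen backbone are kept for backpropagation, the design choice of caching selected tokens is what makes the large memory reduction plausible on resource-limited hardware.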


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International