UBC Theses and Dissertations
Prior knowledge-driven model pre-training and fine-tuning for ophthalmic image analysis
Huang, Yijin
Abstract
Deep learning has become a cornerstone in ophthalmic image analysis, enabling automated disease detection, grading, and segmentation across modalities such as fundus photography, OCT angiography (OCTA), and fundus fluorescein angiography (FFA). However, its deployment faces major challenges: limited annotated datasets, the need for high-resolution inputs to capture subtle lesions, and the inefficiency of conventional fine-tuning strategies for large pre-trained models (LPMs). This thesis addresses these challenges by developing and evaluating novel pre-training and fine-tuning frameworks tailored for ophthalmic imaging.
We first establish a strong and reproducible baseline for ophthalmic classification and segmentation using Vision Transformers (ViTs). A systematic grid search identifies a robust training pipeline that generalizes across diverse datasets, highlighting the sensitivity of performance to choices such as input resolution, sampling, and optimization strategies.
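The systematic grid search described above can be sketched as follows. The search space shown here (resolutions, learning rates, samplers) is illustrative only, not the thesis's actual grid:

```python
import itertools

# Illustrative search space; the dimensions and values searched in the
# thesis may differ.
search_space = {
    "input_resolution": [224, 384, 512],
    "learning_rate": [1e-4, 3e-4],
    "sampler": ["uniform", "class-balanced"],
}

def grid(space):
    """Yield every hyperparameter combination as a dict."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

configs = list(grid(search_space))
print(len(configs))  # 3 * 2 * 2 = 12 combinations
```

Each configuration would then be used to train and evaluate one pipeline, with the best-performing settings adopted as the baseline.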
Building upon this baseline, we introduce SSiT (Saliency-guided Self-supervised Image Transformer), a prior knowledge-driven pre-training framework that integrates saliency maps into contrastive learning. By guiding models to focus on diagnostically relevant retinal regions, SSiT learns transferable and clinically meaningful representations. Experiments on multiple fundus, OCTA, and FFA datasets demonstrate that while conventional self-supervised methods underperform compared to ImageNet pre-training, SSiT consistently surpasses it, confirming the value of embedding clinical priors into self-supervised learning.
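To make the idea of saliency-guided contrastive learning concrete, here is a toy sketch of an InfoNCE-style loss in which each anchor's contribution is weighted by a saliency score. This is a simplified illustration of the general principle, not SSiT's actual objective:

```python
import numpy as np

def saliency_weighted_infonce(z1, z2, saliency, temperature=0.1):
    """Toy InfoNCE loss where each anchor's term is weighted by its
    saliency score. z1, z2: (N, D) L2-normalized embeddings of two
    augmented views; saliency: (N,) non-negative weights summing to 1."""
    logits = z1 @ z2.T / temperature                 # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_anchor = -np.diag(log_probs)                 # positives on diagonal
    return float((saliency * per_anchor).sum())

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)
weights = np.ones(4) / 4                             # uniform saliency
loss = saliency_weighted_infonce(z, z, weights)
```

Raising the weight of diagnostically salient regions makes the contrastive objective penalize representation errors there more heavily, which is the intuition behind embedding clinical priors into self-supervised learning.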
To address the prohibitive memory cost of fine-tuning LPMs on high-resolution medical images, we propose FPT+ (Fine-grained Prompt Tuning Plus), a parameter- and memory-efficient fine-tuning method. FPT+ combines asymmetric input design, fine-grained prompts with fusion modules, important token selection, and feature preloading to achieve efficient adaptation. Across fundus and OCTA datasets, FPT+ outperforms state-of-the-art parameter-efficient tuning methods, reducing Graphics Processing Unit (GPU) memory consumption by over 95% while maintaining or improving classification accuracy.
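Parameter-efficient fine-tuning in this spirit keeps the large backbone frozen and trains only a small set of added parameters. The minimal sketch below (module names and sizes are hypothetical, not the actual FPT+ architecture) shows how small the trainable fraction becomes:

```python
import numpy as np

class FrozenBackbone:
    """Stand-in for a large pre-trained model whose weights stay fixed."""
    def __init__(self, dim=768, layers=12, rng=None):
        rng = rng or np.random.default_rng(0)
        self.weights = [rng.normal(size=(dim, dim)) for _ in range(layers)]

    def forward(self, x):
        for w in self.weights:
            x = np.tanh(x @ w / np.sqrt(w.shape[0]))
        return x

class PromptHead:
    """Small trainable module: a learnable prompt plus a linear classifier.
    Only these parameters would receive gradients during fine-tuning."""
    def __init__(self, dim=768, n_classes=5, rng=None):
        rng = rng or np.random.default_rng(1)
        self.prompt = rng.normal(size=(dim,)) * 0.01
        self.classifier = rng.normal(size=(dim, n_classes)) * 0.01

    def forward(self, features):
        return (features + self.prompt) @ self.classifier

backbone = FrozenBackbone()
head = PromptHead()
frozen = sum(w.size for w in backbone.weights)
trainable = head.prompt.size + head.classifier.size
print(f"trainable fraction: {trainable / (frozen + trainable):.4%}")
```

Because gradients and optimizer state are kept only for the small head, both the parameter count and the GPU memory footprint during fine-tuning shrink dramatically relative to full fine-tuning.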
The contributions of this thesis are threefold: (1) establishing a standardized baseline pipeline for ophthalmic image analysis with ViTs; (2) introducing SSiT, which demonstrates the importance of saliency-driven pre-training for learning clinically relevant representations; and (3) proposing FPT+, a memory- and parameter-efficient fine-tuning strategy that enables high-resolution transfer learning on resource-limited hardware. Together, these advances provide a unified framework for efficient and generalizable ophthalmic AI, with potential applications in clinical decision support and large-scale screening.
Item Metadata

| Field | Value |
| --- | --- |
| Title | Prior knowledge-driven model pre-training and fine-tuning for ophthalmic image analysis |
| Creator | Huang, Yijin |
| Supervisor | |
| Publisher | University of British Columbia |
| Date Issued | 2025 |
| Description | (Same as Abstract above.) |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2025-12-16 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | Attribution-NonCommercial-NoDerivatives 4.0 International |
| DOI | 10.14288/1.0451027 |
| URI | |
| Degree (Theses) | |
| Program (Theses) | |
| Affiliation | |
| Degree Grantor | University of British Columbia |
| Graduation Date | 2026-05 |
| Campus | |
| Scholarly Level | Graduate |
| Rights URI | |
| Aggregated Source Repository | DSpace |