Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare
Gondara, Lovedeep; Simkin, Jonathan; Sayle, Graham; Devji, Shebnum; Arbour, Gregory; Ng, Raymond Tak-yan
Abstract
Objectives: To guide language model (LM) selection by comparing finetuning vs. zero-shot use, generic vs. domain-adjacent vs. further domain-specific pretraining, and bidirectional language models (BiLMs) such as BERT vs. unidirectional LLMs for clinical classification. Materials and Methods: We evaluated BiLMs (RoBERTa, PathologyBERT, GatorTron) and an LLM (Mistral NeMo Instruct 12B) on three British Columbia Cancer Registry (BCCR) pathology classification tasks that vary in difficulty and data size. We assessed zero-shot vs. finetuned BiLMs, the zero-shot LLM, and further BCCR-specific pretraining using macro-average F1 scores. Results: Finetuned BiLMs outperformed both zero-shot BiLMs and the zero-shot LLM. The zero-shot LLM outperformed zero-shot BiLMs but was consistently outperformed by finetuned BiLMs. Domain-adjacent BiLMs generally outperformed generic BiLMs after finetuning. Further domain-specific pretraining boosted performance on the complex, low-data task, with more modest gains elsewhere. Conclusions: For specialized classification, finetuning BiLMs is crucial and often surpasses zero-shot LLMs. Domain-adjacent pretrained models are recommended. Further domain-specific pretraining provides significant performance boosts, especially in complex, low-data scenarios. BiLMs remain relevant, offering a strong performance/resource balance for targeted clinical tasks.
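The study compares models using macro-average F1, the unweighted mean of per-class F1 scores. As a minimal illustration of that metric only (not code or data from the paper), the sketch below computes macro-F1 for hypothetical predictions on a three-class task with scikit-learn.

```python
from sklearn.metrics import f1_score

# Hypothetical gold labels and predictions for a three-class
# classification task; these are illustrative, not BCCR data.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 1, 2, 1, 0, 2]

# Macro-average F1: compute F1 per class, then take the unweighted mean,
# so minority classes count as much as majority classes.
macro_f1 = f1_score(y_true, y_pred, average="macro")
print(f"macro-F1 = {macro_f1:.3f}")
```

Because macro averaging weights every class equally, it is a common choice when class frequencies are imbalanced, as is typical for registry pathology labels.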
Item Metadata
| Field | Value |
| --- | --- |
| Title | Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare |
| Creator | |
| Contributor | |
| Publisher | Multidisciplinary Digital Publishing Institute |
| Date Issued | 2025-10-17 |
| Description | Objectives: To guide language model (LM) selection by comparing finetuning vs. zero-shot use, generic vs. domain-adjacent vs. further domain-specific pretraining, and bidirectional language models (BiLMs) such as BERT vs. unidirectional LLMs for clinical classification. Materials and Methods: We evaluated BiLMs (RoBERTa, PathologyBERT, GatorTron) and an LLM (Mistral NeMo Instruct 12B) on three British Columbia Cancer Registry (BCCR) pathology classification tasks that vary in difficulty and data size. We assessed zero-shot vs. finetuned BiLMs, the zero-shot LLM, and further BCCR-specific pretraining using macro-average F1 scores. Results: Finetuned BiLMs outperformed both zero-shot BiLMs and the zero-shot LLM. The zero-shot LLM outperformed zero-shot BiLMs but was consistently outperformed by finetuned BiLMs. Domain-adjacent BiLMs generally outperformed generic BiLMs after finetuning. Further domain-specific pretraining boosted performance on the complex, low-data task, with more modest gains elsewhere. Conclusions: For specialized classification, finetuning BiLMs is crucial and often surpasses zero-shot LLMs. Domain-adjacent pretrained models are recommended. Further domain-specific pretraining provides significant performance boosts, especially in complex, low-data scenarios. BiLMs remain relevant, offering a strong performance/resource balance for targeted clinical tasks. |
| Subject | |
| Genre | |
| Type | |
| Language | eng |
| Date Available | 2026-01-09 |
| Provider | Vancouver : University of British Columbia Library |
| Rights | CC BY 4.0 |
| DOI | 10.14288/1.0451169 |
| URI | |
| Affiliation | |
| Citation | Machine Learning and Knowledge Extraction 7 (4): 121 (2025) |
| Publisher DOI | 10.3390/make7040121 |
| Peer Review Status | Reviewed |
| Scholarly Level | Faculty; Researcher |
| Rights URI | |
| Aggregated Source Repository | DSpace |