
Small or Large? Zero-Shot or Finetuned?: Guiding Language Model Choice for Specialized Applications in Healthcare

Gondara, Lovedeep; Simkin, Jonathan; Sayle, Graham; Devji, Shebnum; Arbour, Gregory; Ng, Raymond Tak-yan

Abstract

Objectives: To guide language model (LM) selection for clinical classification by comparing finetuning vs. zero-shot use, generic vs. domain-adjacent vs. further domain-specific pretraining, and bidirectional language models (BiLMs) such as BERT vs. unidirectional large language models (LLMs).

Materials and Methods: We evaluated three BiLMs (RoBERTa, PathologyBERT, GatorTron) and one LLM (Mistral NeMo Instruct 12B) on three British Columbia Cancer Registry (BCCR) pathology classification tasks that vary in difficulty and data size. We compared zero-shot vs. finetuned BiLMs, the zero-shot LLM, and the effect of further BCCR-specific pretraining, using macro-averaged F1 scores.

Results: Finetuned BiLMs outperformed both zero-shot BiLMs and the zero-shot LLM. The zero-shot LLM outperformed zero-shot BiLMs but was consistently outperformed by finetuned BiLMs. Domain-adjacent BiLMs generally outperformed generic BiLMs after finetuning. Further domain-specific pretraining boosted performance on the complex, low-data task, with only modest gains elsewhere.

Conclusions: For specialized classification, finetuning BiLMs is crucial and often surpasses zero-shot LLMs. Domain-adjacent pretrained models are recommended. Further domain-specific pretraining provides a significant performance boost, especially in complex or low-data scenarios. BiLMs remain relevant, offering a strong balance of performance and resource use for targeted clinical tasks.
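To make the evaluation setup concrete, below is a minimal sketch of the kind of pipeline the abstract describes: finetuning a BiLM for report classification and scoring it with macro-averaged F1. It assumes the Hugging Face transformers/datasets libraries and scikit-learn; the checkpoint name, label count, and CSV files are hypothetical placeholders, not the authors' actual data or code.

```python
# Sketch: finetune a BiLM classifier and evaluate with macro-averaged F1.
# Checkpoint, label count, and data files below are illustrative assumptions.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-base"   # swap for a domain-adjacent checkpoint
NUM_LABELS = 5                # hypothetical number of classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

# Hypothetical CSVs with "text" (report) and "label" (class id) columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    # Macro-averaged F1 weighs every class equally, as in the study's metric.
    return {"macro_f1": f1_score(labels, preds, average="macro")}

args = TrainingArguments(
    output_dir="bilm-finetune",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

trainer.train()
print(trainer.evaluate())  # reports macro_f1 on the held-out split
```

The same compute_metrics function can be reused to score zero-shot predictions from an LLM, which keeps the finetuned and zero-shot comparisons on a common metric.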
