Enhanced Lung Cancer Survival Prediction Using Semi-Supervised Pseudo-Labeling and Learning from Diverse PET/CT Datasets

UBC Faculty Research and Publications

Salmanpour, Mohammad R.; Gorji, Arman; Mousavi, Amin; Fathi Jouzdani, Ali; Sanati, Nima; Maghsudi, Mehdi; Leung, Bonnie; Ho, Cheryl; Yuan, Ren; Rahmim, Arman

Abstract

Objective: This study explores a semi-supervised learning (SSL) strategy based on pseudo-labeling, which draws on diverse datasets such as head and neck cancer (HNCa) to enhance lung cancer (LCa) survival outcome prediction. Handcrafted and deep radiomic features (HRFs/DRFs) from PET/CT scans are analyzed with hybrid machine learning systems (HMLSs).

Methods: We collected data from 199 LCa patients with both PET and CT images, obtained from TCIA and our local database, alongside 408 HNCa PET/CT images from TCIA. From the segmented primary tumors, we extracted 215 HRFs with PySERA and 1024 DRFs with a 3D autoencoder, both within the ViSERA 1.0.0 software. The supervised learning (SL) strategy employed HMLSs consisting of PCA connected with six classifiers, applied to both HRFs and DRFs. The SSL strategy expanded the dataset by adding the 408 HNCa cases, pseudo-labeled by the Random Forest algorithm, to the 199 LCa cases, using the same HMLS techniques. Furthermore, principal component analysis (PCA) linked with four survival prediction algorithms was used in the survival hazard ratio analysis.

Results: The SSL strategy outperformed the SL strategy (p << 0.001), achieving an average accuracy of 0.85 ± 0.05 with DRFs from PET and PCA + Multi-Layer Perceptron (MLP), compared to 0.69 ± 0.06 for the SL strategy using DRFs from CT and PCA + Light Gradient Boosting (LGB). Additionally, PCA linked with Component-wise Gradient Boosting Survival Analysis on both HRFs and DRFs extracted from CT had an average C-index of 0.80, with a log-rank p-value << 0.001, confirmed by external testing.

Conclusions: Shifting from HRFs and SL to DRFs and SSL strategies can achieve high predictive performance using CT or PET alone, particularly in contexts with limited data points.
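The pseudo-labeling step described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses scikit-learn with synthetic feature matrices in place of the radiomic features extracted in ViSERA, and the function name `pseudo_label_and_train`, the number of principal components, the scaling step, and all hyperparameters are hypothetical choices that the abstract does not specify. A Random Forest fitted on the labeled cohort assigns labels to the unlabeled cohort, and a PCA + MLP pipeline is then trained on the combined set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def pseudo_label_and_train(X_labeled, y_labeled, X_unlabeled,
                           n_components=10, seed=0):
    """Pseudo-label X_unlabeled with a Random Forest, then fit PCA + MLP
    on the union of labeled and pseudo-labeled cases (hypothetical helper)."""
    # Step 1: fit the labeler on the labeled cohort only.
    labeler = RandomForestClassifier(n_estimators=200, random_state=seed)
    labeler.fit(X_labeled, y_labeled)

    # Step 2: assign pseudo-labels to the unlabeled cohort.
    pseudo = labeler.predict(X_unlabeled)

    # Step 3: train the downstream model on the expanded dataset.
    X_aug = np.vstack([X_labeled, X_unlabeled])
    y_aug = np.concatenate([y_labeled, pseudo])
    model = make_pipeline(
        StandardScaler(),  # assumption: features are standardized before PCA
        PCA(n_components=n_components, random_state=seed),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                      random_state=seed),
    )
    model.fit(X_aug, y_aug)
    return model, pseudo


# Synthetic stand-ins: 199 labeled and 408 unlabeled cases with 64 features
# (the study reports 215 HRFs and 1024 DRFs; 64 is used here for brevity).
rng = np.random.default_rng(0)
X_lca = rng.normal(size=(199, 64))
y_lca = (X_lca[:, 0] + 0.5 * X_lca[:, 1] > 0).astype(int)
X_hnca = rng.normal(size=(408, 64))

model, pseudo = pseudo_label_and_train(X_lca, y_lca, X_hnca)
print(pseudo.shape, model.predict(X_lca[:5]))
```

The sketch omits evaluation. In practice, performance would need to be measured on labeled cases held out from both the labeler and the downstream model, so that pseudo-labels cannot leak information into the test set.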

Item Metadata

Title	Enhanced Lung Cancer Survival Prediction Using Semi-Supervised Pseudo-Labeling and Learning from Diverse PET/CT Datasets
Creator	Salmanpour, Mohammad R.; Gorji, Arman; Mousavi, Amin; Fathi Jouzdani, Ali; Sanati, Nima; Maghsudi, Mehdi; Leung, Bonnie; Ho, Cheryl; Yuan, Ren; Rahmim, Arman
Contributor	BC Cancer Research Centre; BC Cancer Agency
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2025-01-17
Subject	lung cancer; deep and handcrafted radiomic features; machine learning; survival prediction; supervised and semi-supervised strategy
Genre	Article
Type	Text
Language	eng
Date Available	2025-02-14
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0448076
URI	http://hdl.handle.net/2429/90373
Affiliation	Medicine, Faculty of; Science, Faculty of; Other UBC; Physics and Astronomy, Department of; Radiology, Department of
Citation	Cancers 17 (2): 285 (2025)
Publisher DOI	10.3390/cancers17020285
Peer Review Status	Reviewed
Scholarly Level	Faculty; Researcher; Other
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
