A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers

UBC Faculty Research and Publications

Günther, Oliver P.; Chen, Virginia; Cohen Freue, Gabriela V.; Balshaw, Robert F.; Tebbutt, Scott J.; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W. R.; McManus, Bruce M.; Keown, Paul A.; Ng, Raymond Tak-yan, 1963-

Abstract

Background: Biomarker panels derived separately from genomic and proteomic data and with a variety of computational methods have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble?

Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), proceeds through quality control and statistical analysis and mining of the data, and ends with various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to 10 individual classifiers. Performance of ensembles is characterized by area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all five ensembles, but typically at the cost of decreased specificity.

Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
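
The two aggregation methods named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact decision rules, so everything here is an assumption made for illustration: the function names, the 0.5 probability cutoff, and the `min_votes` parameter are hypothetical and are not taken from the paper. Average Probability is read as "average the per-classifier probabilities of acute rejection, then apply a cutoff"; Vote Threshold is read as "each classifier votes, and the ensemble calls rejection once enough votes are cast".

```python
from statistics import mean


def average_probability(probs, cutoff=0.5):
    """Assumed rule: call acute rejection when the mean of the
    individual classifier probabilities reaches the cutoff."""
    return mean(probs) >= cutoff


def vote_threshold(probs, min_votes=1, cutoff=0.5):
    """Assumed rule: each classifier votes 'rejection' when its own
    probability reaches the cutoff; the ensemble calls rejection when
    at least `min_votes` classifiers vote that way."""
    votes = sum(p >= cutoff for p in probs)
    return votes >= min_votes


def sensitivity_specificity(predictions, labels):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    `predictions` and `labels` are sequences of booleans, where True
    means acute rejection. Both classes must be present in `labels`."""
    pairs = list(zip(predictions, labels))
    tp = sum(p and y for p, y in pairs)
    fn = sum((not p) and y for p, y in pairs)
    tn = sum((not p) and (not y) for p, y in pairs)
    fp = sum(p and (not y) for p, y in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up probabilities from three hypothetical classifiers for one sample.
sample = [0.2, 0.7, 0.3]
call_by_average = average_probability(sample)
call_by_vote = vote_threshold(sample, min_votes=1)
```

Under these assumed rules, a single confident classifier is enough for Vote Threshold with `min_votes=1` to flag a sample that Average Probability would pass over. That is one plausible reading of why the abstract reports higher sensitivity, typically with lower specificity, for Vote Threshold.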

Item Metadata

Title	A computational pipeline for the development of multi-marker bio-signature panels and ensemble classifiers
Creator	Günther, Oliver P.; Chen, Virginia; Cohen Freue, Gabriela V.; Balshaw, Robert F.; Tebbutt, Scott J.; Hollander, Zsuzsanna; Takhar, Mandeep; McMaster, W. R.; McManus, Bruce M.; Keown, Paul A.; Ng, Raymond Tak-yan, 1963-
Contributor	James Hogg iCAPTURE Centre (University of British Columbia)
Publisher	BioMed Central
Date Issued	2012-12-08
Subject	Biomarkers; Computational; Pipeline; Genomics; Proteomics; Ensemble; Classification
Genre	Article
Type	Text
Language	eng
Date Available	2015-12-23
Provider	Vancouver : University of British Columbia Library
Rights	Attribution 4.0 International (CC BY 4.0)
DOI	10.14288/1.0221585
URI	http://hdl.handle.net/2429/56054
Affiliation	Computer Science, Department of; Medical Genetics, Department of; Medicine, Department of; Pathology and Laboratory Medicine, Department of; Science, Faculty of; Statistics, Department of; Non UBC; Medicine, Faculty of
Citation	BMC Bioinformatics. 2012 Dec 08;13(1):326
Publisher DOI	10.1186/1471-2105-13-326
Peer Review Status	Reviewed
Scholarly Level	Faculty
Copyright Holder	Günther et al.; licensee BioMed Central Ltd.
Rights URI	http://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
