A Comparison of Clustering and Prediction Methods for Identifying Key Chemical–Biological Features Affecting Bioreactor Performance

UBC Faculty Research and Publications

A Comparison of Clustering and Prediction Methods for Identifying Key Chemical–Biological Features Affecting Bioreactor Performance Tsai, Yiting; Baldwin, Susan A.; Siang, Lim C.; Gopaluni, Bhushan

Abstract

Chemical–biological systems, such as bioreactors, contain stochastic and non-linear interactions which are difficult to characterize. The highly complex interactions between microbial species and communities may not be sufficiently captured using first-principles, stationary, or low-dimensional models. This paper compares and contrasts multiple data analysis strategies, which include three predictive models (random forests, support vector machines, and neural networks), three clustering models (hierarchical, Gaussian mixtures, and Dirichlet mixtures), and two feature selection approaches (mean decrease in accuracy and its conditional variant). These methods not only predict the bioreactor outcome with sufficient accuracy, but the important features correlated with said outcome are also identified. The novelty of this work lies in the extensive exploration and critique of a wide arsenal of methods instead of single methods, as observed in many papers of similar nature. The results show that random forest models predict the test set outcomes with the highest accuracy. The identified contributory features include process features which agree with domain knowledge, as well as several different biomarker operational taxonomic units (OTUs). The results reinforce the notion that both chemical and biological features significantly affect bioreactor performance. However, they also indicate that the quality of the biological features can be improved by considering non-clustering methods, which may better represent the true behaviour within the OTU communities.

Item Metadata

Title	A Comparison of Clustering and Prediction Methods for Identifying Key Chemical–Biological Features Affecting Bioreactor Performance
Creator	Tsai, Yiting; Baldwin, Susan A.; Siang, Lim C.; Gopaluni, Bhushan
Publisher	Multidisciplinary Digital Publishing Institute
Date Issued	2019-09-10
Description	Chemical–biological systems, such as bioreactors, contain stochastic and non-linear interactions which are difficult to characterize. The highly complex interactions between microbial species and communities may not be sufficiently captured using first-principles, stationary, or low-dimensional models. This paper compares and contrasts multiple data analysis strategies, which include three predictive models (random forests, support vector machines, and neural networks), three clustering models (hierarchical, Gaussian mixtures, and Dirichlet mixtures), and two feature selection approaches (mean decrease in accuracy and its conditional variant). These methods not only predict the bioreactor outcome with sufficient accuracy, but the important features correlated with said outcome are also identified. The novelty of this work lies in the extensive exploration and critique of a wide arsenal of methods instead of single methods, as observed in many papers of similar nature. The results show that random forest models predict the test set outcomes with the highest accuracy. The identified contributory features include process features which agree with domain knowledge, as well as several different biomarker operational taxonomic units (OTUs). The results reinforce the notion that both chemical and biological features significantly affect bioreactor performance. However, they also indicate that the quality of the biological features can be improved by considering non-clustering methods, which may better represent the true behaviour within the OTU communities.
Subject	machine learning; bioinformatics; statistics
Genre	Article
Type	Text
Language	eng
Date Available	2019-09-24
Provider	Vancouver : University of British Columbia Library
Rights	CC BY 4.0
DOI	10.14288/1.0380986
URI	http://hdl.handle.net/2429/71762
Affiliation	Applied Science, Faculty of; Chemical and Biological Engineering, Department of
Citation	Processes 7 (9): 614 (2019)
Publisher DOI	10.3390/pr7090614
Peer Review Status	Reviewed
Scholarly Level	Faculty
Rights URI	https://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Faculty Research and Publications