UBC Faculty Research and Publications

diceR: an R package for class discovery using an ensemble driven approach

Chiu, Derek S; Talhouk, Aline

Abstract

Background: Given a set of features, researchers are often interested in partitioning objects into homogeneous clusters. In health research, and cancer research in particular, high-throughput data is collected with the aim of segmenting patients into sub-populations to aid in disease diagnosis, prognosis or response to therapy. Cluster analysis, a class of unsupervised learning techniques, is often used for class discovery. Cluster analysis has several limitations, including the need to select up front both the algorithm to be used and the number of clusters to generate. In addition, several groupings may be consistent with the data, making it very difficult to validate a final solution. Ensemble clustering is a technique used to mitigate these limitations and to facilitate the generalization and reproducibility of findings in new cohorts of patients.

Results: We introduce diceR (diverse cluster ensemble in R), a software package available on CRAN: https://CRAN.R-project.org/package=diceR

Conclusions: diceR is designed to provide a set of tools to guide researchers through a general cluster analysis process that relies on minimizing subjective decision-making. Although developed in a biological context, the tools in diceR are data-agnostic and thus can be applied in different contexts.
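
As a rough illustration of the ensemble idea described in the Background, the sketch below combines several partitions of the same objects into one consensus grouping. It is not diceR's implementation (diceR is an R package, and its consensus functions are not reproduced here). The function names `co_association` and `consensus_clusters`, the 0.5 threshold, and the toy partitions are all assumptions made for illustration. The approach shown is a simple co-association scheme: count how often each pair of objects lands in the same cluster across partitions, then group objects that co-cluster in more than half of them.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    partitions in which objects i and j share a cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(partitions) for c in row] for row in counts]


def consensus_clusters(partitions, threshold=0.5):
    """Assign consensus labels by taking connected components of the graph
    that links objects whose co-association exceeds `threshold`.
    The threshold value is an illustrative assumption."""
    matrix = co_association(partitions)
    n = len(matrix)
    consensus = [-1] * n  # -1 marks an object not yet assigned
    next_label = 0
    for start in range(n):
        if consensus[start] != -1:
            continue
        consensus[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if consensus[j] == -1 and matrix[i][j] > threshold:
                    consensus[j] = next_label
                    stack.append(j)
        next_label += 1
    return consensus


# Three hypothetical partitions of six objects (toy data, not from the paper).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same grouping as above, with permuted labels
    [0, 0, 1, 1, 2, 2],  # a partition that disagrees on objects 2 and 3
]
labels = consensus_clusters(partitions)
print(labels)
```

Because the co-association matrix depends only on whether two objects share a label, partitions produced by different algorithms, or with differently numbered clusters, can be combined without first aligning their labels.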

Rights

Attribution 4.0 International (CC BY 4.0)