UBC Faculty Research and Publications

flowClust: a Bioconductor package for automated gating of flow cytometry data
Lo, Kenneth; Hahne, Florian; Brinkman, Ryan R; Gottardo, Raphael
May 14, 2009

Full Text

BMC Bioinformatics (BioMed Central)
Open Access

Software

flowClust: a Bioconductor package for automated gating of flow cytometry data

Kenneth Lo*1, Florian Hahne2, Ryan R Brinkman3 and Raphael Gottardo4,5

Address: 1Department of Statistics, University of British Columbia, 333-6356 Agricultural Road, Vancouver, BC, V6T1Z2, Canada, 2Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA, 3Terry Fox Laboratory, BC Cancer Research Center, 675 West 10th Avenue, Vancouver, BC, V5Z1L3, Canada, 4Institut de recherches cliniques de Montreal, 110, avenue des Pins Ouest, Montreal, QC, H2W 1R7, Canada and 5Département de biochimie, Université de Montreal, 2900, boul Edouard-Montpetit, Montreal, QC, H3T 1J4, Canada

Email: Kenneth Lo* - c.lo@stat.ubc.ca; Florian Hahne - fhahne@fhcrc.org; Ryan R Brinkman - rbrinkman@bccrc.ca; Raphael Gottardo - raphael.gottardo@ircm.qc.ca

* Corresponding author

Abstract

Background: As a high-throughput technology that offers rapid quantification of multidimensional characteristics for millions of cells, flow cytometry (FCM) is widely used in health research, medical diagnosis and treatment, and vaccine development. Nevertheless, there is an increasing concern about the lack of appropriate software tools to provide an automated analysis platform to parallel the high-throughput data-generation platform. Currently, to a large extent, FCM data analysis relies on the manual selection of sequential regions in 2-D graphical projections to extract the cell populations of interest. This is a time-consuming task that ignores the high-dimensionality of FCM data.

Results: In view of the aforementioned issues, we have developed an R package called flowClust to automate FCM analysis. flowClust implements a robust model-based clustering approach based on multivariate t mixture models with the Box-Cox transformation.
The package provides the functionality to identify cell populations whilst simultaneously handling the commonly encountered issues of outlier identification and data transformation. It offers various tools to summarize and visualize a wealth of features of the clustering results. In addition, to ensure its convenience of use, flowClust has been adapted for the current FCM data format, and integrated with existing Bioconductor packages dedicated to FCM analysis.

Conclusion: flowClust addresses the issue of a dearth of software that helps automate FCM analysis with a sound theoretical foundation. It tends to give reproducible results, and helps reduce the significant subjectivity and human time cost encountered in FCM analysis. The package contributes to the cytometry community by offering an efficient, automated analysis platform which facilitates the active, ongoing technological advancement.

Published: 14 May 2009
BMC Bioinformatics 2009, 10:145 doi:10.1186/1471-2105-10-145
Received: 10 January 2009
Accepted: 14 May 2009

This article is available from: http://www.biomedcentral.com/1471-2105/10/145

© 2009 Lo et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Flow cytometry (FCM) is a high-throughput technology that offers rapid quantification of a set of physical and chemical characteristics for a large number of cells in a sample.
FCM is widely used in health research and treatment for a variety of tasks, such as providing the counts of helper-T lymphocytes needed to monitor the course and treatment of HIV infection, the diagnosis and monitoring of leukemia and lymphoma patients, and the evaluation of peripheral blood hematopoietic stem cell grafts, as well as in many other diseases [1-8]. The technology is also used in cross-matching organs for transplantation and in research involving stem cells, vaccine development, apoptosis, phagocytosis, and a wide range of cellular properties including phenotype, cytokine expression, and cell-cycle status [9-14].

Currently, FCM can be applied to analyze thousands of samples per day. Nevertheless, despite its widespread use, FCM has not reached its full potential due to the lack of an automated analysis platform to parallel the high-throughput data-generation platform. In contrast to the tremendous interest in the FCM technology, there is a dearth of statistical and bioinformatics tools to manage, analyze, present, and disseminate FCM data. There is considerable demand for the development of appropriate software tools, as manual analysis of individual samples is error-prone, non-reproducible, non-standardized, not open to re-evaluation, and requires an inordinate amount of time, making it a limiting aspect of the technology [1,7,15-21].

One core component of FCM analysis involves gating, the process of identifying cell populations that share a set of common properties or display a particular biological function. Currently, to a large extent, gating relies on the sequential application of a series of manually drawn gates (i.e., data filters) that define regions in 1- or 2-D graphical projections of FCM data. This manual process is time-consuming and subjective, as researchers have traditionally relied on intuition rather than standardized statistical inference [7,22,23].
In addition, this process ignores the high-dimensionality of FCM data, which may convey more information than that provided by only looking at 1- or 2-D projections.

Recently, a suite of R packages providing infrastructure for FCM analysis has been released through Bioconductor [24], an open source software development project for the analysis of genomic data. flowCore [25], the core package among them, provides data structures and basic manipulation of FCM data. flowViz [26] offers visualization tools, while flowQ provides quality control and quality assessment tools for FCM data. Finally, flowUtils provides utilities to deal with data import/export for flowCore. In spite of these low-level tools, there is still a dearth of software that helps automate FCM gating analysis with a sound theoretical foundation [15].

In view of these issues, based on a formal statistical clustering approach, we have developed the flowClust package (Additional file 1) to help resolve the current bottleneck. flowClust implements a robust model-based clustering approach [27-29] which extends the multivariate t mixture model with the Box-Cox transformation originally proposed in [30]. As a result of the extensions made, flowClust includes options allowing for a cluster-specific estimation of the Box-Cox transformation parameter and/or the degrees of freedom parameter; the Implementation section and the Results and Discussion section provide a detailed account of these extensions.

Implementation

The model

In statistics, model-based clustering [28,29,31,32] is a popular unsupervised approach to look for homogeneous groups of observations. The most commonly used model-based clustering approach is based on finite Gaussian mixture models, which have been shown to give good results in various applied fields [28,29,33,34]. However, Gaussian mixture models might give poor representations of clusters in the presence of outliers, or when the clusters are far from elliptical in shape, phenomena commonly observed in FCM data. In view of this, we have proposed in [30] an approach based on t mixture models [27,28] with the Box-Cox transformation to handle these two issues simultaneously. Formally, given independent p-dimensional multivariate observations $y_1, y_2, \ldots, y_n$, and denoting by $\Psi$ the collection of all unknown parameters, the likelihood for a mixture model with G components is

$$L(\Psi \mid y_1, \ldots, y_n) = \prod_{i=1}^{n} \sum_{g=1}^{G} w_g \, \varphi_p\!\left(y_i^{(\lambda_g)} \mid \mu_g, \Sigma_g, \nu_g\right) \cdot \left|J_p(y_i; \lambda_g)\right|, \tag{1}$$

where $w_g$ is the probability that an observation belongs to the g-th component, and $\varphi_p(\cdot \mid \mu_g, \Sigma_g, \nu_g)$ is the p-dimensional multivariate t density with mean $\mu_g$ ($\nu_g > 1$), covariance matrix $\nu_g(\nu_g - 2)^{-1}\Sigma_g$ ($\nu_g > 2$) and $\nu_g$ degrees of freedom. $y_i^{(\lambda_g)}$ is the value obtained upon transforming $y_i$ with the Box-Cox parameter $\lambda_g$; the transformation used is a variant of the original Box-Cox transformation which is also defined for negative-valued data [35]. Finally,

$$\left|J_p(y_i; \lambda_g)\right| = \left|y_{i1}^{\lambda_g - 1} \, y_{i2}^{\lambda_g - 1} \cdots y_{ip}^{\lambda_g - 1}\right|$$

is the Jacobian induced by the transformation. Please refer to [30] for a detailed account of an Expectation-Maximization (EM) algorithm [36] for the simultaneous estimation of all unknown parameters $\Psi = (\Psi_1, \ldots, \Psi_G)$ where $\Psi_g = (w_g, \mu_g, \Sigma_g, \nu_g, \lambda_g)$.

The EM algorithm needs to be initialized. By default, random partitioning is performed 10 times in parallel, and the one delivering the highest likelihood value after a few EM runs will be selected as the initial configuration for the eventual EM algorithm.

Note that, in the model originally proposed in [30], the Box-Cox parameter λ is set common to all components of the mixture, and the degrees of freedom parameter ν is fixed at a predetermined common value.
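The transformation and Jacobian term appearing in Equation (1) can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the transformation is taken to be the sign-preserving form (sgn(y)|y|^λ - 1)/λ associated with [35], and the function names boxcox_mod and log_jacobian are hypothetical, not part of flowClust.

```r
# Sign-preserving Box-Cox variant (assumed form, following [35]):
#   (sign(y) * |y|^lambda - 1) / lambda, for lambda > 0.
# `boxcox_mod` and `log_jacobian` are illustrative names, not flowClust functions.
boxcox_mod <- function(y, lambda) {
  stopifnot(is.numeric(y), length(lambda) == 1, lambda > 0)
  (sign(y) * abs(y)^lambda - 1) / lambda
}

# Log of the Jacobian term |J_p(y_i; lambda)| = prod_j |y_ij|^(lambda - 1),
# computed row-wise for an n x p matrix of observations.
log_jacobian <- function(Y, lambda) {
  Y <- as.matrix(Y)
  (lambda - 1) * rowSums(log(abs(Y)))
}

# Two made-up observations on two channels.
Y <- matrix(c(200, 400, 150, 50), nrow = 2, byrow = TRUE)
boxcox_mod(Y, lambda = 1)       # lambda = 1 only shifts the data by -1
log_jacobian(Y, lambda = 1)     # zero: no change of scale
boxcox_mod(100, lambda = 1e-6)  # close to log(100) as lambda approaches 0
```

As λ approaches 0 the transformed values approach the logarithm, which is why a small estimate of λ is read as "close to a logarithmic transformation".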
In the latest development of our software, we have generalized the model such that ν may also be estimated, and both λ and ν are allowed to be component-specific, as reflected in Equation (1).

When the number of clusters is unknown, we use the Bayesian Information Criterion (BIC) [37], which gives good results in the context of mixture models [29,38].

The package

With the aforementioned theoretical basis, we have developed flowClust, an R package to conduct an automated FCM gating analysis and produce visualizations for the results. Its source code is written in C for optimal utilization of system resources and makes use of the Basic Linear Algebra Subprograms (BLAS) library, which facilitates multithreaded processes when an optimized library is provided.

flowClust is released through Bioconductor [24], along with those R packages mentioned in the Background section. The GNU Scientific Library (GSL) is needed for successful installation of flowClust. Please refer to the vignette that comes with flowClust for details about installation; Windows users may also consult the README file included in the package for procedures of linking GSL to R.

The package adopts a formal object-oriented programming discipline, making use of the S4 system [39] to define classes and methods. The core function, flowClust, implements the clustering methodology and returns an object of class flowClust. A flowClust object stores essential information related to the clustering result which can be retrieved through various methods such as summary, Map, getEstimates, etc. To visualize the clustering results, the plot and hist methods can be applied to produce scatterplots, contour/image plots and histograms.

To enhance communications with other Bioconductor packages designed for the cytometry community, flowClust has been built with the aim of being highly integrated with flowCore.
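For readers unfamiliar with the S4 system mentioned above, the following toy sketch shows the general pattern of a result class with accessor and summary methods. The class toyClust, its slots, and the generic mixProportions are hypothetical and far simpler than the actual flowClust class; they only illustrate how methods are dispatched on the class of a result object.

```r
library(methods)

# Toy S4 class standing in for a clustering result. The class name, slots and
# accessor are hypothetical and do not mirror flowClust's own definitions.
setClass("toyClust",
         representation(K = "numeric", w = "numeric", label = "integer"))

# A generic accessor with a method for the toy class.
setGeneric("mixProportions", function(object) standardGeneric("mixProportions"))
setMethod("mixProportions", "toyClust", function(object) object@w)

# A summary method, dispatched on the class of its argument.
setMethod("summary", "toyClust", function(object, ...) {
  cat("Number of clusters:", object@K, "\n")
  cat("Proportions:", format(object@w, digits = 3), "\n")
  invisible(object)
})

res <- new("toyClust", K = 2, w = c(0.4, 0.6), label = c(1L, 2L, 2L, 1L, 2L))
summary(res)          # prints the two-line summary
mixProportions(res)   # returns the stored mixing proportions
```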
Methods in flowClust can be directly applied on a flowFrame, the standard R implementation of a Flow Cytometry Standard (FCS) file defined in flowCore; FCS is the typical storage mode for FCM data. Another step towards integration is to overload basic filtering methods defined in flowCore (e.g., filter, %in%, Subset and split) in order to provide similar functionality for classes defined in flowClust.

Results and discussion

Analysis of real FCM data

In this section, we illustrate how to use flowClust to conduct an automated gating analysis of real FCM data. For demonstration, we use the graft-versus-host disease (GvHD) data (Additional file 2) [40]. The data are stored in FCS files, and consist of measurements of four fluorescently conjugated antibodies, namely, anti-CD4, anti-CD8β, anti-CD3 and anti-CD8, in addition to the forward scatter and sideward scatter parameters. One objective of the gating analysis is to look for the CD3+CD4+CD8β+ cell population, a distinctive feature found in GvHD-positive samples. We have adopted a two-stage strategy [30]: we first cluster the data by using the two scatter parameters to identify basic cell populations, and then perform clustering on the population of interest using all fluorescence parameters.

At the initial stage, we extract the lymphocyte population using the forward scatter (FSC-H) and sideward scatter (SSC-H) parameters:

    GvHD <- read.FCS("B07", trans = FALSE)
    res1 <- flowClust(GvHD, varNames = c("FSC-H", "SSC-H"), K = 1:8)

To estimate the number of clusters, we run flowClust on the data repetitively with K = 1 up to K = 8 clusters in turn, and apply the BIC to guide the choice. Values of the BIC can be retrieved through the criterion method. Figure 1 shows that the BIC curve remains relatively flat beyond four clusters. We therefore choose the model with four clusters. Below is a summary of the corresponding clustering result.

    ** Experiment Information **
    Experiment name: Flow Experiment
    Variables used: FSC-H SSC-H
    ** Clustering Summary **
    Number of clusters: 4
    Proportions: 0.1779686 0.1622115 0.3882043 0.2716157
    ** Transformation Parameter **
    lambda: 0.1126388
    ** Information Criteria **
    Log likelihood: -146769.5
    BIC: -293765.9
    ICL: -300546.2
    ** Data Quality **
    Number of points filtered from above: 168 (1.31%)
    Number of points filtered from below: 0 (0%)
    Rule of identifying outliers: 90% quantile
    Number of outliers: 506 (3.93%)
    Uncertainty summary:
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max.      NA's
    9.941e-04 1.211e-02 3.512e-02 8.787e-02 1.070e-01 6.531e-01 1.680e+02

Figure 1: A plot of BIC against the number of clusters for the first-stage cluster analysis. The BIC curve remains relatively flat beyond four clusters, suggesting that the model fit using four clusters is appropriate. (Axes: number of clusters, 1 to 8, against BIC.)

The estimate of the Box-Cox parameter λ is 0.11, implying a transformation close to a logarithmic one (λ = 0).

Note that, by default, flowClust selects the same transformation for all clusters. We have also enabled the option of estimating the Box-Cox parameter λ for each cluster. For instance, if a user finds that the shapes of the clusters deviate significantly from one another and opts for a different transformation for each cluster, the user may write the following line of code:

    flowClust(GvHD, varNames = c("FSC-H", "SSC-H"), K = 4, trans = 2)

The trans argument acts as a switch to govern how λ is handled: fixed at a predetermined value (trans = 0), estimated and set common to all clusters (trans = 1), or estimated for each cluster (trans = 2). Incidentally, the option of estimating the degrees of freedom parameter ν has also been made available, either common to all clusters or specific to each of them. The nu.est argument is the corresponding switch and takes a similar interpretation to trans. Such an option of estimating ν further fine-tunes the model-fitting process such that the fitted model can reflect the data-specific level of abundance of outliers. To compare the models adopting a different combination of these options, one may make use of the BIC again. See Additional file 3 for a graph with two BIC curves corresponding to the default setting (common λ) and the setting with cluster-specific λ, respectively. Little difference in the BIC values between the two settings can be observed. In accordance with the principle of parsimony in statistics, which favors a simpler model, we opt for the default setting here.

Graphical functionalities are available to users for visualizing a wealth of features of the clustering results, including the cluster assignment, outliers, and the size and shape of the clusters. Figure 2 is a scatterplot showing the cluster assignment of points upon the removal of outliers. Outliers are shown in grey with the + symbols. The black solid lines represent the 90% quantile region of the clusters which defines the cluster boundaries. The summary shown above states that the default rule used to identify outliers is 90% quantile, which means that a point outside the 90% quantile region of the cluster to which it is assigned will be called an outlier. In most applications, the default rule should be appropriate for identifying outliers. In case a user wants finer control and would like to specify a different rule, the user may apply the ruleOutliers replacement method:

    ruleOutliers(res1[[4]]) <- list(level = 0.95)

See Additional file 4 for the corresponding summary. As shown in the summary, this rule is more stringent than the 90% quantile rule: 133 points (1.03%) are now called outliers, as opposed to 506 points (3.93%) in the default rule.

Figure 2: A scatterplot revealing the cluster assignment in the first-stage analysis. Clusters 1, 3 and 4 correspond to the lymphocyte population, while cluster 2 is referred to as the dead cell population. The black solid lines represent the 90% quantile region of the clusters which define the cluster boundaries. Points outside the boundary of the cluster to which they are assigned are called outliers and marked with "+". (Axes: FSC-Height against SSC-Height; legend: clusters 1 to 4.)

Clusters 1, 3 and 4 in Figure 2 correspond to the lymphocyte population defined with a manual gating strategy adopted in [40]. We then extract these three clusters to proceed with the second-stage analysis:

    GvHD2 <- split(GvHD, res1[[4]], population = list(lymphocyte = c(1,3,4), deadcells = 2))

The subsetting method split allows us to split the data into several flowFrames representing the different cell populations. To extract the lymphocyte population (clusters 1, 3 and 4), we may type GvHD2$lymphocyte or GvHD2[[1]], which is a flowFrame. By default, split removes outliers upon extraction. The deadcells = 2 list element is included above for demonstration purposes; it is needed only if we want to extract the dead cell population (cluster 2), too.

In the second-stage analysis, in order to fully utilize the multidimensionality of FCM data, we cluster the lymphocyte population using all four fluorescence parameters, namely, anti-CD4 (FL1-H), anti-CD8β (FL2-H), anti-CD3 (FL3-H) and anti-CD8 (FL4-H), at once:

    res2 <- flowClust(GvHD2$lymphocyte, varNames = c("FL1-H", "FL2-H", "FL3-H", "FL4-H"), K = 1:15)

The BIC curve remains relatively flat beyond 11 clusters (Figure 3), suggesting that the model with 11 clusters provides a good fit. Figure 4 shows a contour plot superimposed on a scatterplot of CD8β against CD4 for the sub-population of CD3-stained cells, which were selected based on a threshold obtained from a negative control sample [40]. We can easily identify from it the red and purple clusters at the upper right as the CD3+CD4+CD8β+ cell population. A corresponding image plot is given by Figure 5. Also, see Additional file 5 for the code used to produce all the plots shown in this article.

Figure 3: A plot of BIC against the number of clusters for the second-stage cluster analysis. The BIC curve remains relatively flat beyond 11 clusters, suggesting that the model fit using 11 clusters is appropriate. (Axes: number of clusters, 1 to 15, against BIC.)

The example above shows how an FCM analysis is conducted with the aid of flowClust. When the number of cell populations is not known in advance, and the BIC values are relatively close over a range of the possible number of clusters, the researcher may be presented with a set of possible solutions instead of a clear-cut single one.
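To make the BIC-guided choice used in both stages more concrete, here is a minimal base-R sketch. It assumes the "larger is better" convention BIC = 2 log L - k log n, which is consistent with the summary above (the reported BIC is slightly below twice the log likelihood). The parameter count covers only the default setting described in the text (common estimated λ, fixed ν), and the plateau rule choose_K is a hypothetical heuristic for "relatively flat", not something flowClust provides. The BIC values in the example are synthetic.

```r
# BIC under the assumed convention 2 * logLik - k * log(n); larger is better.
bic <- function(loglik, n_par, n_obs) 2 * loglik - n_par * log(n_obs)

# Free parameters of a G-component, p-dimensional t mixture with one common,
# estimated Box-Cox parameter and fixed degrees of freedom (assumed setting):
# (G - 1) weights + G*p means + G*p*(p+1)/2 scale-matrix entries + 1 lambda.
n_par_tmix <- function(G, p) (G - 1) + G * p + G * p * (p + 1) / 2 + 1

# Hypothetical plateau rule: the smallest K whose BIC lies within a fraction
# `tol` of the observed BIC range from the best value.
choose_K <- function(K, bic_values, tol = 0.05) {
  gap <- max(bic_values) - bic_values
  K[which(gap <= tol * diff(range(bic_values)))[1]]
}

n_par_tmix(G = 4, p = 2)   # 24 free parameters

# Synthetic BIC curve that flattens after four clusters (not read from Figure 1).
K <- 1:8
bic_values <- c(-100, -60, -35, -20, -19, -18.5, -18.2, -18)
choose_K(K, bic_values)    # 4 for these synthetic values
```

Any such threshold is a judgment call; the tolerance only encodes how much BIC improvement the analyst is willing to ignore in favor of a simpler model.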
In such a case, the level of automation may be undermined as the researcher may need to select the best one based on expertise. We acknowledge that more effort is needed to extend our proposed methodology towards a higher level of automation. Currently, we are working on an approach which successively merges the clusters in the solution suggested by the BIC, using some entropy criterion, to give a more reasonable estimate of the number of clusters.

Figure 4: A contour plot superimposed on a scatterplot of CD8β against CD4 for the CD3+ population. The red and purple clusters at the upper right correspond to the CD3+CD4+CD8β+ cell population, indicative of the GvHD. (Axes: CD4-Height against CD8β-Height.)

Figure 5: An image plot of CD8β against CD4 for the CD3+ population. The five clusters corresponding to the CD3+ population shown in Figure 4 can also be identified clearly on this image plot. (Axes: CD4-Height against CD8β-Height.)

Integration with flowCore

As introduced in the Background section, flowClust has been built in a way such that it is highly integrated with the flowCore package. The core function flowClust, which performs the clustering operation, may be replaced by a call to the constructor tmixFilter, creating a filter object similar to the ones used in other gating or filtering operations found in flowCore (e.g., rectangleGate, norm2Filter, kmeansFilter). As an example, the code

    res1 <- flowClust(GvHD, varNames = c("FSC-H", "SSC-H"), K = 1:8)

used in the first-stage analysis of the GvHD data may be replaced by:

    s1filter <- tmixFilter("lymphocyte", c("FSC-H", "SSC-H"), K = 1:8)
    res1f <- filter(GvHD, s1filter)

The use of a dedicated tmixFilter-class object separates the task of specifying the settings (tmixFilter) from the actual filtering operation (filter), facilitating the common scenario in FCM gating analysis in which filtering with the same settings is performed upon a large number of data files. The filter method returns a list object res1f with elements each of class tmixFilterResult, which directly extends the filterResult class defined in flowCore. Users may apply various subsetting operations defined for the filterResult class in a similar fashion on a tmixFilterResult object. For instance,

    Subset(GvHD[, c("FSC-H", "SSC-H")], res1f[[4]])

outputs a flowFrame that is the subset of the GvHD data upon the removal of outliers, consisting of the two selected parameters, FSC-H and SSC-H, only. Another example is given by the split method introduced earlier in this section.

We realize that occasionally a researcher may opt to combine the use of flowClust with filtering operations in flowCore to define the whole sequence of an FCM gating analysis. To enable the exchange of results between the two packages, filters created by tmixFilter may be treated like those from flowCore; users of flowCore will find that filter operators, namely, &, |, ! and %subset%, also work in the flowClust package. For instance, suppose the researcher is interested in clustering the CD3+ cell population, defined by constructing an interval gate with the lower end-point at 270 on the CD3 parameter. The following code may be used to perform the analysis:

    rectGate <- rectangleGate(filterId = "CD3+", "FL3-H" = c(270, Inf))
    s2filter <- tmixFilter("s2filter", c("FL1-H", "FL2-H", "FL3-H", "FL4-H"), K = 5)
    res2f <- filter(GvHD2$lymphocyte, s2filter %subset% rectGate)

The constructors rectangleGate and tmixFilter create two filter objects storing the settings of the interval gate and flowClust, respectively. When the last line of code is run, the interval gate will first be applied to the GvHD data.
flowClust is then performed on a subset of the GvHD data contained by the interval gate.

Conclusion

flowClust is an R package dedicated to FCM gating analysis, addressing the increasing demand for software capable of processing and analyzing the voluminous amount of FCM data efficiently via an objective, reproducible and automated means. The package implements a statistical clustering approach using multivariate t mixture models with the Box-Cox transformation [30], and provides tools to summarize and visualize results of the analysis. The statistical model underlying flowClust extends the one originally proposed in [30]. The extensions include modeling options allowing for a cluster-specific estimation of the Box-Cox parameter λ and the degrees of freedom parameter ν. The package contributes to the cytometry community by offering an efficient, automated analysis platform which facilitates the active, ongoing technological advancement.

Availability and requirements

Project name: flowClust
Project homepage: http://bioconductor.org
Operating systems: Platform independent
Programming language: C, R
Other requirements: GSL, R, Bioconductor
License: Artistic 2.0
Any restrictions to use by non-academics: flowClust depends on the mclust software, the use of which needs to abide by the terms stated in http://www.stat.washington.edu/mclust/license.txt.

Authors' contributions

KL and RG developed the methodology and software, and performed the analyses. FH participated in the development of the software. RRB and RG conceived of the study, and participated in its design and coordination. FH, RRB and RG helped KL draft the manuscript. All authors read and approved the final manuscript.

Additional material

Additional file 1: A copy of the flowClust package. The zip file contains the source code of the flowClust package (version 2.2.0) as a gzipped tarball for direct installation into R from a command-line interface. This current release is also available from Bioconductor at http://bioconductor.org/packages/2.4/bioc/html/flowClust.html.
File: http://www.biomedcentral.com/content/supplementary/1471-2105-10-145-S1.zip

Additional file 2: A copy of the GvHD data file used in this article. The zip file contains the data file in FCS format used in the GvHD analysis. Interested readers may go to http://www.ficcs.org/software.html#Data_Files for a complete set of data files for the GvHD study [40].
File: http://www.biomedcentral.com/content/supplementary/1471-2105-10-145-S2.zip

Additional file 3: A graph with two BIC curves corresponding to the settings with a common λ and cluster-specific λ respectively for the first-stage cluster analysis. Little difference in the BIC values between the two settings is observed. In accordance with the principle of parsimony which favors a simpler model, we opt for the default setting here.
File: http://www.biomedcentral.com/content/supplementary/1471-2105-10-145-S3.pdf

Additional file 4: Result summary of the first-stage analysis with four clusters of the GvHD data. The rule used to identify outliers is 95% quantile. 133 points (1.03%) are called outliers.
File: http://www.biomedcentral.com/content/supplementary/1471-2105-10-145-S4.txt

Additional file 5: Code to produce the plots in this article. R code to produce the plots in the GvHD analysis.
File: http://www.biomedcentral.com/content/supplementary/1471-2105-10-145-S5.r

Acknowledgements

The authors thank Martin Morgan, Patrick Aboyoun and Marc Carlson for their advice on the technical issues of building the flowClust package, and the two reviewers for suggestions that improved an earlier draft of the article. This work was supported by the NIH grants EB005034 and EB008400, and by the Michael Smith Foundation for Health Research.

References

1. Braylan RC: Impact of flow cytometry on the diagnosis and characterization of lymphomas, chronic lymphoproliferative disorders and plasma cell neoplasias. Cytometry A 2004, 58A:57-61.
2. Hengel RL, Nicholson JK: An update on the use of flow cytometry in HIV infection and AIDS. Clin Lab Med 2001, 21(4):841-856.
3. Illoh OC: Current applications of flow cytometry in the diagnosis of primary immunodeficiency diseases. Arch Pathol Lab Med 2004, 128:23-31.
4. Kiechle FL, Holland-Staley CA: Genomics, transcriptomics, proteomics, and numbers. Arch Pathol Lab Med 2003, 127(9):1089-1097.
5. Mandy FF: Twenty-five years of clinical flow cytometry: AIDS accelerated global instrument distribution. Cytometry A 2004, 58A:55-56.
6. Orfao A, Ortuno F, de Santiago M, Lopez A, San Miguel J: Immunophenotyping of acute leukemias and myelodysplastic syndromes. Cytometry A 2004, 58A:62-71.
7. Bagwell CB: DNA histogram analysis for node-negative breast cancer. Cytometry A 2004, 58A:76-78.
8. Keeney M, Gratama JW, Sutherland DR: Critical role of flow cytometry in evaluating peripheral blood hematopoietic stem cell grafts. Cytometry A 2004, 58A:72-75.
9. Krutzik PO, Irish JM, Nolan GP, Perez OD: Analysis of protein phosphorylation and cellular signaling events by flow cytometry: techniques and clinical applications. Clin Immunol 2004, 110(3):206-221.
10. Maecker H, Maino V: Flow cytometric analysis of cytokines. In Manual of Clinical Laboratory Immunology. 6th edition. Washington, DC: ASM Press; 2002.
11. Pozarowski P, Darzynkiewicz Z: Analysis of cell cycle by flow cytometry. Methods Mol Biol 2004, 281:301-312.
12. Pala P, Hussell T, Openshaw PJ: Flow cytometric measurement of intracellular cytokines. J Immunol Methods 2000, 243(1-2):107-124.
13. Vermes I, Haanen C, Reutelingsperger C: Flow cytometry of apoptotic cell death. J Immunol Methods 2000, 243(1-2):167-190.
14. Lehmann AK, Sornes S, Halstensen A: Phagocytosis: measurement by flow cytometry. J Immunol Methods 2000, 243(1-2):229-242.
15. Lizard G: Flow cytometry analyses and bioinformatics: Interest in new softwares to optimize novel technologies and to favor the emergence of innovative concepts in cell research. Cytometry A 2007, 71A:646-647.
16. de Rosa SC, Brenchley JM, Roederer M: Beyond six colors: a new era in flow cytometry. Nat Med 2003, 9:112-117.
17. Redelman D: CytometryML. Cytometry A 2004, 62A:70-73.
18. Roederer M, Treister A, Moore W, Herzenberg LA: Probability binning comparison: a metric for quantitating univariate distribution differences. Cytometry 2001, 45(1):37-46.
19. Roederer M, Moore W, Treister A, Hardy RR, Herzenberg LA: Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 2001, 45(1):47-55.
20. Tzircotis G, Thorne RF, Isacke CM: A new spreadsheet method for the analysis of bivariate flow cytometric data. BMC Cell Biol 2004, 5:10.
21. Spidlen J, Gentleman RC, Haaland PD, Langille M, Le Meur N, Ochs MF, Schmitt C, Smith CA, Treister AS, Brinkman RR, et al.: Data standards for flow cytometry. OMICS 2006, 10(2):209-214.
22. Suni MA, Dunn HS, Orr PL, de Laat R, Sinclair E, Ghanekar SA, Bredt BM, Dunne JF, Maino VC, Maecker HT: Performance of plate-based cytokine flow cytometry with automated data analysis. BMC Immunol 2003, 4:9.
23. Parks DR: Data processing and analysis: Data management. In Current Protocols in Cytometry, chap. 10. New York: John Wiley & Sons, Inc.; 1997:10.1.1-10.1.6.
24. Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang J: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 2004, 5(10):R80.
25. Hahne F, Le Meur N, Brinkman R, Ellis B, Haaland P, Sarkar D, Spidlen J, Strain E, Gentleman R: flowCore: A Bioconductor software package for high throughput flow cytometry data analysis. BMC Bioinformatics 2008, 10:106.
26. Sarkar D, Le Meur N, Gentleman R: Using flowViz to visualize flow cytometry data. Bioinformatics 2008, 24(6):878-879.
27. Peel D, McLachlan GJ: Robust mixture modelling using the t distribution. Stat Comput 2000, 10(4):339-348.
28. McLachlan G, Peel D: Finite Mixture Models. Wiley Series in Probability and Statistics: Applied Probability and Statistics. New York: Wiley-Interscience; 2000.
29. Fraley C, Raftery AE: Model-based clustering, discriminant analysis, and density estimation. J Amer Statist Assoc 2002, 97(458):611-631.
30. Lo K, Brinkman RR, Gottardo R: Automated gating of flow cytometry data via robust model-based clustering. Cytometry A 2008, 73(4):321-332.
31. Titterington DM, Smith AFM, Makov UE: Statistical Analysis of Finite Mixture Distributions. Chichester, UK: John Wiley & Sons; 1985.
32. McLachlan GJ, Basford KE: Mixture Models: Inference and Applications to Clustering. New York, NY: Marcel Dekker Inc; 1988.
33. Banfield JD, Raftery AE: Model-based Gaussian and Non-Gaussian Clustering. Biometrics 1993, 49:803-821.
34. Fraley C, Raftery AE: MCLUST Version 3 for R: Normal Mixture Modeling and Model-Based Clustering. Technical Report, Department of Statistics, University of Washington; 2006.
35. Bickel PJ, Doksum KA: An analysis of transformations revisited. J Amer Statist Assoc 1981, 76(374):296-311.
36. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Statist Soc B 1977, 39:1-22.
37. Schwarz G: Estimating the Dimension of a Model. Ann Statist 1978, 6:461-464.
38. Fraley C, Raftery AE: How many clusters? Which clustering method? Answers via model-based cluster analysis. Comput J 1998, 41(8):578-588.
39. Chambers JM: Programming with Data: A Guide to the S Language. New York, NY: Springer; 2004.
40. Brinkman RR, Gasparetto M, Lee SJJ, Ribickas A, Perkins J, Janssen W, Smiley R, Smith C: High-content flow cytometry and temporal data analysis for defining a cellular signature of Graft-versus-Host disease. Biol Blood Marrow Transplant 2007, 13(6):691-700.

