Open Collections

UBC Faculty Research and Publications

SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes,… Chari, Raj; Coe, Bradley P; Wedseltoft, Craig; Benetti, Marie; Wilson, Ian M; Vucic, Emily A; MacAulay, Calum; Ng, Raymond T; Lam, Wan L Oct 7, 2008


BMC Bioinformatics (BioMed Central), Open Access

Software

SIGMA2: A system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes

Raj Chari*1, Bradley P Coe1, Craig Wedseltoft1, Marie Benetti1, Ian M Wilson1, Emily A Vucic1, Calum MacAulay2, Raymond T Ng3 and Wan L Lam1

Address: 1Department of Cancer Genetics and Developmental Biology, BC Cancer Agency Research Centre, Vancouver, BC, Canada, 2Department of Cancer Imaging, BC Cancer Agency Research Centre, Vancouver, BC, Canada and 3Department of Computer Science, University of British Columbia, Vancouver, BC, Canada

* Corresponding author

Abstract

Background: High throughput microarray technologies have afforded the investigation of genomes, epigenomes, and transcriptomes at unprecedented resolution. However, software packages to handle, analyze, and visualize data from these multiple 'omics disciplines have not been adequately developed.

Results: Here, we present SIGMA2, a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes.
Multi-dimensional datasets can be simultaneously visualized and analyzed with respect to each dimension, allowing combinatorial integration of the different assays belonging to the different 'omics.

Conclusion: The identification of genes altered at multiple levels such as copy number, loss of heterozygosity (LOH), DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a novel tool to facilitate the high throughput systems biology analysis of cancer.

Published: 7 October 2008
BMC Bioinformatics 2008, 9:422 doi:10.1186/1471-2105-9-422
Received: 25 June 2008
Accepted: 7 October 2008
© 2008 Chari et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Multiple mechanisms of gene disruption have been shown to be important in the development of cancer. Genetic alterations (mutations, changes in gene dosage, allele imbalance) and epigenetic alterations (changes in DNA methylation and histone modification states) are responsible for changing the expression of genes. High throughput approaches have afforded the ability to interrogate the genomic, epigenomic and gene expression (transcriptomic) profiles at unprecedented resolution [1-6]. However, a gene can be disrupted by one or by a combination of mechanisms; therefore, investigation in a single 'omics dimension (genomics, epigenomics, or transcriptomics) alone cannot detect all disrupted genes in a given tumor. Moreover, individual tumors may have different patterns of gene disruption, by different mechanisms for a given gene, while achieving the same net effect on phenotype.
Hence, a multi-dimensional approach is required to identify the causal events at the DNA level and understand their downstream consequences.

The current state of software for global profile comparison typically focuses on analyzing and displaying data from a single dimension, for example CGH Fusion (infoQuant Ltd, London, UK) for DNA copy number profile analysis and GeneSpring (Agilent Technologies, Santa Clara, CA, USA) for gene expression profile analysis. Software for integrative analysis has been restricted to working with datasets derived from a limited combination of technology platforms (Table 1) [7-10]. Though different software can analyze data generated from different platforms, the ability to perform meta-analysis using data from multiple microarray platforms is limited to a small number of software packages. Consequently, integrative analysis of cancer genomes typically involves no more than two types of data, most commonly the integration of gene dosage and gene expression data [11-16], recently expanded to integrating allelic information [17]. Software to perform multi-dimensional analysis is therefore greatly in demand.

Here, we present SIGMA2, a novel software package which allows users to integrate data from the various 'omics disciplines such as genomics, epigenomics and transcriptomics. Multi-dimensional datasets can be simultaneously compared, analyzed and visualized with respect to individual dimensions, allowing combinatorial integration of the different assays belonging to the different 'omics. The identification of genes altered at multiple levels such as copy number, LOH, DNA methylation and the detection of consequential changes in gene expression can be concertedly performed, establishing SIGMA2 as a tool to facilitate the high throughput systems biology analysis of cancer. SIGMA2 is freely available for academic and research use from our website.

SIGMA2 is implemented in Java and requires Java SE version 1.6 or later.
In addition, the statistical package R and the database application MySQL are required. The Java interface communicates with MySQL using a JDBC connector and with R using the JRI package by JGR (Figure 1). MySQL is used for data storage and querying, while R is used for segmentation and statistical analysis. All genomic coordinate information was obtained from the University of California Santa Cruz (UCSC) genome databases [18].

Results and discussion

Look and feel of SIGMA2

The novel multi-dimensional 'omics data analysis software SIGMA2 is built on the framework of a facile visualization tool called SIGMA, which can display alignment of genomic data from a built-in static database [7]. The functionalities introduced in SIGMA2 are shown in Table 1.

Description of application scope and functionality

SIGMA2 is built to handle a variety of analysis techniques typically used in the high-throughput study of cancer, allowing the combinatorial integration of multiple 'omics disciplines. The hierarchy underlying the program, which groups data into genome, epigenome, and transcriptome, is shown in Figure 2A. The overall functionality map is given in Figure 2B and listed in Table 2.
Within each 'omics dimension, data sets may be imported representing any of the major types of biological measurements being assayed, for example, (i) examining both DNA copy number and LOH assays within the genomic bundle, (ii) examining both DNA methylation and histone modification status within the epigenomics bundle, and (iii) examining both gene expression profiles and microRNA expression assays within the transcriptomic bundle. Each assay may branch into data sources from a multitude of technology platforms.

Table 1: Features required for integrative analysis

Packages compared: Nexus CGH, CGH Fusion, ISA-CGH, VAMP, *CGH Analytics, MD-SeeGH, SIGMA, SIGMA2. The published table marks each package that provides a feature. The column positions of those marks did not survive in this text, so only the number of packages marked (out of eight) is listed for each feature.

  Built-in segmentation for array CGH: 7
  Consensus calling using multiple segmentation algorithms: 1
  Array platform-independent combined CGH analysis: 3
  Custom microarray data handling: 7
  Basic copy number and expression integration: 4
  Alignment and analysis of genetic and epigenetic data: 3
  Multi-dimensional visualization of genetic, epigenetic and gene expression data: 1
  Two group statistical comparison: 4
  Two group combinatorial gene dosage and gene expression comparison: 1
  Linking to external biological databases: 8
  Linking to external gene expression (GEO Profiles): 1
  Context-based visualization of genome features: 5
  Conversion of data between different genome assemblies: 4
  Free for academic/research use: 6

Approach to integration between array platforms and assays

SIGMA2 treats all data in the context of genome position based on the relevant human genome build using the UCSC genome assemblies. An interval-based approach is used to sample across different array platforms and assays, and data from each interval are merged together. Briefly, this is done by querying data at fixed genomic intervals for each platform and subsequently taking an average of the measurements within each interval.
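The interval-based merging described above can be sketched in a few lines of Python. This is a minimal illustration rather than the SIGMA2 implementation: the function names `bin_platform` and `merge_platforms`, the tuple layout of a probe, and the use of `None` for an interval with no data are all assumptions made for the example. Only the idea of fixed genomic intervals with averaging inside each interval comes from the text.

```python
from collections import defaultdict
from statistics import mean


def bin_platform(probes, interval=10_000):
    """Average the probe values that fall in each fixed-width genomic interval.

    probes: iterable of (chromosome, position, value) tuples (hypothetical layout).
    Returns {(chromosome, interval_index): mean value}.
    """
    buckets = defaultdict(list)
    for chrom, pos, value in probes:
        buckets[(chrom, pos // interval)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def merge_platforms(platforms, interval=10_000):
    """Place several platforms on one shared interval grid.

    platforms: {platform_name: probes}.
    Returns {(chromosome, interval_index): {platform_name: mean or None}},
    where None marks an interval in which that platform has no data.
    """
    binned = {name: bin_platform(p, interval) for name, p in platforms.items()}
    keys = sorted(set().union(*binned.values()))
    return {k: {name: binned[name].get(k) for name in binned} for k in keys}


# Toy data: two probes from one platform share the first 10 kb interval.
platforms = {
    "bac": [("chr1", 1_500, 0.5), ("chr1", 8_000, 1.5)],
    "oligo": [("chr1", 12_000, -1.0)],
}
merged = merge_platforms(platforms)
for key, values in merged.items():
    print(key, values)
```

Keying every measurement by (chromosome, interval index) is what makes samples from different array platforms comparable, since each platform contributes at most one value per interval.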
The algorithm is listed in Figure 3.

Format requirements of input data

Standard tab-delimited text files are used for the input of data for all of the assay types. For genomic data, specifically array CGH, normalization is recommended using external algorithms such as CGH-Norm and MANOR [19,20]. Segmentation analysis can be performed within SIGMA2, but results from external analysis can be imported and used in the consensus calling feature. The algorithms which can be called within SIGMA2 currently include DNACopy and GLAD [21,22]. Multiple sample batch importing is available to facilitate efficient loading of datasets. To utilize this, the user must create an information file which describes each sample in the dataset. Formatting requirements of the information file are specified in the manual. Alternatively, for Affymetrix SNP array analysis, data should also be pre-processed and normalized using the appropriate software, such as CNAG, before importing into SIGMA2 [23]. Genotyping calls should be made prior to importing, using the "AA", "AB" and "BB" convention. If the genotype call does not exist, "NC" must be specified. For epigenomic data, data from affinity-based approaches (MeDIP [6] and ChIP [24]) should contain a value representing the level of enrichment and the genomic coordinates for each spot. Similarly, for bisulphite-based approaches [25], a percentage of converted CpGs should be provided along with the genomic coordinates for each spot. Finally, for transcriptome data, gene expression data from Affymetrix experiments can be directly imported and processed as CEL files and are normalized using the MAS 5.0 algorithm implemented in the "affy" package of R. For any assay type, custom data can be imported whereby the user provides a map of the platform based on the given genome build, and the unique identifier for the map must be used for the data generated from those experiments.

Figure 1: Main structural components of SIGMA2. Data and genome mapping information are stored in the MySQL database. Segmentation analysis using DNACopy and GLAD and statistical analysis are performed using R, with results stored in the database. Java was used to program the application, specifically for the user interface and the different types of visualization. Base-pair positions and gene annotations are linked to other biological databases to facilitate further interrogation by the user. Diagram labels: MySQL (data storage, querying); R (segmentation, statistical analysis); Java (user interface, visualization); JDBC connects Java to MySQL, and JGR / JRI connects Java to R; links to external resources (biological databases): PubMed, OMIM, NCBI Gene, UCSC Genome Browser, GEO Profiles, Database of Genomic Variants.

Figure 2: (A) Data hierarchy describing the relationship between platforms, assays and 'omics disciplines. (B) Functionality map of SIGMA2. List of the various functions, and the output from each function, that can be performed given the number of samples or sample groups and dimensions. Multiple sample analysis (single group and two group) is microarray platform independent.
Functions listed in boxes are in addition to those listed in the box preceding the arrows.

Figure 2, panel A (combinatorial integration hierarchy):
  'Omics: Genome, Epigenome, Transcriptome
  Assay: DNA Copy Number, Allelic Imbalance (LOH), DNA Methylation, Histone modification, Gene & MicroRNA Expression
  Platform: BAC array CGH, Oligo array CGH, SNP Arrays, Microsatellite markers, SAGE, Microarrays, MeDIP-array CGH, Bisulphite-based methods, ChIP-on-chip

Figure 2, panel B (functions available, by number of samples):
  Single Platform / Single Assay: single sample A, B, C, Q, R, S; multiple samples (one group) A, B, C, L, Q, R, S; multiple samples (two groups) A, B, C, L, M, Q, R, S
  Single 'omics (Multiple assays): single sample D, E, H; multiple samples (one group) D, E, H; multiple samples (two groups) D, E, H
  Combinatorial Integration (Multiple 'omics): single sample F, G, O, P; multiple samples (one group) F, G, I, J, K, O, P; multiple samples (two groups) F, G, I, J, K, N, O, P

Figure 2, key to functions:
  A  Segmentation analysis for array CGH to identify regions of gain and loss
  B  Moving average thresholding for affinity-based approaches (MeDIP for DNA methylation, ChIP-on-chip for histone modification states)
  C  Regions of loss of heterozygosity (LOH)
  D  Regions of copy number change and LOH
  E  Regions of copy number neutrality and LOH (e.g. UPD)
  F  Regions of copy number AND methylation alteration ("two" hit)
  G  Regions of copy number OR methylation alteration (compensatory change with same net effect)
  H  Epigenetic interplay between DNA methylation and various modification states of histones
  I  Correlation of copy number and gene expression (dataset with matched copy number and expression profiles)
  J  Statistical comparison of samples with copy number change versus without copy number change (dataset with matched copy number and expression profiles) using Mann Whitney U-test
  K  Correlation of DNA methylation and gene expression (dataset with matched DNA methylation and expression profiles)
  L  Identify recurrent changes (copy number alterations, common enrichment patterns [MeDIP, ChIP], regions of LOH)
  M  Statistical comparison of patterns of recurrent changes between two groups using Fisher's exact test
  N  Two-dimensional two-group comparisons (statistical comparison of expression profiles of genes in regions of difference identified by Fisher's exact comparison)
  O  Identify "And" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone modification states)
  P  Identify "Or" events between three or more DNA-based dimensions (copy number, LOH, DNA methylation, histone modification states)
  Q  Cancer gene discovery
  R  Lists of genes for systems/function/pathway analysis
  S  Linking to public biological databases (PubMed, NCBI Gene, OMIM, NCBI GEO Profiles, UCSC Genome Browser, Database of Genomic Variants)

Table 2: Summary of input, analysis, output for each dimension

Genomics; assay: Copy number
  Input: Array CGH
  Functionality***: Segmentation; direct thresholding; moving average-based thresholding; Z-transformation of moving average; whole genome visualization
  Output: Regions of gain and loss; gene lists for further analysis; high-resolution karyogram images; frequency histograms

Genomics; assay: LOH
  Input: SNPs*
  Functionality***: LOH based on consecutive altered markers
  Output: Regions of LOH

Genomics; assay: LOH
  Input: Microsatellite markers
  Functionality***: Same as above
  Output: Same as above

Genomics; assays: Copy number, LOH
  Functionality***: Identify regions of uniparental disomy (UPD): LOH with no copy number change

Epigenomics; assay: DNA methylation
  Input: MeDIP + array CGH
  Functionality***: Direct thresholding; moving average-based thresholding; Z-transformation of moving average
  Output: Regions of enrichment and lack of methylation; gene lists for further analysis

Epigenomics; assay: DNA methylation
  Input: Bisulphite-based
  Functionality***: Visualization against genome position; thresholding of proportion of methylated CpGs

Epigenomics; assay: Histone modification states
  Input: ChIP-on-chip
  Functionality***: Direct thresholding; moving average-based thresholding; Z-transformation of moving average
  Output: Regions of enrichment and lack of enrichment; gene lists for further analysis

Epigenomics; assays: DNA methylation, Histone modification states
  Functionality***: Epigenetic interplay
  Output: Regions of mutually exclusive change between chromatin state and DNA methylation

Transcriptomics; assay: Gene expression**
  Input: Microarrays
  Functionality***: Heatmap visualization, clustering; histograms; statistical comparisons
  Output: Expression of genes of interest based on DNA analysis

Transcriptomics; assay: Gene expression**
  Input: SAGE
  Functionality***: Heatmap visualization, clustering; histograms; statistical comparisons
  Output: Expression of genes of interest based on DNA analysis

Genomics, Transcriptomics; assays: Copy number, Gene expression
  Functionality***: Correlation analysis of copy number and expression; statistical comparison of expression in regions of copy number difference (two group analysis)
  Output: Genes whose expression is strongly regulated by copy number; p-values for associations; p-values for group comparison

Genomics, Epigenomics; assays: Copy number, DNA methylation
  Functionality***: Identify regions of concerted change in BOTH copy number and methylation ("two-hit"); identify regions with change in copy number OR DNA methylation

Genomics, Epigenomics; assays: LOH, DNA methylation
  Functionality***: Identify allele-specific methylation events
  Output: Regions of allele-specific aberrant methylation

Genomics, Epigenomics, Transcriptomics; assays: Copy number, LOH, DNA methylation, Histone modification, Gene expression
  Functionality***: Identify co-ordinate genetic, epigenetic and gene expression changes
  Output: Genes altered at multiple levels

* Affymetrix and Illumina data must be pre-processed prior to import; ** functionality invoked in the context of genetic and epigenetic data analyses; *** aligned to genome features (Database of Genomic Variants, CpG islands, microRNAs etc.)

Description of user interface

The main user interface in SIGMA2 utilizes a tabbed window-pane which allows the user to open multiple visualizations simultaneously (Figure 4). The left part of the window manages the analyses and projects which belong to the current user, and button shortcuts for the main functions are spread along the top of the window. Using an example of an array CGH profile from the Agilent 244K platform, we demonstrate the step-wise interrogation of a region of interest [26].
Briefly, using the highlighting toolbar button, the user can select a region of interest and subsequently, by clicking the right mouse button, search for annotated genes within the specified genomic coordinates.

Analysis of data from a single assay type

The first, and most basic, level of analysis is from a single assay type. For array CGH, multiple options for segmentation algorithms are available within the program, and results from externally run segmentation can be imported as well. However, each segmentation algorithm has its advantages and disadvantages depending on the type of data used and the quality of data at hand. A unique feature of SIGMA2 is the ability to take a consensus of multiple algorithms using "And" or "Or" logic between algorithms. Moreover, a level of consensus can be specified (Figure 5A). For example, if an experiment is analyzed using five approaches, the user can select areas of gain and loss which were detected by one algorithm, at least three algorithms, all five algorithms, etc. For LOH, basic analysis using the number of consecutive markers that exhibit LOH is used to determine its status. Data from affinity-based approaches for DNA methylation and histone modification states, or bead-based percentages of CpG island methylation, are analyzed by either direct thresholding or z-transform thresholding. For any of the different assay types, when examining across a number of samples, a frequency of alteration can be calculated and plotted.

For data from different array platforms, but assaying the same biological measurement, the algorithm for integration is used to derive common data. This feature is most applicable to DNA copy number data due to the number of array CGH platforms. This allows for better utilization of publicly available data, thus increasing sample size for statistical analysis. Similar to the multiple sample analysis of data on the same platform, a frequency of altered states can be generated and plotted. Figure 5B shows the concerted analysis of a sample profiled on the Affymetrix 500K SNP array, Agilent 244K CGH array and the whole genome tiling path BAC array.

Figure 3: Algorithm for integrating between different array platforms. Data for every platform is matched to genomic position. Subsequently, an interval-based approach is used to systematically query data for each interval. In this figure, the interval, k, is 10 kb in size. By converting everything to genomic position, sample sets of the same disease type but on different array platforms can be aggregated, affording the user additional statistical power.

    numSamples <- number of samples
    for chr <- 1:24
        k <- 10000
        chrEnd <- length of chromosome
        intervals <- ceiling(chrEnd / k)
        data <- array[intervals, numSamples]
        currentInterval <- 0
        for pos <- 0, pos < chrEnd, pos += k
            for sampleNum <- 1:numSamples
                data[currentInterval, sampleNum] <- data from sample for interval pos and pos+k *
            end
            currentInterval <- currentInterval + 1
        end
    end

    * If multiple data points exist in the interval, an average is used. If no data exists, blank is returned. If it is array CGH data that is segmented, data is assumed to exist for any genomic position.

Analysis of data from multiple assays in a given 'omics dimension

Within a given 'omics dimension, multiple assay types can be analyzed in combination. For example, it is useful to investigate copy number and LOH, and the interplay between DNA methylation and different states of histone modification. Typically, in regions of copy number loss, LOH is also observed. However, LOH can also occur in regions which are copy number neutral, indicating a change in allelic status which is not interpretable by one dimension alone.
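As an illustration of combining these two genomic assays, the sketch below calls LOH from paired normal and tumor genotypes using runs of consecutive markers, then labels each LOH region by the copy number state it overlaps. It is a sketch under stated assumptions, not the method used in SIGMA2: the function names, the default run length of three markers, the rule that uninformative markers (normal not heterozygous, or tumor "NC") are skipped without breaking a run, and the segment state labels are all hypothetical. Only the "AA"/"AB"/"BB"/"NC" call convention and the consecutive-marker idea come from the text.

```python
def loh_regions(markers, min_run=3):
    """Find spans where at least min_run consecutive informative markers show LOH.

    markers: list of (position, normal_call, tumor_call), sorted by position,
             with calls given as "AA", "AB", "BB" or "NC".
    A marker is informative when the normal sample is heterozygous ("AB") and
    the tumor has a call; it shows LOH when the tumor is homozygous.
    Assumption: uninformative markers are skipped and do not break a run.
    """
    regions, run = [], []
    for pos, normal, tumor in markers:
        if normal != "AB" or tumor == "NC":
            continue
        if tumor in ("AA", "BB"):
            run.append(pos)
        else:  # heterozygosity retained in the tumor: close the current run
            if len(run) >= min_run:
                regions.append((run[0], run[-1]))
            run = []
    if len(run) >= min_run:
        regions.append((run[0], run[-1]))
    return regions


def classify(region, segments):
    """Label an LOH region by the first copy number segment it overlaps.

    segments: list of (start, end, state), state in {"gain", "loss", "neutral"}.
    """
    start, end = region
    for seg_start, seg_end, state in segments:
        if seg_start <= end and start <= seg_end:
            if state == "neutral":
                return "copy-neutral LOH"
            return f"copy number {state} with LOH"
    return "LOH, copy number unknown"


markers = [
    (100, "AB", "AA"), (200, "AA", "AA"), (300, "AB", "BB"),
    (400, "AB", "NC"), (500, "AB", "AA"), (600, "AB", "AB"),
    (700, "AB", "AA"),
]
regions = loh_regions(markers)
labels = [classify(r, [(0, 1000, "neutral")]) for r in regions]
print(regions, labels)
```

A region labelled copy-neutral here corresponds to the case the text highlights, where the allelic change would be missed by copy number data alone.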
Here, we show a sample for which copy number and LOH information exists, with a region of copy number loss associated with LOH (Figure 6). In terms of epigenetics, DNA methylation and states of histone methylation and acetylation are known to be biologically relevant. With high throughput technologies available to assay these dimensions, this type of analysis will become more common.

Figure 4: Description of the SIGMA2 user interface using a single sample visualization as an example. (A) Customizable toolbar with shortcut buttons, (B) Project/Analysis tree to track work within and between sessions, (C) Main display area using tab-based navigation, (D) Information console and (E) Genome features tracks. Here, a copy number change is displayed in the context of CpG islands (red), microRNAs (orange) and regions annotated in the Database of Genomic Variants (blue). Figure annotation: search for genes, link to databases.

Combinatorial analysis of multiple 'omics dimensions: gene dosage and gene expression

The most common analysis of multiple 'omics dimensions is the influence of the genome on the transcriptome. A number of software packages have started to incorporate approaches to examining gene dosage and gene expression [8,9,27]. In SIGMA2, there are multiple functionalities which allow the user to link DNA copy number to gene expression. For a single group of samples, with matching DNA copy number and gene expression profiles, the user can determine associations through two main options: a) using a correlation-based approach, correlating the log ratios with the normalized gene expression intensities, and b) using a statistical approach, comparing the expression in samples with copy number changes against those without copy number change utilizing the Mann Whitney U test, analogous to approaches taken in previous studies [27].
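The two options can be illustrated for a single gene with the standard-library Python sketch below. It is not the SIGMA2 code, which performs its statistics in R. The function names and toy values are hypothetical, Spearman correlation is used as one example of a rank correlation, and the Mann Whitney p-value uses a normal approximation without tie or continuity correction, which is rough for groups this small.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den


def spearman(x, y):
    """Option a): rank correlation of copy number log ratios with expression."""
    return pearson(average_ranks(x), average_ranks(y))


def mann_whitney_u(group_a, group_b):
    """Option b): U statistic for group_a and an approximate two-sided p-value."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie correction
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Toy values for one gene (hypothetical).
log_ratios = [0.1, 0.5, 0.9, 1.2]
expression = [6.0, 7.5, 7.0, 9.0]
rho = spearman(log_ratios, expression)

with_gain = [8.1, 9.0, 8.7, 9.4]      # expression in samples with copy number gain
without_gain = [6.2, 7.0, 6.8, 7.5]   # expression in samples without the gain
u, p = mann_whitney_u(with_gain, without_gain)
print(rho, u, p)
```

Option a) needs matched profiles for every sample, whereas option b) only needs each sample assigned to the changed or unchanged group, which is why the two can give different views of the same gene.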
Spearman, Kendall or Pearson correlation coefficients can be calculated for option a). Similarly, this functionality is also available for correlating epigenetic profiles and gene expression.

In addition to single group analysis, two-dimensional genome/transcriptome analysis can be applied to two-group comparison analysis. For example, if patterns of copy number alterations are compared between two groups and a particular region is more frequently gained in one group than another, the expression data can subsequently be compared between the groups of samples to determine if there is an association between gene dosage and gene expression. That is, we would expect the group with more frequent copy number gain to have higher expression than the other group. Notably, this functionality does not require both copy number and expression data to exist for the same sample, but allows the user to select an independent dataset for expression data comparisons (Figure 7).

Group comparison analysis: single 'omics dimension

Finally, for two groups of samples, the user can compare the distribution of changes between two groups to determine if the patterns are statistically different using a Fisher's exact test. For DNA copy number, it is the distribution of gains and losses; for DNA methylation or histone modification states, the proportion of samples that meet the threshold of enrichment for each group (low or high); and for LOH, the proportion of samples with LOH for a region for each group.

Group comparison analysis: integrating multiple 'omics dimensions

This type of analysis can be performed with a single sample or multiple samples, thus allowing combinatorial ("and") analysis for large datasets. In addition, the user can also identify "or" events, where a change in any of the dimensions can be flagged. This is more important in datasets where a single dimension may not capture complex alterations of a particular region.

Multi-dimensional analysis of a breast cancer genome

Using the breast cancer cell line HCC2218, we show the integration of genomic, epigenomic, and transcriptomic data. Interestingly, when we examine the ERBB2 gene on chromosome 17, we show concurrent amplification, LOH, loss of methylation and a drastic increase in gene expression (Figure 8). ERBB2 has been shown to be an important gene in breast cancer development and therapeutic intervention. This demonstrates the value of integrating multiple dimensions to understand complex alteration patterns in disease samples where multiple causes can lead to a single effect.

Exporting data and results

High resolution images can be exported for all types of visualizations in SIGMA2. Histogram plots of gene expression, heatmaps with clustering of gene expression, karyogram plots and frequency histogram plots are the main types of visualization available. The frequency histogram data used to generate the plots can also be exported. Integrated plots with data plotted serially or overlaid are also available for analysis involving multiple genomic and epigenomic dimensions. Genes which are obtained from the conjunctive (And) and disjunctive (Or) multi-dimensional analysis can be exported with their status. Results of statistical analysis such as Fisher's exact comparisons and U-test comparisons of gene expression can be exported against annotated gene lists based on user-specified human genome builds. Currently, April 2003 (hg15), May 2004 (hg17) and March 2006 (hg18) are the available genome builds [18]. As new builds are released, support for those builds will be available. Finally, data from multi-platform integration can be exported based on base pair position for additional external analysis if necessary.

Figure 5: (A) Consensus calling using multiple algorithms. Multiple algorithms (and different parameters) can be selected to analyze a given array CGH sample, and this can be defined for each array platform independently as each platform may exhibit different noise and ratio response characteristics. (B) Heterogeneous array analysis using data from multiple array CGH platforms. Samples from the Agilent 244K, Affymetrix SNP 500K and whole genome BAC array were segmented to define areas of gain and loss. Subsequently, the results were aggregated into a frequency histogram plot showing the common areas of gain and loss across the three samples. Panel labels: Affymetrix SNP 500K, Agilent 244K CGH, BCCA WGTP 32K.

Figure 6: Parallel visualization and analysis of the copy number and genotype profiles of the breast cancer cell line HCC2218. The genotype profile of the matching normal blood lymphoblast line (HCC2218BL) is also provided to define regions of LOH. The DNA copy number profile was generated on the BCCA whole genome tiling path BAC array, and genotype profiles are from the Affymetrix SNP 10K array [28]. This region of chromosome arm 3q has a defined segmental copy number loss, and the boundary of the change is evident from the LOH profile. In the genotype profile, the horizontal blue lines indicate a SNP transition from heterozygous in normal to homozygous in the tumor, indicating LOH. Panel labels: LOH, Copy Number; HCC2218, HCC2218, HCC2218BL.

Figure 7: A two-group two-dimensional comparison of 37 NSCLC and 16 SCLC cancer cell lines. First, segmentation analysis is performed to delineate gains and losses in each sample. Next, a statistical comparison of the distribution of gains and losses between the two groups is done using the Fisher's exact test. (A) Using the interactive search, one of the regions of difference identified is on chromosome 7, with a NSCLC and SCLC sample aligned next to each other. The NSCLC has a clear segmental gain of that region, with the SCLC not having the gain. The right-most graph is a frequency plot summary of the two sample sets (NSCLC and SCLC). NSCLC is color-coded in red and SCLC in green, and the overlap appears in yellow. The frequency of chromosome arm 7p gain is higher in the red group. (B) A heatmap is shown representing 15 NSCLC and 15 SCLC gene expression profiles of the specific genes in the region highlighted in yellow. (C) When examining gene expression data of EGFR specifically, a gene in this region, we can see that the expression is drastically higher in NSCLC vs. SCLC, as predicted by the higher frequency of gain in NSCLC vs. SCLC of that region. Gene expression data are represented as log2 of the normalized intensities.

Figure 8: Multi-dimensional perspective of chromosome 17 of the HCC2218 breast cancer cell line. Copy number, LOH, and DNA methylation profiling identifies an amplification of ERBB2 coinciding with allelic imbalance and loss of methylation. When examining the gene expression, the expression of HCC2218 is significantly higher than in a panel of normal luminal and myoepithelial cell lines [29]. Panel labels: DNA Copy Number, Allelic imbalance (LOH), DNA Methylation; HCC2218, Myoepithelial, Luminal.

Conclusion

With the increase in high-throughput data covering multiple dimensions of the genome, epigenome and transcriptome, the approaches and tools to analyze these data must advance accordingly to handle, analyze and interpret them in an integrated manner. SIGMA2 meets these requirements and provides the framework for the incorporation of data from future approaches and technologies.
Specifically, with the movement from array to sequence-based technologies, the ability to assimilate sequence data with the various 'omics data sets will become a future requirement of software packages.

Availability and requirements
Project name: SIGMA2
Operating system(s): Windows XP or Vista, with Java SE V.1.6+ and R Project V.2.5+
License: Free for academic and research use; commercial users please contact

Authors' contributions
RC designed and developed the software and wrote the manuscript. BPC contributed to the design and development of the software. CW and MB contributed significantly to software development. IMW and EAV contributed to beta testing and ideas for refinement of the software. CM and RTN contributed concepts for implementation of data integration and statistical analyses. WLL is the principal investigator of this study. All authors contributed to the critical reading and editing of the manuscript.

Acknowledgements
We thank William W. Lockwood and Timon P.H. Buys for useful discussion and critical reading of the manuscript, Ashleen Shadeo for providing data for breast cancer samples, and Anna Chu, Byron Cline, Devon Macey, Andrew Thomson, Lan Wei, Reginald Sacdalan, Tiffany Chao, and Laura Aslan for help with software development. This work was supported by funds from Canadian Institutes for Health Research (CIHR), NIH (NIDCR) R01 DE15965, and Genome Canada/British Columbia. RC and IMW were supported by scholarships from CIHR and the Michael Smith Foundation for Health Research.

References
1. Garnis C, Buys TP, Lam WL: Genetic alteration and gene expression modulation during cancer progression. Mol Cancer 2004, 3:9.
2. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA, et al.: A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004, 36(3):299-303.
3. Khulan B, Thompson RF, Ye K, Fazzari MJ, Suzuki M, Stasiek E, Figueroa ME, Glass JL, Chen Q, Montagna C, et al.: Comparative isoschizomer profiling of cytosine methylation: the HELP assay. Genome Res 2006, 16(8):1046-1055.
4. Lockwood WW, Chari R, Chi B, Lam WL: Recent advances in array comparative genomic hybridization technologies and their applications in human genetics. Eur J Hum Genet 2006, 14(2):139-148.
5. Rauch T, Li H, Wu X, Pfeifer GP: MIRA-assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methylation of homeodomain-containing genes in lung cancer cells. Cancer Res 2006, 66(16):7939-7947.
6. Weber M, Davies JJ, Wittig D, Oakeley EJ, Haase M, Lam WL, Schubeler D: Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 2005, 37(8):853-862.
7. Chari R, Lockwood WW, Coe BP, Chu A, Macey D, Thomson A, Davies JJ, MacAulay C, Lam WL: SIGMA: a system for integrative genomic microarray analysis of cancer genomes. BMC Genomics 2006, 7:324.
8. Conde L, Montaner D, Burguet-Castell J, Tarraga J, Medina I, Al-Shahrour F, Dopazo J: ISACGH: a web-based environment for the analysis of Array CGH and gene expression which includes functional profiling. Nucleic Acids Res 2007:W81-85.
9. La Rosa P, Viara E, Hupe P, Pierron G, Liva S, Neuvial P, Brito I, Lair S, Servant N, Robine N, et al.: VAMP: visualization and analysis of array-CGH, transcriptome and other molecular profiles. Bioinformatics 2006, 22(17):2066-2073.
10. Chi B, deLeeuw RJ, Coe BP, Ng RT, MacAulay C, Lam WL: MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data. BMC Bioinformatics 2008, 9:243.
11. Carrasco DR, Tonon G, Huang Y, Zhang Y, Sinha R, Feng B, Stewart JP, Zhan F, Khatry D, Protopopova M, et al.: High-resolution genomic profiles define distinct clinico-pathogenetic subgroups of multiple myeloma patients. Cancer Cell 2006, 9(4):313-325.
12. Chin K, DeVries S, Fridlyand J, Spellman PT, Roydasgupta R, Kuo WL, Lapuk A, Neve RM, Qian Z, Ryder T, et al.: Genomic and transcriptional aberrations linked to breast cancer pathophysiologies. Cancer Cell 2006, 10(6):529-541.
13. Coe BP, Lockwood WW, Girard L, Chari R, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer 2006, 94(12):1927-1935.
14. Lockwood WW, Chari R, Coe BP, Girard L, Macaulay C, Lam S, Gazdar AF, Minna JD, Lam WL: DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 2008.
15. Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, et al.: A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell 2006, 10(6):515-527.
16. Stransky N, Vallot C, Reyal F, Bernard-Pierrot I, de Medina SG, Segraves R, de Rycke Y, Elvin P, Cassidy A, Spraggon C, et al.: Regional copy number-independent deregulation of transcription in cancer. Nat Genet 2006, 38(12):1386-1396.
17. Sanders MA, Verhaak RG, Geertsma-Kleinekoort WM, Abbas S, Horsman S, Spek PJ van der, Lowenberg B, Valk PJ: SNPExpress: integrated visualization of genome-wide genotypes, copy numbers and gene expression levels. BMC Genomics 2008, 9:41.
18. Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, et al.: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 2008:D773-779.
19. Khojasteh M, Lam WL, Ward RK, MacAulay C: A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 2005, 6:274.
20. Neuvial P, Hupe P, Brito I, Liva S, Manie E, Brennetot C, Radvanyi F, Aurias A, Barillot E: Spatial normalization of array-CGH data. BMC Bioinformatics 2006, 7:264.
21. Hupe P, Stransky N, Thiery JP, Radvanyi F, Barillot E: Analysis of array CGH data: from signal ratio to gain and loss of DNA regions. Bioinformatics 2004, 20(18):3413-3422.
22. Venkatraman ES, Olshen AB: A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics 2007, 23(6):657-663.
23. Nannya Y, Sanada M, Nakazaki K, Hosoya N, Wang L, Hangaishi A, Kurokawa M, Chiba S, Bailey DK, Kennedy GC, et al.: A robust algorithm for copy number detection using high-density oligonucleotide single nucleotide polymorphism genotyping arrays. Cancer Res 2005, 65(14):6071-6079.
24. Ballestar E, Paz MF, Valle L, Wei S, Fraga MF, Espada J, Cigudosa JC, Huang TH, Esteller M: Methyl-CpG binding proteins identify novel sites of epigenetic inactivation in human cancer. Embo J 2003, 22(23):6335-6345.
25. Bibikova M, Lin Z, Zhou L, Chudin E, Garcia EW, Wu B, Doucet D, Thomas NJ, Wang Y, Vollmer E, et al.: High-throughput DNA methylation profiling using universal bead arrays. Genome Res 2006, 16(3):383-393.
26. Coe BP, Ylstra B, Carvalho B, Meijer GA, Macaulay C, Lam WL: Resolving the resolution of array CGH. Genomics 2007, 89(5):647-653.
27. van Wieringen WN, Belien JA, Vosse SJ, Achame EM, Ylstra B: ACE-it: a tool for genome-wide integration of gene dosage and RNA expression data. Bioinformatics 2006, 22(15):1919-1920.
28. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, Girard L, Minna J, Christiani D, Leo C, et al.: An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res 2004, 64(9):3060-3071.
29. Grigoriadis A, Mackay A, Reis-Filho JS, Steele D, Iseli C, Stevenson BJ, Jongeneel CV, Valgeirsson H, Fenwick K, Iravani M, et al.: Establishment of the epithelial-specific transcriptome of normal and malignant human breast cells based on MPSS and array expression data. Breast Cancer Res 2006, 8(5):R56.

