UBC Theses and Dissertations


Genomic studies of the normal and malignant neural crest Morozova, Olena 2012


GENOMIC STUDIES OF THE NORMAL AND MALIGNANT NEURAL CREST

by

Olena Morozova

B.Sc. (Hons), University of Toronto, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES (Bioinformatics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

June 2012

© Olena Morozova, 2012

Abstract

Neuroblastoma (NBL) is an enigmatic pediatric tumor of the sympathetic nervous system that is lethal in most children diagnosed over 18 months of age with metastatic disease. NBL is thought to originate from a differentiation arrest of the neural crest, a vertebrate-specific cell lineage with one of the most diverse developmental potentials. Genomic studies of NBL have contributed to the development of new diagnostic and prognostic markers. In addition, somatic and germline mutations in the ALK oncogene have been identified and are being targeted clinically. Based on this prior work, two hypotheses were developed and addressed in this thesis: (1) characterization of NBL with higher resolution genomic technologies will lead to the identification of novel loci that contribute to the disease and (2) analysis of the transcriptome of normal neural crest cells will help identify loci of relevance to NBL. To address these hypotheses, I used several datasets generated from microarrays as well as RNA and DNA sequencing experiments. Two key results emerged from this analysis: the putative role of the BRCA1/BARD1 pathway in the development of NBL, and the heterogeneity of the genetic landscape of primary NBL tumors. Potential translational avenues for the results reported in this thesis are the exploration of AURKB and MAPK inhibitors as treatment agents for NBL.

Preface

Portions of Chapter 1 have been published: O. Morozova, M. Hirst, M.A. Marra. Applications of new sequencing technologies for transcriptome analysis. Annu. Rev. Genomics Hum. Genet. 10:135-51, 2009.
Copyright by Annual Reviews; O. Morozova and M.A. Marra. Applications of next-generation sequencing technologies in functional genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A. Marra. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by Canadian Science Publishing. I have written most of the text for these review manuscripts with guidance and input from my supervisor, M.A. Marra, and the co-author, M. Hirst.

Portions of Chapter 2 have been published in three manuscripts: H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press; M.D. O'Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier; O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008. The author contributions for each manuscript are provided below.

Sections 2.2.1, 2.2.2, 2.4.1, 2.4.2, 2.4.3, 2.4.4; Figures 2.1, 2.2, 2.3, 2.4, and Table 2.1 are based on the manuscript: H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press.
H.J., K.L.J., J.A.B., M.P., and F.D.M. were involved in the conception and design of the study. H.J., K.L.J. and J.A.B. performed the collection and analysis of experimental data, including the RT-PCR experiments described in Section 2.2.2. R.H., M.A.R., Y.C. and F.R. provided study material. F.D.M. provided financial support and supervised the study. M.A.M. participated in data interpretation, and provided supervisory support, including manuscript approval, for the computational part of the study (microarray data analysis). I performed all the microarray data analysis, made the figures, interpreted the results, and wrote the sections of the manuscript reproduced in this thesis, except as specified below. The RT-PCR panels in Figures 2.3A and B were made by members of F.D.M.'s laboratory. The description of the RT-PCR method in Section 2.4.4 was written by members of F.D.M.'s laboratory. All animal use was approved by the Animal Care Committee for the Hospital for Sick Children in accordance with the Canadian Council of Animal Care policies.

Sections 2.2.5, 2.2.5.1, 2.2.5.2, 2.4.6, 2.4.8; Figures 2.6 and 2.7B, and Table 2.3 are based on the manuscript: M.D. O'Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1; 2011. Copyright by Elsevier. The author contributions for the sections of the manuscript described in the thesis are provided below. Members of the Eaves laboratory and their colleagues made the SAGE libraries described in Table 2.3, and defined the 319 candidate pluripotency genes listed in Appendix B. G.R. performed the PASTAA motif enrichment analysis discussed in Section 2.2.5.2, participated in writing Sections 2.2.5.2 and 2.4.8 and made Figure 2.7B. M.A.M.
provided supervisory support for the seriation component of the study. I designed and performed the seriation analysis of ESC SAGE libraries, made figures and tables and wrote the sections of the manuscript reproduced in this thesis, except as defined above.

Sections 2.2.5.1 and 2.4.7 are based on the manuscript: O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008. O.M. and M.A.M. conceived and designed the study, and co-wrote the manuscript with input from all co-authors. V.M. developed and implemented the seriation algorithm in Matlab. B.G.H. and C.D.H. constructed pancreatic SAGE libraries and provided guidance for biological interpretation of the results (the description and analysis of pancreatic SAGE libraries is not included in this thesis). M.A.M. supervised the study. I adapted the seriation algorithm to the analysis of SAGE data, performed the analysis, interpreted the results and wrote the manuscript with input from all co-authors, including all portions of the manuscript reproduced in this thesis.

A version of Chapter 3 has been published: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American Association for Cancer Research. M.V. performed the protein experiments, including mass spectrometry (Section 3.2.3) and Western Blot (Section 3.2.5), and participated in writing the relevant sections of the manuscript (Sections 3.2.3, 3.4.4, 3.4.6). P.T., T.K., and M.F.M.
provided technical and supervisory assistance with the mass spectrometry facility, and approved the manuscript. N.G. performed the drug inhibitor studies, made panel C of Figure 3.3 and participated in writing Section 3.4.5. L.M.H. isolated and cultured NBL TIC lines, provided materials for sequencing, and participated in writing Section 3.4.1 (describing the culturing of NBL TICs and SKPs). K.M.B. performed shRNA experiments (Section 3.2.5), generated data for panel B of Figure 3.3, and participated in writing the relevant sections of the manuscript (Sections 3.2.5 and 3.4.7). J.M. provided supervisory support to K.M.B. and approved the manuscript. A.M., T.C., R.D.M., N.T., R.V., and S.J. provided bioinformatic assistance with processing RNA-Seq data and approved the manuscript. M.H., R.M., and Y.Z. provided technical assistance with library construction and RNA sequencing of NBL TIC and SKP libraries and approved the manuscript. K.M.S. provided technical assistance to the Toronto group. F.M. provided SKP lines for the study. D.R.K. provided project leadership and financial support to the Toronto component of the study, and approved the manuscript. M.A.M. provided supervisory and financial support, participated in the study design, and approved the manuscript. I participated in the study design, conceived and performed all the computational analyses (detailed in Sections 3.2.1, 3.2.2, 3.2.4, 3.2.6, and 3.2.7), interpreted the data, made the figures and tables (except as described above), and wrote the manuscript with input from all co-authors.

A version of Chapter 4 is in revision: T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S. Asgharzadeh, J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, I. Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N. Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally, B. Kamoh, A. Tam, J. Qian, M. Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R. Sposto, L. Ji, T.
Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S. Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A. Marra, M. Meyerson, J.M. Maris. The genomic landscape of high-risk neuroblastomas reveals a wide spectrum of somatic mutation. In Revision. *Authors contributed equally. J.M.M., J.K., R.C.S., D.S.G. and M.A.S. conceived and led the project. M.A.M. and M.M. conceived and supervised all aspects of the sequencing work at the Genome Sciences Centre and Broad Institute, respectively. E.F.A., S.A., J.S.W., S.J.D., Y.P.M., K.A.C., L.J., T.B., Y.M., J.G-F. and M.H. selected and characterized samples, provided disease-specific expertise in data analysis, and edited the manuscript. R.S. and W.L. provided statistical support. D.A., E.S., C.S., M.D., and J.M.G.A. provided overall project management and quality control support. K.C., M.S.L., A.H.R., and A.S. supported analysis of exome sequencing data. I.B., K.L.M., R.C., S.J., and J.Q. performed de novo assembly of Illumina sequencing data. Y.Z. led the library construction effort for the Illumina libraries. A.T. and Y.Z. planned the sequencing verification, and A.A. and B.K. performed the experiments. R.D.C. performed copy number analysis of genome sequencing data. M.K. performed verification of candidate rearrangements. N.T. ran the gene- and exon-level quantification pipeline on the RNA-Seq data. A.L. helped interpret data provided by Complete Genomics, Inc. R.A.M. and M.H. led the sequencing effort for the Illumina genome and transcriptome libraries. S.B.G. led the sequencing effort for the exome sequencing libraries. G.G. and S.J.M.J. supervised the bioinformatics group at the Broad Institute and Genome Sciences Centre, respectively. T.J.P. performed the mutation analysis of the exome sequencing data and the MutSig analysis.
I performed the mutation analysis of genome and transcriptome sequencing data, and conducted integrative analysis of these data by combining mutation analysis, copy number analysis and de novo assembly results. Together with T.J.P., I combined the exome, genome, and transcriptome data from the different sequencing platforms, and interpreted the results. In concert with T.J.P, D.J.G., M.A.M., M.M. and J.M.M, I co-wrote the manuscript with input from all co-authors. I made all the figures and tables in Chapter 4, except Figure 4.1, Figure 4.2, Table 4.1 and Table 4.2, which were modified from Trevor Pugh‘s work.  vii Table of Contents Abstract .................................................................................................................................... ii Preface ..................................................................................................................................... iii Table of Contents .................................................................................................................. vii List of Tables .......................................................................................................................... xi List of Abbreviations ........................................................................................................... xiii Acknowledgements .............................................................................................................. xiv Dedication ............................................................................................................................. xvi Chapter 1: Evolving methods of genomic analysis and their application to the study of neuroblastoma ......................................................................................................................... 1 1.1 Introduction ........................................................................................................................... 
1 1.2 Cancer as a genetic disease ................................................................................................... 1 1.3 Cancer as a multigenic disease .............................................................................................. 2 1.4 Origin of genetic mutations in cancers.................................................................................. 3 1.4.1 Familial cancers and cancer syndromes............................................................................ 3 1.4.2 Genetic causes of sporadic cancers ................................................................................... 4 1.5 Cancer stem cell hypothesis .............................................................................................. 5 1.6 Genetic lesions in cancers and methods for their detection .................................................. 6 1.6.1 Pre-genomic methods for studying genetic aberrations in cancers ................................... 6 1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes ................. 8 1.6.3 Sequencing approaches for the detection of genetic lesions in cancers ............................ 9 1.6.3.1 Advances in DNA sequencing technologies ............................................................ 9 1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions ................. 13 1.6.3.3 Cancer sequencing studies using the Sanger technology ....................................... 16 1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies ........ 18 1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors .............................. 20 1.7.1 Transcriptome analysis of cancers using microarrays .................................................... 20 1.7.2 Sequence census approaches to transcriptome analysis.................................................. 
22 1.7.2.1 Whole transcriptome sequencing of cancers .......................................................... 23 1.8 Integrative genomics of cancers .......................................................................................... 25 1.9 Childhood neuroblastoma ................................................................................................... 26  viii 1.9.1 Classification, treatment and prognosis .......................................................................... 27 1.9.2 Neuroblastoma genetics and genomics ........................................................................... 29 1.9.2.1 Copy number aberrations ....................................................................................... 30 1.9.2.2 Gene expression profiling of neuroblastoma ......................................................... 31 1.9.2.3 Genetically engineered mouse models of neuroblastoma ...................................... 32 1.10 Thesis roadmap and chapter summaries ............................................................................. 32 Chapter 2: Transcriptome analysis of normal neural crest stem cells  ........................... 40 2.1 Introduction ......................................................................................................................... 40 2.2 Results ................................................................................................................................. 43 2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional level and differ from bone marrow mesenchymal stem cells (MSCs) ........................................................ 43 2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene expression level….. ....................................................................................................................................... 
44 2.2.3 Identification of genes significantly enriched and depleted in neural crest stem cell-like cells…… ...................................................................................................................................... 45 2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts ................................. 46 2.2.5 SKPs share expression profile similarities with ES cells ............................................... 48 2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated state in human ES cells ............................................................................................................ 48 2.2.5.2 Validation of pluripotency markers using computational methods ....................... 49 2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural crest stem cell-like cells compared to mesenchymal stem cells .............................................. 51 2.3 Discussion ........................................................................................................................... 52 2.4 Materials and methods ........................................................................................................ 54 2.4.1 Microarray analysis of rat SKP lines .............................................................................. 54 2.4.2 Unsupervised analysis to assess global transcriptome similarity ................................... 55 2.4.3 Differential expression analysis using microarrays ........................................................ 56 2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results from SKP microarray analysis .............................................................................................................. 56 2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs ..... 
57 2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy Project 58 2.4.7 Seriation using the progressive construction of contigs heuristic ................................... 58 2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency markers ........... 60  ix Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells for therapeutic target prediction ............................................................................................... 93 3.1 Introduction ......................................................................................................................... 93 3.2 Results ................................................................................................................................. 94 3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs compared to a compendium of cancer tissues and SKPs .................................................................................... 94 3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated with the NBL TIC phenotype .................................................................................................................... 96 3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the proteome of a NBL TIC line ............................................................................................................................... 97 3.2.4 Known drug targets among NBL TIC-enriched transcripts ............................................ 98 3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to NBL TICs….. ....................................................................................................................................... 
99 3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the sensitivity of NBL TICs to AURKB inhibition ......................................................................... 100 3.2.7 Relevance to primary neuroblastoma ........................................................................... 102 3.3 Discussion ......................................................................................................................... 104 3.4 Materials and methods ...................................................................................................... 107 3.4.1 RNA sequencing and data analysis ............................................................................... 107 3.4.2 Microarray experiments and data analysis.................................................................... 108 3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional enrichment analysis. ............................................................................................. …………………………108 3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry .............................................................................................................................. 109 3.4.5 AlamarBlue assay ......................................................................................................... 110 3.4.6 Western blotting ........................................................................................................... 110 3.4.7 Small hairpin RNA (shRNA) knockdowns .................................................................. 111 3.4.8 Exon-level analysis of RNA sequencing data ............................................................... 111 3.4.9 AURKB expression analysis ........................................................................................ 
112 Chapter 4: Whole genome characterization of primary neuroblastoma tumors reveals a wide spectrum of somatic alteration ................................................................................. 137 4.1 Introduction ....................................................................................................................... 137 4.2 Results ............................................................................................................................... 138 4.2.1 Exome sequencing ........................................................................................................ 138  x 4.2.2 Whole genome and transcriptome sequencing ............................................................. 139 4.2.3 Overall mutation frequencies ........................................................................................ 140 4.2.4 Verification of candidate somatic mutations using orthogonal approaches ................. 141 4.2.5 Genes and pathways with significant frequency of mutation ....................................... 142 4.2.6 Genome rearrangements and structural variants ........................................................... 144 4.2.7 Mutations in other known cancer genes and regions .................................................... 145 4.3 Discussion ......................................................................................................................... 147 4.4 Materials and methods ...................................................................................................... 149 4.4.1 Sample selection and preparation ................................................................................. 149 4.4.2 Illumina library construction and sequencing ............................................................... 149 4.4.3 Detection of candidate somatic mutations in genome sequencing data ........................ 
149 4.4.4 Gene coverage in transcriptome sequencing data ......................................................... 150 4.4.5 Copy number analysis using genome sequencing data ................................................. 151 4.4.6 Rearrangement detection .............................................................................................. 151 4.4.7 Exome sequencing and data analysis ............................................................................ 153 4.4.8 Integrated analysis of somatic variation from exome and genome data sets ................ 153 Chapter 5: Conclusions and future directions ................................................................. 178 5.1 Transcriptome analysis of normal neural crest cells identifies key pathways, enriched and depleted in this population compared to other related cell types ................................................... 178 5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity ....................... 180 5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel .... 181 drug target for NBL ........................................................................................................................ 181 5.4 Whole genome, transcriptome and exome sequencing of primary NBL tumors reveals a broad spectrum of somatic mutations ............................................................................................ 183 5.5 Future directions in NBL genomics .................................................................................. 184 Bibliography ........................................................................................................................ 184 Appendices ........................................................................................................................... 222 Appendix A Transcripts enriched and depleted in SKPs as discussed in Chapter 2 ................ 
222 Appendix B Candidate pluripotency genes used for seriation analysis in Chapter 2 ............... 255 Appendix C Transcripts enriched and depleted in NBL TICs.................................................. 263 Appendix D Original data for the 99 NBL cases described in Chapter 4 ................................. 290 Appendix E Variant calls detected in the 99 tumor/normal pairs ............................................ 296 Appendix F       Chromatin remodeling and MAPK pathway gene lists used in Chapter 4 ........... 297  xi List of Tables Table 1.1 Specifications of the common next-generation sequencing platforms as compared to the most common Sanger sequencer (Life Technologies’ ABI3730XL) ........................... 39 Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B ................................................. 82 Table 2.2 Pathways enriched among the transcripts increased or decreased in abundance in SKPs compared to MSCs ........................................................................................................ 86 Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4 .... 90 Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs .................................................................................................................. 91 Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis .................... 127 Table 3.2 List of RNA sequencing libraries and their sequencing statistics ........................ 129 Table 3.3 Proteins detected in the whole and crude membrane cell extract of NBL TIC line NB88R and their corresponding RNA-Seq expression level ................................................ 132 Table 3.4 Known drug targets among NBL TIC-enriched genes ......................................... 
Table 4.1 Non-silent mutations in genes of interest along with their validation status
Table 4.2 Genes with significant frequency of somatic mutation
Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes
Table 4.4 Parameters used to select high confidence candidate somatic mutations reported by CGI
Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by BCCA pipeline
Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA pipeline
Table A.1 Transcripts enriched and depleted in SKPs as discussed in Chapter 2
Table B.1 Candidate pluripotency genes used for seriation analysis in Chapter 2
Table C.1 Transcripts enriched and depleted in NBL TICs
Table D.1 Original data for the 99 NBL cases described in Chapter 4
Table F.1 Chromatin remodeling and MAPK pathway gene lists used in Chapter 4

List of Figures

Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers
Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis
Figure 2.1 Global expression patterns are similar across SKPs of distinct development origins
Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence from MSCs
Figure 2.3 SKPs of distinct developmental origin express neural crest specification genes despite maintaining a lineage history at the gene expression level
Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
Figure 2.6 Seriation analysis to identify developmentally restricted transcripts expressed in undifferentiated ES cells
Figure 2.7 Computational validation of genes identified by seriation as pluripotency markers
Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and other tumor tissues
Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts
Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition
Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is involved in the stabilization of AURKB
Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data analyses
Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs with samples ordered by type of genes with somatic alteration
Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic aberration

List of Abbreviations

BP  Base Pair
CGI  Complete Genomics, Inc.
COG  Children’s Oncology Group
CNV  Copy Number Variant
ESC  Embryonic Stem Cell
ES  Embryonic Stem
ESP  End-Sequence Profiling
GB  Giga Base
GWAS  Genome-Wide Association Study
GWA  Genome-Wide Association
ICGC  International Cancer Genome Consortium
KB  Kilo Base
MB  Mega Base
NBL  Neuroblastoma
RNA-Seq  RNA Sequencing
RPKM  Reads Per Kilobase of Gene Model per Million Mapped Reads
SAGE  Serial Analysis of Gene Expression
SEER  Surveillance, Epidemiology, and End Results
SI  Splice Index
SKP  Skin-Derived Precursor Cell
SNP  Single Nucleotide Polymorphism
SNV  Single Nucleotide Variant
TARGET  Therapeutically Applicable Research to Generate Effective Treatments
TIC  Tumor Initiating Cell
TCGA  The Cancer Genome Atlas
MSC  Mesenchymal Stem Cell
MPSS  Massively Parallel Signature Sequencing
NCI  National Cancer Institute

Acknowledgements

Over the course of my PhD I have been honored to learn from many talented scientists, clinicians, professionals, and members of the general public. To these individuals, only some of whom could be personally mentioned here due to space constraints, I am indebted for the success in my endeavors and my continued enthusiasm in scientific research. First and foremost, I could never overstate my gratitude to my PhD supervisor, Dr. Marco Marra, who has become a role model of excellence in science, leadership and personal integrity. He has supported me throughout my PhD scientifically, financially and emotionally, and provided me with numerous invaluable learning opportunities both in science and in life. I simply could not have wished for a better supervisor. I would like to express my deepest gratitude to my thesis supervisory committee, Drs. Angela Brooks-Wilson, Paul Pavlidis, and Samuel Aparicio, who have challenged me with insightful questions and discussions that had a great impact on my scientific growth. I am also grateful to the members of my examining committee, Drs.
Phil Hieter, Poul Sorensen, Lynn Raymond, and Annie Huang for their detailed reading of my thesis and thoughtful comments and questions that have greatly enhanced the final document.

I have been fortunate to participate in a number of collaborative projects that taught me the benefits and challenges of team science, and allowed me to interact with many exceptional individuals and world-class scientists. I am honored to have been involved in the National Cancer Institute Neuroblastoma TARGET initiative, and would like to thank Drs. John Maris, Daniela Gerhard and Malcolm Smith for this opportunity. I am also thankful to have worked with Drs. David Kaplan, Freda Miller, Jason Moffat, Gregory Cairncross, Neal Boerkoel, Connie Eaves, Sheila Singh and members of their laboratories. I would like to specifically acknowledge Loen Hansford, Milijana Vojvodic, Kristen Smith, Kim Blakely and Nathalie Grinstein for providing experimental support for my work. On the note of collaborations, I cannot fail to thank Dr. Stephen Yip for introducing me to neuropathology, and for helping me on this journey in more ways than could be listed here.

I am privileged to have been part of the Marra lab, and would like to thank its current and former members for technical assistance, insightful discussions, and emotional support. I wish to specifically thank Noushin Farnoud, Andy Mungall, Malachi Griffith, Trevor Pugh, Ryan Morin, Tesa Severson, Rodrigo Goya, Maria Mendez-Lago, Sorana Morrissy, Jill Mwenifumbo, and Suganthi Chittaranjan for their expertise and team spirit that have contributed immensely to this thesis. I also would like to acknowledge the gifted summer students Alexandra Maslova and Yulia Merkulova who have been a great help in my research. My sincerest gratitude goes to Lulu Crisostomo for her invaluable assistance with administrative tasks and much more.
I am thankful to have been surrounded by many talented staff and scientists at the BC Cancer Agency’s Genome Sciences Center (GSC), especially Richard Corbett, Yaron Butterfield, Karen Mungall, Mikhail Bilenky, Hye Jung (Elizabeth) Chun, Greg Taylor, Roland Santos, Alireza Hadj Khodabakhshi, Gordon Robertson, Nina Thiessen, and Rob Chrisp. These individuals have been a source of both scientific and emotional support over the course of my PhD. My work would not have been possible without the skilled assistance from the members of the GSC library construction, sequencing, and bioinformatics teams. I would also like to thank Robyn Roscoe, Karen Novik, Diane Miller, Dominik Stoll and Cecelia Suragh for their help with funding applications and project management support.

I have enjoyed being part of the Canadian Institutes of Health Research / Michael Smith Foundation for Health Research Strategic Training Program in Bioinformatics, and would like to thank the two foundations for my stipend during the rotations. I would also like to extend my gratitude to Dr. Steven Jones and Sharon Ruschkowski for fostering a great training environment, and supporting me through my rotations and thesis work. In addition to the bioinformatics program stipend, I have been honored to receive salary and travel funds from the Natural Sciences and Engineering Research Council, Michael Smith Foundation for Health Research, Genome Canada, American Association for Cancer Research Women in Cancer Research Council, University of British Columbia, Roman M. Babicki Fellowship in Medical Research, and the John Bosdet Memorial Fund. I also cannot fail to acknowledge the Jordan Hopkins Foundation for Cancer Research, the James Fund for Neuroblastoma Research, the British Columbia Childhood Cancer Parents’ Association, and the Will to Survive Campaign for their passionate support of pediatric cancer research, including my thesis project.
Finally, I wish to extend my thanks to fellow graduate students Anya Gangaeva, Meeta Mistry, Shabnam Tavassolli, Katayoon Kasaian, Warren Cheung, Leon French, Kieran O’Neill, Anthony Fejes, and Yvonne Li, as well as my family and friends for being a great source of encouragement, motivation, and fun throughout these years.

Dedication

To Anna, Ava, Emily, Ethan, Brendan, Connor, Jake, James, Jordan, Kaiya, Nate, Maya, Reese, Ryan, Taras, Shivank as well as countless others who have journeyed through the world of neuroblastoma, and to Megan McNeil, who fought hard for a day when no child would die from cancer.

Chapter 1: Evolving methods of genomic analysis and their application to the study of neuroblastoma¹

1.1 Introduction

While it has long been realized that cancers are genetic diseases, it is only with the recent advent of high resolution genomic technologies that the exact nature of the genetic changes associated with most cancers is being elucidated. This Chapter reviews the evolution of genomic approaches that have been developed for cancer analysis, with an emphasis on the genomic technologies, microarrays and next-generation sequencing, used for the research described in Chapters 2, 3 and 4 of this thesis. A specific focus of the dissertation is on the genomic analysis of pediatric neuroblastoma, a cancer of the developing sympathetic nervous system that most commonly affects children under the age of 5. Section 1.9 provides a brief overview of the clinical and biological features of neuroblastoma, as well as the advances in neuroblastoma genetics and genomics. Finally, Section 1.10 introduces the specific hypotheses and experimental goals addressed in each of the research chapters of this thesis.

1.2 Cancer as a genetic disease

The presence and causative role of genetic defects in cancer cells were first suggested by David von Hansemann and Theodor Boveri in the 1890s-1900s [1].
Boveri accepted von Hansemann’s original idea that abnormal chromatin content was central to cancer cells, and refined it in his subsequent experimental work on sea urchin embryos. Using the sea urchin model system, Boveri observed that abnormal numbers of chromosomes led to improper embryonic development, and, in some cases, to uncontrolled cell growth. Boveri further hypothesized that genetic aberrations came in two flavors, those stimulatory and those inhibitory to cell growth [2,3]. The growth stimulatory chromosomes would be accumulated by cancer cells, while the inhibitory ones would be excluded.

¹ Portions of this Chapter have been published, and the author contributions are provided in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, M. Hirst, M.A. Marra. Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 10:135-51, 2009. Copyright by Annual Reviews; O. Morozova and M.A. Marra. Applications of next-generation sequencing technologies in functional genomics. Genomics 92(5), 2008. Copyright by Elsevier; O. Morozova and M.A. Marra. From cytogenetics to next-generation sequencing technologies: advances in the detection of genome rearrangements in tumors. Biochem. Cell Biol. 86(2):81-91, 2008. Copyright by Canadian Science Publishing.

Boveri’s prescient concepts of stimulatory and inhibitory genetic material were much later manifested in the notions of oncogenes and tumor suppressors, collectively known as cancer genes [4–6]. Oncogenes and tumor suppressors are genes whose products function in cell growth pathways or are involved in the control of the cell cycle. Mutated oncogenes typically function in a dominant fashion, while mutations in tumor suppressors are recessive [7]. The first cellular oncogene, c-src, was discovered by homology with a viral sequence previously shown by Peyton Rous to cause sarcomas in hens [8,6].
The normal cellular homologues of viral oncogenes are commonly referred to as prototype oncogenes (proto-oncogenes) to highlight the fact that they need to be activated by a genetic event (a gain-of-function mutation) to become oncogenes, whereas the viral counterparts encode constitutively active pro-survival proteins. Another class of genes that contributes to cancer, and is sometimes included under the term “cancer genes”, comprises genes involved in DNA repair. Defects in these genes contribute to an increased rate of accumulation of DNA damage as well as genomic instability, which in turn enhances the likelihood of producing a genetic alteration affecting a proto-oncogene or tumor suppressor (e.g. mutations in mismatch repair genes are responsible for hereditary nonpolyposis colorectal cancer [9]).

1.3 Cancer as a multigenic disease

Mathematical modeling studies that used epidemiological data on the age distribution of common cancers have led investigators, such as Carl Nordling, to propose that several (originally as many as seven) genetic hits may be required for tumorigenesis [10]. Alfred Knudson applied the idea of multistep tumorigenesis to the study of retinoblastoma, a pediatric cancer that can occur in both sporadic and familial forms. Knudson used statistical modeling to suggest that the distribution of sporadic and familial retinoblastoma tumors was consistent with the disease being caused by two hits (later termed Knudson’s two-hit hypothesis). The two-hit hypothesis suggested that in familial cases the first genetic hit was inherited and the second acquired somatically, while in sporadic cases both hits were somatic [11]. This model explained why familial but not sporadic cases often presented with multiple tumors or tumors in both eyes.
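The reasoning behind the two-hit hypothesis can be illustrated with a minimal Poisson sketch. The cell count and per-cell hit probability below are illustrative assumptions chosen for this example only; they are not values taken from Knudson's analysis [11] or from this thesis, and the function names are hypothetical.

```python
import math

# Illustrative assumptions (not estimates from the literature):
N_CELLS = 1_000_000   # hypothetical number of susceptible retinal cells
HIT_PROB = 3e-6       # hypothetical per-cell probability of one somatic hit


def expected_tumors(somatic_hits_needed: int) -> float:
    """Expected tumor count when each cell must independently acquire
    `somatic_hits_needed` somatic hits to become tumorigenic."""
    return N_CELLS * HIT_PROB ** somatic_hits_needed


def prob_at_least(k: int, mean: float) -> float:
    """Poisson probability of at least k tumors given the expected count."""
    below_k = sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))
    return 1.0 - below_k


# Familial case: the first hit is inherited, so one somatic hit per cell suffices.
familial_mean = expected_tumors(1)
# Sporadic case: both hits must arise somatically in the same cell.
sporadic_mean = expected_tumors(2)

print(f"familial: mean tumors = {familial_mean:.2f}, "
      f"P(>=2 tumors) = {prob_at_least(2, familial_mean):.3f}")
print(f"sporadic: mean tumors = {sporadic_mean:.2e}, "
      f"P(>=2 tumors) = {prob_at_least(2, sporadic_mean):.2e}")
```

Under these assumed numbers, a carrier of an inherited first hit is expected to develop several independent tumors, whereas two independent tumors in a sporadic case are vanishingly unlikely, which is the qualitative pattern the two-hit model was proposed to explain.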
Nordling’s original paper proposed that only hits that confer survival advantages on cancer cells would count towards the proposed seven required for tumorigenesis, thereby alluding to the ideas of cancer driver mutations and clonal evolution. Peter Nowell later formalized these ideas into a theoretical framework of stepwise acquisition and Darwinian selection of genetic changes that underlies our current view of tumorigenesis [12]. Nowell also suggested that early genetic mutations that occur in cancer cells may contribute to genomic instability and to the even greater number of genetic alterations observed in later-stage tumors. However, due to the limited biological knowledge available at the time, he was unable to pinpoint the exact nature of the genetic changes required for tumorigenesis. In a seminal paper published in 1990, Eric Fearon and Bert Vogelstein combined previous theoretical work with advances in the identification of oncogenes and tumor suppressors to propose a specific molecular model of colorectal tumorigenesis [13]. According to this model, aggressive colorectal carcinomas developed from benign adenomas by sequential acquisition of changes that included activation of oncogenes and inactivation or loss of tumor suppressors. The model also incorporated epigenetic changes, such as DNA hypomethylation, which was originally reported to occur in tumors by Feinberg and Vogelstein [14], but shown to have a causal role in cancer only several years later [15]. It is now accepted that abnormalities in cancer genes, accumulated and selected for in a step-wise process, contribute to a genetic landscape that underlies the biological hallmarks of tumors: self-sufficiency in growth signals, insensitivity to growth-inhibitory signals, evasion of programmed cell death (apoptosis), unlimited replicative potential, sustained angiogenesis, and tissue invasion [16].
1.4 Origin of genetic mutations in cancers

1.4.1 Familial cancers and cancer syndromes

While most cancers are acquired sporadically, a fraction of malignancies, such as familial breast cancer, cluster in families and are associated with inherited mutations in cancer genes [17]. In addition, several cancer syndromes have been characterized and linked with an overall increased risk of cancer; for instance, Li-Fraumeni Syndrome and von Hippel-Lindau disease are associated with an increased risk of certain types of solid tumors [18]. Familial cancers and cancer syndromes have been instrumental in inferring the identities of a fraction of cancer genes that play a role in both sporadic and familial forms of the same malignancy. For example, tumor suppressors RB and VHL are altered in both sporadic and familial forms of retinoblastoma and renal cell carcinoma, respectively [19,20]. However, in most cases, such as ductal and lobular breast cancer, alterations in different genes underlie sporadic and familial forms of the same disease [21].

1.4.2 Genetic causes of sporadic cancers

Most well-characterized familial cancer syndromes follow a dominant mode of inheritance and are associated with a small number of rare alleles that confer a significant effect on the phenotype [18]. The completion of the first two human genome sequences and the International HapMap Initiative led to the realization of the abundance of human genetic variation that may contribute to an individual’s risk of common diseases, including cancer [22–24]. This realization brought about investigations into recessive genetic components that may influence susceptibilities to sporadic cancers.
Due to the high lifetime risk of developing a sporadic cancer at an invasive site (45% for men and 38% for women in the US according to the Surveillance Epidemiology and End Results database [25]), studies of cancer families are confounded by a high likelihood of chance associations and difficulties in discerning hereditary and environmental causes [17]. To address these concerns and help delineate the potential hereditary component of common cancers, a large-scale study designed to compare co-occurrence of common cancers in monozygotic and dizygotic twins was conducted [26]. The study examined 44,788 sets of twins from Sweden, Finland and Denmark, and found minor contributions of a hereditary component to susceptibility for most types of cancer, suggesting that the most significant causes of common sporadic cancers were environmental. Environmental agents that have been associated with cancer include tobacco smoke, UV light, radiation, hormones, viruses, and various chemical substances. In fact, it is currently thought that the environmental causes of human cancers are underappreciated [27]. Given these observations, sporadic cancers are likely caused by a combination of inherited predisposition alleles and acquired (somatic) mutations that result in uncontrolled proliferation and tumor growth. Inherited or acquired defects in DNA repair, replication or segregation can aggravate the neoplastic phenotype and contribute to further cancer progression. The acquired alterations may arise through exposure to environmental agents, or due to other factors, some of which may be currently unknown.

1.5 Cancer stem cell hypothesis

Stem cells are defined as special cells within a multicellular organism that are able to self-renew and, through cell division, generate specialized cell types that compose each tissue within the body.
For instance, embryonic stem cells are able to generate all cell types within the developing embryo while adult (somatic) stem cells are able to regenerate cell types within a particular tissue [28]. Modern use of the term “cancer stem cell” was pioneered by work in leukemia showing that the cell of origin of leukemias, regardless of their heterogeneity, consistently exhibited properties of the normal hematopoietic stem cell [29]. This work resolved the long-standing debate on the target cell that was susceptible to leukemic transformation, and implicated the hematopoietic stem cell in this role. Since this landmark study, similar observations have been made in brain and breast malignancies [30,31]. Both of these reports, together with the original leukemia work, suggested that a small fraction of cells (0.1-0.0001%) within each tumor maintains stem cell properties and is responsible for self-renewal and regeneration of the tumor hierarchy by producing differentiated cells that form the bulk of the tumor. However, the idea of rarity of tumor-regenerating cancer stem cells was questioned by work in melanoma, which reported that an average of 27% of unsorted melanoma cells from patients were capable of forming tumors in mice in single-cell transplant experiments [32]. The contention of the melanoma study was that the common use of NOD/SCID mice, such as that reported in the original leukemia work [29], may underestimate the frequency of tumor-forming cells as these mice have remnants of immunity and are less susceptible to developing cancer. In contrast, the melanoma study used NOD/SCID interleukin-2 receptor gamma chain null (Il2rg (-/-)) mice that are more immunocompromised than the NOD/SCID mice and are thus better suited for estimating the true tumorigenic capacity of cancer cells.
Another study challenged the presumed origin of cancer stem cells from resident normal stem cells within the tissue and showed that breast cancer stem cells may arise from tumor cells via epithelial to mesenchymal transformation induced by immune signaling [33]. Given these observations, the state-of-the-art version of the cancer stem cell model is dynamic, and incorporates the possibility of variable frequencies of stem cells in different cancer types, as well as the potential for inter-conversion of cancer stem cell and non-stem cell compartments within the tumor [34].

An important result from studies in the cancer stem cell field is the finding that cancer stem cells may be resistant to therapies and may be associated with tumor recurrence [35]. As such, cancer stem cells provide important targets for novel therapy development, particularly for recurrent and refractory disease. Therefore, studying genetic changes found in these cells may shed light onto the potential therapeutics that may be specific to the cancer stem cell compartment.

1.6 Genetic lesions in cancers and methods for their detection

Somatic and germline aberrations implicated in tumorigenesis can affect a single base (point mutations) as well as multiple bases (translocations, inversions, small insertions/deletions (indels), and copy number variants (CNVs)). Throughout this thesis, events are defined as duplications or deletions if they are <1 kb in length and as CNVs if they are >1 kb in length. In addition, losses of heterozygosity (LOH) are defined in the context of tumor suppressors when one allele, most often the functional one, is lost either through the loss of a copy of a chromosome or via a copy-number-neutral mechanism. Due to their size and ease of detection, genome rearrangements involving whole chromosomes or their parts were the earliest genetic events shown to be associated with cancer [3].
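The size convention stated above (duplications or deletions below 1 kb, CNVs above 1 kb) can be written down as a small classifier. This is a minimal sketch: the `CopyNumberEvent` record, its coordinate convention, and the `classify` function are hypothetical names introduced for illustration, and the handling of events of exactly 1 kb is an assumption because the text does not specify that case.

```python
from dataclasses import dataclass

SIZE_THRESHOLD_BP = 1_000  # the 1 kb boundary stated in the text


@dataclass(frozen=True)
class CopyNumberEvent:
    """Hypothetical record for a copy number gain or loss."""
    chrom: str
    start: int     # assumed 0-based, inclusive
    end: int       # assumed exclusive
    is_gain: bool  # True for a gain of copies, False for a loss

    @property
    def length(self) -> int:
        return self.end - self.start


def classify(event: CopyNumberEvent) -> str:
    """Label an event using the thesis size convention."""
    if event.length < SIZE_THRESHOLD_BP:
        return "duplication" if event.is_gain else "deletion"
    if event.length > SIZE_THRESHOLD_BP:
        return "CNV gain" if event.is_gain else "CNV loss"
    # Exactly 1 kb is not covered by the stated definition.
    return "unclassified"


small_loss = CopyNumberEvent("chr1", 10_000, 10_250, is_gain=False)
large_gain = CopyNumberEvent("chr2", 15_000_000, 16_000_000, is_gain=True)
print(classify(small_loss), classify(large_gain))
```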
Chromosomal translocations can result in either chimeric protein products or aberrant gene expression due to the apposition of coding sequences to regulatory regions of other genes, either of which can be associated with cancer genes [7]. Copy number gains (amplifications) have been shown to be linked to increased expression of oncogenes, such as MYCN in neuroblastoma, while regions of copy number losses may harbor tumor suppressor loci [36]. In addition, coding or regulatory sequence of cancer genes can be disrupted by point mutations and small indels affecting the amino acid sequence or gene expression, respectively. The smaller events evaded detection by early low resolution approaches, and the extent of their contribution to tumorigenesis has been realized only in the recent decade.

1.6.1 Pre-genomic methods for studying genetic aberrations in cancers

The earliest methods for detecting chromosomal and genomic aberrations in cancers involved microscopic examinations of chromosomes and chromosome banding patterns [37]. Application of these approaches led to the discovery of the Philadelphia chromosome, which results from an exchange of DNA between chromosomes 9 and 22 in chronic myelogenous leukemia (CML) [38,39]. PCR-based methods have been used to detect known genome rearrangements, particularly alterations in gene copy number. These methods produce results promptly, require little starting material, and are excellent for locus-specific identification of known rearrangements of a few kb in size. Several techniques allow detection of genomic lesions larger than those detectable by traditional PCR (5 – 6 kb) [40]. For instance, Long PCR uses a mixture of two polymerases, a proofreading and a non-proofreading one, thus increasing the product size to 35 kb [40].
The product length of non-proofreading polymerases is limited by the low efficiency of extension at mismatched bases, while the product length of proofreading polymerases can be limited by their 3′-exonuclease activity; therefore, combining the two types of polymerases increases the product size beyond that achievable by either enzyme alone. This method is useful for identifying specific large aberrations, including intragenic deletions, insertions and duplications [41]. An important milestone in molecular cytogenetics was the development of in situ hybridization. This procedure is based on the principle of the hybridization of a labeled probe, containing genomic DNA of interest, to a complementary target; probe copy number is assessed by means of microscopic visualization. Since the first report of the method in 1969 [42], in situ hybridization methods have undergone extensive advancement with regard to both the target and the probe [43]. The most commonly used conventional in situ hybridization protocol in cancer research is dual-color fluorescence in situ hybridization (FISH). This method involves labeling centromeres and the DNA region of interest with different colors and estimating probe copy number from the ratio of the centromeric and non-centromeric signal. Dual-color FISH is used for the detection of chromosomal gains or losses (aneuploidy); intrachromosomal insertions, deletions, inversions, amplifications; and chromosomal translocations in both solid and hematopoietic cancers [44]. An extension of conventional FISH methods is the development of multi-fluorochrome techniques such as multiplex FISH (M-FISH) [45], spectral karyotyping (SKY) [46] and combined binary ratio labeling (COBRA) [47], which allow the simultaneous visualization of all chromosomes in 24 colors.
Improvements in target resolution have been achieved through the use of different probe substrates, including metaphase chromosomes (~5 Mb resolution), interphase nuclei (50 kb – 2 Mb resolution), and extended chromatin or DNA fibers (5 – 500 kb resolution) [43]. Mapped genomic clones such as bacterial artificial chromosomes (BACs), P1-derived artificial chromosomes (PACs), and yeast artificial chromosomes (YACs) have also been used as FISH probes to achieve a higher resolution mapping of genome rearrangements to the human genome sequence than that achievable by chromosome FISH [48–50].

1.6.2 Array-based methods for the detection of genetic lesions in cancer genomes

Comparative genomic hybridization (CGH) is a molecular cytogenetic method for detecting relative differences in copy number between two genomes. In its original form, DNAs from reference and test samples were labeled with different colors and hybridized to metaphase chromosomes. The ratios of test to reference fluorescence intensities were quantified using digital image analysis, and were used to identify genomic losses or gains in the test sample (e.g. a tumor sample) with respect to the reference sample [51]. Conventional CGH is labor intensive, providing relatively low resolution of 5 to 10 Mb for deletions and 2 Mb for amplifications [52]; moreover, it is unsuitable for the detection of balanced rearrangements (e.g. balanced translocations and inversions) as well as whole genome copy number changes (ploidy) [53]. However, CGH can be used as a discovery tool as it requires no prior knowledge of chromosomal imbalances. To overcome the low resolution limitation of CGH, array CGH (aCGH) was developed. In aCGH, the differentially labeled test and reference DNA is hybridized to a glass slide containing arrayed DNA probes rather than metaphase chromosomes [54].
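The ratio-based logic shared by CGH and aCGH, in which test-to-reference intensity ratios are used to call gains and losses, can be sketched as follows. The log2 transformation is a common convention for such ratios, but the ±0.3 calling threshold, the probe intensities, and the function names are illustrative assumptions and do not come from the methods cited in this section.

```python
import math


def log2_ratios(test_intensities, reference_intensities):
    """Per-probe log2(test / reference) fluorescence ratio."""
    return [math.log2(t / r) for t, r in zip(test_intensities, reference_intensities)]


def call_copy_number(log2_ratio, threshold=0.3):
    """Call a probe as gain, loss or neutral.

    The threshold is an arbitrary illustrative value; real analyses tune it
    to platform noise and usually segment neighbouring probes first.
    """
    if log2_ratio >= threshold:
        return "gain"
    if log2_ratio <= -threshold:
        return "loss"
    return "neutral"


# Hypothetical intensities for three probes in a tumor (test) and a normal (reference).
tumor = [200.0, 100.0, 50.0]
normal = [100.0, 100.0, 100.0]

ratios = log2_ratios(tumor, normal)
calls = [call_copy_number(r) for r in ratios]
print(list(zip(ratios, calls)))
```

Because only the ratio of the two signals is measured, a balanced translocation or inversion leaves every probe ratio unchanged, which is why such rearrangements are invisible to CGH-based methods.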
With the recent development of arrays of mapped clones spanning whole chromosomes [55,56] and the whole human genome [57], large-scale aCGH experiments are feasible. For instance, 79 kb resolution has been achieved using a genome-wide array of BACs [58]; 75 and 110 kb resolutions have been reported with chromosomal arrays containing a mix of BACs/PACs and fosmids/cosmids, and BACs only, respectively [55,56]. Arrays of mapped genomic clones are robust with a high signal to noise ratio, and have been applied to the detection of copy number changes in tumors on a genome-wide and chromosome-wide scale [59,52]. In contrast, oligonucleotide arrays can provide a higher resolution (generally 5 to 50 kb) but have been reported to suffer from lower sensitivity, resulting in failure to reliably detect low-copy number changes due to a poorer signal to noise ratio [60]. Oligonucleotide array CGH can potentially provide even higher resolution than 5 kb, as overlapping oligonucleotides can be synthesized with as little as a single-base offset [53]. Despite the popularity of aCGH methods, their main technological limitation is that they can only detect genome rearrangements that involve a change in copy number relative to a reference sample.

Single Nucleotide Polymorphism (SNP) arrays, originally designed for genotyping, are oligonucleotide arrays that detect the two different alleles of biallelic SNPs [61]. Probe signal intensities can be used to determine SNP genotypes and to detect copy number changes [62]. In contrast to array CGH, in which samples are differentially labeled and co-hybridized, only one labeled sample is hybridized to the SNP array at a time; CNVs are detected by comparison with one or several reference samples analyzed in separate hybridizations.
Currently, SNP arrays capable of genotyping more than 1M SNPs are available from companies such as Illumina and Affymetrix, providing a resolution that matches or exceeds that of most state-of-the-art aCGH platforms. An important advantage of SNP arrays is the ability, unique among the genomic methods discussed thus far, to detect copy number neutral losses of heterozygosity [63]. Further, SNP arrays have been used to detect allele-specific copy number variants [64]. A disadvantage of the technology is the requirement of a PCR amplification step to increase the signal to noise ratio; as a result, amplification biases may be introduced, giving rise to spurious CNVs [53]. Moreover, CNV predictions achieved using SNP arrays vary depending on the reference set and computational approach used [65]. Even so, SNP arrays have been widely applied to the analysis of genomes of various tumors including neuroblastoma [66] and in The Cancer Genome Atlas discussed in Section 1.6.3.3.

1.6.3 Sequencing approaches for the detection of genetic lesions in cancers

1.6.3.1 Advances in DNA sequencing technologies

With the completion of the reference human genome projects [22,24], the need for re-sequencing studies in which individual genomes and genomic segments are examined for the presence of changes linked to the phenotype of interest became apparent. This observation drove technological developments that resulted in the advent of a panel of conceptually new sequencing methods collectively referred to as “next-generation”, “new generation” or “second generation” sequencers that are more cost-effective than Sanger sequencing. A standard DNA sequencing workflow has traditionally included three key steps: sample preparation, sequencing, and data analysis. The new sequencing technologies improve upon the Sanger protocol by advances in the first steps of the workflow, albeit often at the cost of higher error rates and shorter read lengths that can challenge data analysis.
Several high throughput new-generation sequencing technologies are currently commercially available, including 454/FLX (Roche), Illumina, SOLiD (Life Technologies), Pacific Biosciences, and Ion Torrent (Life Technologies). As of July 2011, the Helicos Heliscope instrument used in several published next-generation sequencing studies is no longer available for purchase. In the research described in this thesis, the Illumina technology is used in Chapter 3 to analyze the transcriptomes of neuroblastoma tumor-initiating cells as well as their normal counterparts. In Chapter 4, the same technology is used to analyze the genome, exome and transcriptome sequences of primary neuroblastoma tumors. The new technologies produce an abundance of short reads at a higher throughput than is achievable with the state-of-the-art Sanger sequencer, and their specifications are summarized in Table 1.1. An additional company not mentioned in Table 1.1, Complete Genomics, Inc. (CGI), provides whole human genome sequencing and analysis as a service [67]. Genome sequences generated by CGI from primary neuroblastoma tumors and matched peripheral blood are discussed in Chapter 4. The advances in sample preparation and in sequencing chemistry and detection are reviewed below for the most common next-generation sequencing technologies: 454/Roche, Illumina, and SOLiD. To provide an example of a true single-molecule technology, the Helicos Heliscope is also discussed.

1.6.3.1.1 Advances in Sample Preparation

In the original Sanger sequencing protocol, a DNA sample is first sheared into fragments, and then subcloned into vectors, followed by amplification in bacterial or yeast hosts. The amplified DNA is then isolated and sequenced with the Sanger chain termination method [68]. Cloning-based amplification allows for the sequencing of contiguous large fragments, and does not require prior information about the genome sequence (termed “de novo sequencing”).
However, it is prone to host-related biases, and is lengthy and labor intensive, restricting large-scale Sanger sequencing to designated genome sequencing centers. Cloning-based amplification followed by Sanger sequencing was used for the determination of the first human genome sequences [24,22]. Notably, when a reference genome sequence of an organism is available and when regions to be sequenced are small, templates can be prepared for sequencing by PCR amplification instead of cloning [69].

A major advantage of the second-generation sequencing platforms is the elimination of the in vivo cloning step and its replacement with PCR-based amplification. Both 454/Roche [70] and Applied Biosystems SOLiD technologies circumvented the cloning requirement by taking advantage of emulsion PCR [71], which uses emulsion droplets to isolate single DNA templates in separate micro reactors where amplification is carried out. This template amplification is also used in Ion Torrent instruments [72]. The Illumina platform [73,74] uses bridge amplification, a solid phase amplification approach in which DNA molecules are attached to a solid surface and amplified in situ, generating clusters of identical DNA molecules. Both of these amplification approaches result in the generation of a collection of clonal copies of the template, which are fed into subsequent steps of the sequencing pipelines.

The first single-molecule method to be commercialized, developed by Stephen Quake's laboratory and commercialized by Helicos Biosciences, eliminated the amplification step, directly sequencing single DNA molecules bound to a surface [75]. Another commercially available single-molecule sequencing method (Pacific Biosciences) employs real-time detection of single fluorescently-labeled nucleotides as they are incorporated by a polymerase [76]. Such single-molecule sequencing approaches are referred to as third-generation technologies.
Third-generation sequencers have the potential to reduce sequencing costs relative to the second-generation instruments, although their scalability remains unproven.

1.6.3.1.2 Advances in Sequencing Chemistry and Detection

The paradigm of the original Sanger method is the DNA polymerase-dependent synthesis of a complementary strand in the presence of four labeled nonreversible synthesis terminators, 2´,3´-dideoxynucleotides (ddNTPs) corresponding to the four natural 2´-deoxynucleotides (dNTPs). The four non-reversible terminators are incorporated into the growing DNA strand at random in place of the corresponding dNTP, thereby producing a collection of DNA fragments of varying lengths that are then separated by polyacrylamide gel electrophoresis [68]. Originally, radioactive ddNTPs were used and four different reactions were required per template molecule. Subsequently, the radioactive ddNTPs were replaced with fluorescently labeled terminators that allowed the four sequencing reactions to be carried out simultaneously with different ddNTPs distinguishable by emission spectra [77]. Another variation of automated Sanger sequencing is dye-labeled primer sequencing, in which fluorescent dyes are attached to the 5′ end of primers [78]. A key disadvantage that hindered further development of this method, as compared to the dye-labeled terminators described above, was the need for four separate extension reactions, which had to be pooled prior to loading, and for four dye-labeled primers for each template. Other improvements of Sanger sequencing included the replacement of slab gel electrophoresis with capillaries, the advent of capillary arrays that allowed sample multiplexing, and the deployment of production-scale sequencing workflows. As a result of these developments, the Sanger method achieved the read length, accuracy, and throughput compatible with de novo sequencing of whole genomes.
To date, Sanger sequencing has been responsible for the generation of reference genome sequences of many species including that of human [22,24].

The pyrosequencing approach was the first alternative to Sanger sequencing to achieve commercialization as part of the Roche/454 instrument [70]. Pyrosequencing uses chemiluminescence-based detection of each released pyrophosphate that occurs upon the incorporation of a nucleotide by the DNA polymerase (Figure 1.1A). The four nucleotides are added to the sequencing reaction one at a time, such that only one type of nucleotide is available to the DNA polymerase at a given step. The addition of the correct nucleotide is accompanied by the release of light, allowing for the inference of the nucleotide identity at each position in a sequencing read. The amount of light produced is proportional to the number of incorporated nucleotides, potentially permitting the detection of homopolymers. In practice, however, sequencing of homopolymer stretches using the Roche/454 technology is error-prone [79]. In the 454 FLX instrument, about 1.6 million pyrosequencing reactions occur in parallel, each in a separate well of a picotiter plate, contributing to a much higher sequencing throughput than that achieved in a 96-well capillary array of a modern Sanger sequencer.

Similarly to 454/Roche, the Illumina Genome Analyzer also uses sequencing-by-synthesis, albeit with a different detection chemistry [74]. The Illumina sequencing reaction utilizes four fluorescently labeled nucleotide analogs that serve as reversible sequencing terminators, and highly modified DNA polymerases that are capable of incorporating these analogs into the growing oligonucleotide chain (Figure 1.1B). At each step the correct nucleotide analog is incorporated into the growing chain and its identity is revealed by the color of its fluorescent label. Importantly, the 3´-OH group of the nucleotide is blocked to prevent further extension of the nascent DNA chain.
After the imaging step, the label is washed off and the blockage is reversed, thereby allowing the synthesis to proceed. The sequencing reactions occur in a massively parallel fashion on a flow cell, which is a glass surface that contains hundreds of millions of clusters of clonally identical DNA molecules.

The true single-molecule sequencing approach commercialized by Helicos Biosciences in the HeliScope instrument also used a sequencing-by-synthesis procedure, in which virtual terminators (nucleotide analogs that reduce the processivity of DNA polymerase) are used [80]. The reduced DNA polymerase processivity allows for the accurate identification of homopolymer stretches. In the Helicos system, single-molecule DNA templates are captured on the flow cell surface. The Cy3 labels attached at both ends of each DNA molecule are used to reveal the location of each template bound to immobilized primers on the surface of the flow cell. The Cy5-labeled nucleotides are added to the reaction one at a time, and incorporated nucleotides are detected after each addition (Figure 1.1C).

In contrast to the polymerase-based approaches discussed above, the SOLiD (Supported Oligonucleotide Ligation and Detection) system uses a sequencing-by-ligation approach in which the sequence is inferred indirectly via successive rounds of hybridization and ligation events. This approach was first published by the Church laboratory as the "polony sequencing technique" [81]. The SOLiD system uses 16 dinucleotides, each carrying a fluorescent label. Four fluorescent dyes are used in the system such that one dye labels four different dinucleotides (Figure 1.1D). The identity of each base is determined from the fluorescent readout of two successive ligation reactions. An advantage of the two-base encoding scheme is that each position is effectively probed twice, in principle allowing for the distinction of a sequencing error from a true sequence polymorphism.
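The two-base encoding described above can be illustrated with a short sketch. It assumes the commonly described scheme in which each base is given a 2-bit code and the color of a dinucleotide is the XOR of the codes of its two bases; the numeric color labels 0 to 3 stand in for the four dyes and, like the function names and the toy sequences, are illustrative assumptions rather than details taken from Figure 1.1D.

```python
# Minimal sketch of two-base (color-space) encoding.
# Assumption: color = XOR of 2-bit base codes; labels 0-3 stand in for dyes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode(seq):
    """Return the first base and one color per adjacent dinucleotide."""
    colors = [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence from the first base and the color calls."""
    bases = [first_base]
    for color in colors:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ color])
    return "".join(bases)


reference = "ATGGCA"
first, ref_colors = encode(reference)

# A true single-base substitution (G -> T at index 3) alters two adjacent colors.
_, snp_colors = encode("ATGTCA")
snp_diffs = [i for i, (x, y) in enumerate(zip(ref_colors, snp_colors)) if x != y]

# A single miscalled color (index 2) changes every downstream base on decoding,
# a pattern that does not look like an isolated substitution.
error_colors = list(ref_colors)
error_colors[2] ^= 1
error_decoded = decode(first, error_colors)
```

Under this scheme each color is shared by exactly four of the 16 dinucleotides, and a genuine substitution is expected to appear as two consecutive color changes relative to the reference, whereas an isolated color change is more likely to be a measurement error.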
1.6.3.2 Sanger-based sequencing methods for the detection of genetic lesions

For more than 20 years Sanger sequencing was the only available sequencing technology, and routine whole genome sequencing was not feasible in that time frame; Sanger-based methods for rearrangement detection that do not require whole genome sequencing were therefore developed. Digital karyotyping (DK) is a method for genome-wide analysis of copy number changes and other genome rearrangements [82]. The method can be regarded as a "genomic version" of the serial analysis of gene expression (SAGE) technique [83] described in Section 1.7.2. In DK, genomic DNA is digested with a mapping restriction enzyme, originally SacI (with a 6 bp recognition sequence), followed by the ligation of biotinylated linkers and a second digestion using a fragmenting restriction enzyme with a 4 bp recognition sequence. The biotinylated sequences are isolated by binding to streptavidin and the DNA tags are released using a tagging enzyme with a 6 bp recognition sequence. The isolated sequence tags are concatenated, cloned, sequenced, and aligned to a reference genome assembly, providing a copy number estimate at the particular locus. The combination of the mapping and fragmenting enzymes used determines the size of detectable rearrangements, and the genome-wide occurrence of mapping enzyme recognition sites defines genomic areas represented in DK analysis. In the case of SacI, recognition sites are abundant and expected to occur every 4 kb; however, some areas of the human genome (<5%) have lower densities of SacI sites and thus would be analyzed at a lower resolution [82]. To date, DK has been successfully applied to the analysis of a variety of cancers, including those of colon and brain, and has been used to identify putative oncogenes and tumor suppressors in these tumors [84,85].
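The site spacing quoted above for SacI can be recovered with a back-of-the-envelope calculation. The sketch below assumes, as a simplification, that the four bases are equally frequent and independent of one another, which is not strictly true of the human genome; the function name is hypothetical.

```python
def expected_site_spacing(site_length, alphabet_size=4):
    """Expected distance in bp between occurrences of a fixed recognition site.

    Assumes equiprobable, independent bases (a simplification).
    """
    return alphabet_size ** site_length


mapping_spacing = expected_site_spacing(6)      # 6 bp site, e.g. SacI
fragmenting_spacing = expected_site_spacing(4)  # 4 bp fragmenting enzyme
print(mapping_spacing, fragmenting_spacing)
```

Under this assumption a 6 bp site is expected roughly once every 4,096 bp, in line with the approximately 4 kb spacing cited for SacI, while a 4 bp site is expected about once every 256 bp.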
The original version of DK has a theoretical resolution of 4 kb, which is higher than that of the generally available array-based methods. A partial limitation of DK imposed by the use of restriction enzymes is the uneven coverage of the genome, which may be addressed by using different combinations of mapping and fragmenting enzymes.

Clone-based methods have been developed to detect both balanced and unbalanced genome rearrangements in cancers. An end sequence profiling (ESP) approach has been developed and successfully applied to the genome-wide analysis of rearrangements of the MCF7 breast cancer cell line [86,87]. In ESP, a BAC library is constructed for the tumor genome of interest, both ends of BAC clones are sequenced, and the paired-end sequences are mapped back to a reference genome assembly. Structural genomic variants are discovered by identifying clones whose paired-end sequences map to the reference genome in orientations that indicate the clone was derived from rearranged DNA. The ESP approach is potentially applicable to the detection of all types of genome rearrangements, which could be inferred from different types of "ESP signatures" [86]. While powerful, paired-end sequencing of clones has several limitations. First, the approach is dependent on the construction of clone libraries, which can be slow and costly, requiring high molecular weight DNA. Second, the resolution of paired-end sequencing methods is determined by the clone properties and the redundancy of genome coverage. Also, since the sampling occurs only from the ends, large numbers of clones would be necessary to achieve genome-wide high resolution coverage of rearrangements. To address this limitation, a BAC clone fingerprint profiling (FPP) approach for high resolution detection of genome rearrangements was developed [88].
The FPP method includes the digestion of genomic BAC clones prepared from tumor DNA with five restriction enzymes, HindIII, EcoRI, BglII, NcoII, and PvuII, to generate clone fingerprints that are then aligned against the in silico digests of the reference genome sequence using the FPP alignment algorithm. The restriction enzymes were selected to achieve frequent cutting and restriction site location complementation (restriction-site-poor areas of one enzyme corresponding to restriction-site-rich areas of another enzyme). The FPP alignment algorithm consists of four steps that are detailed in [88]. Briefly, the steps for aligning each BAC fingerprint to the reference genome sequence include the following: (1) a global search of the reference genome sequence to identify BAC-sized or smaller genomic regions that yield digest patterns similar to that of the query clone; (2) a local search that further delineates the local correspondence between the fingerprint of the query clone and that of the in silico digested genomic region(s) identified in step 1; (3) an edge detection algorithm that precisely identifies the extent of the alignment; and (4) the final partitioning step that selects an optimal solution, whereby a minimal set of alignments maximally accounts for all clone fragments on the genome. Differences between the experimental and in silico digestion patterns are indicative of genomic differences, including genome rearrangements in the clone versus the reference genome. For instance, an alignment in which the clone maps to one genomic region, but in which there are internal gaps in fragment alignments, indicates the presence of a localized rearrangement confined to the clone; on the other hand, an alignment in which the clone fingerprint is partitioned over several regions in the genome suggests the presence of a translocation, inversion, or a large deletion.
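The core idea of comparing an experimental fingerprint with an in silico digest can be sketched in a few lines. This is a deliberately simplified illustration and not the four-step FPP alignment algorithm of [88]: it uses a single enzyme (HindIII, which recognizes AAGCTT and cuts after the first A), toy sequences invented for the example, and a greedy size-matching rule with a hypothetical relative tolerance.

```python
import re

HINDIII_SITE = "AAGCTT"   # HindIII recognition sequence
HINDIII_CUT_OFFSET = 1    # cut falls after the first base of the site


def digest(sequence, site, cut_offset):
    """Return sorted fragment lengths after cutting at every occurrence of site."""
    # A lookahead is used so that overlapping site matches would also be found.
    cuts = [m.start() + cut_offset
            for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]) if end > start)


def matched_fragments(observed, predicted, tolerance=0.05):
    """Greedily count observed fragments that match a predicted size.

    Each predicted fragment can be used once; tolerance is relative to the
    predicted size and is an illustrative stand-in for gel sizing error.
    """
    remaining = sorted(predicted)
    matched = 0
    for size in sorted(observed):
        for i, ref_size in enumerate(remaining):
            if abs(size - ref_size) <= tolerance * ref_size:
                del remaining[i]
                matched += 1
                break
    return matched


# Toy reference region and a clone carrying a 4 bp deletion in one fragment.
reference = "CCCC" + HINDIII_SITE + "G" * 10 + HINDIII_SITE + "T" * 6 + HINDIII_SITE + "CC"
clone = "CCCC" + HINDIII_SITE + "G" * 6 + HINDIII_SITE + "T" * 6 + HINDIII_SITE + "CC"

predicted = digest(reference, HINDIII_SITE, HINDIII_CUT_OFFSET)
observed = digest(clone, HINDIII_SITE, HINDIII_CUT_OFFSET)
n_matched = matched_fragments(observed, predicted)
print(predicted, observed, n_matched)
```

In this toy case three of the four clone fragments match the in silico digest and one does not, which is the kind of internal discrepancy that the text above associates with a localized rearrangement within the clone. Combining several enzymes, as FPP does, would add independent fingerprints of the same insert.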
The FPP approach provides several important advantages over ESP and other genome-wide methods for rearrangement detection. First, the method samples the entire clone insert and not just the clone ends, as in ESP. Therefore, rearrangement coordinates mapping within the clone will be more precisely localized with FPP than ESP, given the same number of clones sampled [88]. Second, FPP is relatively tolerant of repeats compared with ESP and oligonucleotide microarrays, since only 7% of human repeats are found in contiguous regions of 3.9 kb (the average sizeable HindIII restriction fragment) [88]. This is an important advantage, considering that a significant portion of the human genome is composed of repeat sequences. Third, both balanced and unbalanced rearrangements are potentially detectable. As in ESP, clones harboring rearrangements can be directly selected for functional analyses and sequencing. Some of the drawbacks of the FPP approach include the cost and speed of library production (similar to ESP), the cost of clone characterization (cheaper than in ESP), and the requirement of a large amount of starting DNA material (less than in ESP). Consequently, although the FPP approach is potentially very powerful, the reliance on clones currently limits its widespread application. In addition, as is the case with other methods that rely on restriction enzyme digestion, FPP may erroneously interpret restriction fragment length polymorphisms as genome rearrangements. This limitation may be partially addressed in the future as more complete catalogues of normal genomic variation are compiled.

1.6.3.3 Cancer sequencing studies using the Sanger technology

As discussed in Sections 1.2 through 1.4, it has become increasingly clear that sporadic cancers are associated with multiple acquired genetic lesions that contribute to various aspects of oncogenesis.
To address the spectrum of these lesions more comprehensively than possible with hybridization or clone-based sequencing discussed in previous sections, several sequencing initiatives have been launched worldwide. The most notable of these are the Cancer Genome Project (CGP) in the United Kingdom and The Cancer Genome Atlas (TCGA) in the United States [89,90]. A branch of the TCGA with a pediatric focus, the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiative, was also set up to apply similar approaches to the analysis of pediatric tumors (http://target.cancer.gov/). Initially the large-scale sequencing projects relied on the Sanger-based re-sequencing of the coding sequence of a gene set of interest or all genes in the genome; however, with the advent of new sequencing technologies discussed in Section 1.6.3.1, these projects are switching to whole genome, exome and transcriptome analysis using new sequencing platforms (Section 1.6.3.4). An analysis of 99 neuroblastoma cases studied as part of the TARGET initiative using new sequencing technologies is discussed in Chapter 4.

The systematic re-sequencing of the PCR-amplified coding exons of 518 protein kinase genes in 210 human cancers of 13 different types of histology conducted by the CGP initiative identified 1,007 somatic mutations, of which 921 were single base substitutions, 78 were indels, and 8 were complex rearrangements; 2/3 of these mutations had previously been uncharacterized [69]. The first TCGA report of a comprehensive analysis of glioblastoma tumors that incorporated re-sequencing data from a panel of over 600 genes in 143 cases revealed three signaling pathways that may be disrupted in glioblastoma [91]. However, since the sequencing effort involved only a subset of genes, recurrent mutations in IDH1, a gene previously not implicated in cancer, were missed by this approach but detected by a more comprehensive sequencing study [92].
Similar studies in which pre-selected gene sets were re-sequenced in panels of tumors were also conducted in pediatric acute lymphoblastic leukemia, lung cancer, and soft tissue sarcomas [93–95]. In all cases, these studies identified novel loci and pathways associated with the diseases. The re-sequencing of the coding regions of RefSeq and Consensus Coding Sequence (CCDS) genes was conducted in 11 breast and 11 colorectal cancers [96,97] and identified somatic mutations in 1718 genes (9.4% of the genes analyzed). More recently, similar approaches were also applied to ovarian cancer and the pediatric solid tumor medulloblastoma [98,99]. The medulloblastoma study involved the analysis of 22 tumors, and found an average of 11 somatic gene alterations per tumor, which was fewer by a factor of 5 to 10 compared to the adult solid tumors analyzed by related approaches, as described earlier in this section. Nonetheless, the study found mutations in MLL2 and MLL3, previously unknown in this malignancy.

These studies suggest that large-scale sequencing efforts are successful at identifying known and novel genetic aberrations in human cancers, and that our catalogs of genetic variants that contribute to oncogenesis are incomplete for both pediatric and adult tumors. In fact, prior to large-scale sequencing studies, approximately 1% of human genes had been shown to be mutated in cancers using other techniques [7]. In contrast, recent data from the Catalogue of Somatic Mutations in Cancer database at the Sanger Institute suggest that up to 26% of all genes may harbor somatic mutations in cancers, and novel cancer genes, with proven causal roles in oncogenesis, are defined each year [100]. Some of the notable examples of novel cancer genes discovered by sequencing studies include IDH1 in gliomas and leukemias [101,92], EZH2 in lymphomas and myeloid disorders [102], and FOXL2 in ovarian cancers [103].
The increasing number of genes with reported mutations in cancers points to the heterogeneity of somatic mutations found in certain cancer types, particularly solid tumors. For instance, the recent report from the sequencing of the coding regions of 316 ovarian tumors by the TCGA revealed that TP53 was the only highly prevalent recurrently mutated gene, and that other genes were mutated in small subsets of tumors [98]. Therefore, the large-scale sequencing studies indicate that unbiased analyses of both adult and pediatric cancers using higher resolution approaches may identify novel loci relevant to these diseases.

1.6.3.4 Cancer genome and exome sequencing using new sequencing technologies

With the advent of next-generation sequencing technologies described in Section 1.6.3.1, whole genome, exome, and transcriptome sequencing studies became more feasible and routine than previously possible with Sanger sequencing. In addition to reducing the cost of large-scale sequencing, the introduction of next-generation sequencers increased the sensitivity of mutation detection. An early study using the 454/Roche sequencing technology demonstrated the potential of next-generation sequencers to detect rare variants present in specific subpopulations of cells that elude cost-effective detection by capillary sequencing approaches [104]. The ability to detect genetic heterogeneity is due to the use of sequencing templates that have been clonally derived from a single molecule; in this manner, a variant present in a few cells can be detected if sufficient sequencing depth is applied. This feature is particularly important in cancer research in light of the hierarchy of different cell types within a tumor discussed in Section 1.5.
Given this hierarchy as well as variable levels of stromal contamination invariably present in clinical samples, Sanger sequencing studies of cancers likely sampled only the most common genotypes present in a tumor, and may have missed mutations in samples containing a high frequency of normal cells [104]. In contrast, new sequencing technologies are potentially more sensitive and capable of detecting the genetic make-up of rare populations.

The study that used Illumina technology to sequence the whole genome of an acute myeloid leukemia (AML) sample became the first report of a cancer genome sequenced with a new sequencing technology [105]. This study identified known and novel somatic mutations that might contribute to leukemogenesis, suggesting that next-generation sequencing provides a comprehensive way for analyzing cancer genomes. Since this initial report, the genomes of additional hematopoietic (acute myeloid leukemia, chronic lymphoblastic leukemia, multiple myeloma, B-cell lymphoma) and solid (lung, breast, tongue, prostate, and skin) tumors have been published [101,106–114]. These studies have led to the identification of genetic lesions previously not implicated in the particular malignancy or oncogenesis per se. Some of this information was shown to be immediately clinically actionable, such as in the case of a tongue adenocarcinoma, whose genome sequence was used to suggest a potential therapeutic option for the patient [109]. Similarly, the identification of BRAF mutations in a fraction of multiple myeloma patients suggests a role for BRAF inhibitors in the management of the disease [108].

With the rapidly increasing number of cancer sequencing studies, largely facilitated by the introduction of new sequencing technologies, an international group of experts established the International Cancer Genome Consortium with the purpose of coordinating the ongoing cancer sequencing efforts in different countries [115].
The projects within the consortium encompass the sequencing of over 50 different cancer types, and over 25,000 individual cancer genomes.

In addition to whole genome sequencing of tumors, many efforts involve the sequencing of coding regions of the genome, or exome. A rationale for conducting exome rather than whole genome sequencing is the current cost-efficiency of the former approach. It can also be argued that somatic variation within the coding sequence is currently more readily interpretable and clinically actionable than intergenic variation captured by whole genome projects along with the coding variation. While sequencing experiments are becoming increasingly affordable, whole genome sequencing is still costly when performed to the depth required to comprehensively identify variants in all genes (average 30X haploid coverage [116] that was later upgraded to at least 50X haploid coverage [117]). Therefore, major reductions in sequencing and analysis costs need to occur before exome sequencing can be rendered obsolete. Several methods of target enrichment have been developed to select the coding regions for sequencing. These methods fall into two common categories: PCR-based enrichment of targets, and hybridization-based enrichment of targets conducted in solution, on an array, or as a combination of these two approaches (hybrid capture); each of these methods has its own advantages and disadvantages [118]. To date several cancers have been analyzed using next-generation exome sequencing, including rare tumors, such as pheochromocytoma, hepatocellular carcinoma, hairy cell leukemia, renal cell carcinoma, and acute monocytic leukemia [119–123]. Exome sequencing has been useful for detecting point mutations and indels in the coding sequence, while whole genome methods, in addition to detecting these events, have also detected gene fusions and structural rearrangements.
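The link between the coverage targets quoted above and sequencing cost can be made concrete with a rough calculation. The sketch below is illustrative only: the haploid genome size of 3 Gb and the read length of 100 bp are assumptions chosen for the example rather than values from the cited studies, and the Poisson model of per-base depth ignores mapping and sequence-composition biases.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # assumed approximate haploid human genome size
READ_LENGTH_BP = 100            # assumed read length, for illustration only


def reads_needed(coverage, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Number of reads required to reach a given average coverage."""
    return math.ceil(coverage * genome_size / read_length)


def prob_depth_at_least(k, mean_depth):
    """Poisson probability that a given base is covered by at least k reads."""
    below = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
                for i in range(k))
    return 1.0 - below


for coverage in (30, 50):
    print(coverage, reads_needed(coverage), prob_depth_at_least(20, coverage))
```

Under these assumptions, moving from 30X to 50X average coverage raises the number of reads required from 900 million to 1.5 billion, while increasing the fraction of bases expected to reach a depth of at least 20 reads. Restricting sequencing to the much smaller exome target reduces the read count proportionally, which is the cost argument made in the text.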
1.7 Cancer transcriptomes as proxies for the genomic diversity of tumors

Historically, cancers have been classified based on their pathological features. However, it became evident that patients with an identical histopathological diagnosis differed dramatically in terms of their disease course and response to therapy. These phenotypic differences can be now attributed to the genomic heterogeneity that has emerged from recent genome-level analyses of individual tumors as described in Section 1.6.3 (also, as reviewed in [116]). However, even prior to high resolution genome sequencing studies, some of this heterogeneity could be assessed from studying gene expression profiles of seemingly identical tumors. Two conceptually different approaches to high throughput gene expression profiling, using hybridization and sequencing, have emerged in the last decades and allow for the interrogation of gene expression levels on a genome-wide scale.

1.7.1 Transcriptome analysis of cancers using microarrays

One group of methods for global transcriptome analysis is based on microarrays, in which cDNA is hybridized to arrays of complementary oligonucleotide probes corresponding to genes of interest, and the abundance of a particular mRNA species is estimated from its hybridization intensity to the relevant probe [124]. Microarray analysis is used in this thesis to study gene expression profiles of normal and malignant neural crest-like cells in Chapters 2 and 3. In Chapter 2, microarray expression data derived from several lineages of normal SKin-derived Precursor cells (SKPs) are used to characterize the neural crest-like phenotype of these cells and support their use as normal counterparts of neuroblastoma cells. In Chapter 3, microarray analysis is used to confirm the results from RNA sequencing experiments (Section 1.7.2).
Several microarray platforms are currently available or in development; however, all of them rely on the principle of probe-target hybridization, in which the signal intensity provides a measure of the amount of a particular nucleic acid in a sample. In addition to measuring the concentration of nucleic acid in a sample, the signal intensity also depends on probe-target binding affinity, the specificity of which is controlled for in a microarray experiment by introducing mismatch probes [125].

A seminal study applied microarrays to the examination of expression profiles of acute myeloid and acute lymphoblastic leukemias and showed that these clinically-distinct leukemias could be distinguished prospectively in an unsupervised manner based on their gene expression information alone [126]. In addition to the finding of correlation between the disease phenotype and global gene expression profile, this study introduced two conceptually different applications of microarray analysis, class prediction (assigning new tumors into known classes) and class discovery (discovering novel clinically relevant subtypes), that have since been used widely in cancer transcriptomics research. This work also brought about a multitude of expression profiling initiatives that to date have been performed in many types of malignancy [127]. These studies aimed to classify tumors previously indistinguishable with conventional approaches into clinically-relevant subtypes (class discovery) as well as to identify expression markers that could be used to prospectively classify tumors into known disease subtypes (class prediction). Another common direction of microarray data analysis is class comparison [128]. In class comparison studies, genes with evidence of differential expression among disease types, cell populations or experiments of interest are identified and used to gain novel biological or clinical insight into the different classes being compared.
The analyses of microarray data, described in Chapters 2 and 3 of this thesis, are class comparisons, in which we sought transcripts significantly increased or decreased in abundance in different populations of cells.

Early influential works in the cancer microarray field include class discovery studies that identified previously indistinguishable clinically and biologically relevant subtypes that derived from different cells of origin in diffuse large B-cell lymphomas [129] and breast cancers [130]. Expression-based molecular classifiers developed as a result of such studies (notably, the MammaPrint assay in breast cancers [131]) are being used in clinics and have been shown to outperform conventional methods of clinical assessment [131]. Further developments in the microarray field enabled other cancer transcriptomics applications, such as the detection of noncoding RNAs [132], single nucleotide polymorphisms (SNPs) (described in Section 1.6.2), and alternative splicing events [133].

Despite their power to measure the expression of thousands of genes simultaneously, microarray methods do not readily address several key aspects, notably the ability to detect novel transcripts and the ability to study the coding sequence of detected transcripts. Moreover, microarrays are indirect methods in which transcript abundance is inferred from hybridization intensity rather than measured explicitly. These properties may interfere with experimental reproducibility, particularly when performed by different laboratories [125].

1.7.2 Sequence census approaches to transcriptome analysis

A conceptually different group of methods uses sequencing of cDNA fragments derived from mRNA, followed by counting the number of times a particular fragment has been observed (Figure 1.2). This group of methods originally included the Serial Analysis of Gene Expression (SAGE) method [83], and Massively Parallel Signature Sequencing (MPSS) [134].
In SAGE, restriction enzymes are used to obtain short sequence fragments (tags) usually derived from the 3′ end of an mRNA; the tags are concatenated and sequenced to determine the expression profiles of their corresponding mRNAs [83]. Modifications of this protocol extended the tag length from the original 14 bp to 17 bp in LongSAGE and 26 bp in the SuperSAGE protocol [135,136]. In Chapter 2 of this thesis, SAGE analysis is used to define a list of candidate pluripotency genes, preferentially expressed in undifferentiated human embryonic stem cells. The MPSS method also generates small fragment signatures of each mRNA species; however, the in vivo propagation in bacteria used in SAGE is replaced with in vitro cloning on microbeads [134,137,83]. In addition, MPSS uses a ligation-based sequencing method instead of Sanger sequencing used in SAGE [134,83]. SAGE and MPSS are often termed "clone-and-count" or "sequence census" techniques as they provide a digital overview of gene expression profiles in a cell [138]. Advantages of such digital readouts include statistical robustness, and less stringent standardization and replication requirements than those for microarrays [139,134].

Some disadvantages that had hindered the use of SAGE and MPSS up until recently included the cost of sequencing and the biases introduced by the necessary cloning step. Despite its superior performance compared to microarrays at detecting highly-abundant transcripts, traditional SAGE is not very efficient at detecting rare mRNA populations [140]. New sequencing technologies have increased the cost-effectiveness of the method that originally relied on the Sanger sequencing protocol and eliminated the requirement for the in vivo step [83]. Several next-generation sequencing-based SAGE methods have been reported. One method termed DeepSAGE uses the 454 sequencing technology to generate 300,000 tags with less effort than a traditional LongSAGE experiment generating 50,000 tags [141].
Another SAGE-like method based on a new sequencing technology, Tag-Seq, relies on the Illumina technology to generate 10 million tags per run, which represents a two-orders-of-magnitude increase over the throughput of traditional LongSAGE [142]. Both of these methods have been shown to increase the representation of low abundance transcripts that evade detection by Sanger-based SAGE methods [142,141], thereby providing a more complete view of the transcriptome. In addition to improving the original sequencing-based methods for gene expression analysis, new sequencing technologies have enabled the development of new sequence census methods, such as Rapid Analysis of 5'-Transcript Ends (5'-RATE) used for surveying 5' end fragments [143]. Originally the LongSAGE protocol was used in the Cancer Genome Anatomy Project (CGAP) consortium that was formed to construct a public database of gene expression information across multiple cancer, pre-cancer, and normal tissues [144]. This initiative aims to provide a comprehensive resource that could be mined for the identification of transcripts enriched in a particular tissue type. The SAGE protocol was chosen over microarrays due to its digital expression readout and the relative ease with which data from multiple laboratories could be combined together for analysis [144]. Due to the advent of Illumina-based Tag-Seq, several recent CGAP libraries have been constructed using Tag-Seq, which was shown to outperform the originally used LongSAGE protocol and microarrays in terms of dynamic range and transcript representation, including the representation of sense-antisense transcript pairs [142].

1.7.2.1 Whole transcriptome sequencing of cancers

Full length cDNA sequencing [145] and the generation of expressed sequence tags (ESTs) or single sequencing reads derived from one end of a cDNA clone [146] have been used to characterize cellular mRNA profiles, including those of cancer cells.
However, primarily due to the cost of sequencing, these Sanger sequencing-based methods had been even less effective than traditional SAGE at representing rare cellular transcripts [147]. With the development of new sequencing technologies, EST sequencing gained potential as one of the sequence census methods for studying mRNA profiles on a genome-wide scale. With the elimination of the cloning step and common use of random priming, next-generation EST sequencing tags can now cover the whole length of transcripts [148]. Deep EST sequencing of transcriptomes using next-generation technology is also referred to as whole transcriptome shotgun sequencing (WTSS) [149] or RNA sequencing (RNA-Seq) [150,151]. In a version of this approach, polyA-selected or ribosomal RNA-depleted RNA is reverse transcribed into cDNA, which is then fragmented and sequenced using a next-generation technology to generate reads intended to cover the full length of a transcript [149]. Comparative transcript coverage with each of the sequencing-based methods described thus far is provided in Figure 1.2. The ability to cover the whole length of transcripts with RNA-sequencing reads enables many applications previously unachievable with tag sequencing and hybridization approaches [152]. Similarly to hybridization-based approaches, RNA-Seq is able to address differential gene- and exon-level expression but with lower background, over a larger dynamic range, and with opportunities for repeat analyses based on different sets of annotations [153]. In addition, RNA sequencing data can be used to study the structure of splice isoforms [154], and identify chimeric transcripts [155] that may result from genomic rearrangements [156] and/or trans-splicing [155].
Moreover, read sequence information allows for the detection of mutations [103] and RNA edits [157], as well as quantification of the expression level of each alternative allele [158], applications not readily available with tag-sequencing or array technologies. To date the transcriptomes of several cancer cell lines and primary tumors, including those from cervical, colon, prostate, and hematopoietic cancer types, have been characterized by RNA-Seq protocols using 454, Illumina or SOLiD technologies [152]. RNA-Seq was the approach that enabled the recent discovery of key recurrent mutations in FOXL2 and ARID1A in ovarian cancers [103,159], and EZH2 mutations in B-cell lymphoma [102]. Similar approaches have also been applied to the discovery of mutations in other cancers, including acute myeloid leukemia [160] and malignant pleural mesothelioma [161]. In addition, this approach has led to the discovery of novel expressed gene fusions affecting the RAF kinase pathway in solid malignancies [162]. The alternative splicing application of RNA-Seq has been applied to the identification of splice isoforms associated with drug resistance in colorectal cancer [154]. These studies suggest that RNA-Seq is a versatile approach that not only enables the examination of gene expression profiles, but also simultaneously allows the detection of coding mutations and gene rearrangements, at least where these events do not abrogate gene expression. RNA sequencing is used in Chapters 3 and 4 of this thesis to characterize the expression profiles of neuroblastoma tumor-initiating cells (Chapter 3) and primary tumors (Chapter 4). In addition to gene-level expression profiling, RNA-Seq is used for exon-level expression analysis (Chapter 3), and the detection of point mutations and fusion transcripts (Chapter 4).
1.8 Integrative genomics of cancers

With increasing amounts of genome sequence, copy number, expression, and epigenetic data generated for different cancer types, efforts have focused on integrating these data sets to produce multidimensional views of cancers. Such efforts are important priorities of large-scale cancer genomics initiatives, notably the TCGA [89]. To address the demands of the research community, several software platforms have been developed for the visualization and analysis of multiple types of genomic data, including the Integrated Genomics Viewer (IGV) [163], the Cancer Genomics Workbench [164], the UCSC Cancer Genomics Browser [165] and others. Integrative genomic studies of cancers have followed several general directions: identifying genes [166,92] and pathways [98] affected by multiple types of aberrations within the same cancer; combining multiple data types to define and characterize disease subtypes [167,168]; and conducting systems biology analyses to reconstruct cellular regulatory networks [169]. The first TCGA study that demonstrated the power of integrating multiple datasets to provide a system-level view of a cancer combined DNA copy number, gene expression, sequence and DNA methylation information from a cohort of 206 cases of glioblastoma multiforme (GBM) [91]. This study defined three signaling pathways, RTK/RAS/PI-3K, RB, and p53 signaling, each altered in over 75% of GBM patients. Even though GBM did not have frequent recurrent changes at the level of single genes, multiple datasets revealed highly recurrent changes at the level of signaling pathways, demonstrating the power of integrative analysis to identify recurrent and prevalent alterations at the level of pathways and functional networks.
Other example discoveries from integrative data analyses of cancers include the characterization of three subtypes of GBM (proneural, mesenchymal and classical) associated with different gene expression and mutation signatures impacting the clinical outcome [168]; the discovery of defects in homologous recombination in a large fraction (approximately 50%) of ovarian cancers studied by the TCGA [98]; and the realization that multiple types of sequence, expression and epigenetic defects, observed in acute lymphoblastic leukemia, affect the WNT and MAPK pathways, implicating these pathways as potential therapeutic targets for the disease [166].

1.9 Childhood neuroblastoma

As discussed in Sections 1.3 and 1.4, most adult cancers arise through progressive accumulation of genetic aberrations likely occurring over many years or decades. In contrast, fewer genetic changes occurring in a short developmental time window may be sufficient for the tumorigenesis of childhood cancers [99,170,171]. Therefore, characterizing the developmental origin of childhood cancers is essential to understanding the biology of these malignancies. Neuroblastoma (NBL) is a childhood cancer of the developing sympathetic nervous system [172]. Tumors of the sympathetic nervous system account for 7.8% of all cancers among children younger than 15 years of age and of these, 97% are NBLs [25]. The ganglia of the sympathetic nervous system are derived from the sympathoadrenal lineage of the embryonic neural crest [173]. The neural crest and its multiple lineages are discussed in more detail in Section 2.1. According to the Surveillance, Epidemiology, and End Results database, which tracks cancer epidemiology data in the United States, NBL is the most common cancer diagnosed in the first year of life [25]. There are approximately 60 new NBL cases each year in Canada (Canadian Cancer Society).
The most common site for primary NBL tumors is the adrenal medulla; however, tumors can arise anywhere along the sympathetic branch of the autonomic nervous system (the branch that mediates the fight-or-flight response) [174]. The exact cell of origin of NBL is unknown and likely differs for different disease subgroups, such that aggressive tumors derive from morphologically undifferentiated cells while benign tumors derive from more differentiated cell types [175]. It is thought that a subset of NBLs originates from PHOX2B-positive neuronal progenitors [176]. As discussed in Section 1.9.2, inherited mutations in PHOX2B are associated with a fraction of familial NBLs.

1.9.1 Classification, treatment and prognosis

NBL cases are diverse with regard to histopathology, molecular features, and clinical outcomes. At presentation the disease can be limited to a single organ, locally or regionally invasive, or widely disseminated; more than 50% of cases are metastatic at presentation [177,174]. The most common metastatic sites are lymph nodes, bone marrow, bone, and liver [174]. Intriguingly, NBL is both disproportionately lethal despite very aggressive multimodal therapy and associated with the highest rate of spontaneous and complete regression in a subset of cases [178,174,179]. Among other factors, disease prognosis strongly depends on the age at diagnosis, with infants typically having a more favorable prognosis than older children. Historically a 12-month age cutoff was used for pre-treatment risk assessment; however, a recent retrospective study that examined the outcomes of 3,666 patients, correcting for MYCN status and stage, reported a continuous prognostic impact of age [180]. Statistical analysis performed in this study showed that a 460-day (18 months) cutoff maximized the outcome difference for younger and older patients.
To facilitate comparisons between clinical trials and studies conducted in different countries, the International Neuroblastoma Staging System (INSS) was developed in 1988 by an international panel of experts and revised in 1993 [181]. Since then, the INSS has been the most commonly used staging system in Europe and North America [179]. The INSS is a surgically-based system that differentiates patients into stages 1, 2A, 2B, 3, 4 and 4S based on the degree of surgical excision, lymph node involvement, presence of distant metastases and age (younger or older than 12 months). A significant limitation of this system is its dependence on surgical resection, whereby patients with localized disease who do not undergo surgery cannot be properly staged. To address this limitation, a pre-treatment staging system was developed by the International Neuroblastoma Risk Group (INRG) task force and termed the INRG staging system [182]. According to the INRG staging system, tumors are to be classified at diagnosis into one of four stages: L1 (localized disease without image-defined risk factors), L2 (localized disease with image-defined risk factors), M (metastatic disease), and MS (metastatic special disease). In addition, after examining 8,800 NBL cases from North America, Europe, Japan, and Australia, the INRG task force also characterized 16 clinically distinct pre-treatment risk groups that are defined by 7 risk factors: age, INRG stage (L1, L2, M or MS), histological category, differentiation grade, MYCN oncogene amplification status, 11q LOH status, and ploidy [183]. Based on these factors, the INRG recommends classifying tumors into four pre-treatment risk categories with statistically different 5-year event-free survival (EFS): very low-risk (5-year EFS > 85%), low-risk (5-year EFS 75-85%), intermediate-risk (5-year EFS 50-75%), and high-risk (5-year EFS < 50%).
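The EFS thresholds that separate the four INRG risk categories can be written out as a short Python sketch. The function name inrg_risk_category is hypothetical, and the treatment of the shared boundary values (75% and 85% assigned to low-risk, 50% to intermediate-risk) is an assumption, because the published ranges overlap at their edges. In practice a patient is assigned to a category from the seven risk factors listed above; the sketch only makes the EFS bands of the resulting categories explicit.

```python
def inrg_risk_category(efs_5yr_percent: float) -> str:
    """Map a 5-year event-free survival estimate (in percent) to an INRG risk category.

    Boundary handling is an assumption: values of exactly 75 or 85 are treated
    as low-risk, and exactly 50 as intermediate-risk.
    """
    if not 0 <= efs_5yr_percent <= 100:
        raise ValueError("EFS must be a percentage between 0 and 100")
    if efs_5yr_percent > 85:
        return "very low-risk"
    if efs_5yr_percent >= 75:
        return "low-risk"
    if efs_5yr_percent >= 50:
        return "intermediate-risk"
    return "high-risk"


# Hypothetical EFS estimates, one per category.
examples = {efs: inrg_risk_category(efs) for efs in (92.0, 80.0, 60.0, 35.0)}
```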
Low- and very low-risk patients are often observed without any interventions or cured with surgery alone [184]. A special subset of low-risk patients with metastatic disease, denoted as INRG stage MS (INSS stage 4S), includes patients younger than 18 months with metastatic disease limited to bone marrow, liver or skin, favorable histology and no MYCN amplification. This subset of patients is often given supportive care and observed, as these patients tend to achieve complete disease regression without any treatment [184]. Intermediate-risk patients are treated with surgery and moderate intensity chemotherapy, while high-risk patients undergo one of the most aggressive anti-cancer protocols available for both pediatric and adult cancers [184,174]. The front-line protocol for high-risk NBL includes surgery, high intensity chemotherapy with stem cell rescue, radiation, and biological therapy with retinoids [179]. Despite this aggressive treatment, only 30-40% of patients achieve long-term survival, and there is no regimen proven to be curative for relapsed disease [174]. A recent phase 3 clinical trial showed that adding ch14.18 monoclonal antibody against tumor-specific antigen GD2 to standard isotretinoin therapy for first remission improves survival for high-risk NBL patients by 20%, suggesting implementation of the immunotherapy protocol as part of the standard treatment for high-risk NBL [185]. Even so, high-risk NBL remains a significant challenge for pediatric oncologists, and new therapies are needed to improve the survival and reduce treatment-related morbidities for these patients. Chapter 3 of this thesis focuses on the analysis of NBL tumor-initiating cells, isolated from the bone marrow of relapsed high-risk NBL patients. As described in Section 1.5, cancer stem cells and tumor-initiating cells are presumed to be associated with tumor recurrences and drug resistance [35].
Therefore, the characterization of the transcriptomes of NBL tumor-initiating cells may help identify drug targets for relapsed and refractory disease. Chapter 4 describes an analysis of genomes, exomes, and transcriptomes of primary high-risk NBL tumors with the goal of identifying genetic targets that could influence the development of novel therapies for high-risk NBL.

1.9.2 Neuroblastoma genetics and genomics

A small subset of NBL cases (<5%) are familial and display an autosomal dominant mode of inheritance [179]. Early studies showed that NBL incidence and family history follow Knudson's two-hit hypothesis, and it was estimated that up to 22% of cases may have a germline mutation [186]. Recent studies have shown that activating mutations in the anaplastic lymphoma kinase gene (ALK) account for most cases of familial neuroblastoma [187,188]. Additionally, a small number of NBL cases that occur in conjunction with congenital central hypoventilation syndrome or Hirschsprung's disease are associated with germline mutations in PHOX2B [189,190]. The locus encodes a homeodomain transcription factor essential for the development of autonomic derivatives of the neural crest [191]. While PHOX2B harbors mutations that are exclusively germline, the ALK locus can be mutated or amplified in 5-15% of sporadic NBL [192–194,188]. Mutated ALK protein is typically overexpressed and shows constitutive kinase activity, and knockdowns of mutant alleles reduce proliferation of NBL cell lines [193]. In addition, recent evidence suggests that wild type ALK alleles may be oncogenic if they are associated with ALK overexpression; therefore, inhibition of wild type or mutant protein with small molecule inhibitors may provide therapeutic avenues for NBL patients with or without ALK mutations [195].
To understand the contribution of common variants to the development of sporadic NBL, a genome-wide association study is currently under way under the auspices of the Children's Oncology Group [196,174]. The study aims to genotype 5,000 European ancestry NBL cases and 10,000 matched controls using the Illumina HumanHap550 BeadChip platform. To date, the study has reported that SNPs within FLJ22536 at 6p22 (odds ratio = 1.37; 95% confidence interval 1.27 to 1.49; P = 9.33E-15), BARD1 at 2q35 (odds ratio = 1.68; 95% confidence interval 1.49 to 1.90; P = 8.65E-18), and LMO1 at 11p15 (odds ratio = 1.34; 95% confidence interval 1.25 to 1.44; P = 5.20E-16) are significantly associated with high-risk NBL phenotypes [197,196,198,66], while SNPs within DUSP12 at 1q23 (odds ratio = 1.46; 95% confidence interval 1.28 to 1.65; P = 8.13E-9), DDX4 at 5q11 (odds ratio = 1.31; 95% confidence interval 1.14 to 1.49; P = 8.00E-5), IL31RA at 5q11 (odds ratio = 1.24; 95% confidence interval 1.08 to 1.42; P = 2.24E-3), and HSD17B12 at 11p11 (odds ratio = 1.47; 95% confidence interval 1.30 to 1.66; P = 5.04E-10) were associated with low-risk disease [199]. In addition, common gains of 1q21 (NBPF23) have been found to be significantly associated with NBL (odds ratio = 2.49; 95% confidence interval 2.02 to 3.05; P = 2.97E-17), regardless of the disease phenotype [200].

1.9.2.1 Copy number aberrations

Tumor-specific amplification of the MYCN oncogene, found in approximately 20% of primary tumors, was the first copy number alteration to be characterized in NBL [201]. This copy number aberration was immediately recognized to be linked with inferior disease prognosis [202,203], and has remained a key molecular factor in pre-treatment risk assessment ever since (Section 1.9.1). As discussed in Section 1.9.2, ALK is amplified in a subset of NBL tumors.
Examination of a panel of 50 NBLs using interphase FISH found that copy number alterations involving ALK occurred in 60% of tumors and were not correlated with copy number status at MYCN, 1p36, 11q or 17q loci [204]. In addition to MYCN and ALK amplifications, copy number alterations at several larger genomic regions are associated with clinical behavior or other phenotypic characteristics of the disease. For instance, losses of 11q have been reported to occur in NBLs without MYCN amplification, and have been associated with a poor disease prognosis in this subgroup [205,206]. In contrast, losses of 1p36 have been shown to be enriched in MYCN-amplified tumors; however, it has been suggested that these losses may have an adverse effect on survival independently of MYCN. To address the prognostic significance of these two aberrations independently from other factors, these loci were specifically examined in a panel of 915 tumors; the study revealed that unbalanced loss of 11q and loss of heterozygosity at 1p36 were independently associated with poor prognosis in NBL. Due to its prevalence in non-MYCN-amplified cases, 11q status is currently used as one of the criteria for assigning a pre-treatment risk group according to the INRG system [183]. Another common copy number alteration found in NBL is gain of the distal arm of chromosome 17 (17q gain) [207]. This alteration usually occurs in tumors with poor prognosis; however, its independent prognostic significance is unknown [196]. Several reports of translocations between chromosomes 11 and 17 provide a potential pathway for the concomitant occurrence of 17q gain and 11q loss aberrations [208,209]. Other less frequent chromosomal alterations with unknown independent prognostic value have also been reported in NBL [196]. In addition to the specific copy number events discussed above, overall genome structures of 493 NBL tumor samples were examined using array CGH [187].
The study found that the structure of the tumor genome was variable across tumors, such that some tumors harbored exclusively whole chromosome gains and losses (numerical alterations), while others harbored gains and losses of parts of chromosomes (segmental alterations). Moreover, these genomic patterns were indicative of disease prognosis, such that tumors with numerical chromosomal alterations were associated with excellent prognosis, while tumors harboring any type of segmental chromosomal alteration were associated with high-risk disease or relapse.

1.9.2.2 Gene expression profiling of neuroblastoma

Expression of several individual markers, including TRK neurotrophin receptors, has been associated with prognosis in NBL; in particular, expression of NTRK1 (TRK-A) and NTRK3 (TRK-C) is associated with favorable prognosis [210,211], while the expression of NTRK2 (TRK-B) is associated with poor prognosis [212]. The first microarray study conducted in NBL confirmed the association of gene expression patterns with disease course and derived a panel of 19 genes that could be used to classify tumors into prognostic groups [213]. A number of microarray expression profiling studies followed, several of them specifically focusing on improving the prognostic stratification for intermediate-risk cases that are difficult to assign to a treatment plan according to the known prognostic markers [214,215]. Another large-scale study developed a 144-gene signature, based on which the investigators were able to retrospectively improve the risk stratification used in NBL clinical trials; the gene expression signature was originally validated in a set of 174 patients [216], and later in 440 patients [217].
A recent meta-analysis study examined several previously published microarray datasets and single-gene studies to develop a robust 59-gene signature that was then validated in a set of 579 primary tumors spanning all risk groups, the largest patient cohort examined to date by gene expression analysis. After adjusting for other known clinical markers of prognosis, such as MYCN status, age and disease stage, the prognostic signature was found to be independently predictive of the overall and event-free survival [218]. While risk stratification based on gene expression has been shown in large sample sets to improve on the performance of the prognostic factors currently used in the clinic, it has yet to be implemented into the clinical management of NBL.

1.9.2.3 Genetically engineered mouse models of neuroblastoma

The creation of genetically engineered mouse models (GEMMs) carrying exogenous DNA of interest has contributed to our understanding of the functions of cancer genes [219]. A key role of GEMMs in cancer research has been in characterizing which aberrations in cancer genes can induce or contribute to tumorigenesis when expressed in mice, providing functional evidence for these aberrations acting as driver events in human cancer formation [220,219]. As discussed in Section 1.9.2.1, amplification of the MYCN oncogene is the best characterized genetic event in NBL; it occurs in 20% of NBL tumors and is associated with poor disease prognosis. To understand the role of MYCN in NBL, a GEMM in which MYCN is overexpressed in the sympathoadrenal lineage of the neural crest using the tyrosine hydroxylase (TH) promoter was constructed [221]. The TH-MYCN mice hemizygous for the MYCN transgene develop NBL tumors with 70% penetrance by 1 year of age, while homozygous mice develop tumors with 100% penetrance by 4 months of age [222,220]. The TH-MYCN mouse model provides a model of MYCN-amplified NBL, and is currently the only well-characterized GEMM available for NBL [220].
Murine NBL recapitulates many of the biological and clinical characteristics of human MYCN-amplified NBL, such as genomic abnormalities (including MYCN amplification), disease pathology and gene expression patterns [220,222]. However, the model differs from the human disease by the low frequency of bone marrow metastases, and the predominantly paraspinal (as opposed to adrenal in humans) primary tumors [223,220].

1.10 Thesis roadmap and chapter summaries

Recent advances in cancer genomics have contributed to our understanding of cancers as diseases associated with multiple aberrations that can affect genes at the level of sequence, copy number, mRNA expression, or epigenetics. Applications of cancer genomic methods to the analysis of high-risk neuroblastoma (NBL) led to discoveries of recurrent copy number aberrations, gene expression signatures, and predisposition markers predictive of the disease phenotype. These studies revealed molecular heterogeneity of high-risk NBL, suggesting that application of higher resolution approaches may identify novel markers linked to the pathogenesis of this disease. The main hypothesis underlying the research described in this thesis is that single-nucleotide resolution analysis of high-risk NBL genomes and transcriptomes will lead to the discovery of new loci that contribute to the disease. I also hypothesized that better understanding of gene expression profiles of the putative normal cell of origin of NBL will help interpret high throughput sequencing data from NBL cells by placing it in the context of expression analysis of the normal neural crest cells. Therefore, the objectives of my research are to characterize the genomes and transcriptomes of high-risk NBL primary tumors, NBL tumor-initiating cells, and normal neural crest cells using new sequencing technologies with a goal of identifying novel loci that may be implicated in the disease.
Since NBL originates from the developing neural crest, the goal of the research in Chapter 2 is to identify and characterize the expression of key genes and pathways that distinguish normal neural crest stem cells from other stem cell lineages. Key findings of the work described in Chapter 2 include the plasticity of the neural crest stem cell phenotype, in which non-neural-crest-derived cells can converge to this phenotype; and the finding of a decreased expression of double-stranded DNA repair genes as compared to another somatic stem cell lineage with a broad developmental potential, the mesenchymal stem cells. The rationale for studying NBL tumor-initiating cells (TICs), a highly tumorigenic population of metastasis-derived NBL cells, in Chapter 3 is the aggressive behavior of high-risk NBL and its high propensity for relapse, potentially linked to the persistence of TICs that are resistant to conventional therapies. The goal of the research described in Chapter 3 is to use RNA sequencing data from NBL TICs to identify NBL TIC-enriched transcripts and use them to predict therapeutics that could specifically target these cells. The key finding of this work is the identification and validation of AURKB as a novel drug target for NBL TICs. Having studied the transcriptomes of normal neural crest cells (Chapter 2) and NBL TICs (Chapter 3), I addressed in Chapter 4 whether whole genome and transcriptome analysis of primary NBL tumors may identify additional genetic markers that could inform novel therapies of relevance to primary NBL tumors at diagnosis. This large-scale sequencing work revealed that NBL tumors harbored relatively low frequencies of somatic point mutations in coding sequences. Despite this observation, several gene groups, including those involved in the MAPK signaling pathway and chromatin remodeling, emerged from this analysis as being the targets of somatic mutations in 15% and 11% of patients, respectively.
These mutational signatures may suggest potential therapeutic avenues that could be explored in patient subgroups with these mutations.

Figure 1.1 Advances in sequencing chemistry implemented in the earliest next-generation sequencers

In each diagram, DNA templates are depicted as black bars, sequencing primers are shown as aquamarine bars, and DNA polymerases are represented as light blue circles. (A) The pyrosequencing approach implemented in 454/Roche sequencing technology detects incorporated nucleotides (here an A nucleotide is shown) by chemiluminescence (yellow shape) resulting from PPi release. (B) The Illumina method utilizes sequencing-by-synthesis in the presence of fluorescently labeled nucleotide analogues (green, red, blue and yellow circles) that serve as reversible reaction terminators. The sequencing is performed on millions of templates simultaneously, and an imaging step follows each incorporation step to determine the identity of added nucleotides (bottom). (C) The single-molecule sequencing-by-synthesis approach detects template extension using Cy3 and Cy5 labels attached to the sequencing primer (aquamarine) and the incoming nucleotides (fuchsia), respectively. (D) The SOLiD method sequences templates by sequential ligation of labeled degenerate probes. Two-base encoding implemented in the SOLiD instrument allows for probing each nucleotide position twice. For instance, the nucleotide sequence demonstrates that the T base is effectively read twice by red (A to T) and green (T to G). The matrix on the left shows that each of the four colors encodes two separate nucleotide pairs. Reprinted with permission of Annual Reviews.
[Figure 1.1 image: panels A, B, C and D]

Figure 1.2 Transcript model coverage by various sequencing-based methods for transcriptome analysis

The exons in a gene model are represented by orange, blue and green bars, while the introns are in grey. Following transcription and splicing, a transcript carrying exons 1, 2, and 3 is produced. The coverage of this transcript by various methods is depicted in the black box: Sanger-based expressed sequence tags (ESTs) are generated from the 3' or 5' end of transcripts, whereas SAGE tags represent short sequences at their 3' ends; randomly primed short reads generated by next-generation sequencers detect bases throughout the length of the transcript. Modified with permission of Annual Reviews.

Table 1.1 Specifications of the common next-generation sequencing platforms as compared to the most common Sanger sequencer (Life Technologies' ABI3730XL)

The run statistics in this table are from [224]. The average read length is for high quality reads of more than 200 bases (the mode is higher). *Polymerase Chain Reaction (PCR) can be used for the amplification of templates for Sanger sequencing, when it is desired to sequence specific regions of the genome; the use of PCR for template amplification and candidate gene sequencing by the Sanger method is discussed in Section 1.6.3.3.
Instrument Average read length Run time Mega bases per run Sequencing chemistry Template amplification Company ABI3730XL 650 bp 2 hrs 0.06 Sequencing by synthesis with irreversible terminators (Sanger) In vivo cloning* Life Technologies Illumina GAIIx Paired 150 bp 14 days 96,000 Sequencing by synthesis with reversible terminators Bridge PCR Illumina Illumina HiSeq2000 Paired 100 bp 8 days 200,000 Sequencing by synthesis with reversible terminators Bridge PCR Illumina 454/FLX Titanium 400 bp 10 hrs 500 Pyrosequencing on solid support Emulsion PCR Roche SOLiD-4 Paired 50 bp (forward) and 35 bp (reverse) 12 days 71,400 Sequencing by ligation Emulsion PCR Life Technologies PacBio RS 860-1,100 bp 0.5- 2hrs 5-10 Sequencing by synthesis using SMRT (single molecule real time) technology None Pacific Biosciences Heliscope 35 bp N/A 28,000 Sequencing by synthesis with virtual terminators None Helicos BioSciences Ion Torrent (316 chip) >100 bp 2 hrs >100 Sequencing by synthesis with semiconductor detection Emulsion PCR Life Technologies   40 Chapter 2: Transcriptome analysis of normal neural crest stem cells 2  2.1  Introduction During early human development, a zygote (fertilized egg) undergoes cell divisions to form a blastula that implants into the uterus to continue embryogenesis. Following implantation, the process of gastrulation results in the formation of the asymmetric embryo consisting of three germ layers – ectoderm, mesoderm and endoderm – that go on to develop all major organs and tissues in the body. The ectoderm-derived neural crest is a transiently multipotent cell population unique to vertebrates [225]. Neural crest cells migrate out of their origin at the apex of the neural tube, the embryo‘s precursor to the central nervous system, and form aggregates throughout the embryo that later develop into ganglia of the peripheral nervous system. 
A fraction of neural crest cells infiltrates other organs, such as skin, gut and adrenal glands to generate melanocytes, enteric neurons, and hormone-secreting chromaffin cells, respectively [226]. Neural crest cells also contribute to craniofacial cartilage and bone, as well as cardiac and smooth muscle tissues.

The development of neural crest cell lineages has been compared to the process of hematopoiesis [226], in which blood cell types derive from a hematopoietic stem cell via differentiation into a series of committed progenitors. In this model, the original stem cell is multipotent, while the differentiation potential of the committed progenitors is restricted to the cell types that make up the particular lineage. Accordingly, the existence of progenitor cells committed to specific neural crest lineages has been proposed, including those committed to the enteric, parasympathetic, sympathoadrenal, sensory, glial, and melanogenic lineages [226]. The sympathoadrenal progenitor, a common progenitor to sympathetic neurons and chromaffin cells originating from the trunk region of the neural tube, has been identified and characterized using imaging studies in rats [227]. Intriguingly, cell types derived from the sympathoadrenal progenitor, but not other neural crest progenitors, are those that are susceptible to transformation into NBL [216,217]. The differentiation and specification of the sympathoadrenal lineage progenitors, although not completely understood, involves the transcription factors ASCL1, PHOX2A, PHOX2B, and HAND2 [175]. Since NBL originates from the developing neural crest, and moreover, from a specific neural crest lineage, understanding the biology and differentiation of normal neural crest stem cells may help shed light onto molecular events associated with NBL formation. Moreover, germline mutations in PHOX2B are associated with a fraction of familial NBL cases, implicating genes involved in neural crest differentiation in NBL formation [190,189].

Work to date has shown the persistence of adult or somatic stem cells in many tissues, most notably the central nervous and hematopoietic systems [230–232]. Similarly, multipotent adult stem cells have been isolated from the dermis of rodent and human skin and termed SKin-derived Precursors (SKPs) [221,222]. These cells have been shown to maintain differentiation potential reminiscent of the neural crest stem cell, and are able to generate peripheral neurons, glia, Schwann cells (a cell type thought to be generated only from the neural crest), and smooth muscles [235]. We also demonstrated in a publication that is beyond the scope of this thesis that SKP progenitors reside in the hair follicle niche and exhibit properties expected of a dermal stem cell, contributing to dermal maintenance, wound healing, and hair follicle morphogenesis [236].

² Portions of this Chapter have been published, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, V. Morozov, B.G. Hoffman, C.D. Helgason, M.A. Marra. A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 3(9):e3205, 2008; H. Jinno, O. Morozova, K.L. Jones, J.A. Biernaskie, M. Paris, R. Hosokawa, M.A. Rudnicki, Y. Chai, F. Rossi, M.A. Marra, F.D. Miller. Convergent genesis of an adult neural crest-like dermal stem cell from distinct developmental origins. Stem Cells 28(11):2027-40, 2010. Copyright by AlphaMed Press; M.D. O'Connor, E. Wederell, G. Robertson, A. Delaney, O. Morozova, S.S. Poon, D. Yap, J. Fee, Y. Zhao, H. McDonald, T. Zeng, M. Hirst, M.A. Marra, S.A. Aparicio, C.J. Eaves. Retinoblastoma-binding proteins 4 and 9 are important for human pluripotent stem cell maintenance. Exp. Hematol. 39(8):866-79.e1, 2011. Copyright by Elsevier.
Skin-derived Precursors can be derived from the dermis throughout the body, only the facial component of which originates from the neural crest embryonically [237–239]. Therefore, it is unclear whether SKPs isolated from different areas of the dermis are of neural crest origin per se or converge towards the neural crest stem cell phenotype. If SKPs are derived from the neural crest, this would imply that neural crest progenitors invade the mesoderm-derived dorsal and ventral dermis [237,239] during embryogenesis, and that it is these precursors that associate with hair follicles and generate SKPs. Alternatively, if ventral and dorsal SKPs are not derived from the neural crest, this would indicate the possibility of a second developmental pathway that generates neural crest-like cells from a non-neural crest origin. Distinguishing between these two possibilities would have important implications for developmental biology, as the origin of somatic stem cells, such as SKPs, is not well understood [240]. In terms of NBL development, the second option would indicate that NBL may potentially derive from a non-neural crest origin by lineage convergence. In terms of NBL laboratory research, the origin of SKPs by lineage convergence would also indicate that SKPs derived from any area of the body can be used to model normal counterparts of NBL cells in the laboratory.

Precedent for the lineage convergence phenomenon has been established by in vitro studies where normal fibroblasts could be reprogrammed toward an ES-cell-like phenotype [241,242]. In addition, as discussed in Section 1.5, breast cancer stem cells can arise in vivo from tumor cells via epithelial-to-mesenchymal transition [33]. These considerations led us to hypothesize that neural crest stem cell-like cells may arise from cell lineages other than the neural crest by adopting a neural crest stem cell-like phenotype.
The overall objective of this Chapter is to characterize the expression profiles of SKP lines used as models of neural crest progenitors [234] and representing the presumed normal counterparts of neuroblastoma cells [231,232]. To fulfill this objective, I addressed the specific aims outlined below. First, I characterized the transcriptomes of SKPs isolated from facial, ventral and dorsal skin regions of the body that were shown by lineage tracing work to be derived from different developmental origins. In this experiment I showed that the three SKP lineages were similar at the expression level, but maintained the expression of a small set of genes indicative of their embryonic origin. Second, I used the three SKP populations to identify genes enriched and depleted in all SKPs compared to a mesodermal multipotent somatic stem cell lineage, mesenchymal stem cells. These transcripts represent markers that are common to neural crest progenitors regardless of their origin and distinguish the neural crest progenitors from mesenchymal stem cells. Third, as it was noted during the experiments described under Specific Aim 2 that neural crest, but not mesenchymal progenitors, expressed markers of pluripotency, I compared the normal neural crest progenitor-enriched transcripts to transcripts enriched in normal embryonic stem cells. This was done to further delineate similarities and differences between SKPs and embryonic stem cells (ESCs).  The results from these analyses provided insights into distinguishing characteristics of the transcriptome of normal neural crest progenitors, the cell type that is thought to undergo transformation to form NBL.  
2.2 Results

2.2.1 SKPs of distinct developmental origin are highly similar at the transcriptional level and differ from bone marrow mesenchymal stem cells (MSCs)

To address whether SKPs and their endogenous dermal precursors originate from the neural crest or whether, like the dermis itself, they originate from multiple developmental origins, our collaborators conducted lineage tracing experiments. Briefly, they used two different mouse Cre lines that allowed them to perform lineage tracing: Wnt1-cre, which targets cells deriving from the neural crest, and Myf5-cre, which targets cells of a somite (mesodermal) origin. By crossing these Cre lines to reporter mice, they showed that the endogenous follicle-associated dermal precursors in the face derive from the neural crest, and those in the dorsal trunk derive from the somites (mesoderm), as do the SKPs they generate. The ventral trunk SKPs were found to derive from lateral plate mesoderm. Despite these different developmental origins, facial and trunk SKPs are functionally similar, even with regard to their ability to differentiate into Schwann cells, a cell type thought to be generated only from the neural crest [245]. To comprehensively define the similarities and differences among these developmentally distinct populations of SKPs, I compared global expression profiles derived from dorsal trunk SKPs, ventral trunk SKPs, facial SKPs, and mesenchymal stem cells (MSCs), the latter used as a control for the mesoderm-derived ventral and dorsal SKPs; all samples were generated from adult rats. The rat model was chosen to provide a direct comparison with the adult rat MSCs that were available to our collaborators. RNA samples purified from the SKPs and MSCs were analyzed on the Affymetrix GeneChip Rat Gene 1.0 ST Array.
After normalization and filtering described in the Methods, genes with variable expression profiles across facial SKPs, dorsal SKPs, ventral SKPs and MSCs were identified using the multiple group comparison implemented in the LIMMA Bioconductor package [246]. The LIMMA method was chosen for this analysis, as several studies noted its consistently favorable performance compared to other common methods of microarray data analysis, including Welch's t-test, ANOVA, SAM, and RVM [247,248]. In total, 7,012 out of 18,879 genes showed evidence of differential expression across the 4 groups (Benjamini-Hochberg-corrected q < 0.05). Spearman Rank correlations, computed between each sample pair based on the expression profiles of these genes, demonstrated that dorsal, ventral, and facial SKPs were virtually identical with an average Spearman Rank correlation value of 0.94 (SD = 0.031). In contrast, the average Spearman Rank correlation between SKPs and MSCs was 0.82 (SD = 0.025) (Figure 2.1A). Using a two-tailed Student's t-test, this difference was statistically significant (P < 0.0001). Unsupervised hierarchical consensus clustering analysis, performed on the samples based on the variable gene set described above, using a standard hierarchical clustering algorithm (correlation distance, average linkage clustering, 100 bootstrap iterations), confirmed these conclusions. The clustering result demonstrated that the three SKP populations grouped together, and that all three were distinct from the MSC samples (Figure 2.1B).

2.2.2 SKPs of distinct developmental origin maintain a lineage history at the gene expression level

To delineate the extent of differences among the transcriptomes of the neural crest-derived facial SKPs, the mesoderm-derived ventral and dorsal SKPs and the MSCs, I performed three-way differential expression analysis using linear models implemented in the LIMMA Bioconductor package [237,234].
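The sample-to-sample comparison reported in Section 2.2.1 rests on Spearman rank correlations between expression profiles. The following is a minimal sketch of that calculation in plain Python, not the code used in this thesis; the sample names and expression values are hypothetical and serve only to illustrate the computation.

```python
from statistics import mean


def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(rank_with_ties(x), rank_with_ties(y))


# Hypothetical log-scale expression values for five genes in three samples.
profiles = {
    "facial_SKP": [8.1, 5.2, 9.7, 3.3, 6.0],
    "dorsal_SKP": [7.9, 5.5, 9.9, 3.0, 6.4],
    "MSC": [4.0, 9.1, 6.2, 8.8, 3.5],
}

names = sorted(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(spearman(profiles[a], profiles[b]), 3))
```

In this toy example the two SKP profiles rank the genes identically, so their correlation is 1, while each SKP profile correlates negatively with the MSC profile. With real data, the pairwise values would be averaged within and between groups as described above.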
The Venn diagrams show the numbers of significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05) that are in common among the comparisons (Figure 2.2A). Taken together, a total of 2,603 genes showed evidence of differential expression in any of the three pairwise comparisons; the expression levels of these genes are plotted as a heatmap (Figure 2.2B). Of these genes, only 106 were significantly different between dorsal and facial SKPs, while 2,233 and 2,525 differed between MSCs versus dorsal SKPs and MSCs versus facial SKPs, respectively. These data are compatible with the interpretation that precursor cells of at least two, and potentially three, different developmental origins converge onto a highly similar phenotype. I therefore directly compared the expression of genes associated with neural crest fate specification [250], focusing on Slug, Snail, Twist, Sox9, Sox10, Foxd3, and Ap2a1. Heatmaps of the microarray data showed that these genes were expressed at similar levels in all three of the adult rat SKP samples, as were p75NTR and RhoB, which are also associated with neural crest precursors (Figure 2.3A, left panel) [239,240]. Reverse transcription polymerase chain reaction (RT-PCR) analyses of neonatal mouse skin, conducted by our collaborators, confirmed that these mRNAs were also expressed at similar levels in neonatal murine dorsal versus facial SKPs (Figure 2.3B, left panel).

Although these analyses indicate that mesenchymal precursors of different developmental origins converge to a very similar adult precursor cell phenotype, a pairwise differential expression comparison of facial versus dorsal trunk SKPs and dorsal trunk versus ventral trunk SKPs using linear models demonstrated that a subset of genes were significantly differentially expressed (Benjamini-Hochberg-corrected q < 0.05; Table 2.1).
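The q < 0.05 thresholds quoted throughout this Chapter refer to Benjamini-Hochberg-corrected values. As an illustration of what that correction does, the sketch below implements the standard step-up adjustment in plain Python; it is not the LIMMA implementation, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five genes.
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

Note that the third and fourth genes have raw p-values below 0.05 but adjusted values just above it, which is why the number of genes passing a corrected threshold is smaller than the number passing an uncorrected one.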
Of the 35 most differentially expressed genes in the facial versus dorsal comparison, 10 were higher in dorsal SKPs and 25 in facial SKPs. Of the 35 most differentially expressed genes in the dorsal versus ventral comparison, four were higher in dorsal SKPs and 31 in ventral SKPs (Table 2.1). Many of these genes play an important role during embryogenesis. In particular, dorsal trunk SKPs express high levels of the Zic1 transcription factor relative to both facial and ventral trunk SKPs, and the Hox transcription factors Hoxa5, Hoxc4, Hoxc6, and Hoxc9 are high in dorsal trunk SKPs relative to facial SKPs (Figure 2.3B; Table 2.1). In contrast, facial SKPs expressed high relative levels of Pax3 and Msx1, both of which are transcription factors associated with cranial neural crest cells [253], and Mab-21-like 1 and Mab-21-like 2, mammalian homologues of the C. elegans mab-21 cell fate gene that is expressed during embryogenesis [254] (Figure 2.3A, right panel; Table 2.1). The relative enrichment of these different mRNAs was confirmed by our collaborators using RT-PCR analysis of neonatal mouse dorsal versus facial SKPs (Figure 2.3B, right panel). Thus, although these different dermal precursor populations are highly similar, they maintain a history of their distinct developmental origins.

2.2.3 Identification of genes significantly enriched and depleted in neural crest stem cell-like cells

We reported in Sections 2.2.1 and 2.2.2 that the expression analysis of SKPs, complementing laboratory studies by our collaborators, was compatible with SKPs being derived from different developmental origins and converging onto the common neural crest stem cell-like phenotype. We next set out to identify transcripts that are enriched and depleted in SKPs from all three developmental origins compared to MSCs.
These transcripts represent markers enriched and depleted in normal neural crest stem cell-like cells compared to another type of multipotent somatic stem cell, the bone marrow-derived MSCs. The MSCs provide a suitable comparator for SKPs for two reasons. First, they represent one of the few somatic stem cell lineages with a similarly broad developmental potential to the neural crest [230,255]. Second, MSCs derive from the mesoderm [256], which, as discussed in Sections 2.1 and 2.2.1, is also the lineage of origin of ventral and dorsal SKPs. To identify transcripts enriched in each of facial SKPs, ventral trunk SKPs and dorsal trunk SKPs compared to MSCs, I performed pairwise differential expression analysis using linear models implemented in the LIMMA Bioconductor package [246,249]. Based on the pairwise gene expression comparisons, 3,406 genes were significantly differentially expressed between ventral trunk SKPs and MSCs; 2,793 genes were significantly differentially expressed between facial SKPs and MSCs; and 2,424 genes were significantly differentially expressed between dorsal SKPs and MSCs (Benjamini-Hochberg-corrected q < 0.05). Next, results from the pairwise comparisons were combined to identify genes that were significantly enriched or depleted in SKPs compared to MSCs in all three comparisons. In total, 654 genes were found to be enriched in all three SKP lineages compared to MSCs, while 752 were found to be depleted in all three SKP lineages compared to MSCs. These genes are listed in Appendix A and their expression is plotted as a heatmap in Figure 2.4.

2.2.4 Pathway analysis of SKP-enriched and SKP-depleted transcripts

To characterize the functions of transcripts differentially abundant in SKPs compared to MSCs, I conducted a pathway enrichment analysis using the Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com). Using rat-derived gene lists, 618 of 654 and 681 of 752 genes could be annotated by the Ingenuity Knowledgebase.
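The combination step described in Section 2.2.3, in which only genes that change in the same direction in all three SKP versus MSC comparisons are retained, can be sketched as a set intersection. The example below is illustrative only: the gene names, log fold changes and q-values are hypothetical, and the data structure is an assumption rather than the format produced by LIMMA.

```python
def consistent_genes(comparisons, q_cutoff=0.05):
    """Return (enriched, depleted) gene sets that are significant, with the
    same direction of change, in every pairwise comparison supplied."""
    enriched, depleted = None, None
    for results in comparisons.values():
        up = {g for g, (log_fc, q) in results.items()
              if q < q_cutoff and log_fc > 0}
        down = {g for g, (log_fc, q) in results.items()
                if q < q_cutoff and log_fc < 0}
        # Intersect with the sets accumulated from earlier comparisons.
        enriched = up if enriched is None else enriched & up
        depleted = down if depleted is None else depleted & down
    return enriched or set(), depleted or set()


# Hypothetical results: gene -> (log fold change SKP vs MSC, adjusted q-value).
comparisons = {
    "facial_vs_MSC": {"GeneA": (2.1, 0.001), "GeneB": (-1.5, 0.01),
                      "GeneC": (1.2, 0.2), "GeneD": (0.9, 0.03)},
    "dorsal_vs_MSC": {"GeneA": (1.8, 0.004), "GeneB": (-1.1, 0.02),
                      "GeneC": (1.4, 0.01), "GeneD": (-0.7, 0.04)},
    "ventral_vs_MSC": {"GeneA": (2.4, 0.002), "GeneB": (-2.0, 0.001),
                       "GeneC": (1.0, 0.03), "GeneD": (0.8, 0.01)},
}

enriched, depleted = consistent_genes(comparisons)
print(sorted(enriched), sorted(depleted))
```

Here GeneC is excluded because it is not significant in one comparison, and GeneD is excluded because its direction of change is not consistent across the three comparisons.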
When the rat-derived gene lists were converted to human orthologs, however, 624 out of 654 and 696 out of 752 genes could be annotated in the Ingenuity database, for genes enriched and depleted in SKPs compared to MSCs, respectively. Notably, when rat-derived gene lists were analyzed, the functional enrichment results were almost identical to those obtained for the human orthologs. The results from the human ortholog pathway analysis are quoted here as the human analysis had a higher number of annotated genes. After applying Benjamini-Hochberg correction, 14 canonical pathways were significantly enriched among the genes upregulated in SKPs (Figure 2.5A, Table 2.2A), while 13 canonical pathways were enriched among the genes downregulated in SKPs (Figure 2.5C, Table 2.2B). The majority (9 out of 14) of the pathways upregulated in SKPs compared to MSCs involved WNT/Beta-Catenin, bone morphogenetic protein (BMP), or transforming growth factor beta (TGFB) signaling (Table 2.2A). These include canonical pathways named “Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis”, “Axonal Guidance Signaling”, “Colorectal Cancer Metastasis Signaling”, “Human Embryonic Stem Cell Pluripotency”, “WNT/Beta-Catenin Signaling”, “Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis”, “Molecular Mechanisms of Cancer”, and “Leukocyte Extravasation Signaling”. While the canonical WNT/Beta-Catenin signaling on its own induces differentiation of the neural crest along the sensory neural lineage, the BMP signaling pathway was shown to antagonize this activity, such that the presence of both BMP and WNT maintains the neural crest stem cell phenotype and multipotency [257]. Therefore, the finding that WNT/Beta-Catenin and BMP signaling account for the majority of pathways upregulated in SKPs is consistent with the neural crest stem cell-like phenotype of these cells, and distinguishes them from other non-neural crest multipotent stem cells.
In addition, this finding provides a validation for the computational approach used to identify the pathways associated with transcripts enriched and depleted in SKPs compared to MSCs. An adaptation of the Ingenuity canonical pathway named “Human Embryonic Stem Cell Pluripotency”, outlining WNT/Beta-Catenin and BMP signaling along with other key stemness molecules, is provided in Figure 2.5B with the genes upregulated in SKPs highlighted in red. In contrast, the majority (9 out of 13) of the pathways downregulated in SKPs compared to MSCs were involved in cell cycle control or DNA repair (Table 2.2B). These include canonical pathways named “Hereditary Breast Cancer Signaling”, “Role of BRCA1 in DNA Damage Response”, “ATM Signaling”, “DNA Double-Strand Break Repair by Homologous Recombination”, “Mitotic Roles of Polo-Like Kinase”, “Cell Cycle Control of Chromosomal Replication”, “Cell Cycle: G2/M DNA Damage Checkpoint Regulation”, “Role of CHK proteins in Cell Cycle Checkpoint Control”, and “Molecular Mechanisms of Cancer”. Intriguingly, the breast cancer 1, early onset (BRCA1) gene was involved in 7 out of 9 of these pathways (Table 2.2B), suggesting its central role in the regulatory network downregulated in SKPs compared to MSCs. An adaptation of the Ingenuity canonical pathway named “Role of BRCA1 in DNA Damage Response” is depicted in Figure 2.5D with the genes downregulated in SKPs highlighted in green.

2.2.5 SKPs share expression profile similarities with ES cells

2.2.5.1 Identification of genes associated with the maintenance of the undifferentiated state in human ES cells

In Sections 2.2.3 and 2.2.4 I reported on similarities and differences in gene expression between two lineages of multipotent somatic stem cells with a broad developmental potential, SKPs and MSCs [258,230,255].
This analysis revealed that the Ingenuity pathway annotation ―Human Embryonic Stem Cell Pluripotency‖ was significantly enriched among transcripts expressed at a higher level in all three SKP lineages compared to MSCs (Table 2.2A) suggesting that SKPs had more features of a pluripotent cell than MSCs. This result is consistent with the observation that neural crest stem cells arguably have the broadest developmental potential among the somatic stem cell types, as they are able to generate diverse cell types, including smooth muscle cells, dermis, tendons, connective tissues, sensory neurons and root ganglia of the peripheral nervous system, Schwann cells, pigment cells, and neuroendocrine cells of the adrenal medulla. Since the expression of pluripotency markers was a distinguishing feature of SKPs compared to MSCs, we aimed to further define the extent to which SKPs resembled human ES cells. To investigate the similarities and differences in gene expression between SKPs and ES cells, I set out to identify a comprehensive list of genes that may be associated with the maintenance of pluripotency in human ESCs. To accomplish this, 319 candidate pluripotency genes were selected by the Connie Eaves laboratory based on their potential role in stem cell biology, and literature reviews. These genes are involved in transcription, chromatin maintenance, and membrane receptor signaling and are listed in Appendix B. To determine which of the 319 candidate genes might be most tightly linked to the maintenance of pluripotency, I undertook an analysis of their representation in 17 LongSAGE libraries derived from undifferentiated and differentiated human ES cells as well as several adult tissues (Table 2.3). I used the SAGE Genie tool for the gene-to-tag mapping such that each gene was represented by one LongSAGE tag [259].  
Following the mapping, the tag counts were normalized to a depth of 100,000, and a heuristic approach termed seriation was used to identify groups of genes with similar expression levels in different libraries. The seriation approach was chosen for this analysis, as it was shown to perform favorably compared to the PoissonC clustering algorithm, commonly used for SAGE data analysis, when relatively small numbers of genes (under 5,000) are studied [260]. Since our implementation of seriation is based on the visualization of contigs to identify co-expressed genes, the approach is suitable for targeted analyses, such as when the expression of a selection of genes is considered. The method is not suitable for unsupervised genome-wide analysis. Seriation is a statistical method for simultaneously ordering rows and columns of a symmetrical distance matrix for the purposes of revealing an underlying one-dimensional structure [261]. An assumption in seriation analysis is that there is an order (or distinct suborders) in the data that is biologically meaningful. The inherent orders may represent any sequential structure among the data (e.g. their dependence on time or another variable). In the application described here, I hypothesized that the sequential structure present in the expression data was the developmental restriction of the expression of the candidate pluripotency genes to the undifferentiated ES cell libraries. The seriation analysis identified three categories of genes (Figure 2.6A, Appendix B). In the original publication, we termed these higher order structures or categories “Supercontigs” [260]. Upon further inspection, Supercontig 1 contained 114 genes for which the expression was restricted to undifferentiated ES cells. This group contained tags for POU5F1, NANOG, SOX2, FOXD3 and other genes whose expression is known to decrease upon human ES cell differentiation.
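Two ideas from the preceding description lend themselves to a small illustration: scaling tag counts to a common library depth of 100,000, and ordering the items of a symmetrical distance matrix so that neighbours are as similar as possible. The sketch below is not the seriation implementation published in [260]; as a stand-in it uses an exhaustive search for the ordering with the smallest summed neighbour distance, which is one common seriation criterion and is only practical for a handful of items. All tag names, counts and distances are hypothetical.

```python
from itertools import permutations


def normalize_tag_counts(counts, depth=100_000):
    """Scale raw tag counts so that the library total equals `depth`."""
    total = sum(counts.values())
    return {tag: count * depth / total for tag, count in counts.items()}


def path_length(order, dist):
    """Sum of distances between consecutive items in `order`."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def seriate_brute_force(dist):
    """Return the ordering of item indices that minimizes the summed
    distance between neighbours (exhaustive search, small inputs only)."""
    n = len(dist)
    return min(permutations(range(n)),
               key=lambda order: path_length(order, dist))


# Hypothetical raw counts for three tags in one library.
library = {"tagA": 30, "tagB": 10, "tagC": 160}
normalized = normalize_tag_counts(library)
print(normalized)

# Hypothetical items that lie along a hidden one-dimensional gradient.
positions = [3.0, 0.0, 2.0, 1.0]
dist = [[abs(a - b) for b in positions] for a in positions]
order = seriate_brute_force(dist)
print(order)
```

Because the toy items truly lie on a line, the recovered ordering follows the hidden gradient (or its reverse, which is equivalent for seriation purposes). Real expression data rarely have such a clean structure, which is why heuristic methods and visual inspection are used in practice.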
The average expression levels of the 114 Supercontig 1 genes in the 17 LongSAGE libraries are plotted in Figure 2.6B. A second group (Supercontig 2) consisted of 145 genes whose transcript abundance was increased in differentiated cells. The third subset (Supercontig 3) contained the remaining 60 genes whose expression patterns did not fit within either of the two categories.

2.2.5.2 Validation of pluripotency markers using computational methods

Since the genes in Supercontig 1 were preferentially expressed in the undifferentiated human ES cells, I hypothesized that they would be involved in pathways associated with human ES cell pluripotency. To test this hypothesis, I used Ingenuity software (Ingenuity Systems, www.ingenuity.com) to identify canonical pathways significantly enriched among the 114 genes in Supercontig 1 (Appendix B). After correcting for multiple testing, four Ingenuity canonical pathways, “Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency”, “Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency”, “Human Embryonic Stem Cell Pluripotency”, and “Actin Cytoskeleton Signaling”, remained significantly enriched among the 114 transcripts in Supercontig 1 (Figure 2.7A). As expected, all of these pathways appeared to be associated with the maintenance of pluripotency. Given the restriction of the expression of the 114 transcripts in Supercontig 1 to the undifferentiated ES cells, we further hypothesized that the promoters of these genes, but not genes in other Supercontigs, contained binding sites for core transcription factors OCT/POU, SOX2, and NANOG that are known to be required for the maintenance of pluripotency in cell culture [262]. To address this possibility, we used PASTAA software to analyze the promoters of the genes contained in all three Supercontigs [263].
PASTAA interrogates groups of co-expressed genes and ranks their likelihood of being regulated by a transcription factor, as evident from the presence of known transcription factor binding sites. Two separate PASTAA analyses were performed on each of the three Supercontigs. One analysis involved interrogating a region extending 10 kb upstream from the transcription start site (distal analysis); the other analysis interrogated a region 6400 bp on each side of the transcription start site (proximal analysis). PASTAA analysis of Supercontig 1 (the 114 undifferentiated human ES cell-specific genes) showed SOX (P = 0.0018) and FOX (P = 0.041) PWMs to be highly ranked in the distal analysis, and NANOG (P = 0.013) and OCT/POU (P = 0.024) PWMs to be highly ranked in the proximal analysis (Figure 2.7C and B, respectively). Notably, the PASTAA data predicted binding of the core pluripotency transcription factors NANOG, OCT/POU and SOX to their own and each other's promoters, as expected [262]. Several PWMs scored higher than NANOG in the proximal analysis (hollow circles left of the NANOG PWM in Figure 2.7B). These included PWMs representing binding sites for nuclear factor Y (NFY), a regulator of stem cell gene CD34 [264]; transcription factor AP2, expressed during early development [265] and required for the development of the sympathetic lineage [266]; TBX5, a regulator of limb development; and a neuronal regulator ELK1 [267]. In the distal analysis, PWMs for cAMP response element-binding (CREB) family members involved in neurogenesis [268] scored highly (hollow circles between ATF and FOX PWMs in Figure 2.7C), in addition to the PWMs for the core pluripotency factors. PASTAA analysis of Supercontig 2 also predicted a FOX motif to be present in the distal promoter region of these genes (P = 0.0185) and a NANOG motif in the proximal promoter region (P = 0.00786).
Analysis of Supercontig 3 predicted FOX motifs in both the distal (P = 0.0404) and proximal (P = 0.0327) promoter regions of these genes, but no NANOG, OCT/POU, or SOX motifs in either the distal or proximal promoter regions. In conclusion, only upstream regions of genes in Supercontig 1, but not the other Supercontigs, contained binding sites for all of the core pluripotency transcription factors NANOG, OCT/POU and SOX that are required for the maintenance of the undifferentiated state in cell culture. This finding is consistent with the restricted expression of these genes in the undifferentiated ES cells, and provides additional evidence for their candidate role in the regulation of pluripotency.

2.2.5.3 Pluripotency genes whose transcripts are enriched or depleted in normal neural crest stem cell-like cells compared to mesenchymal stem cells

In Section 2.2.5.1 I used seriation of LongSAGE libraries to identify candidate pluripotency genes whose expression was restricted to undifferentiated human ES cells. I identified 114 genes preferentially expressed in undifferentiated human ES cells, and their candidate role in pluripotency was supported by computational analyses in Section 2.2.5.2. To assess whether these pluripotency markers were among the transcripts enriched or depleted in SKPs with respect to MSCs, I compared the genes in Supercontig 1 to those identified in Section 2.2.3 and listed in Appendix A. This analysis revealed 5 pluripotency markers (CTNNB1, ETV4, MAD2L2, PITX2, SOX2) among the genes enriched in SKPs compared to MSCs, and 13 pluripotency markers (ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1, MYBL2, TBX4, TPM1, ZFP57) among the genes depleted in SKPs compared to MSCs (Table 2.4). Two known pluripotency genes, CTNNB1, a member of the WNT signaling pathway, and SOX2, one of the master regulators of pluripotency [262], were among the transcripts enriched in SKPs compared to MSCs.
In addition, AURKB, a kinase known to interact with the BRCA1-associated RING domain protein 1 (BARD1), a member of the double-strand break repair pathway [269], was found to be depleted in SKPs compared to MSCs. Intriguingly, we show in Chapter 3 that AURKB is expressed in NBL tumor-initiating cells and is a drug target for high-risk NBL. These observations suggest that although SKPs share expression profile similarities with ESCs, and have a broader developmental potential than MSCs, they are different from ESCs in the identity of pluripotency markers they express, highlighting the uniqueness of the neural crest stem cell phenotype.

2.3 Discussion

NBL originates from the sympathoadrenal lineage that is thought to derive from sequential differentiation of the neural crest stem cell [226,173,229]. Sympathoadrenal precursors go on to develop the neuroendocrine cells of the adrenal medulla, the most common primary site of NBL [174]. A correlation between the differentiation state of NBL cells and the clinical aggressiveness of the disease has been noted, such that cells of the most aggressive high-risk subtype resemble the most primitive neural crest precursors, while cells of low-risk subtypes resemble various stages of neural crest differentiation [175]. Therefore, understanding the genesis of the sympathoadrenal lineage from early neural crest precursors, the origin and the phenotype of the neural crest stem cell and its transformation to the malignant counterpart, may shed light onto molecular events that contribute to the development of NBL. In Sections 2.2.1 and 2.2.2 I built on the lineage tracing experiments conducted by our collaborators to help characterize the developmental origin of different populations of neural crest stem cell-like cells (SKPs) that possess many properties of the somatic neural crest stem cell.
The lineage tracing work found that skin-derived neural crest stem cell-like cells originated from different developmental lineages depending on the part of the body they are derived from. However, my gene expression analysis showed that these cell populations possess remarkably similar gene expression profiles. In fact, only 35 genes were significantly differentially expressed in each of the facial versus dorsal and dorsal versus ventral SKP comparisons, in contrast to thousands of genes differentially expressed between each SKP lineage and another multipotent somatic stem cell lineage, the mesenchymal stem cells (MSCs). The MSCs were chosen for this comparison to represent mesoderm, the embryonic origin of dorsal and ventral SKPs. This result sets a precedent for the origin of a somatic stem cell from a different tissue type, as in this case mesoderm-derived SKPs from the dorsal and ventral trunk appear to converge to a neural crest stem cell phenotype that is similar to that of neural crest-derived facial SKPs. This observation may be significant for the genesis of NBL that is thought to derive from the neural crest. Our results suggest that since neural crest-like cells can derive from a non-neural crest lineage, it is possible that neural crest-like lineages may also give rise to NBL.

Conceptual support for the idea of lineage convergence occurring in nature, specifically as applied to the neural crest, comes from the fact that several tissues, including the gut and respiratory epithelium, are known to contain neuroendocrine cells (typical neural crest derivatives) of non-neural crest origin [270], supporting the existence of a second developmental pathway that converges onto a neural crest-like phenotype. Additional support for lineage convergence is based on the observation that pluripotent stem cells can be produced from germ cells [271] and even from somatic fibroblasts by in vitro reprogramming [242].
Intriguingly, just as reported here with SKPs, pluripotent cells obtained from different developmental origins (ES cells, reprogrammed germ cells and reprogrammed somatic cells) have similar expression profiles and functional characteristics but retain epigenetic and expression marks indicative of their primary origin [271]. Having shown that neural crest stem cell-like SKPs derive from different developmental origins, I went on to identify genes and pathways that distinguish the common neural crest stem cell-like phenotype of the three SKP lineages from the MSCs, a multipotent somatic stem lineage derived from the mesoderm (the origin of the dorsal and ventral SKP lineages). Mesenchymal stem cells represent one of the few somatic stem cell lineages that are similar to the neural crest in terms of their developmental potential. The MSCs can reportedly differentiate into a wide array of cell types, including osteoblasts (bone), adipocytes (fat), chondrocytes (cartilage) as well as myocytes (muscle cells) and neurons [230,255]. Differential expression analysis comparing the transcriptomes of SKPs and MSCs revealed that the majority of pathways associated with transcripts increased in abundance in SKPs compared to MSCs involved three core neural crest stem cell signaling pathways: WNT/Beta-Catenin, bone morphogenetic protein (BMP), and transforming growth factor beta (TGFB) signaling [272]. The expression of both the BMP and WNT/Beta-Catenin pathway members was consistent with the neural crest stem cell phenotype of SKPs, as the coordinated activity of these two pathways is required for the maintenance of the undifferentiated state in neural crest stem cells [257]. In contrast, the majority of pathways associated with transcripts whose abundance was decreased in SKPs involved double-strand break DNA repair and cell cycle control.
In particular, the BRCA1 molecule participated in many of the DNA repair pathways found to be significantly enriched among the SKP-depleted (MSC-enriched) transcripts, suggesting its central role in the functional network of molecules relatively increased in abundance in MSCs compared to SKPs. This observation is consistent with the recent finding that the MSC lineage was resistant to irradiation through, among other pathways, the activation of double-strand break repair by homologous recombination and nonhomologous end joining (NHEJ) governed by BRCA1 [273]. As discussed in Section 2.2.5.3, AURKB, a kinase linked to the BRCA1 signaling pathway through its interaction with BARD1 [269], was found to be part of Supercontig 1, which contains genes whose expression pattern was restricted to undifferentiated ES cells. As reported in Section 2.2.5.3, the mRNA expression level of this pluripotency-associated kinase was found to be decreased in SKPs compared to MSCs, suggesting that the increased expression of the BRCA1 DNA repair pathway in MSCs compared to SKPs is similar to that observed in ES cells.

In conclusion, I addressed the overall goal of this Chapter, which was to characterize the expression profiles of SKP lines used as models of neural crest progenitors and normal counterparts of neuroblastoma cells [243,233]. In particular, I found that known signaling pathways specifically implicated in neural crest stem cells, such as WNT/Beta-Catenin, BMP and TGFB signaling, were preferentially expressed in the neural crest stem cell-like cells compared to a mesodermal multipotent cell lineage. A novel finding from our work is the relative decrease in gene-level expression of members of the double-strand break DNA repair pathways that involve BRCA1 in SKPs compared to MSCs and ESCs. The molecular mechanism underlying this observation, as well as its functional significance, remains to be addressed by future work.
Of note is the fact that members of the same pathway were found to be expressed at a higher level in NBL tumor-initiating cells as compared to SKPs (Chapter 3).

2.4 Materials and methods

2.4.1 Microarray analysis of rat SKP lines

RNA was prepared from twice-passaged adult rat dorsal, facial, and ventral SKPs and MSCs using Trizol (Invitrogen), as per the manufacturer's instructions, followed by the RNeasy Mini Kit (Qiagen, Venlo, Netherlands, http://www.qiagen.com). Three independent isolates for each of dorsal trunk SKPs, ventral trunk SKPs, and facial SKPs, and four independent isolates of MSCs were used for the microarray study. The independent isolates were obtained from different animals. The RNA samples were analyzed on Affymetrix GeneChip Rat Gene 1.0 ST Arrays (Affymetrix, Santa Clara, CA, http://www.affymetrix.com). The data were checked for batch effects, background corrected and quantile-normalized according to the standard Robust Multichip Average (RMA) procedure using the Affymetrix Expression Console software. The gene expression data were annotated using R. norvegicus genome build rn4. Subsequent statistical analysis was conducted using R version 2.8.1. Microarray data were deposited in the NIH GEO repository (accession number GSE23954).

2.4.2 Unsupervised analysis to assess global transcriptome similarity

Genes with a variable expression pattern across the four groups (facial SKPs, dorsal SKPs, ventral SKPs, and MSCs) were identified using the multiple group comparison implemented in the Linear Models for Microarray Data (LIMMA) Bioconductor package version 2.16.5. This package was chosen for the analysis as it was reported to perform favorably compared to other common approaches for microarray data analysis, including SAM, Welch's t-test, ANOVA and Wilcoxon's test [247]. The LIMMA method first builds a linear model for each gene using the number of parameters (experimental groups) defined by the user.
A moderated t-statistic (computed based on the weighted average between the variance of each gene and the variance of all genes) is then used to test the null hypothesis that each parameter coefficient for each gene is equal to zero [249]. The estimated coefficients are used to represent the fold changes between the experimental groups. For the multiple group comparisons described in Section 2.2.1 (facial SKPs, dorsal SKPs, ventral SKPs, and MSCs) the F-statistic with Benjamini-Hochberg (BH) multiple testing correction implemented in the eBayes function was used to test the null hypothesis that all parameter coefficients were equal to zero; in other words, that there were no differences among the experimental groups [249]. The parameter coefficients were defined by the contrasts between each pair of groups: facial SKPs versus dorsal SKPs, facial SKPs versus ventral SKPs, facial SKPs versus MSCs, dorsal SKPs versus ventral SKPs, dorsal SKPs versus MSCs, and ventral SKPs versus MSCs. Those genes with BH-corrected q < 0.05 were considered statistically significant. A total of 7,012 out of 18,879 genes showed evidence of differential expression among the four groups (Benjamini-Hochberg false discovery rate-corrected q < 0.05), and these genes were used in the correlation analysis and unsupervised consensus hierarchical clustering. Unsupervised hierarchical cluster analysis was conducted using Bioconductor package Pvclust version 1.2-1 with 100 bootstrap iterations. The Spearman Rank correlation matrix was computed using the standard R cor function, and plotted as an image using the custom function myImagePlot (available at www.phaget4.org/R/myImagePlot.R).

2.4.3 Differential expression analysis using microarrays

The preprocessed data were analyzed using the Linear Models for Microarray Data (LIMMA) Bioconductor package to identify genes that show significant evidence of differential expression in each pairwise or multiple group comparison described in the text [249].
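The Benjamini-Hochberg adjustment applied throughout these analyses can be illustrated with a short, self-contained sketch. It is written in Python purely for illustration (the analyses in this thesis were run in R with LIMMA), and the function names `benjamini_hochberg` and `is_significant` are hypothetical. Each p-value is scaled by the number of tests divided by its rank, and the results are then made monotone from the largest p-value downwards.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q-values in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each q-value
    # is never larger than the q-value of any larger p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def is_significant(p_values, threshold=0.05):
    """Flag tests whose BH-adjusted q-value is below the threshold."""
    return [q < threshold for q in benjamini_hochberg(p_values)]
```

With the q < 0.05 threshold used in this Chapter, `is_significant` returns one flag per gene, which is the kind of filter that yields the gene lists carried forward to clustering and pathway analysis.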
The LIMMA method first builds a linear model (lmFit function) for each gene using the number of parameters (experimental groups) defined by the user. A moderated t-statistic, computed based on the weighted average between the variance of each gene and the variance of all genes, is then used to test the null hypothesis that each parameter coefficient for each gene is equal to zero [249]. The estimated coefficients are used to represent the fold changes between the experimental groups, and those genes with coefficients significantly different from zero based on the moderated t-test are considered differentially expressed. The Benjamini-Hochberg (BH)-corrected q < 0.05 was used as the threshold for statistical significance. For multiple group comparisons, such as the one described in Section 2.2.2 (ventral SKPs, dorsal SKPs, and MSCs), the F-statistic with BH multiple testing correction implemented in the eBayes function was used to test the null hypothesis that all parameter coefficients were equal to zero (an analysis similar to ANOVA); in other words, that there were no differences among the experimental groups [249]. Those genes with BH-corrected q < 0.05 were considered statistically significant, unless stated otherwise.

2.4.4 Reverse Transcription Polymerase Chain Reaction (RT-PCR) to confirm results from SKP microarray analysis

RNA was prepared from twice-passaged neonatal mouse dorsal and facial SKPs using Trizol (Invitrogen) and from sorted, uncultured mouse skin cells using the Cells-to-cDNA II kit (Ambion/Applied Biosystems, Austin, TX, http://www.ambion.com) as per the manufacturer's instructions, followed by the RNeasy Mini Kit (Qiagen). For all analyses, controls were performed without reverse transcriptase. PCR reactions were performed as follows: 94°C, 2 minutes; 25–35 cycles of 94°C, 15 seconds; gene-specific annealing temperature for 30 seconds; and 72°C for 30 seconds.
Primers used in this study were as follows:
Ap2a1, 5′-TCCCTGTCCAAGTCCAACAGCAAT-3′ and 5′-AAATTCGGTTTCGCACACGTACCC-3′;
Eya1, 5′-CTAACCAGCCCGCATAGCCG-3′ and 5′-TAGTTTGTGAGGAAGGGGTAGG-3′;
Foxd3, 5′-TCTTACATCGCGCTCATCAC-3′ and 5′-TCTTGACGAAGCAGTCGTTG-3′;
Gapdh, 5′-CGTAGACAAAATGTGAAGGTCGG-3′ and 5′-AAGCAGTTGGTGGTGCAGGATG-3′;
Hoxa5, 5′-TAGTTCCGTGAGCGAACAATTC-3′ and 5′-GCTGAGATCCATGCCATTGTAG-3′;
Hoxc4, 5′-AACCCATAGTCTACCCTTGGATGA-3′ and 5′-CGGTTGTAATGAAACTCTTTCTCTAATTC-3′;
Hoxc6, 5′-ACGTCGCCCTCAATTCCA-3′ and 5′-CTGAGCTACGGCTGCTCCAT-3′;
Hoxc9, 5′-TGTAGCGATTTTCCGTCCTGTAG-3′ and 5′-CCGTAAGGGTGATAGACCACAGA-3′;
Mab21l1, 5′-CCCCAACATGATCGCGGCCCAGGCC-3′ and 5′-CCTCCTTCAGGACGTCGGAGACCAC-3′;
Mab21l2, 5′-CCCCAACATGATCGCCGCTCAGGCC-3′ and 5′-CGGGGCTCTTGCACCTCCACTTCC-3′;
Msx1, 5′-CGGGCGCCTCACTCTACAGT-3′ and 5′-TCCCGCTGCTCTGCTCAAA-3′;
p75NTR, 5′-GTGTTCTCCTGCCAGGACAA-3′ and 5′-GCAGCTGTTCCACCTCTTGA-3′;
Pax3, 5′-TGCCCTCAGTGAGTTCTATCAGC-3′ and 5′-GCTAAACCAGACCTGCACTCGGGC-3′;
Rhob, 5′-AAGACGTGCCTGCTGATCGTG-3′ and 5′-CTTGCAGCAGTTGATGCAGCC-3′;
Slug, 5′-CGTCGGCAGCTCCACTCCACTCTC-3′ and 5′-TCTTCAGGGCACCCAGGCTCACAT-3′;
Snail1, 5′-CGGCGCCGTCGTCCTTCT-3′ and 5′-GGCCTGGCACTGGTATCTCTTCAC-3′;
Sox9, 5′-CCGCCCATCACCCGCTCGCAATAC-3′ and 5′-GCCCCTCCTCGCTGATACTGGTG-3′;
Sox10, 5′-CAAGGGGCCCGTGTGCTA-3′ and 5′-GCCCGTGCCATGCTAACTCT-3′;
Twist1, 5′-CTTTCCGCCCACCCACTTCCTCTT-3′ and 5′-GTCCACGGGCCTGTCTCGCTTTCT-3′;
and Zic1, 5′-GCGGCCGAAAGCCAACT-3′ and 5′-TGCCAAAAGCAATGGACAGC-3′.

2.4.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs

Ingenuity software (Ingenuity Systems, www.ingenuity.com) was used for pathway enrichment analysis according to the instructions available on the software website. The rat genes were mapped to the corresponding human orthologs using the Ingenuity software. The human orthologs were then annotated using the Ingenuity Knowledgebase and subjected to a pathway enrichment analysis.
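Pathway enrichment of this kind is commonly scored with a one-sided Fisher's exact test, which is equivalent to an upper-tail hypergeometric probability. The sketch below is an illustrative Python version and not the Ingenuity implementation; the function name, the argument names and the choice of a right-tailed (over-representation) test are assumptions made for the example.

```python
from math import comb


def enrichment_p_value(n_overlap, n_list, n_pathway, n_background):
    """Right-tailed Fisher's exact (hypergeometric) p-value.

    Probability of observing at least n_overlap pathway genes in a list of
    n_list genes drawn without replacement from n_background genes, of which
    n_pathway belong to the pathway.
    """
    total = comb(n_background, n_list)
    upper = min(n_list, n_pathway)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 14 of 55 pathway genes found in a 700-gene list
# drawn from a background of 18,000 annotated genes.
p = enrichment_p_value(14, 700, 55, 18_000)
```

One p-value is computed per pathway in this way, and the resulting set of p-values is then adjusted for multiple testing.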
This analysis uses a Fisher's Exact test to assess the null hypothesis that the number of observed genes in a particular pathway is not different from the number expected by chance, based on the size of the observed gene list and the number of genes in the pathway. The P-values can be adjusted for multiple testing using the Benjamini-Hochberg (BH) procedure. In this Chapter, BH-corrected q-values of less than 0.05 (unless noted otherwise in the text) were considered statistically significant and sufficient to reject the null hypothesis of a chance association of a particular pathway with the observed gene list.

2.4.6 Seriation analysis of LongSAGE libraries from the Cancer Genome Anatomy Project

LongSAGE gene expression libraries were prepared as described previously [274]. The libraries used in this study are available via the Gene Expression Omnibus database as part of the record GSE14 of the Cancer Genome Anatomy Project resource [259,144]. The tissue origins of the libraries are summarized in Table 2.3. LongSAGE tags were mapped to genes using the Hs_long.best_tag file available through the SAGE Genie at ftp://ftp1.nci.nih.gov/pub/SAGE/HUMAN and as previously described [259]. The tag counts in each library were normalized to a depth of 100,000. The resulting dataset was subjected to seriation analysis using the progressive construction of contigs heuristic implemented using custom MATLAB scripts. The algorithm described in Section 2.4.7 was run three times on the same dataset to ensure that the seriation result was robust.

2.4.7 Seriation using the progressive construction of contigs heuristic

Seriation seeks the best enumeration order among objects based on their similarity according to a chosen criterion. Since the problem is NP-hard, we developed a novel heuristic specifically for the SAGE data analysis task.
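The input to this heuristic is the depth-normalized tag count data from Section 2.4.6. As a minimal illustration of that normalization step (a Python sketch, not the MATLAB code used in this work; the function name and the example tags and counts are hypothetical), the counts in each library are rescaled so that they sum to a common depth of 100,000 tags:

```python
def normalize_library(tag_counts, depth=100_000):
    """Rescale raw tag counts so that the library total equals `depth`."""
    total = sum(tag_counts.values())
    if total == 0:
        raise ValueError("library contains no tags")
    return {tag: count * depth / total for tag, count in tag_counts.items()}


# Hypothetical library with three tags and 2,000 sequenced tags in total.
raw = {"TAG_A": 1200, "TAG_B": 600, "TAG_C": 200}
normalized = normalize_library(raw)
```

Applying the same function to every library puts the tag counts on a common scale, so that the expression vector of a tag across libraries is not dominated by differences in sequencing depth.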
The 'progressive construction of contigs' heuristic attempts to put the most similar objects side by side without breaking already established chains of closely related elements, which we term 'contigs'. Here we use pairwise correlations between expression vectors (normalized tag counts for a particular tag across all libraries) as the criterion for defining similarities between tags; however, in principle, other similarity criteria can be used for this task. The pairwise correlations between tag expression vectors x and y are calculated using the standard correlation coefficient function, R(x,y) = C(x,y)/sqrt(C(x,x)*C(y,y)), where C(x,y) = E[(x – x̅)*(y – y̅)], x̅ and y̅ are the means of expression vectors x and y, and E is the mathematical expectation. The correlation values are subsequently arrayed into a symmetric matrix, which is subjected to the following progressive seriation procedure. In the first step, the tag pair with the highest correlation value is found and marked as the beginning of the first contig. At each subsequent step the tag pair with the next highest correlation value is identified. If one of the members of the tag pair is involved in a previously formed contig, the columns of the matrix are reorganized to place the other member at the nearest edge of the same contig; since the matrix is symmetrical, the rows are reordered accordingly. Importantly, previously reordered elements are kept intact in this process. If it is impossible to add the similarity maximum of the current step to a contig given the restriction on the previously moved objects, or if the tag pair with the correlation maximum does not involve any of the members of the formed contigs, the current similarity maximum is used to start a new contig. The seriation process continues until all elements have been processed.
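The procedure can be sketched in a few lines of code. The following is a simplified Python illustration, not the custom MATLAB implementation used in this work: the function names are hypothetical, contigs are kept as plain lists instead of reordering a matrix in place, and a pair whose members have both already been placed is simply skipped, which is one possible reading of the restriction on previously moved objects.

```python
from math import sqrt


def pearson(x, y):
    """Correlation R(x,y) = C(x,y) / sqrt(C(x,x) * C(y,y))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    c_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    c_xx = sum((a - mean_x) ** 2 for a in x)
    c_yy = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(c_xx * c_yy)
    return c_xy / denom if denom > 0 else 0.0  # constant vectors: no signal


def seriate(vectors):
    """Group vector indices into ordered contigs, most similar pairs first."""
    n = len(vectors)
    pairs = sorted(
        ((pearson(vectors[i], vectors[j]), i, j)
         for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )
    contigs = []  # each contig is an ordered list of vector indices
    home = {}     # vector index -> position of its contig in `contigs`
    for _, i, j in pairs:
        if i in home and j in home:
            continue  # both already placed; placed elements are not moved
        if i not in home and j not in home:
            home[i] = home[j] = len(contigs)
            contigs.append([i, j])  # start a new contig
            continue
        placed, new = (i, j) if i in home else (j, i)
        contig = contigs[home[placed]]
        # Attach the new element at the contig edge nearest to its partner.
        if contig.index(placed) < len(contig) / 2:
            contig.insert(0, new)
        else:
            contig.append(new)
        home[new] = home[placed]
    # A lone vector (n == 1) never appears in a pair; keep it as a singleton.
    contigs.extend([k] for k in range(n) if k not in home)
    return contigs
```

Concatenating the returned contigs gives a row and column order for the correlation matrix, so that blocks of highly correlated tags fall along the diagonal.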
The result is the production of contigs of similar correlation values that can be displayed along the diagonal of the correlation matrix, representing internal topologies in the data. Theoretically, in the case of a Robinson data structure, whereby the data are from a unimodal distribution, the contigs are merged into one and the obtained result is the optimal single seriation solution [275,261]. A key algorithmic difference between the seriation algorithm described above and a procedurally similar hierarchical clustering algorithm (such as the hierarchical clustering method developed in [276] and implemented in [277]) is the treatment of vectors after the highest pairwise correlation value has been identified at each step. In clustering, the vectors are averaged together into a new vector using a linkage rule (for instance, average linkage clustering) and this new vector is represented by a node in the hierarchical clusterogram. In contrast, in the case of seriation, no new vector or node is formed, and the rows and columns of the correlation matrix are merely reordered to reflect underlying patterns in the data as described above. Therefore, no linkage rule is required in seriation in addition to the distance metric used to define similarities. In our implementation of the seriation algorithm, ordered structures (contigs) are revealed by color-coding the reordered correlation matrix according to the magnitude of the correlation value. In this manner, visual inspection of the matrix allows for the selection of ordered contigs for further inspection. Higher order structures (supercontigs) can also emerge from this analysis, indicating more complex patterns in the dataset. Due to the visualization component, the algorithm is able to analyze up to 4000 genes at a time (tested on 1.7 IBM PC Pentium 4, Z60t laptop) and is suitable for the analysis of pre-selected sets of genes.
Importantly, the algorithm produces a robust solution for each seriation run (in other words, an equivalent solution is produced upon repeated seriation of the same data set).

2.4.8 Computational validation of transcripts in Supercontig 1 as pluripotency markers

The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify canonical pathways associated with genes in Supercontig 1 as described in Section 2.4.5. To identify transcription factor binding sites in groups of genes, we used the PASTAA Web server as recommended by the authors [263]. The PASTAA algorithm ranks genes by estimating the overall affinity of a position weight matrix (PWM) for sequence regions that are defined relative to the transcriptional start site of each gene in a list. Two separate PASTAA analyses were performed on each of the three Supercontigs to identify candidate distal- and proximal-acting transcription factors. The distal analysis involved interrogating a region extending 10 kb upstream from the transcription start site, while the proximal analysis interrogated a region 400 bp on each side of the transcription start site.

Figure 2.1 Global expression patterns are similar across SKPs of distinct developmental origins
Transcriptome-wide expression profiles from dorsal, facial, and ventral SKPs and mesenchymal stem cells (dSKPs, fSKPs, vSKPs, and MSCs, respectively) were processed as described in Methods. (A). Spearman Rank correlations computed based on genes differentially expressed among the different types of SKPs and MSCs were color coded as shown in the color legend such that yellow represents high and blue represents low correlation. The color-coded Spearman correlation matrix reveals the relative similarity among the expression profiles of SKPs, regardless of their origin, which is in contrast to the expression profiles of MSCs that formed a separate square (bottom right).
The correlation matrix is symmetrical such that the ordering of samples is the same along the x- and y-axes. (B). Unsupervised clustering conducted using correlation distance and average linkage clustering over 100 bootstrap iterations confirmed the finding of similarity across the three SKP lineages, and their distinction from MSCs. The significance of the hierarchical clustering result was assessed using AU (approximately unbiased, in red font) and BP (bootstrap probability, in green font) re-sampling based on 100 iterations implemented in the R package Pvclust. Modified with permission of AlphaMed Press.

[Figure 2.1, panels A and B]

Figure 2.2 Facial and dorsal trunk SKP lineages show similar degrees of divergence from MSCs
(A). Numbers of genes significant in each of the pairwise differential expression comparisons among the facial SKPs, dorsal trunk SKPs, and MSCs (Benjamini-Hochberg-corrected q < 0.05) are plotted as Venn diagrams. Each pairwise comparison is denoted by a colored circle: facial SKPs vs. dorsal SKPs (yellow); MSCs vs. facial SKPs (pink); MSCs vs. dorsal SKPs (green). Numbers of genes significant in each comparison are quoted in the circles. For instance, the comparison at the bottom (pink and green Venn) reveals that there are 2,525 (1,069 + 1,456) and 2,233 (777 + 1,456) genes differentially expressed between MSCs and facial SKPs, and MSCs and dorsal SKPs, respectively; 1,456 of these genes are differentially expressed in both of the comparisons. This analysis reveals that facial SKPs and dorsal SKPs are more similar to each other than either of them is to MSCs. In addition, the extent of divergence between each SKP lineage and MSCs is similar, with 2,525 and 2,233 genes being differentially expressed between the MSCs and facial and dorsal SKPs, respectively. (B).
Three-way comparison was conducted across the three groups (facial SKPs, dorsal trunk SKPs and MSCs) to identify genes that show evidence of differential expression using the LIMMA Bioconductor package. Expression profiles of 2,603 genes, identified as differentially expressed among the groups (Benjamini-Hochberg-corrected q < 0.05), are plotted as a heatmap. The rows are centered and scaled by subtracting the mean of the row from every value and then dividing the resulting values by the standard deviation of the row (row Z-Score). Modified with permission of AlphaMed Press.

[Figure 2.2, panels A and B]

Figure 2.3 SKPs of distinct developmental origin express neural crest specification genes despite maintaining a lineage history at the gene expression level
(A, left panel). Rat microarray expression levels of genes involved in neural crest specification and associated with neural crest precursors: Snail1, Slug, Twist, Sox9, Sox10, Foxd3, Ap2a1, p75NTR and RhoB [252,250,251]. Green indicates the lowest relative levels of expression and red the highest, as defined by the color key. Note that these genes are expressed similarly in the facial and dorsal trunk SKPs, despite the distinct developmental origins of these two SKP lineages from the neural crest and mesoderm, respectively. (A, right panel). Rat microarray expression levels of 10 out of the 35 transcription factors that were identified as being among the most differentially expressed genes between dorsal and facial SKPs in the analysis in Table 2.1. Green indicates the lowest relative levels of expression and red the highest, as defined by the color key. Note the differential expression between dorsal and facial SKPs. (B). RT-PCR validation of the microarray results above conducted in the mouse model. For the RT-PCR experiment in the left panel, total RNA was isolated from neonatal mouse dorsal trunk and facial secondary SKP spheres.
Total RNA from E8.5 murine embryos was used as a positive control for primer performance. For the RT-PCR experiment in the right panel, the total RNA was purified from uncultured EGFP-positive cells from neonatal Sox2-EGFP mouse dorsal trunk and facial skin secondary SKP spheres. Total RNA from E8.5 mouse embryos was used as a positive control for primer performance. Modified with permission of AlphaMed Press.

[Figure 2.3, panels A and B]

Figure 2.4 Transcripts preferentially enriched or depleted in SKPs compared to MSCs
Pairwise comparisons were conducted between each of dSKPs, fSKPs and vSKPs and MSCs using the LIMMA Bioconductor package to identify significantly differentially expressed genes (Benjamini-Hochberg-corrected q < 0.05). The results from these comparisons were combined to identify genes commonly enriched or depleted in SKPs; the expression profiles of 654 and 752 genes enriched or depleted in SKPs compared to MSCs, respectively, are plotted as a heatmap. The rows are centered and scaled by subtracting the mean of the row from every value and then dividing the resulting values by the standard deviation of the row (row Z-Score). The genes are listed in Appendix A.

[Figure 2.4]

Figure 2.5 Pathway analysis of transcripts enriched and depleted in SKPs compared to MSCs
The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal pathway annotations significantly enriched among transcripts increased (A) or decreased (C) in abundance in SKPs compared to MSCs (Benjamini-Hochberg-corrected q < 0.05). The list of transcripts used for the analysis is provided in Appendix A. In (A) and (C) the negative logs of P values are plotted along the x-axis while the pathways are plotted along the y-axis. (B).
The Ingenuity canonical pathway named "Human Embryonic Stem Cell Pluripotency" is significantly enriched among the transcripts upregulated in SKPs, reflecting the broad developmental potential of the neural crest stem cell [258]; the pathway members upregulated in SKPs are in red; the protein complexes are bolded; the kinases are denoted with triangles, while cytokines are denoted with squares. (D). The Ingenuity canonical pathway named "Role of BRCA1 in DNA Damage Response" is significantly enriched among transcripts downregulated in SKPs compared to MSCs; the pathway members downregulated in SKPs are in green, and the protein complexes are bolded.

[Figure 2.5, panels A to D]

Figure 2.6 Seriation analysis to identify developmentally restricted transcripts expressed in undifferentiated ES cells
(A). Seriation analysis of the 319 candidate pluripotency genes (Appendix B) revealed three Supercontigs of co-expressed genes, containing 114, 145 and 60 genes, respectively. Supercontigs are bounded by red boxes and are numbered. Upon inspection, Supercontig 1, composed of 114 genes, contained transcripts increased in abundance in the undifferentiated ES cells. (B). Average LongSAGE-based expression level for the 114 genes in Supercontig 1 across 17 LongSAGE libraries. Reprinted with permission of Elsevier.

[Figure 2.6, panels A and B]

Figure 2.7 Computational validation of genes identified by seriation as pluripotency markers
(A). The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal canonical pathways significantly enriched among transcripts in Supercontig 1 (Benjamini-Hochberg-corrected q < 0.05). The list of transcripts used for the analysis is provided in Appendix B. (B, C).
Proximal and distal promoter analyses of the genes in Supercontig 1 reveal the presence of binding sites for the core pluripotency transcription factors, SOX, NANOG, and OCT/POU, that are required for the propagation of undifferentiated ES cells in culture [262]. The hollow blue circles indicate individual PWMs used for the analyses (444 and 487 PWMs were used for the proximal and distal analyses, respectively). The affinity scores of each PWM computed by the PASTAA algorithm are plotted against the p-values such that the highest scoring PWMs are in the top left [263]. Several PWMs scored higher than NANOG in the proximal analysis (hollow circles left of the NANOG PWM). These included PWMs representing binding sites for NFY, AP2, TBX5, and ELK1, all associated with stemness and early development [267,264,278,265]. Panels 2.7B and 2.7C are reprinted with permission of Elsevier.

[Figure 2.7, panels A to C]

Table 2.1 Genes with significant evidence of differential expression between (A) fSKPs and dSKPs, and (B) dSKPs and vSKPs as shown in Figure 2.3B
(A) Negative log fold change (LogFC) indicates genes whose mRNA levels are decreased in fSKPs versus dSKPs while positive LogFC indicates genes whose mRNA levels are increased in fSKPs versus dSKPs. The genes are sorted by their LogFC; (B) negative LogFC indicates genes whose mRNA levels are decreased in dSKPs versus vSKPs while positive LogFC indicates genes whose mRNA levels are increased in dSKPs versus vSKPs. The genes are sorted by their LogFC. Modified with permission of AlphaMed Press.
A
Gene | LogFC | Benjamini-Hochberg-corrected q
Eltd1 | -4.44344 | 0.0024664
Zic1 | -4.21255 | 0.0049442
Hoxc6 | -4.02777 | 0.0025499
Hoxc9 | -3.35571 | 0.0206326
Hoxa5 | -3.20921 | 0.0024664
Cdh7 | -2.44479 | 0.0229012
Tfpi | -2.4351 | 0.0020694
Anxa8 | -2.13009 | 0.0157295
Hoxc4 | -1.67475 | 0.0206326
Avpr1a | -1.44901 | 0.0206326
Cox4j2 | 1.552854 | 0.0206326
Fzd6 | 1.728556 | 0.0206326
Herc3 | 1.738304 | 0.0298194
Glt25d2 | 1.854643 | 0.0170832
Cnksr3 | 1.886715 | 0.0150527
Gpr85 | 2.016079 | 0.0207053
Nrp1 | 2.084411 | 0.0110914
Sytl3 | 2.099451 | 0.0259196
Il16 | 2.201115 | 0.0452825
Cd200 | 2.377227 | 0.0381321
Eya1 | 2.543472 | 0.0485376
Lphn3 | 2.547429 | 0.0117803
Pu3f4 | 2.820537 | 0.0051457
Cxcl14 | 3.096734 | 0.0024715
Mab21l2 | 3.277685 | 0.0071718
Tnfsf11 | 3.328933 | 0.0110914
Msln | 3.405196 | 0.0102423
Ptprn | 3.521102 | 0.0206326
Reln | 3.724786 | 0.0016128
RGD13052 | 3.900023 | 0.0169522
Thbs4 | 4.172965 | 0.0017979
Cdh6 | 4.205613 | 0.0016128
Cntn6 | 4.224511 | 0.0495426
Pax3 | 4.409286 | 0.0049414
Mab21l1 | 5.134758 | 0.0016128

B
Gene | LogFC | Benjamini-Hochberg-corrected q
Mab21l1 | -5.30391 | 0.00026
Frzb | -5.23657 | 0.001611
RGD1310827 | -3.86079 | 0.00954
Ccr2 | -3.78686 | 0.003454
LOC681994 | -3.57244 | 0.008613
Car2 | -3.52152 | 0.001611
Tmem26 | -3.40143 | 0.013088
Nes | -3.22634 | 0.011844
Cbln4 | -3.21517 | 0.008461
Upk1b | -3.14743 | 0.007513
LRRTM1 | -3.12189 | 0.01291
Cmklr1 | -3.09583 | 0.013115
Cldn10 | -3.05589 | 0.004253
Mark1 | -2.9736 | 0.002901
Il16 | -2.80564 | 0.008005
Plxdc1 | -2.70215 | 0.003454
Slc1a1 | -2.66076 | 0.001611
RGD1307749 | -2.60502 | 0.001611
Nrp1 | -2.48394 | 0.001611
RGD1305869 | -2.46487 | 0.001444
Loxl2 | -2.40684 | 0.013111
Map2 | -2.32846 | 0.010447
Slc4a11 | -2.31295 | 0.008613
Tbx5 | -2.26593 | 0.008242
Scn4b | -2.23519 | 0.004551
Acy3 | -2.1814 | 0.006916
Itga2 | -2.17679 | 0.008005
RGD1563891 | -2.16154 | 0.01076
Col11a1 | -2.16135 | 0.013115
Tead2 | -2.13101 | 0.008439
LOC499465 | -2.1293 | 0.002551
Avpr1a | 2.1247 | 0.008461
Cdh7 | 2.587961 | 0.008613
Emb | 3.149873 | 0.011844
Zic1 | 4.191047 | 0.003422

Table 2.2 Pathways enriched among the transcripts increased or
decreased in abundance in SKPs compared to MSCs
The Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to identify pathway annotations significantly enriched among transcripts differentially expressed between SKPs and MSCs.

A
Ingenuity Canonical Pathways | -log(BH q) | Ratio | Molecules
Role of Osteoblasts, Osteoclasts and Chondrocytes in Rheumatoid Arthritis | 3.74 | 1.15E-01 | TCF4, ADAM17, SFRP2, MMP3, BMP2, MMP14, PIK3R1, WNT16, MMP13, NFKBIA, NFAT5, WIF1, RUNX2, WNT7B, DKK2, TNFRSF1B, CTNNB1, PPP3CA, MMP1 (includes EG:300339), ADAMTS4, LRP5, GSN, IL7, BMP7, WNT5A
Axonal Guidance Signaling | 3.74 | 9.16E-02 | FYN, ADAM17, PIK3R1, BMP2, MMP13, WNT16, EPHA4, NCK1, PLXNA2, ROBO1, PRKCZ, SEMA6C, SEMA4C, NTN1, LIMK1, EFNB2, NFAT5, EFNB1, WNT7B, PLCB1, GNA13, ROBO2, RASA1, PPP3CA, GNG12, SEMA5A, MMP10, SEMA3A, WIPF1, NTRK2, SEMA6D, PRKCD, BMP7, SEMA7A, WNT5A
Colorectal Cancer Metastasis Signaling | 2.54 | 9.87E-02 | LRP5, TCF4, ADCY2, TGFBR1, MMP3, PTGER3, MMP14, PIK3R1, WNT16, MMP10, MMP13, MMP2, TGFBR2, WNT7B, MMP11, MMP12, CTNNB1, ADCY7, GNG12, RALGDS, MMP9, MMP1 (includes EG:300339), WNT5A
Human Embryonic Stem Cell Pluripotency | 1.69 | 1.08E-01 | TCF4, S1PR2, TGFBR1, PIK3R1, FGFR1, BMP2, WNT16, TGFBR2, SOX2, NTRK2, WNT7B, BMP7, CTNNB1, WNT5A
Wnt/Beta-catenin Signaling | 1.52 | 9.94E-02 | LRP5, AXIN2, TCF4, SFRP2, TGFBR1, APPL2, WNT16, KREMEN1, TGFBR2, SOX2, CDH2, WIF1, WNT7B, DKK2, CTNNB1, WNT5A
Role of Macrophages, Fibroblasts and Endothelial Cells in Rheumatoid Arthritis | 1.52 | 7.96E-02 | LRP5, TCF4, SFRP2, MMP3, PIK3R1, WNT16, MMP13, IL7, PRKCZ, IL16, NFKBIA, NFAT5, WIF1, PRKCD, WNT7B, PLCB1, DKK2, TNFRSF1B, CTNNB1, PPP3CA, MMP1 (includes EG:300339), WNT5A, ADAMTS4
RAR Activation | 1.52 | 9.47E-02 | ADCY2, RDH10, BMP2, PIK3R1, RBP1, PRKCZ, CRABP1, PTEN, ADH7, PNRC1, PRKCD, IGFBP3, RXRA, ZBTB16, ADCY7, MMP1 (includes EG:300339)
Molecular Mechanisms of Cancer | 1.52 | 7.42E-02 | FYN, TCF4, TGFBR1, BMP2, PIK3R1, PSEN2, PSENEN, CDKN2B, PRKCZ, TGFBR2, NFKBIA, PLCB1, GNA13, CTNNB1, RASA1, RALGDS, ADCY2, LRP5, RASGRF2, PRKCD, BMP7, ARHGEF9, ADCY7, BCL2L11, WNT5A
Bladder Cancer Signaling | 1.35 | 1.19E-01 | DAPK1, MMP3, MMP14, MMP10, MMP13, MMP2, MMP11, MMP12, MMP9, MMP1 (includes EG:300339)
Leukocyte Extravasation Signaling | 1.32 | 9.04E-02 | TIMP3, MMP3, MMP14, PIK3R1, MMP10, MMP13, MMP2, PRKCZ, WIPF1, PRKCD, VAV3, MMP11, MMP12, CTNNB1, MMP9, MMP1 (includes EG:300339)
Airway Pathology in Chronic Obstructive Pulmonary Disease | 1.30 | 4.29E-01 | MMP2, MMP9, MMP1 (includes EG:300339)
LXR/RXR Activation | 1.30 | 1.14E-01 | APOE, SCD, LY96, NR1H3, ABCG1, TNFRSF1B, RXRA, MMP9, ABCA1
Hepatic Fibrosis / Hepatic Stellate Cell Activation | 1.30 | 9.56E-02 | IGFBP4, TGFBR1, FGFR1, MMP13, IGFBP5, MMP2, TGFBR2, LY96, IGFBP3, EDNRA, TNFRSF1B, MMP9, MMP1 (includes EG:300339)
PTEN Signaling | 1.30 | 1.01E-01 | TGFBR2, GHR, NTRK2, TGFBR1, PIK3R1, FGFR1, FOXO3, IGF2R, BCL2L11, PRKCZ, PTEN

B
Ingenuity Canonical Pathways | -log(BH q) | Ratio | Molecules
Hereditary Breast Cancer Signaling | 5.85 | 1.86E-01 | POLR2F, CDC25C, GADD45B, GADD45G, BARD1, RPA1, RAD50, CHEK1, CCNB1, RAD51, HDAC6, FANCB, FANCM, RRAS2, FANCD2, RFC4, H2AFX, MRAS, BRCA2, BRCA1, RFC3
Role of BRCA1 in DNA Damage Response | 5.38 | 2.55E-01 | BARD1, PLK1, RPA1, RAD50, CHEK1, RAD51, FANCB, FANCM, FANCD2, RFC4, BRCA2, BRIP1, BRCA1, RFC3
ATM Signaling | 4.85 | 2.50E-01 | CDC25C, GADD45B, GADD45G, CCNB2, MAPK12, RAD50, CHEK1, CCNB1, RAD51, SMC2, FANCD2, H2AFX, BRCA1
DNA Double-Strand Break Repair by Homologous Recombination | 4.37 | 5.00E-01 | RAD51, LIG1, GEN1, BRCA2, RPA1, BRCA1, RAD50
Mitotic Roles of Polo-Like Kinase | 3.01 | 1.93E-01 | KIF23, CDC25C, PLK4, ESPL1, CDC20, PRC1, CCNB2, PKMYT1, PLK1, KIF11, CCNB1
Cell Cycle Control of Chromosomal Replication | 2.45 | 2.59E-01 | MCM6, CDC45, MCM2, CDC6, RPA1, DBF4, ORC1
Cell Cycle: G2/M DNA Damage Checkpoint Regulation | 1.93 | 1.86E-01 | KAT2B, CDC25C, CCNB2, PKMYT1, PLK1, BRCA1, CHEK1, CCNB1
Germ Cell-Sertoli Cell Junction Signaling | 1.93 | 1.10E-01 | TUBB2C, LAMC3, MAP3K5, MAPK12, TUBB, TUBA1B, PAK1, RRAS2, PAK3, SORBS1, MRAS, TGFB3, TUBA1C, ACTG2, ACTN4, ACTN1
Role of CHK Proteins in Cell Cycle Checkpoint Control | 1.93 | 2.06E-01 | CDC25C, RFC4, RPA1, BRCA1, RAD50, CHEK1, RFC3
Molecular Mechanisms of Cancer | 1.93 | 8.01E-02 | BMP4, LRP6, CDKN2C, MAP3K5, FAS, CHEK1, PAK1, FANCD2, MRAS, BRCA1, CDC25C, CCNE2, SMAD9, HAT1, SMAD6, AURKA, MAPK12, FOS, CCNE1, PRKCI, RRAS2, FZD4, PAK3, FZD6, TGFB3, PLCB3, CAMK2G
Breast Cancer Regulation by Stathmin1 | 1.76 | 9.68E-02 | PPP1R14C, CCNE2, TUBB2C, PPP1R3C, PPP1R14A, GNG3, ITPR1, TUBB, TUBA1B, ROCK2, PAK1, CCNE1, RRAS2, PRKCI, MRAS, PLCB3, TUBA1C, CAMK2G
Aryl Hydrocarbon Receptor Signaling | 1.59 | 1.02E-01 | GSTM1, CCNE2, IL6, ALDH9A1, FAS, CYP1B1, CHEK1, CCNA2, FOS, CCNE1, NFIA, TGFB3, AHR, HSPB1
RhoA Signaling | 1.56 | 1.17E-01 | ROCK2, ARHGAP5, MYL9, PFN1, MYL6, CFL2, CIT, IGF1R, ANLN, ACTG2, DLC1, LPAR3

Table 2.3 LongSAGE libraries used for the seriation analysis described in Section 2.2.4
The LongSAGE libraries from undifferentiated ES cells, differentiated ES cells and adult tissues were used in seriation analysis to select genes whose expression was restricted to undifferentiated ES cells.
Library ID | Group | Description
Shes2 | Undifferentiated human ES cells | H9 human ES cells
Shes9 | Undifferentiated human ES cells | HSF6 human ES cells
She10 | Undifferentiated human ES cells | HES3 human ES cells
She11 | Undifferentiated human ES cells | HES4 human ES cells
She13 | Undifferentiated human ES cells | H7 human ES cells
She14 | Undifferentiated human ES cells | H14 human ES cells
She15 | Undifferentiated human ES cells | H13 human ES cells
She16 | Undifferentiated human ES cells | H1 human ES cells
She17 | Undifferentiated human ES cells | H1 human ES cells
She19 | Undifferentiated human ES cells | BG01 human ES cells
Shs11 | Differentiated human ES cells | H1 human ES cell-derived erythromegakaryocytic progenitors
Shs12 | Differentiated human ES cells | H1 human ES cell-derived enriched primitive hematopoietic multipotent progenitors
Shs13 | Differentiated human ES cells | H1 human ES cell-derived enriched primitive hematopoietic myeloid progenitors
Cg643 | Adult tissue | Normal adult bulk pancreas
Cg647 | Adult tissue | Mammary gland, antibody purified
Cg648 | Adult tissue | Normal substantia nigra
Cg655 | Adult tissue | Normal liver vascular epithelium

Table 2.4 Pluripotency genes with transcript abundance increased or decreased in SKPs compared to MSCs
The list of 114 pluripotency genes identified by seriation analysis described in Section 2.2.5 was overlapped with the list of transcripts significantly differentially expressed between SKPs and MSCs (Section 2.2.3).
Gene symbol | Description | Molecular function | Ingenuity Canonical Pathway enriched in ES cells | Increased or decreased in SKPs
CTNNB1 | Catenin (cadherin-associated protein), beta 1, 88kDa | Transcriptional regulator, key member of the WNT signaling pathway | Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency | Increased
ETV4 | Ets variant 4 | Transcriptional regulator | | Increased
MAD2L2 | MAD2 mitotic arrest deficient-like 2 (yeast) | Component of the mitotic spindle assembly checkpoint | | Increased
PITX2 | Paired-like homeodomain 2 | Transcriptional regulator | | Increased
SOX2 | SRY (sex determining region Y)-box 2 | Transcriptional regulator | Role of Oct4 in Mammalian Embryonic Stem Cell Pluripotency; Role of NANOG in Mammalian Embryonic Stem Cell Pluripotency; Human Embryonic Stem Cell Pluripotency | Increased
ADAM23 | ADAM metallopeptidase domain 23 | Metalloprotease | | Decreased
AURKB | Aurora kinase B | Protein serine/threonine kinase | | Decreased
CENPK | Centromere protein K | Protein binding | | Decreased
FAM46B | Family with sequence similarity 46, member B | Unknown | | Decreased
FAM64A | Family with sequence similarity 64, member A | Unknown | | Decreased
HMGB2 | High mobility group box 2 | Transcriptional regulator | | Decreased
IGF2BP3 | Insulin-like growth factor 2 mRNA binding protein 3 | Translational regulator | | Decreased
KPNA2 | Karyopherin alpha 2 (RAG cohort 1, importin alpha 1) | Protein transporter | | Decreased
MTHFD1 | Methylenetetrahydrofolate dehydrogenase (NADP+ dependent) 1, methenyltetrahydrofolate cyclohydrolase, formyltetrahydrofolate synthetase | Dehydrogenase activity | | Decreased
MYBL2 | v-Myb myeloblastosis viral oncogene homolog (avian)-like 2 | Transcriptional regulator | | Decreased
TBX4 | T-box 4 | Transcriptional regulator | | Decreased
TPM1 (includes EG:22003) | Tropomyosin 1 (alpha) | Structural component of cytoskeleton | | Decreased
ZFP57 | Zinc finger protein 57 homolog (mouse) | Transcriptional regulator | | Decreased

Chapter 3: Transcriptome analysis of neuroblastoma tumor-initiating cells for therapeutic target prediction³

3.1 Introduction

Cancer stem cells and tumor-initiating cells (TICs) have been described in a variety of hematopoietic and solid malignancies, including those of the breast, brain, pancreas, liver, skin, and colon [35]. Primary TIC lines have also been isolated from NBL tumors and metastases [279]. NBL TICs and cancer stem cells share several properties, including the ability to self-renew and differentiate into cell types observed in the bulk tumor, express stem cell markers, and exhibit enhanced tumorigenic potential [279]. While it has been reported that several NBL TIC lines may be contaminated with Epstein-Barr-transformed lymphocytes [280], these lines have been shown to recapitulate metastatic NBL in animals, including upon serial transplantation, supporting their usefulness as models for NBL [279].

A recent study using chronic myeloid leukemia stem cells provided proof of principle that targeting a cancer stem cell-enriched gene could lead to the eradication of such cells and a potential disease cure [281]. Therefore, NBL TICs, which are non-immortalized cell lines with high tumorigenic potential in immunosuppressed mice, can provide a model for the development of improved therapies for recurrent and metastatic NBL.

In this Chapter I describe an RNA-Seq analysis applied to a panel of human NBL TIC and SKP lines (Table 3.1). The overall objective of this Chapter is to assess whether transcripts preferentially abundant in NBL TICs could reveal candidate new drug targets against NBL. To fulfill this objective, I address three specific aims described below. First, I use RNA-Seq data to identify transcripts for which the expression is increased in NBL TICs compared to other tissue types.
Second, I conduct functional analysis to identify candidate drug targets among these transcripts, with a specific focus on one drug target of interest, Aurora kinase B (AURKB). Third, I conduct an exon-level analysis to provide a potential mechanism to explain the increased expression of AURKB in NBL TICs.

I used SKPs as a normal reference sample for these analyses since, as discussed in detail in Chapter 2, SKPs are multipotent precursors isolated from human foreskin that are able to self-renew and differentiate into various neural crest derivatives, including peripheral neurons and neural crest lineage-specific Schwann cells [234]. As NBL is a tumor of neural crest precursors, SKPs provide a normal reference transcriptome for the identification of candidate gene expression changes associated with the TIC phenotype. To increase the specificity of the identified gene expression changes to NBL TICs, I also compared the expression profiles of NBL TICs to those of a compendium of cancer tissues, including primary tumor samples from breast, skin, brain, B-cells, ovary, cervix, and lung, as well as breast cancer cell lines.

³ A version of this chapter has been published, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res. 16(18):4572-82, 2010. Copyright by the American Association for Cancer Research.
3.2 Results

3.2.1 Identification of genes preferentially enriched or depleted in NBL TICs compared to a compendium of cancer tissues and SKPs

Having analyzed the expression profiles of normal neural crest stem cell-like cells in Chapter 2, I set out to characterize the expression profiles of their malignant counterparts, NBL tumor-initiating cells (TICs). These cells have been identified and characterized in NBL primary tumors and metastases, and have been shown to be associated with tumor relapse [279]. We sequenced transcriptomes from 10 NBL TIC lines isolated from tumors and metastases of six high-risk NBL patients (Table 3.1) using Illumina RNA sequencing (RNA-Seq) [148,149]. The NBL TIC lines used in this study included those isolated from patients during disease relapse and one from a patient in remission. Because we previously showed that line NB67, isolated from the bone marrow of a patient in clinical remission who subsequently relapsed, was tumorigenic, we included this NBL TIC line in the analysis [279]. To generate reference normal expression profiles, we sequenced the transcriptomes of three foreskin-derived SKP lines from three children without cancer [234]. As described in Chapter 2, these skin-derived progenitor cells, regardless of their embryonic origin, possess the properties of neural crest stem cells, and therefore may serve as reasonable normal counterparts to NBL TICs.

To identify transcripts significantly enriched in NBL TIC lines compared to normal neural crest cells, I used the LIMMA Bioconductor package [249] as described in Methods. The LIMMA analysis revealed 817 and 1,913 genes that were significantly increased and decreased in abundance, respectively, in NBL TICs versus SKPs. I considered it likely that, within the list of differentially expressed genes, there would be candidate NBL TIC markers and also transcripts generally associated with a proliferative phenotype.
Targeting gene products that are nonspecifically expressed in proliferating cell types would potentially result in increased toxicity, particularly in children whose organ systems are undergoing growth and development. To avoid identifying such genes, and to select gene expression differences specific to NBL TICs, I compared our NBL TIC RNA sequences to RNA sequencing data from 30 cancer samples available at the Genome Sciences Centre. These samples were derived from seven tissue types (ovary, B-cells, lung, blood, brain, skin, and cervix; Table 3.2) and were included as an additional reference set for the identification of transcripts enriched specifically in NBL TICs. The LIMMA package [249] was used to compare gene expression levels between NBL TICs and other tissues as described in Methods. This comparison revealed that 2,258 genes were increased in expression in NBL TICs compared to other tissues, while 2,397 genes were decreased. These gene lists were then compared to the lists of genes identified as significantly differentially expressed between NBL TICs and SKPs to select genes that were significant in the same direction in both comparisons. This selection revealed that 449 transcripts were significantly increased in expression in NBL TICs compared to both SKPs and other tissues. Similarly, 1,059 genes were decreased in expression in NBL TICs in both comparisons (NBL TICs versus SKPs and NBL TICs versus other tissues).
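The selection of genes that were significant in the same direction in both comparisons amounts to a direction-aware set intersection. The following Python sketch illustrates the idea only; the function name, the input format (gene symbol mapped to "up" or "down"), and the placeholder gene names are hypothetical and stand in for the LIMMA output, which is not reproduced here.

```python
def consistent_hits(vs_skp, vs_other):
    """Return (up, down) sets of genes significant in the same direction in both inputs.

    Each argument maps a gene symbol to "up" or "down" and is assumed to hold
    only genes that already passed the significance threshold.
    """
    shared = vs_skp.keys() & vs_other.keys()
    up = {g for g in shared if vs_skp[g] == "up" and vs_other[g] == "up"}
    down = {g for g in shared if vs_skp[g] == "down" and vs_other[g] == "down"}
    return up, down


# Placeholder calls for illustration; these are not results from the thesis.
tic_vs_skp = {"GENE_A": "up", "GENE_B": "up", "GENE_C": "down", "GENE_D": "down"}
tic_vs_other = {"GENE_A": "up", "GENE_B": "down", "GENE_C": "down", "GENE_E": "up"}

enriched, depleted = consistent_hits(tic_vs_skp, tic_vs_other)
print(sorted(enriched), sorted(depleted))
```

Genes that are significant in only one comparison, or significant in opposite directions (GENE_B in this toy input), are dropped from both sets.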
To confirm the differential expression of the 449 candidate NBL TIC-enriched transcripts (transcripts enriched in NBL TICs as compared to both SKPs and other tissues) and the 1,059 candidate NBL TIC-depleted transcripts (transcripts depleted in NBL TICs as compared to both SKPs and other tissues) identified using RNA sequencing, I analyzed eight NBL TIC lines from five patients and five SKP lines from four cancer-free children (Table 2.1) using Affymetrix Human Exon 1.0 ST Array data. This platform provides independent confirmation of gene expression at the level of exons [282,283]. Analysis of exon array data, conducted as described in Methods, confirmed the differential expression of 321 (71%) NBL TIC-enriched and 819 (77%) NBL TIC-depleted transcripts, which were identified as significantly differentially expressed between NBL TICs and SKPs using microarrays (Figure 3.1; Appendix C). These genes represented robust sets of NBL TIC-enriched and depleted transcripts that I analyzed further to identify the pathways disrupted in NBL TICs.

3.2.2 Elevated mRNA levels of BRCA1 signaling pathway members are associated with the NBL TIC phenotype

To assess the functional significance of transcripts differentially abundant in NBL TICs, I conducted a pathway enrichment analysis using Ingenuity software (Ingenuity Systems, www.ingenuity.com) as described in Methods. The analysis revealed several signaling pathways significantly associated with the NBL TIC-enriched transcripts (Fisher's Exact P < 0.05).
The pathways are listed below along with the number of NBL TIC-enriched transcripts involved in each pathway: "Role of BRCA1 in DNA Damage Response" (13 genes), "Purine Metabolism" (20 genes), "Mitotic Roles of Polo-Like Kinases" (8 genes), "Pyrimidine Metabolism" (13 genes), "Role of CHK proteins in Cell Cycle Checkpoint Control" (6 genes), "One Carbon Pool by Folate" (6 genes), "Cell Cycle: G2/M DNA Damage Checkpoint Regulation" (5 genes), "ATM Signaling" (4 genes), "Cleavage and Polyadenylation of Pre-mRNA" (2 genes), "Alanine and Aspartate Metabolism" (3 genes) (Figure 3.2A). In contrast, the following pathways were significantly enriched among NBL TIC-depleted transcripts (Fisher's Exact P < 0.05): "Axonal Guidance Signaling" (43 genes), "Hepatic Fibrosis / Hepatic Stellate Cell Activation" (21 genes), "Coagulation System" (10 genes), "Colorectal Cancer Metastasis Signaling" (25 genes), "CXCR4 Signaling" (18 genes), "Germ Cell-Sertoli Cell Junction Signaling" (18 genes), "Factors Promoting Cardiogenesis in Vertebrates" (12 genes), "TGF-beta Signaling" (12 genes), "ILK Signaling" (19 genes), and "Complement System" (7 genes) (Figure 3.2B).

Of the 321 genes significantly upregulated in NBL TICs (Appendix C), thirteen were known members of the BRCA1 DNA damage response pathway (Figure 3.2C). This pathway was identified as the most significantly associated with the NBL TIC-enriched transcripts, such that 13 of the 53 pathway members were among the 321 NBL TIC-enriched transcripts. In addition, eight and eleven genes were associated with polo-like kinase and cell cycle checkpoint control pathways, respectively, both of which are direct downstream targets of BRCA1 signaling (Figure 3.2C).
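To give a sense of how unusual an overlap of 13 of 53 pathway members is within a list of 321 genes, the one-sided Fisher's exact test can be written as a hypergeometric upper-tail probability. The Python sketch below is an illustration under stated assumptions and not the Ingenuity calculation: Ingenuity uses its own reference gene set, and the background size of 20,000 genes used here is a hypothetical round number.

```python
from math import comb


def hypergeom_upper_tail(k, pathway_size, list_size, background):
    """P(X >= k) for the overlap X between a gene list and a pathway.

    Equivalent to a one-sided Fisher's exact test on the 2x2 table of
    (in list / not in list) by (in pathway / not in pathway).
    """
    total = comb(background, list_size)
    upper = min(pathway_size, list_size)
    hits = sum(
        comb(pathway_size, i) * comb(background - pathway_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / total


# 13, 53 and 321 come from the text; 20,000 is an assumed background size.
p_value = hypergeom_upper_tail(13, 53, 321, 20_000)
print(f"P(overlap >= 13) = {p_value:.3g}")
```

With this assumed background, fewer than one pathway member would be expected in the list by chance (321 x 53 / 20,000 is about 0.85), so the resulting probability is far below the 0.05 threshold. The exact value depends on the background size chosen.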
3.2.3 MudPIT analysis confirms the abundance of DNA repair proteins in the proteome of an NBL TIC line

To assess the contribution of the NBL TIC-enriched transcripts to the NBL TIC proteome, we conducted a Multidimensional Protein Identification Technology (MudPIT) analysis of whole-cell lysate and a membrane-enriched fraction of NBL TIC line NB88R2 generated from a bone marrow metastasis of a high-risk patient.

The MudPIT technique involves digesting the protein sample into peptides with trypsin and separating the peptides with two liquid column chromatography steps. As the peptides elute from the second column they are assigned unique mass/charge ratio fingerprints that can be used to reveal the identity of each protein [284]. The MudPIT approach can effectively identify thousands of concurrently expressed proteins for global or subcellular fraction-specific proteomic profile analyses [285]. The MudPIT analysis of the whole-cell lysate isolated from line NB88R2 cells revealed 819 proteins, each identified by at least two peptides. A similar analysis identified 1,530 proteins in the membrane-enriched fraction isolated from the same line. Of the 321 TIC-enriched genes, all of which were expressed in line NB88R2, peptides for 75 were detected by MudPIT in either whole-cell or membrane-enriched lysate of line NB88R2 or both (Table 3.3). Forty-five of the detected proteins were encoded by genes that were expressed in the 75% to 100% expression percentile in the NB88R2 line, whereas only two protein products were detected for genes expressed in the 0% to 24% expression percentile, indicating a correlation between transcript abundance and detection by MudPIT in this cell line. In addition, the median expression level of the genes for which the protein products were detected by MudPIT was 206, while the median expression level of the genes for which the protein products were not detected was 73.3.
To investigate the significance of this difference, we compared the NB88R2 square-root-transformed average expression levels of the 75 NBL TIC-enriched genes for which the protein products were detected by MudPIT to the average expression levels of the 246 NBL TIC-enriched genes for which the protein products were not detected by MudPIT using a one-tailed two-sample equal variance T-test. Given P = 7.365E-24, we rejected the null hypothesis that the two expression means were the same, providing evidence for a correlation between the higher expression level of a gene and the ability to detect its protein product by MudPIT.

According to the Ingenuity Knowledgebase annotation, 21% (16 of 75) of the detected proteins were associated with the DNA replication, recombination, and repair functional category (Ingenuity Systems, www.ingenuity.com), including PARP1, PCNA, UBE2N, FEN1, HMGB2, and RFC, which forms a major complex interacting with BRCA1 [286]. This result suggests that DNA repair proteins are expressed in the proteome of at least one NBL TIC line, providing further support to the results of the gene expression analysis.

3.2.4 Known drug targets among NBL TIC-enriched transcripts

Because the most direct pharmacologic intervention is inhibition of a target protein [287], I focused further functional analyses on genes upregulated rather than downregulated in NBL TICs with respect to SKPs and other tissues. Drug repositioning, in which existing drugs are used for novel indications, is a powerful approach to novel therapy development because it greatly reduces the cost and time required to clinically develop a new therapeutic option [288]. I therefore aimed to use NBL TIC-enriched genes to identify targets of existing therapeutics, on the premise that such drugs could potentially be effective against recurrent NBL.
I applied the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com) tool to map the 321 NBL TIC-enriched transcripts, as well as their interacting partners, to known drugs. This analysis revealed thirty known drug targets among the NBL TIC-enriched genes and their interacting partners defined by the Ingenuity Knowledgebase (bold type in Table 3.3 indicates the NBL TIC-enriched genes). Seventeen of the thirty predicted drug targets have been explored preclinically or clinically for the treatment of NBL (Table 3.4). These drugs included both general chemotherapeutics, such as etoposide, becatecarin, doxorubicin, flavopiridol, and vincristine, all of which are currently approved or in trials for NBL, as well as targeted agents such as BCL2 inhibitors, evaluated for the treatment of NBL [289]. Several agents predicted by my analysis, such as HDAC inhibitors and PARP inhibitors, have already shown promise in the management of chemotherapy-resistant NBL [290,291], suggesting that our approach can identify drug targets relevant to the disease.

In addition to known NBL drug targets, my analysis predicted genes and gene products targeted by existing drugs that at the time of the publication of this study had not been implicated clinically as therapeutic targets for high-risk NBL. These molecules included AURKB, PLK1, ADORA2A, CXCL10, SLC1A4, COL14A1, TNFRSF10B, ITGA2B, and IL6. Based on biological and clinical considerations discussed in Section 3.2.5, we selected AURKB for further evaluation as a potential drug target against metastatic NBL.

3.2.5 Targeting BRCA1 signaling: inhibition of AURKB is selectively cytotoxic to NBL TICs

The Aurora kinase family includes three serine/threonine kinases involved in the control of the cell cycle. Inhibitors of Aurora A and B kinases have shown promise as anticancer agents for the treatment of solid tumors and leukemias [292].
Although an Aurora A kinase inhibitor is in an ongoing phase I/II clinical trial for NBL (NCT00739427), Aurora B kinase inhibitors have not been investigated in relation to NBL. A recent report suggested a direct link between Aurora B kinase and BARD1, a key component of the BRCA1 signaling pathway that is also associated with susceptibility to NBL [197,269]. This report suggested that full-length BARD1, expressed by normal cells, interacts with BRCA1 and mediates AURKB degradation, while a shorter BARD1beta isoform lacking the BRCA1 interaction domain, expressed by some cancer cells, scaffolds AURKB with BRCA2, stimulating cellular proliferation (Figure 3.2C) [269]. In this study, AURKB was highly expressed in NBL TICs, at the average expression level of 44.35 Reads Per Kilobase of gene model per Million mapped reads (RPKM) (range 9.83 to 67.66 RPKM). In contrast, AURKB transcripts were not detectable above background in SKPs or other normal samples (Section 3.2.7). The BARD1/AURKB relationship, together with the aberrant expression of the BRCA1/BARD1 pathway and AURKB in NBL TICs observed in this study, as well as the clinical feasibility of Aurora kinase inhibitors, provided a rationale for exploring the antiproliferative potential of Aurora B kinase inhibitors in NBL TICs.

To assess whether elevated mRNA levels at the AURKB locus in NBL TICs compared to SKPs and a panel of other cancers corresponded to increased levels of AURKB protein, we performed Western blot analysis using whole-cell lysates from three NBL TIC lines (NB12, NB88R2, and NB122R) and two SKP lines (FS274 and FS227). This analysis revealed the presence of the AURKB protein in NBL TICs but detected no protein in SKPs, supporting the gene expression result (Figure 3.3A). To gain further insight into the role of AURKB in controlling NBL TIC proliferation, we performed shRNA knockdown experiments in NBL TIC lines NB12 and NB88R2.
NBL TICs stably infected with lentiviruses encoding two separate shRNAs to AURKB showed 77% to 80% growth reduction compared with NBL TICs infected with lentiviruses carrying mock shRNAs to green fluorescent protein or β-galactosidase (Figure 3.3B). The observed reduction in proliferation following AURKB knockdown supports the premise that AURKB signaling is important for the viability of NBL TICs.

To assess whether pharmacologic inhibition of AURKB would have the same effect on NBL TIC proliferation as the AURKB knockdowns described above, we used AZD1152, a selective AURKB inhibitor that is currently undergoing phase I/II testing in patients with acute myelogenous leukemia (NCT00497991). NBL TIC lines (NB12 and NB88R2), as well as the FS283 SKP line, were treated with a range of AZD1152 concentrations, and cell growth was assessed 96 hours later using alamarBlue reduction [293] as a read-out of cellular metabolic activity. As shown in Figure 3.3C, proliferation of NBL TICs was reduced following treatment with AZD1152, with low micromolar EC50 values (1.5 to 4.6 μmol/L). In contrast, SKPs were less sensitive to AZD1152, exhibiting a higher EC50 value (12.4 μmol/L). The enhanced reduction of proliferation of NBL TICs compared to SKPs following genetic (shRNA) and pharmacological (AZD1152) inhibition of AURKB is consistent with the hypothesis that AURKB is a potential drug target for metastatic NBL.

3.2.6 Exon-level expression analysis of BARD1 reveals a potential mechanism for the sensitivity of NBL TICs to AURKB inhibition

The full-length BARD1 isoform interacting with BRCA1 was reported to mediate AURKB degradation, while the shorter BARD1beta isoform lacking the BRCA1 interaction domain was reported to be involved in the stabilization of AURKB via interactions with BRCA2 [269].
Since NBL TICs expressed AURKB both at the level of mRNA (Figure 3.1B) and protein (Figure 3.3A), and were sensitive to AURKB inhibition (Figure 3.3B and C), we hypothesized that NBL TICs expressed the BARD1beta isoform that is involved in the scaffolding of AURKB and BRCA2. Upon inspection of the NBL TIC and SKP RNA-Seq data, I found that SKPs expressed BARD1 only at the expression threshold level of approximately 1 Reads Per Kilobase of exon model per Million mapped reads (RPKM) [150]. Therefore, I sought an alternative source of reference normal RNA-Seq data to study the exon usage at the BARD1 locus in normal cells and NBL TICs. To address the hypothesis that NBL TICs preferentially express the BARD1beta isoform, while normal cells express the full-length BARD1 isoform, I used the RNA-Seq data from NBL TIC libraries (Table 3.1), and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project available through the Gene Expression Omnibus (GSE30611). The exon-level expression at the BARD1 locus was quantified in these samples as described in the Methods using the RPKM expression measure [150]. The usage of each BARD1 exon was defined as the splice index (SI), calculated as the RPKM level of the exon expressed as a percentage of the overall RPKM level of the gene: SI = (exon RPKM / gene RPKM) * 100. The average SI of exon 2 in NBL TICs, computed across the 10 NBL TIC RNA-Seq libraries, was 2.17% (SD = 0.96%), which was significantly lower than the average value of 11.5% (SD = 4.01%), computed across the 16 normal tissues, as assessed by a moderated T-test with the Benjamini-Hochberg multiple testing correction implemented in the LIMMA Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average SI of exon 3 was 8.75% (SD = 2.60%) in NBL TICs, which was found to be significantly different from the average value of 31% (SD = 7.00%) in the normal tissues (BH-corrected q < 0.05) (Figure 3.5A).
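The RPKM measure and the splice index defined above can be illustrated with a short Python sketch. The read counts, feature lengths, and library size below are hypothetical values chosen for easy arithmetic and are not taken from the BARD1 data; the function names are likewise illustrative.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase of (exon or gene) model per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


def splice_index(exon_rpkm, gene_rpkm):
    """Exon usage as a percentage of the gene-level RPKM."""
    return exon_rpkm / gene_rpkm * 100


# Hypothetical library with 20 million mapped reads.
library_size = 20_000_000
exon_level = rpkm(read_count=50, length_bp=100, total_mapped_reads=library_size)
gene_level = rpkm(read_count=4000, length_bp=2000, total_mapped_reads=library_size)

si = splice_index(exon_level, gene_level)
print(exon_level, gene_level, si)
```

Because the library size cancels in the ratio, the SI depends only on read density in the exon relative to read density across the gene model, which is what makes it comparable between libraries of different sequencing depth.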
The moderated T-test was selected for this analysis because it does not assume that exons are independent of each other, which is likely to be more biologically realistic [249]. Instead, to compute the T-statistic, the empirical Bayes moderated T-test method uses information from all exons in a gene, by computing a weighted average between the variance at each exon and the variance across all exons at the locus [249]. The moderated T-test has been used previously for studying differential expression in RNA-Seq data [294] and for differential splicing analysis using exon arrays [295]. Upon manual inspection in the IGV browser [163], exon 1 was found to have a high GC content (average 70% or more), which may account for the low coverage of this region in all samples. Due to the low coverage of exon 1, its SI could not be reliably assessed in this study. Based on the UniProtKB records, the BRCA1 interaction region of BARD1 comprises residues 26-119, encoded by a portion of exon 1, exon 2 and exon 3 (Figure 3.5C) [296]. Therefore, the finding of the lower expression of exons 2 and 3 of BARD1 in NBL TICs is consistent with the expression of the shorter BARD1beta isoform that has been reported to be involved in the stabilization of AURKB in cancer cells [269].

We also used the trans-ABySS de novo assembly pipeline [297] discussed in Section 4.4.6 to reconstruct the structure of BARD1 transcripts expressed by NBL TICs. This pipeline assembled short RNA sequencing reads into contigs, aligned the contigs to the reference hg18 genome, and then compared the alignments to the annotated transcript models from Ensembl 54 [298]. Since exon 1 of BARD1 was not covered by sequencing reads, we were unable to assemble contigs that spanned the full length of BARD1 transcripts. However, we detected contigs that were missing exons 1, 2, and 3, providing additional evidence for the expression of the BARD1beta isoform by NBL TICs.
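The variance shrinkage behind the moderated T-test described earlier in this section can be sketched in a few lines of Python. This is an illustration of the weighted-average idea for a two-group comparison, not the LIMMA code itself: in LIMMA the prior variance and prior degrees of freedom are estimated from the data by empirical Bayes, whereas here they are passed in as assumed inputs, and all numbers are toy values.

```python
from math import sqrt


def moderated_t(mean_diff, s2_exon, df_exon, s2_prior, df_prior, n1, n2):
    """Two-group moderated t-statistic using a shrunken variance.

    s2_post is the weighted average of the exon's own variance (weight
    df_exon) and a prior variance pooled across exons (weight df_prior).
    Returns (s2_post, t, total degrees of freedom).
    """
    s2_post = (df_prior * s2_prior + df_exon * s2_exon) / (df_prior + df_exon)
    t = mean_diff / sqrt(s2_post * (1 / n1 + 1 / n2))
    return s2_post, t, df_prior + df_exon


# Toy numbers chosen for easy arithmetic; not taken from the BARD1 data.
s2_post, t_mod, df_total = moderated_t(
    mean_diff=2.0, s2_exon=4.0, df_exon=8, s2_prior=1.0, df_prior=4, n1=5, n2=5
)
print(s2_post, round(t_mod, 3), df_total)
```

In this toy case the exon's own variance (4.0) is pulled toward the smaller prior variance (1.0), so the moderated statistic is larger than the ordinary two-sample statistic would be; with a larger prior variance the shrinkage would act in the opposite direction.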
3.2.7 Relevance to primary neuroblastoma

In Sections 3.2.1-3.2.5 I found that the mRNA levels of members of the BRCA1/BARD1 signaling pathway were significantly higher in predominantly metastases-derived NBL TICs than in normal neural crest-like cells (SKPs) and other cancers. Moreover, both transcript and protein levels of AURKB, a member of the BRCA1/BARD1 pathway, were found to be enriched in NBL TICs compared to SKPs (Sections 3.2.2 and 3.2.5). We also showed that genetic and pharmacological inhibition of AURKB was cytotoxic to NBL TICs, and less so to SKPs. In Section 3.2.6 I linked the observation of the preferential expression of AURKB by NBL TICs to the expression of the oncogenic BARD1beta isoform that was reported to stabilize AURKB in cancer cells (Figure 3.5C) [269].

Since NBL TICs used in this analysis are predominantly derived from bone marrow metastases of relapsed NBL patients [279], I asked whether the BRCA1/BARD1 pathway, the oncogenic BARD1beta isoform and AURKB were also expressed by primary NBL cells. To address this question, I used the RNA-Seq data from 10 primary NBL tumors, described in Chapter 4 and Appendix D.

To investigate whether the mRNA expression of BRCA1/BARD1 pathway members was enriched in primary NBL tumors with respect to normal cells, I compared the expression profiles of 10 primary NBL tumors (Appendix D) and 16 normal tissues from the Illumina BodyMap 2.0 project (Section 3.2.6). I used the Reads Per Kilobase of gene model per Million mapped reads (RPKM) as a measure of gene expression [150], and applied the methods in the LIMMA package [249] to identify genes significantly enriched in expression in NBL cells, as described in Section 3.4.3. This analysis revealed 1,828 genes with evidence of increased mRNA abundance in NBL cells compared to normal tissues (Benjamini-Hochberg-corrected q < 0.05).
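The Benjamini-Hochberg correction referred to throughout this chapter converts raw P values into q values that control the false discovery rate. The analyses here relied on the implementation in the LIMMA package; the stand-alone Python sketch below is only an illustration of the procedure, using made-up P values and a hypothetical function name.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted q values in the same order as the input P values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Made-up P values for four hypothetical genes.
raw_p = [0.01, 0.04, 0.03, 0.20]
q = benjamini_hochberg(raw_p)
significant = [i for i, value in enumerate(q) if value < 0.05]
print(q, significant)
```

In this toy input three raw P values fall below 0.05, but only the smallest remains below the q < 0.05 threshold after correction, which shows why the corrected threshold is the more conservative of the two.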
Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) was then used to identify significantly enriched annotations within this gene list, as described in Section 3.4.3. The pathway enrichment analysis revealed that the pathway entitled "Role of BRCA1 in DNA Damage Response" was the most significantly enriched annotation among the 1,828 genes (Fisher's Exact P < 0.05), such that 15 out of 53 members of this pathway (FANCG, FANCA, FANCD2, RAD51, BRCA1, BACH1, AURKB, BLM, RFC, MSH2, SWI/SNF, OCT1, TP53, PLK1, E2F) were more abundant in NBL cells compared to normal tissues at the level of mRNA (Benjamini-Hochberg-corrected q < 0.05).

Having established that the BRCA1/BARD1 signaling pathway annotation was significantly enriched among transcripts increased in expression in NBL tumors compared to normal cells, we used the RPKM measure to directly compare the AURKB expression levels and BARD1 exon usage in NBL TICs, primary NBL tumors, and Illumina BodyMap 2.0 normal tissues. We used the Illumina BodyMap 2.0 data rather than SKP data for the primary tumor versus normal analyses, since, as mentioned in Section 3.2.6, BARD1 exon usage could not be reliably assessed in SKPs due to the marginal expression of this gene in the SKP RNA-Seq libraries. The average SI of exon 2 in NBL TICs and NBL primary tumors was 2.17% (SD = 0.96%) and 3.57% (SD = 1.89%), respectively, both of which were significantly lower than the average value of 11.5% (SD = 4.01%), computed across the 16 normal tissues, as assessed by a moderated T-test with the Benjamini-Hochberg multiple testing correction implemented in the LIMMA Bioconductor package [249] (BH-corrected q < 0.05). Similarly, the average SI of exon 3 was 8.75% (SD = 2.60%) and 7.55% (SD = 2.04%) in NBL TICs and primary tumors, respectively, both of which were found to be significantly different from the average value of 31% in the normal tissues (BH-corrected q < 0.05) (Figure 3.5A).
The average gene-level RPKM values for AURKB were computed for Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). While AURKB expression was not detectable above background (RPKM ~ 1) in any of the 16 normal libraries, the average AURKB expression in NBL primary tumors and NBL TICs was 21.6 RPKM (range 2.55–36.95 RPKM) and 44.35 RPKM (range 9.83–67.66 RPKM), respectively (Figure 3.5B). These results are consistent with the interpretation that the BARD1beta isoform is present in both NBL TICs and primary NBL tumors, and that both primary tumors and NBL TICs may be sensitive to AURKB inhibition.

3.3 Discussion

The rationale for the work in Chapter 3 was the idea that targeting cancer stem cell-specific proteins could be cytotoxic to cancer stem cells, while sparing their normal stem cell counterparts, and lead to discoveries with potential clinical application. This idea has been previously validated in a chronic myeloid leukemia model, where a leukemia stem cell-specific gene Alox5 was identified, and its inhibition led to the eradication of chronic myeloid leukemia in a mouse model [281]. Therefore, I aimed to apply the same concept to the NBL TIC model, using SKPs as normal reference stem cells. To identify transcripts for which the expression was enriched in NBL TICs, I used RNA-Seq expression data from NBL TICs, SKPs, and a compendium of cancer tissues. It is important to note that since the compendium of cancer tissues included RNA-Seq data from cancerous lymph nodes, B-cell-specific transcripts found in NBL TICs by us (not shown) and others [280], possibly as a result of contamination with Epstein-Barr-transformed lymphocytes, would not be identified as NBL TIC-enriched. The gene-level expression analysis of RNA-Seq data from ten NBL TIC samples revealed 321 transcripts increased in expression in NBL TICs compared to SKPs and a panel of cancer tissues.
Twenty-one of these transcripts were members of the BRCA1 signaling pathway or its downstream components, which amounted to a statistically significant enrichment of this pathway annotation among transcripts increased in expression in NBL TICs (Fisher's Exact P < 0.05). A key component of the BRCA1 pathway, BRCA1-associated RING domain protein 1 (BARD1), was shown to act as a predisposition locus for high-risk NBL by a single nucleotide polymorphism (SNP)–based genome-wide association study [197]. In this study of more than 500 high-risk NBL patients, also described in Section 1.9.2, six intronic SNPs at the BARD1 locus, contained within BARD1 introns 1, 3, and 4, met genome-wide significance for association with the disease (odds ratio for the most significant SNP = 1.68; 95% confidence interval 1.49 to 1.90; P = 8.65E-18). Evidence in breast tumors suggests that BARD1 is a regulator of the tumor suppressor function of BRCA1, and can act as a tumor suppressor itself [299,300]. In particular, the BARD1/BRCA1 heterodimer is important for the tumor suppressor activity, such that losses of BARD1, BRCA1 or their interaction are tumorigenic and result in similar basal-like phenotypes in breast cancers [300]. Preliminary investigations of the effects of the BARD1 NBL risk alleles identified in the genome-wide association study [197] suggest that these alleles result in the overexpression of the oncogenic BARD1beta isoform (Figure 3.5C) [196]. The BARD1beta isoform lacks exons 2 and 3 that encode the RING-finger domain involved in the interaction with BRCA1 [269]. Aberrant BARD1 splicing, although not the isoform seen in NBL, has been previously reported in other cancers, including ovarian [301], colon [302] and non-small cell lung cancers [303].
In this study, we also observed that NBL TICs and primary tumors, but not normal tissues, expressed the oncogenic BARD1beta isoform (Section 3.2.7) that does not interact with BRCA1, but instead is involved in scaffolding BRCA2 and AURKB (Figure 3.5C) [269]. To identify existing therapeutics that could be applied to the treatment of recurrent NBL, I used Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) to analyze the functional significance of the identified genes and match them against a database of available drugs. In total, thirty targets with an available inhibitor were identified, nine of which have never been implicated in NBL treatment. Aurora kinase B (AURKB), one of the nine novel drug targets, was selected for further validation based on two factors: its link to the BRCA1 signaling pathway through reported interactions with the shorter (beta) isoform of the NBL predisposition locus BARD1 [269], and the known role of its family member, AURKA, in NBL [304]. Both AURKA and AURKB are essential for proper chromosome alignment and separation during mitosis.  The inhibition of either protein results in gross defects in chromosome segregation: aneuploidy in the case of selective AURKA inhibition, and polyploidy in the case of selective AURKB inhibition, either leading to cell death [305]. Treatments with a selective AURKB inhibitor, AZD1152, were cytotoxic to NBL TICs used in the study but not to normal pediatric neural crest-like precursor cells. Although AURKA inhibitors are currently in clinical trials for NBL (NCT00739427), to our knowledge this study provides the first report of AURKB inhibitors as potential therapeutics for NBL. Because AURKB inhibitors are already in clinical trials, there is potential for rapid translation of the finding in NBL to therapy against the disease. 
The selective activity of AZD1152 in NBL TICs compared to SKPs, which is likely due to the differential AURKB protein abundance in NBL TICs compared with SKPs, provides a foundation for further exploring AURKB as a drug target for pediatric NBL.

An independent validation of the potential significance of AURKB in NBL is the preliminary report from the KidsCancerKinome initiative that studied a panel of pediatric tumors and cell lines and found that both AURKA and AURKB were expressed at a high level in tumors with poor prognosis, including high-risk NBL [306]. The therapeutic potential of inhibiting AURKA and AURKB in NBL is currently being investigated by the group through functional studies, including shRNA knockdowns and in vivo inhibitor studies (Ellen Westerhout, personal communication). The confirmation of our finding by an independent group of investigators studying primary NBL tumors lends credibility to our bioinformatic approach, in which we used normal SKPs and a compendium of cancer tissues to select NBL TIC-enriched markers.

Further validation of the results from my bioinformatic analysis is provided by two reports that used NBL TICs [307] and primary NBL tumors and cell lines [308] to provide experimental evidence of the therapeutic potential of PLK1 inhibition in high-risk NBL. As shown in Figure 3.2C, PLK1 signaling is downstream of the BRCA1/BARD1 pathway, and the PLK1 molecule was also suggested by my analysis as one of the potential therapeutic targets against NBL TICs (Table 3.4).

In conclusion, the work described in this Chapter provides the first high-resolution system-level analysis of NBL TICs and a proof of principle that next-generation sequencing of primary human NBL TICs can reveal therapeutically relevant candidates for NBL. Specifically, we showed that inhibiting an NBL TIC-enriched transcript implicated in a relevant pathway is selectively cytotoxic to these cells compared to their normal stem cell counterparts (SKPs).
The selective cytotoxicity against cancer stem cell-like NBL TICs is particularly important for high-risk NBL, as current therapies used in the management of the disease can effectively reduce tumor burden, but do not produce a durable cure in the majority of patients [174]. Since cancer stem cells are thought to be associated with disease relapse [243], the specific targeting of NBL TICs may help achieve stable long-term remission for high-risk NBL patients. The apparent selectivity of AURKB inhibition, as compared to normal pediatric stem cells (SKPs), may imply that this treatment would potentially be less toxic to children with NBL.

3.4 Materials and methods

3.4.1 RNA sequencing and data analysis

NBL TICs and SKPs were cultured as previously described [279,234]. Briefly, the cells were cultured in DMEM-F12 medium, 3:1 (Invitrogen), containing 2% B27 supplement (Gibco), 40 ng/mL basic fibroblast growth factor 2, and 20 ng/mL epidermal growth factor (both from Collaborative Research; proliferation media) in 75 cm2 flasks in a 37°C and 5% CO2 tissue-culture incubator. The cell growth conditions were normalized such that NBL TICs were cultured for 7 days and SKPs for 14 days post plating prior to harvesting in exponential growth phase and RNA isolation for transcriptome analysis. Details of the NBL TIC and SKP samples used in this analysis are provided in Table 3.1. RNA sequencing libraries from NBL TICs and SKPs were constructed from DNase I treated mRNA as previously described [149,102]. The libraries were sequenced on an Illumina Genome Analyzer. The read length and amount of aligned sequence data generated for each library are provided in Table 3.2. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36) and a database of known exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309]. Duplicate reads were retained for the expression analysis.
The number of bases sequenced per number of exonic bases mapped was used as a measure of gene expression level for each gene [114]. The sequencing and processing of RNA-Seq libraries from other tumor types was conducted according to the same production protocol [102,149]. The read length and amount of aligned sequence data generated for each library are provided in Table 3.2. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36.1) and a database of known exon junctions [149] using MAQ software version 0.7.1 in paired-end mode, and the duplicate read pairs were removed [309]. The number of bases sequenced per number of exonic bases mapped was used as a measure of gene expression level for each gene [114]. Genes with a cumulative expression value of less than 10 (computed across all samples) were filtered out from the analysis. The expression values were square-root transformed and used in the lmFit function of the Linear Models for Microarray Data (LIMMA) Bioconductor package to estimate fold changes between the compared groups by fitting linear models to each gene [249]. The LIMMA method was selected for this analysis as it was previously successfully used for the analysis of RNA-Seq data [294]. The NBL TICs versus SKPs and NBL TICs versus other cancers comparisons were conducted similarly, such that single contrasts were defined in each analysis creating pairwise comparisons [249]. For both pairwise comparisons (NBL TICs versus SKPs, NBL TICs versus other cancers), the moderated t-statistic with Benjamini-Hochberg (BH) multiple testing correction implemented in the eBayes function was used to assess the significance of differential expression. Those genes with BH-corrected q < 0.05 were considered statistically significant.

3.4.2 Microarray experiments and data analysis

Cells were collected and lysed in Trizol, and RNA was purified using the RNeasy mini kit (Qiagen).
RNA samples (Table 2.1) were analyzed on Affymetrix GeneChip Human Exon 1.0 ST Arrays. The data were checked for batch effects, background corrected, and normalized according to the Robust Multichip Average procedure using the Affymetrix Expression Console software. Gene-level expression summaries were computed based on all core probes. Differential gene expression was assessed using the lmFit function of the Linear Models for Microarray Data (LIMMA) Bioconductor package [249] as described previously in Section 2.4.3.

3.4.3 Identification of NBL TIC-enriched and depleted genes and the functional enrichment analysis

Lists of significantly differentially expressed genes from each analysis (NBL TICs versus SKPs and NBL TICs versus tissue pool, as measured by RNA sequencing) were overlapped to identify genes that are significantly enriched and depleted in NBL TICs with respect to both SKPs and a panel of cancer tissues. The lists of NBL TIC-enriched and NBL TIC-depleted transcripts were then compared to the lists of differentially expressed genes from the microarray analysis described in Section 3.4.2 (NBL TICs versus SKPs) to derive robust sets of genes increased and decreased in expression in NBL TICs compared to SKPs and other cancers, and confirmed by both RNA sequencing and microarrays (Appendix C). Ingenuity Pathway Analysis software (Ingenuity Systems, www.ingenuity.com) was then used on these sets to select canonical pathways significantly enriched among the microarray-confirmed sets of NBL TIC-enriched and NBL TIC-depleted transcripts (Fisher's Exact P < 0.05). The pathway enrichment analysis implemented in Ingenuity uses a Fisher's Exact test to assess the null hypothesis that the number of observed genes in a particular pathway was produced by chance (Ingenuity Systems, www.ingenuity.com). The null hypothesis is rejected at Fisher's Exact P < 0.05.
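As an illustration of the enrichment test described in Section 3.4.3, the following minimal sketch computes a one-sided (right-tailed) Fisher's Exact P value from the hypergeometric distribution. It is not the Ingenuity implementation, whose internals are not described here. The function name `enrichment_p_value` is hypothetical, and the background size of 20,000 annotated genes in the example is an assumed placeholder; only the counts 15, 53 and 1,828 come from Section 3.2.7.

```python
from math import comb


def enrichment_p_value(hits, pathway_size, list_size, background_size):
    """Right-tailed Fisher's Exact P value for pathway over-representation.

    Returns P(X >= hits), where X is the number of pathway members found in
    a gene list of size `list_size` drawn without replacement from a
    background of `background_size` genes, `pathway_size` of which belong
    to the pathway (hypergeometric model).
    """
    max_hits = min(pathway_size, list_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # 15 of 53 pathway members among 1,828 genes (Section 3.2.7).
    # ASSUMPTION: the background of 20,000 genes is a placeholder, not a
    # value reported in the thesis.
    p = enrichment_p_value(hits=15, pathway_size=53, list_size=1828,
                           background_size=20000)
    print(f"right-tailed P = {p:.3g}")
```

The resulting P value depends on the background gene set, so the value produced with the placeholder background should not be compared with the values reported by Ingenuity.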
3.4.4 Gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry

A crude membrane fraction was prepared as follows. NB88R2 cells were swollen in hypotonic buffer (20 mmol/L Tris, pH 7.4; 10 mmol/L KCl; 5 mmol/L sodium vanadate; 1 mmol/L phenylmethylsulfonylfluoride) and lysed by dounce homogenization. The cleared cell lysate was centrifuged for 15 minutes at 6,000 × g to collect the crude membrane fraction. The protein fraction was resuspended in urea buffer (8 mol/L urea, 2 mmol/L HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1 mmol/L vanadate; Cell Signaling Technology) and was reduced and alkylated with 4.5 mmol/L dithiothreitol (DTT) and 10 mmol/L iodoacetamide, respectively. The whole-cell fraction was prepared as follows. NB88R2 cells were lysed in urea lysis buffer (8 mol/L urea, 2 mmol/L HEPES, 2.5 mmol/L sodium pyrophosphate, 1 mmol/L β-glycerophosphate, and 1 mmol/L vanadate) and sonicated (3 bursts of 4 W for 10 s). The cell lysate was cleared by centrifugation (20,000 × g for 15 min at 4°C) and was reduced and alkylated with 4.5 mmol/L DTT and 10 mmol/L iodoacetamide, respectively. Proteins were digested with trypsin and purified using C18 reverse phase resin prior to mass spectrometry. The gel-free two-dimensional liquid chromatography coupled to shotgun tandem mass spectrometry (MudPIT) analysis was done for 8 cycles as described [284] with the following modifications: approximately 60 μg (membrane fraction) or 40 μg (whole-cell fraction) of digested protein was analyzed on a linear ion-trap LTQ-Orbitrap mass spectrometer (ThermoFisher). Samples were loaded using a Proxeon HPLC system (Thermo Fisher Scientific) and subjected to MudPIT analysis. All data were analyzed using Sequest (ThermoFinnigan; version SRF v. 5) and X!
Tandem (http://www.thegpm.org/; version 2007.01.01.2 for membrane fraction or version TORNADO 2009.04.01.3 for whole-cell fraction) search algorithms using the Human International Protein Index database (version 3.41 with 72,155 entries or version 3.66 with 86,845 entries for membrane and whole-cell fractions, respectively). Sequest and X! Tandem were searched with a fragment ion mass tolerance of 0.50 or 0.40 Da for membrane and whole-cell fraction, respectively, and a parent ion tolerance of 2.0 or 5.0 ppm for membrane or whole-cell fraction, respectively. The fragment ion mass tolerance defines an error range for considering two ion peaks as identical, while the parent ion tolerance defines the error range for peptide identification in the database. The iodoacetamide derivative of cysteine was specified as a fixed modification in Sequest and X! Tandem. The oxidation of methionine was specified as a variable modification. Proteins were accepted based on the following criteria: at least two peptides per protein had to be identified with a probability threshold of 95% or greater (membrane-enriched fraction) or 90% or greater (whole cell lysate) as derived by the Peptide Prophet algorithm [310], together with an overall protein identification probability of >95.0% (membrane-enriched fraction) or >90% (whole cell lysate) as derived by the Protein Prophet algorithm [311].

3.4.5 AlamarBlue assay

NB12, NB88R2, and FS283 spheres were dissociated into single cells and seeded in triplicate at 3,000 cells per well in 50 μL medium containing 30% SKPs conditioned media in non–tissue culture–treated 96-well plates (Corning Life Sciences). AZD1152 (Selleck Chemicals LLC) was dissolved in dimethyl sulfoxide (DMSO) to a stock concentration of 50 mmol/L, from which 1:3 sequential dilutions were prepared. Intermediate dilutions of the compound were made in medium and immediately added to the cells in a volume of 50 μL.
Cells treated with 0.05% DMSO in the absence of the drug were used as a control for optimal cellular proliferation, whereas wells containing media only were used to determine the background fluorescence; alamarBlue (10 μL) was added to each well after 72 hours, followed by incubation for an additional 24 hours. Fluorescence intensity was measured using PHERAstar SpectraMax Plus384 microplate reader (BMG Labtech) with an excitation filter of 535 nm and an emission filter of 590 nm. Percentage reduction of alamarBlue was calculated as ((mean fluorescence of treated wells - background fluorescence)/(mean fluorescence of DMSO-treated wells - background fluorescence)) * 100. Half maximal effective concentration (EC50) curves were generated using GraphPad Prism 5 software (GraphPad Software, Inc.).

3.4.6 Western blotting

Cells were harvested, washed with cold HBSS, and lysed with NP40 lysis buffer containing 10 mmol/L Tris (pH 8.0), 150 mmol/L NaCl, 10% glycerol, 1% Nonidet P-40, 1 mmol/L phenylmethylsulfonylfluoride, 1 mmol/L orthovanadate, and proteinase inhibitor cocktail tablet (Complete Mini, EDTA-free, Roche). Cells were lysed for 10 to 20 minutes on ice and centrifuged for 10 minutes at 12,000 rpm at 4°C. Protein amounts were determined by BCA Assay (Pierce), and 40 μg of protein was loaded per lane. Western blots were probed with rabbit polyclonal anti-Aurora B antibody (Abcam; ab2254) and mouse monoclonal anti-glyceraldehyde-3-phosphate dehydrogenase antibody (Santa Cruz; sc-47724) in 5% w/v nonfat dry milk in TBS/0.1% Tween-20 overnight at 4°C. Blots were developed using ECL or ECL-plus reagent (GE Healthcare Life Sciences).

3.4.7 Small hairpin RNA (shRNA) knockdowns

Cell lines were stably infected with either a mock treatment or lentivirus-encoding shRNAs of interest at a multiplicity of infection of 1.0. Seventy-two hours post infection, the virus was removed, and cells were seeded in triplicate at a density of 10,000 per well in 24-well plates.
The remaining cells were used for RNA isolation to determine the efficiency of knockdown by quantitative reverse transcriptase PCR (qRT-PCR). Viable cell numbers were determined on days 1, 3, 5, and 7 post plating by removing cells from wells and counting via hemocytometer. The experiments were conducted in triplicate.

3.4.8 Exon-level analysis of RNA sequencing data

The BARD1 splicing analysis using RNA-Seq data was conducted as described below. The RNA-Seq data from NBL TIC libraries (Table 3.1), NBL primary tumors (Appendix D) and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project available through the Gene Expression Omnibus (GSE30611) were processed as described in Section 3.4.1. The exon coverage analysis was based on Ensembl gene annotations (homo_sapiens_core_54_36p) [298]. These annotations were converted into one model per gene by taking all transcripts of a given gene and collapsing them into a single gene model such that exonic bases in a collapsed gene model were the union of exonic bases that belonged to all known transcripts of the gene. The analysis used SAMtools version 0.1.13 pileup [312] to get the per-base coverage depths, and excluded reads with mapping quality < 10 and reads flagged as poor quality according to the Illumina chastity filter. The final analysis report included coverage information for each individual exon and intron in the collapsed gene models, as well as for the cumulative coverage across all the exons in each model. These coverage statistics were computed using the RPKM method [150]. The RPKM of 1 was used as a threshold to consider an exon expressed above background [150]. RPKM for each exon was calculated using the formula: (number of reads mapped to an exon x 1.00E9)/(NORM_TOTAL x length of the exon), where NORM_TOTAL = the total number of reads that are mapped to exons excluding those belonging to the mitochondrial genome.
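The RPKM and splice index (SI) definitions given in this section can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used in the thesis: the function names `rpkm`, `gene_rpkm` and `splice_indices` and the toy read counts are hypothetical, and the sketch assumes that read counts per exon and the NORM_TOTAL value have already been obtained.

```python
def rpkm(read_count, length_bp, norm_total):
    """RPKM = (reads x 1.00E9) / (NORM_TOTAL x feature length in bp)."""
    return (read_count * 1.0e9) / (norm_total * length_bp)


def gene_rpkm(exon_read_counts, exon_lengths_bp, norm_total):
    """Gene-level RPKM over the collapsed gene model (all exons pooled)."""
    return rpkm(sum(exon_read_counts), sum(exon_lengths_bp), norm_total)


def splice_indices(exon_read_counts, exon_lengths_bp, norm_total):
    """Splice index per exon: (exon RPKM / gene RPKM) * 100."""
    gene_level = gene_rpkm(exon_read_counts, exon_lengths_bp, norm_total)
    if gene_level == 0:
        # ASSUMPTION: an unexpressed gene has no defined splice index.
        raise ValueError("gene RPKM is zero; splice index is undefined")
    return [
        rpkm(count, length, norm_total) / gene_level * 100
        for count, length in zip(exon_read_counts, exon_lengths_bp)
    ]


if __name__ == "__main__":
    # Toy numbers (hypothetical): three exons, 10 million exonic reads.
    counts = [100, 10, 20]
    lengths = [200, 100, 100]
    norm_total = 10_000_000
    print(gene_rpkm(counts, lengths, norm_total))
    print(splice_indices(counts, lengths, norm_total))
```

Because the SI is a ratio of two RPKM values from the same library, NORM_TOTAL cancels out, so the index reflects the read density of an exon relative to that of the whole gene model.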
RPKM for the whole gene was calculated using the formula: (number of reads mapped to all exons in a gene x 1.00E9)/(NORM_TOTAL x sum of the lengths of all exons in the gene), where NORM_TOTAL = the total number of reads that are mapped to exons excluding those belonging to the mitochondrial genome. The splice indices for BARD1 (ENSG00000138376) exons were computed as (exon RPKM / gene RPKM) * 100. The significance of the observed differences in splice indices between sample pairs was assessed using the R Linear Models for Microarray Data (LIMMA) package adapted for splicing analysis as previously described [295]. The Benjamini-Hochberg correction for multiple testing was used, and the corrected q-values of less than 0.05 were considered statistically significant.

3.4.9 AURKB expression analysis

The AURKB expression analysis in Section 3.2.7 was conducted using the RPKM expression measure as described above for BARD1. The gene-level RPKM was computed as: (number of reads mapped to all exons in a gene x 1.00E9)/(NORM_TOTAL x sum of the lengths of all exons in the gene), where NORM_TOTAL = the total number of reads that are mapped to exons excluding those belonging to the mitochondrial genome. The gene annotation was based on Ensembl 54 (homo_sapiens_core_54_36p) [298]. The RPKM of 1 was used as a threshold to consider an exon expressed above background [150].

Figure 3.1 Transcripts enriched and depleted in NBL TICs compared with SKPs and other tumor tissues

Differentially expressed genes were identified using RNA sequencing data from NBL TICs, SKPs and a panel of cancer tissues. An equivalent differential expression analysis was conducted using exon array data from NBL TICs and SKPs. (A).
Venn diagrams summarize the overlap of the results from the three differential expression analyses (NBL TICs versus SKPs using microarray, NBL TICs versus SKPs using RNA-Seq, and NBL TICs versus other cancers using RNA-Seq) for upregulated (left panel) and downregulated (right panel) genes. (B). RNA sequencing expression profiles of 321 NBL TIC-enriched (red column) and 819 TIC-depleted transcripts (blue column) in NBL TICs, SKPs, and other cancer libraries are plotted as a heatmap with transcripts as rows and samples as columns. The rows are centered and scaled by subtracting the mean of the row from every value and then dividing the resulting values by the standard deviation of the row (row Z-Score). The NBL TIC libraries are labeled with the “TIC” prefix, and the tissue identities of the remaining libraries are explained in Table 3.2. The 321 NBL TIC-enriched genes and 819 NBL TIC-depleted genes were confirmed as significantly differentially expressed in all three comparisons as described in (A). The robustness of the heatmap was confirmed using the bootstrapping algorithm implemented in the Pvclust Bioconductor package [313], such that NBL TICs could be separated from the other tissues based on the expression of the 321 NBL TIC-enriched transcripts 98/100 times. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
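The row Z-Score scaling described in the legend of Figure 3.1B can be sketched as follows. This is a minimal illustration rather than the plotting code used for the figure. The function name `row_z_scores` is hypothetical, the use of the sample standard deviation (n - 1 denominator) is an assumption, and returning zeros for a row with no variation is a convention chosen here to avoid division by zero.

```python
from statistics import mean, stdev


def row_z_scores(matrix):
    """Center and scale each row: (value - row mean) / row standard deviation.

    `matrix` is a list of rows (one per transcript), each a list of
    expression values (one per sample).
    """
    scaled = []
    for row in matrix:
        row_mean = mean(row)
        row_sd = stdev(row)  # ASSUMPTION: sample SD (n - 1 denominator)
        if row_sd == 0:
            # ASSUMPTION: a constant row carries no contrast; report zeros.
            scaled.append([0.0 for _ in row])
        else:
            scaled.append([(value - row_mean) / row_sd for value in row])
    return scaled


if __name__ == "__main__":
    # Toy expression matrix (hypothetical values): 2 transcripts x 3 samples.
    print(row_z_scores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```

Scaling each row in this way makes the heatmap colors reflect the relative expression of a transcript across samples rather than its absolute abundance.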
[Figure 3.1: panels A and B]

Figure 3.2 Pathway analysis of NBL TIC-enriched transcripts

Ingenuity Pathway Analysis tool (Ingenuity Systems, www.ingenuity.com) was used to reveal canonical pathways significantly enriched among genes upregulated (A) or downregulated (B) in NBL TICs (Fisher's Exact P < 0.05). The ratios of observed versus total numbers of genes in each pathway are plotted with the orange line, whereas the lengths of the blue bars are the significance scores for each pathway; significance threshold (Fisher's Exact P < 0.05) is marked by the vertical orange line. (C). The pathway named “Role of BRCA1 in DNA damage response” was most significantly upregulated in NBL TICs compared with SKPs and other tissues; pathway members for which the expression is increased in NBL TICs are highlighted in red, and the protein complexes are indicated using a bold circle. The recently reported protein-protein interaction between AURKB, BRCA2 and the short (beta) isoform of BARD1 is denoted with a dotted line. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.

[Figure 3.2: panels A, B and C]

Figure 3.3 NBL TICs are sensitive to Aurora B kinase inhibition

(A). Western blot analysis confirmed the presence of AURKB protein in NBL TICs but not in SKPs. Blots were probed with the rabbit polyclonal anti-Aurora B antibody (Abcam; ab2254) and the mouse monoclonal anti-GAPDH antibody (Santa Cruz; sc-47724).
The AURKB band at 37 kDa is detectable in NBL TIC lines NB12, NB88R2 and NB122R, similarly to the positive control (HeLa cells). The AURKB band is undetectable in SKP lines FS274 and FS227. (B). Reduction of the proliferation of NBL TICs upon shRNA knockdown of AURKB. Growth curves of NBL TIC lines NB88R2 (top) and NB12 (bottom) infected with shRNA against AURKB or controls (left panel); quantitative reverse transcriptase PCR was used to determine the effectiveness of AURKB knockdown (76-86%) (right panel). All experiments were done in triplicate. (C). AlamarBlue assay revealed that AURKB inhibition with AZD1152 was effective in NBL TICs at EC50 of 1.5 to 4.6 μmol/L, whereas AURKB inhibition was effective in SKPs at 12.4 μmol/L. All experiments were done in triplicate. Reprinted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.

[Figure 3.3: panels A, B and C]

Figure 3.5 NBL cells preferentially express the oncogenic BARD1beta isoform that is involved in the stabilization of AURKB

The RNA-Seq data from 10 NBL TIC libraries (Table 3.1), 10 NBL primary tumors (Appendix D) and a panel of 16 normal tissues from the Illumina BodyMap 2.0 project available through the Gene Expression Omnibus (GSE30611) were analyzed for exon-level and gene-level expression as described in Methods. (A). Exon usage at the BARD1 locus is quantified using splice indices (SI).
The SI for each BARD1 exon is computed as (exon RPKM/gene RPKM)*100, and the average SI value is calculated across each of the three groups: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). The SI values for each of the BARD1 exons (x-axis) are plotted along the y-axis. The SI values of exons 2 and 3 (marked by stars) are significantly lower in the NBL primary tumors and NBL TICs, as compared to the normal tissues (Benjamini-Hochberg-corrected q < 0.05). This finding is consistent with the expression of the BARD1beta isoform by primary NBL cells and NBL TICs. (B). The gene-level expression of AURKB in each sample was quantified using the RPKM measure as described in Methods. The average gene-level RPKM value for AURKB is computed for each group: Illumina BodyMap 2.0 normal tissues (16 samples), NBL primary tumors (10 samples), and NBL TICs (10 samples). While AURKB expression is not detectable above background (RPKM ~ 1) in any of the 16 normal libraries, the average AURKB expression in NBL primary tumors and NBL TICs is 21.6 RPKM (range 2.55–36.95 RPKM) and 44.35 RPKM (range 9.83–67.66 RPKM), respectively. (C). A cartoon representation of the hg18 Ensembl 54 BARD1 gene model [298]. The exons are depicted by squares, while introns are shown as lines. The protein domains are depicted with squares of different colors, as described in the legend, and are marked on the exons that encode these domains. The full-length BARD1 transcript includes all coding exons, and contains three ANK repeats, two BRCT domains, and a RING-finger domain [296]. The BRCA1 interaction region includes the RING-finger domain and comprises residues 26-119, encoded by a portion of exon 1, exon 2 and exon 3. The BARD1beta transcript lacks exons 2 and 3 and encodes a protein product without the RING-finger domain that stabilizes AURKB through its scaffolding with BRCA2 [269].
[Figure 3.5: panels A, B and C]

Table 3.1 Human NBL TIC and SKP lines used for gene expression analysis

Human NBL TIC and SKP lines were analyzed by RNA-Seq (column 6) and/or microarray (column 5) to identify transcripts significantly enriched in NBL TICs (Section 3.2.1). The International Neuroblastoma Staging System (INSS) stage is listed in column 2, the MYCN oncogene amplification status of NBL samples is listed in column 3, and the tissue origin is listed in column 4. All NBL TIC lines are derived from high-risk NBL patients, while SKP lines are derived from cancer-free children. Superscripts designate samples from the same patient. Reprinted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Sample | INSS Stage | MYCN | Description | Human Exon Array | RNA-Sequencing (Library ID)
NB12¹ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0502)
NB67¹ | 4 | Single copy | Bone marrow metastasis, remission | | Yes (HS0499)
NB12-2¹ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1041)
NB88L1² | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0382)
NB88R2² | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS0627)
NB122R³ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1040)
NB122L³ | 4 | Single copy | Bone marrow metastasis, relapse | Yes | Yes (HS1151)
NB100 | 4 | Amplified | Brain metastasis, relapse | Yes |
NB128⁴ | 4 | Amplified | Bone marrow metastasis, diagnosis | Yes | Yes (HS1149)
NB153⁴ | 4 | Amplified | Primary tumor, post-chemotherapy | | Yes (HS1241)
NB121 | 4 | Amplified | Bone marrow metastasis, diagnosis | | Yes (HS1593)
FS210 | Normal | Single copy | Neural crest stem cell-like SKPs | | Yes (HS1042)
FS248 | Normal | Single copy | Neural crest stem cell-like SKPs | | Yes (HS1043)
FS253 | Normal | Single copy | Neural crest stem cell-like SKPs | | Yes (HS1150)
FS225 | Normal | Single copy | Neural crest stem cell-like SKPs | Yes |
FS227-P1 | Normal | Single copy | Neural crest stem cell-like SKPs | Yes |
FS227-P2 | Normal | Single copy | Neural crest stem cell-like SKPs | Yes |
FS229 | Normal | Single copy | Neural crest stem cell-like SKPs | Yes |
FS230 | Normal | Single copy | Neural crest stem cell-like SKPs | Yes |

Table 3.2 List of RNA sequencing libraries and their sequencing statistics

Messenger RNA from NBL TICs, SKPs, and a compendium of cancer tissues were sequenced on an Illumina Genome Analyzer. The reads were aligned to the human reference genome build hg18 (National Center for Biotechnology Information Build 36) and a database of exon junctions [149] using MAQ software version 0.7.1 in paired-end mode [309]. The duplicate reads were retained for this analysis.
The median read length for each library is provided in column 3, and the total amount of aligned sequence is provided in column 4. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.

Library ID | Tissue source | Median read length, bp | Aligned sequence, bp
HS0382 | Neuroblastoma TICs | 42 | 6,739,934,150
HS0627 | Neuroblastoma TICs | 36 | 3,985,563,940
HS0499 | Neuroblastoma TICs | 42 | 2,476,237,136
HS0502 | Neuroblastoma TICs | 42 | 2,215,675,756
HS1040 | Neuroblastoma TICs | 50 | 3,729,063,600
HS1041 | Neuroblastoma TICs | 50 | 4,313,933,000
HS1149 | Neuroblastoma TICs | 75 | 5,822,325,000
HS1151 | Neuroblastoma TICs | 50 | 3,374,707,500
HS1241 | Neuroblastoma TICs | 50 | 3,214,430,500
HS1593 | Neuroblastoma TICs | 50 | 5,252,354,800
HS1042 | SKPs | 50 | 4,209,705,900
HS1043 | SKPs | 50 | 4,453,217,800
HS1151 | SKPs | 50 | 3,374,707,500
HS0299 | Breast cancer cell line | 36 | 1,119,888,180
HS0327 | Ovarian tumor | 42 | 1,415,520,768
HS0419 | Breast cancer cell line | 36 | 1,177,457,184
HS0445 | Breast cancer cell line | 36 | 1,828,856,952
HS0462 | Ovarian tumor | 36 | 1,915,337,292
HS0463 | Ovarian tumor | 39 | 856,303,704
HS0464 | Ovarian tumor | 42 | 868,958,028
HS0465 | Ovarian tumor | 42 | 1,058,404,872
HS0466 | Ovarian tumor | 42 | 1,295,742,888
HS0467 | Ovarian cancer cell line | 39 | 775,075,260
HS0468 | Ovarian tumor | 36 | 2,698,323,072
HS0469 | Ovarian tumor | 42 | 1,374,384,648
HS0470 | Ovarian tumor | 36 | 2,285,498,720
HS0471 | Ovarian tumor | 42 | 1,393,590,120
HS0511 | Breast tumor | 36 | 6,236,709,588
HS0644 | Lymphoma | 36 | 5,816,275,584
HS0652 | Lymphoma | 36 | 2,332,176,480
HS0663 | Lung tumor | 42 | 283,413,396
HS0701 | Ovarian tumor | 46 | 1,074,023,336
HS0702 | Ovarian tumor | 50 | 1,830,544,072
HS0703 | Ovarian tumor | 36 | 1,422,457,368
HS0706 | Lung tumor | 36 | 1,459,970,088
HS0708 | Ovarian tumor | 36 | 1,760,040,532
HS0709 | Oligodendroglioma cell line | 36 | 2,741,182,488
HS0724 | Blood from a cancer patient | 42 | 2,119,757,136
HS0727 | Lung tumor | 42 | 2,421,817,104
HS0728 | Lung tumor | 42 | 4,112,016,636
HS1085 | Oligodendroglioma tumor | 50 | 5,024,879,600
HS1086 | Oligodendroglioma tumor | 50 | 6,907,718,400
HS1400 | Metastatic adenocarcinoma tumor | 50 | 12,806,918,200

Table 3.3 Proteins detected in the whole and crude membrane cell extract of NBL TIC line NB88R and their corresponding RNA-Seq expression level

Protein detection was done as described in Methods. Briefly, at least two peptides per protein were identified with a probability threshold of 95% or greater (membrane-enriched fraction) or 90% or greater (whole cell lysate) as derived by the Peptide Prophet algorithm [310], and an overall protein identity of >95.0% or >90%, derived using the Protein Prophet algorithm [311], was required for the membrane-enriched fraction or whole cell lysate, respectively. In other words, the 95% CI cutoff for the membrane-enriched fraction represents the 95% or greater likelihood of each protein being identified correctly. Similarly, the lowered threshold of 90% CI used for the whole cell lysate represents 90% or greater likelihood of each protein being identified correctly. The threshold was lowered for whole cell lysate analysis due to the lower sensitivity of this assay for protein identification [314]. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra.
System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.

NBL TIC-enriched gene | Average expression level in NBL TICs | Protein product type | Membrane-Enriched Fraction (95% CI) | Whole Cell Lysate (90% CI)
HNRNPU | 998 | Transporter | Yes | Yes
SFXN1 | 140 | Transporter | Yes | Yes
KPNB1 | 652 | Transporter | Yes | Yes
NUP210 | 605 | Transporter | | Yes
NUP214 | 231 | Transporter | Yes |
SLC7A6 | 232 | Transporter | Yes |
XPO5 | 212 | Transporter | Yes |
NUP107 | 107 | Transporter | Yes |
SLC1A4 | 279 | Transporter | Yes |
NUP88 | 98 | Transporter | Yes |
HNRNPM | 364 | Transmembrane receptor | Yes | Yes
HNRNPD | 379 | Transcription regulator | Yes | Yes
FUBP1 | 271 | Transcription regulator | | Yes
HMGB2 | 306 | Transcription regulator | | Yes
TAF15 | 278 | Transcription regulator | | Yes
SFRS2 | 505 | Transcription regulator | | Yes
GTF2I | 129 | Transcription regulator | Yes |
HTT | 531 | Transcription regulator | Yes |
EPB41 | 229 | Plasma membrane protein | Yes |
PSME3 | 287 | Peptidase | | Yes
USP10 | 118 | Peptidase | Yes |
LMNB1 | 292 | Other | Yes | Yes
SFRS1 | 526 | Other | Yes | Yes
TMPO | 304 | Other | Yes | Yes
HNRNPH1 | 628 | Other | Yes | Yes
CPSF6 | 189 | Other | Yes | Yes
HNRNPR | 257 | Other | Yes | Yes
SFPQ | 588 | Other | Yes | Yes
IMMT | 151 | Other | Yes | Yes
SSRP1 | 369 | Other | Yes | Yes
NUP93 | 121 | Other | | Yes
PCNA | 229 | Other | | Yes
CYFIP2 | 535 | Other | Yes |
CEP72 | 17 | Other | Yes |
NOLC1 | 285 | Other | Yes |
LARP1 | 750 | Other | Yes |
STRBP | 134 | Other | Yes |
ANKRD44 | 76 | Other | Yes |
CLN6 | 76 | Other | Yes |
WDR77 | 117 | Other | Yes |
MKI67 | 1253 | Other | Yes |
SUPT16H | 268 | Nuclear protein | Yes | Yes
RFC5 | 57 | Nuclear protein | Yes |
PARP1 | 368 | Enzyme | Yes | Yes
NNT | 212 | Enzyme | Yes | Yes
MCM7 | 374 | Enzyme | Yes | Yes
FH | 95 | Enzyme | Yes | Yes
ATIC | 227 | Enzyme | Yes | Yes
MTHFD1 | 188 | Enzyme | Yes | Yes
PAICS | 357 | Enzyme | Yes | Yes
MCM2 | 323 | Enzyme | Yes | Yes
MCM6 | 188 | Enzyme | Yes | Yes
TRAP1 | 242 | Enzyme | Yes | Yes
GART | 205 | Enzyme | Yes | Yes
MCM4 | 390 | Enzyme | Yes | Yes
GOT2 | 202 | Enzyme | Yes | Yes
KARS | 203 | Enzyme | Yes | Yes
MCM3 | 346 | Enzyme | Yes | Yes
RRM2 | 425 | Enzyme | Yes | Yes
MARS | 266 | Enzyme | Yes | Yes
UBE2N | 119 | Enzyme | Yes | Yes
LBR | 299 | Enzyme | | Yes
TOP2A | 586 | Enzyme | Yes |
MRPL37 | 123 | Enzyme | Yes |
SCLY | 70 | Enzyme | Yes |
DARS2 | 79 | Enzyme | Yes |
DHTKD1 | 133 | Enzyme | Yes |
POLR1A | 195 | Enzyme | Yes |
RFC3 | 99 | Enzyme | Yes |
FEN1 | 133 | Enzyme | Yes |
MCCC1 | 71 | Enzyme | Yes |
TARS2 | 71 | Enzyme | Yes |
GPHN | 77 | Enzyme | Yes |
RRM1 | 218 | Enzyme | Yes |
SUPV3L1 | 59 | Enzyme | Yes |

Table 3.4 Known drug targets among NBL TIC-enriched genes

Transcripts enriched in NBL TICs are in bold. The drug-target associations were obtained from the Ingenuity Knowledgebase (Ingenuity Systems, www.ingenuity.com). The drugs previously or currently used in NBL (based on literature review, Ingenuity Knowledgebase, or ClinicalTrials.gov; http://www.clinicaltrials.gov/ as of February, 2010) are underlined. Adapted by permission from the American Association for Cancer Research: O. Morozova, M. Vojvodic, N. Grinshtein, L.M. Hansford, K.M. Blakely, A. Maslova, M. Hirst, T. Cezard, R.D. Morin, R. Moore, K.M. Smith, F. Miller, P. Taylor, N. Thiessen, R. Varhol, Y. Zhao, S. Jones, J. Moffat, T. Kislinger, M.F. Moran, D.R. Kaplan, M.A. Marra. System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin Cancer Res. 2010 Sep 15;16(18):4572-82.
Gene symbol | Drug
ADORA2A | Caffeine-containing drugs, adenosine, istradefylline, dyphylline, binodenoson, regadenoson, aminophylline, clofarabine, theophylline
AURKB | AZD-1152
PLK1 | BI2536
PDE7A | Dyphylline, nitroglycerin, aminophylline, anagrelide, milrinone, dipyridamole, tolbutamide, theophylline, pentoxifylline
TYMS | Flucytosine, plevitrexed, nolatrexed, capecitabine, floxuridine, LY231514, 5-fluorouracil, trifluridine
PRIM1 | Fludarabine phosphate
POLE3 | Gemcitabine
RRM1 | Fludarabine phosphate, gemcitabine, clofarabine
RRM2 | Triapine, hydroxyurea, fludarabine phosphate, gemcitabine
PARP1 | INO-1001
GART | LY231514
POLE | Nelarabine, gemcitabine, clofarabine, trifluridine
TOP2A | Novobiocin, CPI-0004Na, pixantrone, elsamitrucin, AQ4N, BN 80927, tafluposide, norfloxacin, tirapazamine, TAS-103, gatifloxacin, valrubicin, gemifloxacin, nemorubicin, nalidixic acid, epirubicin, daunorubicin, etoposide, doxorubicin, moxifloxacin, becatecarin, mitoxantrone, dexrazoxane
BCL2 | Oblimersen, (-)-gossypol, obatoclax, G3139
SLC1A4 | Riluzole
ODC1 | Tazarotene, eflornithine
IL6 | Tocilizumab
ERBB2 | Trastuzumab, BMS-599626, ARRY-334543, XL647, CP-724714, HKI-272, lapatinib, erlotinib
HDAC1 | Tributyrin, PXD101, pyroxamide, MGCD0103, FR901228, vorinostat
ITGA2B | Abciximab, TP-9201, eptifibatide, tirofiban
TNF | Adalimumab, infliximab, CDP870, golimumab, thalidomide, etanercept
NR3C1 | Corticosteroid-containing drugs (beclomethasone dipropionate)
CXCL10 | MDX-1100
TERT | GRN163L
TUBA1C | Colchicine/probenecid, XRP9881, E7389, AL-108, EC145, NPI-2358, milataxel, TTI-237, vinflunine, podophyllotoxin, colchicines, epothilone B, TPI 287, docetaxel, vinorelbine, vincristine, vinblastine, paclitaxel, ixabepilone
CDC2 | Flavopiridol
COL14A1 | Collagenase
TNFRSF10B | CS-1008
FYN | Dasatinib
ALAD | δ-Aminolevulinic acid

Chapter 4: Whole genome characterization of primary neuroblastoma tumors reveals a wide spectrum of somatic alteration (footnote 4)

4.1 Introduction

In Chapters 2 and 3 of this thesis, I
reported on the analysis of the expression profiles of normal and malignant neural crest stem cell-like cells, respectively. These analyses revealed a number of genes and pathways, such as those involved in DNA double-stranded break repair, to be aberrantly expressed in metastases-derived NBL TICs, and implicated Aurora kinase B as a novel drug target against NBL TICs (Chapter 3). Exon-level analysis of RNA-Seq data described in Chapter 3 provided a potential mechanistic avenue to account for the sensitivity of NBL TICs, but not normal cells, to AURKB inhibition. Subsequent work by others has confirmed Aurora kinase B to be a drug target against NBL in primary tumors [306]. The overall objective of Chapter 4 is to conduct a high resolution characterization of a panel of primary NBL tumors using next-generation sequencing approaches with a goal of identifying additional drug targets for the disease that are relevant to primary tumors at diagnosis. In particular, in this Chapter I address two specific aims. First, I address whether primary NBL tumors harbor recurrently mutated genes. Second, I investigate whether the genetic aberrations found in primary tumors recurrently target the same signaling pathways. To accomplish these aims, we developed a strategy that uses a combination of next-generation sequencing approaches to comprehensively characterize 99 primary NBL tumor DNA samples and matched peripheral blood DNA samples used as normal reference material (Figure 4.1). We used Illumina whole exome sequencing to characterize 81 tumor/normal pairs; Illumina whole genome and transcriptome sequencing to characterize 10 tumor/normal pairs; and Complete Genomics, Inc. (CGI) whole genome sequencing to characterize another 10 tumor/normal pairs. Among these samples, we included one case that was studied by both whole exome and whole genome sequencing using Illumina, and another case that was studied by whole genome sequencing using both CGI and Illumina. The Illumina and CGI sequencing technologies are discussed in Section 1.6.3.1.

This study reports on the application of second-generation sequencing to the characterization of high-risk NBL. The 99 NBL cases included in this analysis were collected and characterized as part of the Therapeutically Applicable Research to Generate Effective Targets (TARGET) initiative (http://target.cancer.gov). The TARGET initiative is a pediatric branch of The Cancer Genome Atlas, and was designed to identify molecular targets for pediatric cancer drug development. All sequence data have been deposited in dbGaP (http://www.ncbi.nlm.nih.gov/gap) and six-letter codes are used to identify individuals in this database throughout the text. The clinical details of the cases are listed in Appendix D.

Footnote 4: A version of the Chapter is in revision, and the co-author contributions are detailed in the Preface as per the University of British Columbia PhD thesis guidelines: T.J. Pugh*, O. Morozova*, E.F. Attiyeh, S. Asgharzadeh, J.S. Wei, D. Auclair, K. Cibulskis, M.S. Lawrence, A.H. Ramos, E. Shefler, A. Sivachenko, C. Sougnez, I. Birol, R.D. Corbett, K.L. Mungall, Y. Zhao, R.A. Moore, N. Thiessen, A. Lo, R. Chiu, S.D. Jackman, A. Ally, B. Kamoh, A. Tam, J. Qian, M. Krzywinski, M. Hirst, S.J. Diskin, Y.P. Mosse, K.A. Cole, M. Diamond, R. Sposto, L. Ji, T. Badgett, W.B. London, Y. Moyer, J.M. Gastier-Foster, M.A. Smith, J.M. Guidry Auvil, D.S. Gerhard, M.D. Hogarty, S.B. Gabriel, S.J.M. Jones, G. Getz, R.C. Seeger, J. Khan, M.A. Marra, M. Meyerson, J.M. Maris. The genomic landscape of high-risk neuroblastomas reveals a wide spectrum of somatic mutation. In Revision. *Authors contributed equally.

4.2 Results

4.2.1 Exome sequencing

Exome sequencing was used to survey the frequency of coding sequence mutations in 81 high-risk NBL tumor/normal pairs, one of which was also included in the set of 19 whole genome sequences described below (Figure 4.1).
DNA was extracted and amplified, and ~33 Mb of genomic sequence was captured by in-solution hybridization [315] followed by Illumina sequencing [316]. The target regions consisted of 193,094 exons from 18,863 genes annotated by the Consensus Coding Sequences [317] and RefSeq [318] databases as coding for protein or micro-RNA (accessed November 2010). On average, 10.7 Gb of unique sequence data were generated for each sample, of which 58% were aligned to the target exome using Burrows-Wheeler Aligner [319] (84% if bases within 250 bp of each target were included), resulting in median coverage of 191X of each on-target base. On average, 90% of targeted bases were suitable for mutation detection (> 14 reads in the tumor and > 8 reads in the normal) using the MuTect algorithm [108,320,91]. A total of 14% of exons had fewer than 90% of bases assessable for mutation in at least 73 of 81 exome pairs (90%), apparently due to systematic capture or sequencing problems related to GC-content.

4.2.2 Whole genome and transcriptome sequencing

Genome sequencing of tumor and matched normal DNA was used to explore the spectrum of somatic sequence and structural aberration present in 19 high-risk neuroblastoma cases. To survey the fraction of rearrangements that are expressed in the transcriptome, we also generated over 10 Gb of RNA-Seq data in 10 out of 19 cases. To account for potential biases imposed by the new sequencing platforms, we used two different genome sequencing approaches, Illumina [316] (10 cases) and CGI [321] (10 cases); one case, PASLGS, was sequenced using both methods. Ten tumor and normal genomes were sequenced to an average 29.7X haploid coverage using Illumina technology, while another set of ten tumor and normal genomes was sequenced to an average 59.9X haploid coverage using CGI technology.
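The depth criterion quoted for the exome data in Section 4.2.1 (more than 14 reads in the tumor and more than 8 reads in the normal) determines which bases count as assessable for mutation detection. The following is a minimal sketch of how such a fraction could be computed from per-base depths. The function name, the list-based input, and the toy depths are illustrative assumptions and are not taken from the MuTect implementation.

```python
from typing import Sequence


def callable_fraction(
    tumor_depth: Sequence[int],
    normal_depth: Sequence[int],
    min_tumor: int = 14,   # threshold from the text: > 14 reads in the tumor
    min_normal: int = 8,   # threshold from the text: > 8 reads in the normal
) -> float:
    """Return the fraction of positions that pass both depth thresholds.

    Both inputs are per-base read depths over the same targeted positions
    (hypothetical input format chosen for this sketch).
    """
    if len(tumor_depth) != len(normal_depth):
        raise ValueError("tumor and normal depth vectors must have equal length")
    if not tumor_depth:
        return 0.0
    passing = sum(
        1
        for t, n in zip(tumor_depth, normal_depth)
        if t > min_tumor and n > min_normal
    )
    return passing / len(tumor_depth)


# Toy example with four positions; only the first and last pass both thresholds.
example_fraction = callable_fraction([15, 14, 30, 200], [9, 20, 8, 50])
```

Using strict inequalities mirrors the wording of the criterion; a position with exactly 14 tumor reads or exactly 8 normal reads is therefore treated as not assessable in this sketch.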
The coverage in the Illumina-derived data set permitted single nucleotide variant (SNV) detection at over 86% of positions in the reference genome (hg19), and over 74% of the coding sequence, as defined by the exome experiment. Similarly, the average coverage of the CGI data allowed for the SNV analysis of 86% of the reference genome, and 94% of the coding sequence. The differences in average coverage achieved by the CGI and Illumina platforms could be attributed to the different depth of sequencing used in each approach (average 29.7X and average 59.9X for the Illumina and CGI platforms, respectively). The analysis of the common case, PASLGS, revealed that the somatic non-silent mutation rates per Mb of coding sequence computed separately for CGI and Illumina data were 0.58 and 0.66, respectively, and the two methods detected 9 somatic exomic mutations in common. As both exome and genome data were generated for PANYGR using Illumina technology, we had the opportunity to compare variants detected using both approaches on a single sample. The mutation rates in the coding region derived from these data sets were 0.59 and 0.65 non-silent mutations per Mb, in the exome and genome respectively, and the two methods detected 27 somatic exomic mutations in common. One additional variant called from the exome data was not detected by the genome analysis due to low coverage in the tumor genome at this position (4X), although the mutant allele was supported by one read. As the two methods were concordant for all somatic exomic mutations at positions with adequate coverage, we combined and directly compared mutation calls from exome and genome data sets for subsequent analyses.

4.2.3 Overall mutation frequencies

Across the coding regions of 99 tumor/normal pairs (80 exome, 18 genome, 1 exome and genome), we detected 2,500 candidate somatic mutations in 2,105 genes (Appendix E).
A median of 20 candidate exomic mutations was found per tumor (range 3-236), of which 16 were predicted to cause an amino acid or splicing change (non-silent mutations) (range 3-171) (Figure 4.2A). This corresponded to a median non-silent mutation frequency of 0.56 mutations per Mb, after correction for the number of bases with sufficient data for mutation detection (Figure 4.2A). This is one of the lowest median mutation frequencies reported to date [116,322], and is consistent with the similar frequency of 0.4 non-silent mutations/Mb recently reported for medulloblastoma [99], another pediatric solid tumor. Synonymous mutations were relatively few compared to non-silent changes, suggesting a low rate of putative passenger events. We did not observe a correlation between mutation frequency and age at diagnosis, MYCN amplification status, or other prognostically relevant clinical or genomic variables (q > 0.015). The rates of transitions (substitutions that change purines to purines or pyrimidines to pyrimidines) and transversions (substitutions that change purines to pyrimidines or vice versa) in NBL differed greatly from those found in cancers with known environmental contributions. For instance, over 90% of mutations in melanoma are C>T/G>A transitions associated with ultraviolet light exposure [112] while smoking-associated C>A/G>T transversions make up 46% of mutations in small cell lung cancer [110]. By comparison, the C>T/G>A transitions and C>A/G>T transversions comprised 29% and 36% of all mutations in NBL, respectively. These rates were consistent with current hypotheses of limited environmental contribution to NBL development [174]. While the mutation rate was low across most tumors, 2 tumors had markedly increased non-silent mutation rates (greater than the third quartile plus 3.5 times the interquartile range, i.e. Q3 + 3.5*IQR) (Figure 4.2).
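The outlier rule stated above (a rate greater than Q3 + 3.5*IQR) can be sketched in a few lines. This is an illustrative sketch only: the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) and the toy mutation rates are assumptions, since the text does not specify how quartiles were computed, and different conventions can shift the threshold slightly.

```python
import statistics
from typing import List, Sequence, Tuple


def flag_hypermutated(
    rates: Sequence[float], k: float = 3.5
) -> Tuple[float, List[int]]:
    """Return the outlier threshold Q3 + k*IQR and the indices of rates above it.

    `rates` holds one non-silent mutation rate (mutations per Mb) per tumor.
    The quartile method is an assumption made for this sketch.
    """
    q1, _median, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    flagged = [i for i, rate in enumerate(rates) if rate > threshold]
    return threshold, flagged


# Hypothetical per-tumor rates; the last tumor is far above the rest.
toy_rates = [0.3, 0.4, 0.5, 0.56, 0.6, 0.7, 0.8, 6.0]
toy_threshold, toy_flagged = flag_hypermutated(toy_rates)
```

With a multiplier of 3.5, only samples far beyond the bulk of the distribution are flagged, which matches the stated intent of identifying extreme outliers rather than mild ones.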
This threshold for outliers (hypermutated samples) was selected to identify extreme outliers, and is more stringent than that used in the TCGA study that reported on hypermutated glioblastoma multiforme (GBM) tumors [91]. Both hypermutated tumors contained alterations that may explain accumulation of somatic mutations. Specifically, PAPPKJ harbored a deletion of one copy and a nonsense mutation of the other copy of the DNA mismatch-repair gene MLH1, likely resulting in a complete loss of this protein. Similarly, PALJPX contained a heterozygous nonsense mutation in the DNA nucleotide excision repair (NER) gene DDB1 (damaged DNA-binding protein 1), which is involved in maintaining genome integrity and preventing the accumulation of DNA lesions in replicating cells [323]. The loss of the fission yeast DDB1 ortholog ddb1 results in a hypermutator phenotype in yeast [324]. In addition, the knockdown of the Drosophila DDB1 ortholog D-DDB1 in wing imaginal discs produces a genome instability phenotype in somatic cells [325]. These observations suggest that the hypermutator phenotype in the NBL PALJPX case may be explained by the heterozygous nonsense mutation in DDB1. However, as the yeast experiments were conducted in haploid organisms, there is no direct evidence that the haploinsufficiency for DDB1 observed in the NBL tumor is sufficient to drive hypermutation. No nonsense mutations in DDB1 have been reported in the COSMIC database as of February, 2012 [326]. The finding of hypermutation in NBL represents a possible new subtype of this disease and future studies will define if there are unique clinical features associated with this aberration.

4.2.4 Verification of candidate somatic mutations using orthogonal approaches

A total of 438 candidate somatic SNVs identified by whole genome sequencing were selected for verification by Sanger and/or Illumina sequencing, performed at CGI.
Three hundred and seventy-nine variants were confirmed using at least one of these approaches, corresponding to a 90% verification rate when failed assays were accounted for. In addition, 224 candidate somatic mutations were selected for verification with Sequenom genotyping [327], conducted at the Broad Institute. The Sequenom method is based on distinguishing allele-specific primer extension products by mass spectrometry, and was originally developed for germline genotyping [328]. One hundred and sixteen of the 224 sites (52%) were confirmed by this approach. Two main reasons may account for the low verification rate of the Sequenom experiment as compared to the sequencing verification experiment conducted at CGI. First, since the Sequenom assay was originally developed for germline analysis, it is poor at detecting mutations present at low allelic frequencies [116], as may be the case in heterogeneous cancer samples. Second, whole genome amplification reactions, which were used in the exome sequencing experiment, may have introduced artifacts, resulting in a higher fraction of false positives in these data compared to genome sequencing data, which did not involve whole genome amplification. All somatic mutations that are explicitly described in the text have been verified, and the methods used to confirm each mutation are listed in Table 4.1.

4.2.5 Genes and pathways with significant frequency of mutation

We identified 8 genes mutated at a significant frequency (q < 0.2) in the 99 tumors using the MutSig algorithm [329] (Table 4.2). Of these, only five genes remained significant (q < 0.2) when the hypermutated samples were excluded from the analysis (Table 4.2). The MutSig algorithm tests the null hypothesis that all the observed mutations in each gene are a consequence of random background mutation.
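To make the logic of a background-mutation null hypothesis and a q-value cutoff concrete, the sketch below tests each gene against a uniform background rate and then applies a Benjamini-Hochberg adjustment. It is a deliberately simplified illustration and is not the MutSig implementation, whose background model is more elaborate. The Poisson approximation, the function names, and the idea of supplying an expected mutation count per gene are all assumptions made for this example.

```python
import math
from typing import List, Sequence


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam): chance of at least k background mutations."""
    if k <= 0:
        return 1.0
    cumulative = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return max(0.0, 1.0 - cumulative)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


def significant_genes(
    observed: Sequence[int], expected: Sequence[float], q_cutoff: float = 0.2
) -> List[int]:
    """Indices of genes whose q-value falls below the cutoff.

    `expected[i]` is the hypothetical background expectation for gene i
    (for example, background rate x covered bases x number of tumors).
    """
    pvalues = [poisson_sf(k, lam) for k, lam in zip(observed, expected)]
    qvalues = benjamini_hochberg(pvalues)
    return [i for i, q in enumerate(qvalues) if q < q_cutoff]


toy_q = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
```

In this simplified setting, a gene with many more mutations than its background expectation receives a small p-value, and the adjustment then controls the expected share of false discoveries among the genes reported.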
Genes for which this hypothesis is rejected based on the Benjamini-Hochberg (BH) false discovery rate-corrected q-value are considered significantly mutated [329], such that mutations in these genes likely contribute to the malignant phenotype. Due to the very low mutation rate in NBL, we chose the BH false discovery rate-based q-value threshold of 0.2, which implies that we accepted an expected false positive proportion of up to 20% among the MutSig hits (1 in 5 MutSig hits). While mutations in ALK and PTPN11 were previously known in NBL [188,194,193,192,330], the remaining 6 candidate genes are newly reported in this disease. Using the available RNA-Seq data from 10 cases as well as published RNA-Seq data from neural-crest-like cells [244], also discussed in Chapter 3, we determined that PGLYRP3, GABRA6 and IGSF11 were not expressed in either normal neural crest-like cells or NBL cells. Since a key goal of this study and the TARGET consortium is to identify genetic alterations that could potentially be targeted by drugs, we focused our analysis on four genes significantly mutated in non-hypermutated samples and expressed in NBL: ALK, PTPN11, LILRB1 and NRAS.

ALK was previously reported to be mutated in up to 10% of NBL cases [192,194,188,193], consistent with our unbiased screen here showing 9 cases with a somatic ALK mutation, all restricted to the kinase domain. A tenth case, PANYGR, harbored a germline variant also in the ALK kinase domain (clinically-associated dbSNP rs113994092, described as a pathogenic allele), heterozygous in tumor and normal samples from this patient. Mutations in the ALK oncogene were mutually exclusive with mutations in the other genes mutated at a statistically significant frequency and were independent of the MYCN amplification status (3 MYCN-amplified, 7 non-amplified).
Germline and somatic PTPN11 mutations have been reported in NBL [330] and both somatic mutations observed in this study were located at residues frequently mutated in juvenile myelomonocytic leukemia [331] and individuals with Noonan syndrome [332]. No pathogenic germline PTPN11 variants were found as part of the study. Another MutSig hit, LILRB1, is an inhibitory cell-surface immunoglobulin-like receptor that has been reported to limit the activation of the mTOR pathway through the activation of SHP2 [333], the protein encoded by PTPN11. Therefore, loss of SHP2 regulation by LILRB1 could in theory have a similar oncogenic effect as activating mutations of PTPN11, as such mutations could potentially lead to constitutive activation of RAS/MAPK signaling. The case PALTEG contained a LILRB1 splice site mutation that is predicted to disrupt the splice consensus sequence. Two additional mutations fell within the immunoglobulin domain: a nonsense alteration in PALZZV and a missense change in PANUKV. In addition to PTPN11 and LILRB1 that are upstream regulators of the RAS/MAPK pathway, five Cancer Gene Census genes in the MAPK signaling pathway (KEGG hsa04010) appeared to harbor somatic mutations in six tumors. These mutations were in receptor tyrosine kinases EGFR, NFKB2, NTRK1, PDGFRB, and the downstream target NRAS (2 tumors), which was also identified by MutSig as significantly mutated in our study. A recent report suggested that PTPN13 may also function in the MAPK pathway [334] and translocations or mutations of PTPN12 and PTPN13 were identified in 3 samples, albeit one of which was hypermutated. Overall, MAPK pathway oncogenes were mutated in 15% of high-risk NBL cases studied (Figure 4.2B).
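A pathway-level frequency such as the one reported above counts each tumor once, no matter how many pathway genes it has mutated. The sketch below illustrates that bookkeeping under stated assumptions: the sample identifiers, the (sample, gene) tuple format, and the gene set passed in are hypothetical placeholders and do not reproduce the calls listed in Table 4.1.

```python
from typing import Iterable, Set, Tuple


def pathway_frequency(
    mutations: Iterable[Tuple[str, str]],
    pathway_genes: Set[str],
    n_samples: int,
) -> Tuple[int, float]:
    """Count samples with at least one mutated pathway gene.

    `mutations` is an iterable of (sample_id, gene_symbol) pairs.
    Returns the number of affected samples and their fraction of the cohort.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    affected = {sample for sample, gene in mutations if gene in pathway_genes}
    return len(affected), len(affected) / n_samples


# Hypothetical calls: S1 carries two pathway mutations but is counted once.
toy_calls = [("S1", "NRAS"), ("S1", "PTPN11"), ("S2", "ALK"), ("S3", "EGFR")]
toy_count, toy_fraction = pathway_frequency(
    toy_calls, {"NRAS", "PTPN11", "EGFR"}, n_samples=20
)
```

Collapsing calls to a set of affected samples avoids inflating the pathway frequency when a single tumor carries several hits in the same pathway.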
Chromatin remodeling genes appeared to be frequently disrupted in NBL as 18 histone-modifying genes harbored coding somatic mutations: nonsense mutations in CREBBP, CHD8, KDM6A, MLL4; and missense mutations in EP300, ARID1A (2 tumors), ARID1B, ASH1L, CHD6, HDAC4, KDM5A, MLL3, MLL5, NUP98, PAX5, PRDM2, and PRDM4. These genes encode characterized histone acetyltransferases (CREBBP and EP300), DNA helicases (CHD6, CHD8, ARID1A), histone demethylases (KDM5A and KDM6A), histone methyltransferases (ASH1L, MLL3, MLL4, MLL5, PRDM2), histone deacetylase (HDAC4), and other proteins involved in chromatin remodeling (ARID1B, NUP98, PAX5, PRDM4). More than half (9/17) of the chromatin remodeling genes mutated in NBL were annotated in the Database for Annotation, Visualization, and Integrated Discovery (DAVID) [335] as positive regulators of transcription. Five chromatin remodeling genes were mutated in tumors with ALK mutations. Intriguingly, loss of function mutations in chromatin modifiers have been reported in lymphoma [102,106], bladder cancer [336], and other tumors [337]. Overall, a potential defect in chromatin remodeling was identified in 11% of high-risk NBL cases studied (Figure 4.2B).

4.2.6 Genome rearrangements and structural variants

We used the trans-ABySS de novo assembly pipeline for Illumina sequencing data [297] to search for expressed rearrangements affecting genes; each of these events was confirmed by local re-assembly of the genomic reads [338]. In parallel, the CGI structural variation pipeline was used to detect candidate structural variants in the CGI genomes [110]. In total, 83 distinct events affecting 97 genes were identified using the two approaches in the 19 neuroblastoma cases, including 22 expressed events found in the RNA-Seq data; and a median of 4 structural variants (0 to 14 events range) was detected per tumor genome.
The genomic architectures of 19 cases with available genome sequencing data are plotted in Figure 4.3 using CIRCOS [339]. The notable structural variants are summarized in Table 4.3. We found 4 distinct somatic translocations between chromosomal arms 11q and 17q that are commonly affected by numerical alterations in NBL [196]. These four somatic events occurred in three cases, PARGUX (2 events), PASCKI and PANNMS, and involved different genes and breakpoints in each case. Notably, one of the t(11;17) translocations in PARGUX is predicted to disrupt the function of IKZF3, an Ikaros DNA-binding protein 3 involved in chromatin remodeling, and previously implicated in chronic lymphocytic leukemia [340]. Another chromatin remodeling gene, ARID1B, was targeted by a somatic ~30 kb deletion that removed exon 2 in PASLGS, and appeared to be loss-of-function. Members of the MAPK pathway were also the target of somatic structural change: a MAPK10/PRDM5 fusion and a PRELID2/MAPK9 fusion, both resulting from intrachromosomal deletions but with unknown frame effects due to multiple transcript annotations for these genes. Other cancer genes affected by somatic structural variants included ABL2, which was fused out of frame with ACBD6 and harbored a somatic missense mutation in this study; STAG1, a p53 pathway member recently implicated as a target of translocations in several cancers [341,342]; cadherins CDH13 and CDH18; and NOP2 and AUTS2, both known translocation targets in acute lymphoblastic leukemia [343]. Two loci were affected recurrently by structural variants in two cases: the transcriptional repressor ZFHX3 (ATBF1) that has been shown to function as a tumor suppressor in several cancers [344] and the CDK5 regulatory subunit associated protein 1 CDKAL1. Neither of these genes has been implicated in NBL, and their potential role in this disease warrants further investigation.
The NBAS (neuroblastoma amplified sequence) locus located 0.4 Mb from MYCN appeared to be most commonly affected by rearrangements in our cohort of MYCN-amplified cases, harboring 11 distinct rearrangements in three cases PASDZJ, PARSHT, and PARIRD. The NBAS-rearranged cases were associated with an increased copy number at the NBAS locus, and with more than 2-fold increase of the NBAS mRNA compared to the wild type NBAS cases, consistent with the previous observation that a fraction of MYCN-amplified cases involves co-amplification of NBAS.

4.2.7 Mutations in other known cancer genes and regions

Beyond ALK, PTPN11, and NRAS, no cancer genes listed in the Cancer Gene Census [7] had mutation frequencies that rose to the level of statistical significance (q < 0.2). In addition to the mutations in gene sets noted above, 14 genes listed in the Cancer Gene Census were mutated across 12 samples (Table 4.1; Figure 4.2B). Mutations in 2 of these genes, ATM and PIK3CA, matched a mutation listed in COSMIC [100]. Two MYC family members, MYC and MYCN, were mutated in two NBL tumors lacking MYCN amplification. The SIFT algorithm [345] predicted both variants to be deleterious (score < 0.01) and the MYCN mutation was previously reported in glioblastoma [346], suggesting that it may confer selective advantage to malignant cells. We also detected and validated a fusion of MYCN with GULP1 that retained the reading frame and may be activating. Therefore, it appears that several mechanisms exist to promote MYC signaling in NBL beyond amplification. Among the genes that harbored protein sequence-altering mutations in two or more non-hypermutated cases, there were several genes that mapped to known chromosomal regions frequently altered somatically and of clinical significance [197]. Known chromosomal regions of clinical significance in NBL are described in detail in Section 1.9.2.1 and include losses of chromosomal arms 1p and 11q, and gains of chromosomal arm 17q.
Out of the 99 cases analyzed in this study, 42 cases harbored a loss of 1p, 49 cases harbored a loss of 11q, and 63 cases harbored a gain of 17q (Appendix D). A single gene mapping to the 1p31-1p36 common deletion region, ARID1A, harbored somatic non-synonymous mutations in two non-hypermutated cases. The ARID1A locus has been implicated as a tumor suppressor in several adult and pediatric solid tumors by both genomic and functional evidence [347,348]. Even though the two cases with non-silent ARID1A mutations, PALXHW and PALNLU, each had two copies of 1p, both ARID1A mutations (Table 4.1) were missense homozygous changes predicted to be damaging to the protein (score < 0.05) by the SIFT algorithm [349]. The genes CATSPER1, AHNAK, PITPNM1, and SORL1, mapping to the 11q region commonly deleted in NBL, were each mutated in two non-hypermutated cases. All six cases with mutations in at least one of these genes harbored a loss of the chromosomal arm 11q. These cases included PALHVD, PAINLH, PAPBZI, PAKFUY, PALFPI, and PALSAE (Appendix D). Consistent with losing one copy of the 11q arm, all candidate mutations in CATSPER1, AHNAK, PITPNM1, and SORL1 detected in this study were homozygous. While CATSPER1 and PITPNM1 have not been reported previously to play a role in NBL or cancer in general, the loss of AHNAK is seemingly associated with the radiosensitivity of NBL cells [350], and SORL1 may be involved in the proliferation of NBL cell lines [351]. In addition, SCN4A, CRHR1, ABCA5, and IGF2BP1 mapped to 17q, commonly gained in high-risk NBL. None of these loci have been previously implicated in NBL. Finally, several genes on 11q (FAM86C1, RNF121, ATG2A and SHANK2) and 17q (IKZF3, TRIM37 and BCAS3) appeared to be affected by the t(11;17) translocations. None of these genes were affected by the translocations recurrently.
All three cases with translocations between chromosomal arms 17q and 11q (PASCKI, PARGUX and PANNMS) harbored concurrent losses of 11q and gains of 17q, suggesting that the unbalanced t(11;17) translocations may account for gains of 17q and losses of 11q in a fraction of NBL tumors. As described in Section 1.9.2, a GWAS has been conducted in NBL and implicated common germline variants in FLJ22536, BARD1 and LMO1 in susceptibility to sporadic high-risk NBL [197,66,198,196]. The current study did not detect somatic non-silent variants or somatic recurrent non-coding variants in any of these loci in our cohort of 99 NBL cases. However, we did observe that all 10 cases with available matched tumor and normal genome and tumor transcriptome sequencing data (Appendix D) had novel germline variants (single nucleotide substitutions) in BARD1 that were not reported in the 1000 Genomes Project data [352] or in any samples from non-cancerous tissues sequenced at the Genome Sciences Centre [353]. These variants may be related to the aberrant splicing observed at the BARD1 locus (Section 3.2.7).

4.3 Discussion

The survey of somatic mutations in primary NBL tumors described here found this cancer to have one of the lowest mutation frequencies among solid tumors examined to date, similar to that of another pediatric cancer, medulloblastoma [99]. The mutations identified in our study were distributed across a large number of genes, as 88% of genes with non-silent mutations were mutated in only 1 of the 99 tumors studied. In addition, non-silent mutations were seen four times more often than silent mutations (1,735 non-silent versus 420 synonymous mutations in non-hypermutated samples), suggesting a selective pressure for coding changes. This is unlike most adult cancers, where passenger mutations are much more frequent than driver mutations [354].
Presumably, the low passenger mutation rate observed in NBL reflects less environmental influence in this cancer compared to adult malignancies. This is consistent with NBL typically arising at a very young age, with most cases diagnosed before 5 years of age [174].

The genome sequence analysis described in this Chapter has been able to identify candidate mechanisms involved in 51 of 99 neuroblastomas (ALK mutations, MAPK pathway oncogene mutations, mutations in chromatin remodeling genes, mutations in MYC family genes, mutations in Cancer Gene Census genes, as highlighted in Figure 4.2). While sequencing of more NBL cases will provide increased power to discover additional recurrent somatic events, the relative paucity of focal mutations discovered here challenges the general concept that druggable targets and pathways can be defined in each patient by sequencing approaches alone, at least in the somatic mutation space.

Nonetheless, our data address the overall objective of this Chapter and identify common vulnerabilities of primary NBL tumors that may be exploited therapeutically. For instance, a subset of NBL patients may be sensitive to the inhibition of ALK (9% of patients with high-risk NBL) and MAPK signaling (15% of patients with high-risk NBL); and strategies that target these pathways can be immediately prioritized for clinical development due to the known activating role of the mutations in these pathways. In contrast, chromatin remodeling abnormalities, found in 11% of patients with high-risk NBL, need to be further investigated before they can be targeted clinically. In Chapter 3, we conducted an expression analysis of NBL TICs and found that the double-stranded break DNA repair pathway, involving the BRCA1/BARD1 complex, was expressed at a higher level in NBL TICs compared to normal neural crest-like cells and a panel of other cancers.
Aberrations in DNA repair may appear counter-intuitive, given the low mutation rate in NBL discussed in this Chapter. However, the observation is consistent with the results of sequencing studies in breast cancers, many of which also possess a defect in double-stranded break DNA repair [355]. In particular, albeit with some exceptions, breast cancers do not typically harbor an increased frequency of somatic point mutations as compared to other adult tumors, such as lung cancers or melanoma [354]. Instead, breast cancer genomes often harbor a high frequency of large chromosomal aberrations [356] that may be associated with either a deficiency of homologous recombination (e.g. loss of function of BRCA1 in familial breast cancer) or the hyperactivity of the BRCA1 signaling pathway through gain of function mutations in BRCA1, seen in other types of breast tumors [355]. Similarly, despite the low rate of somatic point mutations, NBL tumors display a high prevalence of large chromosomal alterations (chromosomal alterations affecting genes are shown in Figure 4.3), suggesting that aberrations in double-stranded break DNA repair may play a role in this disease. Increased expression of the BRCA1 pathway in NBL TICs further suggests that hyperactivity of this pathway may be a factor in NBL, similar to what is seen in some cases of breast cancer [357].

Interestingly, we did not observe any somatic non-silent mutations in the BRCA1 pathway members, including BARD1, shown to be aberrantly spliced in NBL cells in Chapter 3. We did, however, observe previously unreported intronic germline variants in BARD1 in all 10 NBL cases with available genome and transcriptome sequencing data. Since preliminary reports suggest that GWAS risk-alleles of BARD1, all occurring in introns [197], may be associated with aberrant splicing of this gene [196], we hypothesized that the novel germline variants observed at this locus by the current study may also play a role in BARD1 splicing.
Further functional work is needed to confirm these possibilities.

Since the majority of our data set comprised exome sequencing data, we do not exclude the possibility that non-coding (regulatory) mutations may explain the observed increased expression of the BRCA1 signaling pathway members discussed in Chapter 3.

4.4 Materials and methods

4.4.1 Sample selection and preparation

The study focused on high-risk NBL, and we attempted to reduce heterogeneity by restricting eligibility to subjects between 1.5 and 5.5 years of age at diagnosis (median 2.94 years) with stage 4 (high-risk metastatic) disease (Appendix D). There was a preponderance of male subjects (62%). All specimens were obtained at original diagnosis after informed consent at Children's Oncology Group (COG) member institutions. Thirty-four of the 99 tumors studied harbored amplification of the MYCN oncogene and 40 had a diploid DNA index (values of 1 in Appendix D). These two assays are routinely performed on all NBL samples in the COG NBL reference laboratory by fluorescence in situ hybridization and flow cytometry, respectively. Flash frozen tumor samples were analyzed for percent tumor content by histopathology prior to nucleic acid extraction, and samples with <75% tumor content were not included in this study. Tumor RNA and DNA were derived from fresh frozen primary NBL tissue and matched normal peripheral blood. All sequence data have been uploaded to dbGAP (http://www.ncbi.nlm.nih.gov/gap) and six-letter codes used to identify individuals in this database are referenced throughout the text.

4.4.2 Illumina library construction and sequencing

Genome and transcriptome libraries of the ten BCCA cases were constructed from input amounts of 2-4 µg DNA and 3-10 µg DNaseI-treated total RNA, respectively, following the previously described protocols [102,106]. The sequencing was carried out using Genome Analyzer IIx (GAIIx) (Illumina, Hayward, CA, USA) as per the manufacturer's instructions.
Paired-end reads generated from genome and transcriptome sequencing were aligned to the hg19 (GRCh37) reference human genome assembly using BWA version 0.5.7 [319]. RNA-Seq reads were processed as previously described in Section 3.4.1 and [244,149].

4.4.3 Detection of candidate somatic mutations in genome sequencing data

SNV detection in Illumina tumor genome and transcriptome data was performed using SNVMix2 with filtering to include SNVs such that the combined probability of either heterozygous or homozygous SNV was greater than 0.99 [358]. Reads flagged as poor quality according to the Illumina chastity filter, duplicate reads, and reads aligned with a mapping quality < 40 were excluded from SNV calling. The somatic status of SNV calls was determined using read evidence from the SAMtools version 0.1.13 pileup [312] constructed at the variant positions in the matched normal genome. Positions with normal genome coverage of at least 5 unique reads supporting the reference allele were considered somatic. The candidate somatic SNV calls were inspected using the Integrative Genome Browser [163], and only those calls confirmed by visual inspection were used in the analysis. Ten of these events, listed in Table 4.1, were validated using ultra-deep re-sequencing with read indexing as previously described [102]. The Pindel software was used as suggested by the authors to identify candidate short insertions from the tumor and normal genomic bam files [359]. The mean and standard deviation of read pair insert sizes were calculated for all samples to be ~400 bp, and this value was used in each Pindel run. The Pindel short insertion output was filtered to select events that mapped to annotated genes (Ensembl 59 [360]). Candidate somatic short insertion events that recurred in at least two cases were manually reviewed in the Integrated Genome Browser (Broad Institute).
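The SNV filtering criteria stated in this section can be summarized in a minimal Python sketch. It is an illustration only and not the pipeline used in the study: the `Read` and `SnvCall` records and all field names are hypothetical, and the genotype probabilities are assumed to have been parsed from SNVMix2 output beforehand. Only the thresholds (combined probability > 0.99, mapping quality of at least 40, at least 5 reference-supporting reads in the matched normal) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical per-read record; field names are illustrative."""
    mapping_quality: int
    is_duplicate: bool
    failed_chastity: bool


@dataclass
class SnvCall:
    """Hypothetical per-position record derived from SNVMix2 and a normal pileup."""
    p_heterozygous: float   # probability of a heterozygous SNV
    p_homozygous: float     # probability of a homozygous SNV
    normal_ref_reads: int   # unique reference-supporting reads in the matched normal


def read_is_usable(read: Read, min_mapq: int = 40) -> bool:
    # Exclude chastity-failed reads, duplicates, and reads with mapping quality < 40.
    return (not read.failed_chastity
            and not read.is_duplicate
            and read.mapping_quality >= min_mapq)


def is_confident_snv(call: SnvCall, min_prob: float = 0.99) -> bool:
    # Combined probability of a heterozygous or homozygous SNV must exceed 0.99.
    return call.p_heterozygous + call.p_homozygous > min_prob


def is_candidate_somatic(call: SnvCall, min_normal_ref_reads: int = 5) -> bool:
    # The text requires at least 5 reference-supporting reads in the normal;
    # it does not state how variant-supporting normal reads were handled,
    # so no such check is made here.
    return is_confident_snv(call) and call.normal_ref_reads >= min_normal_ref_reads
```

Calls passing `is_candidate_somatic` would still be subject to the visual inspection and validation steps described above.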
In addition, the SAMtools version 0.1.13 pileup and varFilter functions [312] were used to identify indels from the tumor and normal genomic alignment bam files. To detect candidate somatic indels, further filtering was done separately on normal and tumor libraries. In the normals, any event with a total coverage of less than 8 was discarded. In the tumor libraries, only the indels with (#indel reads/#total reads) >= 16% were considered. After the filtering, any indel present in one or more normal libraries was flagged as germline. None of the candidate somatic coding indels from the Pindel or SAMtools analysis was confirmed by manual inspection in the Integrated Genome Browser [163], and hence they are excluded from the text. For CGI data, the provided MAF files were used to extract somatic mutations using the filtering criteria provided in Table 4.4.

4.4.4 Gene coverage in transcriptome sequencing data

The alignments of RNA-Seq data were used to estimate gene expression levels. Gene coverage analysis was based on Ensembl gene annotations (homo_sapiens_core_59_37d) [360]. These annotations were converted into one model per gene by taking all transcripts of a given gene and collapsing them into a single gene model such that exonic bases in a collapsed gene model were the union of exonic bases that belonged to all known transcripts of the gene. The analysis used SAMtools version 0.1.13 pileup [312] to get the per-base coverage depths, and excluded reads with mapping quality < 10 and reads flagged as poor quality according to the Illumina chastity filter. The reads per kilobase of exon model per million mapped reads (RPKM) metric was used to estimate gene expression level [150].
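The collapsed gene model described in this section (the union of exonic bases across all transcripts of a gene) can be illustrated with a short Python sketch. This is a hedged example rather than the code used in the study: the function names are hypothetical, and exons are assumed to be given as half-open (start, end) coordinate pairs on a single chromosome and strand.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end) exon coordinates (assumed convention)


def collapse_gene_model(transcripts: List[List[Interval]]) -> List[Interval]:
    """Merge the exons of all transcripts of one gene into a union gene model."""
    exons = sorted(exon for transcript in transcripts for exon in transcript)
    merged: List[Interval] = []
    for start, end in exons:
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting exon: extend the current merged block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def exonic_length(model: List[Interval]) -> int:
    """Total number of exonic bases in a collapsed gene model."""
    return sum(end - start for start, end in model)
```

For example, two transcripts with exons (100, 200), (300, 400) and (150, 250), (300, 350) collapse into the model (100, 250), (300, 400), which contains 250 exonic bases.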
RPKM was calculated using the following formula: RPKM = (number of reads mapped to all exons in a gene x 1.00E9)/(NORM_TOTAL x sum of the lengths of all exons in the gene), where NORM_TOTAL = the total number of reads that are mapped to exons, excluding those belonging to the mitochondrial genome.

4.4.5 Copy number analysis using genome sequencing data

Copy number analysis was conducted using an HMM that was previously described [114,109]. Briefly, 50 million reads (mapping Q > 10) were randomly selected from the final merged bam files for the tumor and matched normal genomes. The normal reads were split into bins of 200 adjacent alignments, and the corresponding bins in the tumor genome were used to calculate the ratio of tumor/normal reads in each bin. These values were normalized by subtracting the median of the tumor-normal ratios across the whole genome. This resulted in a measurement of the relative read density from the tumors and matched normals in bins of variable length along the genome, where bin width was inversely proportional to the number of mapped reads in the normal genome. GC bias correction was applied, and an HMM was used to classify and segment the tumor genome into continuous regions of somatic copy number loss (HMM state 1), neutrality (HMM state 2), slight gain (HMM state 3), gain (HMM state 4) or high gain (HMM state 5). For CGI data, cnvTumorSegmentsRelative.tsv files were used to obtain somatic CNV calls. These calls were then converted to the five HMM states described above using the following rules: if calledLevel<=0.79 then 1; if 0.79<calledLevel<=1.25 then 2; if 1.25<calledLevel<=1.75 then 3; if 1.75<calledLevel<=2.5 then 4; if calledLevel>2.5 then 5.

4.4.6 Rearrangement detection

De novo transcriptome assembly by ABySS [338] was performed on the ten RNA-Seq datasets to identify candidate transcript rearrangements.
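As an aside to the two calculations given in Sections 4.4.4 and 4.4.5, the RPKM formula and the conversion of CGI calledLevel values to the five copy number states can be written as a minimal Python sketch. The function and argument names are hypothetical and are not taken from the study's code; only the formula and the thresholds come from the text.

```python
def rpkm(exon_reads: int, norm_total: int, exon_length_bp: int) -> float:
    """RPKM = (reads mapped to all exons of a gene x 1e9) / (NORM_TOTAL x total exon length).

    norm_total is the total number of reads mapped to exons,
    excluding reads mapped to the mitochondrial genome.
    """
    return exon_reads * 1e9 / (norm_total * exon_length_bp)


def called_level_to_state(called_level: float) -> int:
    """Map a CGI calledLevel value to the copy number states 1-5 used in the text."""
    if called_level <= 0.79:
        return 1  # loss
    if called_level <= 1.25:
        return 2  # neutral
    if called_level <= 1.75:
        return 3  # slight gain
    if called_level <= 2.5:
        return 4  # gain
    return 5      # high gain
```

For instance, a gene with 500 exonic reads and 2,500 exonic bases in a library with 20 million exon-mapped reads would have an RPKM of 10 under this formula.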
The assembled contigs were run through the trans-ABySS pipeline [297] which aligned a merged contig set to the hg19 (GRCh37) human reference genome assembly and compared the alignments to annotated transcript models, allowing identification of known and novel transcript structures. The transcript rearrangement component of the pipeline identified all contigs that had two separate discrete genomic BLAT alignments. The top 5 scoring alignments were inspected manually and the read evidence support was used to filter out likely false positive events. Smaller scale rearrangements were identified from contigs with single, gapped BLAT alignments with supporting read evidence again used to filter out false positive events. Targeted genomic assembly of the candidate rearranged regions was performed to validate the events in the genomic data. In addition, 9 events were validated with PCR and Sanger sequencing in the tumor DNA and RNA using the following procedure. Primer pairs were selected around the event breakpoint with a 10 bp margin on either side using Primer3 [361] with the following parameters: 22-26 bp size, 40-46 GC and 54-66 TM restrictions, and using GC clamp. Primers were selected favoring product sizes 500-600 bp, 400-700 bp, and 300-800 bp, respectively. For each amplicon, up to 100 primer pairs were initially identified. This set was filtered for pairs that hybridized to a unique location using BLAT (min identity 100, tile size 10, step size 2) on hg19 human genome assembly. Each primer was independently ranked using the Primer3 objective function. The primer sequences used for the genome and transcriptome validations are provided in Table 4.5 and Table 4.6, respectively.
For the RNA validation, first strand cDNA was synthesized using 500 ng of DNaseI-treated total RNA from tumor by following the Agilent AccuScript High Fidelity 1st Strand cDNA Synthesis protocol (catalog #200820); 1 µL of 5-fold diluted template (1st strand cDNA) was used for setting up the PCR with 98 °C for 30 seconds, followed by 32 cycles of 98 °C for 10 seconds, 59 °C for 30 seconds, 72 °C for 10 seconds, and then 72 °C for 5 minutes. The PCR product was run on an 8% PAGE gel for 35 minutes at 200 V, and stained with SYBR Green for 1 minute for visualization. For the DNA validation, 1 ng genomic DNA was used as a template for PCR with 98 °C for 30 seconds, followed by 28 cycles of 98 °C for 30 seconds, 63 °C for 30 seconds, 72 °C for 60 seconds, and then 72 °C for 5 minutes. The PCR product was run on a 1% agarose gel for 90 minutes at 100 V, and stained using SYBR Green for 45 minutes for visualization. The target PCR products from matching tumor and normal DNA were excised, cloned into vector pCR4-TOPO (Invitrogen) and sequenced using M13 forward and M13 reverse primers on the ABI3730xl capillary sequencer. The CGI structural variation pipeline was used to identify rearrangements present in the CGI data [110]. Candidate somatic events were confirmed by PCR and electrophoresis alone or followed by Sanger sequencing.

4.4.7 Exome sequencing and data analysis

The generation, sequencing, and analysis of 81 pairs of exome libraries at the Broad Institute was performed using a detailed, previously described protocol [108]. A summary of deviations from this protocol is provided here. Due to the small quantities of DNA available, all DNA samples were amplified using Phi29-based multiple-strand displacement whole genome amplification (Repli-g service, QIAgen).
Exonic regions were captured by in-solution hybridization using RNA baits similar to those described [108], supplemented with probes capturing additional genes listed in RefSeq [318] beyond the original Consensus Coding Sequence (CCDS) set [317]. In total, ~33 Mb of genomic sequence was captured, consisting of 193,094 exons from 18,863 genes annotated by the CCDS [317] and RefSeq [318] databases as coding for protein or micro-RNA (accessed November 2010). Sequencing of 76 bp paired-end reads was performed using Illumina Genome Analyzer IIx (GAIIx) and HiSeq 2000 instruments. Reads were aligned to the hg19 (GRCh37) build of the human reference genome sequence using BWA [319]. To confirm sample identity, copy number profiles derived from sequence data were compared with those previously derived from microarray data from each case, downloaded from dbGAP. Candidate somatic base substitutions were detected using muTect (previously referred to as muTector [108]). Candidate somatic insertions and deletions were detected as previously described [108].

4.4.8 Integrated analysis of somatic variation from exome and genome data sets

Somatic mutations detected in genome, exome, and transcriptome data sets were annotated using Oncotator version 0.4. Genes mutated at a statistically significant frequency were identified using the MutSig algorithm [329]. Briefly, background mutation rates were estimated from all data for each of the 7 mutation categories: C or G in CpG; C in TpC or G in GpA; A; remaining C; remaining G; insertion/deletion/duplication. These rates were assumed to be constant across all patients and across all genes in the genome. The overall background mutation rate was considered to be the sum of the seven random variables, describing each mutation category.
The observed mutation data for each gene across all patients, corrected for gene length, was then compared to the background mutation rate, and a likelihood ratio test was applied to select those genes whose observed mutation rate was significantly different from the estimated background mutation rate. The Benjamini-Hochberg false discovery rate correction for multiple testing was applied to calculate the q-value for each gene, quoted in the text. The q-values of less than 0.2 were considered statistically significant, which amounts to a 20% false discovery rate. The relationship between mutation frequency and age of diagnosis was tested using the Spearman rank test. The R version 2.11.1 implementation of the Kolmogorov-Smirnov test (ks.test) was used to test differences in mutation frequency distributions of the following: 1) MYCN amplified vs. unamplified, 2) 17q gain vs. wildtype, 3) 1p loss vs. wildtype, 4) 11q loss vs. wildtype, and 5) hyperdiploid vs. diploid. Correction for multiple testing was performed using the R Bioconductor package qvalue [246]. Significantly mutated genes led to an investigation of related genes, specifically those involved in chromatin remodeling and MAPK signaling. These lists of genes are provided in Appendix F. In a search for informative mutations in hypermutated samples, we examined mutations in genes from a published [362] and updated list of DNA repair genes available through the authors' website: http://sciencepark.mdanderson.org/labs/wood/dna_repair_genes.html.

Figure 4.1 Overview of the multi-centre next-generation sequencing initiatives and data analyses

Figure 4.2 Somatic mutation frequencies in 99 NBL tumor/normal pairs with samples ordered by type of genes with somatic alteration (A).
Individual somatic mutation rates in the 99 NBL tumors arranged by the mutation categories discussed in the text (color-coded): hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene mutated, Cancer Gene Census gene mutated, and unknown. Within each category the samples are ordered by their somatic non-silent mutation rate corrected for callable exonic sequence (Mb). The data panels are described below in bold. Data type – sequencing technology used: blue = in-solution exome capture followed by Illumina sequencing, orange = Illumina whole genome sequencing, yellow = Complete Genomics, Inc. (CGI) whole genome sequencing. Hatched blocks identify cases for which data were generated using two technologies. Callable exonic Mb – megabases of coding sequence with sufficient data for mutation detection. Count of candidate somatic mutations – stacked bar plot of silent (i.e. synonymous) and non-silent mutations in each tumor. The boxplot to the right depicts the distribution of non-silent mutation frequencies across all 99 tumors. Whiskers depict the lower and upper limits used to detect outliers, equal to the first quartile minus 3.5 times the interquartile range and the third quartile plus 3.5 times the interquartile range, respectively (i.e. Q1 - 3.5*IQR and Q3 + 3.5*IQR). Outlier mutation frequencies are shown as circles. dbGAP 6-letter identifiers – TARGET sample identifiers (Appendix D). (B). Distribution of specific mutations in each mutational category of interest (color-coded): hypermutated, ALK mutated, chromatin remodeling gene mutated, MAPK pathway oncogene mutated, Cancer Gene Census gene mutated, and unknown. Genes found to be mutated at a significant frequency by MutSig analysis are listed in bold. Mutations in MYC family members (MYCN and MYC) are also highlighted. Genes that are listed in the unknown category are MutSig hits that do not belong to any of the other categories described by the legend (PGLYRP3, GABRA6, SUCLG2, IGSF11).
The data panels are described below in bold. Heatmap of non-silent mutations and structural rearrangements – colored blocks identify alterations in genes with statistically significant mutation frequencies or implicated as part of a mechanism disrupted in NBL: DNA repair (red), ALK signaling (orange), chromatin remodeling (green), MAPK signaling (blue), MYC family member (light blue). Alteration types are color-coded: missense mutation (black), nonsense/frameshift/splice site mutation (red), and structural rearrangement (orange). MYCN amplification – black rectangles used to identify samples with MYCN amplification. A grey square identifies a sample for which a measurement of MYCN amplification could not be made for technical reasons.

[Figure 4.2A legend. Categories for sample classification: Hypermutated; ALK mutated; Chromatin remodeling gene mutated; MAPK pathway oncogene mutated; Cancer Gene Census gene mutated; Unclassified.]

[Figure 4.2B legend. Categories for sample classification: Hypermutated; ALK mutated; Chromatin remodeling gene mutated; MAPK pathway oncogene mutated; MYC family mutated; Cancer Gene Census gene mutated; Unclassified. Gene alteration categories in heatmap: Missense mutation; Nonsense, frameshift or splice site mutation; Structural variant or gene fusion.]

Figure 4.3 Integrated analysis of 99 neuroblastoma cases reveals a diversity of somatic aberration

Each case analyzed by whole genome sequencing is represented as a CIRCOS plot [339]. The reference human chromosomes are arranged end-to-end in the outer-most ring. Genes harboring non-silent mutations are depicted outside of the chromosomes with circles color-coded as described in the legend. The ring inside the chromosomes shows somatic gains and losses of copy number. Finally, the inner-most ring depicts structural aberrations with black lines inside the circle; aberrations predicted to result in a gene fusion are highlighted with orange lines.
The cases are ordered according to the categories in Figure 4.2. (A). Four cases with MAPK pathway aberrations. The MAPK pathway aberrations detected in the NBL genomes include mutations in NRAS and NF1 and gene fusions involving MAPK9 and MAPK10. (B). Four cases with chromatin remodeling aberrations. The chromatin remodeling aberrations detected in the NBL genomes include mutations in MLL5, CHD8, CREBBP, deletion in ARID1B, and a gene fusion involving IKZF3. (C). Three cases with aberrations in known cancer genes, including ABL2, ATM, and FANCD2. (D). Two cases with somatic mutations in ALK. (E). Six unclassified cases with no aberrations in the categories described thus far. Two unclassified cases (PARIRD and PARSHT) contain rearrangements of the MYCN amplification region.

[Figure 4.3 panel titles. A: Cases with somatic alterations in MAPK pathway oncogenes. B: Cases with somatic alterations in chromatin remodeling genes. C: Cases with somatic alterations in Cancer Gene Census genes. D: Cases with somatic alterations in the ALK oncogene. E: Unclassified cases.]

Table 4.1 Non-silent mutations in genes of interest along with their validation status

The genes of interest listed in the table include genes mutated at a significant frequency (as detected by the MutSig analysis), genes implicated in chromatin remodeling or MAPK signaling (Appendix F), and genes listed in the Cancer Gene Census [7]. The genes are listed in the order shown in Figure 4.2. MutSig genes are highlighted in bold. Chromatin = X identifies chromatin remodeling gene hits; Cancer = X identifies genes listed in the Cancer Gene Census; MAPK = X identifies genes that encode a member of the MAPK Pathway (KEGG hsa04010); COSMIC = Number of samples recorded in the Catalogue of Somatic Mutations in Cancer that overlap the particular NBL mutation.
Confirmation by orthogonal method = method(s) used to confirm the variant: Sanger sequencing, Custom hybrid capture and Illumina sequencing, RNA sequencing, genotyping using Sequenom assay [327]; * denotes hypermutated samples.

| Category | Gene | Protein change | Genome change (hg19) | Case identifiers | COSMIC overlaps | Orthogonal method used to confirm variant |
|---|---|---|---|---|---|---|
| Hypermutated (DNA repair) | MLH1 | p.Y157* | chr3:37050322C>A | PAPPKJ* | 0 | Sequenom |
| | DDB1 | p.C725* | chr11:61079358G>T | PALJPX* | 0 | Sequenom |
| ALK | ALK | p.I1170N | chr2:29445216A>T | PANRHJ | 2 | Sanger |
| | ALK | p.I1171N | chr2:29445213A>T | PAKXDZ | 2 | Sanger, Sequenom |
| | ALK | p.F1174L | chr2:29443695G>T | PANZVU, PALAKE | 58 | Sanger, Capture, Sequenom |
| | ALK | p.F1174L | chr2:29443695G>C | PAREGK | 58 | Sanger, Capture, Sequenom |
| | ALK | p.F1245I | chr2:29436860A>T | PAINLN | 7 | Sanger, Sequenom |
| | ALK | p.I1250T | chr2:29432739A>G | PANYGR germline | 0 | Sanger, Sequenom |
| | ALK | p.R1275Q | chr2:29432664C>T | PALNLU, PANBCI | 42 | Sanger, Capture, Sequenom |
| | ALK | p.R1275L | chr2:29432664C>A | PASAZJ | 42 | Sanger, Capture, Sequenom |
| Chromatin remodeling | ARID1A | p.G1139V | chr1:27099000G>T | PALNLU | 0 | Sequenom |
| | ARID1A | p.G1942D | chr1:27106214G>A | PALXHW | 0 | Sequenom |
| | ARID1B | p.R1487M | chr6:157522242G>T | PAMMWD | 0 | |
| | ASH1L | p.K324R | chr1:155451690T>C | PAPPKJ* | 0 | |
| | CHD6 | p.P2383T | chr20:40040888G>T | PAKFUY | 0 | |
| | CREBBP | p.S1365* | chr16:3790439G>T | PASCLP | 0 | Capture, RNA-Seq, Sequenom |
| | EP300 | p.R915C | chr22:41546128C>T | PANIPC | 0 | Sequenom |
| | HDAC4 | p.P917L | chr2:239990289G>A | PALJPX* | 0 | |
| | IKZF4 | p.G151V | chr12:56420730G>T | PALFPI | 0 | |
| | KDM5A | p.A1028V | chr12:420184G>A | PANBCI | 0 | |
| | KDM6A | p.Q1354* | chrX:44966680C>T | PALZRG | 0 | |
| | MLL3 | p.A4748T | chr7:151841899C>T | PALWIP | 0 | Sequenom |
| | MLL4 | p.C1432* | chr19:36218517C>A | PANZVU | 0 | |
| | MLL5 | p.P1759Q | chr7:104753479C>A | PANYGR | 0 | Capture, RNA-Seq, Sequenom |
| | NUP98 | p.E836V | chr11:3735118T>A | PANPVI | 0 | |
| | PAX5 | p.D227N | chr9:36966647C>T | PAIXNV | 0 | |
| | PRDM2 | p.P1547T | chr1:14108929C>A | PAMMWD | 0 | |
| | PRDM4 | p.E733G | chr12:108128195T>C | PAPPKJ* | 0 | |
| MAPK pathway | EGFR | p.P641H | chr7:55240678C>A | PALUDH | 0 | Sequenom |
| | LILRB1 | p.E57* | chr19:55143049G>T | PALZZV | 0 | Sequenom |
| | LILRB1 | p.R550_splice | chr19:55147061G>T | PALTEG | 0 | |
| | LILRB1 | p.S209T | chr19:55143652T>A | PANUKV | 0 | Sequenom |
| | NF1 | p.Ile1679Val | chr17:29653037A>G | PASDZJ | 0 | |
| | NF1 | p.E2501* | chr17:29679318G>T | PALJPX* | 1 | Sequenom |
| | NFKB2 | p.H894fs | chr10:104162112_104162112delC | PAPPKJ* | 0 | |
| | NRAS | p.G13R | chr1:115258745C>G | PANBSP | 331 | |
| | NRAS | p.Q61K | chr1:115256530G>T | PAPTMM | 1551 | Sanger, Capture, Sequenom |
| | NTRK1 | p.V263L | chr1:156841484G>T | PAIXNC | 0 | |
| | PDGFRB | p.A927S | chr5:149499049C>A | PAITCI | 0 | |
| | PTPN11 | p.A72T | chr12:112888198G>A | PAPBZI | 68 | |
| | PTPN11 | p.E76A | chr12:112888211A>C | PALHVD | 128 | |
| | PTPN12 | p.H460Y | chr7:77256374C>T | PAPPKJ | 0 | |
| | PTPN13 | p.V811L | chr4:87662913G>T | PAILNU | 0 | |
| | MYC | p.T73I | chr8:128750681C>T | PAKZRF | 0 | |
| | MYCN | p.P44L | chr2:16082317C>T | PASLGS | 1 | Sanger, Capture, RNA-Seq, Sequenom |
| | PGLYRP3 | p.R175M | chr1:153276338C>A | PALZRG | 0 | Sequenom |
| | PGLYRP3 | p.F237C | chr1:153274903A>C | PALHVD | 0 | Sequenom |
| | PGLYRP3 | p.H338N | chr1:153270446G>T | PALZSL | 0 | Sequenom |
| | GABRA6 | p.R84H | chr5:161115980G>A | PAPPKJ | 0 | Sequenom |
| | GABRA6 | p.A322T | chr5:161119084G>A | PAMUTD | 0 | |
| | SUCLG2 | p.D109H | chr3:67579512C>G | PAICGF | 0 | Sequenom |
| | SUCLG2 | p.R160Q | chr3:67570997C>T | PAPPKJ | 0 | |
| | IGSF11 | p.S270T | chr3:118623540C>G | PAITCI | 0 | Sequenom |
| | IGSF11 | p.P366Q | chr3:118621566G>T | PALJPX | 0 | Sequenom |
| Cancer Gene Census | ABL2 | p.P996A | chr1:179077416G>C | PANYGR | 0 | RNA-Seq, Sequenom |
| | ATIC | p.P148A | chr2:216190772C>G | PALAKM | 0 | |
| | ATM | p.A2274V | chr11:108196798C>T | PANRRW | 1 | Capture, RNA-Seq |
| | ATRX | p.S2017P | chrX:76849227A>G | PALXHW | 0 | |
| | BCR | p.S317fs | chr22:23524096_23524097insC | PAHYWC | 0 | |
| | CARD11 | p.R1011K | chr7:2951918C>T | PAHYWC | 0 | |
| | CD79A | p.P128fs | chr19:42383609_42383610insC | PAMMWD | 0 | |
| | CIITA | p.E372D | chr16:11000462G>C | PAMVAG | 0 | |
| | CLTC | p.A76T | chr17:57721820G>A | PAPPKJ* | 0 | |
| | COL1A1 | p.G1163R | chr17:48264420C>G | PALHVD | 0 | |
| | DDX5 | p.I82V | chr17:62500403T>C | PAPPKJ* | 0 | |
| | FANCD2 | p.K871N | chr3:10114944A>C | PANN | 0 | RNA-Seq |
| | FANCE | p.R200C | chr6:35423873C>T | PAMBAC | 0 | |
| | KTN1 | p.Q618L | chr14:56106660A>T | PARGUX | 0 | Sanger, Capture |
| | MEN1 | p.R521fs | chr11:64572092_64572093insG | PANUKV | 2 | |
| | MET | p.R359Q | chr7:116340214G>A | PASAZJ | 0 | Capture |
| | MLLT3 | p.167_168SS>S | chr9:20414341_20414343delCTA | PALXMM | 0 | |
| | MLLT3 | p.Q326* | chr9:20413868G>A | PAPPKJ* | 0 | Sequenom |
| | MSI2 | p.R269W | chr17:55752347C>T | PAKFUY | 0 | |
| | NACA | p.P996fs | chr12:57112326_57112326delG | PALAKE | 0 | |
| | NKX2-1 | p.S44Y | chr14:36988522G>T | PAMZMG | 0 | |
| | NOTCH1 | p.C2189Y | chr9:139391625C>T | PAPPKJ* | 0 | Sequenom |
| | NOTCH2 | p.P6fs | chr1:120612003_120612004delGG | PALWVJ | 0 | |
| | PDE4DIP | p.Q1197K | chr1:144881607G>T | PAKXDZ | 0 | |
| | PIK3CA | p.K111N | chr3:178916946G>T | PAIPGU | 20 | Sequenom |
| | PLAG1 | p.P458Q | chr8:57078932G>T | PAREGK | 0 | Sanger, Capture |
| | ROS1 | p.L2013V | chr6:117638404G>C | PAHYWC | 0 | |
| | TAF15 | p.R406I | chr17:34171520G>T | PALUDH | 0 | |
| | TMPRSS2 | p.A423fs | chr21:42842589_42842590insC | PALAKE | 0 | |
| | TRIP11 | p.A1552T | chr14:92466356C>T | PALNLU | 0 | |
| | TSC1 | p.T356I | chr9:135786463G>A | PALFPI | 0 | |
| | USP6 | p.67_68IR>MW | chr17:5036210_5036211TC>GT | PASLGS possible germline | 0 | Sanger, Capture |
| | WT1 | p.R495P | chr11:32410674C>G | PANYBL | 0 | |

Table 4.2 Genes with significant frequency of somatic mutation

Somatic mutations in exomic regions from 99 NBL cases were analyzed using the MutSig algorithm [329] as described in Section 4.4.8 with and without two hypermutated (HM) samples. The MutSig algorithm tests the null hypothesis that all the observed mutations in each gene are a consequence of random background mutation processes. Genes for which this hypothesis is rejected based on the Benjamini-Hochberg false discovery rate-corrected q-value (q < 0.2) are considered significantly mutated, and are listed in the table.
| Gene | Description | Patients | Unique sites | q-value no HM | q-value with HM | Expressed in 10 neuroblastoma transcriptomes |
|---|---|---|---|---|---|---|
| ALK | Anaplastic lymphoma receptor tyrosine kinase | 9 | 6 | 7.7x10^-7 | 2.6x10^-6 | Yes |
| PGLYRP3 | Peptidoglycan recognition protein 3 | 3 | 3 | 0.045 | 0.065 | No |
| LILRB1 | Leukocyte immunoglobulin-like receptor, subfamily B, member 1 | 3 | 3 | 0.071 | 0.085 | Yes |
| PTPN11 | Protein tyrosine phosphatase, non-receptor type 11 | 2 | 2 | 0.13 | 0.17 | Yes |
| NRAS | Neuroblastoma RAS viral (v-ras) oncogene homolog | 2 | 2 | 0.17 | 0.17 | Yes |
| GABRA6 | Gamma-aminobutyric acid (GABA) A receptor, alpha 6 | 2 | 2 | 1.00 | 0.17 | No |
| SUCLG2 | Succinate-CoA ligase, GDP-forming, beta subunit | 2 | 2 | 1.00 | 0.17 | Yes |
| IGSF11 | Immunoglobulin superfamily, member 11 | 2 | 2 | 1.00 | 0.18 | No |

Table 4.3 Notable structural variants detected and confirmed in NBL genomes and transcriptomes

*These fusions likely have complex architecture and may involve additional neighboring genes. The following designations are used in the Table: SV = structural variant; CE = capillary electrophoresis; MAPK = identifies genes that encode a known or putative member of the MAPK Pathway (KEGG hsa04010); Cancer = identifies genes listed in the Cancer Gene Census [7]; t(11;17) = identifies genes affected by a translocation between chromosomal arms 17q and 11q; Chromatin remodeling = identifies genes that function in chromatin remodeling; Recurrent genes = identifies genes recurrently affected by structural variants in this study; Mitelman database = identifies genes known to be involved in cancer-specific genome rearrangements as recorded in the Mitelman database of chromosome aberrations and gene fusions in cancer [343]; Other = denotes other notable genes described in the text; Confirmed, evidence in blood = somatic events detectable by PCR in the patient's blood, likely derived from circulating tumor DNA.
| Gene(s) | Event Type | Sample | Breakpoint | Breakpoint | Validation status | Comment |
| MYCN; GULP1 | Fusion | PARIRD | chr2:16083041 | chr2:189393508 | Confirmed somatic | Cancer |
| ABL2; ACBD6 | Fusion | PARRBU | chr1:179198375 | chr1:180382607 | Probable somatic | Cancer |
| ARID1B | SV | PASLGS | chr6:157138276 | chr6:157168409 | Confirmed somatic | Chromatin remodeling |
| FAM86C1; IKZF3 | Fusion | PARGUX | chr11:71508561 | chr17:37960058 | Probable somatic by CE | Chromatin remodeling; t(11;17) |
| MAPK10; PRDM5 | Fusion | PASCKI | chr4:87260892 | chr4:121730886 | Probable somatic by CE | MAPK |
| PRELID2; MAPK9 | Fusion | PAPSKM | chr5:145142894 | chr5:179682098 | Confirmed somatic | MAPK |
| AUTS2 | SV | PARIRD | chr7:70188135 | chr7:70200616 | Confirmed somatic | Mitelman database |
| RRM1714; NOP2 | Fusion | PARDUJ | chr12:6669132 | chr12:6679022 | Confirmed somatic | Mitelman database |
| PTPN13 | SV | PASCKI | chr4:87732011 | chr4:104699882 | Probable somatic by CE | MAPK |
| FAM134B; CDH18 | Fusion | PAPSKM | chr5:16539884 | chr5:19720574 | Confirmed somatic | Other |
| CDH13 | SV | PAPTLD | chr16:82673944 | chr16:82684731 | Confirmed somatic | Other |
| LSAMP; STAG1 | Fusion | PANRRW | chr3:116668399 | chr3:136402660 | Confirmed somatic | Other |
| NBAS; BAZ2B | Fusion | PARIRD | chr2:15578371 | chr2:160221641 | Confirmed somatic | Recurrent gene |
| NBAS; CCNT2* | Fusion | PARIRD | chr2:15591679 | chr2:135682003 | Confirmed somatic | Recurrent gene |
| NBAS | SV | PARIRD | chr2:15648856 | chr2:15650260 | Confirmed somatic | Recurrent gene |
| NBAS | SV | PARIRD | chr2:15659845 | chr2:15660266 | Probable somatic by CE | Recurrent gene |
| NBAS | SV | PARIRD | chr2:15699066 | chr17:53790753 | Probable somatic by CE | Recurrent gene |
| CDKAL1 | SV | PAPTMM | chr6:20769198 | chr6:20806792 | Probable somatic by CE | Recurrent gene |
| CDKAL1 | SV | PASLGS | chr6:20806846 | chr6:20899275 | Confirmed somatic | Recurrent gene |
| APBB1766; ZFHX3 | Fusion | PARDUJ | chr16:73036896 | chr16:73064047 | Confirmed somatic | Recurrent gene |
| NBAS; AK001558* | Fusion | PARSHT | chr2:15629062 | chr2:12660527 | Confirmed, evidence in blood | Recurrent gene |
| NBAS | SV | PASDZJ | chr2:15794685 | chr2:17080300 | Confirmed, evidence in blood | Recurrent gene |
| NBAS; FAM49A* | Fusion | PASDZJ | chr2:15667544 | chr2:17302098 | Confirmed, evidence in blood | Recurrent gene |
| NBAS | SV | PASDZJ | chr2:16794222 | chr2:17046968 | Confirmed, evidence in blood | Recurrent gene |
| NBAS | SV | PASDZJ | chr2:16975790 | chr2:17208524 | Confirmed somatic | Recurrent gene |
| NBAS | SV | PARSHT | chr2:12660729 | chr2:15626595 | Confirmed, evidence in blood | Recurrent gene |
| ZFHX3 | Duplication | PANRRW | chr16:73064821 | chr16:73352657 | Putative unknown origin | Recurrent gene |
| RNF121; TRIM37 | Fusion | PANNMS | chr11:71692501 | chr17:57072537 | Confirmed somatic | t(11;17) |
| ATG2A; BCAS3 | Fusion | PARGUX | chr11:64674966 | chr17:58891730 | Probable somatic by CE | t(11;17) |
| SHANK2 | SV | PASCKI | chr11:70784776 | chr17:34136040 | Confirmed somatic | t(11;17) |

Table 4.4 Parameters used to select high confidence candidate somatic mutations reported by CGI

The MAF files provided by Complete Genomics, Inc (CGI) were filtered based on the parameters described in the table.

| Selection Criterion | Operator | Value |
| Variant_Classification | Equal | (Nonsense, Misstart, Nonstop, Frame_Shift, In_Frame, Missense, Splice_Site) |
| Variant_Type | Equal | (Snp, Ins, Del, Sub) |
| Mutation_Status | Equal | (Somatic, LOH) |
| Tumor_VarScore_Rank | >= | 0.025 |
| Match_Norm_RefScore_Rank | >= | 0.025 |

Table 4.5 Primer sequences used for genomic validation of structural variants and gene fusions detected by the BCCA pipeline

| Sample | Gene(s) | Genomic breakpoint | Genomic breakpoint | Primer 1 | Primer 2 |
| PANNMS | RNF121; TRIM37 | chr11:71692501 | chr17:57072537 | GATATTTCGTTTGGATAGCACTGG | GAAGTGCAGTAGCACGATTTTGG |
| PANRRW | LSAMP; STAG1 | chr3:116668399 | chr3:136402660 | TCTGCAGAGAGAAAGACTACCTTG | TACTGAGTTTTCCTATCCACAAGC |
| PARSHT | NBAS | chr2:12660527 | chr2:15629062 | ATAATTGTTGCTAGTGGAGGAAGG | ACAAATACCCTGAGAGTCTGGAAG |
| PARSHT | NBAS | chr2:12660729 | chr2:15626595 | ATAATTGTTGCTAGTGGAGGAAGG | ACAAATACCCTGAGAGTCTGGAAG |
| PASDZJ | NBAS | chr2:15794685 | chr2:17080300 | GTCAAATTTATCAGCCTTTGGC | GTTTAAGGCCCTGATAGAAGAGG |
| PASDZJ | NBAS | chr2:15667544 | chr2:17302098 | GACGATCTATCCTGGCACTGAC | ATTCATGTTGCAAGAGCAGAAG |
| PASDZJ | NBAS | chr2:16794222 | chr2:17046968 | GGAACTTCTTGATATGGTCTGACTC | TTCCCAGTTCTTTCTTATAGAGGTG |
| PASDZJ | NBAS | chr2:16975790 | chr2:17208524 | ATAGGAATCACAACAGGAAAGGAG | CTACAGCACGGGCTTCTAAAAC |
| PASLGS | RERE | chr1:5081632 | chr1:8421299 | GACACTCATGAGCATAGAAAAAGG | AGGACAATGAGAGTGACTCGGAC |
| PARRBU | MPRIP | chr17:16952510 | chr17:2459132 | CCGAGTTTAAGCGATTCTTGTG | GGTATATGCCAAGAAGAATTGAGG |

Table 4.6 Primer sequences used for tumor RNA validation of structural variants and gene fusions detected by the BCCA pipeline

| Sample | Gene(s) | Primer1 | Primer2 |
| PANNMS | RNF121; TRIM37 | ATCTCTCTCCAGAAGAGCAATGG | AGGTGCAGTGTCAGTTTCAAATC |
| PANRRW | LSAMP; STAG1 | GAATAACACACCGGAGACTTTTG | GTTAAAATCCACGCTGCGAACAG |
| PARRBU | MPRIP | GGATGACACTTTGAGAACTCCTG | GTGTCAGAACTGCTTCAAGCCC |
| PARSHT | NBAS | ACTGGAACAAATTCTCAGTGTGTC | GAATCCATCTTTCTCTCATGTAGC |
| PASDZJ | NBAS | GTGAGAAGTGGTGTCACTCACGC | ATTCATGTTGCAAGAGCAGAAG |
| PASDZJ | NBAS | ATCAACACAGCTATTTACCACCC | GTTCAGAGAATCTCCCAAAATCAC |
| PASDZJ | NBAS | ACAGCTATTTACCACCCTGGTC | GTCCATCATAGAGCTGAAAATGTG |
| PASLGS | RERE | ACAGTGAAGAAGTCGGCCAAGAAG | AGAGACACCAAACAGGCTTTGAG |
| PASLGS | RERE | GTACCTCCAGCAATGACAGTAAAG | CTCATTTTGTCTTCAATGTGGG |

Chapter 5: Conclusions and future directions

Evolving methods of genomic analysis reviewed in Chapter 1 have contributed to the characterization of cancer genomes and transcriptomes at ever-increasing resolution. The advent of second-generation sequencing technologies has enabled studies of cancers that achieve single-nucleotide views of both genomes and transcriptomes. Applications of array-based methods and candidate gene Sanger-based re-sequencing to the analysis of human neuroblastoma (NBL) have revealed novel loci associated with the disease, most notably the anaplastic lymphoma kinase ALK that is subject to somatic mutation and amplification, occurring in 5-15% of patients with sporadic NBL [192,193,188,194]. Based on these studies we hypothesized that interrogations of NBL genomes and transcriptomes using second-generation technologies may lead to novel insights into the disease.
We also hypothesized that a better understanding of the gene expression profile of the putative cell of origin of NBL would help identify loci with clinical relevance to the disease and interpret high-throughput sequencing data from NBL cells. To address these hypotheses we developed three research objectives that formed the basis of Chapters 2, 3, and 4, each fulfilling specific goals described in the subsections below.

5.1 Transcriptome analysis of normal neural crest cells identifies key pathways, enriched and depleted in this population compared to other related cell types

Since NBL is thought to originate from a differentiation arrest along the sympathoadrenal lineage of the neural crest, understanding the neural crest stem cell and its development into this lineage may provide insight into the pathogenesis of NBL. Therefore, the overall objective of Chapter 2 was to identify and characterize the expression of genes and pathways that distinguish neural crest stem cells from other stem cell lineages with similarly broad developmental potential. Skin-derived precursor cells (SKPs) have been validated as models for normal neural crest stem cells by previous work [234] and were used for this analysis. Mesenchymal stem cells (MSCs) were chosen for comparison as they represent one of the few somatic stem cell lineages that approach the developmental potential of the neural crest [230,255,363]. To address the research objective of Chapter 2, I first characterized the transcriptomes of SKPs isolated from ventral, dorsal and facial skin regions of the body that are thought to derive from different developmental origins, including neural crest itself and somite mesoderm. This analysis revealed plasticity of the neural crest stem cell phenotype, suggesting that cells resembling normal neural crest stem cells may arise from non-neural crest lineages.
Based on this result, I used the three SKP populations to identify transcripts enriched and depleted in SKPs compared to a related multipotent somatic stem cell lineage, the MSCs. This analysis revealed the relative increase of mRNA abundance of transcripts involved in the WNT/Beta-catenin, BMP and TGFB pathways, and relative depletion of transcripts involved in double-stranded break DNA repair in SKPs compared to MSCs. While the importance of active WNT/Beta-catenin, BMP and TGFB signaling in neural crest cells is well-established [257,364,272], the relative reduction of the expression level of genes involved in double-stranded break DNA repair is a novel finding. A recent study in mice has identified eleven DNA repair genes, highly expressed during very early embryonic development and barely detectable in the adrenal medulla, an organ derived from the sympathoadrenal lineage of the neural crest and the most common primary site of NBL [365]. This study is consistent with my finding of the decreased mRNA abundance of DNA repair genes in SKPs compared to MSCs. In addition, the SKP and MSC comparison revealed the preferential expression of pluripotency markers in SKPs, which prompted me to further investigate similarities and differences between the expression profiles of SKPs and ES cells. This analysis revealed 5 pluripotency markers (CTNNB1, ETV4, MAD2L2, PITX2, SOX2) among the genes enriched in SKPs compared to MSCs, and 13 pluripotency markers (ADAM23, AURKB, CENPK, FAM46B, FAM64A, HMGB2, IGF2BP3, KPNA2, MTHFD1, MYBL2, TBX4, TPM1, ZFP57) among the genes depleted in SKPs compared to MSCs, highlighting the unique phenotype of SKPs. Future studies based on the findings in Chapter 2 may investigate further the functional mechanism and consequences of the observed reduction of expression of DNA repair genes in normal neural crest cells. 
The work discussed in Chapter 2 involved rat rather than human neural crest cells as a model for the analysis, and it will be important to validate these findings in human SKPs as well as other models of neural crest cells (for example, the human epidermal neural crest stem cells [366]). The original choice of the rat cells was driven by our desire to investigate the similarities and differences among SKPs isolated from different parts of the body, and the availability of rat-derived MSCs for comparisons. Since we showed the convergence of facial, dorsal trunk and ventral trunk SKPs to a neural crest stem cell phenotype, and assuming that this finding holds true in other vertebrate species, SKPs from any part of the body can be used to model normal neural crest stem cells. We took advantage of this result in Chapter 3, where human foreskin-derived SKPs were used as a reference normal tissue for the analysis of NBL tumor-initiating cells.

5.2 Plasticity of the neural crest stem cell phenotype and NBL heterogeneity

The results of the analysis described in Chapter 2 are consistent with the hypothesis that neural crest stem cell-like cells could derive from non-neural crest lineages. In particular, we showed that mesoderm-derived ventral and dorsal SKPs were similar to neural crest-derived facial SKPs at the level of gene expression and differentiation potential. This finding may relate to the heterogeneity of NBL, which is a spectrum of diseases with diverse genetic aberrations, pathological features, and clinical courses. Dozens of clinical and biological markers of potential clinical significance have been proposed for NBL [367]. Seven of these markers, including the differentiation grade of the tumor (neuroblastoma, ganglioneuroblastoma or ganglioneuroma), are currently used clinically for pre-treatment risk stratification of new NBL patients [183].
The differentiation grade of NBL cells may reflect the developmental stage at transformation, and correlates with the disease course, such that low-risk tumors typically have a more differentiated morphology than high-risk tumors [175]. NBL recurrence may still occur in some low- or intermediate-risk patients with differentiated morphology and low-stage disease, suggesting that tumors of the same differentiation grade and stage may be heterogeneous at a molecular level. Poor outcome in patients with differentiated tumors and low-stage disease was found to be associated with high expression of MYC and low expression of genes involved in sympathetic neuronal differentiation [368]. The heterogeneity of NBL cells with respect to their apparent developmental program is also reflected in the variable sensitivity of NBL cell lines to differentiation agents. For instance, retinoids can induce marked neuronal differentiation and cell cycle arrest in some NBL cell lines but fail to have any effect on other NBL lines derived from patients with similar disease characteristics [369]. Notably, retinoic acid is involved in regulating the differentiation of many tissues; however, the nature of the growth and differentiation response to retinoic acid depends on the cell type [370].

The diversity of NBL cells with respect to differentiation grade, expression of developmental markers, and sensitivity to retinoids may reflect different origins of the neural crest progenitor cells that undergo transformation into NBL. As reported in Chapter 2, neural crest stem cell-like cells may arise from both the neural crest and the mesoderm. This observation suggests that NBL may in principle derive from mesodermal cells that have converged to a neural crest precursor phenotype.
Since cells of different developmental origins, despite having similar phenotypes, maintain a developmental history at the gene expression level (Section 2.2.2), different developmental origins of NBL may account for the observed gene expression differences among NBL cells of presumably similar differentiation grades [368]. The potential for mesodermal cells to give rise to NBL, as well as the putative impact of this on NBL heterogeneity, remains to be addressed by future studies.

5.3 Transcriptome analysis of NBL tumor-initiating cells implicates AURKB as a novel drug target for NBL

Having characterized the transcriptomes of normal neural crest cells in Chapter 2, I set out to characterize a presumed malignant counterpart of these cells, the NBL tumor-initiating cells (TICs) derived from bone marrow metastases of high-risk NBL patients. These cells have been shown to give rise to NBL when injected into mice, and upon serial transplantation, suggesting that they are a suitable model for the disease. In addition, the isolation of NBL TICs from patients in remission who later relapsed suggested that these cells could be used as markers for minimal residual disease in otherwise asymptomatic patients [279]. The overall objectives of Chapter 3 were to identify transcripts preferentially abundant in NBL TICs compared to the normal SKPs characterized in Chapter 2, and to assess whether these transcripts could be used to suggest new targets against NBL. To address these objectives I used RNA-Seq data from NBL TICs, SKPs, and other cancers to identify transcripts whose expression was increased in NBL TICs compared to other tissue types. I then conducted pathway analysis to identify functional associations among these transcripts that could be targeted by inhibitors.
The pathway analysis revealed the increased expression of the BRCA1 signaling pathway members in NBL TICs compared to SKPs and other tissue types, suggesting that the double-stranded break DNA repair pathway might be activated in NBL TICs. The finding of the potential tumor-specific activation of this pathway led to the hypothesis that AURKB, a kinase linked to this pathway through its interaction with BRCA1-associated RING domain protein 1 (BARD1) and aberrantly expressed in NBL TICs, could serve as a drug target against these cells. This hypothesis was tested through AURKB knockdowns and treatments with an AURKB-specific pharmacological inhibitor, and both experiments led to the specific killing of NBL TICs but not normal SKPs. An independent group of investigators later confirmed AURKB to be a target in primary NBL tumors, further validating our result [306]. Thus, the expression of members of the BRCA1 signaling pathway, found to be lower in normal neural crest stem cell-like cells compared to MSCs in Chapter 2, appeared to be increased in NBL TICs. Inhibition of a kinase involved in this pathway appeared to be cytotoxic to these cells, suggesting the importance of this pathway for NBL pathogenesis. Additional support for the role of BRCA1 signaling in the pathogenesis of NBL comes from a genome-wide association study (GWAS) that found SNPs in the BARD1 locus to be associated with the development of sporadic high-risk NBL [197]. The SNPs identified by the GWAS analysis have been suggested to influence the splicing of BARD1 such that exons 2 and 3 are excluded, resulting in the loss of the functional domain involved in the BARD1 interaction with BRCA1 [196]. The exon-level RNA-Seq analysis reported in Chapter 3 supported the hypothesis of the preferential expression of the short BARD1beta isoform by NBL cells.
The report of a novel function of the short BARD1beta isoform in stabilizing AURKB [269] provides a potential mechanism for the preferential sensitivity of NBL cells, but not normal neural crest cells, to AURKB inhibition. Future work resulting from this finding will include functional studies investigating the molecular effects of AURKB inhibition on the expression of the BRCA1 pathway members. In this thesis, we speculated that inhibition of AURKB acts through downregulating the expression of BRCA1 pathway members, such that gross chromosomal abnormalities accumulate and are not repaired, resulting in cell death. However, direct experimental evidence is required to support or refute this speculation. Examining the expression of BARD1 and its splicing status following AURKB inhibition would also be of interest. This experiment would reveal whether the killing of NBL cells by inhibiting AURKB is associated with the downregulation of the expression of the BARD1beta isoform.

A limitation of the work described in Chapter 3 is the use of NBL TICs that are reportedly contaminated with EBV-transformed lymphocytes [280]. I believe that effects of this contamination on the results were partially accounted for by the experimental design, which used an expression compendium with lymphocyte-related tissues (diseased B-cells) as a reference for identifying NBL TIC-enriched transcripts. The independent validation by other investigators, in primary tumors, of drug targets predicted by my analysis (Chapter 3) provided additional support for the usefulness of NBL TICs as models of NBL, despite the contamination. However, confirmatory studies in non-contaminated NBL stem cells would be useful to assess the generality of our findings.
5.4 Whole genome, transcriptome and exome sequencing of primary NBL tumors reveals a broad spectrum of somatic mutations

The analysis described in Chapters 2 and 3 focused on the transcriptomes of normal and malignant neural crest cells, and implicated the BRCA1 DNA repair pathway as aberrantly enriched at the mRNA level in metastases-derived NBL TICs compared to the normal neural crest-like cells. While the finding of AURKB as a novel drug target was validated in primary tumors [306], the overall experimental design in Chapter 3 focused on identifying metastases-enriched transcripts and potential targets. Therefore, the objective of Chapter 4 was to conduct a high-resolution characterization of a panel of 99 primary NBL tumors to identify recurrently altered genes and pathways of relevance to primary tumors at diagnosis. We also investigated whether the genetic aberrations found in primary tumors targeted similar pathways to those that have been identified to be aberrantly expressed in metastases-derived NBL TICs (Chapter 3). We sequenced 99 primary tumors and matched peripheral blood using a combination of whole genome and exome sequencing performed using Illumina and CGI technologies. We also sequenced the transcriptomes from 10 primary tumors included in the set of 99 cases. Analysis of these data revealed that NBL tumors contained a median of 0.56 non-silent mutations per megabase of coding DNA, one of the lowest rates reported in cancer to date. The ALK gene showed the highest somatic mutation rate and was found to be mutated in 9% of cases, with another case (PANYGR) harboring an oncogenic germline mutation in the kinase domain of ALK. Three additional genes (LILRB1, PTPN11 and NRAS) showed significantly recurrent mutations in non-hypermutated cases, albeit in less than 5% of cases. A loss-of-function translocation of IKZF3 together with alterations found in related genes implicated disruption of chromatin remodeling mechanisms in 11% of cases.
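As an illustration of the significance criterion applied to recurrently mutated genes (Benjamini-Hochberg q < 0.2, Table 4.2), the following minimal Python sketch converts per-gene p-values into q-values and applies that threshold. The gene labels, p-values and function names are hypothetical placeholders introduced here for illustration; MutSig derives its p-values from a background mutation model that is not reproduced in this sketch.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg q-values in the same order as p_values."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each q-value is the minimum
    # of p * m / rank over all ranks at or above the current one.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


def significant_genes(p_by_gene, q_threshold=0.2):
    """Return the genes whose q-value falls below q_threshold."""
    genes = list(p_by_gene)
    q_values = benjamini_hochberg([p_by_gene[g] for g in genes])
    return [g for g, q in zip(genes, q_values) if q < q_threshold]


# Hypothetical p-values for illustration only; not taken from the thesis data.
example_p = {"GENE_A": 0.01, "GENE_B": 0.04, "GENE_C": 0.03, "GENE_D": 0.5}
example_q = benjamini_hochberg(list(example_p.values()))
example_hits = significant_genes(example_p)
```

In this toy input, three of the four placeholder genes pass the q < 0.2 cutoff. With real data the number of genes tested (m) is far larger, which is why genes mutated in only two or three patients can still sit close to the threshold.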
Mutations in PTPN11, its regulator, LILRB1, and other MAPK signaling components including NRAS, implicated hyperactivation of the RAS/MAPK pathway in 15% of cases. Mutations in MYC and MYCN were seen in two tumors without MYCN amplification, suggesting that MYCN could be activated in NBL through a variety of mechanisms. A hypermutator phenotype was found in 2% of the cases with loss-of-function mutations in DNA repair genes. In addition, we identified over 80 somatic structural variants including the aforementioned IKZF3 rearrangement. Therefore, the work described in Chapter 4 highlighted the molecular heterogeneity of high-risk NBL, identified commonly disrupted pathways, and demonstrated a relative paucity of somatically acquired mutations, thus implicating epigenetic events as potentially contributing to the tumor behavior. In addition to cataloging the genetic aberrations found in primary tumors, I also compared the genes harboring somatic mutations in primary tumors to those found to be increased in expression in NBL TICs compared to SKPs. While I did not observe somatic mutations in BARD1 that could directly explain the preferential expression of the short BARD1beta isoform described in Chapter 3, I did observe several novel germline variants occurring in BARD1 introns that could be associated with this phenotype. Future studies can address this possibility by examining a larger cohort of tumors with matched expression and DNA sequence data from tumor and normal DNA.

5.5 Future directions in NBL genomics

While the work conducted in Chapter 4 was able to identify a potential disease mechanism in over 50% of all cases (Figure 4.2B), there is a significant amount of discovery that still needs to occur to unravel additional molecular aberrations that may contribute to NBL development.
It remains a challenge from the translational point of view that the most common genomic aberrations in primary NBL are large chromosomal rearrangements affecting hundreds of genes, and other than MYCN and ALK, focal disruption of individual genes appear to be rare (as seen in Figure 4.3). In addition, it is possible that a significant part of the disease phenotype may be related to germline genetic variation and subsequent stochastic and/or epigenetic alterations in tumor cells.  Future efforts in the field may involve integration of data from the genome-wide association efforts [196] with the sequencing data,  185 such as those described in Chapter 4, as well as generation of new data sets querying epigenetic and expression changes. Precedence for epigenetic abnormalities playing a causative role in the pathogenesis of a pediatric cancer has been established by a recent study in retinoblastoma. This study employed genome-wide sequencing and epigenetic analysis of retinoblastoma tumors to reveal few somatic mutations but a number of cancer pathways, including the pathway involving the proto-oncogene SYK, being deregulated at an epigenetic level [170]. Since NBL can be regarded as a malignancy resulting from a differentiation arrest of the neural crest [371], epigenetic abnormalities may play a significant part in determining the ultimate clinical phenotype. Whether this is so remains to be addressed through comprehensive surveys of the epigenome.  186 Bibliography 1. Nature Milestones in Cancer [http://www.nature.com/milestones/milecancer/masthead/index.html].Accessed 2 March 2011. 2. Boveri T: Uber mehrpolige mitosen als mittel zur analyse des zellkerns. Verh. D. Phys. Med. Ges. 1902, 35:67–90. 3. Boveri T: Zur Frage der Entstehung maligner Tumoren. Jena: Verlag von Gustav Fischer; 1914. 4. Finlay CA, Hinds PW, Levine AJ: The p53 proto-oncogene can act as a suppressor of transformation. Cell 1989, 57:1083–1093. 5. 
Huang HJ, Yee JK, Shew JY, Chen PL, Bookstein R, Friedmann T, Lee EY, Lee WH: Suppression of the neoplastic phenotype by replacement of the RB gene in human cancer cells. Science 1988, 242:1563–1566. 6. Stehelin D, Varmus HE, Bishop JM, Vogt PK: DNA related to the transforming gene(s) of avian sarcoma viruses is present in normal avian DNA. Nature 1976, 260:170–173. 7. Futreal PA, Coin L, Marshall M, Down T, Hubbard T, Wooster R, Rahman N, Stratton MR: A census of human cancer genes. Nat. Rev. Cancer 2004, 4:177–18310.1038/nrc1299. 8. Rous P: A sarcome of the fowl transmissible by an agent separable from the tumor cells. J. Exp. Med 1911, 13:397–411. 9. Leach FS, Nicolaides NC, Papadopoulos N, Liu B, Jen J, Parsons R, Peltomäki P, Sistonen P, Aaltonen LA, Nyström-Lahti M: Mutations of a mutS homolog in hereditary nonpolyposis colorectal cancer. Cell 1993, 75:1215–1225. 10. Nordling CO: A new theory on cancer-inducing mechanism. Br. J. Cancer 1953, 7:68– 72. 11. Knudson AG: Mutation and cancer: statistical study of retinoblastoma. Proc. Natl. Acad. Sci. U.S.A 1971, 68:820–823. 12. Nowell PC: The clonal evolution of tumor cell populations. Science 1976, 194:23–28. 13. Fearon ER, Vogelstein B: A genetic model for colorectal tumorigenesis. Cell 1990, 61:759–767. 14. Feinberg AP, Vogelstein B: Hypomethylation distinguishes genes of some human cancers from their normal counterparts. Nature 1983, 301:89–92. 15. Laird PW, Jackson-Grusby L, Fazeli A, Dickinson SL, Jung WE, Li E, Weinberg RA, Jaenisch R: Suppression of intestinal neoplasia by DNA hypomethylation. Cell 1995, 81:197–205.  187 16. Hanahan D, Weinberg RA: The hallmarks of cancer. Cell 2000, 100:57–70. 17. Li FP: Familial cancer syndromes and clusters. Curr Probl Cancer 1990, 14:73–114. 18. Sijmons R: Identifying patients with familial cancer syndromes. In Cancer Syndromes National Center for Biotechnology Information (US); 2009. 19. Knudson AG: Hereditary cancers disclose a class of cancer genes. 
Cancer 1989, 63:1888–1891. 20. Latif F, Tory K, Gnarra J, Yao M, Duh FM, Orcutt ML, Stackhouse T, Kuzmin I, Modi W, Geil L: Identification of the von Hippel-Lindau disease tumor suppressor gene. Science 1993, 260:1317–1320. 21. Kenemans P, Verstraeten RA, Verheijen RHM: Oncogenic pathways in hereditary and sporadic breast cancer. Maturitas 2004, 49:34–4310.1016/j.maturitas.2004.06.005. 22. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan A, Sougnez C, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409:860– 92110.1038/35057062. 23. The International HapMap Project: Nature 2003, 426:789–79610.1038/nature02168. 24. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, Smith HO, Yandell M, Evans CA, Holt RA, Gocayne JD, Amanatides P, Ballew RM, Huson DH, Wortman JR, Zhang Q, Kodira CD, Zheng XH, Chen L, Skupski M, Subramanian G, Thomas PD, Zhang J, Gabor Miklos GL, Nelson C, Broder S, Clark AG, Nadeau J, McKusick VA, Zinder N, et al.: The sequence of the human genome. Science 2001, 291:1304– 135110.1126/science.1058040. 25. National Cancer Institute: Surveillance, Epidemiology and End Results (SEER) Database. 2010, Available: http://seer.cancer.gov/statistics/.Accessed 28 July 2011. 26. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer- -analyses of cohorts of twins from Sweden, Denmark, and Finland. N. Engl. J. Med 2000, 343:78–8510.1056/NEJM200007133430201. 27. Christiani DC: Combating Environmental Causes of Cancer. New England Journal of Medicine 2011, 364:791–793. 28. Stent GS: The role of cell lineage in development. Philos. Trans. R. Soc. Lond., B, Biol. 
Sci 1985, 312:3–19.  188 29. Bonnet D, Dick JE: Human acute myeloid leukemia is organized as a hierarchy that originates from a primitive hematopoietic cell. Nat Med 1997, 3:730– 73710.1038/nm0797-730. 30. Al-Hajj M, Wicha MS, Benito-Hernandez A, Morrison SJ, Clarke MF: Prospective identification of tumorigenic breast cancer cells. Proc. Natl. Acad. Sci. U.S.A 2003, 100:3983–398810.1073/pnas.0530291100. 31. Singh SK, Clarke ID, Terasaki M, Bonn VE, Hawkins C, Squire J, Dirks PB: Identification of a Cancer Stem Cell in Human Brain Tumors. Cancer Research 2003, 63:5821 –5828. 32. Quintana E, Shackleton M, Sabel MS, Fullen DR, Johnson TM, Morrison SJ: Efficient tumour formation by single human melanoma cells. Nature 2008, 456:593– 59810.1038/nature07567. 33. Santisteban M, Reiman JM, Asiedu MK, Behrens MD, Nassar A, Kalli KR, Haluska P, Ingle JN, Hartmann LC, Manjili MH, Radisky DC, Ferrone S, Knutson KL: Immune- induced epithelial to mesenchymal transition in vivo generates breast cancer stem cells. Cancer Res 2009, 69:2887–289510.1158/0008-5472.CAN-08-3343. 34. Gupta PB, Chaffer CL, Weinberg RA: Cancer stem cells: mirage or reality? Nat Med 2009, 15:1010–101210.1038/nm0909-1010. 35. Zhou B-BS, Zhang H, Damelin M, Geles KG, Grindley JC, Dirks PB: Tumour- initiating cells: challenges and opportunities for anticancer drug discovery. Nat Rev Drug Discov 2009, 8:806–82310.1038/nrd2137. 36. Beheshti B, Braude I, Marrano P, Thorner P, Zielenska M, Squire JA: Chromosomal localization of DNA amplifications in neuroblastoma tumors using cDNA microarray comparative genomic hybridization. Neoplasia 2003, 5:53–62. 37. Caspersson T, Lindsten J, Lomakka G, Moller A, Zech L: The use of fluorescence techniques for the recognition of mammalian chromosomes and chromosome regions. Int Rev Exp Pathol 1972, 11:1–72. 38. Rowley JD: Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. 
Nature 1973, 243:290–293. 39. Nowell PC, Hungerford DA: Chromosome studies on normal and leukemic human leukocytes. J. Natl. Cancer Inst. 1960, 25:85–109. 40. Barnes WM: PCR amplification of up to 35-kb DNA with high fidelity and high yield from lambda bacteriophage templates. Proc. Natl. Acad. Sci. U.S.A. 1994, 91:2216–2220. 41. Vasickova P, Machackova E, Lukesova M, Damborsky J, Horky O, Pavlu H, Kuklova J, Kosinova V, Navratilova M, Foretova L: High occurrence of BRCA1 intragenic  189 rearrangements in hereditary breast and ovarian cancer syndrome in the Czech Republic. BMC Med. Genet. 2007, 8:3210.1186/1471-2350-8-32. 42. Buongiorno-Nardelli M, Amaldi F: Autoradiographic detection of molecular hybrids between RNA and DNA in tissue sections. Nature 1970, 225:946–948. 43. Speicher MR, Carter NP: The new cytogenetics: blurring the boundaries with molecular biology. Nat. Rev. Genet. 2005, 6:782–79210.1038/nrg1692. 44. Patel AS, Hawkins AL, Griffin CA: Cytogenetics and cancer. Curr Opin Oncol 2000, 12:62–67. 45. Speicher MR, Gwyn Ballard S, Ward DC: Karyotyping human chromosomes by combinatorial multi-fluor FISH. Nat. Genet. 1996, 12:368–37510.1038/ng0496-368. 46. Schröck E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, Ning Y, Ledbetter DH, Bar-Am I, Soenksen D, Garini Y, Ried T: Multicolor spectral karyotyping of human chromosomes. Science 1996, 273:494–497. 47. Tanke HJ, Wiegant J, van Gijlswijk RP, Bezrookove V, Pattenier H, Heetebrij RJ, Talman EG, Raap AK, Vrolijk J: New strategy for multi-colour fluorescence in situ hybridisation: COBRA: COmbined Binary RAtio labelling. Eur. J. Hum. Genet. 1999, 7:2–1110.1038/sj.ejhg.5200265. 48. Fujiwara H, Emi M, Nagai H, Ohgaki K, Imoto I, Akimoto M, Ogawa O, Habuchi T: Definition of a 1-Mb homozygous deletion at 9q32-q33 in a human bladder-cancer cell line. J. Hum. Genet. 2001, 46:372–37710.1007/s100380170056. 49. 
Henderson L-J, Okamoto I, Lestou VS, Ludkovski O, Robichaud M, Chhanabhai M, Gascoyne RD, Klasa RJ, Connors JM, Marra MA, Horsman DE, Lam WL: Delineation of a minimal region of deletion at 6q16.3 in follicular lymphoma and construction of a bacterial artificial chromosome contig spanning a 6-megabase region of 6q16-q21. Genes Chromosomes Cancer 2004, 40:60–65. doi:10.1002/gcc.20013.
50. Huang H, Qian C, Jenkins RB, Smith DI: FISH mapping of YAC clones at human chromosomal band 7q31.2: identification of YACS spanning FRA7G within the common region of LOH in breast and prostate cancer. Genes Chromosomes Cancer 1998, 21:152–159.
51. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D: Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992, 258:818–821.
52. Mantripragada KK, Buckley PG, Diaz de Ståhl T, Dumanski JP: Genomic microarrays in the spotlight. Trends Genet. 2004, 20:87–94.
53. Carter NP: Methods and strategies for analyzing copy number variation using DNA microarrays. Nat. Genet. 2007, 39:S16–21. doi:10.1038/ng2028.
54. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, Zhai Y, Dairkee SH, Ljung BM, Gray JW, Albertson DG: High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 1998, 20:207–211. doi:10.1038/2524.
55. Buckley PG, Mantripragada KK, Benetkiewicz M, Tapia-Páez I, Diaz De Ståhl T, Rosenquist M, Ali H, Jarbo C, De Bustos C, Hirvelä C, Sinder Wilén B, Fransson I, Thyr C, Johnsson B-I, Bruder CEG, Menzel U, Hergersberg M, Mandahl N, Blennow E, Wedell A, Beare DM, Collins JE, Dunham I, Albertson D, Pinkel D, Bastian BC, Faruqi AF, Lasken RS, Ichimura K, Collins VP, et al.: A full-coverage, high-resolution human chromosome 22 genomic microarray for clinical and research applications. Hum. Mol. Genet. 2002, 11:3221–3229.
56.
Buckley PG, Mantripragada KK, Piotrowski A, Diaz de Ståhl T, Dumanski JP: Copy-number polymorphisms: mining the tip of an iceberg. Trends Genet. 2005, 21:315–317. doi:10.1016/j.tig.2005.04.007.
57. Krzywinski M, Bosdet I, Smailus D, Chiu R, Mathewson C, Wye N, Barber S, Brown-John M, Chan S, Chand S, Cloutier A, Girn N, Lee D, Masson A, Mayo M, Olson T, Pandoh P, Prabhu A-L, Schoenmakers E, Tsai M, Albertson D, Lam W, Choy C-O, Osoegawa K, Zhao S, de Jong PJ, Schein J, Jones S, Marra MA: A set of BAC clones spanning the human genome. Nucleic Acids Res. 2004, 32:3651–3660. doi:10.1093/nar/gkh700.
58. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA, Ling V, MacAulay C, Lam WL: A tiling resolution DNA microarray with complete coverage of the human genome. Nat. Genet. 2004, 36:299–303. doi:10.1038/ng1307.
59. Inazawa J, Inoue J, Imoto I: Comparative genomic hybridization (CGH)-arrays pave the way for identification of novel cancer-related genes. Cancer Sci. 2004, 95:559–563.
60. De Lellis L, Curia MC, Aceto GM, Toracchio S, Colucci G, Russo A, Mariani-Costantini R, Cama A: Analysis of extended genomic rearrangements in oncological research. Ann. Oncol. 2007, 18 Suppl 6:vi173–178. doi:10.1093/annonc/mdm251.
61. Gunderson KL, Steemers FJ, Lee G, Mendoza LG, Chee MS: A genome-wide scalable SNP genotyping assay using microarray technology. Nat. Genet. 2005, 37:549–554. doi:10.1038/ng1547.
62. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, Grigorova M, Jones KW, Wei W, Stratton MR, Futreal PA, Weber B, Shapero MH, Wooster R: High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res. 2004, 14:287–295. doi:10.1101/gr.2012304.
63. Heinrichs S, Look AT: Identification of structural aberrations in cancer by SNP array analysis. Genome Biol. 2007, 8:219. doi:10.1186/gb-2007-8-7-219.
64.
LaFramboise T, Weir BA, Zhao X, Beroukhim R, Li C, Harrington D, Sellers WR, Meyerson M: Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput. Biol. 2005, 1:e65. doi:10.1371/journal.pcbi.0010065.
65. Baross A, Delaney AD, Li HI, Nayar T, Flibotte S, Qian H, Chan SY, Asano J, Ally A, Cao M, Birch P, Brown-John M, Fernandes N, Go A, Kennedy G, Langlois S, Eydoux P, Friedman JM, Marra MA: Assessment of algorithms for high throughput detection of genomic copy number variation in oligonucleotide microarray data. BMC Bioinformatics 2007, 8:368. doi:10.1186/1471-2105-8-368.
66. Wang K, Diskin SJ, Zhang H, Attiyeh EF, Winter C, Hou C, Schnepp RW, Diamond M, Bosse K, Mayes PA, Glessner J, Kim C, Frackelton E, Garris M, Wang Q, Glaberson W, Chiavacci R, Nguyen L, Jagannathan J, Saeki N, Sasaki H, Grant SFA, Iolascon A, Mosse YP, Cole KA, Li H, Devoto M, McGrady PW, London WB, Capasso M, et al.: Integrative genomics identifies LMO1 as a neuroblastoma oncogene. Nature 2011, 469:216–220. doi:10.1038/nature09609.
67. Reid C: Company Profile: Complete Genomics Inc. Future Oncology 2011, 7:219–221. doi:10.2217/fon.10.173.
68. Sanger F, Nicklen S, Coulson AR: DNA sequencing with chain-terminating inhibitors. Proc. Natl. Acad. Sci. U.S.A 1977, 74:5463–5467.
69. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O’Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, et al.: Patterns of somatic mutation in human cancer genomes. Nature 2007, 446:153–158. doi:10.1038/nature05610.
70.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen Y-J, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Goodwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer MLI, Jarvie TP, Jirage KB, Kim J-B, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al.: Genome Sequencing in Open Microfabricated High Density Picoliter Reactors. Nature 2005, 437:376–380. doi:10.1038/nature03959.
71. Tawfik DS, Griffiths AD: Man-made cell-like compartments for molecular evolution. Nat. Biotechnol 1998, 16:652–656. doi:10.1038/nbt0798-652.
72. Rothberg JM, Hinz W, Rearick TM, Schultz J, Mileski W, Davey M, Leamon JH, Johnson K, Milgrew MJ, Edwards M, Hoon J, Simons JF, Marran D, Myers JW, Davidson JF, Branting A, Nobile JR, Puc BP, Light D, Clark TA, Huber M, Branciforte JT, Stoner IB, Cawley SE, Lyons M, Fu Y, Homer N, Sedova M, Miao X, Reed B, et al.: An integrated semiconductor device enabling non-optical genome sequencing. Nature 2011, 475:348–352. doi:10.1038/nature10242.
73. Bennett ST, Barnes C, Cox A, Davies L, Brown C: Toward the $1000 human genome. Pharmacogenomics 2005, 6:373–382. doi:10.1517/14622416.6.4.373.
74. Bentley DR: Whole-genome re-sequencing. Curr. Opin. Genet. Dev 2006, 16:545–552. doi:10.1016/j.gde.2006.10.009.
75. Braslavsky I, Hebert B, Kartalov E, Quake SR: Sequence information can be obtained from single DNA molecules. Proc Natl Acad Sci U S A 2003, 100:3960–3964. doi:10.1073/pnas.0230489100.
76. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, Peluso P, Rank D, Baybayan P, Bettman B, Bibillo A, Bjornson K, Chaudhuri B, Christians F, Cicero R, Clark S, Dalal R, Dewinter A, Dixon J, Foquet M, Gaertner A, Hardenbol P, Heiner C, Hester K, Holden D, Kearns G, Kong X, Kuse R, Lacroix Y, Lin S, et al.: Real-time DNA sequencing from single polymerase molecules. Science 2009, 323:133–138. doi:10.1126/science.1162986.
77.
Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C, Connell CR, Heiner C, Kent SB, Hood LE: Fluorescence detection in automated DNA sequence analysis. Nature 1986, 321:674–679. doi:10.1038/321674a0.
78. Rosenblum BB, Lee LG, Spurgeon SL, Khan SH, Menchen SM, Heiner CR, Chen SM: New dye-labeled terminators for improved DNA sequencing patterns. Nucleic Acids Research 1997, 25:4500–4504. doi:10.1093/nar/25.22.4500.
79. Dames S, Durtschi J, Geiersbach K, Stephens J, Voelkerding KV: Comparison of the Illumina Genome Analyzer and Roche 454 GS FLX for Resequencing of Hypertrophic Cardiomyopathy-Associated Genes. J Biomol Tech 2010, 21:73–80.
80. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, Causey M, Colonell J, Dimeo J, Efcavitch JW, Giladi E, Gill J, Healy J, Jarosz M, Lapen D, Moulton K, Quake SR, Steinmann K, Thayer E, Tyurina A, Ward R, Weiss H, Xie Z: Single-molecule DNA sequencing of a viral genome. Science 2008, 320:106–109. doi:10.1126/science.1150427.
81. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, Wang MD, Zhang K, Mitra RD, Church GM: Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 2005, 309:1728–1732. doi:10.1126/science.1117389.
82. Wang T-L, Maierhofer C, Speicher MR, Lengauer C, Vogelstein B, Kinzler KW, Velculescu VE: Digital karyotyping. Proc. Natl. Acad. Sci. U.S.A 2002, 99:16156–16161. doi:10.1073/pnas.202610899.
83. Velculescu VE, Zhang L, Vogelstein B, Kinzler KW: Serial analysis of gene expression. Science 1995, 270:484–487.
84. Parrett TJ, Yan H: Digital karyotyping technology: exploring the cancer genome. Expert Rev. Mol. Diagn 2005, 5:917–925. doi:10.1586/14737159.5.6.917.
85. Salani R, Chang C-L, Cope L, Wang T-L: Digital karyotyping: an update of its applications in cancer. Mol Diagn Ther 2006, 10:231–237.
86.
Volik S, Raphael BJ, Huang G, Stratton MR, Bignel G, Murnane J, Brebner JH, Bajsarowicz K, Paris PL, Tao Q, Kowbel D, Lapuk A, Shagin DA, Shagina IA, Gray JW, Cheng J-F, de Jong PJ, Pevzner P, Collins C: Decoding the fine-scale structure of a breast cancer genome and transcriptome. Genome Res 2006, 16:394–404. doi:10.1101/gr.4247306.
87. Volik S, Zhao S, Chin K, Brebner JH, Herndon DR, Tao Q, Kowbel D, Huang G, Lapuk A, Kuo W-L, Magrane G, De Jong P, Gray JW, Collins C: End-sequence profiling: sequence-based analysis of aberrant genomes. Proc. Natl. Acad. Sci. U.S.A 2003, 100:7696–7701. doi:10.1073/pnas.1232418100.
88. Krzywinski M, Bosdet I, Mathewson C, Wye N, Brebner J, Chiu R, Corbett R, Field M, Lee D, Pugh T, Volik S, Siddiqui A, Jones S, Schein J, Collins C, Marra M: A BAC clone fingerprinting approach to the detection of human genome rearrangements. Genome Biol 2007, 8:R224. doi:10.1186/gb-2007-8-10-r224.
89. Collins FS, Barker AD: Mapping the cancer genome. Pinpointing the genes involved in cancer will help chart a new course across the complex landscape of human malignancies. Sci. Am 2007, 296:50–57.
90. Dickson D: Wellcome funds cancer database. Nature 1999, 401:729. doi:10.1038/44413.
91. The Cancer Genome Atlas Research Network: Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 2008, 455:1061–1068. doi:10.1038/nature07385.
92. Parsons DW, Jones S, Zhang X, Lin JC-H, Leary RJ, Angenendt P, Mankoo P, Carter H, Siu I-M, Gallia GL, Olivi A, McLendon R, Rasheed BA, Keir S, Nikolskaya T, Nikolsky Y, Busam DA, Tekleab H, Diaz LA, Hartigan J, Smith DR, Strausberg RL, Marie SKN, Shinjo SMO, Yan H, Riggins GJ, Bigner DD, Karchin R, Papadopoulos N, Parmigiani G, et al.: An integrated genomic analysis of human glioblastoma multiforme. Science 2008, 321:1807–1812. doi:10.1126/science.1164382.
93.
Barretina J, Taylor BS, Banerji S, Ramos AH, Lagos-Quintana M, Decarolis PL, Shah K, Socci ND, Weir BA, Ho A, Chiang DY, Reva B, Mermel CH, Getz G, Antipin Y, Beroukhim R, Major JE, Hatton C, Nicoletti R, Hanna M, Sharpe T, Fennell TJ, Cibulskis K, Onofrio RC, Saito T, Shukla N, Lau C, Nelander S, Silver SJ, Sougnez C, et al.: Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nat. Genet 2010, 42:715–721. doi:10.1038/ng.619.
94. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, Sougnez C, Greulich H, Muzny DM, Morgan MB, Fulton L, Fulton RS, Zhang Q, Wendl MC, Lawrence MS, Larson DE, Chen K, Dooling DJ, Sabo A, Hawes AC, Shen H, Jhangiani SN, Lewis LR, Hall O, Zhu Y, Mathew T, Ren Y, Yao J, Scherer SE, Clerc K, et al.: Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008, 455:1069–1075. doi:10.1038/nature07423.
95. Zhang J, Mullighan CG, Harvey RC, Wu G, Chen X, Edmonson M, Buetow KH, Carroll WL, Chen I-M, Devidas M, Gerhard DS, Loh ML, Reaman GH, Relling MV, Camitta BM, Bowman WP, Smith MA, Willman CL, Downing JR, Hunger SP: Key pathways are frequently mutated in high risk childhood acute lymphoblastic leukemia: a report from the Children’s Oncology Group. Blood 2011, doi:10.1182/blood-2011-03-341412. Available: http://www.ncbi.nlm.nih.gov/pubmed/21680795. Accessed 27 June 2011.
96. Sjöblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, Silliman N, Szabo S, Buckhaults P, Farrell C, Meeh P, Markowitz SD, Willis J, Dawson D, Willson JKV, Gazdar AF, Hartigan J, Wu L, Liu C, Parmigiani G, Park BH, Bachman KE, Papadopoulos N, Vogelstein B, Kinzler KW, Velculescu VE: The consensus coding sequences of human breast and colorectal cancers. Science 2006, 314:268–274. doi:10.1126/science.1133427.
97.
Wood LD, Parsons DW, Jones S, Lin J, Sjöblom T, Leary RJ, Shen D, Boca SM, Barber T, Ptak J, Silliman N, Szabo S, Dezso Z, Ustyanksky V, Nikolskaya T, Nikolsky Y, Karchin R, Wilson PA, Kaminker JS, Zhang Z, Croshaw R, Willis J, Dawson D, Shipitsin M, Willson JKV, Sukumar S, Polyak K, Park BH, Pethiyagoda CL, Pant PVK, et al.: The genomic landscapes of human breast and colorectal cancers. Science 2007, 318:1108–1113. doi:10.1126/science.1145720.
98. The Cancer Genome Atlas Research Network: Integrated genomic analyses of ovarian carcinoma. Nature 2011, 474:609–615. doi:10.1038/nature10166.
99. Parsons DW, Li M, Zhang X, Jones S, Leary RJ, Lin JC-H, Boca SM, Carter H, Samayoa J, Bettegowda C, Gallia GL, Jallo GI, Binder ZA, Nikolsky Y, Hartigan J, Smith DR, Gerhard DS, Fults DW, VandenBerg S, Berger MS, Marie SKN, Shinjo SMO, Clara C, Phillips PC, Minturn JE, Biegel JA, Judkins AR, Resnick AC, Storm PB, Curran T, et al.: The genetic landscape of the childhood cancer medulloblastoma. Science 2011, 331:435–439. doi:10.1126/science.1198056.
100. Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, Jia M, Shepherd R, Leung K, Menzies A, Teague JW, Campbell PJ, Stratton MR, Futreal PA: COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 2011, 39:D945–950. doi:10.1093/nar/gkq929.
101. Mardis ER, Ding L, Dooling DJ, Larson DE, McLellan MD, Chen K, Koboldt DC, Fulton RS, Delehaunty KD, McGrath SD, Fulton LA, Locke DP, Magrini VJ, Abbott RM, Vickery TL, Reed JS, Robinson JS, Wylie T, Smith SM, Carmichael L, Eldred JM, Harris CC, Walker J, Peck JB, Du F, Dukes AF, Sanderson GE, Brummett AM, Clark E, McMichael JF, et al.: Recurring mutations found by sequencing an acute myeloid leukemia genome. N. Engl. J. Med 2009, 361:1058–1066. doi:10.1056/NEJMoa0903840.
102.
Morin RD, Johnson NA, Severson TM, Mungall AJ, An J, Goya R, Paul JE, Boyle M, Woolcock BW, Kuchenbauer F, Yap D, Humphries RK, Griffith OL, Shah S, Zhu H, Kimbara M, Shashkin P, Charlot JF, Tcherpakov M, Corbett R, Tam A, Varhol R, Smailus D, Moksa M, Zhao Y, Delaney A, Qian H, Birol I, Schein J, Moore R, et al.: Somatic mutation of EZH2 (Y641) in Follicular and Diffuse Large B-cell Lymphomas of Germinal Center Origin. Nat Genet 2010, 42:181–185. doi:10.1038/ng.518.
103. Shah SP, Köbel M, Senz J, Morin RD, Clarke BA, Wiegand KC, Leung G, Zayed A, Mehl E, Kalloger SE, Sun M, Giuliany R, Yorida E, Jones S, Varhol R, Swenerton KD, Miller D, Clement PB, Crane C, Madore J, Provencher D, Leung P, DeFazio A, Khattra J, Turashvili G, Zhao Y, Zeng T, Glover JNM, Vanderhyden B, Zhao C, et al.: Mutation of FOXL2 in granulosa-cell tumors of the ovary. N. Engl. J. Med 2009, 360:2719–2729. doi:10.1056/NEJMoa0902542.
104. Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T, Yuza Y, Garraway LA, LaFramboise T, Lee JC, Shah K, O’Neill K, Sasaki H, Lindeman N, Wong K-K, Borras AM, Gutmann EJ, Dragnev KH, DeBiasi R, Chen T-H, Glatt KA, Greulich H, Desany B, Lubeski CK, Brockman W, Alvarez P, Hutchison SK, Leamon JH, Ronan MT, Turenchalk GS, Egholm M, et al.: Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 2006, 12:852–855. doi:10.1038/nm1437.
105. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, Dooling D, Dunford-Shore BH, McGrath S, Hickenbotham M, Cook L, Abbott R, Larson DE, Koboldt DC, Pohl C, Smith S, Hawkins A, Abbott S, Locke D, Hillier LW, Miner T, Fulton L, Magrini V, Wylie T, Glasscock J, Conyers J, Sander N, Shi X, Osborne JR, Minx P, et al.: DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008, 456:66–72. doi:10.1038/nature07485.
106.
Morin RD, Mendez-Lago M, Mungall AJ, Goya R, Mungall KL, Corbett RD, Johnson NA, Severson TM, Chiu R, Field M, Jackman S, Krzywinski M, Scott DW, Trinh DL, Tamura-Wells J, Li S, Firme MR, Rogic S, Griffith M, Chan S, Yakovenko O, Meyer IM, Zhao EY, Smailus D, Moksa M, Chittaranjan S, Rimsza L, Brooks-Wilson A, Spinelli JJ, Ben-Neriah S, et al.: Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 2011, 476:298–303. doi:10.1038/nature10351.
107. Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, Sboner A, Esgueva R, Pflueger D, Sougnez C, Onofrio R, Carter SL, Park K, Habegger L, Ambrogio L, Fennell T, Parkin M, Saksena G, Voet D, Ramos AH, Pugh TJ, Wilkinson J, Fisher S, Winckler W, Mahan S, Ardlie K, Baldwin J, Simons JW, Kitabayashi N, MacDonald TY, et al.: The genomic complexity of primary human prostate cancer. Nature 2011, 470:214–220. doi:10.1038/nature09744.
108. Chapman MA, Lawrence MS, Keats JJ, Cibulskis K, Sougnez C, Schinzel AC, Harview CL, Brunet J-P, Ahmann GJ, Adli M, Anderson KC, Ardlie KG, Auclair D, Baker A, Bergsagel PL, Bernstein BE, Drier Y, Fonseca R, Gabriel SB, Hofmeister CC, Jagannath S, Jakubowiak AJ, Krishnan A, Levy J, Liefeld T, Lonial S, Mahan S, Mfuko B, Monti S, Perkins LM, et al.: Initial genome sequencing and analysis of multiple myeloma. Nature 2011, 471:467–472. doi:10.1038/nature09837.
109. Jones SJ, Laskin J, Li YY, Griffith OL, An J, Bilenky M, Butterfield YS, Cezard T, Chuah E, Corbett R, Fejes AP, Griffith M, Yee J, Martin M, Mayo M, Melnyk N, Morin RD, Pugh TJ, Severson T, Shah SP, Sutcliffe M, Tam A, Terry J, Thiessen N, Thomson T, Varhol R, Zeng T, Zhao Y, Moore RA, Huntsman DG, et al.: Evolution of an adenocarcinoma in response to selection by targeted kinase inhibitors. Genome Biol 2010, 11:R82. doi:10.1186/gb-2010-11-8-r82.
110.
Lee W, Jiang Z, Liu J, Haverty PM, Guan Y, Stinson J, Yue P, Zhang Y, Pant KP, Bhatt D, Ha C, Johnson S, Kennemer MI, Mohan S, Nazarenko I, Watanabe C, Sparks AB, Shames DS, Gentleman R, de Sauvage FJ, Stern H, Pandita A, Ballinger DG, Drmanac R, Modrusan Z, Seshagiri S, Zhang Z: The mutation spectrum revealed by paired genome sequences from a lung cancer patient. Nature 2010, 465:473–477. doi:10.1038/nature09004.
111. Pleasance ED, Stephens PJ, O’Meara S, McBride DJ, Meynert A, Jones D, Lin M-L, Beare D, Lau KW, Greenman C, Varela I, Nik-Zainal S, Davies HR, Ordoñez GR, Mudie LJ, Latimer C, Edkins S, Stebbings L, Chen L, Jia M, Leroy C, Marshall J, Menzies A, Butler A, Teague JW, Mangion J, Sun YA, McLaughlin SF, Peckham HE, Tsung EF, et al.: A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 2010, 463:184–190. doi:10.1038/nature08629.
112. Pleasance ED, Cheetham RK, Stephens PJ, McBride DJ, Humphray SJ, Greenman CD, Varela I, Lin M-L, Ordóñez GR, Bignell GR, Ye K, Alipaz J, Bauer MJ, Beare D, Butler A, Carter RJ, Chen L, Cox AJ, Edkins S, Kokko-Gonzales PI, Gormley NA, Grocock RJ, Haudenschild CD, Hims MM, James T, Jia M, Kingsbury Z, Leroy C, Marshall J, Menzies A, et al.: A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 2010, 463:191–196. doi:10.1038/nature08658.
113. Puente XS, Pinyol M, Quesada V, Conde L, Ordóñez GR, Villamor N, Escaramis G, Jares P, Beà S, González-Díaz M, Bassaganyas L, Baumann T, Juan M, López-Guerra M, Colomer D, Tubío JMC, López C, Navarro A, Tornador C, Aymerich M, Rozman M, Hernández JM, Puente DA, Freije JMP, Velasco G, Gutiérrez-Fernández A, Costa D, Carrió A, Guijarro S, Enjuanes A, et al.: Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia. Nature 2011, 475:101–105. doi:10.1038/nature10113.
114.
Shah SP, Morin RD, Khattra J, Prentice L, Pugh T, Burleigh A, Delaney A, Gelmon K, Guliany R, Senz J, Steidl C, Holt RA, Jones S, Sun M, Leung G, Moore R, Severson T, Taylor GA, Teschendorff AE, Tse K, Turashvili G, Varhol R, Warren RL, Watson P, Zhao Y, Caldas C, Huntsman D, Hirst M, Marra MA, Aparicio S: Mutational evolution in a lobular breast tumour profiled at single nucleotide resolution. Nature 2009, 461:809–813. doi:10.1038/nature08489.
115. Hudson TJ, Anderson W, Artez A, Barker AD, Bell C, Bernabé RR, Bhan MK, Calvo F, Eerola I, Gerhard DS, Guttmacher A, Guyer M, Hemsley FM, Jennings JL, Kerr D, Klatt P, Kolar P, Kusada J, Lane DP, Laplace F, Youyong L, Nettekoven G, Ozenberger B, Peterson J, Rao TS, Remacle J, Schafer AJ, Shibata T, Stratton MR, Vockley JG, et al.: International network of cancer genome projects. Nature 2010, 464:993–998. doi:10.1038/nature08987.
116. Meyerson M, Gabriel S, Getz G: Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet 2010, 11:685–696. doi:10.1038/nrg2841.
117. Ajay SS, Parker SCJ, Abaan HO, Fajardo KVF, Margulies EH: Accurate and comprehensive sequencing of personal genomes. Genome Res. 2011, 21:1498–1505. doi:10.1101/gr.123638.111.
118. Mamanova L, Coffey AJ, Scott CE, Kozarewa I, Turner EH, Kumar A, Howard E, Shendure J, Turner DJ: Target-enrichment strategies for next-generation sequencing. Nat. Methods 2010, 7:111–118. doi:10.1038/nmeth.1419.
119. Comino-Méndez I, Gracia-Aznárez FJ, Schiavi F, Landa I, Leandro-García LJ, Letón R, Honrado E, Ramos-Medina R, Caronia D, Pita G, Gómez-Graña A, de Cubas AA, Inglada-Pérez L, Maliszewska A, Taschin E, Bobisse S, Pica G, Loli P, Hernández-Lavado R, Díaz JA, Gómez-Morales M, González-Neira A, Roncador G, Rodríguez-Antona C, Benítez J, Mannelli M, Opocher G, Robledo M, Cascón A: Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma.
Nat Genet 2011, doi:10.1038/ng.861. Available: http://www.ncbi.nlm.nih.gov/pubmed/21685915. Accessed 27 June 2011.
120. Tiacci E, Trifonov V, Schiavoni G, Holmes A, Kern W, Martelli MP, Pucciarini A, Bigerna B, Pacini R, Wells VA, Sportoletti P, Pettirossi V, Mannucci R, Elliott O, Liso A, Ambrosetti A, Pulsoni A, Forconi F, Trentin L, Semenzato G, Inghirami G, Capponi M, Di Raimondo F, Patti C, Arcaini L, Musto P, Pileri S, Haferlach C, Schnittger S, Pizzolo G, et al.: BRAF mutations in hairy-cell leukemia. N. Engl. J. Med 2011, 364:2305–2315. doi:10.1056/NEJMoa1014209.
121. Totoki Y, Tatsuno K, Yamamoto S, Arai Y, Hosoda F, Ishikawa S, Tsutsumi S, Sonoda K, Totsuka H, Shirakihara T, Sakamoto H, Wang L, Ojima H, Shimada K, Kosuge T, Okusaka T, Kato K, Kusuda J, Yoshida T, Aburatani H, Shibata T: High-resolution characterization of a hepatocellular carcinoma genome. Nat. Genet 2011, 43:464–469. doi:10.1038/ng.804.
122. Varela I, Tarpey P, Raine K, Huang D, Ong CK, Stephens P, Davies H, Jones D, Lin M-L, Teague J, Bignell G, Butler A, Cho J, Dalgliesh GL, Galappaththige D, Greenman C, Hardy C, Jia M, Latimer C, Lau KW, Marshall J, McLaren S, Menzies A, Mudie L, Stebbings L, Largaespada DA, Wessels LFA, Richard S, Kahnoski RJ, Anema J, et al.: Exome sequencing identifies frequent mutation of the SWI/SNF complex gene PBRM1 in renal carcinoma. Nature 2011, 469:539–542. doi:10.1038/nature09639.
123. Yan X-J, Xu J, Gu Z-H, Pan C-M, Lu G, Shen Y, Shi J-Y, Zhu Y-M, Tang L, Zhang X-W, Liang W-X, Mi J-Q, Song H-D, Li K-Q, Chen Z, Chen S-J: Exome sequencing identifies somatic mutations of DNA methyltransferase gene DNMT3A in acute monocytic leukemia. Nat. Genet 2011, 43:309–315. doi:10.1038/ng.788.
124. Schena M, Shalon D, Davis RW, Brown PO: Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 1995, 270:467–470.
125. Pozhitkov AE, Tautz D, Noble PA: Oligonucleotide microarrays: widely applied—poorly understood.
Briefings in Functional Genomics & Proteomics 2007, 6:141–148. doi:10.1093/bfgp/elm014.
126. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, Bloomfield CD, Lander ES: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999, 286:531–537.
127. Balmain A: Cancer genetics: from Boveri and Mendel to microarrays. Nat. Rev. Cancer 2001, 1:77–82. doi:10.1038/35094086.
128. Perez-Diez A, Morgun A, Shulzhenko N: Microarrays for cancer diagnosis and classification. Adv. Exp. Med. Biol. 2007, 593:74–85. doi:10.1007/978-0-387-39978-2_8.
129. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, Boldrick JC, Sabet H, Tran T, Yu X, Powell JI, Yang L, Marti GE, Moore T, Hudson J, Lu L, Lewis DB, Tibshirani R, Sherlock G, Chan WC, Greiner TC, Weisenburger DD, Armitage JO, Warnke R, Levy R, Wilson W, Grever MR, Byrd JC, Botstein D, Brown PO, et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000, 403:503–511. doi:10.1038/35000501.
130. Sørlie T, Perou CM, Tibshirani R, Aas T, Geisler S, Johnsen H, Hastie T, Eisen MB, van de Rijn M, Jeffrey SS, Thorsen T, Quist H, Matese JC, Brown PO, Botstein D, Lønning PE, Børresen-Dale A-L: Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications. Proceedings of the National Academy of Sciences 2001, 98:10869–10874. doi:10.1073/pnas.191367098.
131. Kunz G: Use of a genomic test (MammaPrint™) in daily clinical practice to assist in risk stratification of young breast cancer patients. Arch. Gynecol. Obstet 2011, 283:597–602. doi:10.1007/s00404-010-1454-9.
132. White NMA, Bao TT, Grigull J, Youssef YM, Girgis A, Diamandis M, Fatoohi E, Metias M, Honey RJ, Stewart R, Pace KT, Bjarnason GA, Yousef GM: miRNA profiling for clear cell renal cell carcinoma: biomarker discovery and identification of potential controls and consequences of miRNA dysregulation.
J. Urol. 2011, 186:1077–1083. doi:10.1016/j.juro.2011.04.110.
133. Griffith M, Tang MJ, Griffith OL, Morin RD, Chan SY, Asano JK, Zeng T, Flibotte S, Ally A, Baross A, Hirst M, Jones SJM, Morin GB, Tai IT, Marra MA: ALEXA: a microarray design platform for alternative expression analysis. Nat. Methods 2008, 5:118. doi:10.1038/nmeth0208-118.
134. Brenner S, Johnson M, Bridgham J, Golda G, Lloyd DH, Johnson D, Luo S, McCurdy S, Foy M, Ewan M, Roth R, George D, Eletr S, Albrecht G, Vermaas E, Williams SR, Moon K, Burcham T, Pallas M, DuBridge RB, Kirchner J, Fearon K, Mao J, Corcoran K: Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays. Nat. Biotechnol. 2000, 18:630–634. doi:10.1038/76469.
135. Matsumura H, Reich S, Ito A, Saitoh H, Kamoun S, Winter P, Kahl G, Reuter M, Kruger DH, Terauchi R: Gene expression analysis of plant host-pathogen interactions by SuperSAGE. Proc. Natl. Acad. Sci. U.S.A 2003, 100:15718–15723. doi:10.1073/pnas.2536670100.
136. Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW, Velculescu VE: Using the transcriptome to annotate the genome. Nat. Biotechnol 2002, 20:508–512. doi:10.1038/nbt0502-508.
137. Brenner S, Williams SR, Vermaas EH, Storck T, Moon K, McCollum C, Mao JI, Luo S, Kirchner JJ, Eletr S, DuBridge RB, Burcham T, Albrecht G: In vitro cloning of complex mixtures of DNA on microbeads: physical separation of differentially expressed cDNAs. Proc. Natl. Acad. Sci. U.S.A. 2000, 97:1665–1670.
138. Morozova O, Marra MA: Applications of next-generation sequencing technologies in functional genomics. Genomics 2008, 92:255–264. doi:10.1016/j.ygeno.2008.07.001.
139. Audic S, Claverie JM: The significance of digital gene expression profiles. Genome Res. 1997, 7:986–995.
140. Wang SM: Understanding SAGE data. Trends Genet. 2007, 23:42–50. doi:10.1016/j.tig.2006.11.001.
141.
Nielsen KL, Høgh AL, Emmersen J: DeepSAGE--digital transcriptomics with high sensitivity, simple experimental protocol and multiplexing of samples. Nucleic Acids Res 2006, 34:e133. doi:10.1093/nar/gkl714.
142. Morrissy AS, Morin RD, Delaney A, Zeng T, McDonald H, Jones S, Zhao Y, Hirst M, Marra MA: Next-generation tag sequencing for cancer gene expression profiling. Genome Res 2009, 19:1825–1835. doi:10.1101/gr.094482.109.
143. Gowda M, Li H, Alessi J, Chen F, Pratt R, Wang G-L: Robust analysis of 5’-transcript ends (5’-RATE): a novel technique for transcriptome analysis and genome annotation. Nucleic Acids Res 2006, 34:e126. doi:10.1093/nar/gkl522.
144. Lal A, Lash AE, Altschul SF, Velculescu V, Zhang L, McLendon RE, Marra MA, Prange C, Morin PJ, Polyak K, Papadopoulos N, Vogelstein B, Kinzler KW, Strausberg RL, Riggins GJ: A Public Database for Gene Expression in Human Cancers. Cancer Research 1999, 59:5403–5407.
145. Tsai C-C, Chung Y-D, Lee H-J, Chang W-H, Suzuku Y, Sugano S, Lin J-Y: Large-scale sequencing analysis of the full-length cDNA library of human hepatocellular carcinoma. J. Biomed. Sci 2003, 10:636–643. doi:10.1159/000073529.
146. Hillier LD, Lennon G, Becker M, Bonaldo MF, Chiapelli B, Chissoe S, Dietrich N, DuBuque T, Favello A, Gish W, Hawkins M, Hultman M, Kucaba T, Lacy M, Le M, Le N, Mardis E, Moore B, Morris M, Parsons J, Prange C, Rifkin L, Rohlfing T, Schellenberg K, Marra M: Generation and analysis of 280,000 human expressed sequence tags. Genome Research 1996, 6:807–828. doi:10.1101/gr.6.9.807.
147. Sun M, Zhou G, Lee S, Chen J, Shi RZ, Wang SM: SAGE is far more sensitive than EST for detecting low-abundance transcripts. BMC Genomics 2004, 5:1. doi:10.1186/1471-2164-5-1.
148. Morozova O, Hirst M, Marra MA: Applications of new sequencing technologies for transcriptome analysis. Annu Rev Genomics Hum Genet 2009, 10:135–151. doi:10.1146/annurev-genom-082908-145957.
149.
Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, McDonald H, Varhol R, Jones S, Marra M: Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. BioTechniques 2008, 45:81–94. doi:10.2144/000112900.
150. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods 2008, 5:621–628. doi:10.1038/nmeth.1226.
151. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008, 320:1344–1349. doi:10.1126/science.1158441.
152. Costa V, Angelini C, De Feis I, Ciccodicola A: Uncovering the complexity of transcriptomes with RNA-Seq. J. Biomed. Biotechnol 2010, 2010:853916. doi:10.1155/2010/853916.
153. Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 2009, 10:57–63. doi:10.1038/nrg2484.
154. Griffith M, Griffith OL, Mwenifumbo J, Goya R, Morrissy AS, Morin RD, Corbett R, Tang MJ, Hou Y-C, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li HI, McDonald H, Teague K, Zhao Y, Zeng T, Delaney A, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA: Alternative expression analysis by RNA sequencing. Nat Meth 2010, 7:843–847. doi:10.1038/nmeth.1503.
155. Berger MF, Levin JZ, Vijayendran K, Sivachenko A, Adiconis X, Maguire J, Johnson LA, Robinson J, Verhaak RG, Sougnez C, Onofrio RC, Ziaugra L, Cibulskis K, Laine E, Barretina J, Winckler W, Fisher DE, Getz G, Meyerson M, Jaffe DB, Gabriel SB, Lander ES, Dummer R, Gnirke A, Nusbaum C, Garraway LA: Integrative analysis of the melanoma transcriptome. Genome Res. 2010, 20:413–427. doi:10.1101/gr.103697.109.
156.
Pflueger D, Terry S, Sboner A, Habegger L, Esgueva R, Lin P-C, Svensson MA, Kitabayashi N, Moss BJ, MacDonald TY, Cao X, Barrette T, Tewari AK, Chee MS, Chinnaiyan AM, Rickman DS, Demichelis F, Gerstein MB, Rubin MA: Discovery of non-ETS gene fusions in human prostate cancer using next-generation RNA sequencing. Genome Res. 2011, 21:56–67. doi:10.1101/gr.110684.110.
157. Rosenberg BR, Hamilton CE, Mwangi MM, Dewell S, Papavasiliou FN: Transcriptome-wide sequencing reveals numerous APOBEC1 mRNA-editing targets in transcript 3’ UTRs. Nat. Struct. Mol. Biol. 2011, 18:230–236. doi:10.1038/nsmb.1975.
158. Rozowsky J, Abyzov A, Wang J, Alves P, Raha D, Harmanci A, Leng J, Bjornson R, Kong Y, Kitabayashi N, Bhardwaj N, Rubin M, Snyder M, Gerstein M: AlleleSeq: analysis of allele-specific expression and binding in a network framework. Mol. Syst. Biol. 2011, 7:522. doi:10.1038/msb.2011.54.
159. Wiegand KC, Shah SP, Al-Agha OM, Zhao Y, Tse K, Zeng T, Senz J, McConechy MK, Anglesio MS, Kalloger SE, Yang W, Heravi-Moussavi A, Giuliany R, Chow C, Fee J, Zayed A, Prentice L, Melnyk N, Turashvili G, Delaney AD, Madore J, Yip S, McPherson AW, Ha G, Bell L, Fereday S, Tam A, Galletta L, Tonin PN, Provencher D, et al.: ARID1A mutations in endometriosis-associated ovarian carcinomas. N. Engl. J. Med 2010, 363:1532–1543. doi:10.1056/NEJMoa1008433.
160. Greif PA, Eck SH, Konstandin NP, Benet-Pagès A, Ksienzyk B, Dufour A, Vetter AT, Popp HD, Lorenz-Depiereux B, Meitinger T, Bohlander SK, Strom TM: Identification of recurring tumor-specific somatic mutations in acute myeloid leukemia by transcriptome sequencing. Leukemia 2011, 25:821–827. doi:10.1038/leu.2011.19.
161. Sugarbaker DJ, Richards WG, Gordon GJ, Dong L, De Rienzo A, Maulik G, Glickman JN, Chirieac LR, Hartman M-L, Taillon BE, Du L, Bouffard P, Kingsmore SF, Miller NA, Farmer AD, Jensen RV, Gullans SR, Bueno R: Transcriptome sequencing of malignant pleural mesothelioma tumors. Proc. Natl. Acad. Sci. U.S.A 2008, 105:3521–3526. doi:10.1073/pnas.0712399105.
162. Palanisamy N, Ateeq B, Kalyana-Sundaram S, Pflueger D, Ramnarayanan K, Shankar S, Han B, Cao Q, Cao X, Suleman K, Kumar-Sinha C, Dhanasekaran SM, Chen Y, Esgueva R, Banerjee S, LaFargue CJ, Siddiqui J, Demichelis F, Moeller P, Bismar TA, Kuefer R, Fullen DR, Johnson TM, Greenson JK, Giordano TJ, Tan P, Tomlins SA, Varambally S, Rubin MA, Maher CA, et al.: Rearrangements of the RAF kinase pathway in prostate cancer, gastric cancer and melanoma. Nat. Med 2010, 16:793–79810.1038/nm.2166. 163. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP: Integrative genomics viewer. Nat Biotech 2011, 29:24–2610.1038/nbt.1754. 164. Zhang J, Finney R, Edmonson M, Schaefer C, Rowe W, Yan C, Clifford R, Greenblum S, Wu G, Zhang H, Liu H, Nguyen C, Hu Y, Madhavan S, Ding L, Wheeler DA, Gerhard DS, Buetow KH: The Cancer Genome Workbench: Identifying and Visualizing Complex Genetic Alterations in Tumors. NCI Nature Pathway Interaction Database 2010, 10.1038/pid.2010.1Available: http://pid.nci.nih.gov/PID/2010/100309/full/pid.2010.1.shtml.Accessed 11 January 2012. 165. Sanborn JZ, Benz SC, Craft B, Szeto C, Kober KM, Meyer L, Vaske CJ, Goldman M, Smith KE, Kuhn RM, Karolchik D, Kent WJ, Stuart JM, Haussler D, Zhu J: The UCSC cancer genomics browser: update 2011. Nucleic Acids Research 2010, 39:D951– D95910.1093/nar/gkq1113.  202 166. Hogan LE, Meyer JA, Yang J, Wang J, Wong N, Yang W, Condos G, Hunger SP, Raetz E, Saffery R, Relling MV, Bhojwani D, Morrison DJ, Carroll WL: Integrated genomic analysis of relapsed childhood acute lymphoblastic leukemia reveals therapeutic strategies. Blood 2011, 118:5218–522610.1182/blood-2011-04-345595. 167. 
Cho Y-J, Tsherniak A, Tamayo P, Santagata S, Ligon A, Greulich H, Berhoukim R, Amani V, Goumnerova L, Eberhart CG, Lau CC, Olson JM, Gilbertson RJ, Gajjar A, Delattre O, Kool M, Ligon K, Meyerson M, Mesirov JP, Pomeroy SL: Integrative genomic analysis of medulloblastoma identifies a molecular subgroup that drives poor clinical outcome. J. Clin. Oncol. 2011, 29:1424–143010.1200/JCO.2010.28.5148. 168. Verhaak RGW, Hoadley KA, Purdom E, Wang V, Qi Y, Wilkerson MD, Miller CR, Ding L, Golub T, Mesirov JP, Alexe G, Lawrence M, O‘Kelly M, Tamayo P, Weir BA, Gabriel S, Winckler W, Gupta S, Jakkula L, Feiler HS, Hodgson JG, James CD, Sarkaria JN, Brennan C, Kahn A, Spellman PT, Wilson RK, Speed TP, Gray JW, Meyerson M, et al.: Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell 2010, 17:98–11010.1016/j.ccr.2009.12.020. 169. Floratos A, Smith K, Ji Z, Watkinson J, Califano A: geWorkbench: an open source platform for integrative genomics. Bioinformatics 2010, 26:1779– 178010.1093/bioinformatics/btq282. 170. Zhang J, Benavente CA, McEvoy J, Flores-Otero J, Ding L, Chen X, Ulyanov A, Wu G, Wilson M, Wang J, Brennan R, Rusch M, Manning AL, Ma J, Easton J, Shurtleff S, Mullighan C, Pounds S, Mukatira S, Gupta P, Neale G, Zhao D, Lu C, Fulton RS, Fulton LL, Hong X, Dooling DJ, Ochoa K, Naeve C, Dyson NJ, et al.: A novel retinoblastoma therapy from genomic and epigenetic analyses. Nature 2012, advance online publication10.1038/nature10733Available: http://dx.doi.org/10.1038/nature10733.Accessed 22 January 2012. 171. Scotting PJ, Walker DA, Perilongo G: Childhood solid tumours: a developmental disorder. Nat Rev Cancer 2005, 5:481–48810.1038/nrc1633. 172. Goodman, Gurney, Smith, Olshan: Sympathetic nervous system tumors. Available: http://seer.cancer.gov/publications/childhood/.Accessed 5 March 2011. 173. 
Huber K: The sympathoadrenal cell lineage: Specification, diversification, and new perspectives. Developmental Biology 2006, 298:335–34316/j.ydbio.2006.07.010. 174. Maris JM: Recent advances in neuroblastoma. N. Engl. J. Med 2010, 362:2202– 221110.1056/NEJMra0804577. 175. Mohlin SA, Wigerup C, Påhlman S: Neuroblastoma aggressiveness in relation to sympathetic neuronal differentiation stage. Seminars in Cancer Biology 2011, 21:276– 28210.1016/j.semcancer.2011.09.002.  203 176. Alam G, Cui H, Shi H, Yang L, Ding J, Mao L, Maltese WA, Ding H-F: MYCN promotes the expansion of Phox2B-positive neuronal progenitors to drive neuroblastoma development. Am. J. Pathol. 2009, 175:856– 86610.2353/ajpath.2009.090019. 177. Joyner BD: Neuroblastoma: eMedicine Urology. 2010, Available: http://emedicine.medscape.com/article/439263-overview.Accessed 5 March 2011. 178. Brodeur GM: Neuroblastoma: biological insights into a clinical enigma. Nat. Rev. Cancer 2003, 3:203–21610.1038/nrc1014. 179. Mueller S, Matthay KK: Neuroblastoma: biology and staging. Curr Oncol Rep 2009, 11:431–438. 180. London WB, Castleberry RP, Matthay KK, Look AT, Seeger RC, Shimada H, Thorner P, Brodeur G, Maris JM, Reynolds CP, Cohn SL: Evidence for an age cutoff greater than 365 days for neuroblastoma risk group stratification in the Children’s Oncology Group. J. Clin. Oncol 2005, 23:6459–646510.1200/JCO.2005.05.571. 181. Brodeur GM, Pritchard J, Berthold F, Carlsen NL, Castel V, Castelberry RP, De Bernardi B, Evans AE, Favrot M, Hedborg F: Revisions of the international criteria for neuroblastoma diagnosis, staging, and response to treatment. J. Clin. Oncol 1993, 11:1466–1477. 182. Monclair T, Brodeur GM, Ambros PF, Brisse HJ, Cecchetto G, Holmes K, Kaneko M, London WB, Matthay KK, Nuchtern JG, von Schweinitz D, Simon T, Cohn SL, Pearson ADJ: The International Neuroblastoma Risk Group (INRG) Staging System: An INRG Task Force Report. Journal of Clinical Oncology 2009, 27:298 – 30310.1200/JCO.2008.16.6876. 183. 
Cohn SL, Pearson ADJ, London WB, Monclair T, Ambros PF, Brodeur GM, Faldum A, Hero B, Iehara T, Machin D, Mosseri V, Simon T, Garaventa A, Castel V, Matthay KK: The International Neuroblastoma Risk Group (INRG) Classification System: An INRG Task Force Report. Journal of Clinical Oncology 2009, 27:289 – 29710.1200/JCO.2008.16.6785. 184. Øra I, Eggert A: Progress in treatment and risk stratification of neuroblastoma: Impact on future clinical and basic research. Seminars in Cancer Biology 2011, 21:217– 22810.1016/j.semcancer.2011.07.002. 185. Yu AL, Gilman AL, Ozkaynak MF, London WB, Kreissman SG, Chen HX, Smith M, Anderson B, Villablanca JG, Matthay KK, Shimada H, Grupp SA, Seeger R, Reynolds CP, Buxton A, Reisfeld RA, Gillies SD, Cohn SL, Maris JM, Sondel PM: Anti-GD2 antibody with GM-CSF, interleukin-2, and isotretinoin for neuroblastoma. N. Engl. J. Med 2010, 363:1324–133410.1056/NEJMoa0911123. 186. Knudson AG, Strong LC: Mutation and cancer: neuroblastoma and pheochromocytoma. Am J Hum Genet 1972, 24:514–532.  204 187. Janoueix-Lerosey I, Schleiermacher G, Michels E, Mosseri V, Ribeiro A, Lequin D, Vermeulen J, Couturier J, Peuchmaur M, Valent A, Plantaz D, Rubie H, Valteau-Couanet D, Thomas C, Combaret V, Rousseau R, Eggert A, Michon J, Speleman F, Delattre O: Overall genomic pattern is a predictor of outcome in neuroblastoma. J. Clin. Oncol 2009, 27:1026–103310.1200/JCO.2008.16.0630. 188. Mossé YP, Laudenslager M, Longo L, Cole KA, Wood A, Attiyeh EF, Laquaglia MJ, Sennett R, Lynch JE, Perri P, Laureys G, Speleman F, Kim C, Hou C, Hakonarson H, Torkamani A, Schork NJ, Brodeur GM, Tonini GP, Rappaport E, Devoto M, Maris JM: Identification of ALK as a major familial neuroblastoma predisposition gene. Nature 2008, 455:930–93510.1038/nature07261. 189. Mosse YP, Laudenslager M, Khazi D, Carlisle AJ, Winter CL, Rappaport E, Maris JM: Germline PHOX2B mutation in hereditary neuroblastoma. Am. J. Hum. Genet 2004, 75:727–73010.1086/424530. 190. 
Trochet D, Bourdeaut F, Janoueix-Lerosey I, Deville A, de Pontual L, Schleiermacher G, Coze C, Philip N, Frébourg T, Munnich A, Lyonnet S, Delattre O, Amiel J: Germline mutations of the paired-like homeobox 2B (PHOX2B) gene in neuroblastoma. Am. J. Hum. Genet 2004, 74:761–76410.1086/383253. 191. Pattyn A, Morin X, Cremer H, Goridis C, Brunet J-F: The homeobox gene Phox2b is essential for the development of autonomic neural crest derivatives. Nature 1999, 399:366–37010.1038/20700. 192. Chen Y, Takita J, Choi YL, Kato M, Ohira M, Sanada M, Wang L, Soda M, Kikuchi A, Igarashi T, Nakagawara A, Hayashi Y, Mano H, Ogawa S: Oncogenic mutations of ALK kinase in neuroblastoma. Nature 2008, 455:971–97410.1038/nature07399. 193. George RE, Sanda T, Hanna M, Fröhling S, Luther W, Zhang J, Ahn Y, Zhou W, London WB, McGrady P, Xue L, Zozulya S, Gregor VE, Webb TR, Gray NS, Gilliland DG, Diller L, Greulich H, Morris SW, Meyerson M, Look AT: Activating mutations in ALK provide a therapeutic target in neuroblastoma. Nature 2008, 455:975– 97810.1038/nature07397. 194. Janoueix-Lerosey I, Lequin D, Brugières L, Ribeiro A, de Pontual L, Combaret V, Raynal V, Puisieux A, Schleiermacher G, Pierron G, Valteau-Couanet D, Frebourg T, Michon J, Lyonnet S, Amiel J, Delattre O: Somatic and germline activating mutations of the ALK kinase receptor in neuroblastoma. Nature 2008, 455:967– 97010.1038/nature07398. 195. Passoni L, Longo L, Collini P, Coluccia AML, Bozzi F, Podda M, Gregorio A, Gambini C, Garaventa A, Pistoia V, Del Grosso F, Tonini GP, Cheng M, Gambacorti-Passerini C, Anichini A, Fossati-Bellani F, Di Nicola M, Luksch R: Mutation-independent anaplastic lymphoma kinase overexpression in poor prognosis neuroblastoma patients. Cancer Res 2009, 69:7338–734610.1158/0008-5472.CAN-08-4419.  205 196. Deyell RJ, Attiyeh EF: Advances in the understanding of constitutional and somatic genomic alterations in neuroblastoma. Cancer Genetics 2011, 204:113– 12116/j.cancergen.2011.03.001. 197. 
Capasso M, Devoto M, Hou C, Asgharzadeh S, Glessner JT, Attiyeh EF, Mosse YP, Kim C, Diskin SJ, Cole KA, Bosse K, Diamond M, Laudenslager M, Winter C, Bradfield JP, Scott RH, Jagannathan J, Garris M, McConville C, London WB, Seeger RC, Grant SFA, Li H, Rahman N, Rappaport E, Hakonarson H, Maris JM: Common variations in BARD1 influence susceptibility to high-risk neuroblastoma. Nat. Genet 2009, 41:718– 72310.1038/ng.374. 198. Maris JM, Mosse YP, Bradfield JP, Hou C, Monni S, Scott RH, Asgharzadeh S, Attiyeh EF, Diskin SJ, Laudenslager M, Winter C, Cole KA, Glessner JT, Kim C, Frackelton EC, Casalunovo T, Eckert AW, Capasso M, Rappaport EF, McConville C, London WB, Seeger RC, Rahman N, Devoto M, Grant SFA, Li H, Hakonarson H: Chromosome 6p22 locus associated with clinically aggressive neuroblastoma. N. Engl. J. Med 2008, 358:2585– 259310.1056/NEJMoa0708698. 199. Nguyen LB, Diskin SJ, Capasso M, Wang K, Diamond MA, Glessner J, Kim C, Attiyeh EF, Mosse YP, Cole K, Iolascon A, Devoto M, Hakonarson H, Li HK, Maris JM: Phenotype restricted genome-wide association study using a gene-centric approach identifies three low-risk neuroblastoma susceptibility Loci. PLoS Genet. 2011, 7:e100202610.1371/journal.pgen.1002026. 200. Diskin SJ, Hou C, Glessner JT, Attiyeh EF, Laudenslager M, Bosse K, Cole K, Mossé YP, Wood A, Lynch JE, Pecor K, Diamond M, Winter C, Wang K, Kim C, Geiger EA, McGrady PW, Blakemore AIF, London WB, Shaikh TH, Bradfield J, Grant SFA, Li H, Devoto M, Rappaport ER, Hakonarson H, Maris JM: Copy number variation at 1q21.1 associated with neuroblastoma. Nature 2009, 459:987–99110.1038/nature08035. 201. Schwab M, Alitalo K, Klempnauer K-H, Varmus HE, Bishop JM, Gilbert F, Brodeur G, Goldstein M, Trent J: Amplified DNA with limited homology to myc cellular oncogene is shared by human neuroblastoma cell lines and a neuroblastoma tumour. Nature 1983, 305:245–24810.1038/305245a0. 202. 
Brodeur G, Seeger R, Schwab M, Varmus H, Bishop J: Amplification of N-myc in untreated human neuroblastomas correlates with advanced disease stage. Science 1984, 224:1121 –112410.1126/science.6719137. 203. Seeger RC, Brodeur GM, Sather H, Dalton A, Siegel SE, Wong KY, Hammond D: Association of multiple copies of the N-myc oncogene with rapid progression of neuroblastomas. N. Engl. J. Med 1985, 313:1111–111610.1056/NEJM198510313131802. 204. Subramaniam MM, Piqueras M, Navarro S, Noguera R: Aberrant copy numbers of ALK gene is a frequent genetic alteration in neuroblastomas. Hum. Pathol 2009, 40:1638–164210.1016/j.humpath.2009.05.002.  206 205. Attiyeh EF, London WB, Mossé YP, Wang Q, Winter C, Khazi D, McGrady PW, Seeger RC, Look AT, Shimada H, Brodeur GM, Cohn SL, Matthay KK, Maris JM: Chromosome 1p and 11q deletions and outcome in neuroblastoma. N. Engl. J. Med 2005, 353:2243–225310.1056/NEJMoa052399. 206. Guo C, White PS, Weiss MJ, Hogarty MD, Thompson PM, Stram DO, Gerbing R, Matthay KK, Seeger RC, Brodeur GM, Maris JM: Allelic deletion at 11q23 is common in MYCN single copy neuroblastomas. Oncogene 1999, 18:4948– 495710.1038/sj.onc.1202887. 207. Abel F, Ejeskär K, Kogner P, Martinsson T: Gain of chromosome arm 17q is associated with unfavourable prognosis in neuroblastoma, but does not involve mutations in the somatostatin receptor 2(SSTR2) gene at 17q24. Br. J. Cancer 1999, 81:1402–140910.1038/sj.bjc.6692231. 208. Stallings RL, Carty P, McArdle L, Mullarkey M, McDermott M, Breatnach F, O‘Meara A: Molecular cytogenetic analysis of recurrent unbalanced t(11;17) in neuroblastoma. Cancer Genet. Cytogenet 2004, 154:44–5110.1016/j.cancergencyto.2004.04.003. 209. Stark B, Jeison M, Glaser-Gabay L, Bar-Am I, Mardoukh J, Ash S, Atias D, Stein J, Zaizov R, Yaniv I: der(11)t(11;17): a distinct cytogenetic pathway of advanced stage neuroblastoma (NBL) - detected by spectral karyotyping (SKY). Cancer Lett 2003, 197:75–79. 210. 
Nakagawara A, Arima-Nakagawara M, Scavarda NJ, Azar CG, Cantor AB, Brodeur GM: Association between high levels of expression of the TRK gene and favorable outcome in human neuroblastoma. N. Engl. J. Med 1993, 328:847– 85410.1056/NEJM199303253281205. 211. Rydén M, Sehgal R, Dominici C, Schilling FH, Ibáñez CF, Kogner P: Expression of mRNA for the neurotrophin receptor trkC in neuroblastomas with favourable tumour stage and good prognosis. Br J Cancer 1996, 74:773–779. 212. Nakagawara A, Azar CG, Scavarda NJ, Brodeur GM: Expression and function of TRK-B and BDNF in human neuroblastomas. Mol. Cell. Biol 1994, 14:759–767. 213. Wei JS, Greer BT, Westermann F, Steinberg SM, Son C-G, Chen Q-R, Whiteford CC, Bilke S, Krasnoselsky AL, Cenacchi N, Catchpoole D, Berthold F, Schwab M, Khan J: Prediction of clinical outcome using gene expression profiling and artificial neural networks for patients with neuroblastoma. Cancer Res 2004, 64:6883–689110.1158/0008- 5472.CAN-04-0695. 214. Asgharzadeh S, Pique-Regi R, Sposto R, Wang H, Yang Y, Shimada H, Matthay K, Buckley J, Ortega A, Seeger RC: Prognostic significance of gene expression profiles of metastatic neuroblastomas lacking MYCN gene amplification. J. Natl. Cancer Inst. 2006, 98:1193–120310.1093/jnci/djj330.  207 215. Ohira M, Oba S, Nakamura Y, Isogai E, Kaneko S, Nakagawa A, Hirata T, Kubo H, Goto T, Yamada S, Yoshida Y, Fuchioka M, Ishii S, Nakagawara A: Expression profiling using a tumor-specific cDNA microarray predicts the prognosis of intermediate risk neuroblastomas. Cancer Cell 2005, 7:337–35010.1016/j.ccr.2005.03.019. 216. Oberthuer A, Berthold F, Warnat P, Hero B, Kahlert Y, Spitz R, Ernestus K, König R, Haas S, Eils R, Schwab M, Brors B, Westermann F, Fischer M: Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification. J. Clin. Oncol. 2006, 24:5070– 507810.1200/JCO.2006.06.1879. 217. 
Oberthuer A, Hero B, Berthold F, Juraeva D, Faldum A, Kahlert Y, Asgharzadeh S, Seeger R, Scaruffi P, Tonini GP, Janoueix-Lerosey I, Delattre O, Schleiermacher G, Vandesompele J, Vermeulen J, Speleman F, Noguera R, Piqueras M, Bénard J, Valent A, Avigad S, Yaniv I, Weber A, Christiansen H, Grundy RG, Schardt K, Schwab M, Eils R, Warnat P, Kaderali L, et al.: Prognostic impact of gene expression-based classification for neuroblastoma. J. Clin. Oncol. 2010, 28:3506–351510.1200/JCO.2009.27.3367. 218. Vermeulen J, De Preter K, Naranjo A, Vercruysse L, Roy NV, Hellemans J, Swerts K, Bravo S, Scaruffi P, Tonini GP, Noguera R, Piqueras M, Janoueix-Lerosey I, Delattre O, Combaret V, Fischer M, Oberthuer A, Ambros PF, Beiske K, Bénard J, Marques B, Michon J, Schleiermacher G, Bernardi BD, Rubie H, Cañete A, Castel V, Kohler J, Pötschger U, Ladenstein R, et al.: Outcome Prediction of Children with Neuroblastoma using a Multigene Expression Signature, a Retrospective SIOPEN/COG/GPOH Study. Lancet Oncol 2009, 10:663–67110.1016/S1470-2045(09)70154-8. 219. Politi K, Pao W: How genetically engineered mouse tumor models provide insights into human cancers. J. Clin. Oncol. 2011, 29:2273–228110.1200/JCO.2010.30.8304. 220. Chesler L, Weiss WA: Genetically engineered murine models – Contribution to our understanding of the genetics, molecular pathology and therapeutic targeting of neuroblastoma. Seminars in Cancer Biology 2011, 21:245– 25510.1016/j.semcancer.2011.09.011. 221. Weiss WA, Aldape K, Mohapatra G, Feuerstein BG, Bishop JM: Targeted expression of MYCN causes neuroblastoma in transgenic mice. EMBO J 1997, 16:2985– 299510.1093/emboj/16.11.2985. 222. Rounbehler RJ, Li W, Hall MA, Yang C, Fallahi M, Cleveland JL: Targeting Ornithine Decarboxylase Impairs Development of MYCN-Amplified Neuroblastoma. Cancer Res 2009, 69:547–55310.1158/0008-5472.CAN-08-2968. 223. 
Teitz T, Stanke JJ, Federico S, Bradley CL, Brennan R, Zhang J, Johnson MD, Sedlacik J, Inoue M, Zhang ZM, Frase S, Rehg JE, Hillenbrand CM, Finkelstein D, Calabrese C, Dyer MA, Lahti JM: Preclinical Models for Neuroblastoma: Establishing a Baseline for Treatment. PLoS ONE 2011, 6:e1913310.1371/journal.pone.0019133.  208 224. Glenn TC: Field guide to next-generation DNA sequencers. Mol Ecol Resour 2011, 10.1111/j.1755-0998.2011.03024.xAvailable: http://www.ncbi.nlm.nih.gov/pubmed/21592312.Accessed 19 July 2011. 225. Huang X, Saint-Jeannet J-P: Induction of the neural crest and the opportunities of life on the edge. Developmental Biology 2004, 275:1–1116/j.ydbio.2004.07.033. 226. Anderson DJ: The neural crest cell lineage problem: Neuropoiesis? Neuron 1989, 3:1–1216/0896-6273(89)90110-4. 227. Anderson DJ, Carnahan JF, Michelsohn A, Patterson PH: Antibody markers identify a common progenitor to sympathetic neurons and chromaffin cells in vivo and reveal the timing of commitment to neuronal differentiation in the sympathoadrenal lineage. J. Neurosci 1991, 11:3507–3519. 228. Nakagawara A, Ohira M: Comprehensive genomics linking between neural development and cancer: neuroblastoma as a model. Cancer Letters 2004, 204:213– 22416/S0304-3835(03)00457-9. 229. Jiang M, Stanke J, Lahti JM: The connections between neural crest development and neuroblastoma. Curr. Top. Dev. Biol 2011, 94:77–12710.1016/B978-0-12-380916-2.00004- 8. 230. Prockop DJ: Marrow Stromal Cells as Stem Cells for Nonhematopoietic Tissues. Science 1997, 276:71 –7410.1126/science.276.5309.71. 231. Gage FH: Mammalian Neural Stem Cells. Science 2000, 287:1433 – 143810.1126/science.287.5457.1433. 232. Reynolds B, Weiss S: Generation of neurons and astrocytes from isolated cells of the adult mammalian central nervous system. Science 1992, 255:1707 – 171010.1126/science.1553558. 233. 
Toma JG, Akhavan M, Fernandes KJL, Barnabe-Heider F, Sadikot A, Kaplan DR, Miller FD: Isolation of multipotent adult stem cells from the dermis of mammalian skin. Nat Cell Biol 2001, 3:778–78410.1038/ncb0901-778. 234. Toma JG, McKenzie IA, Bagli D, Miller FD: Isolation and characterization of multipotent skin-derived precursors from human skin. Stem Cells 2005, 23:727– 73710.1634/stemcells.2004-0134. 235. Fernandes KJL, McKenzie IA, Mill P, Smith KM, Akhavan M, Barnabe-Heider F, Biernaskie J, Junek A, Kobayashi NR, Toma JG, Kaplan DR, Labosky PA, Rafuse V, Hui C- C, Miller FD: A dermal niche for multipotent adult skin-derived precursor cells. Nat Cell Biol 2004, 6:1082–109310.1038/ncb1181.  209 236. Biernaskie J, Paris M, Morozova O, Fagan BM, Marra M, Pevny L, Miller FD: SKPs derive from hair follicle precursors and exhibit properties of adult dermal stem cells. Cell Stem Cell 2009, 5:610–62310.1016/j.stem.2009.10.019. 237. Christ B, Ordahl CP: Early stages of chick somite development. Anat. Embryol. 1995, 191:381–396. 238. Couly G, Grapin-Botton A, Coltey P, Ruhin B, Le Douarin NM: Determination of the identity of the derivatives of the cephalic neural crest: incompatibility between Hox gene expression and lower jaw development. Development 1998, 125:3445–3459. 239. Mauger A: [The role of somitic mesoderm in the development of dorsal plumage in chick embryos. II. Regionalization of the plumage-forming mesoderm]. J Embryol Exp Morphol 1972, 28:343–366. 240. Lanza RP: Handbook of stem cells. Academic Press; 2004. 241. Okita K, Ichisaka T, Yamanaka S: Generation of germline-competent induced pluripotent stem cells. Nature 2007, 448:313–31710.1038/nature05934. 242. Wernig M, Meissner A, Foreman R, Brambrink T, Ku M, Hochedlinger K, Bernstein BE, Jaenisch R: In vitro reprogramming of fibroblasts into a pluripotent ES-cell-like state. Nature 2007, 448:318–32410.1038/nature05944. 243. 
Smith KM, Datti A, Fujitani M, Grinshtein N, Zhang L, Morozova O, Blakely KM, Rotenberg SA, Hansford LM, Miller FD, Yeger H, Irwin MS, Moffat J, Marra MA, Baruchel S, Wrana JL, Kaplan DR: Selective targeting of neuroblastoma tumour-initiating cells by compounds identified in stem cell-based small molecule screens. EMBO Mol Med 2010, 2:371–38410.1002/emmm.201000093. 244. Morozova O, Vojvodic M, Grinshtein N, Hansford LM, Blakely KM, Maslova A, Hirst M, Cezard T, Morin RD, Moore R, Smith KM, Miller F, Taylor P, Thiessen N, Varhol R, Zhao Y, Jones S, Moffat J, Kislinger T, Moran MF, Kaplan DR, Marra MA: System-level analysis of neuroblastoma tumor-initiating cells implicates AURKB as a novel drug target for neuroblastoma. Clin. Cancer Res 2010, 16:4572–458210.1158/1078-0432.CCR- 10-0627. 245. Jessen KR, Mirsky R: The origin and development of glial cells in peripheral nerves. Nat. Rev. Neurosci 2005, 6:671–68210.1038/nrn1746. 246. Gentleman R, Carey VJ, Huber W, Irizarry RA, Dudoit Seds.: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer-Verlag; 2005 Available: http://www.springerlink.com/content/978-0-387-25146- 2#section=519945&page=1.Accessed 6 June 2011. 247. Jeanmougin M, de Reynies A, Marisa L, Paccard C, Nuel G, Guedj M: Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A  210 Comparison of Variance Modeling Strategies. PLoS ONE 2010, 5:e1233610.1371/journal.pone.0012336. 248. Jeffery IB, Higgins DG, Culhane AC: Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics 2006, 7:35910.1186/1471-2105-7-359. 249. Smyth GK: Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004, 3:Article310.2202/1544-6115.1027. 250. 
Sauka-Spengler T, Meulemans D, Jones M, Bronner-Fraser M: Ancient evolutionary origin of the neural crest gene regulatory network. Dev. Cell 2007, 13:405– 42010.1016/j.devcel.2007.08.005. 251. Stemple DL, Anderson DJ: Isolation of a stem cell for neurons and glia from the mammalian neural crest. Cell 1992, 71:973–985. 252. Liu JP, Jessell TM: A role for rhoB in the delamination of neural crest cells from the dorsal neural tube. Development 1998, 125:5055–5067. 253. Kurauchi T, Izutsu Y, Maéno M: Involvement of Neptune in induction of the hatching gland and neural crest in the Xenopus embryo. Differentiation, 79:251– 25916/j.diff.2010.01.003. 254. Wong Y-M, Chow KL: Expression of zebrafish mab21 genes marks the differentiating eye, midbrain and neural tube. Mech. Dev 2002, 113:149–152. 255. Schraufstatter IU, Discipio RG, Khaldoyanidi S: Mesenchymal stem cells and their microenvironment. Front. Biosci. 2011, 17:2271–2288. 256. Vodyanik MA, Yu J, Zhang X, Tian S, Stewart R, Thomson JA, Slukvin II: A Mesoderm-Derived Precursor for Mesenchymal Stem and Endothelial Cells. Cell Stem Cell 2010, 7:718–72910.1016/j.stem.2010.11.011. 257. Kléber M, Lee H-Y, Wurdak H, Buchstaller J, Riccomagno MM, Ittner LM, Suter U, Epstein DJ, Sommer L: Neural crest stem cell maintenance by combinatorial Wnt and BMP signaling. J. Cell Biol. 2005, 169:309–32010.1083/jcb.200411095. 258. Douarin NL, Kalcheim C: The neural crest. Cambridge University Press; 1999. 259. Boon K, Osório EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, de Souza SJ, Riggins GJ: An anatomy of normal and malignant gene expression. Proceedings of the National Academy of Sciences 2002, 99:11287 –1129210.1073/pnas.152324199.  211 260. Morozova O, Morozov V, Hoffman BG, Helgason CD, Marra MA: A seriation approach for visualization-driven discovery of co-expression patterns in Serial Analysis of Gene Expression (SAGE) data. PLoS ONE 2008, 3:e320510.1371/journal.pone.0003205. 261. 
Robinson WS: A method for chronologically ordering archaeological deposits. American Antiquity 1951, 16:293–301. 262. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG, Gifford DK, Melton DA, Jaenisch R, Young RA: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122:947– 95610.1016/j.cell.2005.08.020. 263. Roider HG, Manke T, O‘Keeffe S, Vingron M, Haas SA: PASTAA: identifying transcription factors associated with sets of co-regulated genes. Bioinformatics 2009, 25:435–44210.1093/bioinformatics/btn627. 264. Radomska HS, Satterthwaite AB, Taranenko N, Narravula S, Krause DS, Tenen DG: A nuclear factor Y (NFY) site positively regulates the human CD34 stem cell gene. Blood 1999, 94:3772–3780. 265. Winger Q, Huang J, Auman HJ, Lewandoski M, Williams T: Analysis of transcription factor AP-2 expression and function during mouse preimplantation development. Biol. Reprod. 2006, 75:324–33310.1095/biolreprod.106.052407. 266. Schmidt M, Huber L, Majdazari A, Schütz G, Williams T, Rohrer H: The transcription factors AP-2β and AP-2α are required for survival of sympathetic progenitors and differentiated sympathetic neurons. Dev. Biol. 2011, 355:89– 10010.1016/j.ydbio.2011.04.011. 267. Cesari F, Brecht S, Vintersten K, Vuong LG, Hofmann M, Klingel K, Schnorr J-J, Arsenian S, Schild H, Herdegen T, Wiebel FF, Nordheim A: Mice deficient for the ets transcription factor elk-1 show normal immune responses and mildly impaired neuronal gene activation. Mol. Cell. Biol. 2004, 24:294–305. 268. Dworkin S, Mantamadiotis T: Targeting CREB signalling in neurogenesis. Expert Opin. Ther. Targets 2010, 14:869–87910.1517/14728222.2010.501332. 269. Ryser S, Dizin E, Jefford CE, Delaval B, Gagos S, Christodoulidou A, Krause K-H, Birnbaum D, Irminger-Finger I: Distinct Roles of BARD1 Isoforms in Mitosis: Full- Length BARD1 Mediates Aurora B Degradation, Cancer-Associated BARD1β Scaffolds Aurora B and BRCA2. 
Cancer Research 2009, 69:1125 –113410.1158/0008-5472.CAN- 08-2134. 270. Modlin IM, Champaneria MC, Bornschein J, Kidd M: Evolution of the diffuse neuroendocrine system--clear cells and cloudy origins. Neuroendocrinology 2006, 84:69– 8210.1159/000096997.  212 271. Kuijk EW, Chuva de Sousa Lopes SM, Geijsen N, Macklon N, Roelen BAJ: The different shades of mammalian pluripotent stem cells. Hum. Reprod. Update 2011, 17:254–27110.1093/humupd/dmq035. 272. Wurdak H, Ittner LM, Lang KS, Leveen P, Suter U, Fischer JA, Karlsson S, Born W, Sommer L: Inactivation of TGFbeta signaling in neural crest stem cells leads to multiple defects reminiscent of DiGeorge syndrome. Genes Dev 2005, 19:530– 53510.1101/gad.317405. 273. Chen M-F, Lin C-T, Chen W-C, Yang C-T, Chen C-C, Liao S-K, Liu JM, Lu C-H, Lee K-D: The sensitivity of human mesenchymal stem cells to ionizing radiation. Int. J. Radiat. Oncol. Biol. Phys. 2006, 66:244–25310.1016/j.ijrobp.2006.03.062. 274. Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu A, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA: Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res 2007, 17:108–11610.1101/gr.5488207. 275. Caraux G, Pinloche S: PermutMatrix: a graphical environment to arrange gene expression profiles in optimal linear order. Bioinformatics 2005, 21:1280– 128110.1093/bioinformatics/bti141. 276. Sokal RR, Michener CD: A statistical method for evaluating systematic relationships. University of Kansas Science Bulletin 1958, 28:1409–1438. 277. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. U.S.A. 1998, 95:14863–14868. 278. 
Rodriguez-Esteban C, Tsukui T, Yonei S, Magallon J, Tamura K, Izpisua Belmonte JC: The T-box genes Tbx4 and Tbx5 regulate limb outgrowth and identity. Nature 1999, 398:814–818. doi:10.1038/19769.
279. Hansford LM, McKee AE, Zhang L, George RE, Gerstle JT, Thorner PS, Smith KM, Look AT, Yeger H, Miller FD, Irwin MS, Thiele CJ, Kaplan DR: Neuroblastoma cells isolated from bone marrow metastases contain a naturally enriched tumor-initiating cell. Cancer Res 2007, 67:11234–11243. doi:10.1158/0008-5472.CAN-07-0718.
280. Pahlman S, Johnsson S, Pietras A: Patient-derived EBV-immortalized B-lymphocytes are a dominant contaminant of in vitro cultured human neuroblastoma tumor-initiating cells isolated from bone marrow. 2011. Available: http://www.abstractsonline.com/Plan/ViewAbstract.aspx?sKey=d2eb516b-a2a1-4fac-9e8d-3b032d6bc731&cKey=29fa6ed6-fc53-4101-8a24-608b68ef3e2f&mKey=%7B507D311A-B6EC-436A-BD67-6D14ED39622C%7D. Accessed 20 May 2011.
281. Chen Y, Li D, Li S: The Alox5 gene is a novel therapeutic target in cancer stem cells of chronic myeloid leukemia. Cell Cycle 2009, 8:3488–3492. doi:10.4161/cc.8.21.9852.
282. Bitton D, Okoniewski M, Connolly Y, Miller C: Exon level integration of proteomics and microarray data. BMC Bioinformatics 2008, 9:118. doi:10.1186/1471-2105-9-118.
283. Okoniewski MJ, Miller CJ: Comprehensive Analysis of Affymetrix Exon Arrays Using BioConductor. PLoS Comput Biol 2008, 4:e6. doi:10.1371/journal.pcbi.0040006.
284. Taylor P, Nielsen PA, Trelle MB, Hørning OB, Andersen MB, Vorm O, Moran MF, Kislinger T: Automated 2D Peptide Separation on a 1D Nano-LC-MS System. J. Proteome Res. 2009, 8:1610–1616. doi:10.1021/pr800986c.
285. Chen EI, Hewel J, Felding-Habermann B, Yates JR: Large Scale Protein Profiling by Combination of Protein Fractionation and Multidimensional Protein Identification Technology (MudPIT). Molecular & Cellular Proteomics 2006, 5:53–56. doi:10.1074/mcp.T500013-MCP200.
286. Skibbens RV: Cell biology of cancer: BRCA1 and sister chromatid pairing reactions? Cell Cycle 2008, 7:449–452. doi:10.4161/cc.7.4.5435.
287. Billingsley ML: Druggable targets and targeted drugs: enhancing the development of new therapeutics. Pharmacology 2008, 82:239–244. doi:10.1159/000157624.
288. Tobinick EL: The value of drug repositioning in the current pharmaceutical market. Drug News Perspect 2009, 22:53. doi:10.1358/dnp.2009.22.1.1303818.
289. Goldsmith KC, Hogarty MD: Targeting programmed cell death pathways with experimental therapeutics: opportunities in high-risk neuroblastoma. Cancer Letters 2005, 228:133–141. doi:10.1016/j.canlet.2005.01.048.
290. Daniel RA, Rozanska AL, Thomas HD, Mulligan EA, Drew Y, Castelbuono DJ, Hostomsky Z, Plummer ER, Boddy AV, Tweddle DA, Curtin NJ, Clifford SC: Inhibition of poly(ADP-ribose) polymerase-1 enhances temozolomide and topotecan activity against childhood neuroblastoma. Clin. Cancer Res 2009, 15:1241–1249. doi:10.1158/1078-0432.CCR-08-1095.
291. Witt O, Deubzer HE, Lodrini M, Milde T, Oehme I: Targeting histone deacetylases in neuroblastoma. Curr. Pharm. Des 2009, 15:436–447.
292. Gautschi O, Heighway J, Mack PC, Purnell PR, Lara PN, Gandara DR: Aurora kinases as anticancer drug targets. Clin. Cancer Res 2008, 14:1639–1648. doi:10.1158/1078-0432.CCR-07-2179.
293. Alley MC, Scudiero DA, Monks A, Hursey ML, Czerwinski MJ, Fine DL, Abbott BJ, Mayo JG, Shoemaker RH, Boyd MR: Feasibility of Drug Screening with Panels of Human Tumor Cell Lines Using a Microculture Tetrazolium Assay. Cancer Research 1988, 48:589–601.
294. Cloonan N, Forrest ARR, Kolle G, Gardiner BBA, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM: Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods 2008, 5:613–619. doi:10.1038/nmeth.1223.
295. Shah SH, Pallas JA: Identifying differential exon splicing using linear models and correlation coefficients. BMC Bioinformatics 2009, 10:26. doi:10.1186/1471-2105-10-26.
296. The UniProt Consortium: Ongoing and future developments at the Universal Protein Resource. Nucleic Acids Research 2010, 39:D214–D219. doi:10.1093/nar/gkq1020.
297. Robertson G, Schein J, Chiu R, Corbett R, Field M, Jackman SD, Mungall K, Lee S, Okada HM, Qian JQ, Griffith M, Raymond A, Thiessen N, Cezard T, Butterfield YS, Newsome R, Chan SK, She R, Varhol R, Kamoh B, Prabhu A-L, Tam A, Zhao Y, Moore RA, Hirst M, Marra MA, Jones SJM, Hoodless PA, Birol I: De novo assembly and analysis of RNA-seq data. Nat. Methods 2010, 7:909–912. doi:10.1038/nmeth.1517.
298. Hubbard TJP, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, Coates G, Fairley S, Fitzgerald S, Fernandez-Banet J, Gordon L, Graf S, Haider S, Hammond M, Holland R, Howe K, Jenkinson A, Johnson N, Kahari A, Keefe D, Keenan S, Kinsella R, Kokocinski F, Kulesha E, Lawson D, Longden I, et al.: Ensembl 2009. Nucleic Acids Research 2009, 37:D690–D697. doi:10.1093/nar/gkn828.
299. Irminger-Finger I, Jefford CE: Is there more to BARD1 than BRCA1? Nat. Rev. Cancer 2006, 6:382–391. doi:10.1038/nrc1878.
300. Shakya R, Szabolcs M, McCarthy E, Ospina E, Basso K, Nandula S, Murty V, Baer R, Ludwig T: The basal-like mammary carcinomas induced by Brca1 or Bard1 inactivation implicate the BRCA1/BARD1 heterodimer in tumor suppression. Proc. Natl. Acad. Sci. U.S.A. 2008, 105:7040–7045. doi:10.1073/pnas.0711032105.
301. Li L, Ryser S, Dizin E, Pils D, Krainer M, Jefford CE, Bertoni F, Zeillinger R, Irminger-Finger I: Oncogenic BARD1 isoforms expressed in gynecological cancers. Cancer Res. 2007, 67:11876–11885. doi:10.1158/0008-5472.CAN-07-2370.
302. Sporn JC, Hothorn T, Jung B: BARD1 expression predicts outcome in colon cancer. Clin. Cancer Res. 2011, 17:5451–5462. doi:10.1158/1078-0432.CCR-11-0263.
303. Zhang Y-Q, Bianco A, Malkinson AM, Leoni VP, Frau G, De Rosa N, André P-A, Versace R, Boulvain M, Laurent GJ, Atzori L, Irminger-Finger I: BARD1: An independent predictor of survival in non-small cell lung cancer. International Journal of Cancer. Journal International Du Cancer 2011. doi:10.1002/ijc.26346. Available: http://www.ncbi.nlm.nih.gov/pubmed/21815143. Accessed 24 January 2012.
304. Shang X, Burlingame SM, Okcu MF, Ge N, Russell HV, Egler RA, David RD, Vasudevan SA, Yang J, Nuchtern JG: Aurora A is a negative prognostic factor and a new therapeutic target in human neuroblastoma. Mol. Cancer Ther 2009, 8:2461–2469. doi:10.1158/1535-7163.MCT-08-0857.
305. Lens SMA, Voest EE, Medema RH: Shared and separate functions of polo-like kinases and aurora kinases in cancer. Nat Rev Cancer 2010, 10:825–841. doi:10.1038/nrc2964.
306. Westerhout E, Kool M, Molenaar J, Stroeken, den Boer M, Segers S, Clifford S, Delattre O, Benetkiewicz M, Lanvers C, Pieters R, Pietsch T, Holst M, Renshaw J, Shipley J, Serra M, Scotlandi K, Geoerger B, Va