"Medicine, Faculty of"@en . "Pathology and Laboratory Medicine, Department of"@en . "DSpace"@en . "UBCV"@en . "deLeeuw, Ronald John"@en . "2009-08-21T17:31:11Z"@en . "2009"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "Mantle cell lymphoma (MCL) is an aggressive non-Hodgkin’s lymphoma with a median patient survival time of 3 years. Although the characteristic t(11;14)(q13;q32) is found in virtually all cases, experimental evidence suggests that this event alone is insufficient to result in lymphoma and that secondary genomic alterations are required. Therefore, secondary genetic alterations have been proposed as essential in MCL pathogenesis. Within this thesis I describe the creation of a novel assay to determine segmental copy number alterations at an unprecedented resolution. This new assay necessitated the development of new analytical software to visualize and analyze the high density data sets created. The creation of this software is described in detail. With these tools in place, we assayed model genomes of MCL for recurrent segmental copy number alterations. These recurrent regions were defined; however, among these were copy number variations that appeared in both cases and controls. Investigation of these natural copy number variations in this thesis revealed that the human genome has a higher plasticity than previously appreciated. In fact, thousands of loci within the genome were found to be variable in copy number, which may influence sensory perception and possibly disease susceptibility. I next investigated the genomes of MCL tumor samples to determine which somatic copy number alterations are related to a poor clinical course. Among the numerous loci that showed frequent copy number alterations in MCL genomes, many were associated with poor patient outcome. Among these, the loss of 9p21 was a strong factor in determining the clinical course of patients with MCL (P=0.0004).
Three additional loci (4q13, 8q24, and 13q14) were combined with 9p21 to create a survival model that was highly predictive of patient outcome (P=5.87 x 10^-6). Interestingly, a previously uncharacterized locus (4q13) was within this survival model. Investigating this locus further revealed that the expression of two genes (CCNG2 and CCNI) influences the overall survival of patients with MCL (P=0.0292 and 0.0201, respectively)."@en . "https://circle.library.ubc.ca/rest/handle/2429/12469?expand=metadata"@en . "5784298 bytes"@en . "application/pdf"@en . "MANTLE CELL LYMPHOMA PATHOGENESIS
by
RONALD JOHN DELEEUW
B.Sc., the University of British Columbia, 2003
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
(Pathology and Laboratory Medicine)
THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)
August 2009
© Ronald John deLeeuw, 2009

Abstract
Mantle cell lymphoma (MCL) is an aggressive non-Hodgkin’s lymphoma with a median patient survival time of 3 years. Although the characteristic t(11;14)(q13;q32) is found in virtually all cases, experimental evidence suggests that this event alone is insufficient to result in lymphoma and that secondary genomic alterations are required. Therefore, secondary genetic alterations have been proposed as essential in MCL pathogenesis. Within this thesis I describe the creation of a novel assay to determine segmental copy number alterations at an unprecedented resolution. This new assay necessitated the development of new analytical software to visualize and analyze the high density data sets created. The creation of this software is described in detail. With these tools in place, we assayed model genomes of MCL for recurrent segmental copy number alterations. These recurrent regions were defined; however, among these were copy number variations that appeared in both cases and controls.
Investigation of these natural copy number variations in this thesis revealed that the human genome has a higher plasticity than previously appreciated. In fact, thousands of loci within the genome were found to be variable in copy number, which may influence sensory perception and possibly disease susceptibility. I next investigated the genomes of MCL tumor samples to determine which somatic copy number alterations are related to a poor clinical course. Among the numerous loci that showed frequent copy number alterations in MCL genomes, many were associated with poor patient outcome. Among these, the loss of 9p21 was a strong factor in determining the clinical course of patients with MCL (P=0.0004). Three additional loci (4q13, 8q24, and 13q14) were combined with 9p21 to create a survival model that was highly predictive of patient outcome (P=5.87 x 10^-6). Interestingly, a previously uncharacterized locus (4q13) was within this survival model. Investigating this locus further revealed that the expression of two genes (CCNG2 and CCNI) influences the overall survival of patients with MCL (P=0.0292 and 0.0201, respectively).

Table of Contents
Abstract
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
Co-Authorship Statement
Chapter 1: Introduction
1.1 Lymphoma
1.1.1. Definition
1.1.2. Classification
1.1.3. Etiology
1.1.4. Epidemiology
1.2 Mantle cell lymphoma
1.2.1. Classification
1.2.2. Clinical characteristics
1.2.3. Genetic characteristics
1.3 Genomic copy number alterations in cancer
1.3.1. Identification of copy number alterations in cancer
1.3.2. Evolution of techniques to measure copy number alterations in cancer
1.3.3. Natural copy number variation
1.3.4. Other genetic alterations in cancer
1.4 Thesis theme
1.5. Objectives and hypothesis
1.6. Specific aims and thesis outline
Aim 1: Development of tools for the comprehensive determination of copy number alterations in MCL.
Aim 2: Determination of recurrent copy number alterations in MCL model genomes.
Aim 3: Identification of natural copy number variation in the human genome.
Aim 4: Determination of copy number alterations in MCL that correlate to a poor overall survival.
1.7 References
Chapter 2: A tiling resolution DNA microarray with complete coverage of the human genome
2.1 Introduction
2.2 Results
2.2.1 Array sensitivity
2.2.2 Array resolution compared to conventional CGH
2.2.3 Comparison to previous array CGH
2.2.4 Identification of minute regions of alteration
2.3 Discussion
2.4 Methods
2.4.1 BAC clone selection, preparation and validation
2.4.2 Array production from BAC DNA
2.4.3 DNA labelling and hybridization
2.4.4 Array imaging and analysis
2.5 References
Chapter 3: SeeGH – A software tool for visualization of whole genome array comparative genomic hybridization data
3.1 Introduction
3.2 Software environment and information sources
3.3 Results and discussion:
3.3.1 Overview of data flow
3.3.2 Input requirements
3.3.3 Data filtering and storage
3.3.4 Data presentation
3.3.4.1 Genomic view
3.3.4.2 Chromosome view
3.3.4.3 Accessing previously entered data
3.4 Conclusions
3.5 Availability and requirements
3.6 References
Chapter 4: Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes
4.1 Introduction
4.2 Results
4.3 Discussion
4.4 Materials and methods
4.4.1 Cell lines, culture conditions and DNA extraction
4.4.2 Array CGH hybridization
4.4.3 Array CGH analysis
4.4.4 Expression microarray procedure
4.4.5 Fluorescent in-situ hybridization
4.5 References
Chapter 5: A comprehensive analysis of common copy-number variations in the human genome
5.1 Introduction
5.2 Material and methods
5.2.1 DNA samples
5.2.2 BAC array CGH analysis
5.2.3 CNV detection algorithm
5.2.4 Determination of false-positive and false-negative rates
5.2.5 CNV association
5.2.6 Duplication analysis
5.2.7 Clustering analysis
5.2.8 Sample diversity
5.2.9 Quantitative PCR
5.3 Results
5.3.1 Identification of CNVs
5.3.2 Genomic diversity within the sample population
5.3.3 CNV-associated genes
5.4 Discussion
5.5 References
Chapter 6: Deletion of 4q21 and low expression of CCNI and CCNG2 result in poor overall survival in mantle cell lymphoma
6.1 Introduction
6.2 Methods
6.3 Results
6.4 Discussion
6.5 References
Chapter 7: Conclusions
7.1 Summary
7.1.1 Development of tools to measure copy number alterations throughout the human genome
7.1.2 Measurement of MCL model genomes
7.1.3 Determination of natural copy number variations in the human genome
7.1.4 Correlation of somatic copy number alterations in MCL to clinical outcome
7.2 Concurrent studies
7.2.1 Concurrent developments in the detection of copy number alterations
7.2.2 MCL secondary genomic copy number assessment
7.2.3 Natural copy number variation in the human genome
7.2.4 MCL pathogenesis
7.3 Discussion and conclusions
7.4 Future directions
7.5 References
Appendices
Research ethics board certificate of approval
Biology of lymphoid cancers
Online supplementary material
Chapter 2: A tiling resolution DNA microarray with complete coverage of the human genome
Chapter 4: Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes
Chapter 5: A comprehensive analysis of common copy-number variations in the human genome

List of Tables
Table 4.1. Summary of genomic alterations recurring in a minimum of three MCL cell line models.
Table 5.1. Samples used in this study.
Table 5.2. Expected CNV patterns of eight hybridizations between four DNA samples.
Table 5.3. Sensory-related genes associated with CNVs.
Table 5.4. Select examples of CNVs associated with cancer-related genes.
Table 5.5. Select CNVs overlapping genes associated with diseases or disease susceptibility.
Table 5.6. MicroRNAs overlapping CNVs.
Table 6.1. Regions of somatic copy number alteration with greater than 10% recurrence.

List of Figures
Figure 2.1. Detection of two-fold copy number changes in TAT-1 lymphoma cell line on chromosome arms 8q and 18q.
Figure 2.2. Whole genome SMRT array CGH of lung cancer cell line H526.
Figure 2.3. Amplification of chromosome 8q24.12–.13 in colorectal cancer cell line COLO320.
Figure 2.4. Identification of a novel microamplification by tiling resolution array CGH in COLO320.
Figure 2.5. Identification of microdeletions.
Figure 3.1. Overall view of SeeGH data flow.
Figure 3.2. SeeGH “New Data” window.
Figure 3.3. SeeGH “Genomic View” window.
Figure 3.4. SeeGH “Search” window.
Figure 3.5. SeeGH “Chromosome View” window.
Figure 3.6. SeeGH “Existing Data” window.
Figure 4.1. Whole genome SMRT aCGH SeeGH karyogram of MCL cell line HBL-2 versus pooled normal male genomic DNA.
Figure 4.2. Summary of chromosomal imbalances detected by SMRT aCGH in 8 MCL cell line models.
Figure 4.3. Representation of genomic copy number alterations on chromosome arm 9p.
Figure 4.4. Representation of genomic copy number alterations on chromosome arm 12.
Figure 4.5. Locus specific fluorescent in-situ hybridization validation of genetic copy number alterations.
Figure 5.1. Example of a karyogram from a hybridization experiment in this study.
Figure 5.2. Detection of CNVs.
Figure 5.3. Distribution of overlapped CNVs at different recurrence levels.
Figure 5.4. Overlap of CNVs with segmental duplications (SD).
Figure 5.5. Cluster analysis by use of a CEPH pedigree.
Figure 5.6. Distribution of CNV clones.
Figure 5.7. Detection of immunoglobulin variations.
Figure 5.8. Inheritance of CNVs at five olfactory receptor loci in 14 members of a CEPH pedigree.
Figure 6.1. Survival model based on four copy number alterations: loss of 4q13.3-q21.33, gain of 8q24.21-q24.3, loss of 9p21, and loss of 13q14.2-q14.3.
Figure 6.2. CCNI and CCNG2 expression correlation to survival.

List of Abbreviations
Abbreviation  Definition
aCGH          Array comparative genomic hybridization
ASCII         American standard code for information interchange
BAC           Bacterial artificial chromosome
bp            Base pairs
CCD           Charge coupled device
CCND1         Gene encoding Cyclin D1 protein
cDNA          Complementary DNA
CEPH          Centre d'Etude du Polymorphisme Humain
CpG           Cytosine-phosphate-guanine dinucleotide
CGH           Comparative genomic hybridization
CNV           Natural copy number variation
DNA           Deoxyribonucleic acid
EBV           Epstein-Barr virus
FISH          Fluorescence in-situ hybridization
FPC           Fingerprint contig
GEO           NCBI gene expression omnibus
IPI           International prognostic index
kb            Kilobase pairs
LMPCR         Linker mediated polymerase chain reaction
MALT          Mucosa-associated lymphoid tissue
MAR           Minimally altered region
Mbp           Megabase pairs
MCL           Mantle cell lymphoma
MIM           Mendelian inheritance in man (followed by entry number)
mRNA          Messenger RNA
NCBI          National center for biotechnology information
NHL           Non-Hodgkin lymphoma
OMIM          Online Mendelian inheritance in man
PAC           P1-derived artificial chromosome
PCR           Polymerase chain reaction
RNA           Ribonucleic acid
SAGE          Serial analysis of gene expression
SDS           Sodium dodecyl sulfate
SMRT          Sub-megabase resolution tiling-set
SNP           Single nucleotide polymorphism
SNR           Signal to noise ratio
SQL           Structured query language
SSC           Saline-sodium citrate
UCSC          University of California Santa Cruz
UPD           Uniparental disomy
WGAC          Whole genome assembly comparison
WHO           World health organization
WSSD          Whole genome shotgun sequence detection
YAC           Yeast artificial chromosome

Acknowledgements
I would like to acknowledge my supervisory committee: Drs. Wan Lam (Supervisor), Carolyn Brown, Douglas Horsman, Joseph Connors, and Mladen Korbelik (Chair). In addition, I would like to acknowledge the many members of the Wan Lam Lab who contributed to this work, in particular the co-authors of each of the manuscript chapters presented herein. Specific acknowledgements from the published versions of each chapter are detailed below:
Chapter 2: The authors would like to thank the following for their contribution to this publication: J. Vielkind, S. Lam, D. Horsman, M. Rosin, S. Herst, K. Lonergan, S. Ralph, J. Rathmann, R. Seagraves, M. Krzywinski, P. Lansdorp, G. Bebb, J. Schein, I. Bosdet, D. Smailus, Z. Xu, C. Brown, J. Minna and A. Gazdar.
Chapter 3: The authors would like to thank Spencer Watson, Chad Malloff, and Adrian Ishkanian for their input on features to incorporate into SeeGH.
Chapter 4: The authors would like to thank Bradley P. Coe for assistance with aCGH data analysis, Baljit Kamoh and Spencer K. Watson for assistance with FISH analysis, and Ali Turhan and Catherine Tucker for providing cell lines.
Chapter 5: The authors would like to thank Media Farshchi and Wendy Peng for computational analysis, Andy Lam and Eric Lee for technical assistance, Sharon Gee for sample collection, the Lam Lab array CGH group for array production, Drs. Carlos Alvarez and Ford Doolittle and members of the Lam Lab for helpful discussions, and especially all sample donors.
Chapter 6: The authors would like to thank Spencer Watson and Miwa Suzuki for array construction and Lindsey Kimm for technical assistance.
I would also like to acknowledge scholarship support from the Natural Sciences and Engineering Research Council and the Michael Smith Foundation for Health Research. The research presented herein was funded by the following granting agencies: Genome Canada/Genome British Columbia (All Chapters), The Lymphoma Research Foundation (Chapters 4 and 6), Canadian Institutes of Health Research (Chapters 5 and 6), and National Institutes of Health (Chapter 5).

Dedication
To my family, who have always supported my educational efforts.

Co-Authorship Statement
Chapters 2 to 6 were co-authored as manuscripts for publication. The following author lists apply for each chapter:
Chapter 2: Ishkanian AS, Malloff CA, Watson SK, deLeeuw RJ, Chi B, Coe BP, Snijders A, Albertson DG, Pinkel D, Marra MA, Ling V, MacAulay C, Lam WL. (2004) A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet. 36(3):299-303.
Contribution: Before the development of this technology, no whole genome assay for segmental copy number existed. As such, a multicenter effort was conducted between the BC Cancer Research Centre, Canada’s Michael Smith Genome Sciences Centre, and the University of California San Francisco. I was a key contributor to this effort by developing the amplification technique used to create the solution spotted on the arrays, determining mechanical spotter settings for the most efficient construction of these high density arrays, and meticulously editing and rewriting the final published manuscript.
Chapter 3: Chi B, deLeeuw RJ, Coe BP, MacAulay C, Lam WL (2004). SeeGH – A software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 9(5):13.
Contribution: Mr.
Chi and I immediately recognized that the creation of these high density segmental copy number profiles with tens of thousands of data points required analysis tools that did not exist. We developed this intuitive software tool based around existing cytogenetic visualization techniques. While Mr. Chi was responsible for coding the program, I was responsible for content, algorithms, and features that were required to analyze the massive amounts of data created. Mr. Chi and I had equal responsibility in writing the final published manuscript. Chapter 4: de Leeuw RJ, Davies JD, Rosenwald A, Bebb G, Gascoyne RD, Dyer MJ, Staudt LM, Martinez-Climent JA, Lam WL. (2004) Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes. Hum Mol Genet 13:1827-1837 Contribution: I am the first and corresponding author for this study. No previous study had assayed the genomes of MCL at an adequate resolution to define micro deletions and amplifications. For this first look, I decided to utilize the few cell line models available. These were collected from both local and international collaborators. I conducted all experimental procedures and analyses related to this study. I wrote the final published manuscript with input from Mr. Davies. Chapter 5: Wong KK*, deLeeuw RJ*, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE, Lam WL. (2007) A Comprehensive Analysis of Common Copy-Number Variations in the Human Genome. Am J Hum Genet 80:91-104 Contribution: Dr. Wong and I quickly recognized that the segmental copy number alterations measured by me and others in the laboratory were not all somatic alterations in the cancer samples. We equally directed a team of local and international collaborators to define the extent and impact of these natural segmental copy number variations in the human genome. While Dr. 
Wong was responsible for directing the acquisition of sample copy number data, my responsibility was conducting all control experiments and determining appropriate analysis algorithms. This manuscript was equally written and co-authored by Dr. Wong and me. Chapter 6: deLeeuw RJ, Malloff CA, Johnson NA, Shen Y, Dyer MJ, Connors JM, Gascoyne RD, Chan WC, Horsman DE, Lam WL. (2009) Deletion of 4q21 and low expression of CCNI and CCNG2 result in poor overall survival in mantle cell lymphoma. Haematologica Submitted April 24, 2009 Contribution: I am the first and corresponding author for this study. While the copy number status of MCL genomes has been studied by a variety of platforms at different resolutions, none possesses the resolution of our assay. In addition, the correlation of these copy number alterations to patient outcome remained extremely poor. To address this, I collected a panel of MCL cases that are representative of the disease and had complete clinical data. I was responsible for all experimental procedures, analysis, and interpretation of results. I wrote the final published manuscript with input from Mr. Malloff. Chapter 1: Introduction 1.1 Lymphoma 1.1.1. Definition Lymphomas are clonal tumors of lymphoid origin. The cells of origin can be immature or mature B-cells, T-cells, or NK cells. One of the main roles of these cells is to fight infections. Infections are combated with two distinct mechanisms: innate and adaptive immunity. Innate immunity is an immediate response that relies on genetically hard-coded receptors which have evolved to recognize common infectious agents and disease patterns. Alternatively, the adaptive immune response relies on receptors that have been generated through genetic modification to provide recognition of a diverse number of non-self antigens. Cells from either system can develop into lymphoma. The development of lymphoma from cells of either system depends in part on their ability to proliferate at a high rate. 
During an infection, or chronic inflammation, cells of the immune system expand greatly in number. This increased proliferation provides the opportunity for genetic errors that lead to cancer. In addition, the genetic modification inherent to adaptive immunity provides further opportunity for lymphomagenesis. 1.1.2. Classification The classification of lymphomas has evolved along with our understanding of both the origins and molecular biology of each type. Currently, the World Health Organization Classification of Tumors of Haematopoietic and Lymphoid Tissues is used throughout the world to provide consistent criteria for the diagnoses of lymphomas (1). A major consideration in the classification of lymphomas is the stage of differentiation of the normal cell counterpart to the clonal tumor. For B-cell lymphomas this breaks into four general groupings: precursor B-cell neoplasms, pre-germinal centre neoplasms, germinal centre neoplasms, and post germinal centre neoplasms. Similarly, T-cell lymphomas fall into precursor and peripheral T-cell lymphomas, while NK-cell lymphomas are grouped with the peripheral T-cell lymphomas. The normal cell counterpart of the clonal tumor is determined in large part by the morphology and immunophenotype of the neoplasm. However, while some cell surface antigens are very characteristic of certain lymphomas, no single antigen is capable of defining a lymphoma (1). As such, multiple antigens are utilized to determine normal cell counterparts. In addition to the normal cell counterpart, molecular features are increasingly being utilized to define lymphomas. Among these features, genetic alterations play an important role. 1.1.3. Etiology The primary events that increase risk of developing lymphoma range from infectious agents, viral and bacterial, to environmental exposure. Viral infections can directly influence lymphoid cells or provide the basis of a chronic immune response. 
In the case of Epstein-Barr virus, it directly infects lymphoid cells and is involved in the pathogenesis of many B-cell lymphomas, as well as several NK/T-cell lymphomas. It is present in nearly 100% of endemic Burkitt lymphomas and 15-35% of sporadic and HIV-related cases (2). Another virus that directly affects lymphoid cells is human herpesvirus-8 (3). Alternatively, hepatitis C virus has been implicated in lymphoplasmacytic lymphoma, splenic marginal zone lymphoma, and diffuse large B-cell lymphoma (4, 5). Unlike Epstein-Barr virus and human herpesvirus-8, it does not directly infect B-cells, but appears to influence lymphoma development through a B-cell immune response. Bacterial infections can also play a significant role in the development of lymphomas. A prime example of this is extranodal marginal zone lymphoma of mucosal-associated lymphoid tissue (MALT lymphoma). Bacterial infections have been linked to the development of MALT lymphoma in multiple sites, such as Helicobacter pylori infection in gastric sites (6), Chlamydia psittaci infection in ocular sites (7), Campylobacter jejuni in small intestinal sites (8), and Borrelia burgdorferi in cutaneous sites (9). While these bacteria do not directly infect lymphoid cells, they stimulate a sustained immune response where continued proliferation provides the opportunity for genetic aberrations to occur, thus promoting lymphomagenesis. This is clearly demonstrated by the induction of remissions with antibiotic therapy. Environmental exposure also plays a role in the pathogenesis of lymphoma. An example includes an increased risk of non-Hodgkin lymphoma after chlordane-containing insecticide treatment for termites (10). In addition to exposure to toxins, certain food substances have been linked to lymphomagenesis. Such is the case for gluten and enteropathy-type T-cell lymphoma. 
An inappropriate response to gluten in the small intestine creates a chronic inflammatory response that sets the stage for the development of enteropathy-type T-cell lymphoma (11). 1.1.4. Epidemiology The Surveillance, Epidemiology and End Results program in the United States shows an incidence rate of 33.65 per 100,000 individuals for all lymphoid neoplasms, which accounts for approximately 4% of new cancer cases each year (12). Ninety percent of these lymphomas are of B-cell origin. In addition, the incidence of lymphomas, particularly B-cell lymphomas, is increasing (13). Diffuse large B-cell lymphoma constitutes the most frequently diagnosed lymphoma at 37%; however, this group is heterogeneous and can be further subdivided into subgroups (14, 15). Another frequently diagnosed lymphoma is follicular lymphoma at 29%, although the boundary between these two diseases is somewhat vague as follicular lymphoma can progress to become diffuse large B-cell lymphoma. There are three additional lymphoma classes that exceed 5% of newly diagnosed cases, including mantle cell lymphoma at 6%. Although cancer is generally considered a disease of the elderly, lymphomas are not restricted to the elderly. Different classifications of lymphoma can vary widely in age and gender distributions, such as acute lymphoblastic leukemia, which is primarily a disease of children, with 75% of cases occurring in children under the age of six (16). Other lymphomas show an unequal distribution of gender. One disease that encompasses both of these features is primary mediastinal B-cell lymphoma, which occurs in young adults (median ~35 years old) with a 2:1 female predominance (17, 18). Conversely, mantle cell lymphoma occurs in older individuals (median ~60 years old) with a 2:1 male predominance (19). 1.2 Mantle cell lymphoma 1.2.1. 
Classification Although mantle cell lymphoma (MCL) was originally classified as a centrocytic lymphoma in the 1970s (20), by the early 1990s the term MCL was proposed as the normal cell counterpart was identified (21). With the identification of frequent t(11;14)(q13;q32) translocations in MCL (22), cyclin D1 over-expression was found to be highly associated with MCL (23). This has now become a defining characteristic of the disease. In the small percentage of cases that lack the t(11;14) and cyclin D1 over-expression, translocations and over-expression of cyclin D2/D3 have been identified (24). A spectrum of morphologies is now recognized for MCL ranging from monomorphic lymphoid proliferation to blastoid and pleomorphic variants. Immunophenotyping is reasonably consistent with surface IgM/IgD, CD5, FMC-7, and CD43 positivity and CD10 and BCL6 negativity, although aberrant immunophenotypes have been described (25). These aberrant immunophenotypes are usually associated with blastoid/pleomorphic variants (26). 1.2.2. Clinical characteristics MCL accounts for approximately 6% of all lymphomas (1) and a higher proportion of lymphoma deaths (27). Patients are usually diagnosed with clinical stage III or IV disease with lymphadenopathy, hepatosplenomegaly and bone marrow involvement. Peripheral blood involvement is also common. While lymph nodes are the most common site of involvement, extranodal sites are frequently involved, including the gastrointestinal tract and Waldeyer ring (28). Mantle cell lymphoma has a median survival of approximately three years; however, a great deal of heterogeneity exists in overall survival (27). While some patients succumb to the disease in less than a year, others survive for more than 10 years. However, MCL is considered an incurable disease. The greatest influence on survival is the rate of proliferation of the malignant cells (29). This has been correlated with blastoid/pleomorphic morphology in several studies (30, 31). 
Other reported correlates of proliferation and overall survival remain inconsistent between studies. 1.2.3. Genetic characteristics As mentioned above, the t(11;14)(q13;q32) translocation is a defining characteristic of MCL. However, studies in mouse models that recapitulate this translocation have shown that although it is required for malignant transformation, it is insufficient alone to drive pathogenesis (32, 33). Therefore, additional genetic alterations have been proposed as essential in MCL pathogenesis. In support of this hypothesis is the identification of copy number alterations throughout the genome (30, 34). Early studies utilized standard cytogenetics to identify large regions of the genome that were either duplicated or deleted in copy number. With the development of comparative genomic hybridization in the early 1990s (35), a more refined picture of genetic alterations was revealed. However, the original technique still had relatively low resolution (~10Mb). This was improved by replacing the metaphase spreads used as a hybridization template with arrays of genomic sequences that represent loci within the genome. Initially, these arrays encompassed hundreds to a couple of thousand loci, improving resolution from 10Mb to a maximum of 1Mb (36-38). The work presented here includes the development of an assay that improves this resolution by an additional order of magnitude to 0.1Mb. Alterations that had been identified in MCL as of the beginning of the work presented here include gains of 3q, 6p, 7p, 8q, 9q, 12p, 12q, 15q, 16p, 17q, and 18q and losses of 1p, 5p, 6q, 8p, 9p, 10p, 11q, 13q, 14q, and 17p (30, 39-43). However, the presence, location, and frequency of these alterations varied widely between studies. This variation was likely due to the detection characteristics of the assays utilized, as well as the composition of each study cohort. 1.3 Genomic copy number alterations in cancer 1.3.1. 
Identification of copy number alterations in cancer The knowledge that cancer is a disease involving dynamic changes in the genome has been revealed over the last five decades. The first consistently reported genetic alteration in cancer was the reciprocal t(9;22) translocation associated with chronic myelogenous leukemia (44-46). As additional genetic alterations were catalogued over the following decades, it became clear that there were two distinct oncogenetic mechanisms at play: deletion/mutation of genes that act to inhibit the development of cancer (tumor suppressor genes), and gains/over-expression of genes that promote the development of cancer (oncogenes). 1.3.2. Evolution of techniques to measure copy number alterations in cancer Initial observations of human chromosomes date back as far as 1879 (47), although at the time the term chromosome had not yet been applied. The study of human genetics did not advance much beyond arguments of total chromosome number until the mid 1950s, when new cell culture techniques allowed the arrest of cultured cells in metaphase (48, 49). Within a decade, multiple syndromes were described that have their basis in aneuploidy (50-54). Also in this era there was the discovery of a cancer-related genetic alteration, the Philadelphia chromosome (44). The next major advancement in detecting chromosome aberrations came in the early 1970s with the development of Giemsa banding (55, 56). This allowed not only the identification of each chromosome pair, but the visualization of banding patterns within each chromosome. This was important in identifying intra-chromosomal deletions, duplications, and inversions. It also improved the resolution of gene mapping, such as the t(9;22) translocation in chronic myelogenous leukemia (46). The next major development in assessing the cancer genome came two decades later with the development of comparative genomic hybridization (CGH) (35). 
This technique allowed for the detection of copy number alterations without dividing tumor cells, but lacked the ability to detect balanced translocations and inversions. An alternate technique, spectral karyotyping, was developed to detect these translocations, also using a fluorescent labelling strategy (57-59). Another complementary technique for the detection of copy number, inversions and translocations at specific loci utilizes fluorescently labelled probes cloned in cosmid, YAC, PAC, and BAC vectors hybridized to preparations of tumor DNA (60, 61). While this technique (FISH) has a very high resolving power, it lacks the ability to assay more than a few loci at a time. In an attempt to combine the resolving power of FISH with the assessment of the entire genome provided by CGH, array CGH was developed (37, 38). This essentially replaced the normal target metaphase spreads on glass slides with discrete spots derived from specific genomic sequences. The earliest iterations of this technique provided an order of magnitude better resolution than conventional CGH (~1Mb). The resolution of this technique was improved by another order of magnitude (~0.1Mb) with our development of an array utilizing a set of overlapping clones that represent the entire human genome (62). The next advancement in resolution came with the replacement of clonally derived DNA segments for spotting with short synthetic sequences (63-65). Although initially limited to similar resolutions due to sample restrictions, oligonucleotide design, hybridization conditions, and array construction techniques, these oligonucleotide CGH arrays have very recently improved resolution of copy number alteration detection by another order of magnitude (~0.01Mb) (66). Of course, the ultimate goal of detecting genetic alterations in cancer is to assay the genome at the single base pair level. 
This advancement is ongoing; however, it is currently limited by sample throughput and cost considerations (Hayden EC online 6 Feb 09 doi:10.1038/news.2009.86). 1.3.3. Natural copy number variation The identification of microscopically visible regions of the genome that varied in size, morphology, and staining properties among apparently healthy individuals was first reported shortly after robust techniques to visualize human chromosomes were developed (67). With the improvement of techniques that measure genetic copy number to resolutions at or better than 0.1Mb, it was discovered that natural variation in the human genome exists at this genomic level as well (68, 69). In fact, it was proposed that up to 12% of the human genome could vary in copy number content (70). This discovery poses both an obstacle and an opportunity in the study of cancer genetics. The obstacle arises because very little is understood about these newly found natural copy number variations (CNVs) and what reference should be used as a normal comparator, whereas the opportunity exists to examine the potential impact of CNVs in cancer susceptibility, development, or progression. Recently, a number of reports have linked various CNVs to different cancers (71, 72). With the knowledge that CNVs are prevalent within the human genome, it is very important to be cautious when interpreting array CGH data. Within this thesis I have endeavoured to identify loci throughout the genome that are CNVs in order to better characterize their contribution to human genetics, as well as control for these in examining copy number alterations in mantle cell lymphoma. 1.3.4. Other genetic alterations in cancer It has become clear recently that genetic alterations other than copy number can play a role in the regulation of both tumor suppressor genes and oncogenes. 
These include single nucleotide polymorphisms (SNPs), acquired uni-parental disomy (UPD), and epigenetic changes (CpG methylation and histone modifications). SNPs are single base pairs that have been shown to vary between differing alleles of the same gene. Similar to array CGH studies of copy number, SNPs have only recently been comprehensively studied due to emerging array technology (73, 74). While some preliminary reports have shown a trend for certain SNPs to influence lymphomagenesis, further studies to identify functional SNPs are needed (75). Acquired UPD is a phenomenon where stretches of the genome are found to be homozygous (76). The interpretation of this is that these regions have experienced mitotic recombination or, if they are small, gene conversion. This may be important in cancer development as certain genes have alleles that promote carcinogenesis; therefore, if these alleles are preferentially retained in the gene conversion process, an increased risk of cancer exists. Although acquired UPD has been clearly demonstrated in hematopoietic malignancies, its overall impact has yet to be determined (77-79). Methylation and histone modifications differ in that they do not affect the sequence of DNA but only its accessibility for transcription. The most widely studied type of epigenetic modification is the addition of a methyl group to the cytosine of a CpG dinucleotide. These CpG dinucleotides are not distributed equally throughout the genome, but are enriched in clusters known as CpG islands. These islands are often associated with gene regulatory elements. Therefore, when they become methylated, the regulatory element becomes inaccessible to transcription initiation complexes and expression is reduced. Histone modification, including acetylation, methylation, phosphorylation, sumoylation, and ubiquitination, is similar to methylation in that it does not change the DNA sequence but affects its conformation, thus affecting the expression of genes. 
Similar to SNP/UPD studies, epigenetic studies have shown clear differential methylation patterns between lymphomas, but more study is required to determine their importance. 1.4 Thesis theme The theme of this thesis is the determination of genomic alterations that contribute to the pathogenesis of MCL. This is accomplished through the comprehensive detection of copy number alterations in MCL genomes and the correlation of these to disease outcome. 1.5. Objectives and hypothesis The objective of this work is to determine copy number alterations that influence MCL pathogenesis. The major objectives of this work are to demonstrate that: 1) Recurrent segmental DNA copy number changes are seen in MCL. 2) The recurrent changes encompass oncogenes and tumor suppressor genes that are important in MCL pathogenesis. 3) The combination of copy number alterations will show that specific biological pathways are essential in MCL pathogenesis. 1.6. Specific aims and thesis outline This thesis consists of several manuscripts in chronological order to best address the hypothesis of this thesis. Aim 1: Development of tools for the comprehensive determination of copy number alterations in MCL. Chapters 2 and 3 describe the development of tools to determine copy number changes throughout the human genome at an unprecedented level of resolution. Chapter 2 details the construction of a tiling resolution array consisting of 32,433 overlapping BAC clones covering the entire human genome. This array increased our ability to identify genetic copy number alterations in a single comparative genomic hybridization experiment. With the increased resolution of this assay we can identify amplifications containing oncogenes and deletions containing tumor suppressor genes in MCL genomes. At the initiation of this thesis there was no assay for the determination of copy number alterations at a resolution better than 1 Mb. 
The development of this assay represents an order of magnitude increase in the ability to detect and localize copy number alterations throughout the human genome. Chapter 3 details the development of a software tool to visualize and analyze the large amount of data created while conducting comprehensive copy number alteration assays. It became immediately apparent that the available software tools were insufficient to allow the analysis of tens of thousands of data points per experiment. For this reason, a new software tool was developed to both visualize and aid in the analysis of our copy number alteration assays. Aim 2: Determination of recurrent copy number alterations in MCL model genomes. Chapter 4 details the assessment of copy number changes in eight commonly used MCL cell lines. Examining these genomes identified an average of 35 genetic alterations per cell line with both amplifications and deletions being present. Moreover, alignment of these copy number profiles allowed the determination of 14 recurrent regions of genomic loss and 21 recurrent regions of genomic gain. These findings led us to believe that a study of clinical MCL samples would yield regions of copy number alteration that could then be correlated to the clinical features of MCL. One phenomenon that became apparent from our investigation of the MCL lines and assay controls was that small regions within the genome varied in copy number in both normal and tumor samples. This led us to undertake a study to determine a baseline for these natural copy number variations. Aim 3: Identification of natural copy number variation in the human genome. Chapter 5 details the assessment of 105 normal human genomes to determine the natural copy number variation. This was undertaken to control for this natural variation in copy number in our study of MCL genomes. We identified 3,654 autosomal loci that varied in copy number within normal human genomes. 
Of these, 800 occurred in at least 3% of our samples and provided the basis for our control of natural copy number variation in MCL genomes. During the timeframe of this project, other groups were investigating the same phenomena and published their findings at approximately the same time as ours. Aim 4: Determination of copy number alterations in MCL that correlate to a poor overall survival. Chapter 6 details the investigation of 52 MCL genomes for copy number alterations. The recurrent regions identified were then filtered to remove the natural copy number variations determined in aim 3. The remaining 42 regions were investigated to determine their contribution to the survival of MCL patients. Seven regions were found that correlated to overall survival and a survival model was constructed that used four of these regions. From this survival model, putative candidate genes were determined for each region and novel regions/genes identified. The expression of genes within MCL was investigated and the findings both reinforce the hypothesis that cell cycle deregulation and apoptosis are key factors in MCL pathogenesis and extend it to checkpoints in the cell cycle additional to G1-S. 1.7 References 1. Swerdlow S, Campo E, Harris N, Jaffe E, Pileri S, Stein H, et al. (eds). WHO classification of tumours of haematopoietic and lymphoid tissues. International Agency for Research on Cancer: Lyon, France, 2008. 2. Hamilton-Dutoit SJ, Raphael M, Audouin J, Diebold J, Lisse I, Pedersen C, et al. In situ demonstration of Epstein-Barr virus small RNAs (EBER 1) in acquired immunodeficiency syndrome-related lymphomas: correlation with tumor morphology and primary site. Blood 1993 Jul 15; 82(2): 619-624. 3. Cesarman E, Chang Y, Moore PS, Said JW, Knowles DM. Kaposi's sarcoma-associated herpesvirus-like DNA sequences in AIDS-related body-cavity-based lymphomas. N Engl J Med 1995 May 4; 332(18): 1186-1191. 4. Ascoli V, Lo Coco F, Artini M, Levrero M, Martelli M, Negro F. 
Extranodal lymphomas associated with hepatitis C virus infection. Am J Clin Pathol 1998 May; 109(5): 600-609. 5. de Sanjose S, Benavente Y, Vajdic CM, Engels EA, Morton LM, Bracci PM, et al. Hepatitis C and non-Hodgkin lymphoma among 4784 cases and 6269 controls from the International Lymphoma Epidemiology Consortium. Clin Gastroenterol Hepatol 2008 Apr; 6(4): 451-458. 6. Hussell T, Isaacson PG, Crabtree JE, Spencer J. The response of cells from low-grade B-cell gastric lymphomas of mucosa-associated lymphoid tissue to Helicobacter pylori. Lancet 1993 Sep 4; 342(8871): 571-574. 7. Ferreri AJ, Dognini GP, Campo E, Willemze R, Seymour JF, Bairey O, et al. Variations in clinical presentation, frequency of hemophagocytosis and clinical behavior of intravascular lymphoma diagnosed in different geographical regions. Haematologica 2007 Apr; 92(4): 486-492. 8. Lecuit M, Abachin E, Martin A, Poyart C, Pochart P, Suarez F, et al. Immunoproliferative small intestinal disease associated with Campylobacter jejuni. N Engl J Med 2004 Jan 15; 350(3): 239-248. 9. Cerroni L, Zochling N, Putz B, Kerl H. Infection by Borrelia burgdorferi and cutaneous B-cell lymphoma. J Cutan Pathol 1997 Sep; 24(8): 457-461. 10. Colt JS, Davis S, Severson RK, Lynch CF, Cozen W, Camann D, et al. Residential insecticide use and risk of non-Hodgkin's lymphoma. Cancer Epidemiol Biomarkers Prev 2006 Feb; 15(2): 251-257. 11. Van Overbeke L, Ectors N, Tack J. What is the role of celiac disease in enteropathy-type intestinal lymphoma? A retrospective study of nine cases. Acta Gastroenterol Belg 2005 Oct-Dec; 68(4): 419-423. 12. Morton LM, Wang SS, Devesa SS, Hartge P, Weisenburger DD, Linet MS. Lymphoma incidence patterns by WHO subtype in the United States, 1992-2001. Blood 2006 Jan 1; 107(1): 265-276. 13. Stewart B, Kleihues P (eds). World Cancer Report. International Agency for Research on Cancer: Lyon, France, 2003. 14. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. 
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000 Feb 3; 403(6769): 503-511. 15. Hans CP, Weisenburger DD, Greiner TC, Gascoyne RD, Delabie J, Ott G, et al. Confirmation of the molecular classification of diffuse large B-cell lymphoma by immunohistochemistry using a tissue microarray. Blood 2004 Jan 1; 103(1): 275-282. 16. Redaelli A, Laskin BL, Stephens JM, Botteman MF, Pashos CL. A systematic literature review of the clinical and epidemiological burden of acute lymphoblastic leukaemia (ALL). Eur J Cancer Care (Engl) 2005 Mar; 14(1): 53-62. 17. Cazals-Hatem D, Lepage E, Brice P, Ferrant A, d'Agay MF, Baumelou E, et al. Primary mediastinal large B-cell lymphoma. A clinicopathologic study of 141 cases compared with 916 nonmediastinal large B-cell lymphomas, a GELA (\"Groupe d'Etude des Lymphomes de l'Adulte\") study. Am J Surg Pathol 1996 Jul; 20(7): 877-888. 18. Savage KJ, Al-Rajhi N, Voss N, Paltiel C, Klasa R, Gascoyne RD, et al. Favorable outcome of primary mediastinal large B-cell lymphoma in a single institution: the British Columbia experience. Ann Oncol 2006 Jan; 17(1): 123-130. 19. Argatoff LH, Connors JM, Klasa RJ, Horsman DE, Gascoyne RD. Mantle cell lymphoma: a clinicopathologic study of 80 cases. Blood 1997 Mar 15; 89(6): 2067-2078. 20. Lennert K. Morphology and classification of malignant lymphomas and so-called reticuloses. Acta Neuropathol Suppl 1975; Suppl 6: 1-16. 21. Banks PM, Chan J, Cleary ML, Delsol G, De Wolf-Peeters C, Gatter K, et al. Mantle cell lymphoma. A proposal for unification of morphologic, immunologic, and molecular data. Am J Surg Pathol 1992 Jul; 16(7): 637-640. 22. Williams ME, Westermann CD, Swerdlow SH. Genotypic characterization of centrocytic lymphoma: frequent rearrangement of the chromosome 11 bcl-1 locus. Blood 1990 Oct 1; 76(7): 1387-1391. 23. Williams ME, Nichols GE, Swerdlow SH, Stoler MH. 
In situ hybridization detection of cyclin D1 mRNA in centrocytic/mantle cell lymphoma. Ann Oncol 1995 Mar; 6(3): 297-299. 24. Gesk S, Klapper W, Martin-Subero JI, Nagel I, Harder L, Fu K, et al. A chromosomal translocation in cyclin D1-negative/cyclin D2-positive mantle cell lymphoma fuses the CCND2 gene to the IGK locus. Blood 2006 Aug 1; 108(3): 1109-1110. 25. Campo E, Raffeld M, Jaffe ES. Mantle-cell lymphoma. Semin Hematol 1999 Apr; 36(2): 115-127. 26. Morice WG, Hodnefield JM, Kurtin PJ, Hanson CA. An unusual case of leukemic mantle cell lymphoma with a blastoid component showing loss of CD5 and aberrant expression of CD10. Am J Clin Pathol 2004 Jul; 122(1): 122-127. 27. Swerdlow SH, Williams ME. From centrocytic to mantle cell lymphoma: a clinicopathologic and molecular review of 3 decades. Hum Pathol 2002 Jan; 33(1): 7-20. 28. Salar A, Juanpere N, Bellosillo B, Domingo-Domenech E, Espinet B, Seoane A, et al. Gastrointestinal involvement in mantle cell lymphoma: a prospective clinic, endoscopic, and pathologic study. Am J Surg Pathol 2006 Oct; 30(10): 1274-1280. 29. Rosenwald A, Wright G, Wiestner A, Chan WC, Connors JM, Campo E, et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell 2003 Feb; 3(2): 185-197. 30. Bea S, Ribas M, Hernandez JM, Bosch F, Pinyol M, Hernandez L, et al. Increased number of chromosomal imbalances and high-level DNA amplifications in mantle cell lymphoma are associated with blastoid variants. Blood 1999 Jun 15; 93(12): 4365-4374. 31. Bosch F, Lopez-Guillermo A, Campo E, Ribera JM, Conde E, Piris MA, et al. Mantle cell lymphoma: presenting features, response to therapy, and prognostic factors. Cancer 1998 Feb 1; 82(3): 567-575. 32. Bodrug SE, Warner BJ, Bath ML, Lindeman GJ, Harris AW, Adams JM. Cyclin D1 transgene impedes lymphocyte maturation and collaborates in lymphomagenesis with the myc gene. 
Embo J 1994 May 1; 13(9): 2124-2130. 33. Lovec H, Grzeschiczek A, Kowalski MB, Moroy T. Cyclin D1/bcl-1 cooperates with myc genes in the generation of B-cell lymphoma in transgenic mice. Embo J 1994 Aug 1; 13(15): 3487-3495. 34. Seto M, Yamamoto K, Iida S, Akao Y, Utsumi KR, Kubonishi I, et al. Gene rearrangement and overexpression of PRAD1 in lymphoid malignancy with t(11;14)(q13;q32) translocation. Oncogene 1992 Jul; 7(7): 1401-1406. 35. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992 Oct 30; 258(5083): 818-821. 36. Kohlhammer H, Schwaenen C, Wessendorf S, Holzmann K, Kestler HA, Kienle D, et al. Genomic DNA-chip hybridization in t(11;14)-positive mantle cell lymphomas shows a high frequency of aberrations and allows a refined characterization of consensus regions. Blood 2004 Aug 1; 104(3): 795-801. 37. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998 Oct; 20(2): 207-211. 38. Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 1997 Dec; 20(4): 399-407. 39. Allen JE, Hough RE, Goepel JR, Bottomley S, Wilson GA, Alcock HE, et al. Identification of novel regions of amplification and deletion within mantle cell lymphoma DNA by comparative genomic hybridization. Br J Haematol 2002 Feb; 116(2): 291-298. 16 40. Au WY, Gascoyne RD, Viswanatha DS, Connors JM, Klasa RJ, Horsman DE. Cytogenetic analysis in mantle cell lymphoma: a review of 214 cases. Leuk Lymphoma 2002 Apr; 43(4): 783-791. 41. Martinez-Climent JA, Vizcarra E, Sanchez D, Blesa D, Marugan I, Benet I, et al. 
Loss of a novel tumor suppressor gene locus at chromosome 8p is associated with leukemic mantle cell lymphoma. Blood 2001 Dec 1; 98(12): 3479-3482. 42. Monni O, Oinonen R, Elonen E, Franssila K, Teerenhovi L, Joensuu H, et al. Gain of 3q and deletion of 11q22 are frequent aberrations in mantle cell lymphoma. Genes Chromosomes Cancer 1998 Apr; 21(4): 298-307. 43. Wlodarska I, Pittaluga S, Hagemeijer A, De Wolf-Peeters C, Van Den Berghe H. Secondary chromosome changes in mantle cell lymphoma. Haematologica 1999 Jul; 84(7): 594-599. 44. Nowell PC, Hungerford DA. Chromosome studies on normal and leukemic human leukocytes. J Natl Cancer Inst 1960 Jul; 25: 85-109. 45. Rowley JD. Chromosomal patterns in myelocytic leukemia. N Engl J Med 1973 Jul 26; 289(4): 220-221. 46. Rowley JD. Letter: A new consistent chromosomal abnormality in chronic myelogenous leukaemia identified by quinacrine fluorescence and Giemsa staining. Nature 1973 Jun 1; 243(5405): 290-293. 47. Arnold T. Beobachtungen uber Kentheilungen in den Zellen der Geschwulse. Virchow's Arch 1879; 78: 279-301. 48. Hsu TC. Tissue culture studies on human skin. III. Some cytological fractures of the outgrowth of epithelial cells. Tex Rep Biol Med 1952; 10(2): 336-352. 49. Makino S, Nishimura I. Water-pretreatment squash technic; a new and simple practical method for the chromosome study of animals. Stain Technol 1952 Jan; 27(1): 1-7. 50. Edwards JH, Harnden DG, Cameron AH, Crosse VM, Wolff OH. A new trisomic syndrome. Lancet 1960 Apr 9; 1(7128): 787-790. 51. Ford CE, Jones KW, Polani PE, De Almeida JC, Briggs JH. A sex-chromosome anomaly in a case of gonadal dysgenesis (Turner's syndrome). Lancet 1959 Apr 4; 1(7075): 711-713. 52. Jacobs PA, Strong JA. A case of human intersexuality having a possible XXY sex- determining mechanism. Nature 1959 Jan 31; 183(4657): 302-303. 53. Lejeune J, Gautier M, Turpin R. [Study of somatic chromosomes from 9 mongoloid children.]. 
C R Hebd Seances Acad Sci 1959 Mar 16; 248(11): 1721-1722. 54. Patau K, Smith DW, Therman E, Inhorn SL, Wagner HP. Multiple congenital anomaly caused by an extra autosome. Lancet 1960 Apr 9; 1(7128): 790-793. 55. Seabright M. A rapid banding technique for human chromosomes. Lancet 1971 Oct 30; 2(7731): 971-972. 17 56. Sumner AT, Evans HJ, Buckland RA. New technique for distinguishing between human chromosomes. Nat New Biol 1971 Jul 7; 232(27): 31-32. 57. Pinkel D, Landegent J, Collins C, Fuscoe J, Segraves R, Lucas J, et al. Fluorescence in situ hybridization with human chromosome-specific libraries: detection of trisomy 21 and translocations of chromosome 4. Proc Natl Acad Sci U S A 1988 Dec; 85(23): 9138-9142. 58. Schrock E, du Manoir S, Veldman T, Schoell B, Wienberg J, Ferguson-Smith MA, et al. Multicolor spectral karyotyping of human chromosomes. Science 1996 Jul 26; 273(5274): 494- 497. 59. Telenius H, Pelmear AH, Tunnacliffe A, Carter NP, Behmel A, Ferguson-Smith MA, et al. Cytogenetic analysis by chromosome painting using DOP-PCR amplified flow-sorted chromosomes. Genes Chromosomes Cancer 1992 Apr; 4(3): 257-263. 60. Dauwerse JG, Kievits T, Beverstock GC, van der Keur D, Smit E, Wessels HW, et al. Rapid detection of chromosome 16 inversion in acute nonlymphocytic leukemia, subtype M4: regional localization of the breakpoint in 16p. Cytogenet Cell Genet 1990; 53(2-3): 126-128. 61. Jaju RJ, Boultwood J, Oliver FJ, Kostrzewa M, Fidler C, Parker N, et al. Molecular cytogenetic delineation of the critical deleted region in the 5q- syndrome. Genes Chromosomes Cancer 1998 Jul; 22(3): 251-256. 62. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303. 63. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. 
Proc Natl Acad Sci U S A 2004 Dec 21; 101(51): 17765-17770. 64. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, et al. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res 2004 Feb; 14(2): 287-295. 65. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res 2004 May 1; 64(9): 3060-3071. 66. Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, Nathanson K, et al. A comparison of DNA copy number profiling platforms. Cancer Res 2007 Nov 1; 67(21): 10173-10180. 67. de la Chapelle A, Schroder J, Stenstrand K, Fellman J, Herva R, Saarni M, et al. Pericentric inversions of human chromosomes 9 and 10. Am J Hum Genet 1974 Nov; 26(6): 746-766. 68. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet 2004 Sep; 36(9): 949-951. 69. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science 2004 Jul 23; 305(5683): 525-528. 18 70. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature 2006 Nov 23; 444(7118): 444-454. 71. Liu W, Sun J, Li G, Zhu Y, Zhang S, Kim ST, et al. Association of a germ-line copy number variation at 2p24.3 and risk for aggressive prostate cancer. Cancer Res 2009 Mar 15; 69(6): 2176-2179. 72. Tchatchou S, Burwinkel B. Chromosome copy number variation and breast cancer risk. Cytogenet Genome Res 2008; 123(1-4): 183-187. 73. Dong S, Wang E, Hsie L, Cao Y, Chen X, Gingeras TR. Flexible use of high-density oligonucleotide arrays for single-nucleotide polymorphism discovery and validation. Genome Res 2001 Aug; 11(8): 1418-1424. 74. Shen R, Fan JB, Campbell D, Chang W, Chen J, Doucet D, et al. High-throughput SNP genotyping on universal bead arrays. 
Mutat Res 2005 Jun 3; 573(1-2): 70-82. 75. Morton LM, Purdue MP, Zheng T, Wang SS, Armstrong B, Zhang Y, et al. Risk of non- Hodgkin lymphoma associated with germline variation in genes that regulate the cell cycle, apoptosis, and lymphocyte development. Cancer Epidemiol Biomarkers Prev 2009 Apr; 18(4): 1259-1270. 76. Teh MT, Blaydon D, Chaplin T, Foot NJ, Skoulakis S, Raghavan M, et al. Genomewide single nucleotide polymorphism microarray mapping in basal cell carcinomas unveils uniparental disomy as a key somatic event. Cancer Res 2005 Oct 1; 65(19): 8597-8603. 77. Fitzgibbon J, Smith LL, Raghavan M, Smith ML, Debernardi S, Skoulakis S, et al. Association between acquired uniparental disomy and homozygous gene mutation in acute myeloid leukemias. Cancer Res 2005 Oct 15; 65(20): 9152-9154. 78. Flotho C, Kratz CP, Bergstrasser E, Hasle H, Stary J, Trebo M, et al. Genotype- phenotype correlation in cases of juvenile myelomonocytic leukemia with clonal RAS mutations. Blood 2008 Jan 15; 111(2): 966-967; author reply 967-968. 79. Pfeifer D, Pantic M, Skatulla I, Rawluk J, Kreutz C, Martens UM, et al. Genome-wide analysis of DNA copy number changes and LOH in CLL using high-density SNP arrays. Blood 2007 Feb 1; 109(3): 1202-1210. 19 Chapter 2: A tiling resolution DNA microarray with complete coverage of the human genome* * A version of this chapter has been published: Adrian S Ishkanian, Chad A Malloff, Spencer K Watson, deLeeuw Ronald J, Chi Bryan, Coe Bradley P, Snijders Antoine, Albertson Donna G, Pinkel Daniel, Marra Marco A, Ling Victor, MacAulay Calum, and Lam Wan L. (2004) A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet. 36(3):299-303. 20 2.1 Introduction Identification of chromosomal imbalances and variation in DNA copy number is essential to our understanding of disease mechanisms and pathogenesis. 
Array CGH (1) or matrix CGH (2) offers the highest resolution for practical genome-wide detection of chromosomal alterations. This technique is derived from conventional CGH (3), which has contributed greatly to the molecular characterization of both somatic and constitutional genomic DNA mutations over the last decade (4-6). The primary limitation of conventional CGH is its resolution (~20 Mb), because the method detects segmental copy number changes on metaphase chromosomes (3). In array CGH, the metaphase chromosome spread is replaced by BACs, PACs, or YACs containing human DNA as targets, increasing the resolution to the distance between the selected marker DNA clones (1, 2). Genome screening using array CGH has great potential in the characterization of numerous chromosomal disorders. Earlier efforts to construct DNA arrays spanning the human genome spotted 2460 (7) or 3500 (8) marker BAC clones, representing the sequenced genome at an average interval of approximately 1 Mb. These studies showed that sufficient target-DNA printing solution could be generated from individual BACs using PCR-based protocols. Because the target product is PCR derived, it is easily replenished, obviating the need for multiple rounds of laborious large-scale BAC DNA preparation. These arrays are sensitive enough to detect single copy changes; their resolution is limited by the small number of BAC markers representing the genome on the slide rather than by the methodology. Even at this resolution, array CGH has proved useful for detecting chromosomal aberrations associated with congenital abnormalities and somatic malignancies (9-12). Recent studies have focused on higher-density regional arrays for fine mapping and identifying new genes in specific chromosomal regions (13-18). For example, a candidate oncogene associated with breast cancer (CYP24) was identified on 20q13.2 using an array of 29 overlapping clones spanning this region (13).
The need for a tiling resolution array to map these amplification or deletion boundaries is indicated by the fact that two separate regions of amplification within 20q13.2 contained two separate putative oncogenes, which would not have been detected by a lower resolution array. These studies show that the resolving power of array CGH is maximized when the detection of single copy number changes is combined with a tiling or overlapping set of BAC clones. We created the first tiling resolution BAC array with complete coverage of the human genome using 32,433 fingerprint-verified, individually amplified BAC clones. Here we show that such a complete genome comparison is capable of identifying micro-amplifications and micro-deletions, which may contain genes involved in disease pathogenesis. We call this the Sub-Megabase Resolution Tiling-set for array CGH (SMRT array).

2.2 Results
2.2.1 Array sensitivity
To assess the sensitivity of the SMRT array, we hybridized the well characterized EBV-transformed lymphoma cell line TAT-1 to normal male genomic DNA. Genomic regions containing BCL2 (18q21) and MYC (8q24) in TAT-1 were previously shown to have a twofold copy-number increase by FISH analysis (19). We detected these previously reported amplifications at both loci, and we delineated their boundaries (Figure 2.1). Boundaries of amplification on chromosome 8 were between BAC clone RP11-143H8 at 8q22.2 and RP11-263C20 at 8q24.13. Boundaries of amplification on chromosome 18 were between BAC clone RP11-159K14 at 18q21.32 and RP11-565D23 at 18q23. These data illustrate the detection sensitivity of array CGH.

2.2.2 Array resolution compared to conventional CGH
To demonstrate the resolving power of the SMRT array, we compared the log2 ratio profile of lung cancer cell line H526 (20, 21) (Figure 2.2a) to the previously published conventional chromosomal CGH data (http://amba.charite.de/~ksch/cghdatabase/index.htm).
All patterns of gains and losses were matched, including large changes (e.g. the amplification of 7q and 8q and loss of the entire chromosome 10), as well as complex changes (e.g. the multiple amplifications on chromosome 1 and the multiple deletions on chromosome 4). Notably, conventional chromosomal CGH identified a highly amplified region on the telomeric end of chromosome arm 2p, apparently covering approximately one fourth of the whole chromosome. However, the SMRT array analysis showed this amplification to be precisely localized to a 1.3 Mb fragment at 2p24.3, bordered by BAC clones RP11-351F4 and RP11-701O10, which contains the MYCN oncogene. The resolving power of this whole genome array enables us to define breakpoints to within single BAC clones. For example, the deletion breakpoint on chromosome arm 3p was localized to between BAC clones RP11-632O5 and RP11-594F16 at 3p21.1 (Figure 2.2b). This finding was subsequently confirmed by FISH analysis (Figure 2.2c).

2.2.3 Comparison to previous array CGH
To compare our tiling resolution array against current array CGH technology, we profiled colorectal cancer cell line COLO320 (22), which has been characterized in two previous array CGH studies (7, 22). We confirmed the amplification at the 8q24 MYC region identified by these studies. Furthermore, the SMRT array refined this segmental copy number increase to a 1.9 Mb region bordered by BAC clones RP11-810D23 and RP11-294P7 (Figure 2.3). A detailed analysis of our COLO320 profile identified new micro-amplifications on chromosome arms 13q, 15q, 16p, and 22q (Supplemental Figure 1), which were not detected by the two prior high resolution CGH studies (7, 22). For example, we identified a 300 Kb micro-amplification at 13q12.2 containing only three genes (according to UCSC Genome Browser April 2003 Freeze): caudal type homeobox transcription factor 2 (CDX2), insulin promoter factor 1 (IPF-1) and GS homeobox 1 (GSH1) (Figure 2.4a).
CDX2 is a transcription factor expressed in the intestine and altered in colorectal cancers (23). FISH analysis verified this micro-amplification and showed that it was within a homogeneously staining region (Figure 2.4b). These findings illustrate the usefulness of a tiling resolution BAC array for comprehensive assessment of genomic integrity.

2.2.4 Identification of minute regions of alteration
In addition to micro-amplifications, we also detected small deletions in a number of tumor cell lines. For example, we detected a 1.25 Mb deletion at 9p21.3 containing the gene CDKN2A (also called p16) in lymphoma cell line Z138C (Figure 2.5a). Deletion of CDKN2A occurs in approximately one-half of mantle cell lymphoma tumors as detected by FISH (24). This deletion is bordered by RP11-328C2 and RP11-275H17 (Figure 2.5a). Sub-megabase micro-deletions can be accurately mapped in a single whole genome array CGH experiment. This is made possible by the overlapping clone coverage and the distribution of the clones on the array. A notable example is a 240 Kb deletion at 7q22.3 in the breast cancer cell line BT474, containing PRKAR2B, which encodes a regulatory subunit of protein kinase A, and HBP1, a transcriptional repressor and G1 inhibitor regulated by the p38 MAP kinase pathway (25) (Figure 2.5b). Such micro-deletions have not been reported previously. The mechanism(s) by which such deletions arise are not known. Whether this micro-deletion affects the expression of PRKAR2B or the neighbouring gene, PIK3CG, remains to be determined. The two examples described here show how small, previously unidentified alterations that have the potential to contribute to disease may easily be identified in a single SMRT array experiment.

2.3 Discussion
Array CGH is a proven method for accurate, robust and rapid genome-wide assessment of DNA copy number variation. Current users of array CGH technology consider BAC DNA markers positioned at 1\u20132 Mb intervals to be \u201Chigh-resolution\u201D coverage.
This view has been perpetuated by conventional whole genome analysis tools, such as microsatellite marker analysis of loss of heterozygosity, in which small interspaced \u201Csequence tagged sites\u201D are assayed for genomic imbalance, and the genomic integrity between these sites must be inferred. In contrast, tiling resolution array CGH has the potential to identify minute genomic changes. In this study, we constructed a Sub-Megabase Resolution Tiling-set for array CGH (SMRT array), comprising 32,433 overlapping BAC clones covering the entire human genome. This tiling resolution, combined with the proven sensitivity of array CGH, makes the technique ideal for identifying new genes and will prove useful for unraveling the genetic basis of numerous diseases.

2.4 Methods
2.4.1 BAC clone selection, preparation and validation
The selection and map positions of the 32,433 clones have been described previously and are available from The Children\u2019s Hospital Oakland Research Institute (http://bacpac.chori.org/genomicRearrays.php). We validated clone identity by comparing HindIII fingerprints to the FPC BAC fingerprint database (26) (http://genome.wustl.edu/projects/human/index.php?fpc=1). These clones provide ~1.5 fold coverage of the human genome, giving an approximate resolution of 80 Kb (i.e., 2/3 of an average BAC clone).

2.4.2 Array production from BAC DNA
We prepared the DNA samples to be spotted on the array by PCR using linkers (primer sequences available upon request). The protocol for linker-mediated PCR was previously described (27). We precipitated the PCR products with ethanol, redissolved them in an MSP printing solution (Telechem), denatured them by boiling and re-arrayed them for robotic printing in triplicate using a VersArray ChipWriter Pro (BioRad).
This arrayer uses a 12 x 4 array of SMP2.5 Stealth Micro Spotting Pins (Telechem/ArrayIT), depositing DNA spots of 0.8 nl at ~1 \u03BCg/\u03BCl at 133 micrometer distances. We spotted the entire set of 32,433 solutions in triplicate onto two aldehyde-coated slides. Limited numbers of SMRT arrays are available on a cost recovery basis.

2.4.3 DNA labelling and hybridization
We labeled 400 ng of test and reference DNA separately using Cyanine-3 and Cyanine-5 dCTPs according to a random priming protocol previously described (15). Before hybridization, we combined the DNA probes and purified them using ProbeQuant Sephadex G-50 Columns (Amersham) to remove unincorporated nucleotides. We then added 200 \u03BCg human Cot-1 DNA (Invitrogen), precipitated the mixture and re-suspended it in 100 \u03BCl DIG Easy hybridization solution (Roche) containing sheared herring sperm DNA (Sigma-Aldrich) and yeast tRNA (Calbiochem). The probe was denatured at 85 \u00B0C for 10 minutes and repetitive sequences were blocked at 45 \u00B0C for 1 hr before hybridization. We carried out prehybridization in the same buffer. We applied the probe mixture to the slide surface, fixed the coverslips and incubated the slides at 42 \u00B0C for 36 hours. We washed the arrays five times for 5 min each in 0.1X saline sodium citrate, 0.1% SDS at room temperature with agitation. We then rinsed each array repeatedly in 0.1X saline sodium citrate and dried it by centrifugation.

2.4.4 Array imaging and analysis
We imaged hybridized slides using a CCD-based imaging system (Arrayworx eAuto, Applied Precision) and analyzed them with SoftWoRx Tracker Spot Analysis software. We averaged the ratios of the triplicate spots and calculated standard deviations (SD). All spots with SDs >0.075 or signal to noise ratios <20 were removed from the analysis. We used custom viewing software (SeeGH) to visualize all data as log2 ratio plots where each dot represents one BAC. This software is available upon request.
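The replicate averaging and quality filter described in section 2.4.4 can be sketched as follows. This is a minimal illustration in Python, not the analysis code used in this study. The function name summarize_clone, the example values, the use of the sample standard deviation, and the choice to reject a clone when any replicate falls below the signal to noise threshold are all assumptions. Only the two cut-offs (SD > 0.075, signal to noise ratio < 20) are taken from the text.

```python
from statistics import mean, stdev

# Thresholds quoted in section 2.4.4.
MAX_SD = 0.075
MIN_SNR = 20.0


def summarize_clone(ratios, snrs):
    '''Collapse replicate spots for one clone into (mean ratio, SD).

    Returns None when the clone fails the quality filter.
    ratios: replicate ratio values for the clone.
    snrs: signal to noise ratio of each replicate spot.
    Assumption: a clone is rejected if ANY replicate is below MIN_SNR.
    '''
    if len(ratios) < 2 or len(ratios) != len(snrs):
        return None
    sd = stdev(ratios)  # sample standard deviation (assumption)
    if sd > MAX_SD or min(snrs) < MIN_SNR:
        return None
    return mean(ratios), sd


# Hypothetical triplicate measurements keyed by clone name (made-up values).
spots = {
    'clone_A': ([0.02, 0.05, -0.01], [45.0, 52.0, 48.0]),  # consistent, kept
    'clone_B': ([0.10, 0.45, -0.20], [40.0, 41.0, 39.0]),  # SD too large
    'clone_C': ([0.50, 0.52, 0.49], [12.0, 30.0, 28.0]),   # one low-SNR spot
}

kept = {}
for name, (ratios, snrs) in spots.items():
    summary = summarize_clone(ratios, snrs)
    if summary is not None:
        kept[name] = summary

print(sorted(kept))
```

In this sketch only clone_A survives: clone_B is rejected because its replicates disagree, and clone_C because one of its spots is too faint to trust.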
Reference male versus reference female hybridization detected no unexpected gains or losses, and random variability of log2 ratios was not observed (Supplementary Figure 2). Furthermore, owing to overlapping clone coverage, a single clone with an aberrant signal ratio would not be considered an amplification or deletion. Finally, since the clones are not spotted in the order of their map position, adjacent clones are distributed throughout our array.

Figure 2.1. Detection of two-fold copy number changes in TAT-1 lymphoma cell line on chromosome arms 8q and 18q. (a) Chromosome view of 8q showing MYC amplification between BAC clones RP11-143H8 and RP11-263C20. (b) Chromosome view of 18q showing BCL2 amplification between BAC clones RP11-159K14 and RP11-565D23. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.

Figure 2.2. Whole genome SMRT array CGH of lung cancer cell line H526. (a) Whole genome view of H526 versus reference male DNA. (b) Enlarged view of the deletion breakpoint at 3p21.1 between BAC clones RP11-632O5 and RP11-594F16 also seen in (a). Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively. (c) FISH confirmation of the breakpoint in (b) showing single copy loss of BAC clone RP11-594F16 (green) and normal copy number of BAC clone RP11-632O5 (red).

Figure 2.3. Amplification of chromosome 8q24.12\u2013.13 in colorectal cancer cell line COLO320. This 1.9 Mb amplification containing MYC is bounded by BAC clones RP11-810D23 and RP11-294P7. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.

Figure 2.4. Identification of a novel microamplification by tiling resolution array CGH in COLO320. (a) 300 Kb micro-amplification on chromosome 13q12.2 containing genes GSH1, CDX2, and IPF-1 and bounded by BAC clones RP11-153M24 and RP11-152N3.
Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively. (b) High copy number amplification of RP11-153M24 detected by FISH. The amplification is located within a homogeneously staining region.

Figure 2.5. Identification of microdeletions. (a) A 1.25 Mb deletion at 9p21.3 containing CDKN2A in a mantle cell lymphoma cell line, bounded by BAC clones RP11-328C2 and RP11-275H17. (b) A 240 Kb deletion at 7q22.3 in breast cancer cell line BT474 containing PRKAR2B and HBP1, bounded by BAC clones RP11-258L19 and RP11-262G16. Vertical green and red lines are scale bars indicating log2 ratios of +0.5 and -0.5, respectively.

2.5 References
1. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998 Oct; 20(2): 207-211.
2. Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 1997 Dec; 20(4): 399-407.
3. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992 Oct 30; 258(5083): 818-821.
4. Forozan F, Karhu R, Kononen J, Kallioniemi A, Kallioniemi OP. Genome screening by comparative genomic hybridization. Trends Genet 1997 Oct; 13(10): 405-409.
5. Knuutila S, Autio K, Aalto Y. Online access to CGH data of DNA sequence copy number changes. Am J Pathol 2000 Aug; 157(2): 689.
6. Wells D, Levy B. Cytogenetics in reproductive medicine: the contribution of comparative genomic hybridization (CGH). Bioessays 2003 Mar; 25(3): 289-300.
7. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. Assembly of microarrays for genome-wide measurement of DNA copy number.
Nat Genet 2001 Nov; 29(3): 263-264.
8. Fiegler H, Carr P, Douglas EJ, Burford DC, Hunt S, Scott CE, et al. DNA microarrays for comparative genomic hybridization based on DOP-PCR amplification of BAC and PAC clones. Genes Chromosomes Cancer 2003 Apr; 36(4): 361-374.
9. Kraus J, Pantel K, Pinkel D, Albertson DG, Speicher MR. High-resolution genomic profiling of occult micrometastatic tumor cells. Genes Chromosomes Cancer 2003 Feb; 36(2): 159-166.
10. Veltman JA, Fridlyand J, Pejavar S, Olshen AB, Korkola JE, DeVries S, et al. Array-based comparative genomic hybridization for genome-wide screening of DNA copy number in bladder tumors. Cancer Res 2003 Jun 1; 63(11): 2872-2880.
11. Veltman JA, Schoenmakers EF, Eussen BH, Janssen I, Merkx G, van Cleef B, et al. High-throughput analysis of subtelomeric chromosome rearrangements by use of array-based comparative genomic hybridization. Am J Hum Genet 2002 May; 70(5): 1269-1276.
12. Weiss MM, Kuipers EJ, Postma C, Snijders AM, Siccama I, Pinkel D, et al. Genomic profiling of gastric cancer predicts lymph node status and survival. Oncogene 2003 Mar 27; 22(12): 1872-1879.
13. Albertson DG, Ylstra B, Segraves R, Collins C, Dairkee SH, Kowbel D, et al. Quantitative mapping of amplicon structure by array CGH identifies CYP24 as a candidate oncogene. Nat Genet 2000 Jun; 25(2): 144-146.
14. Bruder CE, Hirvela C, Tapia-Paez I, Fransson I, Segraves R, Hamilton G, et al. High resolution deletion analysis of constitutional DNA from neurofibromatosis type 2 (NF2) patients using microarray-CGH. Hum Mol Genet 2001 Feb 1; 10(3): 271-282.
15. Garnis C, Baldwin C, Zhang L, Rosin MP, Lam WL. Use of complete coverage array comparative genomic hybridization to define copy number alterations on chromosome 3p in oral squamous cell carcinomas. Cancer Res 2003 Dec 15; 63(24): 8582-8585.
16. Garnis C, Campbell J, Zhang L, Rosin MP, Lam WL. OCGR array: an oral cancer genomic regional array for comparative genomic hybridization analysis. Oral Oncol 2004 May; 40(5): 511-519.
17. Garnis C, Coe BP, Ishkanian A, Zhang L, Rosin MP, Lam WL. Novel regions of amplification on 8q distinct from the MYC locus and frequently altered in oral dysplasia and cancer. Genes Chromosomes Cancer 2004 Jan; 39(1): 93-98.
18. Wilhelm M, Veltman JA, Olshen AB, Jain AN, Moore DH, Presti JC, Jr., et al. Array-based comparative genomic hybridization for the differential diagnosis of renal cell cancer. Cancer Res 2002 Feb 15; 62(4): 957-960.
19. Denyssevych T, Lestou VS, Knesevich S, Robichaud M, Salski C, Tan R, et al. Establishment and comprehensive analysis of a new human transformed follicular lymphoma B cell line, Tat-1. Leukemia 2002 Feb; 16(2): 276-283.
20. Girard L, Zochbauer-Muller S, Virmani AK, Gazdar AF, Minna JD. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res 2000 Sep 1; 60(17): 4894-4906.
21. Levin NA, Brzoska P, Gupta N, Minna JD, Gray JW, Christman MF. Identification of frequent novel genetic alterations in small cell lung carcinoma. Cancer Res 1994 Oct 1; 54(19): 5086-5091.
22. Wessendorf S, Fritz B, Wrobel G, Nessling M, Lampel S, Goettel D, et al. Automated screening for genomic imbalances using matrix-based comparative genomic hybridization. Lab Invest 2002 Jan; 82(1): 47-60.
23. Kim S, Domon-Dell C, Wang Q, Chung DH, Di Cristofano A, Pandolfi PP, et al. PTEN and TNF-alpha regulation of the intestinal-specific Cdx-2 homeobox gene through a PI3K, PKB/Akt, and NF-kappaB-dependent pathway. Gastroenterology 2002 Oct; 123(4): 1163-1178.
24. Dreyling MH, Bullinger L, Ott G, Stilgenbauer S, Muller-Hermelink HK, Bentz M, et al. Alterations of the cyclin D1/p16-pRB pathway in mantle cell lymphoma. Cancer Res 1997 Oct 15; 57(20): 4608-4614.
25. Xiu M, Kim J, Sampson E, Huang CY, Davis RJ, Paulson KE, et al.
The transcriptional repressor HBP1 is a target of the p38 mitogen-activated protein kinase pathway in cell cycle regulation. Mol Cell Biol 2003 Dec; 23(23): 8890-8901.
26. McPherson JD, Marra M, Hillier L, Waterston RH, Chinwalla A, Wallis J, et al. A physical map of the human genome. Nature 2001 Feb 15; 409(6822): 934-941.
27. Watson SK, deLeeuw RJ, Ishkanian AS, Malloff CA, Lam WL. Methods for high throughput validation of amplified fragment pools of BAC DNA for constructing high resolution CGH arrays. BMC Genomics 2004 Jan 14; 5(1): 6.

Chapter 3: SeeGH \u2013 A software tool for visualization of whole genome array comparative genomic hybridization data\u2020

\u2020 A version of this chapter has been published: Bryan Chi, Ronald J deLeeuw, Bradley P Coe, Calum MacAulay, and Wan L Lam (2004). SeeGH \u2013 A software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 9(5):13.

3.1 Introduction
Metaphase comparative genomic hybridization (CGH) is a molecular cytogenetic technique used to detect segmental DNA copy number differences between two samples of DNA (1). This is accomplished by a competitive hybridization of two differentially labelled samples to normal metaphase chromosomes, allowing the detection of single copy number changes at a resolution of 10-20 Mb (1). Array CGH improves on the resolution of copy number profiling by using discrete genomic loci spotted onto glass microscope slides, rather than metaphase chromosomes, as the hybridization target (2). In array CGH, the resolution in detecting segmental copy number changes is limited only by the size of, and distance between, the genomic DNA segments spotted on the array. With the completion of the human and mouse genome sequences (3), it is possible to construct arrays consisting of a tiling set of DNA segments spanning the entire genome.
Currently this approach allows the screening of tens of thousands of genomic segments for copy number alterations in a single experiment. After co-hybridization of differentially labelled DNA samples to an array, two high resolution fluorescence images, one for each labelled probe, are generated. Signal ratios for each clone, which act as a proxy for copy number, are obtained from these images using one of the many available array analysis software packages. However, map visualization of tens of thousands of spot data points is a daunting task. Many groups simply use Microsoft Excel to display individual plots of each region; however, Excel cannot display multiple plots interactively and is limited to 65,536 rows per worksheet, which restricts its usefulness in high resolution aCGH analysis. Here we present a visualization tool called SeeGH that translates spot signal ratio data from array CGH experiments into high resolution, segmentally annotated chromosome profiles resembling a conventional CGH karyotype diagram, facilitating the detection of genetic alterations.

3.2 Software environment and information sources
SeeGH was created using Borland\u2019s C++Builder 6 development platform and programmed in C++. Structured Query Language (SQL) was embedded in the C++ code to make queries to the backend database, MySQL version 4.0 (http://mysql.com/downloas/mysql-4.0.html). MySQL was chosen as the database server since it is publicly available and capable of handling large data files with high data throughput. The software was developed on Microsoft Windows 2000 (service pack 2) and tested for compatibility with Windows XP. Therefore, SeeGH should function on any Windows-based machine running Windows 2000 or a later operating system. Human physical map information used in the example presented here was obtained from the April 2003 assembly on the UCSC Genome Browser Gateway website (http://genome.ucsc.edu).
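As an illustration of how embedded SQL can retrieve a per-chromosome profile from a backend database, the sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL. The table name spot_summary, its columns, the function chromosome_profile, and the example rows are hypothetical; they do not describe the actual SeeGH schema, which is not reproduced here.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL backend (assumption).
conn = sqlite3.connect(':memory:')
conn.execute(
    '''CREATE TABLE spot_summary (
           experiment_id TEXT,
           clone_name    TEXT,
           chromosome    TEXT,
           start_bp      INTEGER,
           log2_ratio    REAL,
           ratio_sd      REAL
       )'''
)

# Made-up rows for illustration only; positions and ratios are not real data.
rows = [
    ('exp1', 'clone_B', '8', 1080000, 1.02, 0.04),
    ('exp1', 'clone_A', '8', 1000000, 0.95, 0.03),
    ('exp1', 'clone_C', '18', 500000, 0.01, 0.02),
]
conn.executemany('INSERT INTO spot_summary VALUES (?, ?, ?, ?, ?, ?)', rows)


def chromosome_profile(connection, experiment_id, chromosome):
    '''Return (start_bp, log2_ratio) pairs for one chromosome, by position.'''
    cursor = connection.execute(
        'SELECT start_bp, log2_ratio FROM spot_summary '
        'WHERE experiment_id = ? AND chromosome = ? ORDER BY start_bp',
        (experiment_id, chromosome),
    )
    return cursor.fetchall()


profile = chromosome_profile(conn, 'exp1', '8')
print(profile)
```

Keeping the positional query in the database, rather than sorting in the application, is one plausible reason a database server suits data sets of this size; the parameterized query also keeps experiment and chromosome identifiers out of the SQL string itself.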
The SeeGH software, source code, and documentation are publicly available upon request (http://bccrc.ca/cg/ArrayCGH_Group.html). We demonstrate the use of SeeGH by viewing array CGH data obtained by co-hybridizing tumor cell line DNA, labelled with cyanine-5, and normal male DNA, labelled with cyanine-3, to an array constructed from a Human “32k” BAC Re-array Clone set (http://bccrc.ca/cg/ArrayCGH_Group.html). This array contains 32,433 BAC clone derived DNA segments spotted in triplicate on two microarray slides. To facilitate explanations of data processing, in our description below we will follow a single BAC clone (RP11-6J2) from array production to final display in SeeGH. Amplified DNA product from the BAC RP11-6J2 was spotted in triplicate from well D06 of a 384 well plate in the same manner as the remaining 32,432 BAC clones which make up the array. Experimental details for the construction and use of our 32,433 loci CGH array are described elsewhere (4). Briefly, array CGH is based on homologous sequences from each probe competitively hybridizing to the three spots representing a single clone. Post hybridization, two high resolution 16 bit TIFF images, one derived from each fluorescently labelled probe, were obtained using an Arrayworx eAuto CCD based scanner (Applied Precision Instruments). These two images were then transferred to SoftWorx Tracker analysis software (Applied Precision Instruments) and paired for spot segmentation and feature extraction. Spot annotation information (e.g. signal ratio and signal to noise ratio) for each image pair was then exported to a tab delimited text file. RP11-6J2 is represented in this output file by three unique rows describing the features for each of the triplicate spots.
3.3 Results and discussion 3.3.1 Overview of data flow Data from the tab-delimited output file are filtered to remove unnecessary information output by the SoftWorx Tracker software before replicate spot data are converted into single data records of standard deviations and averaged spot ratios. These filtered records and experiment identifiers are then filed in the database. One of two routes is used for displaying information from the database: directly, for further annotation, or via a data converter, as positional and ratio data. In addition, chromosome specific information such as the base pair position of each chromosome band is routed through the data converter for presentation (Figure 3.1). 3.3.2 Input requirements To accommodate output from various scanner/analyzer software packages, the only input requirement of SeeGH is a tab delimited text file with the following six fields for each array spot: a unique identifier, the base pair starting position of the clone on the chromosome, chromosome number, channel 1 signal to noise ratio (Ch1 SNR), channel 2 signal to noise ratio (Ch2 SNR), and log2 spot ratio (Figure 3.2 buttons 1-6). Two additional fields, clone name and accession number, may contain further text information (Figure 3.2 buttons 7-8). Additional fields of miscellaneous data may be included in the tab delimited text file because the user is required to enter the total number of columns and the specific column number for each of the required data fields (Figure 3.2 buttons 1-9). For example, the text file exported from SoftWorx Tracker contains a total of 72 fields for each spot imaged from the array. Input files can be located and opened by using the Browse button or by manually entering their file path (Figure 3.2 button 12). Because array CGH experiments contain replicate spots to ensure high confidence in spot ratios, SeeGH was designed with the capability of accepting up to five replicate spots (Figure 3.2 button 10).
Replicate spot ratio records are identified by their use of a common unique identifier; these spots are averaged and their standard deviation is calculated. In a mantle cell lymphoma versus normal male hybridization, our example clone RP11-6J2 demonstrated triplicate spot ratios of -0.02690442, 0.009741764, and 0.04698608, respectively. Averaging these spots resulted in an average spot ratio of 0.0099411 and a standard deviation of 0.0369457. If replicate spots have been previously averaged, then SeeGH requires that the ‘Number of Replicates’ field be set to one and that the spot standard deviations be included in the records of the input file (Figure 3.2 buttons 10, 11). SeeGH also requires the user to enter a basic description for each data file. The required fields are bar code/unique identifier, disease type, experimenter, and date (Figure 3.2 buttons 13-16). Additional information may be entered into the “Comments” field but is not required (Figure 3.2 button 17). 3.3.3 Data filtering and storage Once all the required information has been entered, pressing the ‘Load File’ button will create a record in the ‘Existing Data’ table containing the five file description fields (BarCode, Disease_Type, Date, Experimenter, and Comments). The BarCode field is used as a key to generate 25 new tables, which consist of a filtered input data table and one table per uniquely identified chromosome (for human material 1-22, X and Y). For our example experiment, BarCode 10300047 points to these 25 new tables, and the information for all three replicates of RP11-6J2 is located in the filtered input data table.
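The replicate handling described in section 3.3.2 (grouping spots by their unique identifier, then averaging them and computing their standard deviation) can be illustrated with a minimal Python sketch. This is not the SeeGH source, which is written in C++; the function name `summarize_replicates` and the record layout are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption that happens to agree with the RP11-6J2 values quoted in the text.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_replicates(records):
    '''Average replicate log2 ratios that share a unique identifier.

    records: iterable of (unique_id, log2_ratio) pairs, one per spot.
    Returns {unique_id: (mean_ratio, sample_standard_deviation)}.
    '''
    by_id = defaultdict(list)
    for unique_id, ratio in records:
        by_id[unique_id].append(ratio)

    summary = {}
    for unique_id, ratios in by_id.items():
        # stdev() is the sample (n - 1) standard deviation and needs
        # at least two values; a single spot gets 0.0 here (assumption).
        sd = stdev(ratios) if len(ratios) > 1 else 0.0
        summary[unique_id] = (mean(ratios), sd)
    return summary


# Triplicate spot ratios for the example clone quoted in the text.
spots = [
    ('RP11-6J2', -0.02690442),
    ('RP11-6J2', 0.009741764),
    ('RP11-6J2', 0.04698608),
]
avg, sd = summarize_replicates(spots)['RP11-6J2']
print(avg, sd)
```

For the three RP11-6J2 spots this gives a mean and standard deviation that agree with the quoted 0.0099411 and 0.0369457 to about six decimal places.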
The calculated average ratio and standard deviation, as well as the lowest signal to noise ratio (SNR) of the three spots for each channel, are placed into the appropriate chromosome table along with the required annotation information, reducing the three replicate records to a single chromosome record. For example, the data for RP11-6J2 from our experiment, which is a clone derived from chromosome 6, would be stored in chromosome table 10300047_chr6. 3.3.4 Data presentation 3.3.4.1 Genomic view The Genomic View window appears automatically after new data have been loaded into the database (Figure 3.3). The Genomic View consists of 24 tiles (one for each unique chromosome), each measuring 100 by 150 pixels with the origin pixel position (0, 0) at the bottom left corner of each tile. In order to graphically plot chromosomes and spot ratios, SeeGH takes the base pair information for each chromosome and spot ratio, converts it to pixel position coordinates, and draws the image of each chromosome and spot ratio into a tile using those coordinates. The chromosomal information used to draw the chromosomes is contained in 49 text files. For each chromosome arm there is a corresponding file that contains band names and base pair positions. The p and q arms of the 22 autosomes and 2 sex chromosomes are represented in a total of 48 files. The 49th file contains information about total chromosome lengths and individual arm lengths for each chromosome. In the example presented in this paper we used information from the UCSC April 2003 assembly to create these files. These files are included with the software and can be updated with new chromosomal mapping information as it becomes available. Using this information, the total base pair length of each chromosome arm is converted into pixel position y-coordinates using a base pair to pixel conversion formula (pixel position y-coordinate = base pair position / 1,700,000).
This same formula is used to calculate each chromosome band’s start and end pixel position y-coordinate from the 48 band information files. Chromosomes are drawn in the Genomic View with the x-coordinate starting at pixel 10 and having a width of 20 pixels. The base pair start information for spot ratios is retrieved from the 24 chromosome tables created in the database for each experiment and converted into pixel position y-coordinates using the same formula. The x-coordinate for each spot ratio is calculated using a similar pixel conversion formula (pixel position of x-coordinate = X_Axis + spot ratio * One_Ratio). One_Ratio is given a default value of 10 pixels and X_Axis is set to a constant of 50. Therefore the y and x coordinates of our example clone (RP11-6J2) are 68, 60 (y-coordinate = 115712602 / 1700000, x-coordinate = 60 + 0.00994114 * 10). Chromosomes and corresponding spot ratios are plotted on each tile using the calculated x and y coordinates. The 24 resulting tiles are displayed in the Genomic View as an 8 by 3 grid (Figure 3.3 button 1). The Genomic View allows manipulation of several display parameters: ratio lines, ratio width, standard deviation filters, and signal to noise filters. Ratio lines can be displayed at +/- 0.5, 1.0, 1.5 and 2.0, with a default display of +/- 1.0 (Figure 3.3 buttons 2-5). Ratio width can be increased or decreased by inputting a numerical modifier that expands or contracts the x-coordinates of the spot ratios relative to the X_Axis (pixel position of x-coordinate = X_Axis + spot ratio * (One_Ratio + modifier)) (Figure 3.3 button 6). Another feature available in SeeGH is the ability to display only those spots that meet user defined criteria. These criteria include a standard deviation cut-off and/or a minimum signal to noise ratio for either Ch1 SNR or Ch2 SNR (Figure 3.3 buttons 7-9).
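The two Genomic View conversion formulas above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the SeeGH implementation: the function name `genomic_view_xy` is hypothetical, truncation to whole pixels is assumed, and `x_axis` is left as a required argument because the text gives X_Axis as 50 while the worked example for RP11-6J2 uses 60.

```python
BP_PER_PIXEL = 1_700_000  # base pairs per pixel, from the y-coordinate formula


def genomic_view_xy(bp_start, log2_ratio, x_axis, one_ratio=10, modifier=0):
    '''Convert a spot to (x, y) pixel coordinates within a Genomic View tile.

    bp_start   : base pair starting position of the clone
    log2_ratio : averaged log2 spot ratio
    x_axis     : pixel column that represents a ratio of zero
    one_ratio  : pixels per unit of ratio (default 10, as in the text)
    modifier   : user-supplied ratio width modifier (0 means unchanged)
    '''
    # int() truncation to whole pixels is an assumption of this sketch.
    y = int(bp_start / BP_PER_PIXEL)
    x = int(x_axis + log2_ratio * (one_ratio + modifier))
    return x, y


# Example clone RP11-6J2, using the X_Axis value from the worked example.
print(genomic_view_xy(115712602, 0.00994114, x_axis=60))
```

With the worked-example inputs the sketch returns x = 60 and y = 68, matching the coordinates quoted for RP11-6J2; a positive `modifier` spreads the ratios further from the axis.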
The 8 by 3 tiled image can be saved as a bitmap which can be viewed or printed using any image viewing software (Figure 3.3 button 10). While in the Genomic View, the user can also search for a specific spot based on unique identifier, clone name, or accession number (an example search is shown in Figure 3.3 button 11 and Figure 3.4 buttons 1-2). Once the spot is located, the appropriate Chromosome View is automatically opened with a line through the chromosome image at the spot’s locus, and the spot is highlighted. A Chromosome View can also be opened without entering a search term by selecting a chromosome with the left mouse button and choosing a magnification from the pop-up menu (Figure 3.3 button 12). 3.3.4.2 Chromosome view The Chromosome View displays the selected chromosome tile as a 649 by 673 pixel image with a zoom factor incorporated into the base pair to pixel conversion formula (pixel position y-coordinate = base pair position * zoom factor / 1,700,000), which increases or decreases the total pixel length of the chromosome image. The x-coordinates for displaying the chromosome now start at pixel 100 and have a width of 40 pixels. The x-coordinates for spot ratios are calculated using the same formula (X_Axis + spot ratio * One_Ratio) with One_Ratio equal to 50 pixels and X_Axis set to a constant of 375. For our demonstration clone the y and x coordinates become 272, 375 in the tile. In the Chromosome View, the user is given many of the same features available in the Genomic View: hiding spots based on standard deviation criteria or signal to noise ratios, changing ratio widths of the spot image, adding or deleting ratio lines of 0.5, 1.0, 1.5 and 2.0, and saving the image as a bitmap (Figure 3.5 buttons 1-5).
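As a worked check of the Chromosome View formulas, the short Python sketch below recomputes the coordinates of the demonstration clone. The function name `chromosome_view_xy` is hypothetical, truncation to whole pixels is assumed, and the zoom factor of 4 is an inference from the quoted y-coordinate (272 is roughly four times the Genomic View value of 68); the text does not state which zoom factor was used.

```python
def chromosome_view_xy(bp_start, log2_ratio, zoom, x_axis=375, one_ratio=50):
    '''Convert a spot to (x, y) pixel coordinates in the Chromosome View.

    zoom scales the base pair to pixel conversion; x_axis and one_ratio
    default to the Chromosome View constants given in the text.
    '''
    # int() truncation to whole pixels is an assumption of this sketch.
    y = int(bp_start * zoom / 1_700_000)
    x = int(x_axis + log2_ratio * one_ratio)
    return x, y


# RP11-6J2 with an assumed zoom factor of 4.
x, y = chromosome_view_xy(115712602, 0.00994114, zoom=4)
print(y, x)
```

Under that assumed zoom factor the sketch reproduces the quoted y and x coordinates of 272 and 375.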
However, the Chromosome View provides many additional features that are unavailable in the Genomic View: the display of standard deviations for replicate spots, flagging of high standard deviations, mouse-over activated spot information, continuous zoom, the ability to scroll along the chromosome, display of UCSC regional information, and clearing of search results (Figure 3.5 buttons 6-12). Spot standard deviations are displayed as a line through each spot and can be turned on or off simply by checking or un-checking a box in the Chromosome View (Figures 3.5 and 3.6). In addition, standard deviation lines which exceed a user defined value (Figure 3.5 button 7) can be flagged in red. One key feature added in the Chromosome View is the ‘mouse-over’ functionality, which displays specific spot information when the mouse cursor is positioned over a spot. The spot information displayed consists of the clone name, accession number, unique id, base pair starting position, ratio, standard deviation, and signal to noise ratio for both channel 1 and channel 2 (Figure 3.5 button 8). The zoom feature in the Chromosome View functions the same as in the Genomic View and can be applied repeatedly for limitless magnification (Figure 3.5 button 9). The Chromosome View can be scrolled up or down at a rate set by the user (Figure 3.5 button 10). UCSC base pair positions are given for the displayed image (Figure 3.5 button 11). The final feature clears the highlighted results of the Search function (Figure 3.5 button 12). 3.3.4.3 Accessing previously entered data The Existing Data window contains a list of all the files that have been loaded into the program (Figure 3.6 buttons 1-3). The displayed list can be limited by searching for data sets with specific search criteria (Figure 3.6 buttons 1-2). Alternatively, the list can be ordered by selecting a field from the drop down menu and performing a search without entering any search criteria.
A data set can be selected by highlighting a row in the list of existing data (Figure 3.6 button 3). Once selected, the data set can either be viewed or deleted (Figure 3.6 buttons 4-5). Deleting a data set removes all of its tables from the database, whereas viewing opens a Genomic View for that data. 3.4 Conclusions We have developed an array CGH data viewing tool which improves upon conventional viewing methods by displaying data in dynamically explorable conventional karyotype diagrams. This holistic genome view allows the user to easily recognize patterns in a genome wide data set while quickly identifying the chromosome bands implicated, a feature lacking in Excel based approaches, which display data as linear plots that are not directly correlated to chromosomal regions. In SeeGH, a user can quickly access data point information such as clone name, NCBI sequence accession number, and base pair starting position, which allows for precise localization of genetic alteration boundaries. In addition, a user can easily filter data for quality assurance by removing data points which do not meet signal to noise or standard deviation criteria. SeeGH is simple to set up, requiring only MySQL version 4.0, and runs under Microsoft Windows 2000 or later operating systems. The open design of SeeGH allows easy customization for specific needs, and future plans include the incorporation of features for multiple experiment comparisons. 3.5 Availability and requirements Project Name: SeeGH Project Homepage: http://www.bccrc.ca/ArrayCGH Operating System: Microsoft Windows 2000 or later Programming Language: C++, SQL Other Requirements: MySQL database License: Academic Software License Any Restrictions to use by non-academics: Yes Figure 3.1. Overall view of SeeGH data flow: The user inputs data formatted as a tab delimited text file.
The relevant data are then extracted from the text file via a filtering algorithm and replicate ratios and features are averaged before being stored in an SQL database. Ratio data are displayed via a data converter which converts ratio data to x, y plot coordinates, whereas annotation information is read directly from the SQL database. Figure 3.2. SeeGH “New Data” window. Buttons correspond to descriptions in the text. Figure 3.3. SeeGH “Genomic View” window. Reconstructed whole genomic array CGH profile from 97,299 array elements. Mantle cell lymphoma DNA (labelled with Cy5) was competitively hybridized with normal male DNA (labelled with Cy3) to an array of 32,433 DNA segments spotted in triplicate (97,299 elements). The information from the 97,299 elements was imported into SeeGH and is displayed. Buttons correspond to descriptions in the text. Figure 3.4. SeeGH “Search” window. Buttons correspond to descriptions in the text. Figure 3.5. SeeGH “Chromosome View” window. 1,972 DNA segments are displayed for chromosome 6. The red line through the chromosome denotes the location of the searched DNA segment, which is highlighted. Horizontal lines through each data point represent standard deviations of the triplicate elements. Buttons correspond to descriptions in the text. Figure 3.6. SeeGH “Existing Data” window. Buttons correspond to descriptions in the text. 3.6 References 1. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992 Oct 30; 258(5083): 818-821. 2. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet 2001 Nov; 29(3): 263-264. 3.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, et al. Initial sequencing and comparative analysis of the mouse genome. Nature 2002 Dec 5; 420(6915): 520-562. 4. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303. Chapter 4: Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes‡ ‡ A version of this chapter has been published: deLeeuw Ronald J, Davies Jonathan D, Rosenwald Andreas, Bebb Gwynn, Gascoyne Randy D, Dyer Martin JS, Staudt Louis M, Martinez-Climent Jose A, and Lam Wan L. (2004) Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes. Hum Mol Genet 13:1827-1837 4.1 Introduction Mantle cell lymphoma (MCL) is an aggressive malignant lymphoid neoplasm with a poor prognosis. MCL comprises 6% of all non-Hodgkin's lymphomas (NHL) and has a median survival of approximately 3 years with few long term survivors (1, 2). MCL was originally identified as a morphologically distinct subtype of NHL and subsequently the t(11;14)(q13;q32) was defined as a characteristic molecular feature of this subtype (3). This translocation places the CCND1 (cyclin D1) gene adjacent to the active immunoglobulin heavy chain (IGH) enhancer region. Normally, cyclin D1 expression is transient; however, under the control of the IGH enhancer region, cyclin D1 is constitutively expressed, causing cell cycle dysregulation (4). Importantly, cyclin D1 is not expressed in normal B-cells. Although the t(11;14) is found in virtually all cases of MCL, experimental evidence in transgenic mice suggests that this event alone may not be sufficient to result in lymphoma (5, 6). Moreover, secondary genomic alterations are frequently detected concomitantly with the t(11;14) (7). Au et al.
compiled a review of 214 cases of MCL and found the most common secondary genomic alterations included +3/3q (12%), +12/12q (8%), -13/13q (19%), -6q (19%), -9/9p (13%), -1p (11%), -11q (11%), -Y (10%), and -17p (8%) (8). In addition, leukemic and blastoid variants of MCL have been characterized to determine secondary genomic alterations that can act as diagnostic or prognostic markers (9-11). With few exceptions, these common secondary alterations encompass large regions encoding many candidate genes, complicating identification of markers and potential therapeutic targets. Many of these regions of genomic alteration were identified using comparative genomic hybridization (CGH), which detects single copy number differences between two samples of DNA (12). This is accomplished by a competitive hybridization of differentially labelled sample and reference DNA to normal metaphase chromosomes, allowing the detection of regional copy number changes at a resolution of 5-10 Mb (13). A more recent adaptation, array CGH, improves on the resolution of DNA copy number profiling by utilizing DNA segments representing discrete genomic loci spotted onto glass slides as the hybridization target. For example, arrays of 2460, 3500, and 4100 marker BAC clones distributed at ~1 Mb intervals have been constructed (14-16). Even though these arrays represent <10-15% of the genome, they have proven to be extremely useful for detecting chromosomal aberrations associated with congenital abnormalities and somatic malignancies (17-20). Recent studies have chosen to use overlapping coverage to determine alteration boundaries within regions of interest (21-24). A recent study assayed 53 MCL patient samples using an 812 segment matrix-CGH array designed for clinical classification of MCL (25). These studies show that the power of array CGH is maximized when the ability to detect single copy number changes is combined with the use of a tiling or overlapping set of BAC clones.
With the development of our array of 32,433 overlapping BAC clone-derived segments spanning the genome it is now possible to comprehensively determine segmental copy number alterations throughout the entire human genome in a single experiment while simultaneously fine mapping alteration breakpoints (26). In this study we use this new comprehensive segmental copy number profiling technology, called Sub Megabase Resolution Tiling-set (SMRT) array CGH, to identify secondary genomic alterations concomitant with the t(11;14) by characterizing eight commonly used cell models of MCL (Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, Z138C, and JVM-2). Characterization of these cell lines is necessary to determine if they adequately represent MCL. In addition, characterization of these cell lines may provide insights into MCL biology. Here we describe the whole genome analysis and the identification of novel genetic alterations in these cell models. This constitutes the first disease specific study using this new technology. 4.2 Results Copy number profiles of the eight MCL cell lines were created by co-hybridizing differentially labelled sample DNA with reference male DNA on the SMRT array that contains 32,433 BAC-derived amplified fragment pools spotted in triplicate (26). The analysis of 97,299 data points for each of the eight samples (778,392 in total) facilitated the localization of chromosomal breakpoints to within single BAC clones and the subsequent identification of genomic imbalances between breakpoints. For example, alterations in MCL cell line HBL-2 included numerous gross chromosomal losses (-3p21.31-p21.1, -6q22.32-qter, -9p21.2-pter, -13q14.13-q21.32, -17p11.2-pter, -18q11.2-q21.2, -18q22.2-qter) and gains (+1q32.2-q42.2, +2q36.1-qter, +3q13.32-q22.2, +7, +8q21.12-q22.3, +8q23.1-q24.21, +9q22.33-qter, +11p11.2-pter, +11q13.3-q14.2, +13q31.2-q34, +16q, +18q21.2-q22.2).
Additionally, using this comprehensive genomic assay we were able to detect localized chromosomal alterations affecting small numbers of overlapping clones (-1p36.22-p36.21, -2p11.2, -3p22.2, -3q13.31, -4p15.1-p14, -6p25.2, -6p24.3-p24.2, -6p22.3, -13q21, -15q15.1-q15.2, -15q23, -16p11, -22q11.22, +1q21.1, +1q42.3, +2p11.1, +2q33.1, +2q33.2-q33.3, +2q34, +2q35, +4p14, +4p11, +10p11.1, +10q11.21, +10qter, +11q22.3, +12q14.2, +21p11) (Figure 4.1; aCGH smooth data in Suppl. Fig A). Comparable numbers of chromosomal alterations were observed in all MCL lines assayed excluding JVM-2, resulting in 250 genomic imbalances (mean per tumor line, 35.6; range, 21-57) including 125 genomic gains (mean per tumor line, 17.9) and 125 genomic losses (mean per tumor line, 17.7). JVM-2 contained only seven genomic imbalances including five gains and two losses. The high number of genomic imbalances observed in seven of the eight MCL cell lines is likely due to the comprehensive genomic coverage of array CGH using the SMRT array. Identification of novel minute alterations combined with the subdivision of previously defined large alterations into smaller regions may account for the increased numbers of genomic imbalances detected. Because highly recurrent genetic imbalances may be indicative of important regions involved in pathogenesis, copy number profiles were aligned to reveal patterns of recurrent alteration (summarized in Figure 4.2). Only alterations recurring in at least three of the eight lines were used to define minimally altered regions (MARs) (Table 4.1). Using this definition we can confidently define the boundaries of our regions because the clones near the boundaries have been shown to respond to copy number changes in at least two other lines. Of the resulting 35 MARs, we observed a range of sizes from 130 Kb to 40 Mb with nine (26%) at less than 1 Mb in size. These regions could not have been defined by conventional chromosomal or array-based CGH.
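The recurrence criterion used to define MARs (an alteration present in at least three of the eight lines) can be illustrated with a simple interval sweep in Python. This is a simplified sketch rather than the procedure used in this study, where the profiles were aligned as summarized in Figure 4.2: the function name `recurrent_regions`, the half-open base pair intervals, and the toy coordinates are all hypothetical, and each line is assumed to contribute non-overlapping intervals for a single chromosome and a single alteration type (gain or loss).

```python
def recurrent_regions(intervals_by_line, min_lines=3):
    '''Return regions altered in at least min_lines cell lines.

    intervals_by_line: {cell_line: [(start_bp, end_bp), ...]} for one
    chromosome and one alteration type, with half-open [start, end)
    intervals that do not overlap within a single cell line.
    '''
    events = []
    for intervals in intervals_by_line.values():
        for start, end in intervals:
            events.append((start, 1))   # an altered segment opens
            events.append((end, -1))    # an altered segment closes
    # Sort by position; at equal positions closes (-1) come before opens.
    events.sort()

    regions = []
    depth = 0
    region_start = None
    for position, delta in events:
        previous = depth
        depth += delta
        if previous < min_lines <= depth:
            region_start = position
        elif depth < min_lines <= previous and position > region_start:
            regions.append((region_start, position))
    return regions


# Toy coordinates for four hypothetical lines (not data from this study).
example = {
    'line_A': [(100, 500)],
    'line_B': [(200, 600)],
    'line_C': [(300, 400)],
    'line_D': [(350, 700)],
}
print(recurrent_regions(example))
```

In the toy input, positions 300 to 500 are covered by at least three of the four lines, so a single region (300, 500) is reported; raising `min_lines` to 4 narrows it to (350, 400).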
Of the 35 MARs observed in this study, 22 (63%) recurred in three samples, ten (29%) recurred in four samples, and three (9%) recurred in five or more samples. We first investigated whether the identified MARs included the chromosomal alterations commonly reported in MCL. The most common MAR was the loss of chromosome arm 9p. HBL-2, Rec-1, and UPN-1 demonstrated loss from 9p21.2-pter, whereas Granta-519, SP49, and Z138C showed multiple smaller regions of deletion and loss. After alignment of the profiles, a 1.24 Mb MAR of loss was identified in six of the eight samples. Notably, in SP49 and Z138C the localized deletion and loss (1.67 Mb and 1.24 Mb, respectively) encompassed CDKN2A (p16INK4A) and CDKN2B (p15INK4B), inhibitors of cyclin dependent kinase 4. Both CDKN2A and CDKN2B are reported to be both deleted and under expressed in MCL (3, 27, 28). Three additional MARs of loss were identified at 9p23, 9p23-24.1, and 9p24.3-pter with sizes of 640 Kb, 3.31 Mb, and 480 Kb, respectively (Figure 4.3). Trisomy 12 is a common cytogenetic event in MCL, although almost never seen as a sole cytogenetic alteration. Surprisingly, examination of 1488 DNA segments representing chromosome 12 revealed multiple distinct regions of gain on both the long and short arm as opposed to trisomy 12. The multiple regions observed in Granta-519, HBL-2, NCEB-1, Rec-1, Z138C, and JVM-2 included six MARs of gain at 12p11.21-p12.3, 12q13.13-q13.2, 12q24.21-qter, 12q14.2, 12q13.2-14.1, and 12q15-21.2 (Figure 4.4). Two genes known to be over expressed in MCL, CD63 (melanoma 1 antigen) and CDK4 (cyclin dependent kinase 4), reside within the 3.41 Mb MAR of gain at 12q13.2-14.1 (29, 30). Curiously, RARG (retinoic acid receptor gamma), presently reported to be under expressed in MCL, resides within a gained 2.97 Mb MAR at 12q13.13-q13.2 (30). Loss of chromosome 1 material was also observed at p36.11 and p21.1-p31.1.
The 1p36.11 MAR was detected in Granta-519, SP49, and Z138C with Z138C defining the boundaries. This 1.18 Mb MAR of loss contains the FGR (Gardner-Rasheed feline sarcoma viral oncogene homolog) gene. The expression level of FGR in Granta-519 and Z138C according to the Lymphochip was four fold lower than that of NCEB-1. This is consistent with the fact that NCEB-1 showed genomic gain in this region as opposed to the reduced copy number in Granta-519 and Z138C. The 1p21.1-p31.1 MAR of loss was detected in Granta-519, Rec-1, and SP49 with Rec-1 defining the boundaries. This 35.25 Mb region contains 111 Refseq annotated genes, of which four are reportedly differentially expressed in MCL: BCL10 (B-cell CLL/lymphoma 10), CNN3 (calponin 3, acidic), GBP1 (guanylate binding protein 1), and TGFBR3 (transforming growth factor-beta type III), with under expression of BCL10, GBP1, and TGFBR3 correlating well with our observation of the loss of this region (29-31). The short arm of chromosome 17 is commonly deleted in many tumor types including MCL. In our panel we observed loss of the entire p arm in Granta-519, HBL-2, and UPN-1. In addition, we observed the loss of a 1.14 Mb telomeric region in Z138C outside the TP53 region (Table 4.1). Alignment of Granta-519, HBL-2, NCEB-1, and Rec-1 defined a 760 Kb MAR of loss at 13q14.3. Loss of chromosome 13q material is commonly reported in MCL. Consistent with previous reports, this MAR did not include the tumor suppressor gene RB1 (retinoblastoma), but was 1.2 Mb distal to RB1 (3, 7). Granta-519 and NCEB-1 showed 1.26 Mb and 2.55 Mb regions of loss, respectively. The 1.26 Mb deletion in Granta-519 was subsequently confirmed by locus specific FISH analysis (Figure 4.5a). This MAR at 13q14.3 coincided with the deleted regions described in chronic lymphocytic leukemia (CLL) and recently in MCL (25, 32).
Several candidate tumor suppressor genes including DLEU1 (deleted in lymphocytic leukemia 1), DLEU2 (deleted in lymphocytic leukemia 2), RFP2 (ret finger protein 2), and C13orf1 (chronic lymphocytic leukemia deletion region gene 6) reside within the 760 Kb gene rich region. The expression level of DLEU2 revealed that Granta-519, HBL-2, and NCEB-1 had an average 3 fold lower expression than Z138C according to the Lymphochip. This correlates with the reduced copy number of this region in the three cell lines as compared to Z138C. Duplication of 7p has been reported in MCL, particularly in blastoid variants (9). Detailed examination of 7p using the SMRT array revealed two distinct MARs of gain. The most recurrent of these was a 5.85 Mb region between 7p22.1-pter. This region contains 37 Refseq annotated genes, of which MAD1L1 (mitotic arrest deficient-like 1) is reported to be differentially expressed in MCL (30). HBL-2, Rec-1, SP49, and Z138C demonstrated gain of this region with HBL-2 defining the MAR. The other gained MAR on 7p was large, spanning 40.4 Mb and containing 171 Refseq annotated genes, including NMB (neuromedin B), reported to be over expressed in MCL (29). A recent study by Bentz and coworkers delineated a 2.4 Mb consensus region of deletion at 8p21 (25). We observed a 730 Kb MAR of loss at 8p21.2-p21.3 containing three putative tumor suppressor genes, RHOBTB2 (Rho-related BTB domain containing 2), TNFRSF10B (tumor necrosis factor receptor superfamily, member 10B), and DBC-1 (deleted in breast cancer 1). RHOBTB2 and TNFRSF10B overlap with the previously described consensus region; however, unlike in their consensus region, TNFRSF10C and TNFRSF10D were excluded from our MAR. A similar deleted region was previously described in a breast cancer study (33). Next we investigated the genomic loci of genes known to be differentially expressed in MCL.
Reported over expression of the anti-apoptotic dominant oncogene BCL2 in MCL coincides with a 7.26 Mb amplified MAR between 18q21.33-q22.1 (30). Lymphochip data showed a similar result. The average BCL2 expression in Granta-519, HBL-2, NCEB-1, and Z138C is 4 fold higher than that of the Rec-1 sample. This correlates with increased copy numbers of BCL2 in Granta-519, HBL-2, and Z138C as compared to normal copy number in Rec-1. However, NCEB-1 also showed high level BCL2 expression. Since there is no increase in copy number at 18q21.33 in NCEB-1, this suggests a different mechanism of over expression. The well characterized MYC oncogene, which is reported to be over expressed in MCL (30), coincided with a 5.56 Mb MAR of gain in HBL-2, NCEB-1, Rec-1, and UPN-1. The boundaries of this MAR were defined by the overlapping regions of gain in HBL-2 and Rec-1. Another MAR of gain is 2.50 Mb in size, located at 11p15.5-pter. It contains the genes IFITM2 (interferon-induced transmembrane protein 2), H-RAS (Harvey rat sarcoma viral oncogene homolog), and IFITM1 (interferon-induced transmembrane protein 1), known to be differentially expressed in MCL (29, 30). Curiously, all three of these genes were reported to be under expressed in the region we describe as gained. However, this region is densely packed with Refseq annotated genes (59 genes) and the importance of other genes in this region remains to be determined. REL (avian reticuloendotheliosis viral oncogene homolog) is another over expressed gene in MCL (29) and lies within a 1.07 Mb gained MAR between 2p15 and 2p16.1 recurrent in Rec-1, SP49, UPN-1, and JVM-2. The MAR was defined by the amplified region in Rec-1. The known genes and previously reported regions of alteration in MCL accounted for only 22 of the 35 (63%) identified MARs. We next examined the remaining 13 MARs, in order of increasing size, for genes described in the context of cancer but not previously implicated in MCL (bolded in Table 4.1).
Two sub-megabase recurrent losses were observed in seven and six of the cell lines, respectively. As these cell lines are derived from mature B-cells, one would expect the presence of deletions resulting from the rearrangement of immunoglobulin (Ig) genes during B-cell maturation (34). However, conventional chromosomal CGH and array CGH have not detected these localized alterations. Strikingly, examination of both the kappa and lambda Ig light chain loci revealed recurrent deletions. A 130 Kb MAR at 2p11.2, falling precisely within the kappa Ig light chain locus, was observed in Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, and JVM-2. Similarly, a 390 Kb MAR at 22q11.22 coincided with the lambda Ig light chain locus and was observed in Granta-519, HBL-2, NCEB-1, SP49, and Z138C. The deletion at the lambda Ig light chain region was verified by locus specific FISH in cell line Granta-519 (Figure 4.5b). Probe hybridization was confirmed in normal metaphases (data not shown). A 430 Kb region of recurrent loss on chromosome 8 was observed at p23.3 containing only three genes: ZNF596 (zinc finger protein 596), FBXO25 (F-box only protein 25), and INM01 (hypothetical protein). However, none of these genes has been described in the context of cancer. The fifth, seventh, and eighth smallest novel MARs are all gains on chromosome 2 at q37.2 (870 Kb), q37.1 (1.66 Mb), and q37.3 (2.42 Mb) and contain 2, 14, and 23 genes, respectively. The gained region at 2q37.3 contains two genes that have been described in the context of ovarian and pancreatic cancer, BOK (BCL2-related ovarian killer) and GPC1 (glypican 1), respectively. BOK has been described as a pro-apoptotic BCL2 protein with restricted expression in reproductive tissue (35). As such it is unlikely to be involved in MCL pathogenesis.
However, the glypican family of cell surface proteins is expressed primarily during development, with differential expression in a tissue and stage specific manner, affecting both growth and morphogenic signals (36). As such, GPC1 and GPC3 have been implicated as putative oncogenes in pancreatic and ovarian cancers, respectively (37, 38). In addition, the sixth smallest novel MAR (1.16 Mb) at 13q31.3 is one of only two regions of recurrent amplification. The only gene affected by this amplification is GPC5 (glypican 5). GPC5 has recently been identified as amplified in various types of lymphoma (39, 40). This MAR was present in HBL-2, Rec-1, and Z138C and defined by the overlap between amplifications in HBL-2 and Z138C. The 3.10 Mb MAR of gain at 1p36.32-pter contains 37 Refseq annotated genes. Among these are several tumor necrosis factor receptor superfamily genes, a cell division cycle control gene (CDC2L2), and matrix metalloproteinases. Unfortunately, none of these genes are represented on the Lymphochip and a candidate oncogene in this MAR remains to be determined. The gene dense MAR of gain at 9q34.2-q34.3 (4.31 Mb) encompasses 62 Refseq annotated genes. Among these, several putative oncogenes have been identified: VAV2 (vav2 oncogene), NOTCH1 (notch homolog 1, translocation associated), and TRAF2 (tumor necrosis factor receptor-associated factor 2 isoform 2) (41-43). VAV2 is a guanine nucleotide exchange factor involved in Rac1 activation by Src that induces morphological changes when over expressed in NIH-3T3 cells (43, 44). Over expression of NOTCH1 has been shown to promote tumor cell proliferation and survival in both T and B-cell lymphomas as well as epithelial neoplasms such as small cell lung cancer (42, 45). TRAF2 is involved in cytokine mediated activation of NF-kappaB, which is commonly activated in lymphomas (41). An equally sized (4.31 Mb) MAR of gain at 7q11.23 contains 49 Refseq annotated genes, including BCL7B.
BCL7B shares 90% identity with BCL7A, which is recurrently translocated in high grade B-cell non-Hodgkin’s lymphoma (46). The three largest novel MARs are gains of 1q32.2-q32.3 (4.69 Mb) and 8q22.1-q22.3 (4.79 Mb), and loss of 18q11.2-q21.2 (17.07 Mb). These MARs encompass 31, 22, and 51 genes, respectively. No candidate oncogenes or tumor suppressor genes have been described within these regions and further characterization is necessary to determine the selective advantage of having these genomic alterations.

4.3 Discussion
Unlike solid tumours, cell lines are readily created from lymphomas. High throughput techniques allow for the rapid characterization of cell lines, with the hope of identifying new molecular aberrations that can act as biomarkers or therapeutic targets. However, the transcriptome and proteome are volatile and too readily influenced by culture conditions. Genomic analysis has the advantage of being more stable, making it theoretically easier to identify key molecular abnormalities. The eight cell lines used in this study had not been thoroughly characterized at the genomic level. The recent development of the SMRT array allowed the first detailed whole genome examination of copy number changes at tiling resolution in these cell lines. Assessment of copy number alterations with 32,433 DNA segments identified an unprecedented average of 35 genetic alterations per cell line with equal numbers of amplifications and deletions. Because recurrent alterations are more likely to be indicative of critical events in pathogenesis, we considered only those alterations recurrent in at least three cell lines. Of the recurrent MARs identified, only 18 (51%) were within previously characterized genetic alterations. Surveying regions encompassing genes known to be differentially expressed in MCL accounted for an additional 4 (11%) of the identified MARs. The remaining 13 (37%) identified MARs include novel regions of alteration in MCL.
These regions may require further verification in paired normal and tumor samples to eliminate the possibility of polymorphisms within the human population. In addition, 9 (26%) of the defined recurrent regions were less than 1 Mb in size. Using previously available assays with 1 Mb or greater resolution, these regions would have been impossible to define. However, due to the comprehensive overlapping nature of the SMRT array, we were able to define these regions. The utility of this comprehensive assay is made evident by the detection and localization of genomic rearrangements within the Ig loci, which previously went undetected in whole genome copy number assays. Trisomy 12 has been observed in many types of lymphoma including MCL. This suggests that there are multiple oncogenes spanning chromosome 12. Our observation of six distinct MARs of gain throughout chromosome 12 supports this hypothesis and provides regions for further characterization. Like trisomy 12, chromosome arm 9p deletions are common in MCL. These deletions are traditionally associated with loss of the tumor suppressor genes p15 and p16. However, our data suggest additional tumor suppressor genes at three separate MARs. Further characterization of these regions in clinical specimens is required to identify the candidate tumor suppressor genes. One of the MARs, at 13q31.3, contains a single gene, GPC5. GPC5 amplification has been observed in multiple B-cell lymphomas. Interestingly, we also identified a 2.42 Mb MAR at 2q37.3 containing GPC1. This finding raises the possibility of GPC family involvement in MCL pathogenesis. Over expression of GPC1 has been found in pancreatic cancer, and GPC3 has been implicated in ovarian cancer. For the first time we can comprehensively describe MCL cell model genomes at tiling resolution. The extension of high resolution whole genome analysis to tumor genomes will prove useful in identifying novel genetic alterations in mantle cell lymphoma.
Inclusion of these alterations will potentially enhance focussed MCL specific arrays. (The complete panel of “SeeGH Karyograms” output from SeeGH is given as supplemental material and the raw image data are available at http://www.bccrc.ca/cg/ArrayCGH_Group.html).

4.4 Materials and methods

4.4.1 Cell lines, culture conditions and DNA extraction
A panel of eight cell lines, seven MCL derived (Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, and Z138C) (10, 11, 47) and one prolymphocytic leukemia derived (JVM-2) (47), was analyzed in this study. All cell lines studied contained CCND1-IGH gene rearrangements indicative of MCL. Granta-519, HBL-2, JVM-2, NCEB-1, and Z138C were cultured in RPMI 1640 supplemented with 10% fetal bovine serum, 1% penicillin/streptomycin, and 1% L-glutamine to a density of ~10^6 cells per ml, and genomic DNA was extracted via standard proteinase K / RNAse treatment and phenol-chloroform extraction. DNA isolation for the remaining three cell lines (Rec-1, SP49, and UPN-1) was previously described (10, 11).

4.4.2 Array CGH hybridization
Copy number alterations were assayed in the eight MCL model cell lines by performing array CGH with microarrays containing 97,299 elements, representing 32,433 BAC derived amplified fragment pools spotted in triplicate. The construction of this SMRT array was previously described (26, 48). The overlapping clone coverage of this array allows the assessment of genomic integrity across the entire genome in a single experiment. Six hundred nanograms of sample or pooled normal reference male genomic DNA (Novagen, Mississauga, ON) was denatured at 100°C for 10 minutes in a 33.5 μl volume containing 1X Klenow buffer and 28 μM random octamers (Alpha DNA, Montreal, PQ), then cooled on ice.
Fifteen nanomoles each of dATP, dTTP, and dGTP, 9 nmol dCTP, 45 U Klenow (Promega, Madison, WI) and 4 nmol cyanine-3 dCTP or cyanine-5 dCTP, respectively, were added to the reaction for a final volume of 50 μl, mixed, and incubated at 37°C overnight. Labelled sample and reference DNA were combined and purified using ProbeQuant Sephadex G-50 Columns (Amersham, Baie d’Urfe, PQ), followed by addition of 200 μg Human Cot-1 DNA (Invitrogen, Burlington, ON) and precipitation with 0.1X volume 3 M sodium acetate and 2.5X volume 100% ethanol. The DNA pellet was re-suspended in 100 μl hybridization solution containing 80% DIG Easy hybridization solution (Roche, Laval, PQ), 200 μg sheared herring sperm DNA, and 100 μg yeast tRNA. Re-suspended probe was denatured at 85°C for 10 minutes and repetitive sequences were blocked at 45°C for 1 hour prior to hybridization. Pre-blocked probe was added to the array, placed into a hybridization chamber (Telechem, Sunnyvale, CA), and hybridized at 45°C for ~40 hours. After hybridization, arrays were washed 5 times for 5 minutes each in 0.1X SSC, 0.1% SDS at room temperature in the dark with agitation, followed by 5 rinses in 0.1X SSC, and dried using an oil free air stream.

4.4.3 Array CGH analysis
Images of the hybridized array were captured through cyanine-3 and cyanine-5 channels using a charge-coupled device (CCD) camera system (Applied Precision, Issaquah, WA). Images were then analyzed using SoftWoRx Tracker Spot Analysis software (Applied Precision). Custom software called “SeeGH” was used to visualize all data as log2 ratio plots; this software is freely available (http://www.flintbox.ca/technology.asp?tech=FB312FB) (49).
To determine the experimental variation within copy number profiles using SMRT array CGH, two self versus self and five repeats of pooled male versus normal female hybridizations were performed (data available at http://www.bccrc.ca/cg/ArrayCGH_Group.html). The standard deviation of the autosomal log2 ratios for all control hybridizations was determined to be 0.086. Similar to Veltman et al., we chose a value slightly more conservative than two times the standard deviation (2 x 0.086 = 0.172), yielding thresholds of ±0.2 log2 ratios (19). Therefore, genomic alterations were classified as follows: normal copy number between log2 ratios of -0.2 and 0.2, genomic loss between log2 ratios of -0.2 and -0.7, homozygous deletion with log2 ratio of < -0.7, genomic gain between log2 ratios of 0.2 and 0.5, and amplification with log2 ratio of > 0.5. Genomic imbalances and their associated breakpoints were identified using genetic local search algorithms within the software package aCGHsmooth developed by Jong et al. (50) (available at http://www.few.vu.nl/~vumarray/), using recommended program parameters, and were subsequently confirmed visually within the primary raw normalized data. Briefly, aCGHsmooth determines breakpoints within chromosomes by performing a maximum likelihood estimation for each clone, calculating the probability that the clone of interest lies within the set of previous clones. These putative breakpoints are then shifted randomly in both directions and an overall fitness is calculated. This is repeated until either no improvement in fitness can be achieved or the maximum number of iterations has been completed. The mean values of breakpoint segments are calculated and closely smoothed levels are joined to reflect the assumption that there are few copy number values present in chromosomes.
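The classification thresholds described above can be summarized in a short sketch. The Python below is illustrative only and is not part of aCGHsmooth or SeeGH. The function name classify_log2_ratio is hypothetical, and the handling of values that fall exactly on a boundary (assigned here to the less extreme class) is an assumption, because the text does not specify it.

```python
# Thresholds are taken from the text; boundary handling is an assumption.
SD_CONTROL = 0.086                 # SD of autosomal log2 ratios in control hybridizations
NORMAL_LIMIT = 0.2                 # slightly more conservative than 2 x SD_CONTROL
HOMOZYGOUS_DELETION_LIMIT = -0.7
AMPLIFICATION_LIMIT = 0.5


def classify_log2_ratio(log2_ratio):
    # Return the copy number class for one smoothed log2 ratio.
    if log2_ratio < HOMOZYGOUS_DELETION_LIMIT:
        return 'homozygous deletion'
    if log2_ratio < -NORMAL_LIMIT:
        return 'loss'
    if log2_ratio <= NORMAL_LIMIT:
        return 'normal'
    if log2_ratio <= AMPLIFICATION_LIMIT:
        return 'gain'
    return 'amplification'


# Example values chosen for illustration, one per class.
for value in (-0.9, -0.3, 0.0, 0.3, 0.8):
    print(value, classify_log2_ratio(value))
```

Applied to segment means rather than individual clones, such a rule would separate single copy changes from homozygous deletions and high level amplifications.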

4.4.4 Expression microarray procedure
Lymphochip DNA microarrays (51) containing 12,196 cDNA elements were used to quantitate mRNA expression for five MCL cell lines (Granta-519, HBL-2, NCEB-1, Rec-1, and Z138C) as previously described (52).

4.4.5 Fluorescent in-situ hybridization
FISH probes were created by labeling purified BAC DNA in random priming reactions containing either spectrum-red or spectrum-green labeled nucleotides. Metaphase spreads of Granta-519 were created from cells, at 8 x 10^5 cells/ml, treated with colchicine for three hours. Hybridization and imaging were performed as previously described (45).

Figure 4.1. Whole genome SMRT aCGH SeeGH karyogram of MCL cell line HBL-2 versus pooled normal male genomic DNA. Each dot represents data from one BAC derived segment on the array. Data points to the left and right of the center line represent genetic losses and gains, respectively. Lines at log2 ratios of -0.5 and +0.5 are scale bars only.

Figure 4.2. Summary of chromosomal imbalances detected by SMRT aCGH in 8 MCL cell line models. Lines on the left side of each ideogram indicate loss of chromosomal material, with thick lines representing homozygous deletion; lines on the right side indicate gain of chromosomal material, with thick lines indicating regions of high level amplification. Each numbered line represents a loss or gain in a cell line: (1) Granta-519, (2) HBL-2, (3) NCEB-1, (4) Rec-1, (5) SP49, (6) UPN-1, (7) Z138C, (8) JVM-2.

Figure 4.3. Representation of genomic copy number alterations on chromosome arm 9p. Each bar represents data from one BAC derived segment on the array with each column corresponding to the indicated cell line. Green and red bars indicate genomic copy number losses and gains, respectively.

Figure 4.4. Representation of genomic copy number alterations on chromosome 12. Each bar represents data from one BAC derived segment on the array with each column corresponding to the indicated cell line. Green and red bars indicate genomic copy number losses and gains, respectively.

Figure 4.5. Locus specific fluorescent in-situ hybridization validation of genetic copy number alterations. (A) Locus specific FISH of the 13q14.3 deletion in cell line Granta-519. (B) Locus specific FISH of the 22q11.22 homozygous deletion in cell line Granta-519. Each bar represents data from one BAC derived segment on the array at the chromosomal location indicated. Data points to the left and right of the center purple line represent genetic losses and gains, respectively. Blue lines are scale bars at log2 ratios of -0.5 and +0.5. Red and green arrows indicate loci of spectrum red and spectrum green labeled FISH probes, respectively.

Table 4.1 Summary of genomic alterations recurring in a minimum of three MCL cell line models. The original table also marks each region in the individual cell lines (Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, Z138C, JVM-2) and groups the rows as LOSS or GAIN.

Cytoband | Size (Mb) | Minimal region boundary clones | Number of Refseq genes in region* | Genes known to be differentially expressed in MCL†
2p11.2 | 0.13 | 685C7 - 685C7 | 0 | -
9p21.3 | 1.24 | 328C2 - 275H17 | 4 | P15 (↓, ref 2), P16 (↓, ref 2)
22q11.22 | 0.39 | 757F24 - 50L23 | 7 | -
9p23 | 0.64 | 771D23 - 577M1 | 1 | -
13q14.2 - 13q14.3 | 0.76 | 195L15 - M2217H7 | 7 | -
17p13.3 - 17pter | 1.14 | 818O24 - M2348K1 | 11 | -
9p24.3 - 9pter | 0.48 | 165F24 - 143M1 | 2 | -
9p23 - 9p24.1 | 3.31 | 702P22 - 645K2 | 2 | -
7p22.1 - 7pter | 5.85 | 568L2 - 379K15 | 37 | MAD1L1 (↓, ref 30)
8q24.13 - 8q24.21 | 5.56 | 497O14 - 279F20 | 24 | MYC (↑, ref 30)
11p15.5 - 11pter | 2.50 | 583A17 - 401C19 | 59 | 1-8D (↓, ref 29), H-RAS (↓, ref 30), IFI17 (↓, refs 29, 30)
2p15 - 2p16.1 | 1.07 | 607B19 - 119H15 | 6 | REL (↑, ref 30)
12q13.13 - 12q13.2 | 2.97 | 260F8 - M2265L24 | 68 | RARG (↓, ref 30)
17p11.2 - 17pter | 15.60 | 304M17 - M2348K1 | 208 | P53 (↓, ref 2)
18q11.2 - 18q21.2 | 17.07 | 1076F2 - 373G16 | 51 | -
1p36.11 | 1.18 | 443P17 - 553K16 | 23 | SRC2 (↓, ref 29)
1p21.1 - 1p31.1 | 35.25 | 478L17 - 437B7 | 111 | BCL-10 (↓, ref 29), CNN3 (↑, ref 29), GBP1 (↓, ref 29), TGFBR3 (↓, ref 30)
2q37.1 | 1.66 | 707E3 - 599F11 | 14 | -
2q37.2 | 0.87 | 367B19 - 473L20 | 2 | -
18q21.33 - 18q22.1 | 7.26 | 155A6 - 177K16 | 23 | BCL2 (↑, refs 29, 30)
1p36.32 - 1pter | 3.10 | 151F10 - 34P13 | 37 | -
12q24.21 - qter | 19.10 | 749J2 - M2140B24 | 115 | -
1q32.2 - 1q32.3 | 4.69 | 818N18 - 757D10 | 31 | -
2q37.3 | 2.42 | 546M8 - 811O7 | 23 | -
7p11.2 - 7p21.2 | 40.42 | 760D02 - 160E4 | 171 | NMB (↑, ref 29)
7q11.23 | 4.31 | N1328G23 - 85K14 | 49 | -
8q22.1 - 8q22.3 | 4.79 | 655F3 - 486H6 | 22 | -
9q34.2 - 9q34.3 | 4.31 | 746P3 - 424E7 | 62 | -
12q14.2 | 0.37 | 749L20 - 415I12 | 2 |
13q31.3 | 1.16 | 511F12 - 487A2 | 1 |
12q13.2 - 12q14.1 | 3.41 | 644F5 - 1P10 | 82 | CD63 (↑, ref 29), CDK4 (↑, ref 30)
12p11.21 - 12p12.3 | 12.31 | 48N6 - 607H18 | 51 | -
12q15 - 12q21.2 | 7.27 | 428C23 - 401H20 | 26 | -
8p23.3 | 0.43 | 91J19 - 130K11 | 3 | -
8p21.2-p21.3 | 0.73 | 784K2 - 806N20 | 12 | -

* Data from University of California, Santa Cruz Human Genome Browser April 2003 Freeze. Novel regions are in bold.
† (↑) indicates over expression and (↓) indicates under expression in the cited reference.

4.5 References
1. Anon. A clinical evaluation of the International Lymphoma Study Group classification of non-Hodgkin's lymphoma. The non-Hodgkin's Lymphoma Classification Project. Blood 1997; 89: 3909-3918.
2. Argatoff LH, Connors JM, Klasa RJ, Horsman DE, Gascoyne RD. Mantle cell lymphoma: a clinicopathologic study of 80 cases. Blood 1997 Mar 15; 89(6): 2067-2078.
3. Swerdlow SH, Williams ME. From centrocytic to mantle cell lymphoma: a clinicopathologic and molecular review of 3 decades. Hum Pathol 2002 Jan; 33(1): 7-20.
4. Williams ME, Swerdlow SH, Rosenberg CL, Arnold A. Characterization of chromosome 11 translocation breakpoints at the bcl-1 and PRAD1 loci in centrocytic lymphoma. Cancer Res 1992 Oct 1; 52(19 Suppl): 5541s-5544s.
5. Bodrug SE, Warner BJ, Bath ML, Lindeman GJ, Harris AW, Adams JM.
Cyclin D1 transgene impedes lymphocyte maturation and collaborates in lymphomagenesis with the myc gene. Embo J 1994 May 1; 13(9): 2124-2130.
6. Lovec H, Grzeschiczek A, Kowalski MB, Moroy T. Cyclin D1/bcl-1 cooperates with myc genes in the generation of B-cell lymphoma in transgenic mice. Embo J 1994 Aug 1; 13(15): 3487-3495.
7. Wlodarska I, Pittaluga S, Hagemeijer A, De Wolf-Peeters C, Van Den Berghe H. Secondary chromosome changes in mantle cell lymphoma. Haematologica 1999 Jul; 84(7): 594-599.
8. Au WY, Gascoyne RD, Viswanatha DS, Connors JM, Klasa RJ, Horsman DE. Cytogenetic analysis in mantle cell lymphoma: a review of 214 cases. Leuk Lymphoma 2002 Apr; 43(4): 783-791.
9. Bea S, Ribas M, Hernandez JM, Bosch F, Pinyol M, Hernandez L, et al. Increased number of chromosomal imbalances and high-level DNA amplifications in mantle cell lymphoma are associated with blastoid variants. Blood 1999 Jun 15; 93(12): 4365-4374.
10. Martinez-Climent JA, Vizcarra E, Sanchez D, Blesa D, Marugan I, Benet I, et al. Loss of a novel tumor suppressor gene locus at chromosome 8p is associated with leukemic mantle cell lymphoma. Blood 2001 Dec 1; 98(12): 3479-3482.
11. M'Kacher R, Farace F, Bennaceur-Griscelli A, Violot D, Clausse B, Dossou J, et al. Blastoid mantle cell lymphoma: evidence for nonrandom cytogenetic abnormalities additional to t(11;14) and generation of a mouse model. Cancer Genet Cytogenet 2003 May; 143(1): 32-38.
12. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, et al. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992 Oct 30; 258(5083): 818-821.
13. Forozan F, Karhu R, Kononen J, Kallioniemi A, Kallioniemi OP. Genome screening by comparative genomic hybridization. Trends Genet 1997 Oct; 13(10): 405-409.
14. Fiegler H, Carr P, Douglas EJ, Burford DC, Hunt S, Scott CE, et al. DNA microarrays for comparative genomic hybridization based on DOP-PCR amplification of BAC and PAC clones.
Genes Chromosomes Cancer 2003 Apr; 36(4): 361-374.
15. Greshock J, Naylor TL, Margolin A, Diskin S, Cleaver SH, Futreal PA, et al. 1-Mb resolution array-based comparative genomic hybridization using a BAC clone set optimized for cancer gene analysis. Genome Res 2004 Jan; 14(1): 179-187.
16. Snijders AM, Nowak N, Segraves R, Blackwood S, Brown N, Conroy J, et al. Assembly of microarrays for genome-wide measurement of DNA copy number. Nat Genet 2001 Nov; 29(3): 263-264.
17. Kraus J, Pantel K, Pinkel D, Albertson DG, Speicher MR. High-resolution genomic profiling of occult micrometastatic tumor cells. Genes Chromosomes Cancer 2003 Feb; 36(2): 159-166.
18. Veltman JA, Fridlyand J, Pejavar S, Olshen AB, Korkola JE, DeVries S, et al. Array-based comparative genomic hybridization for genome-wide screening of DNA copy number in bladder tumors. Cancer Res 2003 Jun 1; 63(11): 2872-2880.
19. Veltman JA, Schoenmakers EF, Eussen BH, Janssen I, Merkx G, van Cleef B, et al. High-throughput analysis of subtelomeric chromosome rearrangements by use of array-based comparative genomic hybridization. Am J Hum Genet 2002 May; 70(5): 1269-1276.
20. Weiss MM, Kuipers EJ, Postma C, Snijders AM, Siccama I, Pinkel D, et al. Genomic profiling of gastric cancer predicts lymph node status and survival. Oncogene 2003 Mar 27; 22(12): 1872-1879.
21. Garnis C, Baldwin C, Zhang L, Rosin MP, Lam WL. Use of complete coverage array comparative genomic hybridization to define copy number alterations on chromosome 3p in oral squamous cell carcinomas. Cancer Res 2003 Dec 15; 63(24): 8582-8585.
22. Garnis C, Campbell J, Zhang L, Rosin MP, Lam WL. OCGR array: an oral cancer genomic regional array for comparative genomic hybridization analysis. Oral Oncol 2004 May; 40(5): 511-519.
23. Garnis C, Coe BP, Ishkanian A, Zhang L, Rosin MP, Lam WL. Novel regions of amplification on 8q distinct from the MYC locus and frequently altered in oral dysplasia and cancer.
Genes Chromosomes Cancer 2004 Jan; 39(1): 93-98.
24. Wilhelm M, Veltman JA, Olshen AB, Jain AN, Moore DH, Presti JC, Jr., et al. Array-based comparative genomic hybridization for the differential diagnosis of renal cell cancer. Cancer Res 2002 Feb 15; 62(4): 957-960.
25. Kohlhammer H, Schwaenen C, Wessendorf S, Holzmann K, Kestler HA, Kienle D, et al. Genomic DNA-chip hybridization in t(11;14)-positive mantle cell lymphomas shows a high frequency of aberrations and allows a refined characterization of consensus regions. Blood 2004 Aug 1; 104(3): 795-801.
26. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303.
27. Dreyling MH, Bullinger L, Ott G, Stilgenbauer S, Muller-Hermelink HK, Bentz M, et al. Alterations of the cyclin D1/p16-pRB pathway in mantle cell lymphoma. Cancer Res 1997 Oct 15; 57(20): 4608-4614.
28. Koduru PR, Zariwala M, Soni M, Gong JZ, Xiong Y, Broome JD. Deletion of cyclin-dependent kinase 4 inhibitor genes P15 and P16 in non-Hodgkin's lymphoma. Blood 1995 Oct 15; 86(8): 2900-2905.
29. Ek S, Hogerkorp CM, Dictor M, Ehinger M, Borrebaeck CA. Mantle cell lymphomas express a distinct genetic signature affecting lymphocyte trafficking and growth regulation as compared with subpopulations of normal human B cells. Cancer Res 2002 Aug 1; 62(15): 4398-4405.
30. Hofmann WK, de Vos S, Tsukasaki K, Wachsman W, Pinkus GS, Said JW, et al. Altered apoptosis pathways in mantle cell lymphoma detected by oligonucleotide microarray. Blood 2001 Aug 1; 98(3): 787-794.
31. Frater JL, Hsi ED. Properties of the mantle cell and mantle cell lymphoma. Curr Opin Hematol 2002 Jan; 9(1): 56-62.
32. Migliazza A, Bosch F, Komatsu H, Cayanis E, Martinotti S, Toniato E, et al. Nucleotide sequence, transcription map, and mutation analysis of the 13q14 chromosomal region deleted in B-cell chronic lymphocytic leukemia.
Blood 2001 Apr 1; 97(7): 2098-2104.
33. Hamaguchi M, Meth JL, von Klitzing C, Wei W, Esposito D, Rodgers L, et al. DBC2, a candidate for a tumor suppressor gene involved in breast cancer. Proc Natl Acad Sci U S A 2002 Oct 15; 99(21): 13647-13652.
34. Hozumi N, Tonegawa S. Evidence for somatic rearrangement of immunoglobulin genes coding for variable and constant regions. Proc Natl Acad Sci U S A 1976 Oct; 73(10): 3628-3632.
35. Hsu SY, Kaipia A, McGee E, Lomeli M, Hsueh AJ. Bok is a pro-apoptotic Bcl-2 protein with restricted expression in reproductive tissues and heterodimerizes with selective anti-apoptotic Bcl-2 family members. Proc Natl Acad Sci U S A 1997 Nov 11; 94(23): 12401-12406.
36. Kramer KL, Yost HJ. Heparan sulfate core proteins in cell-cell signaling. Annu Rev Genet 2003; 37: 461-484.
37. Kleeff J, Ishiwata T, Kumbasar A, Friess H, Buchler MW, Lander AD, et al. The cell-surface heparan sulfate proteoglycan glypican-1 regulates growth factor action in pancreatic carcinoma cells and is overexpressed in human pancreatic cancer. J Clin Invest 1998 Nov 1; 102(9): 1662-1673.
38. Lin H, Huber R, Schlessinger D, Morin PJ. Frequent silencing of the GPC3 gene in ovarian cancer cell lines. Cancer Res 1999 Feb 15; 59(4): 807-810.
39. Neat MJ, Foot N, Jenner M, Goff L, Ashcroft K, Burford D, et al. Localisation of a novel region of recurrent amplification in follicular lymphoma to an approximately 6.8 Mb region of 13q32-33. Genes Chromosomes Cancer 2001 Nov; 32(3): 236-243.
40. Yu W, Inoue J, Imoto I, Matsuo Y, Karpas A, Inazawa J. GPC5 is a possible target for the 13q31-q32 amplification detected in lymphoma cell lines. J Hum Genet 2003; 48(6): 331-335.
41. Brummelkamp TR, Nijman SM, Dirac AM, Bernards R. Loss of the cylindromatosis tumour suppressor inhibits apoptosis by activating NF-kappaB. Nature 2003 Aug 14; 424(6950): 797-801.
42. Jundt F, Anagnostopoulos I, Forster R, Mathas S, Stein H, Dorken B.
Activated Notch1 signaling promotes tumor cell proliferation and survival in Hodgkin and anaplastic large cell lymphoma. Blood 2002 May 1; 99(9): 3398-3403.
43. Servitja JM, Marinissen MJ, Sodhi A, Bustelo XR, Gutkind JS. Rac1 function is required for Src-induced transformation. Evidence of a role for Tiam1 and Vav2 in Rac activation by Src. J Biol Chem 2003 Sep 5; 278(36): 34339-34346.
44. Schuebel KE, Movilla N, Rosa JL, Bustelo XR. Phosphorylation-dependent and constitutive activation of Rho proteins by wild-type and oncogenic Vav-2. Embo J 1998 Nov 16; 17(22): 6608-6621.
45. Henderson LJ, Okamoto I, Lestou VS, Ludkovski O, Robichaud M, Chhanabhai M, et al. Delineation of a minimal region of deletion at 6q16.3 in follicular lymphoma and construction of a bacterial artificial chromosome contig spanning a 6-megabase region of 6q16-q21. Genes Chromosomes Cancer 2004 May; 40(1): 60-65.
46. Zani VJ, Asou N, Jadayel D, Heward JM, Shipley J, Nacheva E, et al. Molecular cloning of complex chromosomal translocation t(8;14;12)(q24.1;q32.3;q24.1) in a Burkitt lymphoma cell line defines a new gene (BCL7A) with homology to caldesmon. Blood 1996 Apr 15; 87(8): 3124-3134.
47. Drexler H. The leukemia-lymphoma cell line factsbook. Academic Press: San Diego, CA, 2001.
48. Watson SK, deLeeuw RJ, Ishkanian AS, Malloff CA, Lam WL. Methods for high throughput validation of amplified fragment pools of BAC DNA for constructing high resolution CGH arrays. BMC Genomics 2004 Jan 14; 5(1): 6.
49. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 2004 Feb 9; 5: 13.
50. Jong K, Marchiori E, van der Vaart A, Weiss MM, Meijer GA. Applications of Evolutionary Computing: EvoBIO: Evolutionary Computation and Bioinformatics, vol. 2611. Springer LNCS, 2003, pp. 54-65.
51. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al.
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000 Feb 3; 403(6769): 503-511.
52. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 2002 Jun 20; 346(25): 1937-1947.

Chapter 5: A comprehensive analysis of common copy-number variations in the human genome§

§ A version of this chapter has been published: Wong Kendy K, deLeeuw Ronald J, Dosanjh Nirpjit S, Kimm Lindsey R, Cheng Ze, Horsman Douglas E, MacAulay Calum, Ng Raymond T, Brown Carolyn J, Eichler Evan E, and Lam Wan L. (2007) A Comprehensive Analysis of Common Copy-Number Variations in the Human Genome. Am J Hum Genet 80:91-104

5.1 Introduction
Genetic variation in the human genome exists in different forms. Recent studies have shown that variations exist at various levels: the single base pair (1), the kilobase pair (2-4), and tens to thousands of kilobase pairs (5-8). Extensive studies, including the recently published haplotype map from HapMap (1), have identified millions of SNPs in the human genome. Three recent studies that used the SNP data each identified several hundred sites of deletion in the human population; however, gains could not be deduced from this data set (2-4). By use of a fosmid paired-end sequence analysis, a comprehensive comparison between two genomes quantified 241 sites of insertion or deletion (8). By use of array comparative genomic hybridization (array CGH) techniques, large-scale copy-number variations (CNVs) were demonstrated in a fraction of the human genome (5, 6). Each of these studies added to our knowledge about CNVs in the human population, but with little overlap in findings (9).
Thus, many characteristics of CNVs in the human population remain unknown, such as the total number, genomic positions, gene content, frequency spectrum, and patterns of linkage disequilibrium with one another. Understanding CNVs is critical for the proper study of disease-associated changes because segmental CNVs have been demonstrated in developmental disorders and susceptibility to disease (10). Therefore, analysis of CNVs at the whole-genome level is required to create a baseline of human genomic variation. In this study, using a whole-genome tiling-path BAC array CGH approach (11), we measured large scale (>40 kb) segmental gains and losses in >100 individuals to expand our knowledge about CNVs and to estimate the extent of this form of variation in the human population.

5.2 Material and methods

5.2.1 DNA samples
Samples were collected and were rendered anonymous. These samples included 16 from healthy blood donors, 51 from a British Columbia Cancer Agency (BCCA) screening program, and 26 B-lymphoblast DNA samples encompassing 16 distinct ethnic groups from the Human Variation Collection and 14 CEPH pedigree samples from the Coriell Cell Repository (National Institute of General Medical Sciences, Camden, NJ). The DNA samples from cell lines were included to represent diverse ethnic populations. The 51 samples from the BCCA screening program included 19 from a breast cancer screening program and 32 from a colon cancer screening program. These were constitutional DNA samples obtained from blood that did not contain any neoplastic cells, and none showed CNV association with BRCA1 (MIM 113705), BRCA2 (MIM 600185), APC (MIM 175100), MSH2 (MIM 609309), or MSH6 (MIM 600678). Only 2 of the 32 samples from the colon cancer screening program showed CNV association with MLH1 (MIM 120436). In addition, no CNVs were associated with a specific sample type or source, which suggests no obvious selection bias.
In total, 105 DNA samples (from 44 males and 61 females) were included in this study (Table 5.1), 95 of which were used for CNV discovery. DNA from the four grandparents of the CEPH pedigree was included in the CNV discovery sample set, whereas DNA from 10 other members of the family was included only for clustering and inheritance analysis. A donor sample was used as the male reference, and a single female sample was used only in control experiments. Genomic DNA from donors was extracted from whole blood by use of the QIAamp DNA Blood Maxi Kit (QIAGEN) and was quantified by spectrophotometry (ND-1000 [NanoDrop]).

5.2.2 BAC array CGH analysis

DNA labeling and hybridization were performed as described elsewhere (11), with slight modifications. In brief, 200 ng of sample and reference DNA were differentially labeled with Cyanine 3–dCTP and Cyanine 5–dCTP (Perkin Elmer Life Sciences). The random priming reaction was incubated in the dark at 37°C for 16–18 h. DNA samples were then combined, and unincorporated nucleotides were removed using Microcon YM-30 columns (Millipore). Purified samples were mixed with 100 μg of human Cot-1 DNA (Invitrogen) and were precipitated. DNA pellets were resuspended in 45 μl of DIG Easy hybridization solution (Roche) containing 20 mg/ml sheared herring sperm DNA and 10 mg/ml yeast tRNA. The sample mixture was denatured at 85°C for 10 min, and repetitive sequences were blocked at 45°C for 1 h before hybridization. The mixture was then applied onto BAC arrays containing 26,363 clones spotted in duplicate (53,856 elements with controls) on single slides. (These clones were selected from the SMRT clone set to optimize tiling coverage of the genome; the clone list is available at the SMRT Array Web site (11).)
Hybridization was performed in the dark at 45°C for ~36 h inside a hybridization chamber, followed by washing three times for 3 min each with agitation in 0.1× saline sodium citrate (SSC) and 0.1% SDS at 45°C. Arrays were then rinsed three times for 3 min each in 0.1× SSC at room temperature and were dried by an air stream before imaging. Slides were scanned using a charge-coupled device–based imaging system (arrayWoRx eAuto [Applied Precision]) and were analyzed with the SoftWoRx Tracker Spot Analysis software (Applied Precision). The log2 ratios of the Cyanine 3 to Cyanine 5 intensities for each spot were assessed. To remove systematic effects from nonbiological sources that introduce bias, the ratios were then normalized using a stepwise normalization technique (12). Custom SeeGH software was used to visualize normalized data as log2 ratio plots (Figure 5.1) (13).

5.2.3 CNV detection algorithm

For each experiment, 1,398 clones from chromosomes X and Y were removed, and the remaining data were median normalized to remove bias introduced because of any sex-mismatched hybridization. In addition, 573 clones were removed from analysis because of printing anomalies or their shift in log2 ratios, possibly due to homology with the X or Y chromosome, leaving a total of 24,392 reliable clones for analysis (see the tab-delimited ASCII file, which can be imported into a spreadsheet, of data set 1 [online only]). Experimental SDs (SDautosome) were calculated for each experiment on the basis of the log2 ratios of the 24,392 reliable clones minus the clones removed because of low signal-to-noise ratio (SNR) or high SD of replicate clone measures (SDclone). Thresholds for determining CNV clones were set at a multiple of the SDautosome value.
For each experiment, clones were annotated as uninformative if they were filtered via SNR or SDclone, as a CNV loss if the log2 ratio was less than the negative threshold, as unchanged if the log2 ratio was between the negative and positive thresholds, and as a CNV gain if the log2 ratio was above the positive threshold.

To determine the optimal values for SNR, SDclone, and the SDautosome multiplier, eight hybridization experiments (four repeat experiments of male reference versus the single female DNA and four experiments between those two DNAs and two additional DNA pools) were used. On the basis of the possible combinations of copy-number status in the four DNA samples used, we determined the expected CNV patterns in the eight hybridization experiments (Table 5.2). The three parameters were recursively varied until the highest proportion of CNV clones matched the expected patterns (Table 5.2); this resulted in the filter settings of SDclone>0.15, SNR<3, and a stringent SDautosome multiplier of 3.3×. On the basis of six self-versus-self hybridizations to calibrate array performance, experiments with >10% uninformative data points or with an SDautosome>0.12 were repeated.

Normalized log2 ratio profiles were generated for the 105 individuals from hybridization of sample DNA versus a single male reference DNA. Data points that did not meet our SDclone or SNR criteria were annotated as uninformative, whereas those whose average ratio exceeded 3.3× SDautosome were identified as CNV clones (see the tab-delimited ASCII file of data set 2 [online only]). CNV clones that overlapped in genomic coverage were considered to represent the same CNV loci. A custom track file for uploading the identified CNV clones to the University of California–Santa Cruz (UCSC) Human Genome Browser is available on request.
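
The clone annotation rules of this section can be sketched in Python as follows. This is a minimal illustration and not the software used in the study: the record keys (`log2`, `sd_clone`, `snr`) and the function name are hypothetical, and the use of the sample standard deviation for SDautosome is an assumption, since the text does not name the estimator.

```python
import statistics

# Filter settings reported in the text.
SD_CLONE_MAX = 0.15    # replicate SD above this marks a clone uninformative
SNR_MIN = 3.0          # signal-to-noise ratio below this marks it uninformative
SD_MULTIPLIER = 3.3    # CNV threshold as a multiple of SDautosome


def classify_clones(clones):
    '''Return one label per clone: uninformative, loss, unchanged, or gain.

    clones is a list of dicts with the hypothetical keys log2, sd_clone, snr.
    '''
    def informative(c):
        return c['sd_clone'] <= SD_CLONE_MAX and c['snr'] >= SNR_MIN

    # SDautosome is computed from the informative clones only
    # (assumption: sample standard deviation).
    kept = [c['log2'] for c in clones if informative(c)]
    threshold = SD_MULTIPLIER * statistics.stdev(kept)

    labels = []
    for c in clones:
        if not informative(c):
            labels.append('uninformative')
        elif c['log2'] < -threshold:
            labels.append('loss')
        elif c['log2'] > threshold:
            labels.append('gain')
        else:
            labels.append('unchanged')
    return labels
```

Because the threshold scales with each experiment's own SDautosome, a noisier hybridization automatically requires a larger log2 shift before a clone is called.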
After submission of the custom track file, clones displayed in blue, red, green, and black represented CNVs seen once or twice, three times, four or five times, and six or more times, respectively.

5.2.4 Determination of false-positive and false-negative rates

To estimate our false-positive and false-negative rates in this study, six repeat experiments (of the single female vs. the male reference) were analyzed per our CNV algorithm (see above). In total, 803 CNV calls were made, with 340 seen only once, 50 twice, 46 three times, 15 four times, 15 five times, and 15 six times. Given that our false-positive results cannot exceed the total number of calls (i.e., 803), our maximum false-positive rate is 0.5487% (803/[24,392 measures × 6 experiments]). By use of this maximum false-positive rate of 0.5487%, the binomial probability, p, of detecting the same clone twice within six experiments by random chance is p=0.000445. Therefore, we concluded that any clone detected twice or more was a true CNV in these six repeat experiments. In theory, we expected to detect 141 true CNVs (i.e., 50 calls seen twice, 46 seen three times, and 15 each seen four, five, and six times) in each of the six experiments (846 calls). In practice, 463 were detected, yielding an estimated false-negative rate of 45.3%. Although statistically a fraction of the single-occurrence calls (those seen only once) represent true CNVs, we conservatively considered all 340 as false-positive results, resulting in a false-positive rate of 0.2323% (340 calls/[24,392 measures × 6 experiments]). In short, we tolerated this high false-negative rate of 45.3% to achieve our very low false-positive rate for confidence in CNV discovery.
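
The rate arithmetic above can be retraced with a short Python sketch. The variable names are illustrative, and reading the binomial probability as the chance of two or more calls in six independent experiments is an assumption; that reading reproduces the reported p=0.000445.

```python
N_CLONES = 24392       # reliable clones per experiment
N_EXPERIMENTS = 6      # repeat hybridizations

# Number of clones called in exactly k of the six repeats.
clones_by_recurrence = {1: 340, 2: 50, 3: 46, 4: 15, 5: 15, 6: 15}

measures = N_CLONES * N_EXPERIMENTS
total_calls = sum(k * n for k, n in clones_by_recurrence.items())

# Upper bound: treat every call as a false positive.
max_fp_rate = total_calls / measures

# Chance that one clone is called in two or more of six experiments at that
# rate (assumption: independent calls, binomial upper tail).
q = 1.0 - max_fp_rate
p_two_or_more = 1.0 - q**6 - 6 * max_fp_rate * q**5

# Clones seen at least twice are taken as true CNVs.
true_cnvs = sum(n for k, n in clones_by_recurrence.items() if k >= 2)
detected = total_calls - clones_by_recurrence[1]
fn_rate = 1.0 - detected / (true_cnvs * N_EXPERIMENTS)

# Conservative estimate: every single-occurrence call is a false positive.
fp_rate = clones_by_recurrence[1] / measures

print(f'{max_fp_rate:.4%} {p_two_or_more:.6f} {fn_rate:.1%} {fp_rate:.4%}')
```

Run as written, the sketch gives 803 total calls, 141 true CNVs, and 463 detections, and it reproduces the reported 0.5487%, 0.000445, 45.3%, and 0.2323%.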
On the basis of the false-positive and false-negative rates calculated above, in a repeat of the same hybridization experiment, one would expect to see 134 calls (803 calls/6 experiments), of which 57 would be false-positive results (0.2323% × 24,392 measures) and 77 would be true CNVs. On the basis of our false-negative rate, we would have missed 64 true CNVs (of 141 true CNVs). Therefore, of a total of 141 true CNVs, the probability of obtaining the same true CNVs in a repeat hybridization should be 54.7% (77 of 141), and the probability of seeing those same CNVs in a second repeat hybridization would be 54.7% × 54.7% (42 of the 141 true CNVs). This represents 84 calls (2 × 42 CNVs) of the 268 expected total calls (134 × 2) (a 31.3% overlap). To verify our calculated rates, three repeat hybridization experiments were performed using the same samples. The observed overlaps of CNV calls between the three possible comparisons were 31.3%, 28.6%, and 31.2%, which agree closely with the expected value. The above calculations are summarized in figure A1 (online only).

Additionally, 20 samples (F1, F2, F3, S1, S3, S4, S7, S8, S10, S11, S12, S14, S16, S17, S33, S38, S39, S40, S41, and S44) from the discovery set were each repeated once with a fluorochrome reversal. The overlapping calls between repeats ranged from 21% to 46%, with an average of 30%, again consistent with the expected value from our false-positive and false-negative rates.

Furthermore, we employed an additional platform to verify our CNV calls. We recognize that oligonucleotide arrays are generally not designed for measuring CNVs in certain loci, since many segmental duplications and repeat sequences are excluded from array design, and thus we constructed a custom oligonucleotide array (NimbleGen Systems) covering our 3,654 CNV loci with 389,027 elements (~2 kb spacing between elements).
Five samples (S70, S71, S72, S73, and S80) were assayed using this custom platform. Each of these DNA samples was hybridized onto the oligonucleotide array against the same single male reference DNA used for BAC array analysis. As described elsewhere (14), to identify gains or losses from the oligonucleotide array, thresholds of 2 SDs of the mean log2 ratio for all elements in the hybridization were used. On the basis of the detection sensitivity of BAC array CGH (15), a moving window size of 19 elements (for a total of ~40 kb, with ~2 kb spacing between elements) was applied. In each window, the number of elements reporting a loss (beyond the threshold) was subtracted from the number of elements reporting a gain. The difference was then divided by 19, the total number of elements in the window. Gains or losses were scored for results at >0.1 or <−0.1, respectively. Calls from the oligonucleotide array were then directly compared with CNVs detected by BAC array analysis. To confirm a BAC CNV gain (or loss), at least 10 gains (or losses) were required from the oligonucleotide probe calls covering the same BAC.

5.2.5 CNV association

To obtain the genomic loci of our identified copy-number–altered clones, we used UCSC May 2004 mapping annotations from BACPAC Resources. For comparison, locations of previously identified CNVs obtained from the Database of Genomic Variants and from various publications were also anchored to the UCSC May 2004 assembly (from UCSC Genome Bioinformatics) (2-4). These were then converted to elements (i.e., clones) within our clone set by comparison of chromosome number, base-pair start position, and base-pair end position. RefSeq gene information was downloaded from the UCSC May 2004 assembly and was viewed in relation to our CNVs. A gene with any overlap across a CNV boundary was considered to be associated with the CNV.
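
The association rule in the last sentence, any overlap between a gene and a CNV, reduces to an interval test on a shared assembly. The following is a minimal sketch, assuming closed base-pair intervals; the `Interval` type and the function names are hypothetical and are not taken from the study's software.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str   # chromosome name
    start: int   # base-pair start position
    end: int     # base-pair end position (assumed inclusive)


def overlaps(a, b):
    '''True if two intervals on the same chromosome share at least one base.'''
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_overlapping(cnv, genes):
    '''Names of the genes (dict of name -> Interval) that overlap the CNV.'''
    return [name for name, gene in genes.items() if overlaps(cnv, gene)]
```

Under this rule a gene that merely touches a CNV boundary counts as associated, which matches the inclusive wording of the text but may overstate the dosage effect for genes that are only partly covered.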
Genes overlapping our CNVs were then used to match genes downloaded from the Online Mendelian Inheritance in Man (OMIM) Morbid Map. The locations of human microRNAs were downloaded from the Sanger miRBase database, were converted to the UCSC May 2004 mapping annotations, and were viewed in relation to our CNVs as described above (16).

5.2.6 Duplication analysis

BAC clones and segmental duplication data were mapped to the UCSC May 2004 assembly. CNV loci were assessed for duplication content on the basis of whole-genome assembly comparison (WGAC) and whole-genome shotgun sequence detection (WSSD) analyses of human and chimpanzee genome assemblies (17-20). We required >10 kb of duplicated sequence to consider a BAC as duplicated. Lineage-specific duplications were distinguished on the basis of human and chimpanzee-only comparisons (19), available at the Segmental Duplication Database.

5.2.7 Clustering analysis

A total of 105 individuals were clustered on the basis of our CNV clones, including 14 members of a CEPH pedigree: 4 grandparents (already part of our 95-sample CNV discovery set), 2 parents, and 8 offspring. All clones with copy-number gains and losses were annotated as +1 and −1, respectively. Uninformative measures were left blank, whereas the remaining cells were annotated as 0. Hierarchical clustering of the samples with single linkage was performed using Cluster and was visualized using Treeview (21) (Eisen Lab: Software Web site).

5.2.8 Sample diversity

The diversity between every possible pair of individuals was calculated by enumerating the number of CNVs (observed at least three times among the 95 samples) with differing status. The pair with the largest value was taken to be the most diverse. Variation in genome size was determined by first enumerating the net gain or net loss of clones (observed at least three times among the 95 samples) within each individual compared with our reference.
The maximum variation was calculated by adding the lowest net loss and the highest net gain. To convert this difference in net clones to genomic size, the number of clones was multiplied by the minimum detection sensitivity of BAC array technology, previously shown to be 40 kb for the average-sized BAC clone (15).

5.2.9 Quantitative PCR

The iQ SYBR Green Supermix system (Bio-Rad) was used for quantitative PCR (qPCR). Primers were designed using Primer3 (22), and the primers tested are summarized in the tab-delimited ASCII file of data set 3 (online only). In brief, 10 ng genomic DNA was used in a 25-μl reaction with a test or reference primer pair at 600 nM. Reactions were performed in triplicate and were repeated on different days by use of a Bio-Rad iCycler Optical Module (at 95°C for 10 min, then 40 cycles at 95°C for 15 s and 60°C for 1 min, followed by a final extension at 55°C for 1 min and a melting-curve analysis). Standard curves for each primer pair were generated using a 10-fold dilution series ranging from 0.1 ng to 100 ng. Data analysis was performed as described by Weksberg et al. (23).

5.3 Results

5.3.1 Identification of CNVs

By application of a whole-genome tiling-path BAC array CGH technique, pairwise comparison of DNA samples from 95 unrelated individuals against a single reference DNA sample identified a total of 14,711 CNV BAC clones, averaging 155 per individual (array CGH data for all hybridization experiments have been made publicly available at the Gene Expression Omnibus [series accession number GSE5442]). This resulted in 5,132 unique clones that span 3,654 loci throughout the mapped autosomes (Figure 5.2 and the tab-delimited ASCII file of data set 2 [online only]). To determine a confidence level for our CNVs, we first calculated the probability of an event occurring repeatedly within our sample set.
On the basis of our false-positive rate of 0.23%, calculated from repeat hybridization experiments, the probability of a random false-positive event occurring twice or three times by chance within our sample set of 95 was calculated (p=0.02089 and p=0.001479, respectively). A detailed description of the false-positive rate calculation is given in the “Material and Methods” section. Second, we examined the amount of overlap with previously reported CNVs (2-8, 24, 25) (Figure 5.3). To facilitate the comparison of our CNVs with previously reported CNVs, the locations of all published CNVs were anchored to the same human genome assembly and were mapped to elements in our clone set. As the minimum recurrence of our CNVs increased, so did the proportion that overlapped with previously reported CNVs (Figure 5.3). Below a recurrence of 3, little overlap was seen between our study and previous studies. This is likely because of false-positive events or very rare CNVs. Between recurrences of 3 and 30, a steadily increasing overlap with previous studies was observed. This may reflect that the more frequent the CNV in the population, the more likely it will be observed in any given study. Beyond a recurrence of 30, no significant increase in overlap was observed. This may reflect the differences in the composition of each study’s population.

Twenty of the 95 experiments were repeated using fluorochrome reversal. In both the original and the repeat experiments, 771 CNV calls were observed. Of the repeated calls, 81% appeared at least three times in the original CNV discovery sample set of 95. This observation increased confidence for CNVs that were detected three or more times within our sample set.
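
The recurrence probabilities quoted above (p=0.02089 and p=0.001479) can be retraced with a binomial upper tail. This is a sketch under stated assumptions: calls are treated as independent across the 95 samples, the per-clone false-positive rate is taken as the 0.2323% reported in the methods, and “twice” and “three times” are read as at least two and at least three occurrences, which is the reading that matches the reported values. The function name is illustrative.

```python
from math import comb


def prob_at_least(k, n, p):
    '''Binomial probability of k or more events in n independent trials.'''
    return 1.0 - sum(comb(n, i) * p**i * (1.0 - p)**(n - i) for i in range(k))


FP_RATE = 0.002323   # per-clone false-positive rate from the repeat experiments
N_SAMPLES = 95       # size of the CNV discovery set

p_twice = prob_at_least(2, N_SAMPLES, FP_RATE)
p_three = prob_at_least(3, N_SAMPLES, FP_RATE)
print(f'{p_twice:.5f} {p_three:.6f}')
```

The step from roughly 2% at two occurrences to roughly 0.15% at three is the quantitative basis for treating a recurrence of three as the high-confidence cutoff.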
qPCR was performed as a quality check on a small number of loci but was not used for large-scale validation because of the limited throughput of single-locus analysis (see the tab-delimited ASCII file of data set 3 [online only]). For further verification of our calls, five separate hybridizations were repeated using a custom-designed oligonucleotide array covering our 3,654 loci with 389,027 elements (~2 kb spacing between elements) (see the “Material and Methods” section). In the five experiments, 265 CNV calls were confirmed by the oligonucleotide array analysis. Of these CNV calls, 83% were among CNVs detected three or more times in the original CNV discovery set of 95.

We next assessed whether our CNVs coincided with segmental duplications in the genome. To achieve this, we evaluated the segmental-duplication content of the CNVs detected in this study, comparing it against both human and chimpanzee sequences, since there is a significant correlation between contemporary human genome structural variation and historical segmental duplications (6-8) (Figure 5.4). As the frequency of the CNV increased, so did the enrichment with segmental duplication. This trend increased confidence for CNVs that were observed three or more times in this sample set. We calculated a 5.7-fold duplication enrichment for the most common variants (5 occurrences in the 95 individuals), which is similar to previous estimates (7, 8). Interestingly, the effect was most dramatic (a 12.1-fold increase) for duplications that arose specifically within human (19). In contrast, no enrichment was observed among chimpanzee-only segmental duplications (Figure 5.4).
Elsewhere, we reported an apparent asymmetry with respect to deletion and de novo duplication; 65% of duplications found only in chimpanzee appeared to arise as the result of de novo duplication in the human lineage, as opposed to deletion of a shared duplication in a common ancestor of human and chimpanzee (19). As a result, chimpanzee-only duplications were not expected to be polymorphic in the human lineage.

The results from the multiple approaches described above collectively support the presence of novel CNV loci. In addition, the overlaps with previously reported CNVs and segmental duplications, the repeated CNV calls from replicate BAC array CGH experiments and oligonucleotide array hybridizations, the clustering of related individuals on the basis of their CNVs, and the qPCR verification of CNV loci sampled further support their existence. However, the prevalence of these CNVs in the human population can be confirmed only by their presence in multiple individuals. We placed the highest level of confidence in their prevalence when multiple occurrences were observed; for example, 800 loci appeared three or more times in our sample set of 95 individuals. We do not rule out the possibility that true CNVs exist among the loci that we observed at only single and double occurrences in our sample set, since they may represent infrequent events, and a larger sample size will be required to confirm their frequency in the population.

We focused on the high-frequency CNVs (i.e., those found in at least 3 of 95 individuals) for further analysis. There were a total of 9,848 high-frequency CNVs annotated in the 95 individuals analyzed, averaging ~104 per individual. These represent 800 unique loci in the human genome (Figure 5.6). Strikingly, when these 800 loci are compared with known CNVs, 23% overlap with previously reported CNVs and 77% are novel.
The genomic distribution of the 800 CNVs showed no apparent correlation with GC content, imprinted regions, recombination rates, or gene density. Nonrandom somatic alterations, such as the three CNVs associated with immunoglobulin gene rearrangement at chromosomal subbands 2p11.2, 14q32.33, and 22q11.22 (Figure 5.7), were detected and removed from further analysis, whereas random somatic alterations not reflecting germline status are not expected to appear recurrently.

5.3.2 Genomic diversity within the sample population

We next examined the genomic diversity within our sample set. The 800 high-frequency CNV loci (or 1,005 BAC clones) were calculated to span a minimum of 40 Mb of DNA (calculated on the basis of BAC array CGH minimum detection sensitivity of 40 kb per clone (15)). This equates to ~1.5% of the mapped human autosomes (26) that were able to withstand CNV within our sample set. This did not take into account the percentage of single- and double-occurrence loci that represented true CNVs. The two most diverse samples were S73 and S83. They differed at 266 of the high-frequency CNV loci. Then, we asked the question, what is the greatest difference in genome size between two samples within our set? S55 has the highest net gain of CNV clones, at 97, whereas S83 has the highest net loss of CNV clones, at −131. Comparison of these genomes revealed a difference of 228 clones, representing a difference of at least 9 Mb in genomic size between these two individuals.

5.3.3 CNV-associated genes

We next identified candidate genes whose dosage may be affected by the 800 CNV loci (Figure 5.6 and the tab-delimited ASCII file of data set 2 [online only]). In total, 1,673 RefSeq-annotated genes overlapped 546 of the 800 CNV loci. First, we looked for the CNV containing the AMY1A-AMY2A (MIM 104700; MIM 104650) amylase locus, which was a frequently observed copy-number polymorphism (5).
This clone was found to be gained in seven individuals and to be lost in five individuals in our sample set (see the tab-delimited ASCII file of data set 2 [online only]). Intriguingly, many genes possibly involved in the senses were found to associate with our CNVs, including a large group of olfactory receptor genes (Table 5.3). In fact, the CNVs associated with olfactory receptor loci segregated in a Mendelian manner in the CEPH family (Figure 5.8). We also observed genes associated with taste (TAS2R and TAS1R1 [MIM 606225], encoding taste receptors), hearing (ACTG1 [MIM 102560] and MYH9 [MIM 160775]), and sight (OPN1SW [MIM 190900], encoding the short-wave–sensitive cone pigment; GNAT1 [MIM 139330], related to night blindness; and FSCN2 [MIM 607643], IMPDH1 [MIM 146690], and ROM1 [MIM 180721], linked to retinitis pigmentosa) (Table 5.3). In addition, the genes encoding rhesus blood group and defensins were also observed within these common CNVs (see the tab-delimited ASCII file of data set 2 [online only]).

Surprisingly, many genes associated with disease and susceptibility to disease were also found to have CNV among our sample population. For example, a 630-kb region on chromosome 3p21.3 shown to be deleted in lung cancer was observed to be associated with copy-number loss in 20 individuals in this study (27). This region encompasses the putative tumor-suppressor genes TUSC2 (MIM 607052), TUSC4 (MIM 607072), and NAT6 (MIM 607073) (Figure 5.6, Table 5.4, and the tab-delimited ASCII file of data set 2 [online only]). Many other putative oncogenes and tumor-suppressor genes were also associated with CNVs, such as the VAV2 (MIM 600428) oncogene; RAB3B (MIM 179510), of the RAS oncogene family; TNFRSF25 (MIM 603366); and CDKN1C (MIM 600856) (Table 5.4 and the tab-delimited ASCII file of data set 2 [online only]).
In addition to cancer-related genes, CNVs also overlapped genes associated with a bleeding disorder (TBXA2R [MIM 188070]), diabetes mellitus (GCK [MIM 138079]), and spinal muscular atrophy (BSCL2 [MIM 606158], SMA3 [MIM 253400], SMA4 [MIM 271150], and SMN1 [MIM 600354]), as well as with susceptibility to Alzheimer disease (A2M [MIM 103950]), coronary artery disease (LPA [MIM 152200]), and schizophrenia (COMT [MIM 116790]) (Table 5.5). Furthermore, we found 21 human microRNAs that reside within 14 of the high-frequency CNV loci (Figure 5.6 and Table 5.6).

5.4 Discussion

The existence of large segmental duplications and deletions in the human genome has long been observed through conventional cytogenetic analyses that use light microscopy (28). More recent genomewide analyses with increased resolutions have revealed that CNVs are present throughout the entire human genome (2-6); however, limited genomic coverage of the arrays or the limitations of the various techniques has restricted the discovery of CNVs present in the sample populations. It is currently hypothesized that several thousand CNVs exist within the human genome and thus that most are yet to be discovered (9, 29). Here, we used a whole-genome tiling BAC array CGH approach and identified both segmental gains and segmental losses throughout the entire human genome. With complete genome coverage and the tiling nature of our array, we were able to identify a large number of candidate CNVs (3,654). With a focus on only the 800 frequently occurring loci, this study has significantly expanded our knowledge of CNVs. A large proportion (77%) of these high-frequency CNVs are novel; the lack of complete overlap between our CNVs and previously reported CNVs is consistent with the current hypothesis that thousands of CNVs exist in the human population.
In our data set, the net difference in genomic size between two individuals could vary widely, by at least 9 Mb in the two most diverse, representing a difference of 228 distinct CNV clones. In addition, pairwise comparison of the high-frequency CNVs among the 95 individuals revealed that the genomes of the two most diverse individuals differed at 266 loci. These data demonstrate that a significant fraction of the human genome can vary in copy number. On the basis of our high-frequency CNV data set and a minimum detection sensitivity for BAC array CGH of 40 kb, at least 1.5% of the mapped human autosomes is tolerant to CNV. This is an underestimate because the percentage of single- and double-occurrence loci that may represent true CNVs was not taken into account.

Over 1,500 genes were found to overlap the high-frequency CNVs detected in this study. Several of these CNV-associated genes are related to the senses, including a group of olfactory receptor genes, multiple taste-receptor genes, and several genes related to sight or hearing. Genes that are well known to have variable copy number, such as those encoding rhesus blood group, amylases, and defensins, were also observed within our common CNVs. These associations suggest that CNVs may contribute to phenotypic diversity in humans.

Elsewhere, segmental copy-number gains or losses have been demonstrated to associate with developmental disorders and susceptibility to human disease (10). Many genes associated with disease and susceptibility to disease were found to show CNV among the individuals within our study. These include genes associated with diabetes mellitus or a bleeding disorder; cancer-related genes, such as putative oncogenes and tumor-suppressor genes; and genes associated with susceptibility to coronary artery disease or Alzheimer disease.
Like other aspects of human genetic variation, understanding of CNVs is critical for studying disease-associated changes correctly, as illustrated in the genome profiling of patients with mental retardation (24). Clinically relevant alterations in copy number need to be separated from a baseline of CNVs for gene discovery. Therefore, it is of utmost importance when genetic association studies of diseases are conducted that they be interpreted in the context of baseline segmental copy-number status; CNVs identified in this study provide a source of information for such a baseline.

Interestingly, several of our CNV loci were also found to overlap with microRNAs. Although the functions of microRNAs are largely unknown, they may play a role in the regulation of various biological processes, such as the control of development, differentiation, cell proliferation, and apoptosis, and they have also been linked to human diseases (30-32). Recent studies have shown a global downregulation of microRNAs in tumors compared with normal tissues and an upregulation of microRNA expression via copy-number changes in lymphoma (33, 34). Our data raise the possibility that CNVs encompassing microRNAs contribute to human diversity and disease susceptibility.

This comprehensive whole-genome study, identifying both segmental gains and losses in the human population, has significantly expanded our knowledge of CNVs. Remarkably, the genomes of the two most diverse individuals within this study differed by at least 9 Mb in size, or 266 loci in content. In addition, on the basis of our high-frequency CNV data set, at least 1.5% of the human genome is tolerant of CNV.
However, with the lack of complete overlap between our CNVs and those identified elsewhere and the hypothesis that thousands of CNVs exist in the human genome, this comprehensive study is still an early step toward a more complete understanding of CNVs within the human population, and more studies are needed to examine the functional roles of CNVs.

Figure 5.1. Example of a karyogram from a hybridization experiment in this study.
Figure 5.2. Detection of CNVs.
Figure 5.3. Distribution of overlapped CNVs at different recurrence levels.
Figure 5.4. Overlap of CNVs with segmental duplications (SD).
Figure 5.5. Cluster analysis by use of a CEPH pedigree.
Figure 5.6. Distribution of CNV clones.
Figure 5.7. Detection of immunoglobulin variations.
Figure 5.8. Inheritance of CNVs at five olfactory receptor loci in 14 members of a CEPH pedigree.

Table 5.1. Samples used in this study

Sample | Sample source a | M/F b
S1 | Coriell (NA17755), Han of L.A. | M
S2 | Coriell (NA10975), Mayan | M
S3 | Coriell (NA17392), Mexican Indian | M
S4 | Coriell (NA17075), Puerto Rican | M
S5 | Coriell (NA15724), Czechoslovakian | M
S6 | Coriell (NA15760), Iceland | M
S7 | Coriell (NA17384), Africans N of Sahara | M
S8 | Coriell (NA10469), Biaka | M
S9 | Coriell (NA10492), Mbuti | M
S10 | Coriell (NA17361), Ashkenazi Jewish | M
S11 | Coriell (NA11522), Druze | M
S12 | Coriell (NA13613), Taiwan Ami tribe | M
S13 | Coriell (NA13611), Taiwan Ami tribe | M
S14 | Coriell (NA13603), Taiwan Atayal tribe | M
S15 | Coriell (NA13606), Taiwan Atayal tribe | M
S16 | Coriell (NA11587), Japanese | M
S17 | Coriell (NA10540), Melanesian | M
S18 | screening program, ethnicity unknown | M
S19 | screening program, ethnicity unknown | M
S20 | screening program, ethnicity unknown | M
S21 | screening program, ethnicity unknown | M
S22 | screening program, ethnicity unknown | M
S23 | screening program, ethnicity unknown | M
S24 | screening program, ethnicity unknown | M
S25 | screening program, ethnicity unknown | M
S26 | screening program, ethnicity unknown | M
S27 | screening program, ethnicity unknown | M
S28 | screening program, ethnicity unknown | M
S29 | screening program, ethnicity unknown | M
S30 | screening program, ethnicity unknown | M
S31 | screening program, ethnicity unknown | M
S32 | screening program, ethnicity unknown | M
S33 | donor, ethnicity unknown | M
S34 | donor, ethnicity unknown | M
S35 | donor, ethnicity unknown | M
S36 | donor, ethnicity unknown | M
S37 | Coriell (NA17766), Han of L.A. | F
S38 | Coriell (NA17076), Puerto Rican | F
S39 | Coriell (NA15729), Czechoslovakian | F
S40 | Coriell (NA15766), Iceland | F
S41 | Coriell (NA17348), Africans S of Sahara | F
S42 | Coriell (NA10471), Biaka | F
S43 | Coriell (NA11521), Druze | F
S44 | Coriell (NA10539), Melanesian | F
S45 | screening program, ethnicity unknown | F
S46 | screening program, ethnicity unknown | F
S47 | screening program, ethnicity unknown | F
S48 | screening program, ethnicity unknown | F
S49 | screening program, ethnicity unknown | F
S50 | screening program, ethnicity unknown | F
S51 | screening program, ethnicity unknown | F
S52 | screening program, ethnicity unknown | F
S53 | screening program, ethnicity unknown | F
S54 | screening program, ethnicity unknown | F
S55 | screening program, ethnicity unknown | F
S56 | screening program, ethnicity unknown | F
S57 | screening program, ethnicity unknown | F
S58 | screening program, ethnicity unknown | F
S59 | screening program, ethnicity unknown | F
S60 | screening program, ethnicity unknown | F
S61 | screening program, ethnicity unknown | F
S62 | screening program, ethnicity unknown | F
S63 | screening program, ethnicity unknown | F
S64 | screening program, ethnicity unknown | F
S65 | screening program, ethnicity unknown | F
S66 | screening program, ethnicity unknown | F
S67 | screening program, ethnicity unknown | F
S68 | donor, ethnicity unknown | F
S69 | donor, ethnicity unknown | F
S70 | donor, ethnicity unknown | F
S71 | donor, ethnicity unknown | F
S72 | donor, ethnicity unknown | F
S73 | donor, ethnicity unknown | F
S74 | donor, ethnicity unknown | M
S75 | screening program, ethnicity unknown | F
S76 | screening program, ethnicity unknown | F
S77 | Coriell (NA17393), Mexican Indian | F
S78 | donor, ethnicity unknown | F
S79 | donor, ethnicity unknown | F
S80 | donor, ethnicity unknown | F
S81 | screening program, ethnicity unknown | F
S82 | screening program, ethnicity unknown | F
S83 | screening program, ethnicity unknown | F
S84 | screening program, ethnicity unknown | F
S85 | screening program, ethnicity unknown | F
S86 | screening program, ethnicity unknown | F
S87 | screening program, ethnicity unknown | F
S88 | screening program, ethnicity unknown | F
S89 | screening program, ethnicity unknown | F
S90 | screening program, ethnicity unknown | F
S91 | screening program, ethnicity unknown | F
F1 | Coriell (NA11917, paternal grandpa), Utah | M
F2 | Coriell (NA11918, paternal grandma), Utah | F
F3 | Coriell (NA11919, maternal grandpa), Utah | M
F4 | Coriell (NA11920, maternal grandma), Utah | F
F5* | Coriell (NA10842, dad), Utah | M
F6* | Coriell (NA10843, mom), Utah | F
F7* | Coriell (NA11909, son), Utah | M
F8* | Coriell (NA11910, daughter), Utah | F
F9* | Coriell (NA11911, daughter), Utah | F
F10* | Coriell (NA11912, son), Utah | M
F11* | Coriell (NA11913, son), Utah | M
F12* | Coriell (NA11915, daughter), Utah | F
F13* | Coriell (NA11916, son), Utah | M
F14* | Coriell (NA11921, daughter), Utah | F

a Coriell sample numbers were shown in brackets; ethnicities were shown if known
b Gender: male (M), female (F)
* These 10 CEPH family samples were not included in the CNV discovery set of 95.

Table 5.2. Expected CNV patterns of eight hybridizations between four DNA samples.

CNV combinations a (MR, FS, MP, FP) | Expected CNV patterns b (eight hybridization columns)
MR  | FS  | MP  | FP  | MRvsFS | MRvsFS | MRvsFS | MRvsFS | MRvsMP | FSvsMP | MRvsFP | FSvsFP
neg | neg | neg |     |        |        |        |        |        |        | -      | -
neg | neg | neg | pos |        |        |        |        |        |        | -      | -
neg | neg |     | neg |        |        |        |        | -      | -      |        |
neg | neg |     |     |        |        |        |        | -      | -      | -      | -
neg | neg |     | pos |        |        |        |        | -      | -      | -      | -
neg | neg | pos | neg |        |        |        |        | -      | -      |        |
neg | neg | pos |     |        |        |        |        | -      | -      | -      | -
neg | neg | pos | pos |        |        |        |        | -      | -      | -      | -
neg |     | neg | neg | -      | -      | -      | -      |        | +      |        | +
neg |     | neg |     | -      | -      | -      | -      |        | +      | -      |
neg |     | neg | pos | -      | -      | -      | -      |        | +      | -      | -
neg |     |     | neg | -      | -      | -      | -      | -      |        |        | +
neg |     |     |     | -      | -      | -      | -      | -      |        | -      |
neg |     |     | pos | -      | -      | -      | -      | -      |        | -      | -
neg |     | pos | neg | -      | -      | -      | -      | -      | -      |        | +
neg |     | pos |     | -      | -      | -      | -      | -      | -      | -      |
neg |     | pos | pos | -      | -      | -      | -      | -      | -      | -      | -
neg | pos | neg | neg | -      | -      | -      | -      |        | +      |        | +
neg | pos | neg |     | -      | -      | -      | -      |        | +      | -      | +
neg | pos | neg | pos | -      | -      | -      | -      |        | +      | -      |
neg | pos |     | neg | -      | -      | -      | -      | -      | +      |        | +
neg | pos |     |     | -      | -      | -      | -      | -      | +      | -      | +
neg | pos |     | pos | -      | -      | -      | -      | -      | +      | -      |
neg | pos | pos | neg | -      | -      | -      | -      | -      |        |        | +
neg | pos | pos |     | -      | -      | -      | -      | -      |        | -      | +
neg | pos | pos | pos | -      | -      | -      | -      | -      |        | -      |
    | neg | neg | neg | +      | +      | +      | +      | +      |        | +      |
    | neg | neg |     | +      | +      | +      | +      | +      |        |        | -
    | neg | neg | pos | +      | +      | +      | +      | +      |        | -      | -
    | neg |     | neg | +      | +      | +      | +      |        | -      | +      |
    | neg |     |     | +      | +      | +      | +      |        | -      |        | -
    | neg |     | pos | +      | +      | +      | +      |        | -      | -      | -
    | neg | pos | neg | +      | +      | +      | +      | +      | -      | +      |
    | neg | pos |     | +      | +      | +      | +      | +      | -      |        | -
    | neg | pos | pos | +      | +      | +      | +      | +      | -      | -      | -
    |     | neg | neg |        |        |        |        | +      | +      | +      | +
    |     | neg |     |        |        |        |        | +      | +      |        |
    |     | neg | pos |        |        |        |        | +      | +      | -      | -
    |     |     | neg |        |        |        |        |        |        | +      | +
    |     |     | pos |        |        |        |        |        |        | -      | -
    |     | pos | neg |        |        |        |        | -      | -      | +      | +
    |     | pos |     |        |        |        |        | -      | -      |        |
    |     | pos | pos |        |        |        |        | -      | -      | -      | -
    | pos | neg | neg | -      | -      | -      | -      | +      | +      | +      | +
    | pos | neg |     | -      | -      | -      | -      | +      | +      |        | +
    | pos | neg | pos | -      | -      | -      | -      | +      | +      | -      |
    | pos |     | neg | -      | -      | -      | -      |        | +      | +      | +
    | pos |     |     | -      | -      | -      | -      |        | +      |        | +
    | pos |     | pos | -      | -      | -      | -      |        | +      | -      |
    | pos | pos | neg | -      | -      | -      | -      | -      |        | +      | +
    | pos | pos |     | -      | -      | -      | -      | -      |        |        | +
    | pos | pos | pos | -      | -      | -      | -      | -      |        | -      |

Table 2.
(continued) CNV combinations a Expected CNV patterns b MR FS MP FP MRvsFS MRvsFS MRvsFS MRvsFS MRvsMP FSvsMP MRvsFP FSvsFP pos neg neg neg + + + + + + pos neg neg + + + + + + - pos neg neg pos + + + + + - pos neg neg + + + + + - + pos neg + + + + + - + - pos neg pos + + + + + - - pos neg pos neg + + + + - + pos neg pos + + + + - + - pos neg pos pos + + + + - - pos neg neg + + + + + + + + pos neg + + + + + + + pos neg pos + + + + + + - pos neg + + + + + + + pos + + + + + + pos pos + + + + + - pos pos neg + + + + - + + pos pos + + + + - + pos pos pos + + + + - - pos pos neg neg + + + + pos pos neg + + + + pos pos neg pos + + pos pos neg + + + + pos pos + + + + pos pos pos + + pos pos pos neg + + pos pos pos + + a Possible combinations of copy number status in the 4 DNA samples: MR, single male reference; FS, single female sample; MP, male pool; FP, female pool. pos=copy number gain; neg=copy number loss; blank=no copy number change. b Expected CNV patterns of 8 hybridizations between the 4 DNA samples. Observed experimental data were compared against these expected patterns. In each hybridization, the first sample is expected to have a net gain (+), a net loss (-), or the same number (blank) of copy number as the second sample for a CNV with the particular combination of copy number status shown on the left. 102 Table 5.3. Sensory-related genes associated with CNVs. 
chr band (a) | gain+loss (b) | gene(s) (c) | product: disease (d) | clone(s) in locus (e)
1p36.31 | 25 | TAS1R1 | sweet taste receptor T1r isoform a,b,c,d | RP11-58A11, RP11-719E21
3p21.31 | 18 | GNAT1 | guanine nucleotide binding protein, alpha: Night blindness, congenital stationary | RP11-787O14
7q32.1 | 5 | IMPDH1 | inosine monophosphate dehydrogenase 1 isoform a,b: Retinitis pigmentosa-10 | RP11-636E12
7q32.1 | 3 | OPN1SW | opsin 1 (cone pigments), short-wave-sensitive: Colorblindness, tritan | RP11-638M14
7q35 | 54 | OR2A12, OR2A14, OR2A2, OR2A25, OR2A5, OR2A1, OR2A42, OR2A7 | olfactory receptor, family 2, subfamily A | RP11-703N5, RP11-466J6
8p23.3 | 5 | OR4F21, OR4F29 | olfactory receptor, family 4, subfamily F | RP11-418D21
11q11 | 8 | OR4C6, OR4P4, OR4S2, OR5D13 | olfactory receptor, family 4, subfamily C,P,S,D | RP11-626N6
11q12.3 | 3 | ROM1 | retinal outer segment membrane protein 1: Retinitis pigmentosa, digenic | RP11-484M5
12p13.2 | 3 | TAS2R14, TAS2R44, TAS2R48, TAS2R49, TAS2R50 | taste receptor, type 2, member 14,44,48,49,50 | RP11-202N1
12q13.2 | 3 | OR6C2, OR6C4, OR6C68, OR6C70 | olfactory receptor, family 6, subfamily C | RP11-222A15
14q11.2 | 61 | OR4M1, OR4Q3, OR4K1, OR4K2, OR4K5, OR4N2, OR4K13, OR4K14, OR4K15 | olfactory receptor, family 4, subfamily M,Q,K,N | RP11-597A11, RP11-490A23, RP11-449I24, CTD-2024K23
15q11.2 | 26 | OR4M2, OR4N4 | olfactory receptor, family 4, subfamily M,N | RP11-281J20
16p13.3 | 7 | OR1F1 | olfactory receptor, family 1, subfamily F | RP11-680M24
17q25.3 | 18 | ACTG1, FSCN2 | actin, gamma 1 propeptide: Deafness, autosomal dominant 20/26; fascin 2: Retinitis pigmentosa-30 | RP11-730A9, RP13-550B21
19p13.2 | 62 | OR2Z1 | olfactory receptor, family 2, subfamily Z | RP11-282G19, RP11-367L15
22q11.1 | 15 | OR11H1 | olfactory receptor, family 11, subfamily H | RP11-561P7
22q12.3 | 5 | MYH9 | myosin, heavy polypeptide 9, non-muscle: Deafness, autosomal dominant 17 | RP11-108P21
a Chromosome band
b Total number of copy number gains and losses observed for a CNV locus
c Sensory-related gene(s) overlapping a CNV locus
d Gene product(s) and associated disease(s) according to RefSeq of the UCSC May 2004 assembly and the Online Mendelian Inheritance in Man (OMIM) morbid map
e Clone or overlapping clones in a CNV locus

Table 5.4. Select examples of CNVs associated with cancer-related genes.
chr band (a) | gain+loss (b) | gene (c) | product (d) | clone(s) in locus (e)
1p36.33 | 49 | SKI | v-ski sarcoma viral oncogene homolog | RP11-83K22, RP11-181G12
1p36.32 | 12 | TP73 | tumor protein p73 | RP11-631K6
1p36.31 | 16 | TNFRSF25 | tumor necrosis factor receptor superfamily, | RP11-58A11
1p32.3 | 32 | RAB3B | RAB3B, member RAS oncogene family | RP11-469M21, RP11-91A18
1p13.3 | 6 | VAV3 | vav 3 oncogene | RP11-480L11
2q14.2 | 18 | RALB | v-ral simian leukemia viral oncogene homolog B | RP11-818M2
2q37.3 | 6 | BOK | BCL2-related ovarian killer | RP11-343P10
3p21.31 | 20 | NAT6, TUSC2, TUSC4 | putative tumor suppressor FUS2, tumor suppressor candidates 2 & 4 | RP11-787O14, RP13-487A19
4q31.1 | 3 | RAB33B | RAB33B, member RAS oncogene family | RP11-124P22
6q21 | 3 | C6orf210 | candidate tumor suppressor protein | RP11-601O12
6q25.1 | 20 | ESR1 | estrogen receptor 1 | RP11-655H19
7p22.3 | 19 | MAFK | v-maf musculoaponeurotic fibrosarcoma oncogene | RP11-16P10
7p22.3 | 6 | MAD1L1 | MAD1-like 1 | RP11-325O9
8q24.21 | 4 | MYC | v-myc myelocytomatosis viral oncogene homolog | CTD-2034C18
9q34.2 | 22 | VAV2 | vav 2 oncogene | RP11-352K12, RP11-651E2
10p11.23 | 11 | MAP3K8 | mitogen-activated protein kinase kinase kinase | RP11-350D11
11p15.4 | 15 | CDKN1C | cyclin-dependent kinase inhibitor 1C | RP11-494F4
11p13 | 3 | WT1, WIT-1 | Wilms tumor 1 isoform A/B/C/D, Wilms tumor associated protein | RP11-710L2
11p11.2 | 3 | C1QTNF4 | C1q and tumor necrosis factor related protein 4 | RP11-425G10
11q13.1 | 3 | MEN1 | menin isoform 1 | RP11-485O9
11q13.3 | 6 | CCND1, ORAOV1 | cyclin D1, oral cancer overexpressed 1 | RP11-124K14
12q13.12 | 4 | MLL2 | myeloid/lymphoid or mixed-lineage leukemia 2 | RP11-66M13
13q31.1 | 4 | C13orf10 | cutaneous T-cell lymphoma tumor antigen se70-2 | RP11-86D5
14q32.32 | 3 | TNFAIP2 | tumor necrosis factor, alpha-induced protein 2 | RP11-455L5
16p13.3 | 19 | AXIN1 | axin 1 isoform a/b | RP11-598I20
16q22.3 | 3 | BCAR1 | breast cancer anti-estrogen resistance 1 | RP11-109K6
17p13.2 | 6 | TAX1BP3 | Tax1 (human T-cell leukemia virus type I) | RP11-753P16
17q11.2 | 6 | NF1 | neurofibromin | RP11-518B17
17q21.32 | 3 | PHB | prohibitin | RP11-472H5
17q25.3 | 17 | MAFG | v-maf musculoaponeurotic fibrosarcoma oncogene | RP11-634L10, RP11-712H22
17q25.3 | 6 | C1QTNF1 | C1q and tumor necrosis factor related protein 1 | RP11-167N2
18p11.32 | 15 | YES1 | viral oncogene yes-1 homolog 1 | RP11-806L2
18q21.1 | 8 | DCC | deleted in colorectal carcinoma | RP11-346H17
19p13.3 | 6 | SH3GL1 | SH3-domain GRB2-like 1 | RP11-406I1
19p13.3 | 4 | TNFSF9, TNFSF7, TNFSF14 | tumor necrosis factor (ligand) superfamily, members | RP11-526C20
19p13.3 | 4 | VAV1 | vav 1 oncogene | CTD-2200O16
19p13.11 | 16 | RAB3A | RAB3A, member RAS oncogene family | RP11-512B16
19q13.33 | 15 | PTOV1 | prostate tumor overexpressed gene 1 | RP11-597G9
19q13.33 | 7 | BAX | BCL2-associated X protein isoform sigma/gamma/epsilon/delta/beta/alpha | CTD-2017J20
19q13.33 | 8 | RRAS | related RAS viral (r-ras) oncogene homolog | RP11-264M8, RP11-808J4
20q13.13 | 3 | BCAS4 | breast carcinoma amplified sequence 4 isoform a/b | RP11-124P7
22q11.21 | 3 | HIC2 | hypermethylated in cancer 2 | CTD-2245I11
a Chromosome band
b Total number of copy number gains and losses observed for a CNV locus
c Gene associated with cancer, according to RefSeq of the UCSC May 2004 assembly and the Online Mendelian Inheritance in Man (OMIM) morbid map, overlapping a CNV locus
d Product encoded by the gene
e Clone or overlapping clones in a CNV locus

Table 5.5. Select CNVs overlapping genes associated with diseases or disease susceptibility.
chr band (a) | gain+loss (b) | gene(s) (c) | product(s) (d) | disease (e) | clone(s) in locus (f)
1p36.11 | 7 | NR0B2 | short heterodimer partner | Obesity, mild, early-onset | RP11-492E20
2q31.2 | 7 | TTN | titin isoform N2-A, N2-B; isoform novex-1,2,3 | Muscular dystrophy, limb-girdle, type 2J | RP11-95I17
4q11 | 3 | SGCB | sarcoglycan, beta (43kDa dystrophin-associated) | Muscular dystrophy, limb-girdle, type 2E | RP11-61F5
5q13.2 | 60 | SMA3, SMA4 | SMA3, SMA4 | Spinal muscular atrophy-2,-1 | RP11-313J5, RP11-155O16
5q13.2 | 6 | SMN1 | survival of motor neuron 1, telomeric isoform d | Spinal muscular atrophy-4 | RP11-195E2
6q25.3 | 34 | LPA | lipoprotein, Lp(a) | Coronary artery disease, susceptibility to | CTD-2310B5
6q26 | 5 | PARK2 | parkin isoform 1, 2, 3 | Parkinson disease, juvenile, type 2 | CTD-2019O18
7p13 | 10 | GCK | glucokinase isoform 2,3 | Diabetes mellitus, neonatal-onset | RP11-808H7
9q22.33 | 4 | GPR51 | G protein-coupled receptor 51 | Nicotine dependence, susceptibility to | RP11-786E15
11q12.3 | 3 | BSCL2 | seipin | Spinal muscular atrophy, distal, type V | RP11-484M5
12p13.31 | 79 | A2M | alpha-2-macroglobulin precursor | Alzheimer disease, susceptibility to | RP11-536M6
19p13.3 | 29 | TBXA2R | thromboxane A2 receptor isoform 2 | Bleeding disorder due to defective thromboxane A2 receptor | RP11-584K12
19q13.32 | 3 | FKRP | fukutin-related protein | Muscular dystrophy, limb-girdle, type 2I | RP11-422M7
22q11.21 | 6 | COMT | catechol-O-methyltransferase isoform S-COMT | Schizophrenia, susceptibility to | RP11-651A4
a Chromosome band
b Total number of gains and losses observed for a CNV locus
c Gene associated with disease or disease susceptibility, according to RefSeq of the UCSC May 2004 assembly and the Online Mendelian Inheritance in Man (OMIM) morbid map, overlapping a CNV locus
d Product encoded by the gene
e Disease or disease susceptibility associated with the gene, according to the Online Mendelian Inheritance in Man (OMIM) morbid map
f Clone or overlapping clones in a CNV locus

Table 5.6. MicroRNAs overlapping CNVs.
chr band (a) | gain+loss (b) | microRNA(s) (c) | clone(s) in locus (d)
3p21.2 | 7 | hsa-let-7g, hsa-mir-135a-1 | RP11-185J5, RP11-258D4
4p16.1 | 15 | hsa-mir-95 | CTD-2104N3, RP11-512D9
4p15.31 | 27 | hsa-mir-218-1 | RP11-644J20
8p21.3 | 9 | hsa-mir-320 | RP11-13A10
9q22.32 | 18 | hsa-let-7a-1, hsa-let-7d, hsa-let-7f-1 | RP11-519D15
10q26.3 | 21 | hsa-mir-202 | RP11-319M21, RP11-466F21, RP13-520O22
11q12.1 | 3 | hsa-mir-130a | RP11-781C10
17q25.3 | 13 | hsa-mir-338 | RP11-149I9
19p13.2 | 13 | hsa-mir-199a-1 | RP11-20N24, RP11-751C24
19p13.13 | 4 | hsa-mir-181c, hsa-mir-181d, hsa-mir-23a, hsa-mir-24-2, hsa-mir-27a | RP11-423F4
19q13.33 | 25 | hsa-mir-150 | RP11-21O13
20q11.22 | 3 | hsa-mir-499 | RP11-638P17
20q13.33 | 74 | hsa-mir-124a-3 | CTD-2240P21, RP11-543D7
22q11.21 | 6 | hsa-mir-185 | RP11-651A4
a Chromosome band
b Total number of gains and losses observed for a CNV locus
c Human microRNA(s) downloaded from the Sanger miRBase database overlapping a CNV locus
d Clone or overlapping clones in a CNV locus

5.5 References
1. Altshuler D, Brooks LD, Chakravarti A, Collins FS, Daly MJ, Donnelly P. A haplotype map of the human genome. Nature 2005 Oct 27; 437(7063): 1299-1320. 2. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 2006 Jan; 38(1): 75-81. 3. Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet 2006 Jan; 38(1): 82-85. 4. McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, et al. Common deletion polymorphisms in the human genome. Nat Genet 2006 Jan; 38(1): 86-92. 5. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet 2004 Sep; 36(9): 949-951. 6. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science 2004 Jul 23; 305(5683): 525-528. 7.
Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet 2005 Jul; 77(1): 78-88. 8. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, et al. Fine-scale structural variation of the human genome. Nat Genet 2005 Jul; 37(7): 727-732. 9. Eichler EE. Widening the spectrum of human genetic variation. Nat Genet 2006 Jan; 38(1): 9-11. 10. Inoue K, Lupski JR. Molecular mechanisms for genomic disorders. Annu Rev Genomics Hum Genet 2002; 3: 199-242. 11. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303. 12. Khojasteh M, Lam WL, Ward RK, Macaulay C. A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 2005 Nov 18; 6(1): 274. 13. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 2004 Feb 9; 5: 13. 14. Locke DP, Sharp AJ, McCarroll SA, McGrath SD, Newman TL, Cheng Z, et al. Linkage Disequilibrium and Heritability of Copy-Number Polymorphisms within Duplicated Regions of the Human Genome. Am J Hum Genet 2006 Aug; 79(2): 275-290. 15. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998 Oct; 20(2): 207-211. 108 16. Griffiths-Jones S. The microRNA Registry. Nucleic Acids Res 2004 Jan 1; 32(Database issue): D109-111. 17. Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, et al. Recent segmental duplications in the human genome. Science 2002 Aug 9; 297(5583): 1003-1007. 18. Bailey JA, Yavor AM, Massa HF, Trask BJ, Eichler EE. Segmental duplications: organization and impact within the current human genome project assembly. 
Genome Res 2001 Jun; 11(6): 1005-1017. 19. Cheng Z, Ventura M, She X, Khaitovich P, Graves T, Osoegawa K, et al. A genome- wide comparison of recent chimpanzee and human segmental duplications. Nature 2005 Sep 1; 437(7055): 88-93. 20. She X, Jiang Z, Clark RA, Liu G, Cheng Z, Tuzun E, et al. Shotgun sequence assembly and recent segmental duplications within the human genome. Nature 2004 Oct 21; 431(7011): 927-930. 21. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome- wide expression patterns. Proc Natl Acad Sci U S A 1998 Dec 8; 95(25): 14863-14868. 22. Rozen S, Skaletsky H. Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol 2000; 132: 365-386. 23. Weksberg R, Hughes S, Moldovan L, Bassett AS, Chow EW, Squire JA. A method for accurate detection of genomic microdeletions using real-time quantitative PCR. BMC Genomics 2005; 6: 180. 24. de Vries BB, Pfundt R, Leisink M, Koolen DA, Vissers LE, Janssen IM, et al. Diagnostic genome profiling in mental retardation. Am J Hum Genet 2005 Oct; 77(4): 606-616. 25. Wolf S, Sharpe LT, Schmidt HJ, Knau H, Weitz S, Kioschis P, et al. Direct visual resolution of gene copy number in the human photopigment gene array. Invest Ophthalmol Vis Sci 1999 Jun; 40(7): 1585-1589. 26. Finishing the euchromatic sequence of the human genome. Nature 2004 Oct 21; 431(7011): 931-945. 27. Lerman MI, Minna JD. The 630-kb lung cancer homozygous deletion region on human chromosome 3p21.3: identification and evaluation of the resident candidate tumor suppressor genes. The International Lung Cancer Chromosome 3p21.3 Tumor Suppressor Gene Consortium. Cancer Res 2000 Nov 1; 60(21): 6116-6133. 28. Seabright M. A rapid banding technique for human chromosomes. Lancet 1971 Oct 30; 2(7731): 971-972. 29. Lee C. Vive la difference! Nat Genet 2005 Jul; 37(7): 660-661. 30. Alvarez-Garcia I, Miska EA. MicroRNA functions in animal development and human disease. 
Development 2005 Nov; 132(21): 4653-4662. 31. Bartel DP. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 2004 Jan 23; 116(2): 281-297. 32. Wienholds E, Plasterk RH. MicroRNA function in animal development. FEBS Lett 2005 Oct 31; 579(26): 5911-5922. 33. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. MicroRNA expression profiles classify human cancers. Nature 2005 Jun 9; 435(7043): 834-838. 34. Tagawa H, Seto M. A microRNA cluster as a target of genomic amplification in malignant lymphoma. Leukemia 2005 Nov; 19(11): 2013-2016.

Chapter 6: Deletion of 4q21 and low expression of CCNI and CCNG2 result in poor overall survival in mantle cell lymphoma

6.1 Introduction
Mantle cell lymphoma (MCL) is a clinically distinct B-cell lymphoma with a median survival of approximately 3 years and few long-term survivors (1, 2). MCL is characterized by a t(11;14) chromosomal translocation that brings the CCND1 gene under the influence of the IgH enhancer region, resulting in constitutive expression of CCND1 (cyclin D1) and deregulated progression from G1 to S phase of the cell cycle (3, 4). Although this abnormality is found in virtually all cases of MCL, it is insufficient to result in the disease; therefore, secondary genetic alterations have been proposed as essential in MCL pathogenesis (5, 6). In fact, chromosomal aberrations have been recurrently observed in cytogenetic studies, providing support for this hypothesis (7, 8). However, DNA copy number alterations reported in early CGH studies did not produce genetic features that consistently correlate with overall survival. This is presumably due to technological limitations in resolution and to the composition and size of the cohorts studied (9-11). For example, survival statistics could be skewed by the inclusion of leukemic MCL cases (9, 10). Although additional studies have examined MCL copy number alterations, they did not address survival in their analyses (12-14).
Consequently, no single region influencing overall survival was identified across all previous studies. Two recurrent regions with a negative impact on survival were identified in 2 of 3 studies: loss of 9p21 and loss of 17p13 (10, 11). Other regions with a negative impact on survival, each identified in only one study, include loss of 8p21, 9q21, 13q14, and 13q34. In addition, one region was associated with a positive impact on survival: loss of 1p21 (10). In the present study we applied comprehensive genomic profiling at 26,819 loci, together with expression profiling, to analyze a panel of MCL cases while carefully controlling for leukemic cases and cases that did not receive treatment with curative intent. We then focused on regions of genetic alteration that correlated with poor outcome in MCL patients to investigate what mechanisms lead to rapid disease progression.

6.2 Methods
Fifty-two samples obtained from patients with MCL, comprising 31 classic MCL, 16 blastoid MCL, and 5 MCL with only a leukemic component at the time of diagnosis, were included in this study (Supplemental Table I). Extracted DNA was assayed for comprehensive genomic copy number status as previously described (15-18). Select cases and a microdissected mantle zone control had RNA extracted and assayed by serial analysis of gene expression (SAGE). Long SAGE libraries were constructed according to the L-SAGE kit manual (Invitrogen) and mapped using the February 12, 2006 version of SAGE Genie. Quantitative real-time PCR was carried out to assess mRNA levels using the 7500 Fast Real-Time PCR System (Applied Biosystems Inc., Foster City, CA) as per the manufacturer's instructions. Log rank tests, Kaplan-Meier plots, and Cox-regression analysis were performed using SPSS version 14.0 (SPSS, Chicago, IL, USA). P values of < 0.05 were considered significant. All data used in this study are available through the Gene Expression Omnibus (GEO) as a series numbered GSE13331.
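The survival statistics named in the Methods can be illustrated with a minimal sketch. The analyses in this chapter were run in SPSS; the Python below is only an illustrative stand-in, the function names `kaplan_meier` and `logrank_test` are hypothetical, and the follow-up times in the usage lines are toy values, not patient data. It computes a Kaplan-Meier curve and a two-group log rank statistic, with the P value taken from the chi-square distribution with one degree of freedom.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival)] at each observed death time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every case whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log rank test; returns (chi-square, P value) with 1 df."""
    times_a, times_b = list(times_a), list(times_b)
    all_pairs = list(zip(times_a + times_b, list(events_a) + list(events_b)))
    death_times = sorted({t for t, e in all_pairs if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of chi-square with 1 df equals erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Toy follow-up times in months (hypothetical, for illustration only).
altered = ([6, 9, 14, 20], [1, 1, 1, 0])
unaltered = ([18, 30, 41, 55], [1, 0, 1, 0])
print(kaplan_meier(*altered))
print(logrank_test(*altered, *unaltered))
```

A dedicated statistics package would normally be preferred for real analyses; the explicit loops here are only meant to make the at-risk bookkeeping visible.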

6.3 Results
Alignment of comprehensive genomic copy number profiles of all MCL samples defined 62 recurrently altered regions with a frequency of at least 10% (6 of the 52 cases), excluding the immunoglobulin receptor loci at 2p11.2, 14q32.32, and 22q11.22 that may be rearranged during B-cell development (Supplemental Table II). However, these 62 regions represent not only somatic alterations within the tumor samples but also constitutional copy number variations (CNVs) in either the tumor sample or the reference DNA. Comparison of these regions with CNVs previously determined using this platform identified 20 of our regions as likely CNVs (19). We investigated the 42 remaining regions for correlation with overall survival in 45 of our 52 samples (the leukemic MCL cases and two patients who were too frail to be treated with curative intent were not included) (Table 6.1). Within our series, we identified seven regions as associated with poor overall survival in a univariate analysis (five independent of the International Prognostic Index): losses of 4q13.3-q21.33 (P = 0.0090), 9p21 (P = 0.0004), 9q34.2-qterm (P = 0.0190), 13q14.2-q14.3 (P = 0.0133), and 15q26.2 (P = 0.0033), and gains of 8q24.21-q24.3 (P = 0.0253) and 12q14.1-q15 (P = 0.0087) (Table 6.1 bold, Supplemental Figure 1). However, several of these regions were not independent of each other, and we constructed a survival model that was independent of IPI based on four regions: 4q13.3-q21.33, 8q24.21-q24.3, 9p21, and 13q14.2-q14.3. We found that cases with a greater number of these alterations had a worse outcome than cases with fewer (P = 1.03 x 10^-5, Figure 6.1A). There seemed to be a clear distinction between cases with zero or one alteration and cases with two, three, or four; therefore, we grouped these together to create two prognostic groups (P = 5.87 x 10^-6, Figure 6.1B).
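The recurrence threshold and the grouping used in the survival model can be written out as a short sketch. The region labels are taken from the text, but the helper names (`min_cases_for_recurrence`, `alteration_count`, `prognostic_group`) and the example cases are hypothetical and do not correspond to actual patients in the series.

```python
# The four alterations in the survival model, as named in the text.
MODEL_ALTERATIONS = (
    "loss 4q13.3-q21.33",
    "gain 8q24.21-q24.3",
    "loss 9p21",
    "loss 13q14.2-q14.3",
)

def min_cases_for_recurrence(n_cases, min_percent=10):
    """Smallest whole number of cases that reaches min_percent of n_cases."""
    return (n_cases * min_percent + 99) // 100  # integer ceiling

def alteration_count(case_alterations):
    """Number of model alterations present in one case."""
    return sum(1 for alt in MODEL_ALTERATIONS if alt in case_alterations)

def prognostic_group(case_alterations):
    """Two-group model: zero or one alteration versus two or more."""
    if alteration_count(case_alterations) >= 2:
        return "two or more"
    return "zero or one"

# With 52 cases, a 10% threshold requires 6 cases (6/52 = 11.5%).
threshold = min_cases_for_recurrence(52)
print(threshold, round(100 * threshold / 52, 1))

# Hypothetical cases for illustration only.
example_cases = {
    "case A": {"loss 9p21"},
    "case B": {"loss 9p21", "loss 13q14.2-q14.3", "gain 3q29"},
}
for name, alterations in example_cases.items():
    print(name, alteration_count(alterations), prognostic_group(alterations))
```

Alterations outside the model, such as the 3q29 gain in the second example, are ignored by the count, which mirrors how only the four model regions contribute to the grouping.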
We next performed long-SAGE on three cases of classic MCL and one sample of microdissected normal mantle zone cells to catalogue genes expressed in MCL or its normal cell counterpart. Investigating which genes were present in either the MCL or the normal libraries within the regions that comprise our survival model (Supplemental Table III), we identified 57 genes. Within the 9p21 region we found two expressed genes: CDKN2B and MTAP. The 13q14.2-q14.3 region was very small (1.2 Mb) and contained no transcribed mRNA in the SAGE libraries. However, within this region reside the mir-16-1/mir-15a microRNAs that have been linked to BCL2 protein expression (20). The 8q24.21-q24.3 region is 19.8 Mb in size and contains 26 expressed genes that were seen in the SAGE libraries. Of note, even though MYC is in the center of the altered region, it has been documented that the MYC mRNA transcript is labile and may be immeasurable in our assay (21). The final region within our survival model is 4q13.3-q21.33. This region, altered in 11.5% of our cases, is seldom reported in other studies. In addition, this is the first report to identify its prognostic importance. We therefore investigated the 4q13.3-q21.33 region further. Based on the known biological function of the 29 expressed genes within this region, we found only two that seemed to fit with a loss of the region and the biology of MCL: CCNI and CCNG2. We utilized quantitative real-time PCR to measure expression levels of the CCNI and CCNG2 mRNA in the 27 of our 45 samples for which RNA was available (Supplemental Table IV). We found that mRNA levels of both CCNI and CCNG2 were significantly correlated with overall survival (P = 0.0201 and P = 0.0292, respectively) (Figure 6.2A/B).

6.4 Discussion
Here we identified seven regions that show correlation with poor overall survival in a univariate analysis: 4q13-q21, 8q24, 9p21, 9q34, 12q14-q15, 13q14, and 15q26.
Of these, five remain significant in Cox regression analysis with the International Prognostic Index (IPI) (Table 6.1). We next developed a survival model based on our seven candidate survival regions. Consistent with the result of the univariate analysis above, the survival model included 9p21 (a driving force in G1-S phase transition), 13q14 (shown to play an important role in apoptotic behavior), and 8q24 (likely resulting in the deregulation of the strong oncogene MYC). However, we were intrigued to see a previously uncharacterized alteration within the model, 4q13-q21. Upon investigating the genes from this region that are expressed in mantle cells, we noted two genes from the cyclin family, CCNI and CCNG2. The protein products of CCNI and CCNG2, cyclin I and cyclin G2, are atypical cyclins in that they do not promote cell cycling but are involved in cell cycle arrest and apoptosis (22). Therefore, the loss of this function through copy number losses may in effect promote cell cycling and provide anti-apoptotic behavior. In light of the established importance of deregulation of the cell cycle and apoptosis in MCL, we decided to further investigate the expression of these new candidates (23). Considering the cases for which we had RNA, we found that lower expression of these genes correlated with poor overall survival. The fact that only two of the cases assayed had losses of the 4q13-q21 region, while other cases showed equally low expression, suggests that gene deletion is not the only mechanism disrupting gene expression. Our data suggest that these genes play an important role in MCL pathogenesis. In fact, CCNG2 has been implicated in the pathogenesis of several malignancies including oral cancer and acute lymphoblastic leukemia (24-26). In conclusion, we have comprehensively defined genomic regions of recurrently altered copy number in a representative panel of MCL cases with complete clinical annotation.
The most common region associated with overall survival was the loss of 9p21. However, we found six other regions that correlate with overall survival: 4q13.3-q21.33, 9q34.2-qterm, 13q14.2-q14.3, 15q26.2, 8q24.21-q24.3, and 12q14.1-q15. From these we developed a survival model based on 4q, 9p, and 13q losses and 8q gains, which was highly predictive of outcome (P = 5.87 x 10^-6). We further investigated the novel region on 4q and found two genes, CCNI and CCNG2, with expression levels that correlated with survival. These genes are involved in controlling the cell cycle between S-phase and G2 phase. Therefore, our findings not only reinforce the importance of the cell cycle in MCL, but suggest that cell cycle checkpoints additional to G1-S are of importance in MCL pathogenesis.

Figure 6.1. Survival model based on four copy number alterations: loss of 4q13.3-q21.33, gain of 8q24.21-q24.3, loss of 9p21, and loss of 13q14.2-q14.3. A) Count of alterations in model; solid line with squares represents cases with no alterations, dotted line represents cases with one alteration, dashed line represents cases with two alterations, dashed and dotted line represents cases with three alterations, solid line without squares represents cases with all 4 alterations in model (virtually superimposed on y-axis). B) Grouped survival model; dashed line represents cases with two or more alterations, solid line represents cases with only one or none of these alterations.

Figure 6.2. CCNI and CCNG2 expression correlation to survival. A) Kaplan-Meier plot of CCNI low expression samples (dashed) versus high expression samples (solid). B) Kaplan-Meier plot of CCNG2 low expression samples (dashed) versus high expression samples (solid).

Table 6.1. Regions of somatic copy number alteration with greater than 10% recurrence.
Status | Band position | Start clone ‡ | End clone ‡ | Size (Mb) | Frequency | P-value §
Loss | 1p21.3-p22.1 | RP11-796M5 | RP11-396N3 | 0.77 | 32.7% | 0.1526
Loss | 1q21.3 | RP11-425D18 | RP11-68O15 | 0.85 | 11.5% | 0.4490
Loss | 1q21.3-q23.3 | RP11-805F22 | RP11-620O19 | 6.24 | 11.5% | 0.4490
Loss | 1q25.2-q31.2 | RP11-231P5 | RP11-119C5 | 12.23 | 11.5% | 0.0539
Loss | 1q42.3-q43 | RP11-27F5 | RP11-124K15 | 5.41 | 11.5% | 0.0607
Loss | 3p25.2-p26.3 | RP11-279B12 | RP11-95H20 | 10.19 | 11.5% | 0.0647
Loss | 3p14.1 | RP11-282F19 | RP11-449J5 | 3.53 | 11.5% | 0.7587
Loss | 3p12.2-p13 | RP11-73P12 | RP11-657G2 | 11.04 | 11.5% | 0.7587
Gain | 3q29 | RP11-326J2 | RP11-721B3 | 4.31 | 46.2% | 0.6097
Loss | 4q13.3-q21.33 | RP11-628C10 | RP11-109D13 | 12.07 | 11.5% | 0.0090
Gain | 6p22.3-p24.1 | RP11-39G14 | RP11-636J20 | 4.49 | 15.4% | 0.2674
Gain | 6p21.1-p21.2 | RP11-418F15 | RP11-475H19 | 2.28 | 13.5% | 0.9781
Loss | 6q21 | CTD-2025J24 | RP11-601O12 | 2.08 | 34.6% | 0.7283
Loss | 6q24.1-q24.2 | CTD-2017C15 | RP11-148B14 | 9.02 | 34.6% | 0.8999
Gain | 7p21.3 | RP11-482K3 | RP11-722J11 | 1.49 | 17.3% | 0.5069
Gain | 7p15.3-p21.1 | RP11-788B2 | RP11-243C6 | 5.02 | 17.3% | 0.5069
Gain | 7p14.2 | RP11-497E7 | RP11-798L21 | 0.88 | 17.3% | 0.3711
Gain | 7p14.1 | RP11-792O2 | RP11-790C6 | 1.69 | 17.3% | 0.2919
Loss | 8p21.2-p21.3 | RP11-380D18 | RP11-806N20 | 1.27 | 23.1% | 0.4235
Gain | 8q24.21-q24.3 | RP11-389M7 | RP11-613F12 | 19.84 | 21.2% | 0.0253 g
Loss | 9p21.3 | RP11-615P15 | RP11-458D17 | 0.72 | 44.2% | 0.0004 g
Loss | 9q22.31-q22.33 | RP11-95E22 | RP11-80H12 | 6.82 | 32.7% | 0.6366
Loss | 9q34.2-qterm | RP11-205O8 | RP11-35I18 | 3.94 | 19.2% | 0.019 g
Loss | 10p14-pter | RP11-797F8 | RP11-453H2 | 6.74 | 21.2% | 0.2904
Gain | 10p12.1-p12.31 | RP11-469D16 | RP11-118O16 | 4.89 | 11.5% | 0.6834
Gain | 11q13.3 | RP11-681H17 | RP11-599F23 | 0.40 | 26.9% | 0.3568
Loss | 11q22.3 | RP11-95J9 | RP11-415G10 | 0.54 | 44.2% | 0.6220
Gain | 11q25 | RP11-296M15 | CTD-2013A2 | 0.90 | 17.3% | 0.9976
Gain | 12q14.1-q15 | RP11-489P6 | RP11-324P9 | 11.33 | 13.5% | 0.0087 g
Loss | 12p13.1 | RP11-795A9 | RP11-205A15 | 1.42 | 13.5% | 0.7053
Loss | 13q14.2-q14.3 | RP11-480P3 | RP11-686G10 | 1.19 | 48.1% | 0.0133 g
Loss | 13q34 | RP11-88E10 | RP11-226B11 | 1.91 | 50.0% | 0.1207
Loss | 15q14-q15.1 | RP11-38P2 | RP11-350P18 | 3.95 | 15.4% | 0.0579
Gain | 15q22.31-q23 | RP11-45J10 | RP11-72H3 | 5.94 | 21.2% | 0.9053
Loss | 15q26.2 | RP13-564K15 | RP11-18P13 | 1.05 | 13.5% | 0.0033
Loss | 16p12.1-p12.2 | CTD-2138I8 | RP11-121F6 | 0.58 | 11.5% | 0.4970
Loss | 17p13.1 | RP11-222J21 | RP11-452D1 | 1.08 | 32.7% | 0.2035
Gain | 17q25.1 | RP11-629D12 | RP11-78L11 | 0.85 | 13.5% | 0.9434
Loss | 22q12.2-q12.3 | RP11-671L1 | RP11-636L11 | 2.40 | 15.4% | 0.1353
Loss | Xp22.33 | RP11-379A4 | RP11-457M7 | 2.10 | 11.5% | 0.1448
Gain | Xq26 | RP11-359G18 | RP11-359G18 | 0.15 | 11.5% | 0.7982
Gain | Xq28 | RP11-330B2 | RP11-430K16 | 1.04 | 11.5% | 0.7982
‡ RP11 clones are derived from the Roswell Park Cancer Institute Human BAC clone library 11, CTD clones are derived from the “Caltech” Human BAC Library CIT-D.
§ Overall survival correlation calculated using LogRankTest for presence or absence of copy number alteration in 45 of 52 cases. Statistically significant regions are denoted in bold font.
g Significant in Cox regression analysis with IPI score.

6.5 References
1. Anon. A clinical evaluation of the International Lymphoma Study Group classification of non-Hodgkin's lymphoma. The non-Hodgkin's Lymphoma Classification Project. Blood 1997; 89: 3909-3918. 2. Argatoff LH, Connors JM, Klasa RJ, Horsman DE, Gascoyne RD. Mantle cell lymphoma: a clinicopathologic study of 80 cases. Blood 1997 Mar 15; 89(6): 2067-2078. 3. Swerdlow SH, Williams ME. From centrocytic to mantle cell lymphoma: a clinicopathologic and molecular review of 3 decades. Hum Pathol 2002 Jan; 33(1): 7-20. 4. Williams ME, Swerdlow SH, Rosenberg CL, Arnold A. Centrocytic lymphoma: a B-cell non-Hodgkin's lymphoma characterized by chromosome 11 bcl-1 and PRAD 1 rearrangements. Curr Top Microbiol Immunol 1992; 182: 325-329. 5. Bodrug SE, Warner BJ, Bath ML, Lindeman GJ, Harris AW, Adams JM. Cyclin D1 transgene impedes lymphocyte maturation and collaborates in lymphomagenesis with the myc gene. Embo J 1994 May 1; 13(9): 2124-2130. 6. Lovec H, Grzeschiczek A, Kowalski MB, Moroy T.
Cyclin D1/bcl-1 cooperates with myc genes in the generation of B-cell lymphoma in transgenic mice. Embo J 1994 Aug 1; 13(15): 3487-3495.
7. Au WY, Gascoyne RD, Viswanatha DS, Connors JM, Klasa RJ, Horsman DE. Cytogenetic analysis in mantle cell lymphoma: a review of 214 cases. Leuk Lymphoma 2002 Apr; 43(4): 783-791.
8. Wlodarska I, Pittaluga S, Hagemeijer A, De Wolf-Peeters C, Van Den Berghe H. Secondary chromosome changes in mantle cell lymphoma. Haematologica 1999 Jul; 84(7): 594-599.
9. Kohlhammer H, Schwaenen C, Wessendorf S, Holzmann K, Kestler HA, Kienle D, et al. Genomic DNA-chip hybridization in t(11;14)-positive mantle cell lymphomas shows a high frequency of aberrations and allows a refined characterization of consensus regions. Blood 2004 Aug 1; 104(3): 795-801.
10. Rubio-Moscardo F, Climent J, Siebert R, Piris MA, Martin-Subero JI, Nielander I, et al. Mantle-cell lymphoma genotypes identified with CGH to BAC microarrays define a leukemic subgroup of disease and predict patient outcome. Blood 2005 Jun 1; 105(11): 4445-4454.
11. Thelander EF, Ichimura K, Collins VP, Walsh SH, Barbany G, Hagberg A, et al. Detailed assessment of copy number alterations revealing homozygous deletions in 1p and 13q in mantle cell lymphoma. Leuk Res 2007 Sep; 31(9): 1219-1230.
12. Rinaldi A, Kwee I, Taborelli M, Largo C, Uccella S, Martin V, et al. Genomic and expression profiling identifies the B-cell associated tyrosine kinase Syk as a possible therapeutic target in mantle cell lymphoma. Br J Haematol 2006 Feb; 132(3): 303-316.
13. Schraders M, Pfundt R, Straatman HM, Janssen IM, van Kessel AG, Schoenmakers EF, et al. Novel chromosomal imbalances in mantle cell lymphoma detected by genome-wide array-based comparative genomic hybridization. Blood 2005 Feb 15; 105(4): 1686-1693.
14. Tagawa H, Karnan S, Suzuki R, Matsuo K, Zhang X, Ota A, et al. Genome-wide array-based CGH for mantle cell lymphoma: identification of homozygous deletions of the proapoptotic gene BIM.
Oncogene 2005 Feb 17; 24(8): 1348-1358.
15. de Leeuw RJ, Davies JJ, Rosenwald A, Bebb G, Gascoyne RD, Dyer MJ, et al. Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes. Hum Mol Genet 2004 Sep 1; 13(17): 1827-1837.
16. Deleeuw RJ, Zettl A, Klinker E, Haralambieva E, Trottier M, Chari R, et al. Whole-genome analysis and HLA genotyping of enteropathy-type T-cell lymphoma reveals 2 distinct lymphoma subtypes. Gastroenterology 2007 May; 132(5): 1902-1911.
17. Iqbal J, Kucuk C, Deleeuw RJ, Srivastava G, Tam W, Geng H, et al. Genomic analyses reveal global functional alterations that promote tumor growth and novel tumor suppressor genes in natural killer-cell malignancies. Leukemia 2009 Feb 5.
18. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303.
19. Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, et al. A comprehensive analysis of common copy-number variations in the human genome. Am J Hum Genet 2007 Jan; 80(1): 91-104.
20. Cimmino A, Calin GA, Fabbri M, Iorio MV, Ferracin M, Shimizu M, et al. miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc Natl Acad Sci U S A 2005 Sep 27; 102(39): 13944-13949.
21. Dani C, Blanchard JM, Piechaczyk M, El Sabouty S, Marty L, Jeanteur P. Extreme instability of myc mRNA in normal and transformed human cells. Proc Natl Acad Sci U S A 1984 Nov; 81(22): 7046-7050.
22. Horne MC, Donaldson KL, Goolsby GL, Tran D, Mulheisen M, Hell JW, et al. Cyclin G2 is up-regulated during growth inhibition and B cell antigen receptor-mediated cell cycle arrest. J Biol Chem 1997 May 9; 272(19): 12650-12661.
23. Fernandez V, Hartmann E, Ott G, Campo E, Rosenwald A. Pathogenesis of mantle-cell lymphoma: all oncogenic roads lead to dysregulation of cell cycle and DNA damage response pathways. J Clin Oncol 2005 Sep 10; 23(26): 6364-6369.
24.
Bogni A, Cheng C, Liu W, Yang W, Pfeffer J, Mukatira S, et al. Genome-wide approach to identify risk factors for therapy-related myeloid leukemia. Leukemia 2006 Feb; 20(2): 239-246.
25. Ito Y, Yoshida H, Uruno T, Nakano K, Takamura Y, Miya A, et al. Decreased expression of cyclin G2 is significantly linked to the malignant transformation of papillary carcinoma of the thyroid. Anticancer Res 2003 May-Jun; 23(3B): 2335-2338.
26. Kim Y, Shintani S, Kohno Y, Zhang R, Wong DT. Cyclin G2 dysregulation in human oral cancer. Cancer Res 2004 Dec 15; 64(24): 8980-8986.

Chapter 7: Conclusions

7.1 Summary

MCL is an aggressive NHL that remains poorly treated. The median survival for patients with MCL is 3-4 years, and there are few long-term survivors. In fact, almost everyone who develops MCL will die from the disease. At the initiation of this thesis work, very little was understood about MCL other than that most cases had a t(11;14) translocation and that patient survival was intimately linked with the rate at which the malignant cells proliferated (1). It was also known at the time, through mouse model systems, that the translocation alone was insufficient to cause the disease and that secondary genomic alterations were required. Secondary genomic copy number changes had been identified using karyotyping and CGH; however, the resolution and sensitivity of these methods were such that reports remained inconsistent. Therefore, a more sensitive and higher resolution technique was required to determine where in the genome these secondary copy number alterations reside. This localization was important to focus future efforts to characterize candidate oncogenes and tumour suppressor genes. The identification of these candidate genes would both help us to understand the pathogenesis of MCL and allow for the potential development of novel therapies.

7.1.1 Development of tools to measure copy number alterations throughout the human genome

Chapter 2 details the construction of a tiling resolution array consisting of 32,433 overlapping BAC clones covering the entire human genome. This array represented a drastic increase in our ability to identify genetic alterations and their boundaries throughout the genome in a single comparative genomic hybridization (CGH) experiment. Through profiling of cancer samples with this platform we have identified minute DNA alterations that had escaped previous detection. These alterations include microamplifications and deletions containing known oncogenes and tumour-suppressor genes, as well as novel genes that may be associated with various tumours, demonstrating the need to move beyond conventional marker-based genome analysis techniques, which infer status between measured loci. This submegabase resolution tiling set (SMRT) array CGH platform allows comprehensive assessment of genomic alterations at a level never before possible (2).

With the construction of this tiling-resolution array for the detection of copy number alterations at unprecedented resolution, we quickly determined that the tools commonly used to visualize and analyze these types of data were insufficient. Traditionally, Microsoft Excel was used to visualize and analyze data created by previous arrays containing hundreds to a couple of thousand data points. As we were creating 32,433 × 3 replicates, or almost 100,000 data points, we needed better tools for handling these data. In fact, at that time Microsoft Excel would only accommodate ~65,000 rows of data. Thus, in Chapter 3 we developed a visualization tool, SeeGH, for displaying whole genome array CGH data in the context of chromosomal location. SeeGH generates high resolution chromosome profiles from standard array ratio data files.
Data are then presented in a high resolution display representative of conventional CGH karyotype diagrams, with the ability to zoom in on regions of interest and view annotation information such as gene mapping. To generate these diagrams, SeeGH imports the data into a database, calculates the average ratio and standard deviation for each replicate spot, and links them to chromosome regions. Once the data are displayed, users have the option of filtering data based on user-defined QC criteria and retrieving annotation information such as clone name, NCBI sequence accession number, ratio, base pair position on the chromosome, and standard deviation. This represents a novel software tool used to view and analyze array CGH data (3).

7.1.2 Measurement of MCL model genomes

With the tools in place, we next endeavoured to profile MCL model genomes (cell lines). The reasoning behind starting with these was that they provide a homogeneous sample in which all of the cells should have the same copy number alterations, and we wished to determine whether consistent copy number patterns were evident in MCL. Having a homogeneous population of cells was important at the time because the specific characteristics of the SMRT array platform, such as what proportion of aberrant cells is required to measure a single copy number change at a single or very few data points, had not been determined. In addition, the literature describing MCL genetics was inconsistent. We needed to determine whether these inconsistencies were due to randomness of copy number changes in MCL or simply due to inconsistencies in the techniques used to measure those copy number changes. Thus, in Chapter 4, using a newly developed DNA microarray of 32,433 overlapping genomic segments spanning the entire human genome, we searched for secondary genomic alterations concomitant with the t(11;14) in eight commonly used cell models of MCL (Granta-519, HBL-2, NCEB-1, Rec-1, SP49, UPN-1, Z138C and JVM-2).
Examining these genomes at tiling resolution identified an unexpected average of 35 genetic alterations per cell line, with equal numbers of amplifications and deletions. Recurrent high-level amplifications were identified at 18q21 containing BCL2, and at 13q31 containing GPC5. In addition, a recurrent homozygous deletion was identified at 9p21 containing CDKN2A and CDKN2B. Alignment of these profiles revealed 14 recurrent losses and 21 recurrent gains as small as 130 kb. Remarkably, even the functional immunoglobulin gene deletions at 2p11 and 22q11 were detected, demonstrating the power of combining the detection sensitivity of array comparative genomic hybridization (CGH) with the resolution of an overlapping whole genome tiling-set. These alterations not only coincided with previously described aberrations in MCL, but also defined 13 novel candidate regions. Further characterization of such minimally altered genomic regions identified using whole genome array CGH will help define novel dominant oncogenes and tumor suppressor genes that play important roles in the pathogenesis of MCL.

7.1.3 Determination of natural copy number variations in the human genome

Through the assessment of these model genomes, control experiments to determine noise characteristics, and other studies ongoing in the lab, we quickly realized that there was consistent copy number variation throughout the genome that was not due to somatic alteration during cancer development. Thus, in Chapter 5, using a whole-genome array comparative genomic hybridization assay, we identified 3,654 autosomal segmental CNVs, 800 of which appeared at a frequency of at least 3%. Of these frequent CNVs, 77% were novel. In the 95 individuals analyzed, the two most diverse genomes differed by at least 9 Mb in size or varied by at least 266 loci in content.
Approximately 68% of the 800 polymorphic regions overlap with genes, which may reflect human diversity in sensation (smell, hearing, taste, and sight), rhesus phenotype, metabolism, and disease susceptibility. Intriguingly, 14 polymorphic regions harbor 21 of the known human microRNAs, raising the possibility that microRNAs contribute to phenotypic diversity in humans. This in-depth survey of CNVs across the human genome provides a valuable baseline for studies involving human genetics.

7.1.4 Correlation of somatic copy number alterations in MCL to clinical outcome

With the tools available to assay copy number alterations comprehensively and our knowledge of the baseline copy number variation in the human genome, we confidently moved forward to assess copy number alterations in MCL clinical samples. In addition, the previously mentioned specific characteristics of the SMRT array had been determined by our lab (not included in this thesis). That work showed that normal cell levels as high as 75% can be tolerated in the detection of large scale alterations, while sensitive detection of small alterations is still possible in samples with ~50% purity (4). These characteristics were well within the clinical parameters of MCL. Thus, in Chapter 6 we analyzed 52 cases of MCL that represented the spectrum of the disease. In total, 62 regions of copy number alteration were defined. Of these, 20 were excluded from further analysis due to their occurrence as natural copy number variations. Of the 42 remaining somatic copy number alterations, we focused on those that denote a poor outcome amongst patients with MCL. We found seven loci that correlated with overall survival and constructed a survival model based on four of these loci (P = 5.87 × 10⁻⁶). Using gene expression we determined that CCNI and CCNG2 may be important in MCL pathogenesis (P = 0.0201 and P = 0.0292, respectively).
Our findings both reinforce the hypothesis that cell cycle deregulation and apoptosis are key factors in MCL pathogenesis and extend it to cell cycle checkpoints in addition to G1-S.

7.2 Concurrent studies

7.2.1 Concurrent developments in the detection of copy number alterations

Although, as stated, the SMRT array represented an order of magnitude increase in resolution over existing techniques to measure copy number alterations, it was not long before alternate technologies followed suit. While the SMRT array remains the only tiling-resolution BAC array, alternate strategies that replaced the BAC clones with synthetic oligonucleotides have been developed by several commercial companies. The first commercial platform used to assess copy number alterations was initially designed to measure SNPs; however, it was found that if enough control experiments were conducted to create a baseline of signal strengths for each probe, then copy number could be inferred (5). This analysis had several shortcomings, not the least of which was its dependence on the number and composition of the normal control experiments. In addition, the hybridization characteristics of the assay required the reduction of genomic complexity by whole-genome sampling assay (WGSA), in which restriction fragments from the genome were amplified by ligation-mediated PCR (LMPCR) (6). These LMPCR products provided amplified fragments from many specific loci throughout the genome, which were then assayed for copy number. Between the normal control and selective assessment issues, the detection characteristics were not an improvement over the SMRT array. However, an advantage of using this assay was that regions of loss of heterozygosity (LOH) could be identified within the same experiment. The shortcomings were overcome in later iterations of the technology, in which multiple restriction enzymes and software that better accommodated specific controls were introduced.
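
The baseline idea described above, inferring copy number for a probe by comparing its signal with the signal of the same probe in a set of normal control experiments, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited platform: the function names, the use of the per-probe median as the baseline, the log2 threshold of 0.3, and all intensity values are hypothetical.

```python
import math
from statistics import median


def log2_ratios(sample, normals):
    '''Return per-probe log2(sample / baseline).

    The baseline for each probe is the median intensity of that probe
    across the normal control experiments (an assumed choice).
    '''
    ratios = []
    for i, intensity in enumerate(sample):
        baseline = median(run[i] for run in normals)
        ratios.append(math.log2(intensity / baseline))
    return ratios


def call_copy_number(ratio, threshold=0.3):
    '''Classify a log2 ratio as gain, loss, or neutral (illustrative threshold).'''
    if ratio >= threshold:
        return 'gain'
    if ratio <= -threshold:
        return 'loss'
    return 'neutral'


# Hypothetical intensities for four probes: three normal controls, one tumour.
normals = [
    [1000.0, 2000.0, 1500.0, 800.0],
    [1100.0, 1900.0, 1500.0, 820.0],
    [900.0, 2100.0, 1450.0, 780.0],
]
tumour = [1000.0, 1000.0, 2250.0, 800.0]

ratios = log2_ratios(tumour, normals)
calls = [call_copy_number(r) for r in ratios]
print(calls)
```

In this toy data the second probe has half the baseline signal (log2 ratio of -1) and the third has 1.5 times the baseline, so they are called as a loss and a gain. With few or poorly matched controls the baseline itself becomes noisy, which is the dependence on control experiments noted in the text.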

Other synthetic oligonucleotide-based platforms emerged shortly after the development of the SMRT array (7-9). While the increased length of the oligonucleotides, to 60-70 bp, improved performance and eliminated the need for genomic complexity reduction, several probes were needed in order to make a confident copy number call. The original resolution of these arrays, encompassing tens of thousands of oligonucleotides, was comparable to the SMRT array; however, very recently an increase in the density of measured oligonucleotides to hundreds of thousands has improved the resolution by an order of magnitude (10).

7.2.2 MCL secondary genomic copy number assessment

Prior to our investigation of MCL model genomes, only standard karyotype and CGH studies had been reported, and those had little overlap in their conclusions. However, during and after our investigation several array CGH studies of MCL genomic copy number alterations were published (11-15). The earliest of these utilized hundreds to a couple of thousand BACs on an array to localize copy number alterations. Even at these relatively low resolutions, a consistent picture began to emerge about the location of copy number alterations in MCL, suggesting that the earlier inconsistencies were due to sensitivity as opposed to resolution. An additional study of MCL secondary copy number alterations using a 10K oligonucleotide SNP array provided corroborating evidence for the location and frequency of copy number alterations (16). However, the few studies that linked these findings to clinical outcome did not agree (to be discussed further in 7.2.4).

7.2.3 Natural copy number variation in the human genome

Natural copy number variation in the human genome exists at multiple levels. The work within this thesis addresses the variation in copy number of segments of the genome at or larger than ~80 kb.
Upon the development of techniques to measure copy number status throughout the genome at resolutions below 100 kb, it was immediately observed that even normal samples vary in copy number at multiple locations. This was something that needed to be controlled for while investigating somatic copy number alterations in cancer. Two initial reports brought these variations to the attention of the scientific world (17, 18). These were quickly followed by our study and others that attempted to define the locations of these variations throughout the genome (19-23). Looking at the overlap between these studies revealed that we were only beginning to appreciate the amount of copy number variation that exists in the human genome (24). In fact, recent studies have started linking these naturally occurring variations to disease states.

7.2.4 MCL pathogenesis

The determination of what makes MCL an aggressive, incurable disease essentially started with an investigation into the expression pattern of 101 MCL cases collected from centres around the world (1). The conclusion of this study was that an expression signature composed entirely of genes involved in cellular proliferation was highly predictive of outcome in MCL. This contrasted with a previous preliminary investigation of five MCL cases versus four hyperplastic lymph nodes, which concluded that, in addition to the t(11;14) translocation, expression of apoptosis-related genes varied greatly between the two groups (25). While both of these studies identified the end result of genomic aberrations, i.e. mRNA expression changes, they did not provide any insight into the primary genetic events that lead to these expression changes. Of the six other studies that investigated copy number alterations, only three addressed their impact on patient outcome. Unfortunately, the consistency between reports was once again very poor, with no single locus being linked with outcome in all three studies.
Essentially, we now knew where the alterations were in the genome and at what frequency but, with the exception of a few loci, had very little idea as to what impact each locus had and why.

7.3 Discussion and conclusions

To address the theme of this thesis, the determination of genomic alterations that lead to MCL pathogenesis, we have developed tools to detect copy number alterations throughout the genome, defined both somatic and constitutional copy number variation in MCL, and correlated these to patient outcome. The funding for this project is worth noting at this point because the majority of it came from a focused effort to understand MCL biology via the Lymphoma Research Foundation. Through the formation of the MCL Consortium, this focused effort encompassed not only our research but similar efforts throughout the world, with little overlap between studies. This provided a comprehensive picture of MCL biology developed in a non-random, efficient way. Through this concerted effort, MCL has gone from being a poorly understood disease to one that is well characterized. In essence, MCL pathogenesis proceeds from the initial t(11;14) translocation with an accumulation of genetic changes that provide two main forces in oncogenesis: deregulated cellular proliferation and anti-apoptotic behavior (1). The following describes the more frequent genetic alterations in MCL and their putative impact on these two essential biological functions.

While the t(11;14) translocation provides cell cycle deregulation, it is not sufficient to cause the disease. There are several copy number alterations that affect the same proliferation process and are undoubtedly involved in the G1-S phase transition of the cell cycle. One that directly synergizes with the primary t(11;14) translocation event is the loss of 9p21 and resultant under-expression of CDKN2A, the gene encoding p16.
This protein is an inhibitor of CDK4, a protein that combines with cyclin D1 to phosphorylate Rb and target it for ubiquitination, thus releasing E2F-1 to act as a transcription factor for S-phase related genes. The loss of p16, however, is not the only copy number alteration influencing this specific complex. Gains of CDK4 at 12q14 are also prevalent, as are losses of RB1 at 13q14. RB1 may not be the driving factor for losses in that region of the genome, as it does not lie within a minimally altered region (MAR). These alterations provide a driving force for the transition between G1 and S-phase of the cell cycle; however, additional cell cycle checkpoints exist. These checkpoints tend to be reliant on signaling from DNA damage sensors such as ATM and p53. ATM at 11q22 shows a frequent loss in copy number that is usually localized to that region, which supports ATM being the candidate gene in this region. ATM acts as a watchdog for DNA damage and its removal would inhibit cell cycle arrest. One of the main targets of ATM is the protein p53, which is phosphorylated by ATM in response to DNA damage. The p53 protein is commonly referred to as the cellular gatekeeper for growth and division (26). It is therefore not surprising that 17p13, which contains p53, also shows frequent losses of DNA copy number. A negative regulator of p53, MDM2, is often gained in copy number and over-expressed in MCL. However, its position at 12q13 is very near to CDK4, and determining which of these two important regulators is the driving factor for this genetic alteration is problematic, as the expression of both genes is increased with gains in copy number. It could be hypothesized that both are driving the maintenance of this copy number alteration. Another gene, with a less definitive role in both cellular proliferation and anti-apoptotic behaviour, is MYC at 8q24, one of the most often disrupted genes in cancer.
MCL is no exception to this, with frequent copy number gains in this region.

During the course of this thesis project, we developed a technique to assay copy number alterations at an unprecedented resolution (Chapter 2). Immediately upon creating data using this technique, we discovered that the data analysis techniques then in use were insufficient to analyze the massive amount of data created. Therefore, we developed an analysis software package that could be used to intuitively view and analyze high density copy number data created by this and subsequent array CGH platforms (Chapter 3). With these appropriate tools in place, I assayed MCL model genomes for recurrent alterations (Chapter 4). Upon investigating MCL model genomes and appropriate controls, I determined that natural copy number variation exists within the human genome and needed to be accounted for if I was to accurately address the hypothesis of my thesis. The investigation of natural copy number variation in the human genome revealed a greater diversity than previously imagined (Chapter 5). In fact, the ongoing study of this variation and how it influences disease susceptibility continues to grow in scope. However, for the purpose of this thesis, we utilized our newfound knowledge of CNVs to control our assessment of somatic alterations associated with MCL pathogenesis. With these background issues resolved, I analyzed a panel of MCL cases for somatic copy number alterations and correlated these to patient outcome (Chapter 6).

A comparison of the cell line model genomes of MCL with clinical specimens revealed that the alterations found in clinical specimens are well represented in the cell line model genomes. However, there were additional alterations that occurred only in the cell line model genomes. These alterations could be the result of adaptation to cell culture conditions or of increased genomic instability due to less selective pressure by the tumor environment.
As a group, the MCL cell line model genomes more closely represent the blastoid variant of MCL. Individually, however, the cell line model genomes show a range of blastoid features and heterogeneity of the common alterations found within the clinical samples. Therefore, we conclude that the MCL cell lines are a valuable resource for the investigation of MCL pathobiology, with the caveat that the specific characteristics of each need to be considered in the context of the research goal.

This thesis addresses the impact of copy number alterations on patient outcome. So how were the outcomes of the patients we studied influenced by copy number alterations? The alteration most consistently associated with deleterious outcome is the loss of 9p21 and, with it, p16. However, several other loci had an influence on patient outcome: losses of 4q13, 9q34, 13q14, and 15q26 and gains of 8q24 and 12q14. The gained regions have been discussed above; the four lost regions (4q13, 9q34, 13q14, and 15q26) have not. The loss of 13q14 represents one of the most frequent alterations in MCL and has also been described in chronic lymphocytic leukemia. Because this alteration is only 1.19 Mb in size, it contains just three expressed genes; however, it does contain a microRNA cluster that has been linked to the degradation of the BCL2 mRNA. Therefore, the loss of this microRNA cluster effectively increases the amount of BCL2 protein, which stabilizes the mitochondrial membrane and provides antiapoptotic behavior. This was not covered in the paragraph above, even though it is a well known mechanism, because it does not influence the cell cycle but instead stops cells from undergoing apoptosis. The remaining three regions are somewhat more mysterious, with little known about each. Within this thesis I have investigated the 4q13 loss of DNA and found that it also plays a role in the cell cycle.
Two genes within the 4q13 MAR, CCNG2 and CCNI, are atypical cyclins that cause cell cycle arrest and induction of apoptosis. Therefore, the loss of these genes and their function would allow the cell cycle to proceed more readily in the face of DNA damage that would normally induce apoptosis. The MAR at 15q26 was so small (1.05 Mb) that it contains no known genes or microRNAs; therefore no clear mechanism can be ascribed to this region. The MAR at 9q34 was frequent, and other regions of loss on 9q have been reported as influencing patient outcome. However, many of the cases that showed loss of this region also had loss of 9p21, the most influential alteration. In fact, if any region on chromosome 9 was analyzed for its influence on patient outcome, it would show significance. This is likely due to the number of cases with whole chromosome 9 loss that did poorly. It is therefore the conclusion of this researcher that the clinical significance of the 9q34 MAR is a result of its association with 9p21 loss.

7.4 Future directions

The genomic copy number alterations in MCL are now more clearly defined. However, as stated in the introduction, recent evidence has supported a role for both epigenetics and microRNAs in the development of cancer. While microRNAs can be assessed for copy number and expression in a similar manner to normal genes, epigenetic assessment requires substantially different techniques. There are a number of highly recurrent copy number alterations in MCL that still lack sufficient investigation to determine their role, particularly losses of 1p and gains of 7p. It is not a great leap to assume that, like 13q14, the driving force for these regions may be microRNAs. In addition, the epigenetics of the MCL genome are still poorly understood, with only selected genes having been assayed.
To increase our understanding of MCL, a concerted effort will be needed to assess copy number, gene heterozygosity, epigenetics, microRNA expression, and mRNA expression in MCL and to integrate the results into a clear picture. This effort would require a consortium to collect a panel of appropriate samples and get the specific fractions into the appropriate hands. Once all dimensions are assayed in the same panel of samples, bioinformatic tools capable of integrating them will need to be brought to bear to truly understand the pathogenesis of MCL.

7.5 References

1. Rosenwald A, Wright G, Wiestner A, Chan WC, Connors JM, Campo E, et al. The proliferation gene expression signature is a quantitative integrator of oncogenic events that predicts survival in mantle cell lymphoma. Cancer Cell 2003 Feb; 3(2): 185-197.
2. Ishkanian AS, Malloff CA, Watson SK, DeLeeuw RJ, Chi B, Coe BP, et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 2004 Mar; 36(3): 299-303.
3. Chi B, DeLeeuw RJ, Coe BP, MacAulay C, Lam WL. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 2004 Feb 9; 5: 13.
4. Garnis C, Coe BP, Lam SL, MacAulay C, Lam WL. High-resolution array CGH increases heterogeneity tolerance in the analysis of clinical samples. Genomics 2005 Jun; 85(6): 790-793.
5. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, et al. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res 2004 Feb; 14(2): 287-295.
6. Kennedy GC, Matsuzaki H, Dong S, Liu WM, Huang J, Liu G, et al. Large-scale genotyping of complex DNA. Nat Biotechnol 2003 Oct; 21(10): 1233-1237.
7. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc Natl Acad Sci U S A 2004 Dec 21; 101(51): 17765-17770.
8.
Brennan C, Zhang Y, Leo C, Feng B, Cauwels C, Aguirre AJ, et al. High-resolution global profiling of genomic alterations with long oligonucleotide microarray. Cancer Res 2004 Jul 15; 64(14): 4744-4748.
9. Carvalho B, Ouwerkerk E, Meijer GA, Ylstra B. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides. J Clin Pathol 2004 Jun; 57(6): 644-646.
10. Greshock J, Feng B, Nogueira C, Ivanova E, Perna I, Nathanson K, et al. A comparison of DNA copy number profiling platforms. Cancer Res 2007 Nov 1; 67(21): 10173-10180.
11. Flordal Thelander E, Ichimura K, Collins VP, Walsh SH, Barbany G, Hagberg A, et al. Detailed assessment of copy number alterations revealing homozygous deletions in 1p and 13q in mantle cell lymphoma. Leuk Res 2007 Sep; 31(9): 1219-1230.
12. Kohlhammer H, Schwaenen C, Wessendorf S, Holzmann K, Kestler HA, Kienle D, et al. Genomic DNA-chip hybridization in t(11;14)-positive mantle cell lymphomas shows a high frequency of aberrations and allows a refined characterization of consensus regions. Blood 2004 Aug 1; 104(3): 795-801.
13. Rubio-Moscardo F, Climent J, Siebert R, Piris MA, Martin-Subero JI, Nielander I, et al. Mantle-cell lymphoma genotypes identified with CGH to BAC microarrays define a leukemic subgroup of disease and predict patient outcome. Blood 2005 Jun 1; 105(11): 4445-4454.
14. Schraders M, Pfundt R, Straatman HM, Janssen IM, van Kessel AG, Schoenmakers EF, et al. Novel chromosomal imbalances in mantle cell lymphoma detected by genome-wide array-based comparative genomic hybridization. Blood 2005 Feb 15; 105(4): 1686-1693.
15. Tagawa H, Karnan S, Suzuki R, Matsuo K, Zhang X, Ota A, et al. Genome-wide array-based CGH for mantle cell lymphoma: identification of homozygous deletions of the proapoptotic gene BIM. Oncogene 2005 Feb 17; 24(8): 1348-1358.
16. Rinaldi A, Kwee I, Taborelli M, Largo C, Uccella S, Martin V, et al.
Genomic and expression profiling identifies the B-cell associated tyrosine kinase Syk as a possible therapeutic target in mantle cell lymphoma. Br J Haematol 2006 Feb; 132(3): 303-316. 17. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet 2004 Sep; 36(9): 949-951. 18. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science 2004 Jul 23; 305(5683): 525-528. 19. Conrad DF, Andrews TD, Carter NP, Hurles ME, Pritchard JK. A high-resolution survey of deletion polymorphism in the human genome. Nat Genet 2006 Jan; 38(1): 75-81. 20. Hinds DA, Kloek AP, Jen M, Chen X, Frazer KA. Common deletions and SNPs are in linkage disequilibrium in the human genome. Nat Genet 2006 Jan; 38(1): 82-85. 21. McCarroll SA, Hadnott TN, Perry GH, Sabeti PC, Zody MC, Barrett JC, et al. Common deletion polymorphisms in the human genome. Nat Genet 2006 Jan; 38(1): 86-92. 22. Sharp AJ, Locke DP, McGrath SD, Cheng Z, Bailey JA, Vallente RU, et al. Segmental duplications and copy-number variation in the human genome. Am J Hum Genet 2005 Jul; 77(1): 78-88. 23. Tuzun E, Sharp AJ, Bailey JA, Kaul R, Morrison VA, Pertz LM, et al. Fine-scale structural variation of the human genome. Nat Genet 2005 Jul; 37(7): 727-732. 24. Eichler EE. Widening the spectrum of human genetic variation. Nat Genet 2006 Jan; 38(1): 9-11. 25. Hofmann WK, de Vos S, Tsukasaki K, Wachsman W, Pinkus GS, Said JW, et al. Altered apoptosis pathways in mantle cell lymphoma detected by oligonucleotide microarray. Blood 2001 Aug 1; 98(3): 787-794. 26. Levine AJ. p53, the cellular gatekeeper for growth and division. Cell 1997 Feb 7; 88(3): 323-331. 
Appendices Research ethics board certificate of approval Biology of lymphoid cancers UBC BCCA Research Ethics Board Fairmont Medical Building (6th Floor) 614 - 750 West Broadway Vancouver, BC V5Z 1H5 Tel: (604) 877-6284 Fax: (604) 708-2132 E-mail: reb@bccancer.bc.ca Website: http://www.bccancer.bc.ca > Research Ethics RISe: http://rise.ubc.ca University of British Columbia - British Columbia Cancer Agency Research Ethics Board (UBC BCCA REB) Certificate of Expedited Approval: Annual Renewal PRINCIPAL INVESTIGATOR: Joseph Connors INSTITUTION / DEPARTMENT: BCCA/BCCA/Systemic Therapy - VA (BCCA) REB NUMBER: H05-60103 INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT: Institution: BC Cancer Agency Site: Vancouver BCCA Other locations where the research will be conducted: N/A PRINCIPAL INVESTIGATOR FOR EACH ADDITIONAL PARTICIPATING BCCA CENTRE: Vancouver: Joseph Connors Vancouver Island: N/A Fraser Valley: N/A Southern Interior: N/A SPONSORING AGENCIES AND COORDINATING GROUPS: British Columbia Cancer Foundation Genome British Columbia Genome Canada National Cancer Institute of Canada National Institutes of Health - National Cancer Institute Terry Fox Laboratory PROJECT TITLE: Biology of Lymphoid Cancer APPROVAL DATE: January 5, 2009 EXPIRY DATE OF THIS APPROVAL: January 5, 2010 PAA#: H05-60103-A006 CERTIFICATION: 1. The membership of the UBC BCCA REB complies with the membership requirements for research ethics boards defined in Division 5 of the Food and Drug Regulations of Canada. 2. The UBC BCCA REB carries out its functions in a manner fully consistent with Good Clinical Practices. 3. The UBC BCCA REB has reviewed and approved the research project named on this Certificate of Approval including any associated consent form and taken the action noted above. This research project is to be conducted by the provincial investigator named above. This review and the associated minutes of the UBC BCCA REB have been documented electronically and in writing. 
The UBC BCCA Research Ethics Board has reviewed the documentation for the above named project. The research study as presented in documentation, was found to be acceptable on ethical grounds for research involving human subjects and was approved for renewal by the UBC BCCA REB. UBC BCCA Ethics Board Approval of the above has been verified by one of the following: If you have any questions, please call: Bonnie Shields, Manager, BCCA Research Ethics Board: 604-877-6284 or e-mail: reb@bccancer.bc.ca Dr. George Browman, Chair: 604-877-6284 or e-mail: gbrowman@bccancer.bc.ca Dr. Joseph Connors, First Vice-Chair: 604-877-6000-ext. 2746 or e-mail: jconnors@bccancer.bc.ca Dr. Lynne Nakashima, Second Vice-Chair: 604-707-5989 or e-mail: lnakas@bccancer.bc.ca Online supplementary material Chapter 2: A tiling resolution DNA microarray with complete coverage of the human genome Supplementary material for this chapter is available online at the following url: http://www.nature.com/ng/journal/v36/n3/suppinfo/ng1307_S1.html Chapter 4: Comprehensive whole genome array CGH profiling of mantle cell lymphoma model genomes Supplementary material for this chapter is available online at the following url: http://hmg.oxfordjournals.org/cgi/content/full/ddh195/DC1 Chapter 5: A comprehensive analysis of common copy-number variations in the human genome Supplementary material for this chapter is available online at the following url: http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17160897#supplementary-material-sec "@en . "Thesis/Dissertation"@en . "2009-11"@en . "10.14288/1.0067521"@en . "eng"@en . "Pathology"@en . "Vancouver : University of British Columbia Library"@en . "University of British Columbia"@en . "Attribution-NonCommercial-NoDerivatives 4.0 International"@en . "http://creativecommons.org/licenses/by-nc-nd/4.0/"@en . "Graduate"@en . "Mantle cell lymphoma pathogenesis"@en . "Text"@en . "http://hdl.handle.net/2429/12469"@en .