Open Collections

UBC Theses and Dissertations


Concerted genomic and epigenomic alterations in non-small cell lung cancer Wilson, Ian Michael 2010


Full Text

CONCERTED GENOMIC AND EPIGENOMIC ALTERATIONS IN NON-SMALL CELL LUNG CANCER

by

Ian Michael Wilson
B.M.L.Sc., The University of British Columbia, 2002

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate Studies (Pathology and Laboratory Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2010

© Ian Michael Wilson, 2010

Abstract

Background: Around the world, lung cancer is the leading cause of cancer-related death and a major public health problem. A diagnosis of lung cancer carries a remarkably poor prognosis, even after years of research into the disease. The advent and availability of tools to survey the genomes and epigenomes of lung cancers is beginning to yield real clues into the molecular nature of the disease. These clues are being turned into new diagnostic and therapeutic tools with increasing regularity. We used modern high-resolution and high-throughput tools to identify novel genes implicated in various lung cancer phenotypes and aspects of lung cancer pathogenesis.

Hypotheses: (1) Combined profiling of lung cancer genomes and epigenomes will identify critical lung cancer genes that are simultaneously affected by DNA copy number and DNA methylation aberrations. (2) The susceptibility locus on chromosome 6q, identified through familial linkage studies, contains an unidentified tumor suppressor gene. (3) Key genes involved in lung cancer phenotypes can be identified through the elucidation of discriminating genomic and epigenomic alterations.

Materials/Methods: Genomic and epigenomic data were analyzed independently and pair-wise to identify genes in NSCLC whose alterations are associated with NSCLC risk, development and phenotype. These genomic and epigenomic data were used in conjunction with a multitude of mRNA and protein-level assays to further refine candidate lists and validate their disruption.
Targeted molecular silencing of a candidate TSG was used in conjunction with cellular assays to investigate and confirm the role of this gene in NSCLC.

Results: We designed and optimized an experimental/analysis framework for the combined interrogation of epigenomic and genomic data. We used this framework to identify a novel lung cancer tumor suppressor gene, EYA4, that is frequently disrupted in lung cancers, and is associated with NSCLC risk. Following this, we identified subtype-specific genomic and epigenomic alterations with consequent gene expression changes in NSCLC subtypes. Lastly, we identified specific phenotypic characteristics of the subtypes affected by the DNA alterations.

Conclusions: Integrated analysis of the genomes and epigenomes of NSCLC tumors provides a unique approach for the discovery of key cancer-related genes.

Table of contents

Abstract
Table of contents
List of tables
List of figures
List of abbreviations
Acknowledgements
Dedication
Co-authorship statement
1.  Introduction
  1.1.  Lung cancer
  1.2.  Histology
  1.3.  Staging and prognosis for NSCLC
  1.4.  Somatic genetic alterations
    1.4.1.  Mutation
    1.4.2.  Loss of heterozygosity
    1.4.3.  Copy number alteration
  1.5.  DNA methylation and lung cancer
    1.5.1.  Normal DNA methylation
    1.5.2.  Interplay of DNA methylation and histone modification
    1.5.3.  DNA methylation analysis methods
    1.5.4.  DNA hypomethylation
    1.5.5.  DNA hypermethylation
    1.5.6.  Association of DNA methylation with tobacco smoke
  1.6.  Lineage specificity
  1.7.  Susceptibility and etiology
  1.8.  Hypotheses and objectives
  1.9.  Specific aims and thesis outline
  1.10.  References
2.  Epigenomics: mapping the methylome
  2.1.  Introduction
  2.2.  Results
  2.3.  Discussion
  2.4.  References
3.  EYA4 is a non-small cell lung cancer tumor suppressor located in the susceptibility locus on chromosome 6q
  3.1.  Introduction
  3.2.  Results
    3.2.1.  Integration of array CGH and DNA methylation identifies two-hit genes
    3.2.2.  Gene expression analysis of two-hit candidates
    3.2.3.  Validation and analysis of EYA4 disruption in lung specimens
    3.2.4.  Assessment of EYA4 re-expression by treatment with 5'-azacytidine
    3.2.5.  Prognostic relevance of genes in 6q23-25
    3.2.6.  Association of EYA4 genotype with familial lung cancer risk
    3.2.7.  Effect of EYA4 knockdown on apoptosis
    3.2.8.  Identification of potential EYA4 activation targets
    3.2.9.  Frequency of EYA4 inactivation in NSCLC squamous cell subtype
    3.2.10.  Analysis of EYA4 in pre-invasive lung cancer
    3.2.11.  Expression of EYA4 in other malignancies
  3.3.  Discussion
  3.4.  Materials and methods
    3.4.1.  Sample collection and nucleic acid extraction
    3.4.2.  Array CGH
    3.4.3.  DNA methylation profiling
    3.4.4.  Array CGH/DNA methylation integration
    3.4.5.  Gene expression profiling and data processing
    3.4.6.  Quantitative real-time PCR analysis of mRNA expression
    3.4.7.  Quantitative real-time methylation-specific PCR
    3.4.8.  5'-azacytidine treatment
    3.4.9.  DNA methylation and gene expression correlation
    3.4.10.  EYA4 mRNA knockdown
    3.4.11.  Survival analysis
    3.4.12.  Function analysis
    3.4.13.  AnnexinV/propidium iodide staining
    3.4.14.  Serial analysis of gene expression data analysis
    3.4.15.  Analysis of familial lung cancer genotypes
    3.4.16.  Statistical analysis
    3.4.17.  Microarray data deposition
  3.5.  References
4.  Genetics and epigenetics contribute to AC and SqCC tumor phenotypes
  4.1.  Introduction
  4.2.  Results
    4.2.1.  Assessment of global genomic instability in AC and SqCC
    4.2.2.  Identification of differential copy-number alteration patterns in AC and SqCC
    4.2.3.  Subtype specific gene disruption in AC and SqCC
    4.2.4.  Different gene networks are associated with the development of AC and SqCC
    4.2.5.  Global subtype differences in DNA methylation levels
    4.2.6.  Subtype-specific epigenetic alterations in AC and SqCC subtypes
    4.2.7.  Epigenetically regulated genes complement genetically regulated genes
    4.2.8.  Concerted genetic and epigenetic disruption of subtype-specific genes
    4.2.9.  Subtype-specific genes are associated with distinct clinical characteristics
  4.3.  Discussion
  4.4.  Materials and methods
    4.4.1.  DNA samples
    4.4.2.  Tiling path array comparative genomic hybridization
    4.4.3.  DNA methylation analysis
    4.4.4.  Comparison of subtype alteration frequencies
    4.4.5.  Gene expression microarray analysis
    4.4.6.  Statistical analysis of gene expression data
    4.4.7.  Survival analysis
    4.4.8.  Network identification
  4.5.  References
5.  Conclusions
  5.1.  Research summary
  5.2.  Development of a strategy and technology for two-hit detection of tumor genes
  5.3.  Integrated analysis of DNA methylation and DNA copy number identifies EYA4 as a novel lung tumor suppressor gene
  5.4.  The role of DNA copy number and DNA methylation alterations in NSCLC subtype phenotypes
  5.5.  Significance of work
  5.6.  Future directions
  5.7.  References
Appendix
  Appendix A – UBC Research Ethics Board Certificates Approval

List of tables

Table 3.1  Frequently deleted and hypermethylated probes in lung adenocarcinoma
Table 3.2  Genes with expression patterns in concordance with EYA4 in non-malignant bronchial epithelial cells
Table 4.1  Sample set clinical characteristics

List of figures

Figure 1.1  Lung cancer classification
Figure 2.1  MeDIP-array CGH schema
Figure 2.2  Epigenomic instability in lung cancer
Figure 2.3  Alignment of epigenomic and genomic profiles
Figure 3.1  Frequent disruption and down-regulation of EYA4
Figure 3.2  EYA4 hypermethylation controls gene expression
Figure 3.3  EYA4 expression is associated with poor survival and familial lung cancer risk
Figure 3.4  EYA4 promotes apoptosis
Figure 3.5  DNA methylation and mRNA expression of EYA4 in SqCC and pre-invasive samples
Figure 3.6  EYA4 expression in other malignancies
Figure 4.1  Copy number alterations in AC and SqCC
Figure 4.2  Differential expression as a result of subtype specific copy number alterations
Figure 4.3  Gene networks involved in the development of SqCC and AC
Figure 4.4  Global DNA methylation patterns of NSCLC tumors and associated normal tissues
Figure 4.5  SCLC signaling is significantly enriched in epigenetically altered SqCC genes
Figure 4.6  MAPK1 alteration and survival is different in AC and SqCC tumors
Figure 5.1  Hypomethylation of an amplicon in lung adenocarcinoma cells

List of abbreviations

Abbreviation  Definition
AC            Adenocarcinoma
aCGH          Array comparative genomic hybridization
AFP           Amplified fragment pool
BAC           Bacterial artificial chromosome
bp            Base pairs
cDNA          Complementary DNA
CIS           Carcinoma in situ
CNV           Copy number variation
DNA           Deoxyribonucleic acid
FISH          Fluorescence in situ hybridization
GELCC         Genetic epidemiology of lung cancer consortium
kbp           Kilo-base pairs
LC            Large cell
LIC           Large insert clone
LOH           Loss of heterozygosity
LOI           Loss of imprinting
Mbp           Mega-base pairs
MeDIP         Methylated DNA immunoprecipitation
mM            Millimolar
mRNA          Messenger RNA
NSCLC         Non-small cell lung cancer
PCR           Polymerase chain reaction
RNA           Ribonucleic acid
RMA           Robust multi-chip averaging
SAGE          Serial analysis of gene expression
SCLC          Small-cell lung cancer
SNP           Single nucleotide polymorphism
SqCC          Squamous cell carcinoma
Xi            Inactive X-chromosome
Xa            Active X-chromosome

Acknowledgements

I would like to acknowledge the contributions of co-authors involved in the preparation of the manuscripts for chapters 2-4.

Chapter 2: I wish to thank Jon Davies, Michael Weber, Carolyn Brown, Carlos Alvarez, Calum MacAulay, Dirk Schübeler and Wan Lam for input and helpful discussion.

Chapter 3: I wish to thank Emily Vucic, Yu-An Zhang, Daniel Starczynowski, Kim Lonergan, Timon Buys, and Katey Enfield for experimental assistance; Raj Chari for help with data analysis; Ite Laird-Offringa, Pengyuan Liu, Ming You, and Marshall Anderson for providing additional DNA methylation and genotype data; John Yee and Stephen Lam for providing specimens; and Aly Karsan, Calum MacAulay, Adi F. Gazdar and Wan Lam for their critiques and help assembling this complex manuscript.
Chapter 4: We wish to thank Raj Chari and Bradley Coe for help with data analysis; John Yee, John English, Nevin Murray, and Ming-Sound Tsao for providing specimens; John Minna, Adi Gazdar, Calum MacAulay, Stephen Lam, and Wan Lam for useful discussion and insights.

I am also thankful for the opportunity to acknowledge the continual and extremely valuable support I have received from my supervisor and supervisory committee. Drs. Carolyn Brown, Stephen Lam, Calum MacAulay, and David Walker (chair) have together provided me with the tools in critical thinking without which I would never have made it this far. Dr. Wan Lam has provided me with everything I have needed and has been a tremendous supervisor, ally, and friend.

Dedication

For Jill, my family, and my friend(s).

Co-authorship statement

Chapters 2 through 4 were originally co-authored as research manuscripts for publication. The entries below represent the complete citations for each of these works.

Chapter 2: Wilson IM, Davies JJ, Brown CJ, Weber M, Alvarez CE, MacAulay C, Schübeler D, Lam WL (2006) Epigenomics: mapping the methylome. Cell Cycle 5:1558.

Contribution: I conceived and wrote the manuscript, made the figures, co-performed all experiments, and performed all data analysis. Co-authors assisted with data interpretation and assay development.

Chapter 3: Wilson IM, Vucic EA, Chari R, Zhang YA, Starczynowski DT, Lonergan KA, Enfield KS, Buys TPH, Yee J, Laird-Offringa I, Karsan A, Liu P, You M, Anderson M, MacAulay CA, Lam S, Gazdar AF, Lam WL (2010) EYA4 is a non-small cell lung cancer tumor suppressor located in the susceptibility locus on chromosome 6q.

Contribution: I conceived the experimental design, performed many of the experiments, performed most of the data analysis and statistical assessments, and wrote the paper. Co-authors assisted with additional experiments and data processing, and provided some genomic profiles.
Chapter 4: Lockwood WW*, Wilson IM*, Chari R, Coe BP, Yee J, English J, Murray N, Tsao MS, Minna JD, Gazdar AF, MacAulay C, Lam S, Lam WL (2010) Genetics and epigenetics contribute to AC and SqCC tumor phenotypes. *Contributed equally

Contribution: I am co-lead author on this manuscript. I generated genomic profiles used in the study, and performed much of the data analysis, interpretation, and writing. Additional co-authors assisted with data processing and procurement of samples.

1.  Introduction

1.1.  Lung cancer

Lung cancer is the most common cause of cancer-related mortality in the world, and is most commonly attributed to cigarette smoking [1], although other carcinogens, such as arsenic and asbestos, and significant genetic factors are implicated in the disease as well [2-5]. Combined, the causes of lung cancer result in an estimated 1.2 million deaths annually [6]. Although incidence and mortality of the disease have begun to plateau in developed countries, tobacco use is endemic in many areas of the world where lung cancer incidence continues to rise [6,7]. As with other epithelial cancers, most cancers of the lung are thought to be a result of the sequential accumulation of molecular alterations in a step-wise fashion [8-10]. Work with both mouse and human models of carcinogenesis has shown that multiple events are required to initiate the carcinogenic process [11].

1.2.  Histology

Lung cancers are partitioned histologically and clinically into either small cell lung cancer (SCLC) or non-small cell lung cancer (NSCLC) subgroups [10]. The NSCLC group is further divided into those tumors classified as squamous cell carcinoma (SqCC), adenocarcinoma (AC), and large-cell carcinoma (LC) [10]. NSCLC is more prevalent than SCLC, and SqCC and AC are the predominant subtypes [1].
SqCC tends to arise more centrally than AC, and generally follows a stepwise progression from hyperplasia of the pseudostratified ciliated epithelia to metaplasia, dysplasia, carcinoma in situ and invasive carcinoma [10]. The progression of AC is less well defined, but it is thought that atypical adenomatous hyperplasia (AAH) is the precursor lesion [10,12]. Please see Figure 1.1 for a schematic representation of lung cancer classification [10]. Varying levels of differentiation are typical, with LC being the least differentiated, and SqCC the most. AC is known for commonly having a large degree of heterogeneity when compared to the other subtypes [13]. All types of lung cancer are associated with the smoking of tobacco; however, of lung cancers arising in non-smokers, AC is the most frequent, with SqCC and SCLC the least [14,15].

1.3.  Staging and prognosis for NSCLC

The overall 5-year survival rate for NSCLC is a paltry 16% [16]; however, significant differences in prognosis are associated with disease staging at time of diagnosis. For example, patients diagnosed with early localized disease will, on average, live longer (half live longer than 5 years) than those diagnosed with metastatic disease because they show better response to existing therapies (surgical resection, chemotherapy, radiotherapy) [16]. The typically late stage of NSCLC diagnosis coupled with relatively ineffective systemic therapies are thought to be the main factors contributing to the poor survival statistics [16].

1.4.  Somatic genetic alterations

1.4.1.  Mutation

Mutations to DNA sequences caused by the mutagenic compounds found in cigarette smoke and other environmental carcinogens such as arsenic are well characterized [17]. These sequence alterations occur at tumor suppressor loci such as TP53 and in oncogenes such as EGFR, and are present in many tumors as well as pre-cancerous tissues [18].
This highlights the ability of a sequence of mutations to promote carcinogenesis by both abrogating/altering the function of tumor suppressor genes, and enhancing the function or expression of proto-oncogenes. Mutations are known to precede frank cancer, as they have even been detected in smokers as well as former smokers with and without cancer [8,19]. In lung cancer patients, it is thought that exposure to mutagens present in tobacco smoke is responsible for the bulk of observed mutations [17]. Indeed, DNA mutations are so tightly linked with cigarette smoking that distinct differences in mutational profiles exist between the tumors of smokers and nonsmokers [15].

Somatic alterations to the DNA sequence (mutations) may be discovered in numerous ways, including the common DNA sequencing technologies that have been developed. Single base point mutations may be transversions (purine <-> pyrimidine) or transitions (purine <-> purine or pyrimidine <-> pyrimidine), either of which may or may not affect the protein amino acid sequence. Insertions or deletions, known as indels, also occur [17]. Once the mutational spectrum of a gene is known, it is possible to devise specific assays to more quickly query the mutational status of the gene [20]. Recently, however, a shift has occurred from primarily gene-specific mutational analysis to whole-genome sequencing of tumors. This has been enabled by the advent of numerous high-throughput approaches to sequencing by synthesis, which comprise the current generation of sequencing technologies [21]. The breadth and scope of somatic mutations in lung cancer has been demonstrated using these technologies to sequence the genome of a small-cell lung cancer cell line [22]. Doing so identified a staggering 22,910 mutations, including over 100 in coding exons. Not only were the detected aberrations exceedingly numerous, but they also bore the tell-tale signatures of tobacco smoke-associated mutation (G -> T transversions).

1.4.2.  Loss of heterozygosity

The reduction of genomic regions from a bi-allelic state to a mono-allelic state, which is commonly observed in NSCLC tumors [23], can be detected throughout the airways of both current and former smokers. Indeed, this loss of heterozygosity (LOH) is detectable in nearly all people with a history of smoking [24-26]. The extent of lung injury caused by cigarette smoke is demonstrated by the fact that LOH exists throughout the smoke-damaged lung in those with and without cancer [25]. Because LOH is a very early molecular change that is detectable well before clinically apparent lung cancer, it is thought to play a significant role in the initiation of carcinogenesis [10,25,27,28]. Accordingly, LOH occurs frequently at tumor suppressor gene loci (such as the p16 locus at chromosome 9p21), and is associated with the under-expression of the genes at these loci [23,26,29-31]. Loss of heterozygosity at such loci likely serves as either a first or second hit in a 'two hit' scenario, as gene silencing by DNA hypermethylation and sequence mutation are associated with LOH [26,31]. For example, if one allele of a TSG is silenced by hypermethylation, subsequent LOH (deletion) of the other allele would leave no functional copies of the gene. Alternatively, if LOH occurs first, then only one mutation or DNA hypermethylation event at the locus is required to abrogate gene function.

Recent work, however, suggests that LOH is not limited to regions of copy number deletion [32,33]. Somatic uniparental disomy is becoming more recognized thanks to the advent of modern single-nucleotide polymorphism detecting arrays (SNP chips) [33]. This type of alteration has been characterized for both tumor suppressor and oncogene loci. In each case, the mutated copy (loss of function for TSG, gain of function for oncogene) of the gene may be present on two copies of an allele with a common parental origin, while the wild type allele is lost [33].
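The distinction drawn above between LOH caused by deletion and LOH without a net change in copy number (as in somatic uniparental disomy) can be made concrete with a small sketch. It assumes that allele-specific copy numbers are already available for a locus that was heterozygous in the germline; the function name `classify_locus` and the category labels are hypothetical and are not taken from this thesis.

```python
def classify_locus(copies_a: int, copies_b: int) -> str:
    """Classify a germline-heterozygous (A/B) locus in a tumor.

    copies_a and copies_b are the tumor copy numbers of the two
    parental alleles (assumed inputs, e.g. from a SNP array).
    """
    if copies_a < 0 or copies_b < 0:
        raise ValueError("copy numbers must be non-negative")

    total = copies_a + copies_b
    if total == 0:
        return "homozygous deletion"
    if copies_a > 0 and copies_b > 0:
        return "heterozygosity retained"

    # From here on exactly one parental allele remains, so the locus shows LOH.
    if total == 1:
        return "LOH by deletion"
    if total == 2:
        # Two copies of the same parental allele: consistent with
        # uniparental disomy (copy-neutral LOH).
        return "copy-neutral LOH"
    return "LOH with copy number gain"


if __name__ == "__main__":
    for a, b in [(1, 1), (1, 0), (2, 0), (0, 3), (0, 0)]:
        print((a, b), classify_locus(a, b))
```

The point of the sketch is only that total copy number and allelic state are separate observations: a locus with two copies can still have lost heterozygosity.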
Genotyping experiments, such as those performed on SNP chips, are now commonly used in cancer research as they provide numerous types of data [34-36]. Loss of heterozygosity is detectable by comparing heterozygous loci from a non-tumor sample to the same loci in a tumor from the same patient. If paired non-tumor and tumor samples are not available, it is possible to infer LOH based on pre-existing genotypic databases such as those generated by the HapMap project [37]. The SNP data can also be used for genome-wide association studies, which are extensively used in the identification of alleles associated with increased risk, survival, and other clinical correlates [3,34,38-40]. Lastly, the new generation of SNP arrays has proven to yield high quality DNA copy number data, including information such as allele-specific copy number, while also giving researchers the ability to assess normal cell contamination [41,42].

1.4.3.  Copy number alteration

Numerous alterations to the normal diploid complement of the human genome are associated with NSCLC [43-45]. NSCLC genomes are typically highly rearranged, exhibiting aneuploidy as well as segmental copy number alterations (duplications/deletions) [43,46-51]. DNA copy number alterations exist at all stages of tumorigenesis, including within histologically normal pre-neoplastic cells [52]. Common regions of loss such as those observed at chromosome 3p, 9p, and 17p are known to harbor tumor suppressor genes, while oncogenes are known to reside in regions of frequent copy number gain [9,44,53,54]. Loci exhibiting DNA deletion may be either homozygous or hemizygous, while regions of DNA gain may exhibit as few as one extra copy or as many as 80 [44]. Regions of DNA known to be greatly increased in copy number are referred to as amplicons, and they exist within the tumor genome in a myriad of arrangements based on the genomic mechanism by which they are derived [55].
For example, amplicons may exist as many extrachromosomal entities, as identical copies of a gene distributed throughout different chromosomes, or as tandem or head-to-tail arrangements of many copies at a single locus [54,55].

DNA copy number alterations can be detected in nearly as many ways as they are generated by the tumor cell. Site-specific assays such as fluorescence in situ hybridization and PCR gave way to spectral karyotyping, and eventually to array comparative genomic hybridization (CGH) [56]. Various incarnations of array CGH technologies have been widely used to survey cancer genomes for copy number alterations, and have yielded significant insights into disease processes [44,56-58]. Array targets may be large fragments of human genomic DNA inserted into prokaryotic or simple eukaryotic constructs for propagation, known as large-insert clones (LIC). This is the design of the British Columbia Cancer Research Center array manufactured by the Wan Lam lab [59]. In its current incarnation this array of spotted elements covers the entire human genome with an average detection resolution of ~80 kb [60]. This array design has been used extensively in this thesis, and in numerous other published works [44,57,58,61-65]. Oligonucleotide microarrays make up the other main group of CGH arrays. These oligonucleotides may be either spotted or synthesized directly on the chip [56]. While the number of discrete elements on such an array can be much higher than on spotted LIC arrays (2 x 10^6 vs. 6 x 10^4, respectively), the resolution of these types of arrays is limited by the increased noise at each element compared to that of an LIC array [56,60].

Array CGH experiments are technically simple to perform. Typically, repeat-blocked fluorescently labeled probes generated from a diploid reference sample and probes from a tumor sample are competitively co-hybridized to the microarray [59].
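To illustrate how a two-channel co-hybridization is commonly summarized, the following minimal sketch converts per-element tumor and reference intensities into log2 ratios and makes a naive per-element call. It is not the processing used in this thesis: the median centering is a crude stand-in for proper normalization, the 0.2 threshold is an arbitrary illustrative value, and all function names are hypothetical.

```python
import math
from statistics import median


def log2_ratios(tumor, reference):
    """Per-element log2(tumor / reference) from two intensity lists."""
    if len(tumor) != len(reference):
        raise ValueError("channels must have the same number of elements")
    return [math.log2(t / r) for t, r in zip(tumor, reference)]


def median_center(ratios):
    """Shift ratios so that their median is zero (assumed normalization)."""
    m = median(ratios)
    return [x - m for x in ratios]


def call_elements(ratios, threshold=0.2):
    """Label each element as gain, loss, or neutral (illustrative threshold)."""
    calls = []
    for x in ratios:
        if x >= threshold:
            calls.append("gain")
        elif x <= -threshold:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


if __name__ == "__main__":
    tumor = [100, 200, 50, 100, 100]       # made-up intensities
    reference = [100, 100, 100, 100, 100]  # made-up intensities
    ratios = median_center(log2_ratios(tumor, reference))
    print(call_elements(ratios))
```

Real analyses call gains and losses on segments of adjacent elements rather than on single elements, which makes them far less sensitive to noise at any one spot.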
Once images of the microarray have been acquired and processed, systematic biases in the hybridization experiment must be removed. This can be performed using readily available software packages which remove dye bias, slide gradients, and background biases [66]. DNA alteration data may then be inferred either visually or computationally. Numerous computational approaches exist for detecting segmental DNA gains and losses in aCGH data [66-70]. Microarrays designed to assess SNP genotypes have also been adapted to copy number detection, and while their increased resolution compared to other platforms has been debated [60], they are constantly improving and are widely used [71,72]. Recent implementations of massively parallel sequencing technologies can yield DNA copy number information, as well as structural and mutational information, and are likely to be important in years to come [22].

1.5.  DNA methylation and lung cancer

1.5.1.  Normal DNA methylation

In humans, the most common and best-studied covalent modification to double-stranded DNA molecules is the addition of a methyl group (-CH3) to the 5-carbon of cytosines. This occurs nearly exclusively in cytosines located 5' to a guanine (CpG dinucleotide); however, there is recent evidence that non-CpG cytosine methylation is more widespread than previously thought [73,74]. Methylated cytosines in CpG dinucleotides are thought to account for 3-4% of cytosines in the genome [75], although their distribution is not random [76]. CpG dinucleotides are underrepresented in the genome as a whole, but are enriched in evolutionarily conserved motifs known as CpG islands [77]. These islands are present in the promoter region of just over half of human genes, where they are largely unmethylated in contrast to the heavily methylated CpG dinucleotides that exist elsewhere in the genome [78,79].
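The under-representation of CpG dinucleotides, and their enrichment within CpG islands, is commonly expressed as an observed/expected CpG ratio. The following is a minimal Python sketch of that calculation, not a method used in this thesis. The function names, the toy sequences, and the default cut-offs (GC content above 0.5 and an observed/expected ratio above 0.6, values often quoted for CpG islands) are assumptions chosen for illustration, and the minimum-length requirement usually applied alongside them is omitted.

```python
def cpg_stats(seq):
    """Return (gc_content, cpg_observed_over_expected) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c = seq.count("C")
    g = seq.count("G")
    # "CG" cannot overlap with itself, so str.count gives the exact number
    cpg = seq.count("CG")
    gc_content = (c + g) / n
    # Expected CpG count if C and G occurred independently is c * g / n
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_gc=0.5, min_obs_exp=0.6):
    """Apply the (assumed) GC-content and observed/expected cut-offs."""
    gc_content, obs_exp = cpg_stats(seq)
    return gc_content > min_gc and obs_exp > min_obs_exp


# Toy sequences, far shorter than any real island
cpg_rich = "CGCGCGCG"
cpg_poor = "ATATCATGAT"
rich_call = looks_like_cpg_island(cpg_rich)
poor_call = looks_like_cpg_island(cpg_poor)
```

A ratio near 1 means CpG occurs about as often as base composition alone would predict, while bulk genomic sequence, in which CpG is depleted, gives values well below 1.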
DNA methylation is a functional modification which, in concert with chromatin remodeling machinery, aids in the establishment and maintenance of heterochromatin and the control of replication timing [80,81], and abrogates inappropriate transcription of parasitic DNA elements [82,83]. In addition to these higher-order roles in the maintenance of genomic architecture, DNA methylation also has a role in regulating imprinted genes (allele-specific expression) [84-86], the disruption of which has been linked to numerous disease states including cancer [87-92]. DNA methylation is also involved in X-chromosome inactivation, where promoters on the inactive X become densely methylated and silenced [93,94]. As with imprinting, the misregulation of X-chromosome inactivation has been detected and studied in the context of cancers [95-97].

1.5.2.  Interplay of DNA methylation and histone modification

Along with stable modifications to DNA nucleotides (such as DNA methylation), there are numerous altered forms of the proteins that package, surround, and shape the DNA double strand [81,84]. DNA is wrapped around nucleosomes, typically composed of two copies each of histone-H2A, H2B, H3, and H4, which together make chromatin [98]. Within the context of DNA methylation, chromatin structure, and gene regulation, modifications to the N-terminal tail of histone-H3 are the best studied [98,99]. Methylation of histone-H3-lysine-9 (H3K9) is known to be associated with an inactive chromatin configuration and transcriptional silencing, while acetylation of the same residue is associated with the opposite [84,99,100]. Indeed, for twenty-five years the modification of histone tails to engender a closed chromatin conformation has been thought to be the primary mechanism by which DNA methylation results in gene silencing [101].
This link between DNA methylation and histone structure was explained by the identification of proteins such as methyl-CpG binding domain 1 (MBD1) and methyl-CpG binding protein 2 (MeCP2) that specifically bind methylated DNA [102,103]. These proteins subsequently associate with other proteins that establish and maintain methyl marks, such as the DNA methyltransferase family, and with enzymes such as histone deacetylases (HDACs) and histone methyltransferases (HMTs), which modify histone tails [104,105].

1.5.3.  DNA methylation analysis methods

Deciphering the many roles played by DNA methylation has required the development of novel techniques, as well as the adaptation of others. The root of all DNA methylation assays is the ability to discriminate between methylated and unmethylated cytosines, which can be achieved by exploiting methylation-sensitive/insensitive isoschizomer restriction-enzyme pairs, chemical conversion of unmethylated cytosine to uracil, and, most recently, the affinity for methylated DNA of specially developed antibodies and methylated-DNA binding proteins [106,107]. Restriction-landmark genome scanning, methylation-sensitive representational difference analysis, and methylation-specific digital karyotyping usually exploit rare-cutting methylation-sensitive restriction enzymes, and were some of the first genome-wide scans enabling identification of aberrantly methylated regions [108-111]. Further refinement of these techniques and adaptation of the protocols to work with generic and epigenome-tailored microarrays has yielded significant results [112-116]. Although the quantity of DNA needed was once a limiting concern in these assays, often precluding the use of clinical specimens, the incorporation of various amplification technologies has allowed the profiling of small tumor samples [117].
Affinity methods such as methylated-DNA immunoprecipitation have evolved somewhat more recently, and have helped researchers to escape the stricture imposed by sequence-biased restriction enzyme approaches [118-121]. Although affinity approaches are not immune to bias, they have been instrumental in further elucidation of the role DNA methylation plays in health and disease [122-126] and are highly amenable to sequencing using the current generation of high-throughput DNA sequencers [127]. Analysis of methylation patterns using bisulfite-converted DNA remains the gold standard method, however, and it has not escaped the inevitable push for higher-resolution results and higher-throughput protocols. What began as PCR and derivative techniques has evolved into large-scale sequencing of bisulfite-converted genomes, and single-nucleotide polymorphism arrays modified to detect methylated residues [128-134]. The complexities of sequencing bisulfite-converted DNA are daunting, but groups are making significant progress by reducing the genomic area interrogated (to perhaps one hundred genes) [135]. One such complexity arises from the fact that typical sequencing reaction chemistries have evolved to examine templates consisting of 4 nucleotides present in roughly equal proportions, a characteristic that bisulfite-treated DNA does not possess. Additionally, it is considerably more difficult to map reads generated from bisulfite-converted DNA than reads from typical sequencing experiments [136]. Despite these difficulties, whole-methylome sequencing studies have been undertaken successfully [73,74]. Among significant findings, these papers have demonstrated the breadth of the information gleaned through such experimentation, as well as the magnitude of the bioinformatic challenge these datasets create.
However, what has become abundantly clear is that the epigenome is even more complex than previously imagined, and that substantial differences in DNA methylation patterns exist between stages of differentiation, different cell types, and different individuals. In addition to examining the DNA directly, pharmacologic approaches have also been used to investigate DNA methylation patterns [137]. Researchers infer methylation status by manipulating the DNA methylation maintenance and establishment machinery and monitoring results using existing tools, such as gene-expression microarrays [137,138]. Interestingly, approaches such as these help streamline the validation process, as biological effects of the altered methylation are already demonstrated. The transition from locus-specific to global approaches, and the resulting deluge of data, have also necessitated the development of several computational methods for generating approximations of actual methylation level from the relative levels generated by most microarray and enrichment-based sequencing assays [112,127,139,140]. Additional computational approaches will likely evolve, and hopefully, along with better quantification of methylation, the usability of these programs will improve as well. Improved estimation of DNA methylation levels is not the only challenge these technologies face, however. It is likely that affinity-based methods of DNA methylation assessment also suffer some form of bias as a result of test sample copy number, as previously shown in chromatin immunoprecipitation assays [141]. This is in addition to the bias towards CpG-rich regions of the genome which is known to exist for MeDIP [127]. In contrast, bisulfite-based bead assays, such as those offered by Illumina, do not exhibit DNA copy number bias [133], but suffer from a lack of coverage, providing information at only ~27,000 CpG dinucleotides in the genome.
To cover more of the genome with greater resolution than is possible from affinity or restriction-based approaches, one must turn to large-scale sequencing of bisulfite-converted DNA using the current generation of high-throughput sequencing technologies.

1.5.4.  DNA hypomethylation

Since DNA methylation plays such a critical role in regulating normal cellular function, it is not surprising that methylation patterns are heavily disturbed in cancers. Globally, cancer genomes are known to be hypomethylated, a phenomenon which is tied to reduced genomic stability, copy number alterations, retroviral re-expression, loss of imprinting (LOI) [87-89,92,142,143], and re-activation of tissue-specific genes [144-146]. Genomic instability is central to all documented tumor types, including lung cancers [147], and is known to be linked with DNA hypomethylation [148], as hypomethylated DNA is more permissive to inappropriate recombination events [149]. This has wide-ranging implications for the study of lung cancer, notably as lung cancers are more frequent in older individuals and global DNA methylation levels tend to decrease with age [150,151]. In addition to a global reduction in 5-methylcytosine levels with age, the same reduction of methylated DNA is seen in histologic progression of lung cancers, with more advanced lesions having a higher degree of hypomethylation [152-155]. This mirrors the increase in genomic instability that is associated with progression from normal epithelia to carcinoma in situ and beyond [156]. Since DNA hypomethylation seems to predispose genomes towards instability, it has also been examined and documented in the context of specific DNA alteration events, such as LOH [157]. Although reports of hypomethylated oncogenes such as HRAS exist, the bulk of methylated cytosines in tumors are being lost from repetitive regions [152,158].
Included are satellite DNA families as well as segmental duplications, subtelomeric regions, and short (Alu) and long (LINE-1) interspersed repetitive elements [83,158-160], whose expression is often elevated in tumor samples. Following from this, it has been postulated that aberrant re-expression of functional Alu and LINE-1 elements might be another mechanism of genomic instability induction [161]. Despite the majority of DNA methylation being lost from repetitive elements, loss of methylation from non-repetitive regions, such as imprinted areas of the genome, occurs as well [142,143]. As imprinted genomic regions rely on DNA methylation marks to maintain maternal- or paternal-specific expression patterning, loss of imprinting, mediated by DNA hypomethylation, results in biallelic expression. This has been well documented in lung cancers [142,143], and has been linked with tumorigenesis and malignant transformation [162].

1.5.5.  DNA hypermethylation

Although the loss of millions of methylated cytosines per genome makes hypomethylation the most common DNA alteration in a cancer cell, hypermethylation is also, paradoxically, very frequent, especially when compared to known coding sequence mutations [158,163]. Indeed, whole-genome scans have revealed that hypermethylated CpG islands number in the hundreds in a given tumor [158,163]. Hypermethylation of CpG islands is not coincidental, as CpG island hypermethylation in gene promoter regions is often correlated with expression silencing [76,164]. Since the discovery of this relationship, cancer-specific CpG island hypermethylation has been well researched and the understanding of disease phenotypes has expanded dramatically. From a lung-cancer perspective, promoter hypermethylation of CDKN2A is a very early event, detectable both in histologically normal cells collected from the airways and in sputum [165].
These findings have clear diagnostic relevance, especially since the aberrant methylation of this important tumor suppressor gene is detectable up to three years before clinically overt disease [165,166]. DNA hypermethylation is also very versatile in the role it plays promoting tumorigenesis: it can be either the first or the second hit of a classical tumor suppressor gene. This important fact was discerned by comparing the DNA hypermethylation profiles of sporadic and familial (methylation-linked) cancers, which were found to be similar [167]. Long lists of genes hypermethylated in lung cancer are possible, given the frequency with which genes are silenced by this mechanism; however, some of the best studied and most frequent of these genes are synonymous with tumor suppression, such as CDKN2A, RASSF1A, RARβ, MGMT, GSTP1, CDH13, APC, DAPK1, and RUNX3 [124,137,138,168-175]. DNA hypermethylation of promoter CpG islands is so well studied not simply because it has a rather simple relation to gene expression (in comparison to DNA hypomethylation), but also because it holds great value as a therapeutic target and a biomarker [165]. DNA hypermethylation-related gene silencing is seen as a particularly promising therapeutic target because it is reversible by nature [84]. While repair of most established sequence mutations and chromosomal rearrangements is impossible, the removal of methyl groups is accomplished by inhibiting the DNA methyltransferase responsible for post-replicative methylation pattern maintenance. Although the existence of an active DNA demethylating system within our cells is likely given the timing of DNA methylation patterning during gametogenesis, the details are not yet clear [176-180]. Although DNA methyltransferase 1 (DNMT1) inhibiting drugs exist, along with drugs that act on the post-translational modification of histones, they are indiscriminate in their effects.
The ultimate goal of epigenetically targeted therapies needs to be the ability to target genes for de-methylation and re-expression within the tumor cells, while avoiding the relaxation of DNA methylation-mediated controls in normal cells. This will prove challenging, but would have great benefit not only for the treatment of cancer, but for its chemoprevention as well.

1.5.6.  Association of DNA methylation with tobacco smoke

Tobacco smoke affects cells through various genetic and epigenetic mechanisms, and has long been recognized as the cause of most lung cancers [181]. Smokers typically have numerous phenotypic manifestations of tobacco smoke damage, and characteristically, researchers have sought to identify the molecular alterations responsible for these phenotypes by examining all aspects of cellular biology, including DNA methylation profiles [166,182,183]. The situation is complex, however, as both hypo- and hypermethylation have been associated with smoke exposure, and both permanent and transient alterations are seen [184]. For example, exposure of cells to a carcinogen found in tobacco smoke (N-methyl-N-nitrosourea) resulted in global DNA hypomethylation [184]. This loss of DNA methylation resulting from tobacco smoke is in fact known to predispose to genetic abnormalities such as LOH [185]. Other lasting effects of tobacco smoke on the epigenome have made it plausible to use DNA methylation patterns as biomarkers for past exposure [186,187], a proposition that is strengthened by the fact that DNA methylation alterations exhibit a dose response with tobacco smoke [188]. Indeed, hypermethylation of CDKN2A corresponds with smoke exposure, and is known to be detectable in sputum years before disease is clinically apparent [165,166], in contrast to KRAS mutations, which are thought to be apparent exclusively in the presence of clinically detectable cancer [189]. Not all changes are this permanent, however, as some are related only to active smoking.
Hypomethylation of SNCG1, for example, is seen only in response to cigarette smoke, and it is remethylated shortly after smoking cessation [190].

1.6.  Lineage specificity

Lung cancers of different histologies are thought to possess different cells of origin. For example, it is thought that adenocarcinomas of the lung might arise from type II pneumocytes or Clara cells, while squamous cell carcinoma tumors are thought to arise from basal cells [191]. Given that all cells in the normal lung are genetically identical, it is reasonable to surmise that the cell of origin and local environment of pre-neoplastic cells impact the type of tumor they are likely to engender. The interaction of the local cellular environment with cellular genetics will then impact whether a given alteration to the genetic (or epigenetic) makeup of the cell will cause it to become cancerous. An elegant example of this is seen in the lungs of Kras mutant mice [191,192]. Each of the lung cells possesses the same oncogenic mutation; however, only the combination of cell type and environment of the lung periphery is permissive to tumor formation [191,192]. This not only points to a differing cell type of origin for different malignancies, but also highlights the role of the microenvironment in tumor formation.

1.7.  Susceptibility and etiology

Although exposure to tobacco smoke is known to be the main cause of lung cancer, it has long been known that there must be genetic modifiers of risk as well [193]. The existence of cancer-free lifelong smokers and of non-smokers with lung cancer attests to the proposed genetic basis of lung cancer risk. Previous work has indeed shown that individuals with a family history of lung cancer among immediate relatives have increased lung cancer risk compared to those without a family history of lung cancer [5,194,195].
These discoveries spawned the creation of the Genetic Epidemiology of Lung Cancer Consortium (GELCC), and subsequent studies have identified chromosome 6q23-25 as the primary familial region of susceptibility [193]. This region was delineated using multi-generational lung cancer families. Notably, the association of the area with lung cancer increased along with the number of affected individuals in a family (e.g. higher association in families with 5 affected individuals compared to 3 affected individuals) [193]. They have since shown that this region is associated with risk of lung cancers in smokers, light smokers, and non-smokers alike [196]. Further work on this region has identified hypermethylated and under-expressed genes, as well as overexpressed potential oncogenes [197,198]. As a single, causative target gene has not been identified within this region, it is likely that there are other loci of importance yet to be characterized in familial disease. Few lung cancers are familial in origin, however, and as such, numerous groups have performed large genome-wide association studies to identify allelotypes associated with lung cancer risk. Many of these studies have focused on the nicotinic acetylcholine receptors on chromosome 15q [3,34,39,40]. Although the degree of replication of this locus in individual studies is impressive, it is not yet clear whether risk alleles are altering lung cancer risk by changing smoking behavior or not. There has been some interest in determining which former smokers are likely to still develop lung cancer, a significant problem given that over half of lung cancers now occur in former smokers [199,200]. Based on studies of gene expression changes in response to smoking and smoking cessation, it seems likely that the irreversible alteration of gene expression in response to cigarette smoke may confer some risk [199,201,202].
Indeed, most of the genes whose expression is irreversibly altered by tobacco smoke are downregulated, and it is possible that some of these genes may be methylated, given their previously described roles as tumor suppressors in other cancers [203-205].

1.8.  Hypotheses and objectives

Gene disruptions key to the development of lung cancer occur by genetic or epigenetic events or both. The goal of my thesis is to identify key biologically relevant non-small cell lung cancer (NSCLC) genes exhibiting concerted gene dosage and DNA methylation aberrations by testing the following hypotheses:

1) Integrated profiling of lung cancer genomes and epigenomes will identify critical lung cancer genes that are simultaneously affected by DNA copy number and DNA methylation aberrations.
2) The susceptibility locus on chromosome 6q, identified through familial linkage studies, contains an unidentified tumor suppressor gene.
3) Differences in lung cancer phenotypes can be identified by identifying discriminating genomic and epigenomic alterations.

1.9.  Specific aims and thesis outline

The hypotheses above were evaluated by completion of the following aims:

Aim 1 - Development of a strategy for detection of concurrent genetic and epigenetic disruption.

Chapter 2 describes the development of the methodology needed to identify and evaluate concurrent alterations of gene dosage and DNA methylation. At the initiation of this thesis, whole-genome DNA methylation and gene dosage levels had not previously been assayed concurrently. Existing unbiased whole-genome approaches to copy number detection were employed and paired with methylated-DNA immunoprecipitation (MeDIP) [122,123], which had recently been co-developed by our lab. MeDIP was the first technique for DNA methylation discrimination that was amenable to whole-genome study. Examining the two concurrently necessitated the development of novel techniques and analysis methods.
In doing so, I was lead author on a manuscript in which the gene dosage and DNA methylation levels of two AC genomes were examined [206]. This marked the first use of methylated DNA immunoprecipitation (MeDIP) to analyze lung cancer cells and, additionally, the first combination of gene-dosage and DNA methylation data on a genome-wide scale. Accomplishing this required the development of novel bioinformatic approaches as well as visualization tools [207]. In addition to the numerous interesting individual findings (copy number or DNA methylation), these experiments also identified concurrent genomic and epigenomic alterations. By comparing the methylation profile of the cancer cells to patient-matched non-malignant cells, I identified regions of copy number gain coinciding with DNA hypomethylation as well as regions of copy number loss coinciding with DNA hypermethylation. In doing so, I validated my approach of using two separate methods to evaluate different DNA alterations in the same sample. Although this approach (MeDIP) for DNA methylation analysis was not used for the rest of the thesis, I still used the basic framework I had previously established for the concurrent analysis of gene dosage and DNA methylation. I have since expanded the analysis of the two AC samples from chapter 2 using a newer DNA methylation profiling technology, and have reaffirmed the results and conclusions of chapter 2.

Aim 2 - Identification of two-hit TSGs disrupted by DNA methylation and copy number loss.

I used an integrated approach to examine copy number loss and DNA hypermethylation in clinical AC samples, identifying potential tumor suppressor candidates within a region of lung cancer risk. The results of this are covered in chapter 3, which draws from the analysis schema devised in chapter 2, but applies it to a cohort of clinical AC samples followed by a great deal of candidate gene exploration.
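The kind of concurrent alteration sought in Aims 1 and 2 (copy number loss with hypermethylation, or copy number gain with hypomethylation) can be illustrated with a minimal Python sketch. This is not the analysis pipeline of chapters 2 and 3. The `Locus` record, the cut-off values, and the example numbers are hypothetical and serve only to show how two data types measured at the same locus might be combined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Locus:
    name: str
    log2_copy_ratio: float    # tumor vs. diploid reference, log2 scale
    methylation_delta: float  # tumor minus matched non-malignant methylation level


def concerted_alteration(locus: Locus,
                         copy_cutoff: float = 0.3,     # hypothetical threshold
                         methyl_cutoff: float = 0.2    # hypothetical threshold
                         ) -> Optional[str]:
    """Label a locus whose copy number and methylation changes point the same way."""
    lost = locus.log2_copy_ratio <= -copy_cutoff
    gained = locus.log2_copy_ratio >= copy_cutoff
    hyper = locus.methylation_delta >= methyl_cutoff
    hypo = locus.methylation_delta <= -methyl_cutoff
    if lost and hyper:
        # both changes would be expected to reduce expression
        return "loss + hypermethylation"
    if gained and hypo:
        # both changes would be expected to increase expression
        return "gain + hypomethylation"
    return None  # no concerted change at these cut-offs


# Invented example values for three loci
loci = [
    Locus("locus_A", -0.8, 0.45),
    Locus("locus_B", 0.9, -0.30),
    Locus("locus_C", -0.7, -0.05),
]
calls = {locus.name: concerted_alteration(locus) for locus in loci}
```

With these invented values, the first locus is flagged as a loss with hypermethylation (a possible two-hit pattern), the second as a gain with hypomethylation, and the third is left unflagged because only its copy number changes.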
Using integrated gene-dosage and DNA methylation data, I generated a list of frequent two-hit candidates located within, or near, the susceptibility region on chromosome 6. I then performed extensive validation and characterization experiments to narrow the candidate list down to a single gene, which I characterize as a clinically relevant tumor suppressor candidate within the previously annotated susceptibility region. This work directly addressed both hypothesis 1 and hypothesis 2.

Aim 3 - Genome-wide scan of NSCLC genomes for regions of concerted disruption.

Previous studies of gene dosage and DNA methylation patterns in AC and SqCC tumors have shown that the genomes of these histologically distinct tumors have differences in both the genetic and epigenetic compartment [58,208]. In order to determine whether these divergent alterations explain previously identified gene expression differences [209], and whether they contribute to the development of the two subtypes, we sought to identify and characterize the functions of genes located within these regions. In this study we have compared genomic alteration frequency, and subsequent gene expression changes, that distinguish between SqCC and AC, the two main subtypes of NSCLC. Initial analysis indicates that a substantial number of subtype-specific genomic alterations exist. This has been followed up by integration of gene expression profile data for regions of genomic alteration, and separation of genes whose expression is linked with altered copy number. Network analysis indicates that numerous phenotypic differences between the two subtypes are potentially explained by copy-number regulated gene expression alterations. I have augmented this work by using DNA methylation profiles to identify alterations to DNA methylation patterns which complement the observed subtype-specific gene-expression alterations.
This showed that both genomic and epigenomic alteration of genes is involved in generating phenotypic differences in NSCLC subtypes. This work has addressed hypothesis 3.

Figure 1.1 – Lung cancer classification

[Figure: classification tree. Lung cancer divides into non-small cell lung cancer (NSCLC, 80%) and small cell lung cancer (SCLC, 20%). Precursor lesions: hyperplasia, metaplasia, dysplasia, and carcinoma in situ (CIS) for SqCC; atypical adenomatous hyperplasia (AAH) for AC; unknown (?) for LC and SCLC. Clinical lesions: squamous cell carcinoma (SqCC) 25-30%; adenocarcinoma (AC) 40-50%; large cell carcinoma (LC) 9%; small cell carcinoma 15-20%.]

Lung cancers are classed histologically into two main groups, SCLC and NSCLC. The NSCLC group is further subdivided into SqCC, AC, and LC histologies. SqCC arise through a reasonably well known progression from hyperplasia to carcinoma in situ, while AC is thought to arise through what is known as atypical adenomatous hyperplasia. The precursor lesions for LC and SCLC are not known.

1.10.  References

1. Travis, W.D. Pathology of lung cancer. Clin Chest Med 23, 65-81, viii (2002).
2. Bailey-Wilson, J.E., Sellers, T.A., Elston, R.C., Evens, C.C. & Rothschild, H. Evidence for a major gene effect in early-onset lung cancer. J La State Med Soc 145, 157-162 (1993).
3. Liu, P., et al. Familial aggregation of common sequence variants on 15q24-25.1 in lung cancer. J Natl Cancer Inst 100, 1326-1330 (2008).
4. Ooi, W.L., Elston, R.C., Chen, V.W., Bailey-Wilson, J.E. & Rothschild, H. Increased familial risk for lung cancer. J Natl Cancer Inst 76, 217-222 (1986).
5. Sellers, T.A., et al. Evidence for mendelian inheritance in the pathogenesis of lung cancer. J Natl Cancer Inst 82, 1272-1279 (1990).
6. Youlden, D.R., Cramb, S.M. & Baade, P.D. The International Epidemiology of Lung Cancer: geographical distribution and secular trends. J Thorac Oncol 3, 819-831 (2008).
7. Parkin, D.M. Cancer in developing countries.
Cancer Surv 19-20, 519-561 (1994).
8. Soh, J., et al. Sequential molecular changes during multistage pathogenesis of small peripheral adenocarcinomas of the lung. J Thorac Oncol 3, 340-347 (2008).
9. Sekido, Y., Fong, K.M. & Minna, J.D. Molecular genetics of lung cancer. Annu Rev Med 54, 73-87 (2003).
10. Wistuba, II & Gazdar, A.F. Lung cancer preneoplasia. Annu Rev Pathol 1, 331-348 (2006).
11. Hahn, W.C. & Weinberg, R.A. Rules for making human tumor cells. N Engl J Med 347, 1593-1603 (2002).
12. Yamasaki, M., et al. Correlation between genetic alterations and histopathological subtypes in bronchiolo-alveolar carcinoma and atypical adenomatous hyperplasia of the lung. Pathol Int 50, 778-785 (2000).
13. Motoi, N., et al. Lung adenocarcinoma: modification of the 2004 WHO mixed subtype to include the major histologic subtype suggests correlations between papillary and micropapillary adenocarcinoma subtypes, EGFR mutations and gene expression analysis. Am J Surg Pathol 32, 810-827 (2008).
14. Dennis, P.A., et al. The biology of tobacco and nicotine: bench to bedside. Cancer Epidemiol Biomarkers Prev 14, 764-767 (2005).
15. Sun, S., Schiller, J.H. & Gazdar, A.F. Lung cancer in never smokers--a different disease. Nat Rev Cancer 7, 778-790 (2007).
16. Jemal, A., et al. Cancer statistics, 2009. CA Cancer J Clin (2009).
17. Pfeifer, G.P. & Besaratinia, A. Mutational spectra of human cancer. Hum Genet 125, 493-506 (2009).
18. Ding, L., et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069-1075 (2008).
19. Toyooka, S., et al. The impact of sex and smoking status on the mutational spectrum of epidermal growth factor receptor gene in non small cell lung cancer. Clin Cancer Res 13, 5763-5768 (2007).
20. Silva, J.M., et al. TP53 gene mutations in plasma DNA of cancer patients. Genes Chromosomes Cancer 24, 160-161 (1999).
21. Mardis, E.R. & Wilson, R.K. Cancer genome sequencing: a review.
Hum Mol Genet 18, R163-168 (2009).
22. Pleasance, E.D., et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-190 (2010).
23. Tsuchiya, E., et al. Allelotype of non-small cell lung carcinoma--comparison between loss of heterozygosity in squamous cell carcinoma and adenocarcinoma. Cancer Res 52, 2478-2481 (1992).
24. Mao, L., et al. Clonal genetic alterations in the lungs of current and former smokers. J Natl Cancer Inst 89, 857-862 (1997).
25. Wistuba, II, et al. Molecular damage in the bronchial epithelium of current and former smokers. J Natl Cancer Inst 89, 1366-1373 (1997).
26. Zabarovsky, E.R., Lerman, M.I. & Minna, J.D. Tumor suppressor genes on chromosome 3p involved in the pathogenesis of lung and other cancers. Oncogene 21, 6915-6935 (2002).
27. Wistuba, II, et al. Allelic losses at chromosome 8p21-23 are early and frequent events in the pathogenesis of lung cancer. Cancer Res 59, 1973-1979 (1999).
28. Wistuba, II, Mao, L. & Gazdar, A.F. Smoking molecular damage in bronchial epithelium. Oncogene 21, 7298-7306 (2002).
29. Abujiang, P., et al. Loss of heterozygosity (LOH) at 17q and 14q in human lung cancers. Oncogene 17, 3029-3033 (1998).
30. Virmani, A.K., et al. Allelotyping demonstrates common and distinct patterns of chromosomal loss in human lung cancer types. Genes Chromosomes Cancer 21, 308-319 (1998).
31. Kohno, H., Hiroshima, K., Toyozaki, T., Fujisawa, T. & Ohwada, H. p53 mutation and allelic loss of chromosome 3p, 9p of preneoplastic lesions in patients with nonsmall cell lung carcinoma. Cancer 85, 341-347 (1999).
32. Spence, J.F., et al. Uniparental disomy as a mechanism for human genetic disease. Am. J. Hum. Genet., 217-226 (1988).
33. Chari, R., et al. Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer Metastasis Rev 29, 73-93 (2010).
34. Amos, C.I., et al. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet 40, 616-622 (2008).
35. Landi, M.T., et al.
A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am J Hum Genet 85, 679-691 (2009). Wang, Y., et al. Common 5p15.33 and 6p21.33 variants influence lung cancer risk. Nat Genet 40, 1407-1409 (2008). Thorisson, G.A., Smith, A.V., Krishnan, L. & Stein, L.D. The International HapMap Project Web site. Genome Res 15, 1592-1593 (2005). Huang, Y.T., et al. Genome-wide analysis of survival in early-stage non-small-cell lung cancer. J Clin Oncol 27, 2660-2667 (2009). Thorgeirsson, T.E., et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 452, 638-642 (2008). Hung, R.J., et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature 452, 633-637 (2008). Goransson, H., et al. Quantification of normal cell fraction and copy number neutral LOH in clinical lung cancer samples using SNP array data. PLoS One 4, e6057 (2009). 24  42. 43. 44. 45. 46. 47. 48. 49. 50. 51. 52. 53. 54. 55. 56. 57. 58. 59. 60. 61.  Bignell, G.R., et al. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res 14, 287-295 (2004). Tonon, G., et al. High-resolution genomic profiles of human lung cancer. Proc Natl Acad Sci U S A 102, 9625-9630 (2005). Lockwood, W.W., et al. DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 27, 4615-4624 (2008). Coe, B.P., et al. Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer 94, 1927-1935 (2006). Garnis, C., et al. Involvement of multiple developmental genes on chromosome 1p in lung tumorigenesis. Hum Mol Genet 14, 475-482 (2005). Garnis, C., et al. Chromosome 5p aberrations are early events in lung cancer: implication of glial cell line-derived neurotrophic factor in disease progression. Oncogene 24, 4806-4812 (2005). Choi, J.S., et al. 
Comparative genomic hybridization array analysis and real-time PCR reveals genomic copy number alteration for lung adenocarcinomas. Lung 184, 355-362 (2006). Choi, Y.W., et al. Comparative genomic hybridization array analysis and real time PCR reveals genomic alterations in squamous cell carcinomas of the lung. Lung Cancer 55, 43-51 (2007). Balsara, B.R. & Testa, J.R. Chromosomal imbalances in human lung cancer. Oncogene 21, 6877-6883. (2002). Fong, Y., Lin, Y.S., Liou, C.P., Li, C.F. & Tzeng, C.C. Chromosomal imbalances in lung adenocarcinomas with or without mutations in the epidermal growth factor receptor gene. Respirology 15, 700-705 (2010). Powell, C.A., Klares, S., O'Connor, G. & Brody, J.S. Loss of heterozygosity in epithelial cells obtained by bronchial brushing: clinical utility in lung cancer. Clin Cancer Res 5, 2025-2034 (1999). Minna, J.D., Fong, K., Zochbauer-Muller, S. & Gazdar, A.F. Molecular pathogenesis of lung cancer and potential translational applications. Cancer J 8 Suppl 1, S41-46. (2002). Albertson, D.G. Gene amplification in cancer. Trends Genet 22, 447-455 (2006). Albertson, D.G., Collins, C., McCormick, F. & Gray, J.W. Chromosome aberrations in solid tumors. Nat Genet 34, 369-376 (2003). Davies, J.J., Wilson, I.M. & Lam, W.L. Array CGH technologies and their applications to cancer genomes. Chromosome Res 13, 237-248 (2005). Aviel-Ronen, S., et al. Genomic markers for malignant progression in pulmonary adenocarcinoma with bronchioloalveolar features. Proc Natl Acad Sci U S A 105, 10155-10160 (2008). Garnis, C., et al. High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH. Int J Cancer 118, 1556-1564 (2006). Ishkanian, A.S., et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 36, 299-303 (2004). Coe, B.P., et al. Resolving the resolution of array CGH. Genomics 89, 647-653 (2007). Baldwin, C., Garnis, C., Zhang, L., Rosin, M.P. & Lam, W.L. 
Multiple microalterations detected at high frequency in oral cancer. Cancer Res 65, 75617567 (2005). 25  62. 63. 64. 65. 66. 67. 68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80. 81. 82.  Chi, B., DeLeeuw, R.J., Coe, B.P., MacAulay, C. & Lam, W.L. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 5, 13 (2004). Chi, B., DeLeeuw, R.J., Coe, B.P., MacAulay, C. & Lam, W.L. SeeGH - A software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 5, 13 (2004). Coe, B.P., et al. High-resolution chromosome arm 5p array CGH analysis of small cell lung carcinoma cell lines. Genes Chromosomes Cancer 42, 308-313 (2005). Coe, B.P., et al. Gain of a region on 7p22.3, containing MAD1L1, is the most frequent event in small-cell lung cancer cell lines. Genes Chromosomes Cancer 45, 11-19 (2006). Khojasteh, M., Lam, W.L., Ward, R.K. & MacAulay, C. A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 6, 274 (2005). Jong, K., Marchiori, E., Meijer, G., Vaart, A.V. & Ylstra, B. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 20, 3636-3637 (2004). Shah, S.P., et al. Integrating copy number polymorphisms into array CGH analysis using a robust HMM. Bioinformatics 22, e431-439 (2006). Hsu, L., et al. Denoising array-based comparative genomic hybridization data using wavelets. Biostatistics 6, 211-226 (2005). Chari, R., Lockwood, W.W. & Lam, W.L. Computational methods for the analysis of array comparative genomic hybridization. Cancer Inform 2, 48-58 (2007). Pei, J., Kruger, W.D. & Testa, J.R. High-resolution analysis of 9p loss in human cancer cells using single nucleotide polymorphism-based mapping arrays. Cancer Genet Cytogenet 170, 65-68 (2006). LaFramboise, T., et al. Allele-specific amplification in cancer revealed by SNP array analysis. PLoS Comput Biol 1, e65 (2005). 
Lister, R., et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-322 (2009). Laurent, L., et al. Dynamic changes in the human methylome during differentiation. Genome Res 20, 320-331. Baylin, S.B. DNA methylation and gene silencing in cancer. Nat Clin Pract Oncol 2 Suppl 1, S4-11 (2005). Bird, A. DNA methylation patterns and epigenetic memory. Genes Dev 16, 6-21 (2002). Bird, A.P. CpG-rich islands and the function of DNA methylation. Nature 321, 209-213 (1986). Bird, A. DNA methylation and the frequency of CpG in animal DNA. Nuc. Acids Res. 8, 1499-1504 (1980). Bird, A. The essentials of DNA methylation. Cell 70, 5-8 (1992). Antequera, F. & Bird, A. CpG islands. EXS 64, 169-185 (1993). Bird, A. Molecular biology. Methylation talk between histones and DNA. Science 294, 2113-2115 (2001). Bestor, T.H. DNA methylation: evolution of a bacterial immune function into a regulator of gene expression and genome structure in higher eukaryotes. Philos Trans R Soc Lond B Biol Sci 326, 179-187 (1990). 26  83. 84. 85. 86. 87. 88. 89. 90. 91. 92. 93. 94. 95. 96. 97. 98. 99. 100. 101. 102. 103. 104.  Walsh, C.P., Chaillet, J.R. & Bestor, T.H. Transcription of IAP endogenous retroviruses is constrained by cytosine methylation. Nat Genet 20, 116-117 (1998). Feinberg, A.P. & Tycko, B. The history of cancer epigenetics. Nat Rev Cancer 4, 143-153 (2004). Surani, M.A., Barton, S.C. & Norris, M.L. Development of reconstituted mouse eggs suggests imprinting of the genome during gametogenesis. Nature 308, 548550 (1984). McGrath, J. & Solter, D. Completion of mouse embryogenesis requires both the maternal and paternal genomes. Cell 37, 179-183 (1984). Pal, N., et al. Preferential loss of maternal alleles in sporadic Wilms' tumor. Oncogene 5, 1665-1668 (1990). Schroeder, W.T., et al. Nonrandom loss of maternal chromosome 11 alleles in Wilms tumors. Am J Hum Genet 40, 413-420 (1987). Scrable, H., et al. 
A model for embryonal rhabdomyosarcoma tumorigenesis that involves genome imprinting. Proc Natl Acad Sci U S A 86, 7480-7484 (1989). Zhang, Y. & Tycko, B. Monoallelic expression of the human H19 gene. Nat Genet 1, 40-44 (1992). Giannoukakis, N., Deal, C., Paquette, J., Goodyer, C.G. & Polychronakos, C. Parental genomic imprinting of the human IGF2 gene. Nat Genet 4, 98-101 (1993). Rainier, S., et al. Relaxation of imprinted genes in human cancer. Nature 362, 747-749 (1993). Brown, C.J. & Willard, H.F. Localization of a gene that escapes inactivation to the X chromosome proximal short arm: implications for X inactivation. Am J Hum Genet 46, 273-279 (1990). Brown, C.J., Flenniken, A.M., Williams, B.R. & Willard, H.F. X chromosome inactivation of the human TIMP gene. Nucleic Acids Res 18, 4191-4195 (1990). Brown, C.J. & Robinson, W.P. The causes and consequences of random and non-random X chromosome inactivation in humans. Clin Genet 58, 353-363 (2000). Orstavik, K.H. Skewed X inactivation in healthy individuals and in different diseases. Acta Paediatr Suppl 95, 24-29 (2006). Spatz, A., Borg, C. & Feunteun, J. X-chromosome genetics and human cancer. Nat Rev Cancer 4, 617-629 (2004). Esteller, M. Epigenetics in cancer. N Engl J Med 358, 1148-1159 (2008). Esteller, M. & Herman, J.G. Cancer as an epigenetic disease: DNA methylation and chromatin alterations in human tumors. J Pathol 196, 1-7 (2002). Tycko, B. Epigenetic gene silencing in cancer. J Clin Invest 105, 401-407. (2000). Keshet, I., Lieman-Hurwitz, J. & Cedar, H. DNA methylation affects the formation of active chromatin. Cell 44, 535-543 (1986). Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet 8, 286-298 (2007). Esteller, M., et al. Cancer epigenetics and methylation. Science 297, 1807-1808; discussion 1807-1808 (2002). Lewis, J.D., et al. Purification, sequence, and cellular localization of a novel chromosomal protein that binds to methylated DNA. 
Cell 69, 905-914 (1992). 27  105. 106. 107. 108. 109. 110. 111. 112. 113. 114. 115. 116. 117. 118. 119. 120. 121.  122. 123.  Nan, X., et al. Transcriptional repression by the methyl-CpG-binding protein MeCP2 involves a histone deacetylase complex. Nature 393, 386-389 (1998). Callinan, P.A. & Feinberg, A.P. The emerging science of epigenomics. Hum Mol Genet 15 Spec No 1, R95-101 (2006). Hatada, I. Emerging technologies for genome-wide DNA methylation profiling in cancer. Crit Rev Oncog 12, 205-223 (2006). Dai, Z., et al. Global methylation profiling of lung cancer identifies novel methylated genes. Neoplasia 3, 314-323 (2001). Brena, R.M., et al. Aberrant DNA methylation of OLIG1, a novel prognostic factor in non-small cell lung cancer. PLoS Med 4, e108 (2007). Takai, D., et al. Silencing of HTR1B and reduced expression of EDN1 in human lung cancers, revealed by methylation-sensitive representational difference analysis. Oncogene 20, 7505-7513 (2001). Hu, M., et al. Distinct epigenetic changes in the stromal cells of breast cancers. Nat Genet 37, 899-905 (2005). Irizarry, R.A., et al. Comprehensive high-throughput arrays for relative methylation (CHARM). Genome Res 18, 780-790 (2008). Yan, P.S., et al. Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays. Cancer Res 61, 8375-8380 (2001). Ching, T.T., et al. Epigenome analyses using BAC microarrays identify evolutionary conservation of tissue-specific methylation of SHANK3. Nat Genet 37, 645-651 (2005). Yamamoto, F. & Yamamoto, M. A DNA microarray-based methylation-sensitive (MS)-AFLP hybridization method for genetic and epigenetic analyses. Mol Genet Genomics 271, 678-686 (2004). Heisler, L.E., et al. CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome. Nucleic Acids Res 33, 2952-2961 (2005). Omura, N., et al. Genome-wide profiling of methylated promoters in pancreatic adenocarcinoma. 
Cancer Biol Ther 7, 1146-1156 (2008). Keshet, I., et al. Evidence for an instructive mechanism of de novo methylation in cancer cells. Nat Genet 38, 149-153 (2006). Zhang, X., et al. Genome-wide high-resolution mapping and functional analysis of DNA methylation in arabidopsis. Cell 126, 1189-1201 (2006). Gebhard, C., et al. Genome-wide profiling of CpG methylation identifies novel targets of aberrant hypermethylation in myeloid leukemia. Cancer Res 66, 61186128 (2006). Rauch, T., Li, H., Wu, X. & Pfeifer, G.P. MIRA-assisted microarray analysis, a new technology for the determination of DNA methylation patterns, identifies frequent methylation of homeodomain-containing genes in lung cancer cells. Cancer Res 66, 7939-7947 (2006). Weber, M., et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 37, 853-862 (2005). Weber, M., et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet 39, 457-466 (2007).  28  124. 125. 126. 127. 128. 129. 130. 131. 132. 133. 134. 135. 136. 137. 138.  139. 140. 141.  Rauch, T. & Pfeifer, G.P. Methylated-CpG island recovery assay: a new technique for the rapid detection of methylated-CpG islands in cancer. Lab Invest 85, 1172-1180 (2005). Jacinto, F.V., Ballestar, E., Ropero, S. & Esteller, M. Discovery of epigenetically silenced genes by methylated DNA immunoprecipitation in colon cancer cells. Cancer Res 67, 11481-11486 (2007). Ballestar, E., et al. Methyl-CpG binding proteins identify novel sites of epigenetic inactivation in human cancer. EMBO J 22, 6335-6345 (2003). Down, T.A., et al. A Bayesian deconvolution strategy for immunoprecipitationbased DNA methylome analysis. Nat Biotechnol 26, 779-785 (2008). Bian, Y.S., Yan, P., Osterheld, M.C., Fontolliet, C. & Benhattar, J. 
Promoter methylation analysis on microdissected paraffin-embedded tissues using bisulfite treatment and PCR-SSCP. Biotechniques 30, 66-72 (2001). Trinh, B.N., Long, T.I. & Laird, P.W. DNA methylation analysis by MethyLight technology. Methods 25, 456-462 (2001). Weisenberger, D.J., et al. Analysis of repetitive element DNA methylation by MethyLight. Nucleic Acids Res 33, 6823-6836 (2005). Fan, J.B., et al. Illumina universal bead arrays. Methods Enzymol 410, 57-73 (2006). Houshdaran, S., et al. Widespread epigenetic abnormalities suggest a broad DNA methylation erasure defect in abnormal human sperm. PLoS One 2, e1289 (2007). Houseman, E.A., et al. Copy number variation has little impact on bead-arraybased measures of DNA methylation. Bioinformatics 25, 1999-2005 (2009). Breton, C.V., et al. Prenatal tobacco smoke exposure affects global and genespecific DNA methylation. Am J Respir Crit Care Med 180, 462-467 (2009). Taylor, K.H., et al. Large-scale CpG methylation analysis identifies novel candidate genes and reveals methylation hotspots in acute lymphoblastic leukemia. Cancer Res 67, 2617-2625 (2007). Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-966 (2009). Shames, D.S., et al. A genome-wide screen for promoter methylation in lung cancer identifies novel methylation markers for multiple malignancies. PLoS Med 3, e486 (2006). Zhong, S., Fields, C.R., Su, N., Pan, Y.X. & Robertson, K.D. Pharmacologic inhibition of epigenetic modifications, coupled with gene expression profiling, reveals novel targets of aberrant DNA methylation and histone deacetylation in lung cancer. Oncogene 26, 2621-2634 (2007). Pelizzola, M., et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIPenrichment. Genome Res 18, 1652-1659 (2008). Yamashita, S., Hosoya, K., Gyobu, K., Takeshima, H. & Ushijima, T. 
Development of a Novel Output Value for Quantitative Assessment in Methylated DNA Immunoprecipitation-CpG Island Microarray Analysis. DNA Res (2009). Vega, V.B., Cheung, E., Palanisamy, N. & Sung, W.K. Inherent signals in sequencing-based Chromatin-ImmunoPrecipitation control libraries. PLoS One 4, e5241 (2009). 29  142. 143. 144. 145. 146. 147. 148. 149. 150. 151. 152. 153. 154. 155. 156. 157. 158. 159. 160. 161.  Kohda, M., et al. Frequent loss of imprinting of IGF2 and MEST in lung adenocarcinoma. Mol Carcinog 31, 184-191 (2001). Kondo, M., et al. Frequent loss of imprinting of the H19 gene is often associated with its overexpression in human lung cancers. Oncogene 10, 1193-1198 (1995). De Smet, C., et al. The activation of human gene MAGE-1 in tumor cells is correlated with genome-wide demethylation. Proc Natl Acad Sci U S A 93, 71497153 (1996). Cho, B., et al. Promoter hypomethylation of a novel cancer/testis antigen gene CAGE is correlated with its aberrant expression and is seen in premalignant stage of gastric carcinoma. Biochem Biophys Res Commun 307, 52-63 (2003). Wilson, A.S., Power, B.E. & Molloy, P.L. DNA hypomethylation and human diseases. Biochim Biophys Acta 1775, 138-162 (2007). Hanahan, D. & Weinberg, R.A. The hallmarks of cancer. Cell 100, 57-70 (2000). Gaudet, F., et al. Induction of tumors in mice by genomic hypomethylation. Science 300, 489-492 (2003). Rizwana, R. & Hahn, P.J. CpG methylation reduces genomic instability. J Cell Sci 112 ( Pt 24), 4513-4519 (1999). Fraga, M.F., et al. Epigenetic differences arise during the lifetime of monozygotic twins. Proc Natl Acad Sci U S A 102, 10604-10609 (2005). Ogino, S., et al. A cohort study of tumoral LINE-1 hypomethylation and prognosis in colon cancer. J Natl Cancer Inst 100, 1734-1738 (2008). Feinberg, A.P. & Vogelstein, B. Hypomethylation of ras oncogenes in primary human cancers. Biochem Biophys Res Commun 111, 47-54 (1983). Feinberg, A.P., Gehrke, C.W., Kuo, K.C. & Ehrlich, M. 
Reduced genomic 5methylcytosine content in human colonic neoplasia. Cancer Res 48, 1159-1161 (1988). Gama-Sosa, M.A., et al. The 5-methylcytosine content of DNA from human tumors. Nucleic Acids Res 11, 6883-6894 (1983). Vertino, P.M., Spillare, E.A., Harris, C.C. & Baylin, S.B. Altered chromosomal methylation patterns accompany oncogene-induced transformation of human bronchial epithelial cells. Cancer Res 53, 1684-1689 (1993). Adachi, J., et al. Microsatellite instability in primary and metastatic lung carcinomas. Genes Chromosomes Cancer 14, 301-306 (1995). Vachtenheim, J., Horakova, I. & Novotna, H. Hypomethylation of CCGG sites in the 3' region of H-ras protooncogene is frequent and is associated with H-ras allele loss in non-small cell lung cancer. Cancer Res 54, 1145-1148 (1994). Rauch, T.A., et al. High-resolution mapping of DNA hypermethylation and hypomethylation in lung cancer. Proc Natl Acad Sci U S A 105, 252-257 (2008). Chalitchagorn, K., et al. Distinctive pattern of LINE-1 methylation level in normal tissues and the association with carcinogenesis. Oncogene 23, 8841-8846 (2004). Groudine, M., Eisenman, R. & Weintraub, H. Chromatin structure of endogenous retroviral genes and activation by an inhibitor of DNA methylation. Nature 292, 311-317 (1981). Takai, D., Yagi, Y., Habib, N., Sugimura, T. & Ushijima, T. Hypomethylation of LINE1 retrotransposon in human hepatocellular carcinomas, but not in surrounding liver cirrhosis. Jpn J Clin Oncol 30, 306-309 (2000). 30  162. 163. 164. 165. 166. 167. 168. 169. 170. 171. 172. 173. 174. 175. 176. 177. 178. 179. 180. 181.  Nakanishi, H., et al. Loss of imprinting of PEG1/MEST in lung cancer cell lines. Oncol Rep 12, 1273-1278 (2004). Costello, J.F., et al. Aberrant CpG-island methylation has non-random and tumor-type-specific patterns. Nat Genet 24, 132-138 (2000). Herman, J.G. Hypermethylation of tumor suppressor genes in cancer. Semin Cancer Biol 9, 359-367 (1999). Belinsky, S.A., et al. 
Aberrant methylation of p16(INK4a) is an early event in lung cancer and a potential biomarker for early diagnosis. Proc Natl Acad Sci U S A 95, 11891-11896 (1998). Belinsky, S.A., et al. Aberrant promoter methylation in bronchial epithelium and sputum from current and former smokers. Cancer Res 62, 2370-2377 (2002). Esteller, M., et al. DNA methylation patterns in hereditary human cancers mimic sporadic tumorigenesis. Hum Mol Genet 10, 3001-3007 (2001). Yanagawa, N., et al. Promoter hypermethylation of RASSF1A and RUNX3 genes as an independent prognostic prediction marker in surgically resected non-small cell lung cancers. Lung Cancer 58, 131-138 (2007). Yanagawa, N., et al. Promoter hypermethylation of tumor suppressor and tumorrelated genes in non-small cell lung cancers. Cancer Sci 94, 589-592 (2003). Topaloglu, O., et al. Detection of promoter hypermethylation of multiple genes in the tumor and bronchoalveolar lavage of patients with lung cancer. Clin Cancer Res 10, 2284-2288 (2004). Suzuki, M., et al. Methylation and gene silencing of the Ras-related GTPase gene in lung and breast cancers. Ann Surg Oncol 14, 1397-1404 (2007). Suzuki, M., et al. DNA methylation-associated inactivation of TGFbeta-related genes DRM/Gremlin, RUNX3, and HPP1 in human cancers. Br J Cancer 93, 1029-1037 (2005). Tsou, J.A., Hagen, J.A., Carpenter, C.L. & Laird-Offringa, I.A. DNA methylation analysis: a powerful new tool for lung cancer diagnosis. Oncogene 21, 54505461 (2002). Zochbauer-Muller, S., et al. Aberrant promoter methylation of multiple genes in non-small cell lung cancers. Cancer Res 61, 249-255 (2001). Dammann, R., et al. Epigenetic inactivation of a RAS association domain family protein from the lung tumor suppressor locus 3p21.3. Nat Genet 25, 315-319 (2000). Marchal, R., Chicheportiche, A., Dutrillaux, B. & Bernardino-Sgherri, J. DNA methylation in mouse gametogenesis. Cytogenet Genome Res 105, 316-324 (2004). Detich, N., Theberge, J. & Szyf, M. 
Promoter-specific activation and demethylation by MBD2/demethylase. J Biol Chem 277, 35791-35794 (2002). Engel, N., et al. Conserved DNA methylation in Gadd45a(-/-) mice. Epigenetics 4, 98-99 (2009). Jin, S.G., Guo, C. & Pfeifer, G.P. GADD45A does not promote DNA demethylation. PLoS Genet 4, e1000013 (2008). Barreto, G., et al. Gadd45a promotes epigenetic gene activation by repairmediated DNA demethylation. Nature 445, 671-675 (2007). Hutt, J.A., et al. Life-span inhalation exposure to mainstream cigarette smoke induces lung cancer in B6C3F1 mice through genetic and epigenetic pathways. Carcinogenesis 26, 1999-2009 (2005). 31  182. 183. 184. 185. 186. 187. 188. 189.  190. 191. 192. 193. 194. 195. 196. 197. 198. 199. 200.  Gratziou, C. Respiratory, cardiovascular and other physiological consequences of smoking cessation. Curr Med Res Opin 25, 535-545 (2009). Palmisano, W.A., et al. Predicting lung cancer by detecting aberrant promoter methylation in sputum. Cancer Res 60, 5954-5958 (2000). Boehm, T.L. & Drahovsky, D. Hypomethylation of DNA in Raji cells after treatment with N-methyl-N-nitrosourea. Carcinogenesis 2, 39-42 (1981). Kaplan, R., et al. Monoallelic up-regulation of the imprinted H19 gene in airway epithelium of phenotypically normal cigarette smokers. Cancer Res 63, 14751482 (2003). Tessema, M., et al. Concomitant promoter methylation of multiple genes in lung adenocarcinomas from current, former and never smokers. Carcinogenesis 30, 1132-1138 (2009). Phillips, J.M. & Goodman, J.I. Inhalation of cigarette smoke induces regions of altered DNA methylation (RAMs) in SENCAR mouse lung. Toxicology 260, 7-15 (2009). Toyooka, S., et al. Dose effect of smoking on aberrant methylation in non-small cell lung cancers. Int J Cancer 110, 462-464 (2004). Kersting, M., et al. Differential frequencies of p16(INK4a) promoter hypermethylation, p53 mutation, and K-ras mutation in exfoliative material mark the development of lung cancer in symptomatic chronic smokers. 
J Clin Oncol 18, 3221-3229 (2000). Liu, H., Zhou, Y., Boggs, S.E., Belinsky, S.A. & Liu, J. Cigarette smoke induces demethylation of prometastatic oncogene synuclein-gamma in lung cancer cells by downregulation of DNMT3B. Oncogene 26, 5900-5910 (2007). Giangreco, A., Groot, K.R. & Janes, S.M. Lung cancer and lung stem cells: strange bedfellows? Am J Respir Crit Care Med 175, 547-553 (2007). Fisher, G.H., et al. Induction and apoptotic regression of lung adenocarcinomas by regulation of a K-Ras transgene in the presence and absence of tumor suppressor genes. Genes Dev 15, 3249-3262 (2001). Bailey-Wilson, J.E., et al. A major lung cancer susceptibility locus maps to chromosome 6q23-25. Am J Hum Genet 75, 460-474 (2004). Tokuhata, G.K. & Lilienfeld, A.M. Familial aggregation of lung cancer in humans. J Natl Cancer Inst 30, 289-312 (1963). Sellers, T.A., et al. Lung cancer detection and prevention: evidence for an interaction between smoking and genetic predisposition. Cancer Res 52, 2694s2697s (1992). Liu, P., et al. Cumulative effect of multiple loci on genetic susceptibility to familial lung cancer. Cancer Epidemiol Biomarkers Prev 19, 517-524. Tessema, M., et al. Promoter methylation of genes in and around the candidate lung cancer susceptibility locus 6q23-25. Cancer Res 68, 1707-1714 (2008). You, M., et al. Fine mapping of chromosome 6q23-25 region in familial lung cancer families reveals RGS17 as a likely candidate gene. Clin Cancer Res 15, 2666-2674 (2009). Beane, J., et al. Reversible and permanent effects of tobacco smoke exposure on airway epithelial gene expression. Genome Biol 8, R201 (2007). Spira, A., et al. Airway epithelial gene expression in the diagnostic evaluation of smokers with suspect lung cancer. Nat Med 13, 361-366 (2007). 32  201. 202. 203. 204. 205. 206. 207. 208. 209.  Chari, R., et al. Effect of active smoking on the human bronchial epithelium transcriptome. BMC Genomics 8, 297 (2007). Spira, A., et al. 
Effects of cigarette smoke on the human airway epithelial cell transcriptome. Proc Natl Acad Sci U S A 101, 10143-10148 (2004). Huang, Y., de la Chapelle, A. & Pellegata, N.S. Hypermethylation, but not LOH, is associated with the low expression of MT1G and CRABP1 in papillary thyroid carcinoma. Int J Cancer 104, 735-744 (2003). Xin, H., et al. Targeted delivery of CX3CL1 to multiple lung tumors by mesenchymal stem cells. Stem Cells 25, 1618-1626 (2007). Lu, D.D., et al. The relationship between metallothionein-1F (MT1F) gene and hepatocellular carcinoma. Yale J Biol Med 76, 55-62 (2003). Wilson, I.M., et al. Epigenomics: mapping the methylome. Cell Cycle 5, 155-158 (2006). Chari, R., et al. SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 9, 422 (2008). Toyooka, S., et al. DNA methylation profiles of lung tumors. Mol Cancer Ther 1, 61-67 (2001). Bhattacharjee, A., et al. Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98, 13790-13795. (2001).  33  2.  Epigenomics: mapping the methylome 1  2.1.  Introduction  Methylation of cytosine in the CpG dinucleotide context of mammalian DNA has long been recognized as an important determinant of both development and gene regulation1. Consequently, many technologies have emerged to enable the detection and analysis of DNA methylation status. For example, a recent adaptation of digital karyotyping technology demonstrates very impressive resolution in genome-wide analysis, albeit at substantial cost and effort2. Alternatively, array-based technologies offer an avenue for both local and genome-wide methylation analysis2-8, marking a departure from the traditional locus-specific methods9,10. These genome-wide approaches alleviate the need to extrapolate local patterns to a global scale. 
To adapt array-based comparative genomic hybridization (array CGH) for epigenomic analysis, a method to select for or against methylated sequences is required. For example, enrichment by methylation-specific PCR relies on methylation-sensitive restriction enzyme cleavage4 or bisulfite treatment8 to promote differential amplification of methylated and unmethylated sequences. Alternatively, immunoprecipitation techniques are also used for sample enrichment in epigenetic research. Ballestar et al.11 pioneered the use of chromatin immunoprecipitation (ChIP)-enriched DNA in CGH experiments on both metaphase spreads and CpG island arrays. Their work identified novel binding loci of methyl-CpG binding proteins (MBD) in cancer, and hence novel regions of methylation-mediated silencing.

1  A version of this chapter has been previously published. Wilson IM, Davies JJ, Weber M, Brown CJ, Alvarez CE, MacAulay C, Schübeler D, Lam WL (2006) Epigenomics: mapping the methylome. Cell Cycle 5(2):155-158.

2.2.  Results

Recently we reported a method, called methylated DNA immunoprecipitation (MeDIP), for direct isolation of methylcytosine-rich DNA, providing an unbiased means to enrich methylated DNA6. An anti-5-methylcytosine monoclonal antibody is used to capture methylated genomic DNA fragments. This method provided up to 90-fold enrichment of the methylated sequences in a dose-dependent and sequence-independent manner. The use of MeDIP in combination with array CGH has provided comprehensive maps of the human methylome. In this approach, MeDIP-enriched DNA and input DNA (i.e. without immunoprecipitation) are differentially labeled using fluorescent dyes (Cy3 and Cy5) and competitively hybridized to the genomic DNA arrays (Figure 2.1). Although currently available CpG element arrays display over 12,000 CpG islands3,12, whole genome tiling path arrays are used complementarily in order to achieve complete genome coverage.
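The two-channel comparison of MeDIP-enriched and input DNA is conventionally summarised as a log2 ratio for each array clone. The following is a minimal sketch of that calculation, not the pipeline used in this work. The function name, the intensity floor and the median-centring step are all assumptions made for illustration.

```python
import math
from statistics import median


def medip_log2_ratios(medip_signal, input_signal, floor=1.0):
    """Return median-centred log2(MeDIP / input) ratios, one per array clone.

    `floor` is a hypothetical guard against zero or negative
    background-subtracted intensities; it is not taken from the thesis.
    """
    if len(medip_signal) != len(input_signal):
        raise ValueError("both channels must contain the same number of clones")
    raw = [
        math.log2(max(m, floor) / max(i, floor))
        for m, i in zip(medip_signal, input_signal)
    ]
    # Assumed normalisation: shift so that the median clone has ratio 0.
    centre = median(raw)
    return [r - centre for r in raw]


# Toy intensities for three clones (illustrative values only).
ratios = medip_log2_ratios([400.0, 100.0, 25.0], [100.0, 100.0, 100.0])
```

Under this convention a positive ratio marks a clone that is enriched in the MeDIP fraction relative to input, and a negative ratio marks one that is depleted.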
The Sub-Megabase Resolution Tiling (SMRT) array13, which spans the human genome with 32,433 overlapping BAC clones, provides a practical resolution of ~80 kb for whole genome methylation analysis. The fluorescence signal ratio is calculated for each BAC, with positive log2 ratios representing methylated sequences. The visualization tool SeeGH14 displays the calculated ratios at the chromosomal location of each BAC, resulting in a whole genome view of methylation status. The value of this approach for describing the normal methylome and mapping changes that occur in neoplasia is evident6.

Using this method, one of the primary epigenetic differences we expected to see was between the male and female sex chromosomes. In order to achieve dosage compensation with males, who have a single X chromosome and the sex-determining Y chromosome, one of the two X chromosomes in females is inactivated early in development. The inactive X (Xi) is known to have extensive hypermethylation at CpG island promoters15, and based on this knowledge, the concept of global methylation of the Xi was extrapolated. Since the SMRT array includes 1461 X-linked BAC clones, comparison of unrelated male (active X only) and female (active and inactive X) nontransformed fibroblasts should yield a detailed methylation profile of the inactive X. To our surprise, the female X profile was hypomethylated overall when compared to the male, but was hypermethylated in gene-rich regions6. Although unexpected, this is consistent with previous reports of hypomethylation on Xi16,17. In contrast to the X chromosome, the autosomal methylation profiles were found to be highly similar between male and female samples6. Furthermore, since DNA is methylated at CpG dinucleotides, it was expected that features such as sequence composition would determine genomic DNA methylation patterns.
Indeed, the major correlates of methylation at the whole autosome level were GC composition, gene density, and Alu short interspersed nuclear elements (SINEs), features that are generally associated with each other. These positive correlations may exist as a consequence of an evolutionary response to invading DNA sequences. Indeed, extensive methylation in gene-dense regions may slow the transcription process18 and this may be a mechanism used by the cell to suppress transcription from gene-associated parasite sequences such as transposable elements18. While Alu SINEs are known to be hypermethylated19, the array experiments were designed to block signal from repetitive elements. Thus, the hypermethylated signals observed in Alu-rich regions are likely due to methylation in the surrounding unique-sequence DNA, although this methylation may initiate in the repetitive DNA. Clearly, further experiments are required to elucidate the interplay between methylation of repetitive DNA and its influence on DNA methylation of unique host sequence.

Disruption of DNA methylation patterns is a prominent hallmark of cancer. Hypermethylation of gene-associated CpG islands can lead to aberrant silencing of tumor suppressor genes (TSGs), whereas global hypomethylation may lead to oncogene activation and genomic instability, as reviewed by Esteller20. Though the importance of altered DNA methylation in tumorigenesis is clear, little is known about its extent and genomic distribution. Using the MeDIP array CGH technique, a colon cancer cell sample showed over 400 regions of differential methylation when compared to nontransformed, normal fibroblasts. In particular, a large number of hypomethylated regions, ranging from 100 kb to 20 Mb in size, were observed in the colon cancer sample6. Though many of these regions of difference may be due to tissue-specific epigenetic patterning, we predict that a substantial proportion are a consequence of pathological processes.
Most of these hypomethylated regions corresponded to gene-poor regions of the genome, raising the possibility that hypomethylation is linked to cancer not only by affecting oncogene expression. Rather, genomic instability, recombination rates and/or expression of parasitic DNA elements may also be factors21,22. Furthermore, new data comparing lung cancer with matched lymphoblast samples demonstrate a substantial number of large-scale differentially methylated regions. This suggests that regional epigenomic instability is a common phenomenon across tumor genomes (Figure 2.2). As one of the major characteristics of cancer cells is genomic instability, an increasing number of high-resolution, high-throughput techniques have been developed to determine the genomic integrity of a tumor13,23,24. In theory, recurrently altered regions harbor important cancer-related genes that are selected for during tumorigenesis. High-resolution technologies have proven instrumental in the identification of small areas of the genome, facilitating mapping of candidate genes25. However, many recurrent alterations are megabases in size and may contain hundreds of genes. Studies that examine the link between gene expression and genomic copy number have shown that the correlation is complex. For example, of the 93 genes residing in DNA copy number alterations on chromosome 22 in ovarian cancer, Benetkiewicz et al.26 identified only 33 (35%) genes with a corresponding expression change (e.g. overexpression of a gene in an amplicon). This suggests the presence of regulatory controls on genes in a given altered region that can override the impact of copy number alteration. DNA methylation is likely one such regulatory mechanism. We assessed the regions with copy number changes identified by our tiling path array in colon cancer for concurrent marked methylation changes using MeDIP-array CGH.
Although one would expect that the majority of methylation changes are independent of copy number status6, the presence of concurrent genetic and epigenetic modification would reveal regions that harbor genes potentially important to carcinogenesis. Strikingly, using this integrative approach, we detected co-localization of hypomethylation with an amplicon at 1q21-q23 in lung adenocarcinoma (Figure 2.3). This region is known to be altered in lung cancer and contains Mcl-1, an anti-apoptotic member of the Bcl-2 family known to be overexpressed in non-small cell lung cancer27. Our data suggest that copy-number gain in combination with extensive demethylation is a potential mechanism for Mcl-1 overexpression. We have also detected regions of concomitant hemizygous genomic loss and hypermethylation. This suggests the presence of tumor suppressor genes and the fulfillment of Knudson’s two-hit hypothesis28. These findings clearly demonstrate the value in taking an integrated genetic/epigenetic approach to analyze cancer genomes in order to gain insight into the complex biology of neoplasia. Additionally, other diverse areas of biology and disease are known to have epigenetic components, and would thus benefit from this epigenomic technique.

2.3. Discussion

In addition to utility in research, we anticipate that MeDIP array CGH will also find a place in the clinic and in drug development. Methylome profiling will lead to the development of new methylation-markers to aid in disease diagnosis and classification. It also seems likely to yield markers of drug response and toxicity. It is important to note that the successful use of MeDIP DNA on the SMRT and CpG arrays has been achieved with as little as 500 ng genomic DNA (unpublished data). The ability to perform experiments on such limited amounts of DNA is a great asset, as the size of the specimen is often a consideration, especially when working with clinical samples.
Furthermore, the fact that the DNA fragments used for the initial MeDIP capture are in the range of 300-1000 bp means that archival samples, which typically yield partially degraded DNA due to formalin fixation, can be analyzed for aberrant methylation, enabling retrospective studies. Other methods that rely on methylation-sensitive restriction enzymes, such as that of Ching et al.4, may require micrograms of starting DNA material, limiting the utility of these techniques. Moreover, those methods sample only the fraction of the genome corresponding to a restriction site, and one must consider that any positive site could be due to a DNA polymorphism. As we describe a new approach to viewing the epigenome, a definition of the normal methylation pattern must be determined in order to identify alterations in the methylome. Investigating the interindividual variability of normal tissue is necessary to establish a baseline for methylation polymorphisms (or polymethisms) in the human population. It is equally important to describe the intraindividual variation of methylome patterning in different tissue types. Finally, the integration of genomic, epigenomic and gene expression (transcriptome and proteome) profiles will facilitate a systems approach to understanding causal events and their downstream consequences in the context of developmental biology and disease progression.

Figure 2.1 – MeDIP-array CGH schema

Genomic DNA is sonicated to a size range of 300-1000 bp. Anti-5’-methylcytosine antibody is used to immunoprecipitate methylated DNA (IP) for comparison with the Input DNA (IN). The two samples are differentially labeled and co-hybridized to a SMRT or CpG array. Relative signal intensity at each locus or spot represented on the array indicates methylation status.

Figure 2.2 – Epigenomic instability in lung cancer

This plot summarizes the frequency of methylated regions in two adenocarcinoma and matched lymphoblast samples.
A large number of differentially methylated regions exist between tumor and lymphoblast sample. Although tissue-specific patterns need to be identified, we predict that a large number of these alterations are involved in or are a consequence of oncogenesis. Hypermethylated regions are right of the chromosome and hypomethylated regions are to the left. Red indicates adenocarcinoma, green denotes lymphoblast, yellow is common to both.

Figure 2.3 – Alignment of epigenomic and genomic profiles

Integration of the two datasets reveals areas of concomitant alteration in methylation and copy-number status as well as areas of independent alteration. A region of hypomethylation corresponding to DNA copy increase (1q21-q23) may lead to overexpression of oncogenes such as Mcl-1. Regions of hypermethylation corresponding to normal DNA copy status (9q13) demonstrate the insensitivity of MeDIP to genomic copy number.

2.4. References

1. Feinberg, A.P. The epigenetics of cancer etiology. Semin Cancer Biol 14, 427-432 (2004).
2. Hu, M., et al. Distinct epigenetic changes in the stromal cells of breast cancers. Nat Genet 37, 899-905 (2005).
3. Yan, P.S., et al. Dissecting complex epigenetic alterations in breast cancer using CpG island microarrays. Cancer Res 61, 8375-8380 (2001).
4. Ching, T.T., et al. Epigenome analyses using BAC microarrays identify evolutionary conservation of tissue-specific methylation of SHANK3. Nat Genet 37, 645-651 (2005).
5. Yamamoto, F. & Yamamoto, M. A DNA microarray-based methylation-sensitive (MS)-AFLP hybridization method for genetic and epigenetic analyses. Mol Genet Genomics 271, 678-686 (2004).
6. Weber, M., et al. Chromosome-wide and promoter-specific analyses identify sites of differential DNA methylation in normal and transformed human cells. Nat Genet 37, 853-862 (2005).
7. Shi, H., et al. Expressed CpG island sequence tag microarray for dual screening of DNA hypermethylation and gene silencing in cancer cells. Cancer Res 62, 3214-3220 (2002).
8. Gitan, R.S., Shi, H., Chen, C.M., Yan, P.S. & Huang, T.H. Methylation-specific oligonucleotide microarray: a new potential for high-throughput methylation analysis. Genome Res 12, 158-164 (2002).
9. Herman, J.G., Graff, J.R., Myohanen, S., Nelkin, B.D. & Baylin, S.B. Methylation-specific PCR: a novel PCR assay for methylation status of CpG islands. Proc Natl Acad Sci U S A 93, 9821-9826 (1996).
10. Melnikov, A.A., Gartenhaus, R.B., Levenson, A.S., Motchoulskaia, N.A. & Levenson Chernokhvostov, V.V. MSRE-PCR for analysis of gene-specific DNA methylation. Nucleic Acids Res 33, e93 (2005).
11. Ballestar, E., et al. Methyl-CpG binding proteins identify novel sites of epigenetic inactivation in human cancer. EMBO J 22, 6335-6345 (2003).
12. Heisler, L.E., et al. CpG Island microarray probe sequences derived from a physical library are representative of CpG Islands annotated on the human genome. Nucleic Acids Res 33, 2952-2961 (2005).
13. Ishkanian, A.S., et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 36, 299-303 (2004).
14. Chi, B., DeLeeuw, R.J., Coe, B.P., MacAulay, C. & Lam, W.L. SeeGH--a software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 5, 13 (2004).
15. Migeon, B.R. X chromosome inactivation: theme and variations. Cytogenet Genome Res 99, 8-16 (2002).
16. Bernardino, J., et al. DNA methylation of the X chromosomes of the human female: an in situ semi-quantitative analysis. Chromosoma 104, 528-535 (1996).
17. Lindsay, S., et al. Differences in methylation on the active and inactive human X chromosomes. Ann Hum Genet 49, 115-127 (1985).
18. Lorincz, M.C., Dickerson, D.R., Schmitt, M. & Groudine, M. Intragenic DNA methylation alters chromatin structure and elongation efficiency in mammalian cells. Nat Struct Mol Biol 11, 1068-1075 (2004).
19. Rubin, C.M., VandeVoort, C.A., Teplitz, R.L. & Schmid, C.W. Alu repeated DNAs are differentially methylated in primate germ cells. Nucleic Acids Res 22, 5121-5127 (1994).
20. Esteller, M. DNA methylation and cancer therapy: new developments and expectations. Curr Opin Oncol 17, 55-60 (2005).
21. O'Neill, R.J., O'Neill, M.J. & Graves, J.A. Undermethylation associated with retroelement activation and chromosome remodelling in an interspecific mammalian hybrid. Nature 393, 68-72 (1998).
22. Chen, R.Z., Pettersson, U., Beard, C., Jackson-Grusby, L. & Jaenisch, R. DNA hypomethylation leads to elevated mutation rates. Nature 395, 89-93 (1998).
23. Bertone, P., et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242-2246 (2004).
24. Lucito, R., et al. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res 13, 2291-2305 (2003).
25. Garnis, C., et al. High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH. Int J Cancer 118, 1556-1564 (2006).
26. Benetkiewicz, M., et al. High-resolution gene copy number and expression profiling of human chromosome 22 in ovarian carcinomas. Genes Chromosomes Cancer 42, 228-237 (2005).
27. Song, L., Coppola, D., Livingston, S., Cress, D. & Haura, E.B. Mcl-1 regulates survival and sensitivity to diverse apoptotic stimuli in human non-small cell lung cancer cells. Cancer Biol Ther 4, 267-276 (2005).
28. Knudson, A.G., Jr. Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 68, 820-823 (1971).

3. EYA4 is a non-small cell lung cancer tumor suppressor located in the susceptibility locus on chromosome 6q 2

3.1. Introduction

Lung cancer is the most common cause of cancer death world-wide.
Only 16% of patients survive five years or more post diagnosis1, due to the typically late stage of detection and very modest improvement in therapy over the last two decades. Over 80% of lung cancers are non-small cell lung cancer (NSCLC) with adenocarcinomas (AC) and squamous cell carcinoma (SqCC) as the predominant histologic cell types representing 60% of tumors overall2. Recently, numerous groups have performed large-scale genome-wide association studies to identify genotypes associated with lung cancer risk. Strikingly, three of these studies identified regions on chromosome 15q as points of interest, implicating genes encoding nicotinic acetylcholine receptors as having a role in elevated risk of disease3-7. What is not yet clear is how these loci modify lung cancer risk, be it through smoking behavior or otherwise. An alternative, more focused approach to defining lung cancer susceptibility loci is through familial linkage studies. Using this approach, recent studies of familial lung cancers have identified chromosome 6q23-25 as a likely locus harboring susceptibility genes8,9. Unfortunately, this is a very broad region that spans over 30 Mb and encodes many genes. In an effort to identify the important candidate genes within this region, researchers have taken a more targeted approach to identify somatic genetic and epigenetic disruptions to these genes10. These studies have delineated many different DNA alterations in the region, including frequent deletion and LOH in tumors of smokers and non-smokers alike, as well as the discovery of epigenetically silenced tumor suppressor genes (TSGs) and a potential oncogene8-12. Despite these efforts, a single causative gene has not been identified, suggesting that alterations to multiple genes within the region might be necessary to confer increased lung cancer risk.

2 A version of this chapter is being prepared for submission as a research manuscript. Wilson IM, Vucic EA, Chari R, Zhang YA, Starczynowski DT, Lonergan KM, Enfield KS, Buys TP, Yee J, Laird-Offringa I, Karsan A, Liu P, You M, Anderson M, MacAulay C, Lam S, Gazdar AF, Lam WL (2010) "EYA4 is a non-small cell lung cancer tumor suppressor located in the susceptibility locus on chromosome 6q"

Over-expressed oncogenes are currently used for cancer diagnostics and therapeutics. However, as DNA hypermethylation is known to be a very early event during multistage carcinogenesis and has the added benefit of chemotherapeutic reversibility, hypermethylated TSGs are also potentially valuable diagnostic and therapeutic targets13. Therefore, we sought to identify novel and important TSGs in lung AC by identifying "two-hit" genes, i.e., those simultaneously inactivated by two mechanisms (copy-number loss and hypermethylation) in the same tumor specimen. Complementary and minimally biased genome-wide approaches were used to identify the most frequently occurring two-hit loci; only those genes most aggressively silenced in the tumor were retrieved. Using this approach, we have discovered a previously uncharacterized TSG within the lung cancer susceptibility locus at 6q23-25: Eyes absent 4 (EYA4). Here we describe mechanisms of EYA4 inactivation including frequent hypermethylation and deletion, consequent under-expression in NSCLC tumors, and biological function. Specifically, we show that stable inactivation of EYA4 confers reduced apoptosis in vitro and concomitant attenuation of growth arrest and DNA damage-inducible 45 alpha (GADD45a), a gene critical to apoptosis. Most significantly, however, we demonstrate that EYA4 is associated with familial lung cancer risk and that low expression is prognostic of poor survival.
The function, frequency of disruption, and clinical relevance of EYA4 signify a key role in development of NSCLC.

3.2. Results

3.2.1. Integration of array CGH and DNA methylation identifies two-hit genes

Using whole-genome array CGH and the Illumina HumanMethylation27 platform to examine a group of 30 AC and paired non-malignant tissue, we identified 45 frequent (≥15%) two-hit loci, corresponding to 38 genes within the tumor group (Table 3.1). In addition to established tumor suppressor loci such as those on chromosome 8p and 17p, we identified probes from chromosome 6q, corresponding to four genes (Eyes Absent 4 [EYA4]; Lipoprotein A Precursor [LPA]; G-Protein-coupled Receptor 6 [GPR6]; Nuclear Receptor Subfamily 2, Group E, member 1 [NR2E1]). Two of these genes map within the previously described susceptibility locus located at chromosome 6q23-25 (Figure 3.1a). An example of an AC tumor that is highly methylated at the EYA4 locus is shown in Figure 3.1b. These four genes were chosen for further validation based on their proximity to the 6q locus.

3.2.2. Gene expression analysis of two-hit candidates

mRNA expression of NR2E1, GPR6, EYA4, and LPA was assessed in 34 pairs of AC and matched non-malignant lung tissues. Only EYA4 was significantly (p<0.0001; Wilcoxon signed-rank test) down-regulated in AC tumor tissue relative to matched normal controls (n=34 pairs; Figure 3.1c). As LPA, NR2E1, and GPR6 were not significantly under-expressed, they were excluded from further analysis. We also assessed EYA4 transcript levels in two independent lung cancer cohorts14,15. EYA4 was found to be significantly under-represented in both cohorts relative to available normal references. This provides strong multi-center and multiplatform evidence that EYA4 is consistently under-expressed in lung adenocarcinoma.

3.2.3. Validation and analysis of EYA4 disruption in lung specimens

To validate the observed DNA hypermethylation of EYA4, quantitative methylation-specific PCR was performed on a region 60 nucleotides upstream to 38 nucleotides downstream of the transcription start site, as methylation of CpG sites within this locus corresponds best to loss of EYA4 expression (Figure 3.2a). This analysis of clinical specimens reveals that nearly half (46%) of the tumors showed marked hypermethylation when compared to corresponding paired non-malignant lung, validating our microarray results (n=45 pairs, p<0.0001, Wilcoxon signed-rank test) (Figure 3.2b). Confirmation of EYA4 under-expression in AC, as observed in the three cohorts described above, was performed by quantitative reverse-transcriptase PCR (qRT-PCR) in 40 pairs of lung AC and patient-matched non-malignant lung tissues. These data demonstrated frequent (72%) down-regulation of EYA4 (Figure 3.2c). To examine the relationship between DNA methylation and expression of EYA4, we correlated mRNA expression with DNA methylation (determined by validation qPCR techniques) in these tumors. The negative correlation, as expected, implied a direct role for DNA methylation in silencing EYA4 (data not shown). To gain further insight and minimize the effects of cellular heterogeneity, a parallel analysis was performed using DNA methylation levels and mRNA expression levels of EYA4 in a panel of lung adenocarcinoma cell line samples (n=38) (Table 3.2). This showed, as expected, that EYA4 expression is inversely correlated with DNA methylation level (range R=-0.59 to R=-0.81; Figure 3.2d). These results demonstrate a direct role of DNA methylation in controlling EYA4 expression.

3.2.4. Assessment of EYA4 re-expression by treatment with 5'-azacytidine

Two lung AC cell lines, one with methylation at EYA4 (NCI-H1395) and one without (HCC2935), were treated with the demethylating drug 5'-azacytidine.
As predicted, EYA4 expression in NCI-H1395 was induced following drug treatment (Figure 3.2e) and increased markedly compared to HCC2935. This mirrors previous findings in other epithelial cancer cells16,17 and further confirms that DNA methylation has a direct role in silencing EYA4.

3.2.5. Prognostic relevance of genes in 6q23-25

We performed survival analysis for EYA4 and for genes from the 6q23-25 locus previously shown to be somatically altered in lung tumors (TCF21, SYNE1, AKAP12, IL20RA, ACAT2)10, by applying a Mantel-Cox log-rank test to an AC public dataset (GSE3141)14. In this analysis, EYA4 was the gene most significantly associated with survival (Figure 3.3a). We discovered that patients whose tumors have low expression of EYA4 have a significantly poorer outcome than those with higher EYA4 expression (two-tailed p=0.007, Figure 3.3b). These data underscore the clinical relevance of EYA4 expression to AC patient prognosis.

3.2.6. Association of EYA4 genotype with familial lung cancer risk

In a genome-wide association study of familial lung cancer samples, a cluster of SNPs in the EYA4 gene exhibited associations with familial NSCLC. Five common variants in the EYA4 gene (rs7743259, rs159420, rs35689029, rs1878551, and rs2677826) were found to be associated with familial lung cancer (P < 0.05) (Figure 3.3c).

3.2.7. Effect of EYA4 knockdown on apoptosis

Previous literature implicates EYA family members as modulators of the apoptotic response18. To investigate whether EYA4 inactivation affects apoptotic response in normal cells, we established a stable shRNA EYA4 knock-down lymphoblast cell line (HCC-1954BL). Analysis of knock-down efficacy revealed that EYA4 expression was decreased 65% compared to control cells (data not shown). To assess whether reduced EYA4 expression conferred an increase in cell survival, annexin V/propidium iodide (AV/PI) staining was used to assess apoptotic cells after 24 hours of serum starvation.
Following serum starvation, EYA4 knock-down cells displayed a marked and reproducible decrease in the numbers of early (AV+/PI-) and late (AV+/PI+) apoptotic cells compared to empty vector (pLKO) cells (26.6% ± 4.6 for control vs. 14.5% ± 1.7 for EYA4kd) (Figure 3.4a, 3.4b, 3.4d). Indirect evidence of reduced apoptosis in EYA4 knock-down cells was obtained by performing qRT-PCR to assess expression of the apoptosis marker GADD45a. We found GADD45a expression elevated to a much greater degree in control cells, compared to EYA4 knock-down cells (Figure 3.4c). These findings are consistent with our FACS analysis and suggest reduced apoptotic signaling in EYA4 knock-down cells, implying a direct tumor suppressor role for EYA4 via induction of the apoptotic response.

3.2.8. Identification of potential EYA4 activation targets

EYA family members are transcriptional co-activators that interact with members of the SIX gene family19,20 to regulate target gene expression. To identify genes potentially activated by the EYA4/SIX complex, the ten highest and ten lowest EYA4-expressing bronchial epithelial cell profiles were selected from the 67 available, and were then compared using the significance analysis of microarrays (SAM) algorithm (n=20)21. Twenty-eight genes with expression patterns similar to EYA4 were identified at a q-value threshold of 5 (Table 3.2). Congruent with the known roles of EYA4, pathway analysis software showed these 28 genes to be most significantly enriched for roles in cellular development, embryonic development, cancer, and cell death (data not shown). Of the 28 genes identified, the prominence of GADD45a was interesting, considering its known role in the apoptotic program and cell cycle arrest22. This also correlates well with our observation that GADD45a mRNA expression is attenuated in cells lacking EYA4 expression when exposed to stressful growth conditions. We also identify SOX9 expression as significantly associated with EYA4.
This mirrors recent findings of increased EYA4 expression in response to ectopic SOX9 expression in neurofibromas23.

3.2.9. Frequency of EYA4 inactivation in NSCLC squamous cell subtype

Considering the differences and similarities between lung cancer histological subtypes24, we determined whether EYA4 disruption was also common to squamous-cell carcinomas (SqCC), the second most prevalent lung cancer subtype. A comparison of EYA4 expression in SqCC tumors and normal bronchial epithelia shows that EYA4 is also significantly under-expressed in SqCC tumors (p<0.0001, Mann-Whitney U test) (Figure 3.5a). To determine whether the under-expression of EYA4 in SqCC can be attributed in part to epigenomic inactivation, we examined gene-specific DNA methylation levels in SqCC tumors and patient-matched non-malignant tissue using the Illumina GoldenGate assay. We found that, similar to AC tumors, EYA4 is significantly hypermethylated in SqCC tumors (p<0.02, Wilcoxon signed-rank test) (Figure 3.5b). Inactivation of EYA4 by DNA methylation in the two most common NSCLC subtypes implicates EYA4 as a potentially important target for therapy and early detection of NSCLC.

3.2.10. Analysis of EYA4 in pre-invasive lung cancer

To assess EYA4 disruption early in tumorigenesis, we evaluated gene dosage levels in 20 carcinoma in situ (CIS) specimens by array CGH. These extremely rare samples were collected by fluorescent bronchoscopy (Figure 3.5e) and represent a stage of cancer development typically too early for detection using routine procedures. Our analysis revealed deletion of EYA4 in 35% (7/20) of CIS lesions, indicating that loss of EYA4 is a common event prior to the development of invasive cancer (Figure 3.5c). To determine whether gene deletion is accompanied by expression disturbances, we used data generated in a previous study by serial analysis of gene expression (SAGE) (see Methods).
We detected a trend clearly indicating a loss of EYA4 expression in the CIS and SqCC groups when compared to histologically normal bronchial cells (Figure 3.5d), indicating that EYA4 disruption is an early event in NSCLC tumorigenesis. Importantly, using MS-PCR we detected hypermethylation in 40% (4/10) of histologically normal bronchial epithelia from patients with previous NSCLC tumors as well as in a single high-risk patient with chronic obstructive pulmonary disease (Figure 3.5f). This indicates that hypermethylation of the gene is indeed a very early event, and raises the possibility of using EYA4 hypermethylation screening as an early diagnosis or risk assessment tool.

3.2.11. Expression of EYA4 in other malignancies

EYA4 has been described both as a potential TSG16,17, and recently as an overexpressed oncogene in tumors of neural origin23. In an attempt to resolve this apparent contradiction, we assessed EYA4 expression in over 350 cancer cell line samples. We detected high EYA4 expression in cancer cells derived from sarcomas, autonomic ganglia, and brain, relative to lung, gastrointestinal, pancreatic, head and neck, or colorectal tumors (Figure 3.6a), indicating possible tissue-specific roles. Examination of lung cancer cell lines showed no difference in EYA4 expression between NSCLC and small-cell lung cancer (SCLC) (Figure 3.6b).

3.3. Discussion

Although exposure to tobacco smoke is known to be the main cause of lung cancer, it is clear that significant genetic modifiers of risk are also involved. Susceptibility manifests as increased lung cancer mortality in siblings of affected individuals, and increased risk where there is family history of the disease25-27. These discoveries spawned the creation of the Genetic Epidemiology of Lung Cancer Consortium (GELCC), and subsequent studies have identified chromosome 6q23-25 as the primary region of susceptibility9.
Further work on this region has identified hypermethylated and under-expressed genes, as well as over-expressed potential oncogenes8,10. However, despite extensive work, a single causative gene has not been identified, making it probable that there are multiple loci of importance yet to be characterized in familial disease. Working with familial samples is challenging, however, as the mortality (15% five-year survival) and low resection rate of lung cancer make it exceedingly difficult to collect adequate numbers of specimens for analysis. We have therefore approached the problem by searching for somatic alterations in tumors to identify genes presumably important in lung cancer based on their inactivation. In this study, we have identified a gene within the primary lung cancer susceptibility region on chromosome 6q that is frequently both lost and hypermethylated - consistent with two-hit inactivation of a tumor suppressor gene. Eyes absent 4 (EYA4), located at 6q23.2, is an attractive TSG candidate not only based on our findings, but also based on known functional roles of other EYA family members (see below) and the presence of autosomal-dominant EYA4 mutations in post-lingual deafness28-32. This is notable because autosomal dominance is the model of susceptibility used by the GELCC to identify the primary lung cancer susceptibility region (6q23-25)9. EYA4 inactivation may be widespread within epithelial malignancies, as hypermethylation of EYA4 has also been identified in esophageal and colon cancer cells16,17. Mutation and abrogation of EYA family members have effects consistent with tumor suppression as well. For example, in Drosophila, EYA mutant cells over-proliferate and fail to differentiate properly21. Also, in mice, exogenous expression of Eya2 triggers apoptosis18. The function of EYA4 is obviously complex and of some importance as recent work by Okabe et al.
shows that EYA4 enhances the innate immune response, the body's first line of defense against tumorigenesis, a hallmark of cancer33. Recent work implicates EYA4 as an oncogene in Schwannomas, suggesting a tissue-specific role for the gene, congruent with our data23 (Figure 3.6a). In this study, all lung cancer cohorts analyzed expressed EYA4 at levels significantly below that of normal matched tissues. We identify frequent loss and hypermethylation of the gene, and common two-hit inactivation as well, suggesting that inactivation of this gene is likely to be a causal alteration in lung cancer development. Correlation between reduced mRNA expression and promoter hypermethylation suggested that DNA methylation regulates EYA4 transcription, which we validated using demethylating drugs (Figure 3.2e). Despite these numerous lines of evidence supporting a tumor suppressor role for EYA4 in lung cancers, identification of EYA4 as a potential oncogene in cancers of neural origin23 suggests that EYA4 may have a tissue-specific role. Our analysis involving multiple tumor types demonstrates that EYA4 is expressed at a much lower level in epithelial tumors than in sarcomas or neurally derived tumors, reconciling these seemingly contradictory data in favor of a tissue-specific role in cancer development. Moreover, we demonstrate that somatic genetic (loss) and epigenetic (hypermethylation) alterations to EYA4 occur in multiple NSCLC sub-groups, thus supporting the relevance of loss of function to lung cancer development. Subsequent to demonstrating that EYA4 disruption is frequent in lung cancer, we sought to determine the functional consequences of this disruption. Based on the known role of EYA4 as a transcriptional co-activator associated with apoptosis, we performed SAM analysis on gene expression profiles from histologically normal bronchial epithelia to identify genes whose transcript levels are associated with expression of EYA4.
Here we determined that expression of GADD45a and other cell-death associated genes was significantly associated with EYA4 transcript levels. Our functional work using virus-immortalized normal cells validated these findings by demonstrating that EYA4 knock-down prevents cells from undergoing apoptosis in response to stressful growth conditions and also prevents appropriate elevation of GADD45a levels following stress, another hallmark of cancer. This supports our hypothesis that EYA4 disruption by loss and hypermethylation would be advantageous to pre-malignant cells by enabling them to avoid apoptosis. We have identified frequent somatic genetic or epigenetic alteration (70%) of EYA4, which we have validated in multiple lung cancer cohorts using numerous methodologies, amounting to significant evidence that EYA4 is a lung TSG. Work is currently being undertaken to determine whether EYA4 is somatically mutated in NSCLC genomes. As this TSG is located within the 6q susceptibility locus, we sought to determine whether variants of the gene were associated with an increase in lung cancer risk. To do so, we examined familial lung cancer cases and controls, and discovered a cluster of single nucleotide polymorphisms within EYA4 that are significantly associated with lung cancer susceptibility. This evidence suggests that EYA4 is one of the key genes in this previously identified susceptibility locus. As such, it is reasonable to expect such a critical gene to be tightly correlated with patient survival. In fact, this is shown to be the case as low expression of EYA4 is significantly associated with poor prognosis.
Given the prevalence of two-hit targeted inactivation of EYA4 in NSCLC, the multiple tumor suppressor functions of the gene, its early disruption in pre-invasive lesions, and its strong association with survival and familial cancer risk, our findings suggest that EYA4 is a novel TSG within the lung cancer susceptibility locus that is critical to NSCLC development in both familial and sporadic lung cancers.

3.4. Materials and methods

3.4.1. Sample collection and nucleic acid extraction

Lung tumors (AC and SqCC) and adjacent histologically normal parenchyma were obtained from freshly resected tumors. Microdissection was undertaken so that tumor cells comprised a minimum of 70% of each sample. Bronchial epithelial brushing specimens were obtained from airways less than 2 mm in diameter, collected in RNAlater® (Applied Biosystems/Ambion Inc., Canada) and stored at -85°C 34. Bronchial biopsy specimens of locally invasive squamous cell carcinoma and carcinoma in situ (CIS) were collected in 10% buffered formalin. Some of the CIS samples were also collected in RNAlater® as described previously34. DNA was extracted from each sample using a standard phenol/chloroform protocol. RNA was extracted using TRIzol reagent. The study was approved by the Review of Ethics Board of the University of British Columbia and the British Columbia Cancer Agency.

3.4.2. Array CGH

Array CGH was performed as described previously using a whole-genome tiling path array35,36. Gene dosage profiles were obtained for 30 lung adenocarcinoma (AC) samples and 20 CIS samples. Genome segmentation was performed using aCGHSmooth37 so that lost clones were assigned a value of -1, retained clones a value of 0, and gained clones a value of 1.

3.4.3. DNA methylation profiling

For AC samples and lung cancer cell line samples, DNA methylation profiling was performed using the Illumina HumanMethylation27 chip.
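The calls used in this section and in the integration step that follows can be illustrated with a short Python sketch. It assumes the β-value definition given in the legend of Figure 3.1 (methylated signal over total signal), the tumor-minus-normal difference threshold of 0.2429 quoted in this section, and the -1/0/1 copy number coding from section 3.4.2. The function names and the sample values are hypothetical and are not taken from the analysis pipeline used in this study.

```python
# Illustrative sketch only: names and sample values are hypothetical.

HYPER_DELTA = 0.2429  # tumor-minus-normal beta difference quoted in this section


def beta_value(methylated, unmethylated):
    """Beta = methylated / (methylated + unmethylated), as in the Figure 3.1 legend."""
    return methylated / (methylated + unmethylated)


def is_hypermethylated(tumor_beta, normal_beta, delta=HYPER_DELTA):
    """Call a probe hypermethylated when tumor exceeds matched normal by more than delta."""
    return (tumor_beta - normal_beta) > delta


def two_hit_frequency(tumor_betas, normal_betas, copy_calls):
    """Fraction of sample pairs in which one probe is both lost and hypermethylated.

    copy_calls uses the array CGH coding: -1 loss, 0 retained, 1 gain.
    """
    hits = sum(
        1
        for t, n, c in zip(tumor_betas, normal_betas, copy_calls)
        if c == -1 and is_hypermethylated(t, n)
    )
    return hits / len(copy_calls)


# Made-up values for five tumor/normal pairs at a single probe.
tumor = [0.80, 0.55, 0.30, 0.70, 0.20]
normal = [0.10, 0.20, 0.25, 0.50, 0.15]
copies = [-1, -1, 0, -1, 0]
print(two_hit_frequency(tumor, normal, copies))  # prints 0.4
```

In this toy example the fourth pair is lost but its β difference (about 0.20) falls below the threshold, so only two of the five pairs count as two-hit.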
Five hundred nanograms of DNA from each of 30 ACs and 30 matched parenchyma samples were analyzed by this technology. Normalized β-values were obtained and only those with a detection p-value of ≤0.05 were used. For the AC tumor samples, probes were deemed 'hypermethylated' if the difference between tumor and patient-matched normal parenchyma was >0.2429, representing the 95th percentile of differences observed. For squamous cell carcinoma (SqCC) samples, DNA methylation profiling was performed on the Illumina GoldenGate Cancer chip (8 SqCC tumors and 8 normal samples were tested).

3.4.4. Array CGH/DNA methylation integration

For AC samples, DNA copy number status was determined for each BAC array clone (-1 as loss, 0 as neutral, +1 as gain). Where overlapping clones differed in computed status, the midpoint of the overlap was used to eliminate redundancy. Gene dosage was then obtained for each Illumina probe that was covered by the copy number assay and located within an autosome (23,940 probes).

3.4.5. Gene expression profiling and data processing

3.4.5.1. Agilent

Sixty-eight mRNA expression profiles were generated from 34 AC tumor samples and their respective matched normal parenchyma samples on a custom 39,909-probe Agilent microarray. Two-channel experiments were performed by differentially labeling the test sample (tumor or parenchyma) and a standard normal lung reference. Differentially labeled products were then competitively co-hybridized to the custom array. Expression profiles were processed using the Rosetta Resolver system (Rosetta, Seattle, WA, USA) and log10 (sample/standard) values were generated for each probe.

3.4.5.2. Affymetrix

Publicly available Affymetrix U133 Plus 2.0 and Affymetrix U133A CEL files were downloaded from NCBI Gene Expression Omnibus (GEO) from the study by Bild et al. under accession GSE3141 14 and Landi et al. under accession GSE10072 15,38.
Affymetrix .CEL files used for EYA4 expression in multiple tissues of origin were downloaded from the Sanger Cell Line Project at the Broad Institute (http://www.broad.mit.edu/cgi-bin/cancer/web_tools/get_files/index.cgi). Non-malignant bronchial brush cells from airways < 2 mm in diameter were retrieved from heavy current and former smokers by bronchial brushing during bronchoscopy under conscious sedation. Expression profiles were generated for 45 collected brushings on the Affymetrix U133 Plus 2.0 chip. Bronchial brush cells and cancer data from GSE3141 were combined and RMA analysis 39 was performed on this merged dataset using the "affy" package from Bioconductor 40. For each sample, log2 intensity values were then calculated for every probe.

3.4.6. Quantitative real-time PCR analysis of mRNA expression

The High-Capacity cDNA Archive Kit (Applied Biosystems Inc., Foster City, CA) was used to generate cDNA from total RNA. Quantitative real-time PCR was performed using mastermix, plates, and primers from Applied Biosystems Inc. (EYA4: Hs00187965_m1; 18s: Hs99999901_s1; GADD45a: Hs00169255_m1). Triplicate reactions were performed for each biological sample being assayed. Fold-expression changes were calculated using the ΔΔCt method, using untreated cells as the comparator 41. Error bars were calculated using ABI software.

3.4.7. Quantitative real-time methylation-specific PCR

Quantitative PCR analyses were performed using the Chromo4 MJ Research real-time PCR system. Sodium bisulfite-treated genomic DNA was prepared using the EZ DNA Methylation-Gold kit (Zymo Research, Orange, CA) and amplified by fluorescence-based real-time MSP using TaqMan technology as described previously42,43.
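The methylation ratio reported by this assay, as defined later in this section, divides an EYA4 quantity by a MYOD1 quantity, each derived from the sample's Ct and a standard curve, and multiplies the result by 1000. The Python sketch below shows that arithmetic under one explicit assumption: that each standard curve is log-linear, Ct = intercept + slope × log10(quantity). The curve form, the function names, and all numbers are illustrative assumptions rather than the instrument settings used here.

```python
# Illustrative sketch only: a log-linear standard curve is assumed.


def quantity_from_ct(ct, intercept, slope):
    """Invert the assumed curve Ct = intercept + slope * log10(quantity)."""
    return 10 ** ((ct - intercept) / slope)


def methylation_ratio(ct_eya4, ct_myod1, curve_eya4, curve_myod1):
    """EYA4 quantity over MYOD1 quantity, multiplied by 1000.

    Each curve is an (intercept, slope) pair for that amplicon.
    """
    q_target = quantity_from_ct(ct_eya4, *curve_eya4)
    q_reference = quantity_from_ct(ct_myod1, *curve_myod1)
    return 1000 * q_target / q_reference


# Made-up Ct values and a deliberately simple curve (intercept 40, slope -4).
curve = (40.0, -4.0)
print(methylation_ratio(32.0, 28.0, curve, curve))
```

With these made-up inputs the EYA4 quantity is one tenth of the MYOD1 quantity, so the ratio evaluates to 100.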
In brief, EYA4 promoter and exon 1 region primers (NCBI Reference Sequence: NT_025741.15, 6q23.2), 5’-TTGCGTAAGTGCGAGGTTGTC-3’ (forward), 5’-AACAACGACAACTTCACGTAA-3’ (reverse), and 5’-FAM-TCGTTTTCGGTTTTCGCGTAA-BHQ1-3’ (probe), were designed to specifically amplify bisulfite-converted DNA within a region of the promoter in which frequent methylation of CpG sites corresponds to loss of EYA4 expression in cancer cells. The non-methylated form of MYOD1 was used as an internal reference standard. The fluorescence emission intensity values for EYA4 and MYOD1 were calculated based on the Ct value of an individual sample using the intercept and the slope of the standard curve. The EYA4 methylation ratio is defined as the ratio of the fluorescence emission intensity value of EYA4 to that of MYOD1, multiplied by 1000. The ratio is a measure of the relative level of methylation in an individual sample. Standard MS-PCR was performed using primers specific to methylated and unmethylated forms of EYA4 as described previously17.

3.4.8. 5'-azacytidine treatment

NCI-H1395 (EYA4 methylation positive) and NCI-HCC2935 (EYA4 methylation negative) cells were cultured according to American Type Culture Collection directions. Cells were seeded in triplicate at the recommended density on day 0, and media containing 10 μM 5'-azacytidine (5'-aza, Sigma Aldrich, St. Louis, MO) was added on days 1, 3, and 5. On day 6, total RNA was extracted using TRIzol reagent. Untreated cells were seeded and cultured along with treated cells, with similar media changes but with no drug.

3.4.9. DNA methylation and gene expression correlation

For a panel of 38 cell lines established from lung AC tumors, DNA methylation levels for EYA4 were obtained using the Illumina HumanMethylation27 chip, and mRNA expression levels for EYA4 were obtained from the Illumina WG-6 gene expression chip.
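The comparison in this section pairs, for each cell line, a methylation β-value with an expression value and asks how closely their ranks agree. As a minimal sketch of a rank correlation of this kind, the Python below computes Spearman's ρ as the Pearson correlation of average ranks. It is a generic textbook formulation with hypothetical function names and made-up values, not the Matlab code used for the analysis.

```python
# Illustrative sketch only: generic Spearman correlation on made-up values.


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Made-up beta values and expression values for five cell lines.
methylation = [0.9, 0.7, 0.5, 0.2, 0.1]
expression = [1.0, 2.0, 1.5, 4.0, 5.0]
print(spearman_rho(methylation, expression))
```

For these five made-up pairs the ranks disagree in only one adjacent swap from a perfect inverse ordering, giving ρ = -0.9. Where several expression probes exist for one gene, their values would be averaged per sample before ranking.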
The Spearman correlation coefficient was calculated for each of the 8 EYA4 DNA methylation probes with the average of the 3 EYA4 expression probes.

3.4.10. EYA4 mRNA knockdown

HCC1954BL lymphoblastoid cells were obtained from ATCC (ATCC, Manassas, VA). They were established in recommended media (RPMI 1640) with 10% FBS. Lentiviral vectors with an shRNA insert targeted against EYA4 as well as a puromycin resistance selectable marker were purchased from Open Biosystems (Huntsville, AL). Briefly, lentiviruses containing a pLKO plasmid construct coding an shRNA targeted against EYA4 were prepared by transfecting 293T cells with the packaging plasmids VSVG and d8.91 and the shRNA plasmids using TransIT-LT1 transfection reagent (Mirus, Mississauga, ONT). Virus containing empty pLKO vector served as a control. Virus supernatant was collected from the transfected 293T cells each day for 3 consecutive days post transfection. Cells of the lymphoblast cell line HCC1954BL were infected at 50-60% confluency, using 1 mL of virus. After 48 h, cells were selected with 5.0 mg/mL puromycin. Selection was continued until all non-infected cells were eliminated. Stably infected cell lines were maintained in growth media (RPMI1640/10%FBS) supplemented with 5.0 mg/mL of puromycin. Sequence details for the EYA4 hairpin used can be found on the Open Biosystems website (Clone ID TRCN0000051094, www.openbiosystems.com).

3.4.11. Survival analysis

Gene expression profiles and associated survival data under accession GSE3141 were downloaded from NCBI GEO38. This is currently the only public dataset with annotated survival data available that makes use of the more advanced Affymetrix U133 Plus 2.0 microarray. This array has a significantly improved probe design, which enables higher confidence in our bioinformatic analyses. Adenocarcinoma profiles from this dataset were analyzed.
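The comparison described in this section splits samples by EYA4 expression tertile and compares survival in the two outer groups with a log-rank test. The Python sketch below shows one standard formulation of both steps: the two-group log-rank statistic with a hypergeometric variance, referred to a chi-square distribution with 1 degree of freedom. It is not the Matlab code used in this study, and the function names and toy survival times are hypothetical.

```python
# Illustrative sketch only: textbook two-group log-rank test on made-up data.
import math


def outer_tertiles(expression):
    """Return index lists for the lowest and highest thirds by expression."""
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    k = len(order) // 3
    if k == 0:
        raise ValueError("need at least three samples")
    return order[:k], order[-k:]


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for death, 0 for censored."""
    event_times = sorted(
        {t for t, e in zip(times_a + times_b, events_a + events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)  # at risk in group a
        n_b = sum(1 for x in times_b if x >= t)  # at risk in group b
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 d.f.
    return chi2, p_value


# Toy data: survival times in months, every patient with an observed event.
low_group = [1, 2, 3]
high_group = [4, 5, 6]
chi2, p = logrank_test(low_group, [1, 1, 1], high_group, [1, 1, 1])
print(round(chi2, 3), round(p, 3))
```

The sketch assumes both groups still have members at risk at one or more event times, since the variance is otherwise zero and the statistic is undefined.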
The highest and lowest tertiles of samples, based on EYA4 expression (probeset 1561088_at), were analyzed using survival analysis functions in Matlab (Mathworks, Natick, MA). A log-rank (Mantel-Cox) test indicated that the curves were different and that low EYA4 expression is associated with poor survival. Similar analyses were undertaken for other genes mapping to the same chromosomal region, including TCF21, SYNE1, AKAP12, IL20RA, and ACAT2. Further analysis stratified by other clinical parameters was not possible due to sample size.

3.4.12. Function analysis

Functional analysis using Ingenuity Pathway Analysis (Ingenuity Systems, Redwood, CA) identified the biological functions and/or diseases that were most significant to the data set. Genes that were identified by the Significance Analysis of Microarrays (SAM) analysis of low and high EYA4 expression profiles and that were associated with biological functions and/or diseases in the Ingenuity Pathways Knowledge Base were considered for the analysis 21. Fisher’s exact test was used to calculate a p-value giving the probability that each biological function and/or disease assigned to that data set is due to chance alone.

3.4.13. AnnexinV/propidium iodide staining

Approximately 1 × 10^5 cells (HCC1954BL-pLKO, HCC1954BL-EYA4 knockdown) were washed in PBS, and resuspended in AnnexinV-binding buffer (10 mM HEPES; 140 mM NaCl; 2.5 mM CaCl2; pH 7.4), propidium iodide (50 ng/μL), and AnnexinV-conjugated antibody (1:20). Following a 15 min incubation, an additional 500 μL of AnnexinV-binding buffer was added and the cells were analyzed by flow cytometry. The apoptosis assay was carried out for three biological replicates.

3.4.14. Serial analysis of gene expression data analysis

SAGE profiles for 14 histologically normal exfoliated bronchial epithelia samples, 5 CIS lesions, and 6 locally advanced SqCCs (GSE7898) were assessed for EYA4 expression using the EYA4 SAGE tag with the highest average count number (TAATTTGTGT). On average, 10^5 SAGE tags, excluding linker and duplicate ditags, were sequenced per library. Therefore, for normalization and facilitation of comparisons, tag counts were scaled to 10010,000 tags for each library. Library generation and tag-mapping are detailed in Lonergan et al.34,44.

3.4.15. Analysis of familial lung cancer genotypes

Detailed information on study subjects was described previously45. Briefly, 194 familial lung cancer cases and 217 disease-free controls were recruited from the Genetic Epidemiology of Lung Cancer Consortium (GELCC). Each case patient with familial lung cancer was chosen from one high-risk lung cancer family with three or more members with lung cancer. All the case patients in this study had histologically confirmed non-small cell lung cancer. Non-cancer control subjects were obtained from a combination of unaffected spouses from GELCC families and unaffected individuals from the Coriell Institute for Medical Research (Camden, NJ) and the Fernald Medical Monitoring Program (Fernald, OH). These control subjects had no blood relationship with any selected case patients.

The statistical significance of the association between SNP allele and disease status was assessed primarily with the Cochran-Armitage trend test with 1 degree of freedom, implemented in PLINK software (http://pngu.mgh.harvard.edu/~purcell/plink/). Allelic odds ratios (ORs) associated with each SNP and 95% confidence intervals (CIs) were estimated.

3.4.16. Statistical analysis

Non-parametric statistical tests were used wherever possible. The Wilcoxon signed-rank test was used for paired tests and the Mann-Whitney U test was used for unpaired comparisons.
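For small unpaired groups, the Mann-Whitney U test can be evaluated exactly by enumerating every way of relabelling the pooled values. The Python sketch below does this for a one-sided comparison. It is a generic illustration with hypothetical function names and made-up values, and is not the Matlab or Prism routine used for the analyses reported here.

```python
# Illustrative sketch only: exact one-sided Mann-Whitney U test by enumeration.
from itertools import combinations


def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a > b, with ties counted as one half."""
    return sum(
        1.0 if a > b else 0.5 if a == b else 0.0
        for a in sample_a
        for b in sample_b
    )


def exact_one_sided_p(sample_a, sample_b):
    """P(U >= observed) over every relabelling of the pooled values."""
    pooled = list(sample_a) + list(sample_b)
    observed = mann_whitney_u(sample_a, sample_b)
    indices = range(len(pooled))
    total = extreme = 0
    for chosen in combinations(indices, len(sample_a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        if mann_whitney_u(group_a, group_b) >= observed:
            extreme += 1
    return extreme / total


# Made-up percentages of apoptotic cells, three replicates per group.
control = [30.1, 28.4, 33.0]
knockdown = [12.2, 15.8, 10.5]
print(exact_one_sided_p(control, knockdown))  # prints 0.05
```

With three replicates per group there are only 20 possible relabellings, so 1/20 = 0.05 is the smallest one-sided p-value such a comparison can produce, and it is reached only when the two groups do not overlap.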
For correlation analysis, the Spearman rank correlation coefficient was calculated. Statistics were calculated using Matlab (Mathworks, Natick, MA) and Graphpad Prism (Graphpad, La Jolla, CA).

3.4.17. Microarray data deposition

Data for the array CGH (30 AC samples) and DNA methylation (30 matched AC and non-malignant lung samples) are deposited in GEO under accession GSE19034.

Figure 3.1 - Frequent disruption and down-regulation of EYA4
(a) Frequency of two-hit alterations on chromosome 6. To identify new and important TSGs in lung cancer, we obtained synchronized gene dosage and DNA methylation status data for >23,000 CpG dinucleotides associated with >12,000 genes for 30 pairs of lung AC and patient-matched non-malignant lung tissues (n=30 pairs). Loci that were hypermethylated and lost in the same sample were deemed "two-hit". A two-hit frequency threshold of 0.15 (the horizontal dashed red line) was used to select genes for further gene expression analysis. The previously identified lung cancer susceptibility locus at 6q23-25 is denoted by the black dashed vertical lines. The locations of four gene-associated probes altered in greater than 15% of samples (NR2E1, GPR6, EYA4 and LPA) are shown.
(b) Hypermethylation of EYA4 in one tumor/non-malignant pair. Beta values were calculated using BeadStudio software (Illumina) and represent the fraction formed by the division of the methylated signal over the total signal: β = methylated signal/(methylated signal + unmethylated signal). EYA4 methylation levels for non-malignant lung are shown in grey, and methylation levels at the same CpG locus in patient-matched AC tumor are shown in white. This tumor sample is significantly (p=0.0078; Wilcoxon signed-rank) hypermethylated at the 8 EYA4 probes represented on the array compared to the matched non-malignant lung. The probe cg01805282 exhibited the highest two-hit frequency of any EYA4 probe (20%).
(c) EYA4 is under-expressed in AC tumor samples.
In addition to DNA hypermethylation and allelic loss, mRNA levels of a TSG should be depleted in tumor tissues relative to non-malignant samples. EYA4 expression was assessed in 34 AC tumors and 34 patient-matched non-malignant lung specimens by comparing the expression of EYA4 in the sample analyzed (non-malignant lung or tumor) to that of a normal reference in a two-channel experiment. Each dot represents EYA4 expression in a different tumor or non-malignant sample. The position on the Y-axis is determined by the log2 ratio of the sample (tumor or non-malignant) signal to the reference signal (universal lung reference). The AC group is significantly (p<1×10^-6; Wilcoxon signed-rank) under-expressed relative to the non-malignant tissue.

Figure 3.2 - EYA4 hypermethylation controls gene expression
(a) Bisulfite sequencing identifies DNA methylation loci controlling EYA4 expression. Sequencing of the EYA4 promoter CpG island after sodium bisulfite conversion was performed in immortalized human bronchial epithelial cultures (HBEC) that express EYA4, and in NSCLC lines lacking expression. Analysis of the DNA methylation levels at sequenced CpG sites identified a region immediately 5’ of the transcription start site that correlated with expression. Unmethylated (black), infrequently methylated (blue), frequently methylated (pink) and fully methylated (red) CpG sites are illustrated with vertical lines. The region proximal to the transcription start site, shown as the green dashed line, is the location chosen for the qMS-PCR assay described below.
(b) Validation of EYA4 hypermethylation in clinical specimens. In order to validate hypermethylation of EYA4 as seen in the microarray studies, we used a highly sensitive locus-specific quantitative real-time methylation-specific PCR assay. Using this assay, we examined the methylation levels of EYA4 and MYOD1 (un-methylated control gene) in 46 AC tumors and 46 patient-matched non-malignant lung tissues.
The methylation status of EYA4 is shown by the position on the Y-axis, which corresponds to the ratio of methylated EYA4 signal to MYOD1 signal. Each dot represents one sample analyzed. The AC samples analyzed were significantly hypermethylated compared to the parenchyma samples (p<0.0001; Wilcoxon signed-rank test). The dotted line in this diagram represents an EYA4/MYOD1 ratio threshold of 5, which was used to define hypermethylated samples. This locus-specific approach confirmed that EYA4 is frequently hypermethylated in AC tumors.
(c) Validation of EYA4 underexpression by qRT-PCR. EYA4 mRNA levels were measured by quantitative reverse-transcriptase PCR in 40 lung AC samples and 40 patient-matched non-malignant lung tissue samples. Performing each reaction in triplicate, and using primers specific for 18s RNA as an endogenous control, we find that EYA4 expression is reduced to less than half of normal levels (indicated by the broken line) in over 70% of samples. This further validates the multiple microarray cohorts we have analyzed, suggesting that down-regulation of EYA4 is a common occurrence in lung AC.
(d) Correlation of EYA4 mRNA levels with EYA4 promoter methylation. DNA methylation levels were obtained for 38 lung AC cell line samples using the Illumina HumanMethylation27 array. EYA4 mRNA levels were obtained for the same cell line samples using the Illumina WG-6 gene expression array. Gene expression values calculated from the average of the three EYA4 expression probes were compared with DNA methylation levels at the 8 loci assayed on the chip using a Spearman correlation. Spearman correlation coefficients ranged from -0.59 to -0.81. In this image, the correlation for probe cg24176563 (ρ=-0.81) is shown, exhibiting the tight relationship between EYA4 methylation and expression levels.
(e) 5'-azacytidine restores EYA4 expression in methylated cells.
To further validate the role of promoter DNA hypermethylation in controlling EYA4 mRNA expression, NCI-H1395 and HCC2935 (two lung AC lines) were treated with 10 µM 5'-azacytidine and EYA4 expression was assessed by qRT-PCR. NCI-H1395 has a hypermethylated CpG island upstream of EYA4, and HCC2935 does not. After treatment with 5'-azacytidine, expression of EYA4 was vastly increased in NCI-H1395 compared to HCC2935, when using 18s as an endogenous control, demonstrating the role of DNA hypermethylation in control of EYA4 expression.

Figure 3.3 - EYA4 expression is associated with poor survival and familial lung cancer risk
(a) Survival analysis of genes within the 6q23-25 locus with previously identified genetic or epigenetic alterations. Chromosome 6 is shown, along with an expanded diagram of the 6q23-25 susceptibility locus, with basepair positions located to the left of the expanded ideogram. Genes that have previously been shown to be altered in lung cancer cell lines are shown, and those that are altered in clinical lung tumors have been included in the survival analysis. EYA4 is more significantly associated with prognosis than other previously identified genes, highlighting its importance and potential clinical utility.
(b) EYA4 mRNA expression and survival in AC patients. EYA4 mRNA expression was assessed in an external dataset (GSE3141) and samples were divided into tertiles. Survival information for the highest and lowest tertiles (for EYA4 expression) was compared using the Mantel-Cox log-rank test. In this Kaplan-Meier plot, the dashed line represents the low-EYA4 expressing patients, and the solid line represents the high-EYA4 expressing patients. Low EYA4 expression is significantly associated with poorer prognosis (p=0.007).
(c) Location of SNPs significantly associated with familial lung cancer risk.
Genotype data for 194 6q-linked familial lung cancers and 217 unrelated non-cancer controls were compared to determine whether EYA4 allelotypes were associated with risk. Five SNPs displayed here were significantly (P<0.05) associated. Their location within and around the genomic sequence of EYA4 is shown. The purple vertical bars represent exons of the gene, which is transcribed from top to bottom starting at the black arrow. The gene is located within the chromosomal band 6q23.2.
(d) SNPs significantly associated with familial lung cancer risk. The dbSNP ID is shown in the leftmost column, and its genomic position is also shown. The allele associated with an increase in risk is described as the risk allele.

Figure 3.4 - EYA4 promotes apoptosis
Assessment of apoptosis in EYA4 targeted knockdown cells.
(a) Evaluation of apoptosis levels after cell stress. Following 24 h of serum starvation, approximately 100,000 EYA4 knockdown (EYA4kd) cells and the same number of control cells transduced with vector alone were stained for Annexin V and propidium iodide to detect cells undergoing apoptosis. Stained cells were then analyzed by flow cytometry to detect apoptotic cells (n=3). Significantly more apoptotic cells exist in the control cells (grey) than in the EYA4 knockdown cells (white), indicating EYA4 is involved in the apoptotic program (p=0.05; Mann-Whitney U test).
(b) FACS analysis of control cells. In this representative FACS plot, intensity of Annexin V binding is shown on the X-axis and intensity of PI staining is displayed on the Y-axis. HCC1954BL cells transduced with an empty pLKO vector were starved of serum for 24 h. After serum starvation, FACS analysis of cells was performed to count cells undergoing apoptosis. FACS analysis of these control cells showed more apoptotic cells in the upper right quadrant than the EYA4kd cells (shown in d).
(c) GADD45a expression levels in serum-starved EYA4kd and control cells.
GADD45a is a gene known to be 1) activated under stressful growth conditions, 2) activated in the presence of DNA damage, and 3) downregulated in NSCLC46. GADD45a mRNA levels were assessed by qRT-PCR in control (grey) and EYA4kd (white) cells. The Y-axis shows the change in GADD45a expression following 24 h of serum starvation. GADD45a expression following serum starvation rises substantially in the control cells, but is significantly attenuated in the EYA4kd cells. As GADD45a is associated with DNA damage and stressful growth conditions, this mirrors our finding that EYA4kd attenuates the apoptotic response (n=3).
(d) FACS analysis of EYA4kd cells. In this representative plot, intensity of Annexin V binding is shown on the X-axis and intensity of PI staining on the Y-axis. HCC1954BL cells with reduced EYA4 expression via targeted knockdown were starved of serum for 24 h. After serum starvation, FACS analysis of cells was performed to count cells undergoing apoptosis. This analysis demonstrates the abrogation of the apoptotic program following serum starvation in lymphoblastoid cells with reduced expression of EYA4 when compared to control cells.

Figure 3.5 - DNA methylation and mRNA expression of EYA4 in SqCC and pre-invasive samples
EYA4 inactivation in SqCC tumors and pre-malignant lesions.
(a) mRNA expression levels of EYA4 in SqCC tumors and histologically normal bronchial epithelia. SqCC Affymetrix expression profiles (n=45) were downloaded from GEO accession number GSE3141. Bronchial epithelia from small airways were used as a normal comparator (n=67). Data for these two sets were RMA normalized together, and expression values were log2 transformed. SqCC tumors express significantly less EYA4 mRNA than normal tissues (p<0.0001; Mann-Whitney U test), showing consistent down-regulation of the gene in SqCC as well as AC tumors.
(b) DNA methylation levels of SqCC tumors and matched non-malignant lung tissue (n=8 pairs).
EYA4 promoter DNA methylation levels were obtained for the probes present on the Illumina GoldenGate platform, and probe methylation level was averaged for each sample. Each sample is plotted as a separate dot. Tumors show significantly more (p<0.02; Wilcoxon signed-rank test) methylation than non-tumor tissue, indicating that EYA4 is hypermethylated in SqCC tumors as well as AC tumors.
(c) Loss of EYA4 is an early event. DNA copy number of EYA4 was assessed by array CGH in carcinoma in situ lesions (n=20). Two clones span the EYA4 locus, and their names and genomic locations are given. After array CGH was performed and systematic biases were removed by normalization, computational segmentation was performed. Analysis of segmented values is shown here as green for clones exhibiting loss, grey for retained clones, and white for data excluded based on quality. This indicates that EYA4 is frequently (7/20; 35%) deleted in these very early lesions.
(d) EYA4 expression in pre-invasive lesions and early cancers was assessed using serial analysis of gene expression (SAGE) data. EYA4 expression, counted in tags per 10,000, is plotted for three groups representing 25 distinct biological samples. Fourteen normal bronchial epithelia samples, 5 CIS samples, and 6 invasive SqCC samples were profiled. In this figure each dot represents an individual sample and the horizontal line represents the mean tag count for that group. It is evident that EYA4 expression is reduced in the CIS group and remains reduced in the SqCC group. This again demonstrates that inactivation of EYA4 is a very early event in the progression of cancer.
(e) White light bronchoscopy (top) can be used to identify carcinoma in situ lesions; however, auto-fluorescence bronchoscopy identifies carcinoma in situ lesions with much more sensitivity. Lack of auto-fluorescence is used to help identify CIS lesions and to delineate the margin of disease.
After resection, CIS lesions are processed for DNA and RNA extraction, providing a unique opportunity to study these early specimens.
(f) EYA4 is hypermethylated in histologically normal bronchial epithelia. Hypermethylation of EYA4 was detected after bisulfite conversion using MS-PCR. Positive (NCI-H1395) and negative (HCC-2935) controls are shown, along with the bronchial epithelia from the small airways (<2 mm) of 10 patients with distant NSCLC tumors and 5 patients without cancer at the time of the sample collection. Identification of hypermethylation in these histologically normal cells provides compelling evidence that inactivation of EYA4 is a very early event. Hypermethylation of EYA4 may also be of use as a potential biomarker to screen for the presence of, or risk of, lung cancer.

Figure 3.6 - EYA4 expression in other malignancies
Expression of EYA4 mRNA in other tumor types.
(a) EYA4 expression in cancer cells from multiple tissues. Gene expression profiles were obtained for over 350 cancer cell lines of various origins: sarcoma, autonomic ganglia, brain, lung, gastrointestinal tract, pancreas, head and neck, and colorectal. All Affymetrix gene expression profiles were RMA normalized together, and probe levels were log2 normalized. EYA4 expression level is shown on the Y-axis. The whiskers encompass the middle 90% of the data points, and samples with expression levels lying outside the whiskers are plotted as dots. The median of the data points is represented by the bar in the middle of the box. EYA4 expression levels were assessed in these cell lines according to tissue of origin. We have identified tissue-specific expression of EYA4, noting that it is higher in sarcomas and neurally derived cancers than it is in the epithelial cancers.
(b) EYA4 expression in different lung cancer histologic subtypes. Gene expression profiles described above for lung cancer cell lines were further segregated into small-cell and non-small cell lung cancer subtypes.
EYA4 expression level is shown on the Y-axis. The whiskers encompass the middle 90% of the data points, and samples with expression levels lying outside the whiskers are plotted as dots. The median of the data points is represented by the bar in the middle of the box. EYA4 expression was then assessed in the group as a whole, and in each major histologic subtype. As shown, there is no observed difference between subtypes for EYA4 expression.

Table 3.1 - Frequently deleted and hypermethylated probes in lung adenocarcinoma

TargetID  Chr  MapInfo  Symbol  Product  CPG_ISLAND  TwoHitFreq
cg03958979  6  108593080  NR2E1  nuclear receptor subfamily 2; group E; member 1  TRUE  0.3
cg08369065  8  11600092  GATA4  GATA binding protein 4  TRUE  0.3
cg17568996  22  41158069  NFAM1  NFAT activation molecule 1 precursor  FALSE  0.3
cg20723355  17  6620321  FBXO39  F-box protein 39  TRUE  0.26666667
cg23290344  8  24827371  NEF3  neurofilament 3 (150kDa medium)  TRUE  0.26666667
cg02613386  17  6620257  FBXO39  F-box protein 39  TRUE  0.23333333
cg22471346  17  10042198  GAS7  growth arrest-specific 7 isoform a  TRUE  0.23333333
cg25465406  17  7846823  GUCY2D  guanylate cyclase 2D; membrane (retina-specific)  TRUE  0.23333333
cg01805282  6  133605457  EYA4  eyes absent 4 isoform c  TRUE  0.2
cg07260592  6  161020112  LPA  lipoprotein; Lp(a)  TRUE  0.2
cg07634191  8  27906097  SCARA5  hypothetical protein LOC286133  TRUE  0.2
cg08118311  18  74841389  SALL3  sal-like 3  TRUE  0.2
cg09945801  8  31009681  WRN  Werner syndrome protein  TRUE  0.2
cg11220060  19  12858418  KLF1  Kruppel-like factor 1 (erythroid)  FALSE  0.2
cg14900471  8  11599029  GATA4  GATA binding protein 4  TRUE  0.2
cg17963840  8  26778778  ADRA1A  alpha-1A-adrenergic receptor isoform 3  TRUE  0.2
cg19537511  17  8154538  ARHGEF15  Rho guanine exchange factor 15  FALSE  0.2
cg19697981  6  108594513  NR2E1  nuclear receptor subfamily 2; group E; member 1  TRUE  0.2
cg26252167  6  110407266  GPR6  G protein-coupled receptor 6  TRUE  0.2
cg00090147  8  11599776  GATA4  GATA binding protein 4  TRUE  0.16666667
cg04662594  8  21972961  EPB49  erythrocyte membrane protein band 4.9 (dematin)  FALSE  0.16666667
cg07017374  13  27572451  FLT3  fms-related tyrosine kinase 3  TRUE  0.16666667
cg09626984  8  11603505  GATA4  GATA binding protein 4  TRUE  0.16666667
cg10709021  8  31009650  WRN  Werner syndrome protein  TRUE  0.16666667
cg10762615  17  18588052  FBXW10  F-box and WD-40 domain protein 10  FALSE  0.16666667
cg12111714  13  24941472  ATP8A2  ATPase; aminophospholipid transporter-like; Class I; type 8A; member 2  TRUE  0.16666667
cg13098960  17  7434875  SOX15  SRY-box 15  FALSE  0.16666667
cg13315147  10  135191518  CYP2E1  cytochrome P450; family 2; subfamily E; polypeptide 1  TRUE  0.16666667
cg13323752  12  7916834  SLC2A14  glucose transporter 14  TRUE  0.16666667
cg16377880  19  15612167  CYP4F3  cytochrome P450; family 4; subfamily F; polypeptide 3  FALSE  0.16666667
cg16832407  9  72926359  TRPM3  transient receptor potential cation channel; subfamily M; member 3 isoform a  FALSE  0.16666667
cg17826679  19  10597038  SLC44A2  CTL2 protein  TRUE  0.16666667
cg20366906  13  52320382  PCDH8  protocadherin 8 isoform 2 precursor  TRUE  0.16666667
cg20792062  12  5023549  KCNA5  potassium voltage-gated channel; shaker-related subfamily; member 5  TRUE  0.16666667
cg21073927  8  11599564  GATA4  GATA binding protein 4  TRUE  0.16666667
cg21461100  19  51169178  NOVA2  neuro-oncological ventral antigen 2  TRUE  0.16666667
cg21663431  19  10597355  SLC44A2  CTL2 protein  TRUE  0.16666667
cg22377389  13  19703679  GJB6  gap junction protein; beta 6 (connexin 30)  TRUE  0.16666667
cg25766774  3  44993913  ZDHHC3  DHHC1 protein  FALSE  0.16666667
cg26045434  8  22043806  HR  hairless protein isoform a  TRUE  0.16666667
cg26162582  12  4788652  KCNA6  potassium voltage-gated channel; shaker-related subfamily; member 6  TRUE  0.16666667
cg26609631  13  27264814  GSH1  homeobox protein Gsh1  TRUE  0.16666667
cg26771272  17  18103908  SMCR7  Smith-Magenis syndrome chromosome region; candidate 7  TRUE  0.16666667
cg27329371  17  19592532  ALDH3A1  aldehyde dehydrogenase 3 family; member A1  FALSE  0.16666667
cg27553955  2  42573830  KCNG3  potassium voltage-gated channel; subfamily G; member 3 isoform 1  TRUE  0.16666667

Table 3.2 - Genes with expression patterns in concordance with EYA4 in non-malignant bronchial epithelial cells

Gene ID  Affymetrix Probe Name  Score(d)  Numerator(r)  Denominator(s+s0)  Fold Change  q-value(%)
CKMT1A  235452_at  5.92572  0.559732  0.094458  1.473996  0
GADD45A  203725_at  4.630924  0.538596  0.116304  1.452558  0
NPAS3  233865_at  4.466375  1.357999  0.30405  2.563295  0
CEP135  206003_at  4.434605  0.833131  0.18787  1.781547  0
MAP2K5  235601_at  4.370369  0.813179  0.186066  1.757078  0
LOC255130  236764_at  4.354703  1.093991  0.251221  2.134638  0
NEK9  231316_at  4.300176  0.708888  0.164851  1.634544  0
SUV420H1  218242_s_at  4.25631  0.581956  0.136728  1.496877  0
CENTG2  204066_s_at  4.07371  0.547322  0.134355  1.46137  3.643493
HSPD1  200807_s_at  4.071336  0.419348  0.103  1.337323  3.643493
ZFP64  218968_s_at  4.052255  0.50861  0.125513  1.422679  3.643493
DDAH1  237290_at  4.020055  0.923623  0.229754  1.896873  3.643493
NRXN3  236750_at  4.000652  0.934733  0.233645  1.911536  3.643493
JMJD2C  239285_at  3.997896  0.954508  0.238752  1.937918  3.643493
DPY19L1  212792_at  3.963598  0.426847  0.107692  1.344293  3.643493
MAP4K4  218181_s_at  3.923466  0.689141  0.175646  1.612324  3.643493
DENR  231896_s_at  3.865146  0.558199  0.144418  1.472429  3.643493
FOXO1A  232882_at  3.85994  0.762256  0.197479  1.69614  3.643493
LRP2BP  207797_s_at  3.852534  0.445035  0.115518  1.361347  3.643493
FAM13A1  217047_s_at  3.839966  0.531933  0.138526  1.445866  3.643493
KBTBD10  219106_s_at  3.783322  1.158861  0.306308  2.23281  3.643493
WHSC1L1  242968_at  3.76722  1.056313  0.280396  2.079611  3.643493
HNRPA2B1  225932_s_at  3.760506  0.661178  0.175821  1.581373  3.643493
SEC14L1  202084_s_at  3.731999  0.400408  0.10729  1.319881  3.643493
SIP1  211114_x_at  3.729181  0.583254  0.156403  1.498225  3.643493
C11orf30  219012_s_at  3.711454  0.777598  0.209513  1.714275  3.643493
PLEKHA8  227247_at  3.710058  0.471515  0.127091  1.386565  3.643493
SOX9  202936_s_at  3.708915  1.114911  0.300603  2.165817  3.643493

3.5. References

1. Jemal, A., et al. Cancer statistics, 2009. CA Cancer J Clin (2009).
2. Horner, M.J., et al. SEER Cancer Statistics Review, 1975-2006. Vol. 2009 (National Cancer Institute, Bethesda, MD, 2009).
3. Amos, C.I., et al. Genome-wide association scan of tag SNPs identifies a susceptibility locus for lung cancer at 15q25.1. Nat Genet 40, 616-622 (2008).
4. Hung, R.J., et al. A susceptibility locus for lung cancer maps to nicotinic acetylcholine receptor subunit genes on 15q25. Nature 452, 633-637 (2008).
5. Liu, Y., et al. Haplotype and cell proliferation analyses of candidate lung cancer susceptibility genes on chromosome 15q24-25.1. Cancer Res 69, 7844-7850 (2009).
6. Thorgeirsson, T.E., et al. A variant associated with nicotine dependence, lung cancer and peripheral arterial disease. Nature 452, 638-642 (2008).
7. Landi, M.T., et al. A genome-wide association study of lung cancer identifies a region of chromosome 5p15 associated with risk for adenocarcinoma. Am J Hum Genet 85, 679-691 (2009).
8. You, M., et al. Fine mapping of chromosome 6q23-25 region in familial lung cancer families reveals RGS17 as a likely candidate gene. Clin Cancer Res 15, 2666-2674 (2009).
9. Bailey-Wilson, J.E., et al. A major lung cancer susceptibility locus maps to chromosome 6q23-25. Am J Hum Genet 75, 460-474 (2004).
10. Tessema, M., et al.
Promoter methylation of genes in and around the candidate lung cancer susceptibility locus 6q23-25. Cancer Res 68, 1707-1714 (2008).
11. Girard, L., Zochbauer-Muller, S., Virmani, A.K., Gazdar, A.F. & Minna, J.D. Genome-wide allelotyping of lung cancer identifies new regions of allelic loss, differences between small cell lung cancer and non-small cell lung cancer, and loci clustering. Cancer Res 60, 4894-4906 (2000).
12. Goeze, A., et al. Chromosomal imbalances of primary and metastatic lung adenocarcinomas. J Pathol 196, 8-16 (2002).
13. Belinsky, S.A., et al. Aberrant promoter methylation in bronchial epithelium and sputum from current and former smokers. Cancer Res 62, 2370-2377 (2002).
14. Bild, A.H., et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439, 353-357 (2006).
15. Landi, M.T., et al. Gene expression signature of cigarette smoking and its role in lung adenocarcinoma development and survival. PLoS One 3, e1651 (2008).
16. Osborn, N.K., et al. Aberrant methylation of the eyes absent 4 gene in ulcerative colitis-associated dysplasia. Clin Gastroenterol Hepatol 4, 212-218 (2006).
17. Zou, H., et al. Frequent methylation of eyes absent 4 gene in Barrett's esophagus and esophageal adenocarcinoma. Cancer Epidemiol Biomarkers Prev 14, 830-834 (2005).
18. Clark, S.W., Fee, B.E. & Cleveland, J.L. Misexpression of the eyes absent family triggers the apoptotic program. J Biol Chem 277, 3560-3567 (2002).
19. Ohto, H., et al. Cooperation of six and eya in activation of their target genes through nuclear translocation of Eya. Mol Cell Biol 19, 6815-6824 (1999).
20. Li, X., et al. Eya protein phosphatase activity regulates Six1-Dach-Eya transcriptional effects in mammalian organogenesis. Nature 426, 247-254 (2003).
21. Tusher, V.G., Tibshirani, R. & Chu, G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98, 5116-5121 (2001).
22. Zhang, X., et al. Loss of expression of GADD45 gamma, a growth inhibitory gene, in human pituitary adenomas: implications for tumorigenesis. J Clin Endocrinol Metab 87, 1262-1267 (2002).
23. Miller, S.J., et al. Inhibition of Eyes Absent Homolog 4 expression induces malignant peripheral nerve sheath tumor necrosis. Oncogene (2009).
24. Toyooka, S., et al. Smoke exposure, histologic type and geography-related differences in the methylation profiles of non-small cell lung cancer. Int J Cancer 103, 153-160 (2003).
25. Tokuhata, G.K. & Lilienfeld, A.M. Familial aggregation of lung cancer in humans. J Natl Cancer Inst 30, 289-312 (1963).
26. Sellers, T.A., et al. Evidence for mendelian inheritance in the pathogenesis of lung cancer. J Natl Cancer Inst 82, 1272-1279 (1990).
27. Sellers, T.A., et al. Lung cancer detection and prevention: evidence for an interaction between smoking and genetic predisposition. Cancer Res 52, 2694s-2697s (1992).
28. Abdelhak, S., et al. A human homologue of the Drosophila eyes absent gene underlies branchio-oto-renal (BOR) syndrome and identifies a novel gene family. Nat Genet 15, 157-164 (1997).
29. Abdelhak, S., et al. Clustering of mutations responsible for branchio-oto-renal (BOR) syndrome in the eyes absent homologous region (eyaHR) of EYA1. Hum Mol Genet 6, 2247-2255 (1997).
30. Wayne, S., et al. Mutations in the transcriptional activator EYA4 cause late-onset deafness at the DFNA10 locus. Hum Mol Genet 10, 195-200 (2001).
31. Pfister, M., et al. A 4-bp insertion in the eya-homologous region (eyaHR) of EYA4 causes hearing impairment in a Hungarian family linked to DFNA10. Mol Med 8, 607-611 (2002).
32. Schonberger, J., et al. Mutation in the transcriptional coactivator EYA4 causes dilated cardiomyopathy and sensorineural hearing loss. Nat Genet 37, 418-422 (2005).
33. Okabe, Y., Sano, T. & Nagata, S. Regulation of the innate immune response by threonine-phosphatase of Eyes absent. Nature 460, 520-524 (2009).
34. Lonergan, K.M., et al. Identification of novel lung genes in bronchial epithelium by serial analysis of gene expression. Am J Respir Cell Mol Biol 35, 651-661 (2006).
35. Chari, R., et al. SIGMA2: a system for the integrative genomic multi-dimensional analysis of cancer genomes, epigenomes, and transcriptomes. BMC Bioinformatics 9, 422 (2008).
36. Ishkanian, A.S., et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 36, 299-303 (2004).
37. Jong, K., Marchiori, E., Meijer, G., Vaart, A.V. & Ylstra, B. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 20, 3636-3637 (2004).
38. Barrett, T., et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic Acids Res 37, D885-890 (2009).
39. Irizarry, R.A., et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264 (2003).
40. Gentleman, R.C., et al. Bioconductor: open software development for computational biology and bioinformatics. Genome Biol 5, R80 (2004).
41. Schmittgen, T.D. & Livak, K.J. Analyzing real-time PCR data by the comparative C(T) method. Nat Protoc 3, 1101-1108 (2008).
42. Shivapurkar, N., et al. Application of a methylation gene panel by quantitative PCR for lung cancers. Cancer Lett 247, 56-71 (2007).
43. Shivapurkar, N., et al. Novel real-time PCR assay using a universal molecular marker for diagnosis of hematologic cancers. Int J Cancer 116, 656-660 (2005).
44. Lonergan, K.M., et al. Transcriptome Profiles of Carcinoma-in-Situ and Invasive Non-Small Cell Lung Cancer as Revealed by SAGE. PLoS One (2010).
45. Liu, P., et al. Familial aggregation of common sequence variants on 15q24-25.1 in lung cancer. J Natl Cancer Inst 100, 1326-1330 (2008).
46. Higashi, H., et al. Down-regulation of Gadd45 expression is associated with tumor differentiation in non-small cell lung cancer. Anticancer Res 26, 2143-2147 (2006).

4.
Genetics and epigenetics contribute to AC and SqCC tumor phenotypes 3

3  A version of this chapter is being prepared for submission as a research manuscript. Lockwood WL and Wilson IM, Chari R, Coe BP, Yee J, English J, Murray N, Tsao MS, Minna JD, Gazdar AF, MacAulay CA, Lam S, Lam WL (2010) "Genetics and epigenetics contribute to AC and SqCC tumor phenotypes."

4.1.  Introduction

Lung cancer is the leading cause of cancer-related deaths worldwide and, despite current treatments, prognosis remains poor, with a five-year survival of <15%1,2. Squamous cell carcinoma (SqCC) and adenocarcinoma (AC) are the predominant lung cancer subtypes and are traditionally regarded as a single disease entity in terms of systemic therapy2,3. However, these subtypes display distinct phenotypic characteristics, probably related to differences in cell derivation, genetic alterations and the pathogenetic pathways involved in their development4. These fundamental discrepancies in tumor biology may play a primary role in determining the poor outcomes of lung cancer patients, as biological differences that segregate with each subtype may also lead to variations in response to therapies5-7. Thus, distinguishing the key molecular mechanisms driving the development of each lung cancer subtype is needed to define more appropriate avenues for therapeutic intervention.

The specific genes and cellular pathways responsible for the different phenotypes of SqCC and AC remain largely unknown. Initial gene expression profiling studies have yielded some insight into the tumor subtypes and are able to segregate tumors into histologic groupings based on multi-gene models8,9. However, since not all gene expression changes are causal to disease development, it is challenging to distinguish critical events from reactive changes through global gene expression profiles alone10.
Gene expression changes that correspond with alterations at the DNA level, through either gene dosage or DNA methylation changes, are often regarded as evidence of causality. Such DNA-level changes have previously been demonstrated to be critical deregulation events driving progression and other cancer phenotypes11-13. Hence, examining genetic and epigenetic events in conjunction with gene expression aberrations should improve the identification of alterations causing lung cancer phenotypes.

Although genetic differences between SqCC and AC have been described, most of these studies have relied on low-resolution techniques and/or small sample sizes, limiting the ability to identify specific disruptions unique to each subtype14-17. Similarly, studies comparing epigenetic profiles of AC and SqCC tumors have been limited to the analysis of only a few genes at a time18,19. Recent advances in microarray technologies have increased our ability to understand the genomic mechanisms influencing tumorigenesis20. We performed a large-scale integrative analysis of 261 primary NSCLC tumors (169 AC and 92 SqCC), integrating DNA copy number and gene expression profiles to identify critical subtype-specific alterations. Additionally, we generated DNA methylation profiles for a subset of the 261 tumors analyzed by copy number, together with non-malignant reference tissues. In total, 43 NSCLC tumors (30 AC and 13 SqCC) and 48 non-malignant specimens (30 parenchyma and 18 bronchial epithelia) were profiled for DNA methylation. We directly compared subtype alteration patterns in order to identify the critical DNA-level alterations associated with either lung SqCC or lung AC, with any detected differences likely serving as the basis for differential behaviors in tumor biology and clinical outcomes.

4.2.  Results

4.2.1.  Assessment of global genomic instability in AC and SqCC

Carcinomas of all types are known to harbor many DNA-level alterations.
Exposure to different carcinogens, or to different levels of carcinogens, has been partly linked to these genomic disruptions21. Indeed, tobacco smoke has been linked to the induction of not only DNA mutations, but also broad chromosomal instability22. Based on the differing exposure to tobacco carcinogens of cells in the central airways (where SqCC tumors typically arise) and cells in the peripheral airways (where AC tumors typically arise), we sought first to determine whether global genomic instability was more prevalent in either of the two subtypes. To do so, we generated and compared whole-genome copy number profiles for 261 NSCLC tumors (169 AC and 92 SqCC) by tiling-resolution array comparative genomic hybridization (CGH)23-26. After hybridization, the removal of systematic biases, and computational segmentation to identify regions of gain and loss, the number of gained, lost, and neutral clones was assessed for each profiled tumor. The relative genomic instability observed in the AC and SqCC groups was then compared (Figure 4.1a). Comparison of the average number of gained and lost clones between groups was performed using the Mann-Whitney U-test. No significant differences between the two subtypes were found, a finding that is consistent with previous work that did not identify significant differences in nuclear DNA content across subtypes27. This analysis demonstrates that, overall, neither subtype has a particular proclivity for either gain or loss of DNA. As such, any observed differences in alteration frequency at a given locus can be more easily attributed to subtype-specific selection of alterations based on genomic location and the genes included within.

4.2.2.
Identification of differential copy-number alteration patterns in AC and SqCC

Although no differences between subtypes exist in gain and loss numbers overall, if specific genetic pathways are involved in the development of SqCC and AC, we would expect to find differences in the location and frequency of alterations. In order to determine if specific genetic alterations unique to each NSCLC subtype exist, we aimed to identify recurrent, non-random regions of aberration in each group. Individual samples were grouped by their corresponding subtype and probes were aggregated into regions based on similar copy number status. The resulting frequency of alteration across all autosomes was determined and compared between subtypes using Fisher's exact test to identify regions of copy number disparity. The resulting p-values were corrected for multiple comparisons, with a cut-off of ≤ 0.01 considered significant. In addition, we required regions to be altered in >20% of samples from a subtype group, with a difference between groups of >10%, in order for them to be considered "of interest". Figure 4.1b displays the frequency of both gain and loss across the entire genome for AC and SqCC and highlights the corresponding regions of difference that were identified.

In sum, this analysis revealed 295 regions of significant copy number disparity between SqCC and AC, supporting our hypothesis that the subtypes develop through dysregulation of different genetic pathways. Of these regions, 205 were SqCC-specific in their alteration pattern, whereas 89 were specific to AC. Although some of these regions overlapped between the two subtypes, the character of the alteration (i.e. gain versus loss) was specific to an individual group. Since these regions differed strongly in their alteration status between the subtypes, we referred to these as subtype-specific copy number alterations.
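As an illustration only, the region-level comparison described above can be sketched in Python. This is not the analysis code used for this chapter: the function names and the example counts are hypothetical, the "of interest" rule is read here as a frequency of >20% in at least one subtype together with an absolute difference of >10%, and the multiple-comparison correction (whose method is not stated here) is assumed to have been applied by the caller before the filter is used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x altered samples in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def is_region_of_interest(alt_a, n_a, alt_b, n_b, corrected_p,
                          alpha=0.01, min_freq=0.20, min_diff=0.10):
    """Apply the significance, frequency, and difference filters to one region.

    corrected_p is assumed to be already adjusted for multiple comparisons.
    """
    freq_a, freq_b = alt_a / n_a, alt_b / n_b
    return (corrected_p <= alpha
            and max(freq_a, freq_b) > min_freq
            and abs(freq_a - freq_b) > min_diff)


# Hypothetical region: gained in 46 of 92 SqCC and 17 of 169 AC tumors.
p_raw = fisher_exact_two_sided(46, 92 - 46, 17, 169 - 17)
print(p_raw, is_region_of_interest(46, 92, 17, 169, p_raw))
```

The group sizes in the example (92 SqCC, 169 AC) are those of the cohort; the altered counts are invented for demonstration.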
Altogether, these alterations covered approximately 550 Mbp of the genome, mapping to 34 of 39 autosomal chromosome arms, and ranging in size from large segments on chromosome arms (64.8 Mbp on 4q) to discrete peaks that were only kilobases in size (0.05 Mbp in multiple places).

4.2.3.  Subtype-specific gene disruption in AC and SqCC

The discovery of regions of DNA copy number disparity between lung tumor subtypes suggests that genes within these areas might be responsible for the differential development and pathological characteristics of SqCC and AC. To validate the changes and identify the specific target genes of these alterations, we integrated DNA copy number and gene expression results (since the presumed downstream consequence of a DNA dosage alteration is that it alters cell phenotypes by impacting gene expression levels)28. For this purpose, gene expression profiles were generated for a subset of tumors that were analyzed by array CGH (20 SqCC and 29 AC tumors). We hypothesized that genes targeted by subtype-specific alterations would exhibit significantly different DNA dosage and gene expression levels between AC and SqCC. More specifically, we assumed that subtype-specific genes would be altered in the same direction at both the DNA and RNA level (i.e. gained/amplified genes would be over-expressed, while deleted genes would be under-expressed). Further, we performed our analysis on normal and tumor tissue to ensure that only alterations detected in tumor tissues remained as candidates (i.e. to confirm that candidate genes are not only differentially regulated between subtypes, but also differentially expressed relative to normal tissues).

The specific genes located within each subtype-specific copy number alteration were determined and their expression levels compared between the SqCC and AC samples to determine those that were differentially expressed (p≤0.001, corrected for 4669 and 2050 multiple comparisons).
For SqCC, 4669 unique genes mapped to the subtype-specific copy number alterations, representing an average of ~23 genes per region. Of these, 1109 (24%) were differentially expressed. In AC, 2050 unique genes were located in PSCNAs (~23 per region) and 225 (11%) were differentially expressed between subtypes. Subtype-specific genes were then filtered for those that matched the expression direction predicted from copy number status (based on the rationale described above). Using these strict criteria, 797 genes (17% of total) were maintained as candidates deregulated by subtype-specific copy number alteration in SqCC, while 171 (8% of total) were identified for AC. Although some genes overlapped, their disruption statuses were specific to the individual cancer type and are therefore referred to as subtype-specific targets. When combined, the SqCC and AC PSCNA-regulated candidates represented 968 unique genes and showed a clear distinction in expression levels between the two subtypes.

In addition to demonstrating a linkage between expression and copy number alteration, a candidate subtype-specific gene (regulated by copy number) was also required to be deregulated in cancer tissues relative to normal tissue29. We analyzed the expression levels of candidates in an independent panel of 53 SqCC and 58 AC lung tumors and 67 samples of exfoliated bronchial cells from cancer-free individuals. In total, 655 of the 797 SqCC-specific and 143 of the 171 AC-specific genes had corresponding probes on this array platform. These genes were compared between the respective cancer subtype and the normal bronchial cells in order to determine those that were significantly differentially expressed (p≤0.001) in the direction predicted by the corresponding subtype-specific copy number alteration in which they were located. This analysis revealed that 447 (68%) of the SqCC-specific and 71 (49%) of the AC-specific genes were deregulated in cancerous tissues.
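The three filtering criteria applied in this section (differential expression between subtypes, agreement between expression direction and copy number status, and deregulation relative to normal cells) can be summarized in a short Python sketch. It is a simplified illustration under stated assumptions: the record layout, field names, gene labels and numeric values are all hypothetical, and the p-values are assumed to have been computed and corrected upstream.

```python
from dataclasses import dataclass


@dataclass
class GeneEvidence:
    symbol: str
    copy_number: str         # "gain" or "loss" in the subtype-specific region
    log2fc_vs_other: float   # expression in this subtype vs. the other subtype
    p_vs_other: float        # corrected p-value for that comparison
    log2fc_vs_normal: float  # expression in this subtype vs. normal cells
    p_vs_normal: float       # p-value for that comparison


def matches_direction(copy_number, log2fc):
    """Gained genes should be over-expressed, lost genes under-expressed."""
    return log2fc > 0 if copy_number == "gain" else log2fc < 0


def is_subtype_specific_target(g, alpha=0.001):
    """True only if the gene passes all three criteria."""
    return (g.p_vs_other <= alpha
            and matches_direction(g.copy_number, g.log2fc_vs_other)
            and g.p_vs_normal <= alpha
            and matches_direction(g.copy_number, g.log2fc_vs_normal))


# Hypothetical candidates; labels and values are invented for demonstration.
candidates = [
    GeneEvidence("GENE_A", "gain", 1.2, 1e-5, 0.9, 1e-4),   # passes all criteria
    GeneEvidence("GENE_B", "gain", -0.8, 1e-5, 0.7, 1e-4),  # wrong direction
    GeneEvidence("GENE_C", "loss", -1.1, 1e-6, -0.2, 0.2),  # not altered vs. normal
]
targets = [g.symbol for g in candidates if is_subtype_specific_target(g)]
print(targets)
```

Keeping the direction check as a separate function makes it explicit that the same rule is applied to both the subtype-versus-subtype and the tumor-versus-normal comparisons.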
(In total, 492 unique gene alterations were uncovered between subtypes.) Since these genes met all three criteria of subtype-specific copy number alteration regulated targets described above, we determined that they are causal gene alterations driving the development of each subtype (Figure 4.2).

4.2.4.  Different gene networks are associated with the development of AC and SqCC

Cellular pathways and processes specifically disrupted in individual subtypes may reveal key oncogenic mechanisms driving the differential development of AC and SqCC. Thus, after identifying and validating the genes responsible for the differences between the subtypes, we next wanted to investigate their biological functions. To discover subtype-related networks of biologically related genes, we performed Ingenuity Pathway Analysis (IPA) of the 71 AC and 447 SqCC specific target genes. This analysis revealed two main gene networks with overlapping biological functions for each individual subtype (Figure 4.3). SqCCs exhibited disruptions in gene networks that function in regulating DNA replication, recombination and repair, with additional roles in lymphoid tissue structure and development. Genes involved in SqCC network 1 were associated with the binding and modification of histone protein H4, as well as the regulation of the NFKB complex (Figure 4.3b). SqCC network 2 involved genes that control development of the nervous system and endocrine system. Meanwhile, the primary networks in AC displayed functions associated with cell-to-cell signaling, development, and drug metabolism. The main AC-specific gene network was composed primarily of genes regulated by the transcription factor HNF4α (Figure 4.3a), whereas AC network 2 contained numerous genes controlled by TGFβ and TP53. The independent nature of the gene networks implicated in AC and SqCC was further suggestive of distinct methods of tumorigenesis for the subtypes.

4.2.5.
Global subtype differences in DNA methylation levels

Unlike the genome, which is identical for most normal cells in the body, the epigenome is known to differ between tissue types30,31. Additionally, the genomes of cancerous cells are known to exhibit global hypomethylation to varying degrees depending on the tissue of origin (and sometimes within the tissue of origin)32. DNA methylation profiles are also influenced by mutational profiles within different cancer types, as DNA hyper- and hypomethylation alterations are known to be related to tissue and genetic background33 as well as smoking behavior34. Given the differing mutational spectra of the two NSCLC subtypes and their likely differing cells of origin, we investigated the overall DNA methylation level of 30 AC and 13 SqCC samples using the Illumina Infinium HumanMethylation 27 chip, which samples the DNA methylation level of over 27,000 CpG loci simultaneously. To enable future comparisons of SqCC and AC tumors to appropriate matched normal cells, DNA methylation profiles were also generated for 30 non-malignant lung tissue samples (as reference for AC) and 18 histologically normal exfoliated bronchial epithelial cell samples (as reference for SqCC) from patients with NSCLC.

Analysis of CpG dinucleotides located within CpG islands indicated that DNA methylation in the bronchial epithelia and the SqCC tumors was slightly lower than in the normal lung or AC tumors (Figure 4.4a). This finding is mirrored and exaggerated in the CpG dinucleotides not located within CpG islands, suggesting that the bronchial epithelia from the central airway are globally hypomethylated relative to the cells of the peripheral airways, whether cancerous or not. In this case, the averages of the two groups are significantly different when compared using a Mann-Whitney U test (p≤0.0001). To determine whether these differences were evident in tumor-specific epigenetic alterations (i.e.
those that exist within the tumor subtype when compared to an appropriate normal cell), we compared the differential methylation profiles of AC and SqCC tumors. These profiles were generated by subtracting the average normal DNA methylation profile of the 30 histologically normal lung parenchyma samples from each of the 30 AC tumors, and that of the 18 bronchial epithelia samples from each of the 13 SqCC tumors. In a finding similar to what was seen with the global assessment of copy number alterations (gain and loss), there were no significant differences between the AC differential profiles and the SqCC differential profiles in CpG island probes (Figure 4.4c) or non-CpG island probes (Figure 4.4d). Based on this observation, we again reasoned that any observed differences in hypermethylation or hypomethylation frequencies between AC and SqCC are likely to be due to subtype-specific selection of these alterations.

4.2.6.  Subtype-specific epigenetic alterations in AC and SqCC subtypes

Although no differences in global methylation changes were observed between subtypes, it is still possible that the two subtypes possess differential alteration frequencies at individual loci. To determine whether AC and SqCC tumors possess significant differences in their DNA methylation patterns, we examined the frequency of DNA methylation alteration at each CpG site in both subtypes. As above, tumor DNA methylation levels were compared to the average of available normal reference tissue profiles. The frequency of probe hypermethylation and hypomethylation (|Δβ(tumor − normal)| ≥ 0.15) in AC and SqCC samples was compared using Fisher's exact test. Following correction for multiple comparisons, 2708 probes corresponding to 2384 genes were found to be significantly differentially methylated (p ≤ 0.05). The number of aberrantly methylated probes was not equally distributed between the two subtypes or between the two potential alterations (hyper- or hypomethylation).
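The per-probe alteration calls described above can be illustrated with a minimal Python sketch. It assumes only what is stated in the text: a tumor is called hypermethylated or hypomethylated at a probe when its β value differs from the average normal reference by at least 0.15. The function names and the example β values are hypothetical, and the subsequent comparison of frequencies between subtypes by Fisher's exact test is not repeated here.

```python
def call_alteration(beta_tumor, beta_normal_mean, threshold=0.15):
    """Classify one tumor at one probe relative to the normal reference mean."""
    delta = beta_tumor - beta_normal_mean
    if delta >= threshold:
        return "hyper"
    if delta <= -threshold:
        return "hypo"
    return "none"


def alteration_frequency(tumor_betas, beta_normal_mean, kind, threshold=0.15):
    """Fraction of tumors with the requested call ("hyper" or "hypo")."""
    calls = [call_alteration(b, beta_normal_mean, threshold) for b in tumor_betas]
    return calls.count(kind) / len(calls)


# Hypothetical probe: normal reference mean of 0.20 and five tumor beta values.
tumor_betas = [0.55, 0.60, 0.25, 0.02, 0.22]
hyper_freq = alteration_frequency(tumor_betas, 0.20, "hyper")
hypo_freq = alteration_frequency(tumor_betas, 0.20, "hypo")
print(hyper_freq, hypo_freq)
```

The resulting hyper- and hypomethylation counts per subtype are the inputs to the 2x2 tables that are then compared between AC and SqCC.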
The SqCC group contained markedly more recurrently hyper- and hypomethylated loci than the AC group. This result was similar to the disparity in the numbers of subtype-specific copy number-regulated genes observed in the analysis of genomic alterations. In fact, only 8% of the 2708 significant probes were more frequently altered in AC, the rest being more commonly hyper- or hypomethylated in SqCC. To further refine the list of differentially methylated genes to those whose gene expression reflects levels expected based on their epigenetic alteration, we assessed the 2384 genes with differential methylation for differential expression between the subtypes, as well as differential expression from normal tissues, using a corrected p-value threshold of p≤0.05. In total, 32 AC candidate genes and 297 SqCC genes met these stringent criteria and were further analyzed as subtype-specific epigenetically regulated genes.

4.2.7.  Epigenetically regulated genes complement genetically regulated genes

To determine whether the 32 AC-specific and 297 SqCC-specific epigenetically regulated genes carried out functions similar to those of the subtype-specific genes discovered by the DNA copy number analysis described above (Section 4.2.4), pathway disruption analysis was performed. This revealed that the most significant epigenetically regulated gene network in AC is involved in cell cycle, cell death, and cellular development. This is partly in contrast to the top AC network of copy number-regulated genes, which similarly has functions associated with development, but whose genes also share cell signaling and hematological system functions. The overall degree of similarity between AC-specific genes that are genetically or epigenetically regulated is quite small. In contrast, the SqCC gene networks in both analyses are very similar.
For example, DNA replication, recombination and repair are highly featured functions of genes identified by both DNA copy number and DNA methylation analyses of SqCC. Additionally, genes involved in immunological disease and lymphoid tissue structure and development were prominent. Of particular interest was the enrichment of aberrantly methylated genes in the small cell lung cancer signaling pathway (Figure 4.5a). This was the most significantly enriched canonical pathway in either subtype that was affected by DNA methylation alterations, and it is of interest because both of these lung cancers arise in the central airways with similar exposure to cigarette smoke carcinogens. E2F1 is among the hypomethylated and overexpressed genes represented in this pathway. This gene is known to be overexpressed in SCLC and to drive expression of EZH2, which is also overexpressed in SCLC35,36. To explore this pathway further, we investigated whether EZH2 was more highly expressed in SqCC than AC tumors, as would be expected as a consequence of differential E2F1 expression. Indeed, we found that EZH2 was expressed at a significantly higher level in SqCC tumors than AC tumors (Figure 4.5b). The differential expression of EZH2 in the two subtypes is noteworthy given the numerous differences in aberrant DNA methylation observed between the two, a process known to be linked to the polycomb group37.

DNA copy number and DNA methylation data are complementary from a gene-specific perspective as well. This is highlighted by seven genes that are disrupted by gene dosage in one subtype and DNA methylation in the other. PARP11 is one such example of differential activation/inactivation by DNA copy number and DNA methylation.

4.2.8.
Concerted genetic and epigenetic disruption of subtype-specific genes

In order to determine which genes, if any, were simultaneously disrupted by both DNA copy number and DNA methylation aberrations, we combined the subtype-specific gene lists derived using the two analytical approaches described above. Combining the 71 AC genes identified through their association with DNA copy number alteration and the 32 genes associated with DNA methylation aberrations did not yield any overlapping genes. This result was unsurprising given the observed lack of similarity at the level of function/network analysis. In SqCC, however, combining the 447 copy number-associated genes with the 297 DNA methylation genes yielded an overlap of 38 genes. It is important to note that this may simply be due to the much larger number of detected alterations in the SqCC tumors; further work will be required to validate this. These genes exhibit frequent genetic, epigenetic, and subsequent gene expression alterations that discriminate SqCC tumors from AC tumors. Notably, the well-known 3p tumor suppressor gene (TSG) FHIT was among these genes. The differential methylation levels of FHIT are shown in Figure 4.5c. Loss of FHIT expression is associated with smoking and is more frequent in SqCC tumors than AC tumors, consistent with our data38-40. Hypermethylation of this gene has also been investigated as a potential biomarker for centrally occurring lung cancers41. FHIT hypermethylation is also known to correspond with poor survival in lung cancer42.

4.2.9.  Subtype-specific genes are associated with distinct clinical characteristics

Next, we aimed to determine the influence of the diametrically altered subtype-specific genes on the clinical characteristics of AC and SqCC. To test this, we determined the survival associations for the 26 genes displaying opposite genomic alteration patterns using a Mantel-Cox log rank test in both AC and SqCC tumors.
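For readers unfamiliar with the survival test named above, a two-group log-rank (Mantel-Cox) comparison can be sketched in Python as follows. This is a textbook formulation written for illustration, not the software used in this study. The function name and the follow-up times are hypothetical, and the way patients would be split into two groups for each gene (for example, low versus high expression) is an assumption that is not specified here. At least one event is assumed in the combined data.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank (Mantel-Cox) test; returns (chi-square, p-value).

    events_* hold 1 for an observed death and 0 for a censored observation.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        # Patients still at risk in each group just before time t.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Deaths at time t in group A and in both groups combined.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d = d_a + sum(1 for u, e, g in data if u == t and e and g == 1)
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical follow-up times (months); every patient has an observed event.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1])
print(chi2, p_value)
```

In this toy example the first group accumulates all of its deaths before any occur in the second group, which yields a chi-square statistic of about 5.05.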
Genes with expression that correlated with survival in both datasets (in the direction anticipated) were then sought. This identified only one candidate gene, MAPK1, which is lost in AC and gained in SqCC. More specifically, low MAPK1 expression was associated with poor survival in AC tumors, and high MAPK1 expression with poor survival in SqCC tumors (Figure 4.6a and Figure 4.6b). No genes altered in AC and SqCC in opposite directions by DNA methylation were identified, so this analysis was not undertaken for epigenetically regulated subtype-specific genes.

4.3. Discussion

Previous studies suggest that distinct patterns of DNA alteration exist for AC and SqCC; however, the specific genes responsible for the different tumor phenotypes are largely unknown. In this study, we provide the first comprehensive investigation of the causal genetic and epigenetic alterations distinguishing AC and SqCC. We achieved this by integrating whole-genome expression, DNA copy number, and DNA methylation data. Our analysis revealed that NSCLC subtypes are distinct at both the genomic and epigenomic level. Further investigation identified genes altered in a subtype-specific manner. In addition, we discovered distinct gene networks associated with each lung cancer subtype, flagging distinct signaling pathways as contributing to tumorigenesis. We also found multiple subtype-specific changes to be correlated with multiple clinical outcomes. The 259 subtype-specific copy number alterations detected in this study gave a general picture of the genetic pathways involved in the development of AC and SqCC. The subtype-specific copy number alterations are consistent with those identified by previous conventional and array CGH studies which compared the two subtypes, but included additional regions [14-17,43-46]. Many specific regions were altered at higher frequency in either SqCC or AC, suggesting their importance in a single lung cancer subtype. 
Indeed, numerous genes which have previously been implicated in NSCLC tumorigenesis, prognosis and response to chemotherapy were preferentially disrupted in a specific subtype, a result with significant clinical implications. For example, previously identified oncogenes such as NOTCH3 and FOXM1 were overexpressed through increased gene dosage specifically in SqCC, while the tumor suppressor KEAP1 was deleted and underexpressed specifically in AC [47]. This is the first report suggesting that these previously established lung cancer-associated genes may actually be involved in subtype-specific tumorigenesis. It is also significant that numerous regions showed a completely opposite pattern of alteration depending on the lung cancer subtype, with one having frequent gain and the other frequent loss. For example, a discrete alteration spanning 2.4 Mbp on chromosome bands 8p12-p11.23 was commonly gained in SqCC and lost in AC, implying that genes in these regions may play completely different roles during the development of the individual NSCLC subtypes, acting as tumor suppressor genes in AC and as oncogenes in SqCC. Such diametric alteration is seen when including epigenomic alterations as well. This is the case for PARP11, which is upregulated in SqCC by copy number gain and downregulated in AC by DNA hypermethylation. This information will become particularly important as targeted therapeutic strategies based around these genes develop. The development of MEK inhibitors highlights this point [48]: since activated MEK1 and MEK2 phosphorylate and activate ERK (MAPK1), the differential deregulation of MAPK1 in AC and SqCC tumors may be an important consideration in determining the efficacy of this treatment against lung cancer subtypes [49]. Similarly, numerous studies have aimed to identify genes associated with prognosis in NSCLC in order to better determine patient outcome [50]. Our data suggest that these relationships may be subtype-specific as well. 
For example, consistent with the identification of MAPK1 as a subtype-specific gene that is activated in SqCC and inactivated in AC, low MAPK1 expression is associated with poor survival in AC (Figure 4.6a), while high MAPK1 expression is associated with poor prognosis in SqCC (Figure 4.6b). A broader gene network-based analysis of the copy number-regulated genes revealed additional insights into the oncogenic mechanisms driving the differential development of AC and SqCC. The top SqCC gene network was mainly associated with DNA replication, recombination and repair. In addition to these functions, histone modification genes were represented as well. Histones are fundamental building blocks of eukaryotic chromatin and are involved in myriad cellular processes, including replication, repair, recombination and chromosome segregation, as well as transcriptional regulation and maintenance of genomic integrity [51,52]. Recently, global alterations of histone modification patterns have been reported in human cancers [53]. Our data suggest that direct deregulation of histone modification enzymes including ASF1B, PRMT1, SAE1, SET8, CHAF1A and UHRF1 may drive this phenomenon and play a key role during the development of lung SqCC. Since histone modifications also play an essential role in DNA replication [54], there may be a synergistic effect between the histone-modifying genes and replication/recombination-associated genes that contributes to tumor development. Interestingly, histone modification alterations have been observed to occur more frequently in lung SqCC than AC, a result lending further credence to our findings [55]. The gene network detected as perturbed in AC subtype tumors contained genes mainly involved in regulating tissue development and cell-to-cell signaling and known to be targeted by the transcription factor HNF4α. 
HNF4α regulates a large set of genes in a cell-specific manner and is necessary for cell differentiation during normal embryonic development and maintenance of a differentiated epithelial phenotype in adults [56]. Deregulation of HNF4α has been documented for other carcinomas where loss of expression leads to increased cellular proliferation, progression and dedifferentiation [57-61]. These data suggest that HNF4α may act as a tumor suppressor in epithelial carcinogenesis [56]. Interestingly, although HNF4α was not affected, we found that numerous downstream targets of this gene are downregulated specifically in AC. Thus, this may have the same net effect as inactivation of HNF4α itself and lead to increased cellular proliferation during AC tumorigenesis. Concerted alterations to gene networks and pathways are not a feature that is limited to copy number-regulated genes. Indeed, we found that coupling subtype-specific DNA methylation profiles with matched gene expression alterations implicated numerous canonical signaling pathways in the development of SqCC and AC tumors. The significant enrichment of the small cell lung cancer signaling pathway within the epigenetically altered SqCC genes was of particular interest. Transcription factor E2F1 was found to exhibit SqCC-specific hypomethylation and overexpression. E2F1 is known to be upregulated in SCLC tumors [36] and its upregulation is also known to induce expression of EZH2, an oncogenic polycomb histone methyltransferase (EZH2 upregulation results in evasion of apoptosis [35]). The relevance of this pathway to SqCC tumors is strengthened by our observation that EZH2 expression is significantly higher in SqCC than AC (Figure 4.5). This is particularly interesting given the potential dual role of EZH2 in different cancer types [62]. For example, it has been found to be mutated and inactivated in lymphomas, while we have observed it to be highly overexpressed in lung carcinomas [62]. 
The disruption of the polycomb group (preferentially in SqCC) is of particular interest because we have also identified SqCC-specific deregulation of numerous histone-modifying enzymes at the copy number level. In addition to the deregulation of histone-modifying genes by gene-specific DNA copy number and DNA methylation alterations, we have uncovered evidence of global SqCC-specific epigenetic disruption. Our analysis of global DNA methylation changes in AC and SqCC tumors showed that SqCC tumors were more hypomethylated overall (Figure 4.4). There is precedent for this finding, as altered global methylation is thought to be a consequence of exposure to the carcinogens found in tobacco smoke [33,34,63]. Global hypomethylation, such as that caused by cigarette smoke, is also known to be associated with chromosomal instability. Although we did not observe any difference in the percentage of AC or SqCC genomes that were altered by copy number, we did identify a greater number of recurrent (as opposed to random) copy number alterations in the SqCC subtype. This may be indicative of a higher level of instability in the SqCC tumor genome, a process that may facilitate the selection of recurrent alterations. As referenced above, the accrual of and selection for specific alterations is likely to play a role not only in the development of cancer, but also in the clinical management of disease (e.g., in drug response and overall survival). Indeed, genes known to influence NSCLC response to conventional chemotherapy were also deregulated in a subtype-specific manner. For example, the finding that ERCC1 disruption was subtype-specific is particularly significant. ERCC1 is a nucleotide excision repair gene which functions in repairing DNA adducts and lesions induced by smoking-related carcinogens [64]. 
As such, low expression levels of ERCC1 have been implicated in lung cancer susceptibility [65] and tumorigenesis, whereas high expression levels are associated with favorable overall prognosis [64]. However, since ERCC1 is also involved in the repair of cisplatin-induced DNA adducts in cancer cells, high expression levels lead to increased resistance to platinum-based chemotherapies [66,67]. Low expression, on the other hand, leads to sensitivity to these drugs [68]. Underscoring the relevance of this finding are the results of recent clinical trials which have described a significantly better outcome for patients who received adjuvant cisplatin-based combination chemotherapy if their resected tumors expressed low levels of ERCC1 [64,66]. Our finding that this gene is inactivated specifically in AC tumors has major clinical consequences for guiding disease management and defining appropriate treatment regimens for patients. This is consistent with a previous report demonstrating the subtype specificity of ERCC1 expression levels in NSCLC, and further highlights how biological differences between AC and SqCC may influence patient response to therapy [69]. Concerted DNA copy number and DNA methylation alterations yield insight into tumor biology as well. We show hypermethylation and deletion of FHIT to be a SqCC-specific event, confirming earlier studies describing inactivation of the gene at a higher frequency in SqCC than AC tumors [38,42,70,71]. While there were relatively few genes that were simultaneously activated/inactivated in SqCC by DNA copy number and methylation alterations (32), there was no overlap seen in AC. In fact, compared to SqCC tumors, AC tumors possessed fewer subtype-specific alterations linked to both DNA copy number and DNA methylation. 
It is not clear why this is the case, given the larger AC sample sets available for each assay, but it is possible that AC tumors have higher levels of cellular and/or genetic heterogeneity than SqCC tumors. Heterogeneity of patient clinical characteristics may also contribute to this, as non-smokers are more likely to develop AC tumors, and cigarette smoke may play a role in contributing to specific genetic or epigenetic alterations [48,72].

In conclusion, high-resolution integrative analysis of NSCLC genomes and epigenomes delineated novel tumor subtype-specific genetic and epigenetic alterations responsible for driving the differential development and resulting phenotypes of AC and SqCC. For example, we identify, for the first time, subtype-specific inactivation of the tumor suppressor gene KEAP1 in AC tumors. The specific genes and networks identified in this study provide essential starting points for clarifying mechanisms of tumor differentiation and developing tailored therapeutics for lung cancer treatment. More generally, our results confirm at the molecular level that these lung cancer subtypes are distinct disease entities and should be studied separately when designing treatment strategies and testing new drugs in clinical trials.

4.4. Materials and methods

4.4.1. DNA samples

Formalin-fixed, paraffin-embedded and fresh-frozen tissues were collected from St. Paul’s Hospital, Vancouver General Hospital and Princess Margaret Hospital following approval by the Research Ethics Boards. Hematoxylin and eosin stained sections for each sample were graded by a lung pathologist for use in selecting regions for microdissection. DNA was isolated using a standard procedure of proteinase K digestion followed by phenol-chloroform extraction, as previously described [73]. Patient demographics are located in Table 4.1.

4.4.2. Tiling path array comparative genomic hybridization

Array hybridization was performed as previously described [26,74,75]. 
Briefly, equal amounts (200-400 ng) of sample and single male reference genomic DNA were differentially labelled and hybridized to SMRT array v.2 (BCCRC Array Laboratory, Vancouver, BC), which has previously been described as giving optimal genome coverage [23,76]. Hybridized arrays were imaged using a charge-coupled device (CCD) camera system and analyzed using SoftWoRx Tracker Spot Analysis software (Applied Precision, Issaquah, WA). Systematic biases were removed from all array data files using a stepwise normalization procedure as previously described [25,28]. SeeGH software was used to combine replicates and visualize all data as log2 ratio plots [77,78]. As a stringent quality filter, all replicate spots with a standard deviation above 0.075 or a signal-to-noise ratio below three were removed from further analysis. The clones were then positioned based on the human March 2006 (hg18) genome assembly. Genomic imbalances (gains and losses) within each sample were identified using aCGHSmooth [24] with lambda and breakpoint per chromosome settings at 6.75 and 100, respectively (as previously described) [26]. The resulting frequency of alteration was then determined for each lung cancer cell type as described previously [26].

4.4.3. DNA methylation analysis

For 30 AC samples, 30 patient-matched non-malignant lung samples, 13 SqCC samples and 18 non-patient-matched bronchial epithelia samples, DNA methylation profiling was performed using the Illumina HumanMethylation27 chip. Five hundred nanograms of DNA from each sample were analyzed by this technology. Normalized β-values were obtained and only those with a detection p-value of ≤0.05 were used. When comparing tumor samples (AC/SqCC) and normal non-malignant samples (AC non-malignant parenchyma and bronchial epithelia), probes were deemed aberrantly methylated if the absolute difference between tumor and the average of the appropriate normal samples was ≥0.15.

4.4.4.
Comparison of subtype alteration frequencies

Regions of differential copy number alteration between AC and SqCC genomes were identified as follows. Each array element was scored as 1 (gain/amplification), 0 (neutral/retention), or -1 (loss/deletion) for each individual sample. Values for elements filtered out by the quality control criteria were inferred from neighbouring clones within 10 Mb. Probes were then aggregated into genomic regions if the similarity in copy number status between adjacent clones was at least 90% across all samples from the same subtype. The occurrence of copy number gain/amplification, loss/deletion, and retention at each locus was then compared between the AC and SqCC data sets using Fisher’s exact test. Testing was performed using the R statistical computing environment on a 3 x 2 contingency table as previously described, generating a p-value for each clone [26]. A Benjamini-Hochberg multiple hypothesis testing correction based on the number of distinct regions was applied, and resulting p-values ≤0.01 were considered significant. Adjacent regions within 1 Mb which matched in both the direction of copy number difference and statistical significance were then merged. Finally, to be considered, a region had to be altered in >20% of samples in one group, with a difference between groups of >10%. A similar approach was used for determining subtype-specific DNA methylation alterations. Frequencies of hypermethylation and hypomethylation were compared using Fisher's exact test, followed by a Benjamini-Hochberg multiple testing correction. A corrected p-value cut-off of p<0.05 was used to deem a probe differentially methylated between the two groups.

4.4.5. Gene expression microarray analysis

Fresh-frozen lung tumors were obtained from Vancouver General Hospital as described above. Microdissection of tumor cells was performed and total RNA was isolated using RNeasy Mini Kits (Qiagen Inc., Mississauga, ON). 
Samples were labeled and hybridized to a custom Affymetrix microarray according to the manufacturer’s protocols (Affymetrix Inc., Santa Clara, CA). In addition, RNA was obtained from exfoliated bronchial cells of lung cancer-free individuals, collected during fluorescence bronchoscopy [72]. All individuals were either current or former smokers. Expression profiles were generated for all these cases using the Affymetrix U133 Plus 2 platform (Affymetrix Inc., Santa Clara, CA). All data were normalized using the Robust Multichip Average (RMA) algorithm in [79]. In addition, a publicly available dataset downloaded from the Gene Expression Omnibus was used: Affymetrix U133 Plus 2 expression data were downloaded for accession number GSE3141 [80].

4.4.6. Statistical analysis of gene expression data

Gene expression probes were mapped to March 2006 (hg18) genomic coordinates and those within the regions of copy number difference between the subtypes were determined. Comparisons between expression levels for AC and SqCC tumors were performed using the Mann-Whitney U test, computed with the ranksum function in Matlab. As the direction of gene expression difference was predicted to match the direction of copy number difference, one-tailed p-values were calculated. A Benjamini-Hochberg multiple hypothesis testing correction was applied based on the total number of gene expression probes analyzed for each region. Probes with a corrected p-value ≤ 0.001 were considered significant. If multiple probes mapped to the same gene, the one with the lowest p-value was used. Resulting genes were then mapped to the corresponding probes on the Affymetrix U133 Plus 2 array in order to compare their expression in a second set of NSCLC tumors (GSE3141 above) against normal bronchial epithelial cells. If multiple probes were present for a gene, the one with the lowest p-value was used. 
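The correction and probe selection steps described in this section can be illustrated with a minimal sketch. It is not the Matlab code used in this study: the function names, probe identifiers, gene labels and p-values below are hypothetical, and the sketch assumes that the raw one-tailed p-values for the probes in a single region have already been computed. It applies a Benjamini-Hochberg adjustment across those probes, keeps probes with a corrected p-value ≤ 0.001, and retains the probe with the lowest corrected p-value for each gene.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(
    probes: List[Tuple[str, str, float]], alpha: float = 0.001
) -> Dict[str, Tuple[str, float]]:
    """Select one significant probe per gene within a single region.

    probes: (probe_id, gene, raw one-tailed p-value) tuples.
    Returns {gene: (probe_id, adjusted_p)}, keeping for each gene the
    probe with the lowest adjusted p-value among those <= alpha.
    """
    adjusted = benjamini_hochberg([p for _, _, p in probes])
    best: Dict[str, Tuple[str, float]] = {}
    for (probe_id, gene, _), adj_p in zip(probes, adjusted):
        if adj_p > alpha:
            continue
        if gene not in best or adj_p < best[gene][1]:
            best[gene] = (probe_id, adj_p)
    return best


# Hypothetical probes from one region (identifiers and p-values invented).
example = [
    ("probe_1", "GENE_A", 0.0001),
    ("probe_2", "GENE_A", 0.0004),
    ("probe_3", "GENE_B", 0.02),
    ("probe_4", "GENE_C", 0.5),
]
result = significant_genes(example)
```

Correcting within each region, rather than across all probes on the array, follows the description above; a different grouping would change the number of tests and therefore the adjusted values.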
All comparisons were performed using a one-tailed t-test with unequal variances in Excel, and genes with p<0.001 were considered significant. The fold-change for tumors versus normal tissues was then calculated in order to identify genes expressed in the direction predicted by copy number.

4.4.7. Survival analysis

Survival analysis was performed using the statistical toolbox in Matlab. Expression data for each gene were sorted and survival times were compared between the top 1/3 and bottom 1/3 in expression using a publicly available gene expression microarray dataset with survival data. Two-tailed p-values were generated using a Mantel-Cox log-rank test and those < 0.05 were considered significant. Kaplan-Meier plots were then generated for each gene of interest.

4.4.8. Network identification

Functional identification of gene networks and canonical signalling pathways was performed using the Ingenuity Pathway Analysis program (Ingenuity® Systems, www.ingenuity.com). AC- and SqCC-specific gene lists were imported as individual experiments using the Core Analysis tool. The analysis was performed using the Ingenuity Knowledge Database with the Affymetrix U133 Plus 2 platform as the reference set and was limited to direct and indirect relationships.

Figure 4.1 - Copy number alterations in AC and SqCC
a) Alteration frequencies for 169 AC (red) and 92 SqCC (blue) tumors are displayed across the entire human genome. Solid vertical black lines represent chromosome boundaries whereas the dotted black lines represent chromosome arm boundaries. The frequency of copy number gain is denoted in the top panel. Note the high frequency of 3q gain in the SqCC subtype, consistent with previous reports. Additional regions of copy number difference are also clear, such as the more common gain of chromosome 2p in AC.
b) The second panel (middle) shows the frequency of copy number loss. 
Losses at common tumour suppressor gene loci such as chromosome 3p are shared between AC and SqCC, but large differences exist in regions such as chromosome 4q.
c) The significance of copy number disparity (inverse p-value) between AC and SqCC subtypes is depicted in the third (bottom) panel. Solid black lines represent regions considered statistically different (p ≤0.01) whereas grey lines are not.

Figure 4.2 - Differential expression as a result of subtype-specific copy number alterations
Transformed absolute expression data for the 492 unique genes exhibiting disruption in expression levels as a result of copy number differences are displayed. In addition, these genes are up- or down-regulated, compared to normal lung tissue, in the subtype in which they are disrupted (see results). High-level expression is indicated by red while black indicates progressively lower levels of expression. The AC samples are indicated by red highlighting on the top of each column, while SqCC samples are indicated by blue highlighting. Genes are sorted according to chromosomal position. There is a clear distinction in the expression of these genes, indicating their specific involvement in the subtypes.

Figure 4.3 - Gene networks involved in the development of SqCC and AC
Ingenuity Pathway Analysis was used to identify biologically related networks from the genes deregulated by subtype-specific copy number alterations (see Methods). The top resultant gene networks for each subtype are displayed.
a) AC network #1 of genes related to HNF4 signaling.
b) SqCC network #1 displaying potential interactions between multiple histone-regulating genes.
For both a) and b), solid lines denote direct interactions while dotted lines represent indirect interactions between the genes. Network components highlighted in red are upregulated in the corresponding subtype whereas those highlighted in green are downregulated. 
Those not highlighted are used by the software to display relationships. Additional information about the genes and their interactions can be found at www.ingenuity.com or within the discussion. In this diagram molecules are represented as follows: corkscrews represent enzymes, y-shaped molecules are transmembrane receptors, thimble-shaped molecules are transporters, kinases are triangular, and circular molecules encompass all other gene products.

Figure 4.4 - Global DNA methylation patterns of NSCLC tumors and associated normal tissues
Comparison of average DNA methylation levels between AC tumor, AC normal (histologically normal lung parenchyma), SqCC tumor, and bronchial epithelia.
a) CpG island probe averages. The average of each of the profiles at probes located within CpG islands is plotted as a component within the box plot. In this panel, SqCC and bronchial epithelia samples appear to have slightly lower DNA methylation levels than the AC tumor and AC normal groups.
b) Non-CpG island probe averages. The average of each of the profiles at probes not located within CpG islands is plotted as a component within the box plot. In this figure, the β-value is the level of methylation, defined as the methylated signal/total signal for each probe. In this panel, SqCC and bronchial epithelia are significantly lower in methylation level compared to the AC tumor or AC normal groups, indicating that outside of CpG islands, where the bulk of genomic methylation occurs, the central airway samples are more hypomethylated.
c) Average differential methylation levels at CpG islands. The average differential is plotted for the 30 AC samples and the 13 SqCC samples. The two groups are very similar in their differential profile within CpG island probes.
d) Average differential methylation levels at CpG sites not located within CpG islands. In this plot the average differential methylation level is plotted for probes that are not located within CpG islands. 
The two groups are again not significantly different by a Mann-Whitney U test.

Figure 4.5 - SCLC signaling is significantly enriched in epigenetically altered SqCC genes
a) SCLC signaling components altered by DNA methylation in SqCC. In this schematic of the SCLC signaling pathway, genes that are hypomethylated and overexpressed are shown in red, and those that are hypermethylated and underexpressed are shown in green. Corkscrew shapes represent enzymes, triangular molecules are kinases, and pinched ovals are transcription factors. Components at all levels of the pathway are affected, including the transcription factor E2F1, which drives the expression of the oncogenic polycomb group member EZH2.
b) EZH2 expression in 58 AC tumors and 53 SqCC tumors. EZH2 expression was assessed in an external dataset, and it was found to be higher, as predicted, in SqCC tumors compared to AC tumors using a Mann-Whitney U test (p<0.0001).
c) FHIT differential methylation levels in SqCC and AC tumors. FHIT was shown to be deregulated by both deletion and hypermethylation in a manner that was specific to SqCC tumors. Shown here are the differential DNA methylation levels for 30 AC tumors and 13 SqCC tumors. The SqCC tumors are hypermethylated to a much higher degree than the AC tumors, consistent with previously published findings.

Figure 4.6 - MAPK1 alteration and survival is different in AC and SqCC tumors
a) Low MAPK1 levels are associated with poor prognosis in AC. The prognostic value of MAPK1 expression levels was evaluated in 58 AC tumors. Survival of the 1/3 lowest MAPK1 expressers is shown in red, and the top 1/3 is shown in blue. In this case, low expression of MAPK1 is significantly associated with poor prognosis when using a Mantel-Cox log-rank test (p <0.005). This is relevant given that MAPK1 is deleted and underexpressed in AC tumors.
b) High MAPK1 levels are associated with poor prognosis in SqCC. 
The prognostic value of MAPK1 expression levels was evaluated in 53 SqCC tumors. Survival of the 1/3 lowest MAPK1 expressers is shown in red, and the top 1/3 is shown in blue. In this case, high expression of MAPK1 is significantly associated with poor prognosis when using a Mantel-Cox log-rank test (p <0.005). MAPK1 is gained and overexpressed in SqCC tumors.
c) MAPK1 expression in 58 AC and 53 SqCC tumors. The expression of MAPK1 was assessed in 58 AC and 53 SqCC tumors, and compared. In this validation set, as predicted and as shown in our own data set, the expression of MAPK1 is higher in SqCC than in AC tumors.

Table 4.1 - Sample set clinical characteristics

Characteristic    Category          AC    SqCC
Stage             I                 23    15
                  IA                23     3
                  IB                30    14
                  II                12    15
                  IIA                8     4
                  IIB               20    13
                  III                0     0
                  IIIA              14     8
                  IIIB               8     6
                  IV                27    10
                  n/a                4     4
Sex               Female           106    26
                  Male              63    66
Smoking Status    Current smoker    48    30
                  Ex-smoker         80    61
                  n/a                3
                  Non-smoker        38
One further SqCC sample (1) falls in either the n/a or the Non-smoker row of Smoking Status.

4.5. References

1. Parkin, D.M., Bray, F., Ferlay, J. & Pisani, P. Global cancer statistics, 2002. CA Cancer J Clin 55, 74-108 (2005).
2. Sato, M., Shames, D.S., Gazdar, A.F. & Minna, J.D. A translational view of the molecular pathogenesis of lung cancer. J Thorac Oncol 2, 327-343 (2007).
3. Travis, W.D. Pathology of lung cancer. Clin Chest Med 23, 65-81, viii (2002).
4. Giangreco, A., Groot, K.R. & Janes, S.M. Lung cancer and lung stem cells: strange bedfellows? Am J Respir Crit Care Med 175, 547-553 (2007).
5. Broet, P., et al. Prediction of clinical outcome in multiple lung cancer cohorts by integrative genomics: implications for chemotherapy selection. Cancer Res 69, 1055-1062 (2009).
6. Garraway, L.A. & Sellers, W.R. Lineage dependency and lineage-survival oncogenes in human cancer. Nat Rev Cancer 6, 593-602 (2006).
7. Scagliotti, G., et al. The differential efficacy of pemetrexed according to NSCLC histology: a review of two Phase III studies. Oncologist 14, 253-263 (2009).
8. Bhattacharjee, A., et al. 
Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA 98, 13790-13795 (2001).
9. Thomas, R.K., Weir, B. & Meyerson, M. Genomic approaches to lung cancer. Clin Cancer Res 12, 4384s-4391s (2006).
10. Coe, B.P., Chari, R., Lockwood, W.W. & Lam, W.L. Evolving strategies for global gene expression analysis of cancer. J Cell Physiol 217, 590-597 (2008).
11. Feinberg, A.P. & Tycko, B. The history of cancer epigenetics. Nat Rev Cancer 4, 143-153 (2004).
12. Hyman, E., et al. Impact of DNA amplification on gene expression patterns in breast cancer. Cancer Res 62, 6240-6245 (2002).
13. Pollack, J.R., et al. Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci USA 99, 12963-12968 (2002).
14. Luk, C., Tsao, M.S., Bayani, J., Shepherd, F. & Squire, J.A. Molecular cytogenetic analysis of non-small cell lung carcinoma by spectral karyotyping and comparative genomic hybridization. Cancer Genet Cytogenet 125, 87-99 (2001).
15. Pei, J., et al. Genomic imbalances in human lung adenocarcinomas and squamous cell carcinomas. Genes Chromosomes Cancer 31, 282-287 (2001).
16. Sy, S.M., et al. Distinct patterns of genetic alterations in adenocarcinoma and squamous cell carcinoma of the lung. Eur J Cancer 40, 1082-1094 (2004).
17. Tonon, G., et al. High-resolution genomic profiles of human lung cancer. Proc Natl Acad Sci USA 102, 9625-9630 (2005).
18. Toyooka, S., et al. Smoke exposure, histologic type and geography-related differences in the methylation profiles of non-small cell lung cancer. Int J Cancer 103, 153-160 (2003).
19. Toyooka, S., et al. DNA methylation profiles of lung tumors. Mol Cancer Ther 1, 61-67 (2001).
20. Lockwood, W.W., Chari, R., Chi, B. & Lam, W.L. Recent advances in array comparative genomic hybridization technologies and their applications in human genetics. Eur J Hum Genet 14, 139-148 (2006).
21. Sozzi, G., et al. Association between cigarette smoking and FHIT gene alterations in lung cancer. Cancer Res 57, 2121-2123 (1997).
22. Bardelli, A., et al. Carcinogen-specific induction of genetic instability. Proc Natl Acad Sci USA 98, 5770-5775 (2001).
23. Ishkanian, A.S., et al. A tiling resolution DNA microarray with complete coverage of the human genome. Nat Genet 36, 299-303 (2004).
24. Jong, K., Marchiori, E., Meijer, G., Vaart, A.V. & Ylstra, B. Breakpoint identification and smoothing of array comparative genomic hybridization data. Bioinformatics 20, 3636-3637 (2004).
25. Khojasteh, M., Lam, W.L., Ward, R.K. & MacAulay, C. A stepwise framework for the normalization of array CGH data. BMC Bioinformatics 6, 274 (2005).
26. Coe, B.P., et al. Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer 94, 1927-1935 (2006).
27. Petersen, I., et al. Core classification of lung cancer: correlating nuclear size and mitoses with ploidy and clinicopathological parameters. Lung Cancer 65, 312-318 (2009).
28. Lockwood, W.W., et al. DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 27, 4615-4624 (2008).
29. Croce, C.M. Oncogenes and cancer. N Engl J Med 358, 502-511 (2008).
30. Doi, A., et al. Differential methylation of tissue- and cancer-specific CpG island shores distinguishes human induced pluripotent stem cells, embryonic stem cells and fibroblasts. Nat Genet 41, 1350-1353 (2009).
31. Choi, S.H., et al. Changes in DNA methylation of tandem DNA repeats are different from interspersed repeats in cancer. Int J Cancer 125, 723-729 (2009).
32. Hoffmann, M.J. & Schulz, W.A. Causes and consequences of DNA hypomethylation in human cancer. Biochem Cell Biol 83, 296-321 (2005).
33. Yang, H.H., et al. Influence of genetic background and tissue types on global DNA methylation patterns. PLoS One 5, e9355.
34. Vaissiere, T., et al. Quantitative analysis of DNA methylation profiles in lung cancer identifies aberrant DNA methylation of specific genes and its association with gender and cancer risk factors. Cancer Res 69, 243-252 (2009).
35. Wu, Z.L., et al. Polycomb protein EZH2 regulates E2F1-dependent apoptosis through epigenetically modulating Bim expression. Cell Death Differ (2009).
36. Eymin, B., Gazzeri, S., Brambilla, C. & Brambilla, E. Distinct pattern of E2F1 expression in human lung tumors: E2F1 is upregulated in small cell lung carcinoma. Oncogene 20, 1678-1687 (2001).
37. Hussain, M., et al. Tobacco smoke induces polycomb-mediated repression of Dickkopf-1 in lung cancer cells. Cancer Res 69, 3570-3578 (2009).
38. Geradts, J., Fong, K.M., Zimmerman, P.V. & Minna, J.D. Loss of Fhit expression in non-small-cell lung cancer: correlation with molecular genetic abnormalities and clinicopathological features. Br J Cancer 82, 1191-1197 (2000).
39. Tomizawa, Y., et al. Clinicopathological significance of Fhit protein expression in stage I non-small cell lung carcinoma. Cancer Res 58, 5478-5483 (1998).
40. Sozzi, G., et al. Loss of FHIT function in lung cancer and preinvasive bronchial lesions. Cancer Res 58, 5032-5037 (1998).
41. de Fraipont, F., et al. Promoter methylation of genes in bronchial lavages: a marker for early diagnosis of primary and relapsing non-small cell lung cancer? Lung Cancer 50, 199-209 (2005).
42. Maruyama, R., Sugio, K., Yoshino, I., Maehara, Y. & Gazdar, A.F. Hypermethylation of FHIT as a prognostic marker in nonsmall cell lung carcinoma. Cancer 100, 1472-1477 (2004).
43. Garnis, C., et al. High resolution analysis of non-small cell lung cancer cell lines by whole genome tiling path array CGH. Int J Cancer 118, 1556-1564 (2006).
44. Petersen, I., et al. [Comparative genomic hybridization of bronchial carcinomas and their metastases]. Verh Dtsch Ges Pathol 81, 297-305 (1997).
45. Petersen, I., et al. Patterns of chromosomal imbalances in adenocarcinoma and squamous cell carcinoma of the lung. Cancer Res 57, 2331-2335 (1997).
46. Yakut, T., et al. Assessment of molecular events in squamous and nonsquamous cell lung carcinoma. Lung Cancer 54, 293-301 (2006).
47. Lee, D.F., et al. KEAP1 E3 ligase-mediated downregulation of NF-kappaB signaling by targeting IKKbeta. Mol Cell 36, 131-140 (2009).
48. Sun, S., Schiller, J.H. & Gazdar, A.F. Lung cancer in never smokers--a different disease. Nat Rev Cancer 7, 778-790 (2007).
49. Gollob, J.A., Wilhelm, S., Carter, C. & Kelley, S.L. Role of Raf kinase in cancer: therapeutic potential of targeting the Raf/MEK/ERK signal transduction pathway. Semin Oncol 33, 392-406 (2006).
50. Guo, N.L., et al. Confirmation of gene expression-based prediction of survival in non-small cell lung cancer. Clin Cancer Res 14, 8213-8220 (2008).
51. Strahl, B.D. & Allis, C.D. The language of covalent histone modifications. Nature 403, 41-45 (2000).
52. Esteller, M. Epigenetics in cancer. N Engl J Med 358, 1148-1159 (2008).
53. Barlesi, F., et al. Global histone modifications predict prognosis of resected non small-cell lung cancer. J Clin Oncol 25, 4358-4364 (2007).
54. Esteller, M. Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet 8, 286-298 (2007).
55. Van Den Broeck, A., et al. Loss of histone H4K20 trimethylation occurs in preneoplasia and influences prognosis of non-small cell lung cancer. Clin Cancer Res 14, 7237-7245 (2008).
56. Lazarevich, N.L. & Fleishman, D.I. Tissue-specific transcription factors in progression of epithelial tumors. Biochemistry (Mosc) 73, 573-591 (2008).
57. Grigo, K., Wirsing, A., Lucas, B., Klein-Hitpass, L. & Ryffel, G.U. HNF4 alpha orchestrates a set of 14 genes to down-regulate cell proliferation in kidney cells. Biol Chem 389, 179-187 (2008).
58. Lazarevich, N.L., et al. Progression of HCC in mice is associated with a downregulation in the expression of hepatocyte nuclear factors. Hepatology 39, 1038-1047 (2004).
59. Lucas, B., et al. 
HNF4alpha reduces proliferation of kidney cells and affects genes deregulated in renal cell carcinoma. Oncogene 24, 6418-6431 (2005). Sel, S., Ebert, T., Ryffel, G.U. & Drewes, T. Human renal cell carcinogenesis is accompanied by a coordinate loss of the tissue specific transcription factors HNF4 alpha and HNF1 alpha. Cancer Lett 101, 205-210 (1996). Watt, A.J., Garrison, W.D. & Duncan, S.A. HNF4: a central regulator of hepatocyte differentiation and function. Hepatology 37, 1249-1253 (2003). Morin, R.D., et al. Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nat Genet 42, 181-185. 128  63. 64. 65. 66. 67.  68. 69. 70. 71. 72. 73. 74. 75. 76. 77. 78. 79. 80.  Hillemacher, T., et al. Global DNA methylation is influenced by smoking behaviour. Eur Neuropsychopharmacol 18, 295-298 (2008). Olaussen, K.A., Mountzios, G. & Soria, J.C. ERCC1 as a risk stratifier in platinum-based chemotherapy for nonsmall-cell lung cancer. Curr Opin Pulm Med 13, 284-289 (2007). Cheng, L., Spitz, M.R., Hong, W.K. & Wei, Q. Reduced expression levels of nucleotide excision repair genes in lung cancer: a case-control analysis. Carcinogenesis 21, 1527-1530 (2000). Herbst, R.S., Heymach, J.V. & Lippman, S.M. Lung cancer. N Engl J Med 359, 1367-1380 (2008). Vilmar, A. & Sorensen, J.B. Excision repair cross-complementation group 1 (ERCC1) in platinum-based treatment of non-small cell lung cancer with special emphasis on carboplatin: a review of current literature. Lung Cancer 64, 131-139 (2009). Felip, E. & Rosell, R. Testing for excision repair cross-complementing 1 in patients with non-small-cell lung cancer for chemotherapy response. Expert Rev Mol Diagn 7, 261-268 (2007). Olaussen, K.A., et al. DNA repair by ERCC1 in non-small-cell lung cancer and cisplatin-based adjuvant chemotherapy. N Engl J Med 355, 983-991 (2006). Kim, H., et al. 
Tumor-specific methylation in bronchial lavage for the early detection of non-small-cell lung cancer. J Clin Oncol 22, 2363-2370 (2004). Kim, J.S., et al. Aberrant methylation of the FHIT gene in chronic smokers with early stage squamous cell carcinoma of the lung. Carcinogenesis 25, 2165-2171 (2004). Chari, R., et al. Effect of active smoking on the human bronchial epithelium transcriptome. BMC Genomics 8, 297 (2007). Garnis, C., et al. Chromosome 5p aberrations are early events in lung cancer: implication of glial cell line-derived neurotrophic factor in disease progression. Oncogene 24, 4806-4812 (2005). Baldwin, C., Garnis, C., Zhang, L., Rosin, M.P. & Lam, W.L. Multiple microalterations detected at high frequency in oral cancer. Cancer Res 65, 75617567 (2005). Lockwood, W.W., Coe, B.P., Williams, A.C., MacAulay, C. & Lam, W.L. Whole genome tiling path array CGH analysis of segmental copy number alterations in cervical cancer cell lines. Int J Cancer 120, 436-443 (2007). Watson, S.K., Deleeuw, R.J., Ishkanian, A.S., Malloff, C.A. & Lam, W.L. Methods for high throughput validation of amplified fragment pools of BAC DNA for constructing high resolution CGH arrays. BMC Genomics 5, 6 (2004). Chi, B., DeLeeuw, R.J., Coe, B.P., MacAulay, C. & Lam, W.L. SeeGH - A software tool for visualization of whole genome array comparative genomic hybridization data. BMC Bioinformatics 5, 13 (2004). Chi, B., et al. MD-SeeGH: a platform for integrative analysis of multi-dimensional genomic data. BMC Bioinformatics 9, 243 (2008). Irizarry, R.A., et al. Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264 (2003). Bild, A.H., et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 439, 353-357 (2006). 129  5.  Conclusions 4  5.1.  Research summary  Lung cancer is far from being a health problem of the past. 
In fact, lung cancer is still the foremost cause of cancer-related deaths worldwide¹. Even in Canada, where the population at large is relatively well-informed about the dangers of tobacco smoke, lung cancer claimed over 20,000 lives in 2009, accounting for over one quarter of all cancer deaths². The extremely aggressive nature of the disease is made clear by the fact that only 15% of people diagnosed with lung cancer will be alive 5 years after diagnosis³. As sobering as this statistic may be, it is worth considering that lung cancer patients in North America enjoy the best prognosis of anywhere in the world. Even in the well-developed countries of Europe, the five-year survival rate is as low as 10%⁴. The situation is even more grim in the developing countries of the world where, in the last 20 years, the incidence of lung cancer has not hit a plateau or begun to decline, as it has in Canada, but has risen dramatically⁴. Puzzlingly, however, despite clear indicators that lung cancer is a world health problem of epic proportions, it remains an underfunded area of research⁵. Despite this, researchers have achieved significant success in beginning to identify the molecular mechanisms of non-small cell lung cancer development. The identification and characterization of the mutations, copy number alterations, and epigenetic aberrations that are causal to lung cancer development is the first step towards the creation of improved diagnostic, chemopreventive, and chemotherapeutic approaches.

⁴ Portions of the text from this chapter appear in abstracts prepared for chapters 2-4.

In this thesis we began by developing and evaluating a novel framework for lung cancer genome profiling. To do this, we coupled whole-genome DNA methylation and whole-genome copy number assays for the first time, and demonstrated the utility of this approach for identifying cancer-related genes in NSCLC cells.
Next, we used this approach to scan a locus previously associated with lung-cancer risk for novel lung tumor suppressor genes. We identified one such gene, EYA4, which we found to be involved in both sporadic and familial lung cancers. Lastly, we again utilized whole-genome DNA copy number and DNA methylation analyses to identify genes and pathways associated with phenotypic differences between the two main subtypes of NSCLC.

5.2. Development of a strategy and technology for two-hit detection of tumor genes

DNA methylation is integral to normal development and disease processes. However, the genomic distribution of methylated sequences (the methylome) is not yet fully understood. We developed a platform technology for rapid, high-throughput assessment of methylation status throughout the human genome. This was achieved by coupling a methylated DNA immunoprecipitation (MeDIP) method for isolating methylcytosine-rich fragments with array-based comparative genomic hybridization (array CGH). Using a combination of whole genome tiling path BAC arrays and CpG island microarrays, DNA methylation profiles are obtained simultaneously at both genome-wide and locus-specific levels. A comparison between male and female DNA using MeDIP-array CGH revealed unexpected hypomethylation of the inactive X-chromosome in gene-poor regions. Furthermore, comparisons between cancer and non-cancer cell types yielded differential methylation patterns that link genetic and epigenetic instability, offering a new approach to decipher misregulation in cancer. Finally, we provided new data showing epigenomic instability in lung cancer cells, with concurrent regions of genetic and epigenetic alterations harboring known oncogenes.

Indeed, our combination of multiple genomic dimensions to profile a single sample was a herald of things to come.
Numerous studies since have leveraged multiple types of data for a single sample to help separate and identify the genetic and epigenetic alterations that are most critical to a cancer cell⁶,⁷. In the years following this publication, however, potential pitfalls of the MeDIP technique have been uncovered. The bias of MeDIP towards CG-rich regions of the genome required the development of numerous bioinformatic approaches⁸,⁹ to correct for it. Additionally, there is some evidence that, despite the controls that are built into the assay, the copy number profile of a tumor sample may influence the DNA methylation profile obtained when using immunoprecipitation techniques¹⁰. Most significant from our perspective, however, was the comparative lack of resolution that MeDIP aCGH on a BAC array provides. While arrays with resolutions on the same scale as our BAC array have been used to identify novel and important cancer genes, they were primarily coupled with other techniques for discriminating methylated from unmethylated fragments (such as restriction enzyme digestion). While not nearly as comprehensive as MeDIP, these techniques are perhaps more suitable for the detection of small regions of the genome with methylation states that greatly differ between tumor and normal samples¹¹. Indeed, the use of MeDIP with high-resolution oligonucleotide arrays has yielded numerous interesting findings across many tumor types⁷,¹²⁻¹⁴. More specifically, however, these studies have confirmed our finding that DNA hypomethylation is found in regions of copy number gain⁷,¹². In addition, they have further highlighted the benefits of examining multiple dimensions in a single sample.

In addition to publications supporting our findings, we have generated additional data within our lab using a totally different, bisulfite-based DNA methylation assay, which also supports our previous conclusions.
For example, data from the Illumina HumanMethylation 27 chip for the lung adenocarcinoma cell line NCI-H1395 show significant hypomethylation of the 1q amplicon discussed in chapter 2 (Figure 5.1). What is not clear, however, is how the genomic structure or derivation of an amplicon impacts the DNA methylation status of that amplicon. The discovery that segmental duplications become hypomethylated may offer some insight¹². For example, it may be that amplicon structures such as homogeneously staining regions (HSRs) are more prone to hypomethylation than amplicons that are extrachromosomal in nature. Further experiments will be needed to answer these questions. This type of experiment can be undertaken without the resolution or sequence bias issues that have plagued array-based methylome research in the past, as high-throughput bisulfite sequencing is now feasible¹⁵. It would be possible to perform targeted high-coverage sequencing of specific genomic amplicons from a given cancer to determine their actual methylation level at single base resolution. By coupling this experiment with fluorescence in situ hybridization or spectral karyotyping, one would be able to draw numerous conclusions about the propensity (or lack thereof) for certain amplicon structures to become hypomethylated.

5.3. Integrated analysis of DNA methylation and DNA copy number identifies EYA4 as a novel lung tumor suppressor gene

Tumor suppressor genes exhibiting biallelic disruptions are likely causal to carcinogenesis. Despite identification of a critical TSG locus on chromosome 6q23-25 by the Genetic Epidemiology of Lung Cancer Consortium (GELCC), no single, causative gene has been defined. Using integrated approaches to scan the genomes and epigenomes of sporadic lung adenocarcinomas for two-hit targets, we identified frequent deletion and hypermethylation of EYA4, located within the 6q susceptibility locus.
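The two-hit scan described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this thesis: the per-tumor calls, the call encoding, and the placeholder gene names GENE_X and GENE_Y are all assumptions made for illustration, and real data would first need segmentation and thresholding of the array signals. The sketch only shows the core idea of flagging genes that are both deleted and hypermethylated in the same tumor, then ranking them by how often that happens.

```python
from collections import Counter

# Hypothetical per-tumor calls, invented purely for illustration.
# Each gene maps to (copy_number_call, methylation_call):
#   copy_number_call: -1 = loss, 0 = neutral, +1 = gain
#   methylation_call: +1 = hypermethylated, 0 = unchanged, -1 = hypomethylated
# GENE_X and GENE_Y are placeholder names, not genes reported in the thesis.
TUMOR_CALLS = {
    "tumor_1": {"EYA4": (-1, +1), "GENE_X": (-1, 0), "GENE_Y": (0, +1)},
    "tumor_2": {"EYA4": (-1, +1), "GENE_X": (0, 0), "GENE_Y": (-1, +1)},
    "tumor_3": {"EYA4": (-1, +1), "GENE_X": (-1, 0), "GENE_Y": (0, 0)},
    "tumor_4": {"EYA4": (0, +1), "GENE_X": (0, -1), "GENE_Y": (0, +1)},
}


def two_hit_genes(calls):
    """Return the genes that are both deleted and hypermethylated in one tumor."""
    return {
        gene
        for gene, (copy_number, methylation) in calls.items()
        if copy_number < 0 and methylation > 0
    }


def two_hit_frequency(tumors):
    """Return, for each gene with at least one two-hit tumor, the fraction of tumors affected."""
    counts = Counter()
    for calls in tumors.values():
        counts.update(two_hit_genes(calls))
    return {gene: n / len(tumors) for gene, n in counts.items()}


if __name__ == "__main__":
    # Rank candidate genes from most to least frequently two-hit.
    ranked = sorted(two_hit_frequency(TUMOR_CALLS).items(), key=lambda item: -item[1])
    for gene, fraction in ranked:
        print(f"{gene}: two-hit in {fraction:.0%} of tumors")
```

Requiring both hits in the same tumor, rather than intersecting cohort-level gene lists, is a deliberate choice in this sketch: it keeps the focus on biallelic disruption within individual samples.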
Using gene-expression microarrays and quantitative PCR for our tumor set and external validation sets, we found that EYA4 expression is drastically reduced in both major non-small cell lung cancer subtypes. Additionally, using a sequencing-based approach, we identified underexpression of EYA4 in tumors as well as preneoplastic lesions, indicating that inactivation of EYA4 is likely to be involved in disease initiation and progression. DNA methylation in the CpG island upstream of EYA4 is strongly associated with transcriptional repression, as demonstrated by pharmacologic inhibition of DNA methyltransferases through treatment of methylated cells with 5-azacytidine. Moreover, examination of familial lung cancers identified EYA4 haplotypes that are significantly associated with increased lung cancer risk. Our functional experiments have validated a pro-apoptotic tumor suppressor role for this gene. Furthermore, we demonstrate that EYA4 expression is more strongly associated with clinical outcome than any other somatically altered gene in the 6q susceptibility region. The involvement of EYA4 in regulating apoptosis is underscored by the other main role of the gene in regulating the innate immune system. Interestingly, recent findings by the GELCC show that at the 6q locus, risk-associated alleles greatly increase lung cancer risk in even light or never smokers. Taken together, these findings suggest a main role for EYA4 in modulating lung cancer risk. Without appropriately functioning EYA4, potentially preneoplastic cells are able to avoid apoptosis and the innate immune system is unable to detect and remove them. This work suggests that EYA4 is a gene within the lung cancer susceptibility locus critical to NSCLC development.

5.4. The role of DNA copy number and DNA methylation alterations in NSCLC subtype phenotypes

Next we investigated the role that DNA copy number and DNA methylation alterations play in determining phenotypic differences in NSCLC subtypes.
DNA copy number alteration frequencies were compared for a large number of adenocarcinoma (AC) and squamous cell carcinoma (SqCC) tumors. This identified many regions of recurrent copy number alteration occurring at different frequencies in AC and SqCC tumors. As these two subtypes do not have inherently different levels of genomic instability, any observed differences between the two are likely due to selection for advantageous loci. We then linked gene expression alterations in the tumors to these copy number differences, identifying those aberrantly expressed genes whose expression was linked to an increase or decrease in gene dosage. The functions of these subtype-specific genes were then evaluated using gene-network analysis tools. This revealed that not only are the chromosomal loci and the genes within them different between AC and SqCC, but the functions of the genes are largely different as well, perhaps reflecting reliance on dissimilar oncogenic pathways for tumor development.

These differentially regulated genes are potentially clinically relevant as well. For example, the observation that MAPK1 is gained and overexpressed in SqCC while being lost and underexpressed in AC has numerous implications. The development of drugs that target the MAPK pathway is one such implication, and it may prove necessary to evaluate AC and SqCC differentially based on the deregulation of this gene. Additionally, there is prognostic relevance in the differentially regulated genes that underscores their differing disruption. High expression of MAPK1 is associated with poor survival in SqCC, while low expression of the same gene is associated with poor survival in AC tumors. Differential subtype-specific expression of ERCC1 is another example of why it will become critical to consider AC and SqCC tumors separately when designing new drug regimens. It has been shown that ERCC1 levels are significantly associated with response to DNA-damaging chemotherapeutics.
Indeed, based on our data it might be reasonable to expect differential response to platinum-based therapeutic modalities in AC and SqCC tumors.

In addition to the many regions of differing DNA copy number alteration, we identified significant differences in DNA methylation profiles between the two subtypes. This mirrors, but greatly expands on, previous findings¹⁶. To begin with, we detected markedly different overall levels of DNA methylation between AC and SqCC tumors, as well as their associated normal tissues. By examining the differential methylation profiles of the AC and SqCC groups, we identified many genes that are differentially methylated between the two groups. Coupling these epigenomic data with gene expression data yielded genes that are differentially regulated by DNA methylation in the two subtypes. Interestingly, functional analysis of these genes revealed that in SqCC the epigenomically regulated genes coincided very well with the genomically regulated genes, while in AC no such link was found. Also of note is our discovery that in SqCC, numerous genes differentially regulated by DNA methylation are members of the SCLC signaling pathway. This may point to a shared environment dictating epigenetic patterning, as both cancers occur centrally in the larger airways. We extended the relevance of this observation by demonstrating that the polycomb group member EZH2, a target of the SqCC-specific hypomethylated and overexpressed gene E2F1, is also differentially expressed between AC and SqCC subtypes. This is congruent with the findings of other groups that E2F1 drives EZH2 expression, and that they are both elevated in SCLC. As with MAPK1 and ERCC1, this could have potential therapeutic relevance, as treatments targeting EZH2 are being developed¹⁷. Lastly, we sought to determine what overlap there was between genetically and epigenetically regulated genes, if any.
In AC tumors, we found no genes shared between our two analyses; in SqCC, however, we found 32 genes that are selected for by both DNA methylation and copy number alterations, underscoring their potential importance as key genes differentiating between SqCC and AC tumors. Of interest is that the well-known tumor suppressor from chromosome 3p, FHIT, is one of these genes. We find that it is lost and hypermethylated preferentially in SqCC, in agreement with previous studies¹⁸⁻²⁰. From a clinical perspective this is relevant for two reasons: first, because poor patient survival has been correlated with FHIT methylation²¹, and second, because it has been investigated as a biomarker for centrally occurring lung cancers²².

Ultimately, we are able to conclude that AC and SqCC possess numerous differences at the copy number as well as the DNA methylation level which correspond with their differing phenotypes. This study also highlights the necessity of considering the molecular origins underlying lung cancer histologies when clinical management strategies are designed.

5.5. Significance of work

In this thesis, we have developed tools for parallel analysis of genetic and epigenetic aberrations in tumor genomes. By doing this, we have gained numerous biological insights into the molecular basis of lung cancer, the world’s leading cause of cancer death. In particular, we have demonstrated the utility of integrating multiple levels of DNA analysis for identifying candidate genes that drive lung cancer phenotypes. One such candidate is a novel tumor suppressor gene that is not only inactivated by multiple mechanisms in sporadic lung cancers, but also has genotypic variants that are associated with familial lung cancer risk. We also identified molecular alterations specific to each of the two major subtypes of NSCLC, squamous cell carcinoma and adenocarcinoma.
This adds to the growing body of data suggesting that these subgroups should be considered as distinct diseases in the selection of therapy. Ultimately, the approaches and findings detailed in this thesis could have a profound impact on risk assessment and management strategies; critical genes identified by integrated analysis of lung tumor genomes and methylomes could serve as diagnostic or predictive biomarkers or as druggable targets for next-generation therapies. It is clear that the path to improved lung cancer survival rates rests, at least in part, with even deeper investigations of lung tumor genomes.

5.6. Future directions

In the work described within this thesis, the most striking finding and the most immediately evident follow-up questions are derived from chapter 3. Our findings indicate that EYA4 is a critical gene in NSCLC development and risk. Experiments to further characterize the function of the gene will be important, and may include investigation into the mechanism of apoptosis avoidance. Additionally, it will be useful to determine whether EYA4 is ever somatically mutated within cancers. It will also be of interest to discern how EYA4 genotype modulates lung cancer risk. Our discovery that EYA4 is hypermethylated in histologically normal bronchial epithelium of patients with previous NSCLC tumors is also noteworthy. DNA hypermethylation is of great interest to the clinical community as a potential target for early detection and risk assessment. It will therefore be important to test whether EYA4 methylation levels in bronchial epithelia, or perhaps even peripheral blood, could be used to detect or predict lung cancer.

Figure 5.1 - Hypomethylation of an amplicon in lung adenocarcinoma cells
(a) This panel demonstrates the copy number for chromosome 1 in the NCI-H1395 lung adenocarcinoma cell line as determined by an Affymetrix 500k SNP array.
The amplicon shown in Figure 2.3 is shown clearly, and has an estimated copy number of between 8 and 10.
(b) DNA methylation levels were determined for NCI-H1395 using the Illumina HumanMethylation 27 bead chip array. Probes not located within CpG islands were assessed and used for this figure, and they were grouped by their location either within the main amplicon or not. As shown in the figure, probes located within the amplicon (n=119) and not within the amplicon (n=795) are significantly different (p<0.0001; Mann-Whitney U test). This shows that the amplicon is hypomethylated relative to the rest of the chromosome, confirming our results from chapter 2.

5.7. References

1. Horner, M.J., et al. SEER Cancer Statistics Review, 1975-2006. Vol. 2009 (National Cancer Institute, Bethesda, MD, 2009).
2. Society, C.C. Canadian Cancer Society's Committee: Canadian Cancer Statistics 2009. (Toronto, 2009).
3. Travis, W.D., Travis, L.B. & Devesa, S.S. Lung cancer. Cancer 75, 191-202 (1995).
4. Behera, D. Managing lung cancer in developing countries: difficulties and solutions. Indian J Chest Dis Allied Sci 48, 243-244 (2006).
5. Gritz, E.R., Sarna, L., Dresler, C. & Healton, C.G. Building a united front: aligning the agendas for tobacco control, lung cancer research, and policy. Cancer Epidemiol Biomarkers Prev 16, 859-863 (2007).
6. Veeriah, S., Morris, L.G., Solit, D. & Chan, T.A. The familial Parkinson disease gene PARK2 is a multisite tumor suppressor on chromosome 6q25.2-27 that regulates cyclin E. Cell Cycle 9.
7. Sadikovic, B., et al. Identification of interactive networks of gene expression associated with osteosarcoma oncogenesis by integrated molecular profiling. Hum Mol Genet 18, 1962-1975 (2009).
8. Down, T.A., et al. A Bayesian deconvolution strategy for immunoprecipitation-based DNA methylome analysis. Nat Biotechnol 26, 779-785 (2008).
9. Pelizzola, M., et al. MEDME: an experimental and analytical methodology for the estimation of DNA methylation levels based on microarray derived MeDIP-enrichment. Genome Res 18, 1652-1659 (2008).
10. Vega, V.B., Cheung, E., Palanisamy, N. & Sung, W.K. Inherent signals in sequencing-based Chromatin-ImmunoPrecipitation control libraries. PLoS One 4, e5241 (2009).
11. Ching, T.T., et al. Epigenome analyses using BAC microarrays identify evolutionary conservation of tissue-specific methylation of SHANK3. Nat Genet 37, 645-651 (2005).
12. Rauch, T.A., et al. High-resolution mapping of DNA hypermethylation and hypomethylation in lung cancer. Proc Natl Acad Sci U S A 105, 252-257 (2008).
13. Cheung, H.H., et al. Genome-wide DNA methylation profiling reveals novel epigenetically regulated genes and non-coding RNAs in human testicular cancer. Br J Cancer 102, 419-427.
14. Omura, N., et al. Genome-wide profiling of methylated promoters in pancreatic adenocarcinoma. Cancer Biol Ther 7, 1146-1156 (2008).
15. Lister, R., et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315-322 (2009).
16. Toyooka, S., et al. Smoke exposure, histologic type and geography-related differences in the methylation profiles of non-small cell lung cancer. Int J Cancer 103, 153-160 (2003).
17. Fiskus, W., et al. Combined epigenetic therapy with the histone methyltransferase EZH2 inhibitor 3-deazaneplanocin A and the histone deacetylase inhibitor panobinostat against human AML cells. Blood 114, 2733-2743 (2009).
18. Geradts, J., Fong, K.M., Zimmerman, P.V. & Minna, J.D. Loss of Fhit expression in non-small-cell lung cancer: correlation with molecular genetic abnormalities and clinicopathological features. Br J Cancer 82, 1191-1197 (2000).
19. Tomizawa, Y., et al. Clinicopathological significance of Fhit protein expression in stage I non-small cell lung carcinoma. Cancer Res 58, 5478-5483 (1998).
20. Sozzi, G., et al. Loss of FHIT function in lung cancer and preinvasive bronchial lesions. Cancer Res 58, 5032-5037 (1998).
21. Maruyama, R., Sugio, K., Yoshino, I., Maehara, Y. & Gazdar, A.F. Hypermethylation of FHIT as a prognostic marker in nonsmall cell lung carcinoma. Cancer 100, 1472-1477 (2004).
22. de Fraipont, F., et al. Promoter methylation of genes in bronchial lavages: a marker for early diagnosis of primary and relapsing non-small cell lung cancer? Lung Cancer 50, 199-209 (2005).

Appendix

Appendix A – UBC Research Ethics Board Certificates of Approval

A certificate citing the approval of the University of British Columbia Research Ethics Board for work conducted in the course of this thesis is attached on the next page.

University of British Columbia - British Columbia Cancer Agency Research Ethics Board (UBC BCCA REB)

UBC BCCA Research Ethics Board
Fairmont Medical Building (6th Floor)
614 - 750 West Broadway
Vancouver, BC V5Z 1H5
Tel: (604) 877-6284
Fax: (604) 708-2132
E-mail: reb@bccancer.bc.ca
Website: http://www.bccancer.bc.ca > Research Ethics
RISe: http://rise.ubc.ca

Certificate of Expedited Approval: Annual Renewal

PRINCIPAL INVESTIGATOR: Wan Lam
INSTITUTION / DEPARTMENT: BCCA/BCCA/Cancer Genetics & Development (BCCA)
REB NUMBER: H08-01392
INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT:
Institution: BC Cancer Agency
Site: Vancouver BCCA
Other locations where the research will be conducted: N/A
PRINCIPAL INVESTIGATOR FOR EACH ADDITIONAL PARTICIPATING BCCA CENTRE:
Vancouver: Wan Lam
Vancouver Island: N/A
Fraser Valley: N/A
Southern Interior: N/A
Abbotsford Centre: N/A
SPONSORING AGENCIES AND COORDINATING GROUPS: Canadian Institutes of Health Research (CIHR)
PROJECT TITLE: Development of a multi-spectral platform for integrated analysis of clinical and research samples.
APPROVAL DATE: August 4, 2009
EXPIRY DATE OF THIS APPROVAL: August 4, 2010
PAA#: H08-01392-A003

CERTIFICATION:
1. The membership of the UBC BCCA REB complies with the membership requirements for research ethics boards defined in Division 5 of the Food and Drug Regulations of Canada.
2. The UBC BCCA REB carries out its functions in a manner fully consistent with Good Clinical Practices.
3. The UBC BCCA REB has reviewed and approved the research project named on this Certificate of Approval including any associated consent form and taken the action noted above. This research project is to be conducted by the provincial investigator named above. This review and the associated minutes of the UBC BCCA REB have been documented electronically and in writing.

The UBC BCCA Research Ethics Board has reviewed the documentation for the above named project. The research study as presented in documentation, was found to be acceptable on ethical grounds for research involving human subjects and was approved for renewal by the UBC BCCA REB.

UBC BCCA Ethics Board Approval of the above has been verified by one of the following:
Dr. George Browman, Chair
Dr. Lynne Nakashima, Second Vice-Chair

If you have any questions, please call:
Bonnie Shields, Manager, BCCA Research Ethics Board: 604-877-6284 or e-mail: reb@bccancer.bc.ca
Dr. George Browman, Chair: 604-877-6284 or e-mail: gbrowman@bccancer.bc.ca
Dr. Lynne Nakashima, Second Vice-Chair: 604-707-5989 or e-mail: lnakas@bccancer.bc.ca
