UBC Theses and Dissertations


Multi-'omics comparison of lung cancers from current and never smokers Thu, Kelsie L. 2013

MULTI-'OMICS COMPARISON OF LUNG CANCERS FROM CURRENT AND NEVER SMOKERS

by

Kelsie L. Thu

B.Sc., Simon Fraser University, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Interdisciplinary Oncology)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

September 2013

© Kelsie L. Thu, 2013

Abstract

Lung cancer is the leading cause of cancer death worldwide and a better understanding of the molecular alterations driving tumour biology is required to improve patient prognosis. This is especially true for never smokers (NS), who account for up to 25% of lung cancer cases globally. As the population of current smokers (CS) decreases due to smoking cessation and prevention initiatives, in the coming decades NS and former smokers (FS) will comprise a greater proportion of cases. We used an integrative 'omics approach to study lung cancer genomes of CS and NS to elucidate differential mechanisms and patterns of gene and pathway disruption likely to contribute to lung tumourigenesis. Lung cancers in CS and NS exhibit different clinical features and are known to preferentially select specific mutations (e.g. EGFR, KRAS, EML4-ALK), suggesting they are distinct diseases. Thus, we hypothesize that lung tumours of CS and NS exhibit disparate patterns of molecular alterations on a genome wide scale, reflecting the assumption that they develop through the differential selection of genes and pathways.

A large scale, multi-dimensional, genomics study has yet to be performed and holds great potential to reveal novel insights into the mechanisms of lung tumourigenesis in CS and NS. Therefore, we performed DNA copy number, methylation, gene expression and microRNA expression profiling on a panel of lung adenocarcinoma tumours from CS and NS in an attempt to characterize the genomic and epigenomic landscapes of these tumours.
In addition to identifying commonly disrupted genes and pathways, our integrative genomic analysis revealed numerous differences between CS and NS lung tumours including: differing extents of copy number and methylation alterations, different patterns of miRNA disruption, and preferential disruption of genes and cellular pathways. Importantly, some of the prominently disrupted genes that drive deregulation of tumour promoting pathways may represent novel therapeutic targets and intervention points.

Collectively, this work provides further evidence that lung tumours of CS and NS develop through different molecular alterations, which suggests patients will benefit from specific management strategies tailored to their smoking status.

Preface

The research in this thesis was conducted with ethics approval from the UBC Research Ethics Board, Certificate Numbers: EDRN H09-00008 and CCSRI H09-00934.

Portions of Chapters 1, 2, and 3 have been published as: Thu KL, Vucic EA, Chari R, Zhang W, Lockwood WW, English JC, Fu R, Wang P, Feng Z, MacAulay CE, Gazdar AF, Lam S, Lam WL (2012). Lung adenocarcinomas of never smokers and smokers are genomically distinct. PLoS ONE. 7(3):e33003.

Chapters 3, 5, and 6 were co-authored as manuscripts for publication. The following author lists apply for these chapters:

A version of Chapter 3 has been published. [Thu, K.L.], Vucic, E.A., Chari, R., Zhang, W., Lockwood, W.W., English, J.C., Fu, R., Wang, P., Feng, Z., MacAulay, C.E., Gazdar, A.F., Lam, S., Lam, W.L. (2012) Lung adenocarcinomas of never smokers and smokers are genomically distinct. PLoS ONE. 7(3):e33003. I am first author of this work and I wrote the manuscript, conducted data analysis and interpreted the results.

A version of Chapter 5 is being prepared for publication. Vucic, E.A., [Thu K.L.], Pikor, L.A., Ramnarine, V.R., Enfield, K.S., MacAulay, C.E., Lam, S., Lam, W.L. (2013). Contribution of smoking to miRNA deregulation in lung adenocarcinoma.
I am co-first author of this manuscript. I helped design the study, perform data analysis and interpretation, and I co-wrote the manuscript.

A version of Chapter 6 has been submitted for publication. Mosslemi, M., [Thu, K.L.], Vucic, E.A., Pikor, L.A., Ng R.T., MacAulay, C.E., Lam, W.L. (2013). Development of a Multi-dimensional Integrative Tumor gene Ranking Algorithm (MITRA) for the identification of candidate genes in cancer. I am co-first author of this manuscript. I helped to develop the idea for this project, I performed the data analysis and interpretation, and I wrote the manuscript.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Abbreviations
Acknowledgements
Dedication
1    Chapter: Introduction
  1.1 Background on lung cancer
  1.2 Lung cancer etiology
  1.3 Molecular pathology of lung cancer
  1.4 Lung cancer in the context of smoking
  1.5 Integrative genomic analyses and the systems approach
  1.6 Thesis theme and rationale
  1.7 Research question, objectives and hypotheses
  1.8 Specific aims and thesis outline
2    Chapter: Sample collection and summary of molecular profiling
  2.1 Patients and tissue accrual
  2.2 Nucleic acid extraction
  2.3 Molecular profiling summary
    2.3.1 DNA copy number arrays
    2.3.2 DNA methylation arrays
    2.3.3 mRNA expression arrays
    2.3.4 miRNA sequencing
    2.3.5 Definition of gene disruption
3    Chapter: Copy number differences in lung AC of CS and NS
  3.1 Introduction
    3.1.1 Effects of copy number alterations on gene expression and roles in tumourigenesis
    3.1.2 Rationale for investigating global copy number differences in CS and NS lung AC
  3.2 Methods
    3.2.1 EGFR and KRAS mutation screening
    3.2.2 Single nucleotide polymorphism (SNP) arrays
    3.2.3 Proportion of genome altered
    3.2.4 Differentially altered regions
    3.2.5 Identification of most prominent CNAs using GISTIC
    3.2.6 Validation of CNAs in external cohorts
  3.3 Results
    3.3.1 Patient demographics
    3.3.2 EGFR and KRAS mutations segregate with smoking history
    3.3.3 Genomic landscape of copy number alterations in CS and NS
    3.3.4 Smoking is the clinical variable most strongly associated with observed differences
    3.3.5 Genomic alterations common to CS and NS
    3.3.6 High level DNA alteration patterns in CS and NS
    3.3.7 Differentially altered regions in CS and NS
  3.4 Discussion
    3.4.1 Summary of findings
    3.4.2 Comparison of global genomic differences observed with findings in other studies
    3.4.3 Concordant and disparate genomic alterations identified in CS and NS and comparison with other studies
    3.4.4 Implications of genomic differences observed between CS and NS
4    Chapter: DNA methylation differences in lung AC of CS and NS
  4.1 Introduction
    4.1.1 DNA methylation and regulation of gene expression
    4.1.2 Aberrant DNA methylation in cancer and clinical significance
    4.1.3 Rationale for assessing DNA methylation in lung cancer
  4.2 Methods
    4.2.1 DNA methylation arrays
    4.2.2 Aberrantly methylated genes and differentially methylated regions
    4.2.3 Validation of DNA methylation changes in external cohorts
  4.3 Results
    4.3.1 Genome wide comparison of aberrant DNA methylation in CS and NS
    4.3.2 Smoking is the clinical variable most strongly associated with the extent of aberrant methylation in lung AC genomes
    4.3.3 Prominently methylated regions in lung AC and differentially methylated regions in CS and NS epigenomes
    4.3.4 Aberrantly methylated genes in lung AC of CS and NS
    4.3.5 Validation of observed differences in external cohorts
  4.4 Discussion
    4.4.1 Summary of findings
    4.4.2 Comparison of findings to literature on DNA methylation differences in CS and NS
    4.4.3 Potential roles of aberrant methylation in lung tumourigenesis
    4.4.4 Potential reasons for lack of validation of findings in external cohorts
    4.4.5 Limitations and implications of findings regarding CS and NS lung AC DNA methylation patterns
5    Chapter: MicroRNA expression in the context of smoking and lung AC
  5.1 Introduction
    5.1.1 MiRNAs deregulation in cancer
    5.1.2 Effects of smoking on miRNA expression
    5.1.3 Rationale for assessing miRNA deregulation in lung AC of CS and NS
  5.2 Methods
    5.2.1 MicroRNA sequencing
    5.2.2 Clustering of miRNA expression profiles
    5.2.3 miRNA expression patterns in non-malignant and malignant lung tissues
    5.2.4 miRNA target gene analysis
    5.2.5 Validation of miRNA expression changes in the TCGA lung AC dataset
  5.3 Results
    5.3.1 Clustering of miRNA expression profiles
    5.3.2 miRNAs are differentially expressed between non-malignant lung tissues of CS and NS with lung AC
    5.3.3 miRNA are reversibly or irreversibly expressed in non-malignant tissues of individuals with lung AC
    5.3.4 CS, FS, and NS lung AC tumours exhibit aberrant expression of common and different miRNAs
    5.3.5 Potential biological roles of candidate miRNA specific to CS and NS
  5.4 Discussion
    5.4.1 Summary of findings
    5.4.2 miRNA expression patterns are influenced by smoking
    5.5.1 Modulation of miRNA expression in non-malignant lung tissues in response to smoking and further expression deregulation in tumour tissues
    5.5.2 miRNA are commonly and differentially deregulated in lung AC smoking groups
    5.5.3 Implications of deregulated miRNAs identified
6    Chapter: Development of an integrative genomics analysis strategy and application to lung AC of CS and NS
  6.1 Introduction
    6.1.1 Multi-dimensional 'omics profiling of cancer and the need for novel integrative strategies for analyzing individual tumours
    6.1.2 Rationale for developing an algorithm to prioritize candidate genes discovered through integrative genomic analysis
    6.1.3 Potential for discovery of novel molecular mechanisms in CS and NS lung AC
  6.2 Methods
    6.2.1 MITRA: principles and scheme
    6.2.2 Scoring alterations in each data dimension
    6.2.3 Weighting of DNA alterations with concurrent expression changes
    6.2.4 Integration of scores across multiple dimensions
    6.2.5 Application of MITRA to breast cancer
    6.2.6 Assessing the robustness of the MITRA algorithm
    6.2.7 Application of MITRA to lung AC from CS and NS
    6.2.8 Differentially disrupted genes and pathway analyses
    6.2.9 Validation of differentially disrupted genes in external cohorts
  6.3 Results
    6.3.1 MITRA identifies known driver genes in breast cancer
    6.3.2 MITRA is robust to modification of scoring bins
    6.3.3 Application of MITRA to individual lung AC tumours
    6.3.4 Genes differentially disrupted in CS and NS
    6.3.5 Pathways most prominently disrupted in lung AC
    6.3.6 Pathways differentially disrupted in CS and NS lung AC
    6.3.7 Dissection of frequently deregulated pathways
    6.3.8 Benefit of the multi-dimensional, individual tumour approach
  6.4 Discussion
    6.4.1 Summary of findings
    6.4.2 Utility of MITRA and potential for adaptation and improvement
    6.4.3 Disrupted genes and pathways in the context of lung tumour biology
    6.4.4 Challenge of validation in public datasets
    6.4.5 Potential clinical implications of tumour system based 'omics analyses
7    Chapter: Conclusions
  7.1 Summary of study and findings
  7.2 Conclusions regarding the study hypotheses
  7.3 Strengths and limitations of this study
  7.4 Overall significance and clinical implications of research findings
  7.5 Future research directions
Bibliography
Appendix

List of Tables

Table 2.1 Clinical features of lung AC samples in this study
Table 2.2 Stage breakdown of lung AC samples in this study
Table 2.3 Summary of datasets analyzed in this thesis
Table 3.1 Summary of PGA in lung AC of CS and NS
Table 3.2 MANOVA results for associations between PGA and clinical variables
Table 3.3 High level CNAs common to CS and NS lung AC identified by GISTIC
Table 3.4 Differentially altered regions common in the BCCA and MSKCC cohorts
Table 3.5 Differentially altered regions common to the BCCA, MSKCC and TSP cohorts
Table 4.1 Summary of aberrant DNA methylation in lung AC of CS and NS
Table 4.2 MANOVA results for the association of aberrant DNA methylation with clinical and genetic variables for the BCCA cohort
Table 4.3 Most prominent aberrantly methylated regions in lung AC
Table 4.4 Differentially methylated genes validated in the TCGA lung AC cohort
Table 5.1 MANOVA results for the effects of clinical variables on miRNA expression clustering
Table 5.2 miRNAs differentially deregulated in non-malignant lung tissue of CS and NS with lung AC
Table 5.3 Reversible and irreversible miRNAs identified in non-malignant lung tissues of lung AC patients
Table 5.4 miRNAs differentially deregulated in lung AC tissues of CS, FS, and NS
Table 6.1 'Omics platforms and data input formats compatible with MITRA
Table 6.2 Data input and calculated scores for the TCGA breast tumour, A0B7
Table 6.3 Summary of datasets used to develop and apply MITRA in Chapter 6.
Table 6.4 Summary of scores generated by MITRA for TCGA breast tumours
Table 6.5 Comparison of MITRA results for original and modified scoring bins
Table 6.6 Most frequent genes disrupted in lung AC common to both smoking groups
Table 6.7 Differentially disrupted genes validated in external cohorts
Table 6.8 Cellular pathways frequently disrupted in both CS and NS lung AC
Table 6.9 Pathways differentially disrupted in CS and NS lung AC

List of Figures

Figure 3.1 EGFR and KRAS mutation frequencies in the BCCA and MSKCC lung AC cohorts.
Figure 3.2 Genomic landscape of lung AC from CS and NS
Figure 3.3 PGA in the BCCA and MSKCC lung AC cohorts.
Figure 3.4 Genes commonly disrupted in CS and NS
Figure 3.5 MCRs of overlap for the six regions common to all three datasets
Figure 4.1 DNA methylation landscape of lung AC from CS and NS
Figure 4.2 Quantification of aberrant DNA methylation in CS and NS lung AC
Figure 5.1 Hierarchical clustering of miRNA expression in tumour and adjacent non-malignant lung tissues and distribution of CS, FS, and NS in miRNA expression clusters
Figure 5.2 miRNAs whose expression is reversible or irreversible in non-malignant lung tissues of FS with lung AC.
Figure 5.3 Venn diagram depicting miRNAs commonly and differentially expressed in CS, FS and NS lung AC tissues
Figure 5.4 Select pathways commonly and differentially affected by miRNA identified as smoking specific
Figure 6.1 Benefits of individual tumour analysis and integrative, multi-'omics tumour profiling
Figure 6.2 Flowchart depicting the MITRA scheme
Figure 6.3 Range in MITRA scores generated for 64 lung AC tumours
Figure 6.4 Venn diagram illustrating the overlap in highly disrupted genes in lung AC of CS and NS
Figure 6.5 Venn diagram illustrating the overlap in highly disrupted pathways in lung AC of CS and NS
Figure 6.6 Wnt/β-catenin signaling can be disrupted through distinct mechanisms in individual tumours

List of Abbreviations

AC      adenocarcinoma
aCGH    array comparative genomic hybridization
BCCA    British Columbia Cancer Agency
BH      Benjamini Hochberg
CNA     copy number alteration
CS      current smoker
CSN     current smoker non-malignant tissue
CST     current smoker tumour tissue
DE      differentially expressed
DMG     differentially methylated gene
DMR     differentially methylated region
ES      ever smoker
FS      former smoker
FSN     former smoker non-malignant tissue
FST     former smoker tumour tissue
GEO     Gene Expression Omnibus
LCC     large cell carcinoma
MANOVA  multivariate analysis of variance
MCR     minimal common region
miRNA   microRNA
MITRA   multi-dimensional integrative tumour gene ranking algorithm
MSKCC   Memorial Sloan Kettering Cancer Centre
mtDNA   mitochondrial DNA
NS      never smoker
NSCLC   non-small cell lung cancer
NSN     never smoker non-malignant tissue
NST     never smoker tumour tissue
PGA     proportion of genome altered
SCC     squamous cell carcinoma
SCLC    small cell lung cancer
TCGA    The Cancer Genome Atlas
TSP     tumour sequencing project

Acknowledgements

I would like to thank members of the Wan Lam Lab, both past and present, who contributed to this work and provided useful insight and discussion regarding this project.
I would also like to acknowledge my collaborators and the grant and scholarship funding that supported the research in this thesis. This work was supported by the following granting agencies: Canadian Institutes of Health Research, Canadian Cancer Society Research Institute, National Cancer Institute Early Detection Research Network, and the Canary Foundation. The following scholarships supported this work: Interdisciplinary Oncology Program Training Incentive Award, University of British Columbia Graduate Entrance Scholarship, Canadian Institutes of Health Research Frederick Banting and Charles Best Canada Graduate Scholarship Master's Award, University of British Columbia Four Year Doctoral Fellowship, and the Vanier Canada Graduate Scholarship.

Lastly, I would like to thank Dr. Wan Lam and my supervisory committee members: Drs Stephen Lam, Calum MacAulay and Cathie Garnis, for providing excellent guidance and mentorship and my family and friends for their continuous support.

Dedication

This thesis is dedicated to my family and friends.

1    Chapter: Introduction

1.1 Background on lung cancer

Lung cancer is the number one cause of cancer death worldwide 1. Despite advances in the last decade, the 5-year survival rate remains a dismal 16% 2. The poor prognosis for lung cancer is mainly attributable to a lack of early detection methods which leads to late stage diagnosis, and inability of current treatment modalities to cure patients of disease. Lung cancer is a heterogeneous disease that is subdivided into two major histological types: small cell carcinoma (SCLC) and non-small cell carcinoma (NSCLC). NSCLC comprises 85% of lung cancer cases and is further subdivided into adenocarcinoma (AC), squamous cell carcinoma (SCC), and large cell carcinoma (LCC) 3. Lung AC has become the most prevalent type, accounting for over 50% of all cases 4.
Even within these types, immense histological and biological heterogeneity exists, complicating the study of lung cancer and emphasizing the need to treat clinical subtypes independently 5.

In the past decade, significant advances in our understanding of the molecular pathology of lung cancer have been made and translated into clinical practice. Genomic profiling studies have revealed clinically relevant molecular subtypes of NSCLC that are characterized by specific driver mutations which can be targeted with specific therapies to prolong patient survival 4,6. For example, EGFR mutations are prominent in lung AC of never smokers (NS) and can be targeted by inhibitors that block EGFR's tyrosine kinase activity 7. Similarly, identification of EML4-ALK gene fusions led to the rapid use of ALK inhibitors for patients harbouring these alterations 8. Most recently, comprehensive genomic profiling has identified potential candidates for targeted therapies in SCC and SCLC, and development of inhibitors to target these alterations is underway 9,10. A valuable lesson from these discoveries is the importance of patient selection for targeted therapies, consistent with the principle of personalized medicine 11,12. Given current technologies, it is feasible to screen individual tumours to identify key driver genes to inform therapeutic decisions on a patient-by-patient basis. This concept has already been implemented in North America, for example, at the Memorial Sloan Kettering Cancer Centre (MSKCC) by way of the LC-MAP project, where tumours are routinely profiled to identify mutations for which targeted therapies are available 13.

It is clear that improved technologies have enabled discovery of novel targetable alterations in lung cancer, generating optimism for improving patient outcome.
Development of novel detection and therapeutic strategies largely depends on further advancing our current understanding of lung tumourigenesis, especially through the elucidation of molecular mechanisms driving lung tumour initiation and progression 2. Although great strides have been made, many of the molecular mechanisms identified occur in a small percentage of patients and the mechanisms underlying a large fraction of lung cancer cases remain to be elucidated.

1.2 Lung cancer etiology

Lung cancer can be caused by a number of carcinogens, mostly arising from environmental or occupational exposures. Tobacco smoking is the best-known cause of lung cancer and is estimated to cause up to 90% of lung cancer cases in Western countries; however, only about 10-20% of smokers develop lung cancer 14,15. Cigarettes contain over 60 known carcinogens which cause DNA damage and activation of oncogenic pathways that promote lung tumourigenesis 14. Second hand smoke (SHS) is also associated with increased risk of developing lung cancer but is considered a relatively weak carcinogen and likely only accounts for a small proportion of lung cancer cases 15,16. Radon is estimated to be the second leading cause of lung cancer in several countries 17. Exposure of lung epithelial cells to radiation from radon inhalation can directly damage DNA or harm cells through the production of free radicals 18. Asbestos, arsenic, air pollution (such as automobile exhaust, coal and tar fumes, and cooking fumes) and viral infection (human papillomavirus and Epstein-Barr virus) are additional causes of lung cancer, although reports on viral etiology are controversial 18. In addition to these environmental causes, a family history of lung cancer and specific genetic susceptibility loci are associated with increased risk of developing lung cancer 15,19,20.
1.3 Molecular pathology of lung cancer

Lung tumour genomes harbour hundreds to tens of thousands of molecular alterations 21-32. These alterations include, but are not limited to, copy number, DNA methylation, sequence and gene expression changes. The majority of these alterations are passengers that have no effect on cell biology while others are driver alterations that confer a selective growth advantage to cancer cells 33. Thus, to understand the contribution of gene disruption to tumourigenesis it is critical to distinguish passengers from drivers.

Several driver genes and alterations have been identified in NSCLC including: mutations of EGFR, KRAS, HER2, PIK3CA, AKT, BRAF and MAP2K1; DNA amplifications of EGFR, HER2, MYC, SOX2 and MET; and gene fusions involving ALK, ROS1, and RET 2,4,34. Except for PIK3CA mutation, which can be found in EGFR and KRAS mutant tumours, these driver alterations are observed in a mutually exclusive manner. All of the genes above are considered oncogenes, but specific tumour suppressor genes (TSGs) are also recurrently inactivated in lung tumour genomes. These include: methylation of CDKN2A, RASSF1A, and FHIT; deletion of CDKN2A, RASSF1A, RB, FHIT, LKB1 and P53; and mutation of CDKN2A, RB, LKB1 and P53 35. The consequence of disruption of these and other driver genes is aberrant activation or inactivation of cellular pathways that promote the hallmarks of cancer and lung tumour development 36. Prominent pathways deregulated in lung cancer include EGFR, PI3K-AKT, P53, RB/E2F, and WNT signaling which regulate cell proliferation, cell death and survival, cell cycle, angiogenesis, invasion and DNA repair 37,38.

The establishment of these genes and pathways as drivers of lung tumourigenesis has led to the development of a number of small molecules and antibodies designed to inhibit them.
Several inhibitors are currently used in the clinic while others are being developed or are in clinical trials to evaluate their efficacy in lung cancer patients 38. The use of erlotinib and crizotinib to treat tumours harbouring EGFR and ALK alterations, respectively, exemplifies the clinical utility of driver gene identification, and the importance of deciphering the molecular pathology of lung cancer in order to develop new therapeutic strategies for lung cancer patients 6.

1.4 Lung cancer in the context of smoking

While the majority of lung cancer cases can be attributed to tobacco smoking, up to one quarter of lung cancers worldwide arise in NS 39. When considered as its own disease, lung cancer in NS is the seventh leading cause of cancer death worldwide 15. Studies have revealed that smoker and NS lung tumours are characterized by specific clinical and genetic features 15,27,40-44. For example, NS lung cancers are more strongly associated with female gender, East Asian ethnicity and AC histology 15. Molecularly, NS tumours have a significantly higher frequency of EGFR mutations and EML4-ALK inversions 6. In smoker lung tumours, KRAS and P53 mutations are more common and DNA methylation levels of specific genes are greater 15,45. It has also been suggested that NS with lung cancer have a better survival rate 15,41. More recently, it was discovered that mitochondrial DNA (mtDNA) of NS harboured more alterations than mtDNA of smokers, although genomic sequencing studies have demonstrated that smoker tumours have up to 10 times more somatic mutations across the entire genome 23,26,29,46. Additionally, smoker genomes are dominated by G>T transversions while G>A transitions are the most prevalent type of mutation observed in NS genomes 23,26,29. Collectively, these findings of distinct clinical features and molecular alteration patterns in smoker and NS lung cancers support the idea that they are distinct diseases 15.
Despite the various discoveries regarding smoker and NS lung cancer to date, relatively few druggable targets have been identified and the somatic alterations driving a large fraction of tumours remain to be discovered. Of note, few studies have directly compared genomic alterations in smokers and NS, and they have been limited to a single data dimension such as copy number, DNA methylation, or genome sequence alone. Furthermore, studies investigating differential gene disruption in the context of pathway deregulation in these two groups are few.

1.5 Integrative genomic analyses and the systems approach

As alluded to above, a gene may be disrupted by several different mechanisms. For example, DNA level changes such as gene dosage, mutation, translocation or inversion; epigenetic changes such as DNA methylation, histone modification or miRNA deregulation; or post-transcriptional and post-translational modifications can aberrantly activate or inactivate a gene. Thus, to accurately determine gene disruption status, it is imperative to assess multiple mechanisms simultaneously. A multi-'omics integrative approach enables the identification of more genes and does not overlook genes that have a high cumulative frequency when multiple mechanisms are considered, but a low frequency of disruption by any single mechanism 47,48. Another advantage of this strategy is the identification of genes displaying bi-allelic disruption, which may signify cancer cell selection and represent causal events in tumour initiation or progression. The multi-'omics integrative approach also enhances pathway detection due to the identification of a greater number of disrupted genes, and because pathways whose individual components are disrupted through different mechanisms may be revealed 47,48.
The Cancer Genome Atlas' (TCGA) mission to collect multi-dimensional 'omics data for large tumour cohorts reinforces the idea that multi-'omics data is valuable for furthering our current understanding of tumourigenesis and improving the chances of identifying significantly disrupted genes and pathways causal to tumour development.

Coupling an integrative multi-'omics approach with analysis of tumours on an individual basis offers even greater potential to identify molecular alterations responsible for tumour development. Treating each individual case as its own system can facilitate the elucidation of molecular mechanisms driving tumourigenesis on a personalized level. This concept is fitting with the idea of personalized medicine, where patient treatments are tailored to the specific molecular features of an individual tumour. Conventional approaches, such as those of the TCGA, group tumour and normal samples without considering patients individually, and perform two group statistical comparisons to identify somatic tumour alterations. Moreover, individual data dimensions are often analyzed independently and associations between them are estimated using correlative strategies 9,49-52. Although these approaches will identify the most prominent events among a group of tumours, driver genes prominently disrupted in only a small subset of individual tumours will be overlooked. Assessing each tumour as its own system using patient matched non-malignant tissue as a personalized reference for defining alterations accounts for individual variation and enables the discovery of alterations contributing to each individual tumour. Furthermore, simultaneous consideration of multiple data dimensions in one tumour system can reveal DNA alterations associated with consequential expression changes. Thus, the multi-dimensional, integrative systems approach is well suited for identifying molecular mechanisms and pathways responsible for tumour development.
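The point made in this section, that a gene can have a low frequency of disruption by any single mechanism yet a high cumulative frequency, can be illustrated with a minimal Python sketch. The tumour identifiers, gene names, and disruption calls below are invented for illustration; this is not the integration method developed in this thesis.

```python
from collections import defaultdict

# Hypothetical disruption calls: tumour -> mechanism -> set of disrupted genes.
# Gene names, tumour IDs and calls are invented for illustration only.
CALLS = {
    "T1": {"copy_number": {"GENE_A"}, "methylation": set(), "mutation": set()},
    "T2": {"copy_number": set(), "methylation": {"GENE_A"}, "mutation": set()},
    "T3": {"copy_number": set(), "methylation": set(), "mutation": {"GENE_A"}},
    "T4": {"copy_number": {"GENE_B"}, "methylation": set(), "mutation": set()},
}


def disruption_frequencies(calls):
    """Return (per-mechanism, cumulative) disruption frequencies for each gene."""
    n_tumours = len(calls)
    per_mechanism = defaultdict(lambda: defaultdict(int))
    cumulative = defaultdict(int)
    for mechanisms in calls.values():
        disrupted_by_any = set()
        for mechanism, genes in mechanisms.items():
            for gene in genes:
                per_mechanism[gene][mechanism] += 1
            disrupted_by_any |= genes
        # Count each tumour once per gene, however many mechanisms hit it.
        for gene in disrupted_by_any:
            cumulative[gene] += 1
    per_mechanism_freq = {
        gene: {m: count / n_tumours for m, count in counts.items()}
        for gene, counts in per_mechanism.items()
    }
    cumulative_freq = {gene: count / n_tumours for gene, count in cumulative.items()}
    return per_mechanism_freq, cumulative_freq


per_mechanism, cumulative = disruption_frequencies(CALLS)
print(per_mechanism["GENE_A"])  # each single mechanism: 0.25
print(cumulative["GENE_A"])     # any mechanism: 0.75
```

In this toy cohort, GENE_A would be missed by a 50% frequency cutoff applied to any one data dimension, but not when the dimensions are considered together.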
1.6 Thesis theme and rationale

The theme of this thesis is to characterize global patterns of genomic and epigenomic disruption in lung AC in current smokers (CS) and NS. NS will comprise a larger proportion of lung cancer cases in the next few decades. Thus, improved understanding of NS lung carcinogenesis is essential as there is currently no way to identify NS with high risk of developing lung cancer besides family history. A genome wide multi-'omics characterization and comparison of lung tumour genomes from CS and NS will provide a more comprehensive understanding of how different genetic and epigenetic disruptions may act in concert to promote development of lung tumours with such disparate clinical features. Identification of molecular differences will provide further evidence that CS and NS lung cancers are distinct diseases driven by different molecular alterations and that patients would likely benefit from distinct clinical management strategies. The molecular alterations identified could lead to the development of smoking-specific treatment strategies. We restricted our analysis to active CS and NS as these two groups represent the most extreme phenotypes for smoking behaviour.

1.7 Research question, objectives and hypotheses

The research question underlying the work described in this thesis is: Do lung tumours of CS and NS exhibit differential patterns of genome wide molecular alterations? The primary objective of this thesis is to perform multi-dimensional genomic profiling and integrative analysis to determine whether distinct molecular mechanisms of lung tumourigenesis are evident in CS and NS tumours. To answer this question, we set out to achieve the following goals:

1. Characterize the genomic and epigenomic landscapes of lung tumours from CS and NS to determine whether differential patterns of DNA alteration exist in these two groups.
2. Using an integrative genomics approach, determine what genes are frequently disrupted through different mechanisms in lung tumours of CS and NS, and identify the molecular pathways affected by deregulation of these genes.

Given the distinct clinical features that CS and NS lung cancer patients exhibit and the few locus specific, molecular differences that have been described to date, the overarching hypothesis for this thesis is: Lung cancers in CS and NS arise through different molecular mechanisms. Specifically, we hypothesize that:

1. Different patterns of DNA alteration (genetic and epigenetic) and consequential gene and pathway deregulation are present in lung tumours of CS and NS.
2. Simultaneous analysis of genetic and epigenetic profiles of CS and NS lung tumours, i.e. integrative genomic analysis, will reveal unique molecular alterations underlying lung tumourigenesis in these two groups.

1.8 Specific aims and thesis outline

To address the question of whether or not lung cancers from CS and NS exhibit different genetic and epigenetic characteristics on a genome-wide level, which would imply they are driven by different molecular mechanisms, we devised the following specific aims.

Aim 1: Perform genomic profiling including single nucleotide polymorphism, DNA methylation, miRNA and mRNA expression of lung ACs and matched non-malignant tissues from CS and NS. Chapter 2 is a methodology chapter that describes the collection of patient samples and provides a brief summary of the technologies used to generate genomic and epigenomic profiles. Detailed methods and analysis strategies applied to address subsequent aims are presented in the appropriate chapters.

Aim 2: Determine whether the genomic and epigenomic characteristics of CS and NS are statistically different. Chapters 3, 4, and 5 describe genome-wide comparisons of DNA copy number, methylation and miRNA expression characteristics of CS and NS lung AC, respectively.
At the time Chapter 3 was published, previous studies addressing the question of smoking-specific genomic landscapes were limited by the use of low resolution (i.e. low density) copy number profiling technologies and the grouping of NSCLC types (AC and SCC) together, which are now known to display distinct genomic features. Thus, a comparison of CS and NS genomes of the same histological type using a high resolution (i.e. high density) platform was warranted. Similarly, studies to address differences in DNA methylation between CS and NS lung cancers have primarily focused on a single gene or a specific panel of genes, precluding conclusions from being made about genome-wide differences in DNA methylation patterns in these two groups. There was an evident need to specifically address the question of genome-wide DNA methylation differences in CS and NS lung AC, so we addressed this question in Chapter 4. Several lung cancer miRNA expression studies have been published to date, but their conclusions are based on miRNA profiles generated using microarrays, which limit the miRNAs measured to the few hundred represented on the array. Moreover, there is a lack of reports on smoking-specific miRNA deregulation in the context of lung cancer. To investigate differences in miRNA expression profiles of CS and NS lung tumours, we used an unbiased sequencing approach and the results of this analysis are described in Chapter 5.

Aim 3: Develop an integrative genomic analysis method to identify DNA alterations that are candidates for involvement in lung tumour biology. Deciphering potential driver from passenger DNA alterations in cancer is a significant challenge due to the massive amount of gene disruption observed in cancer genomes. It is apparent that genes are disrupted through a variety of molecular mechanisms, which suggests it is important to survey multiple dimensions of the genome and epigenome to avoid overlooking biologically relevant genes.
However, due to the large number of alterations, it can be challenging to organize and analyze data to identify disrupted genes likely to be directly involved in tumourigenesis. Thus, we sought to develop a straightforward strategy based on biological principles for the integration of multi-'omics data to identify and prioritize genes whose disruption is likely to have a role in tumour biology. Chapter 6 describes the algorithm we designed to address these challenges and its application to our cohort of CS and NS lung AC.

Aim 4: Perform pathway analysis on genes altered in CS and NS to understand gene disruption in the context of cellular functions. Integration of copy number, methylation and gene expression profiles for our lung tumour samples revealed up to hundreds of candidate genes in each tumour. To put these gene alterations into biological context for individual tumours, we performed pathway analysis. This enables the identification of similar functions or involvement of genes in molecular pathways and can provide insight into how disruption of various genes may play a role in tumour biology. Pathway analyses on genes identified from our integrative genomics analyses were done for each tumour and pathway disruption was compared between CS and NS. The results of these analyses are described in Chapter 6.

With any research it is imperative to confirm that observations made are not limited to one particular cohort or dataset. Validation of findings in additional samples is required to corroborate results and provide evidence that the findings of a study are not the result of factors specific to a particular research centre or set of samples. Therefore, to validate our discoveries, throughout this thesis we have investigated our findings in multiple, independent cohorts of lung AC in CS and NS from independent research centres when additional datasets were available.
Chapter 2: Sample collection and summary of molecular profiling

2.1 Patients and tissue accrual

Tumour and adjacent non-malignant lung tissues were accrued for 94 treatment-naive lung AC patients undergoing surgical resection with curative intent. This cohort of tumours is referred to as the BCCA cohort. These included 42 current smokers (CS), 30 never smokers (NS), and 22 former smokers (FS). Smoking history was obtained through a detailed questionnaire. CS were defined as patients currently smoking at the time of diagnosis. NS were defined as patients who reported smoking fewer than 100 cigarettes in their lifetime. FS were defined as individuals who stopped smoking at least one year prior to diagnosis. All tissues were collected from the Tumour Tissue Repository of the British Columbia Cancer Agency or Vancouver General Hospital under informed, written patient consent and with approval from the University of British Columbia - BC Cancer Agency Research Ethics Board. The clinical features and smoking histories of the samples collected are summarized in Table 2.1. The samples collected are primarily from early stage lung cancers and the three smoking groups are well balanced for tumour stage, which is important for the smoking group comparisons (Table 2.2).

2.2 Nucleic acid extraction

Fresh frozen tissues underwent pathological review to confirm AC histology and absence of cancer cells in the adjacent non-malignant lung tissue. Each tumour was microdissected with the guidance of a pathologist to ensure >70% tumour cell content. DNA was extracted using standard phenol:chloroform procedures. RNA was extracted using Trizol.

2.3 Molecular profiling summary

DNA and RNA obtained from tumour and adjacent non-malignant lung tissues were subjected to various profiling platforms to enable genome wide comparisons of molecular alterations in CS and NS lung AC.
DNA copy number, DNA methylation, and miRNA and mRNA expression were assessed. The specific platforms, data types, samples analyzed, and chapters in which the data are presented are summarized in Table 2.3. Public datasets from independent research centres, which were used to validate our findings, are also described in Table 2.3. Detailed methods summarizing individual data type analyses for the BCCA samples and external cohort samples are described in each respective chapter, while profiling platform descriptions are provided below.

Table 2.1 Clinical features of lung AC samples in this study

Sample  Gender  Age  Smoking  Pack Years  Mutation  Race       TNM Stage  Data
CS-1    F       61   CS       40          KRAS      Caucasian  T2N1M0     CN, ME, MI, E
CS-2    F       58   CS       35          WT        Caucasian  T2N2M0     CN, ME, MI, E
CS-3    F       64   CS       40          KRAS      Caucasian  T4N0M0     CN, ME, MI, E
CS-4    F       69   CS       110         KRAS      Caucasian  T2N2M0     CN, ME, E
CS-5    M       66   CS       44          WT        Caucasian  T2N0M0     CN, ME, MI, E
CS-6    M       73   CS       44          WT        Caucasian  T1N0M0     CN, ME, MI, E
CS-7    M       74   CS       45          WT        Caucasian  T1N0M0     CN, ME, MI, E
CS-8    M       63   CS       120         KRAS      Caucasian  T2N2M0     CN, ME, MI, E
CS-9    F       68   CS       30          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-10   M       53   CS       50          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-11   M       49   CS       30          KRAS      Caucasian  T2N2M0     CN, ME, MI, E
CS-12   F       71   CS       45          KRAS      Caucasian  T1N0M0     CN, ME, MI, E
CS-13   F       73   CS       45          unk       unk        T1N1M0     MI
CS-14   M       70   CS       unk         unk       unk        T1N0M0     MI
CS-15   F       68   CS       48          WT        Caucasian  T1N0M1     CN, ME, MI, E
CS-16   M       65   CS       69          WT        Caucasian  T2N1M0     CN, ME, MI, E
CS-17   F       64   CS       35          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-18   F       64   CS       24          KRAS      Caucasian  T2N0M0     CN, E
CS-19   F       67   CS       50          WT        Caucasian  T2N1M0     CN, ME, MI, E
CS-20   F       72   CS       60          WT        Caucasian  T2N0M0     CN, ME, MI, E
CS-21   F       67   CS       75          KRAS      Caucasian  T1N0M0     CN, ME, MI, E
CS-22   F       57   CS       51          KRAS      Caucasian  T2N1M0     CN, ME, MI, E
CS-23   M       45   CS       15          unk       unk        T1N0M0     MI
CS-24   F       70   CS       93          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-25   M       45   CS       2           EGFR      Asian      T1N0M0     CN, MI, E
CS-26   F       50   CS       32          WT        Caucasian  T2N0M0     CN, E
CS-27   F       66   CS       52          WT        Caucasian  T1N0M0     CN, MI, E
CS-28   F       63   CS       60          KRAS      Caucasian  T2N0M0     CN, E
CS-29   F       73   CS       25          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-30   F       78   CS       60          KRAS      Caucasian  T1N0M0     CN, ME, MI, E
CS-31   M       73   CS       110         WT        Caucasian  T1N0M0     CN, ME, MI, E
CS-32   F       53   CS       16          KRAS      Caucasian  T2N2M0     CN, ME, MI, E
CS-33   F       69   CS       50          WT        Caucasian  T1N0M0     CN, ME, MI, E
CS-34   F       64   CS       40          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-35   M       67   CS       40          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-36   M       65   CS       55          KRAS      Caucasian  T1N1M0     CN, ME, MI, E
CS-37   F       72   CS       55          KRAS      Caucasian  T2N0M0     CN, ME, MI, E
CS-38   F       47   CS       11          WT        Asian      T1N0M0     CN, ME, MI, E
CS-39   F       71   CS       62          KRAS      Caucasian  T2N1M0     CN, ME, MI, E
CS-40   F       69   CS       46          WT        Native     T2N0M0     CN, ME, E
CS-41   F       53   CS       36          KRAS      Caucasian  T1N0M0     CN, ME, MI, E
CS-42   F       74   CS       55          KRAS      Caucasian  T1N0M0     CN, ME, MI, E
FS-1    F       55   EX       15          unk       unk        T2N1M0     MI
FS-2    M       75   EX       50          WT        Caucasian  T1N0M0     MI
FS-3    F       79   EX       unk         KRAS      Caucasian  T1N0M0     MI
FS-4    M       77   EX       84          KRAS      Caucasian  T2N0M0     MI
FS-5    F       77   EX       50          KRAS      Caucasian  T1N0M1     MI
FS-6    F       90   EX       25          WT        Caucasian  T2N1M0     MI
FS-7    F       71   EX       34          unk       unk        T1N0M0     MI
FS-8    F       74   EX       44          unk       unk        T1N0M0     MI
FS-9    M       58   EX       58          WT        Asian      T2N1M0     MI
FS-10   F       82   EX       unk         EGFR      Caucasian  T2N0M0     MI
FS-11   M       70   EX       82          KRAS      Caucasian  T1N0M0     MI
FS-12   F       69   EX       88          KRAS      Caucasian  T2N2M0     MI
FS-13   M       73   EX       48          unk       unk        T1N0M0     MI
FS-14   F       66   EX       42          unk       unk        T1N0M0     MI
FS-15   F       60   EX       14          EGFR      Asian      T2N1M0     MI
FS-16   F       66   EX       60          KRAS      Caucasian  T2N0M0     MI
FS-17   M       73   EX       44          KRAS      Caucasian  T2N0M0     MI
FS-18   M       82   EX       100         WT        Caucasian  T1N0M0     MI
FS-19   M       60   EX       27          unk       unk        T1N0M0     MI
FS-20   F       79   EX       70          WT        unk        T2N0M0     MI
FS-21   M       68   EX       1           unk       unk        T2N1M0     MI
FS-22   F       79   EX       24          unk       unk        T2N1M0     MI
NS-1    F       74   NS       0           EGFR      Caucasian  T2N1M0     CN, ME, E
NS-2    F       55   NS       0           WT        Caucasian  T2N2M0     CN, ME, E
NS-3    F       58   NS       0           KRAS      Caucasian  T4N0M0     CN, ME, E
NS-4    M       63   NS       0           WT        Caucasian  T1N0M0     CN, ME, E
NS-5    F       64   NS       0           EGFR      Asian      T2N0M0     CN, ME, E
NS-6    F       80   NS       0           WT        Caucasian  T2N1M0     CN, ME, E
NS-7    F       70   NS       0           EGFR      Asian      T3N2M0     CN, ME, MI, E
NS-8    F       72   NS       0           EGFR      Asian      T2N0M0     CN, ME, MI, E
NS-9    F       82   NS       0           EGFR      Asian      T3N0M0     CN, ME, MI, E
NS-10   F       75   NS       0           WT        Asian      T2N0M0     CN, ME, E
NS-11   M       72   NS       0           EGFR      Asian      T2N0M0     CN, ME, MI, E
NS-12   M       66   NS       0           EGFR      Asian      T2N2M0     CN, ME, MI, E
NS-13   M       74   NS       0           EGFR      Asian      T2N0M0     CN, ME, MI, E
NS-14   F       77   NS       0           EGFR      Asian      T2N1M0     CN, ME, MI, E
NS-15   F       73   NS       0           EGFR      Asian      T2N0M0     CN, ME, E
NS-16   F       86   NS       0           KRAS      Asian      T2N0M0     CN, ME, E
NS-17   F       52   NS       0           EGFR      Asian      T1N0M0     CN, ME, MI, E
NS-18   F       39   NS       0           WT        Asian      T1N0M0     CN, ME, E
NS-19   F       63   NS       0           WT        Asian      T2N1M0     CN, ME, E
NS-20   F       77   NS       0           WT        Asian      T2N0M0     CN, ME, E
NS-21   F       77   NS       0           KRAS      Asian      T1N0M0     CN, ME, E
NS-22   F       71   NS       0           WT        Asian      T2N0M0     CN, ME, E
NS-23   F       78   NS       0           EGFR      Caucasian  T1N0M0     CN, ME, MI, E
NS-24   F       68   NS       0           WT        Caucasian  T1N0M0     CN, ME, E
NS-25   M       71   NS       0           EGFR      Asian      T2N1M0     CN, ME, MI, E
NS-26   M       81   NS       0           WT        Caucasian  T1N0M0     CN, ME, E
NS-27   F       73   NS       0           EGFR      Asian      T2N0M0     CN, ME, MI, E
NS-28   M       85   NS       0           EGFR      Asian      T1N0M0     CN, ME, MI, E
NS-29   F       65   NS       0           EGFR      Caucasian  T2N2M0     CN, ME, MI, E
NS-30   F       60   NS       0           EGFR      Asian      T4N1M0     CN, ME, MI, E

unk = unknown; WT = wild type; CN = copy number; ME = methylation; MI = miRNA expression; E = mRNA expression

Table 2.2 Stage breakdown of lung AC samples in this study

        Proportion of Cases
Stage   CS     FS     NS
I       0.62   0.64   0.57
II      0.24   0.27   0.27
III     0.12   0.05   0.17
IV      0.02   0.05   0.00

2.3.1 DNA copy number arrays

Genomic DNA from tumour and matched non-malignant lung tissues were hybridized to Affymetrix SNP 6.0 arrays according to the manufacturer's instructions. This array contains over 900,000 non-polymorphic probes designed to assess DNA copy number. Raw CEL probe intensity files were processed and normalized using Partek Genomics Suite Software. Probe sequence, fragment length, GC content and background adjustments were applied to correct for biases in signal intensities.
Copy number profiles were generated following the Copy Number, Paired Analysis Workflow such that matched non-malignant profiles were used as a copy number baseline for each respective tumour.

2.3.2 DNA methylation arrays

Genomic DNA from tumour and matched non-malignant lung tissues for 34 CS and 30 NS was bisulfite converted and hybridized to Illumina Infinium Human Methylation (HM) 27 arrays as previously described 53. The Illumina HM27 array assays over 27,000 CpG sites representing more than 14,000 unique genes. Raw methylation data were corrected for color bias and normalized using SSN normalization with the Bioconductor package lumi in R statistical computing software 54,55.

2.3.3 mRNA expression arrays

Total RNA from tumour and matched non-malignant lung tissues was used to generate mRNA expression profiles for 34 CS and 30 NS on the Illumina HT-12 Whole Genome 6, v3 BeadChip array. This array contains over 48,000 probes measuring expression levels for over 25,000 genes. Arrays were conducted according to the manufacturer's instructions. Bead-level data were pre-processed using the R package mbcb to perform background correction and probe summarization. Pre-processed data were then quantile normalized and log transformed. Gene expression fold changes between tumour and matched non-malignant lung tissues were calculated to define aberrant tumour expression.

2.3.4 miRNA sequencing

Total RNA from tumour and matched non-malignant lung tissues for 37 CS, 22 FS, and 14 NS was size fractionated to isolate small RNAs, including miRNAs. miRNA-seq libraries were constructed, bar-coded for multiplex sequencing and sequenced using a plate-based protocol developed at the British Columbia Genome Sciences Centre (BCGSC) using the Illumina HiSeq 2000 sequencing platform 52. Raw sequence reads were separated into individual samples based on the assigned indexes, adapter sequences were removed, and reads were trimmed based on quality control metrics.
High quality reads were subsequently aligned to the NCBI GRCh37 reference genome and miRBase v18 using the BWA algorithm 56. A full description of library construction, sequencing, read pre-processing, alignment and annotation is provided in the comprehensive molecular profiling of breast tumours report from the TCGA 52.

2.3.5 Definition of gene disruption

With regard to DNA copy number and methylation changes, genes were considered disrupted in tumours if they exhibited a structural difference compared to non-malignant tissue. For example, genes with copy gains or losses, or genes with aberrant DNA methylation were defined as disrupted in Chapters 3 and 4. In Chapter 5, miRNAs were considered disrupted if they displayed changes in expression between tumour and non-malignant tissues. In Chapter 6, when copy number, methylation and mRNA gene expression data were integrated, genes were classified as disrupted if they exhibited gene expression changes between tumour and non-malignant tissues, with or without associated DNA level changes.
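The Chapter 6 definition above (an expression change relative to matched non-malignant tissue, with or without an associated DNA level change) can be sketched in Python as follows. The two-fold cutoff, the function name `classify_gene`, and the concordance rule linking DNA changes to expression direction are assumptions made for illustration; they are not taken from this section, and the thesis analyses themselves were run in R and Partek.

```python
import math

# Assumed two-fold cutoff, chosen only for illustration; the thesis does not
# state a threshold in this section.
FOLD_THRESHOLD = 2.0


def classify_gene(tumour_expr, normal_expr, copy_state=0, methylation_state=0):
    """Classify one gene in one tumour against its matched non-malignant tissue.

    copy_state: 1 (gain), 0 (neutral) or -1 (loss).
    methylation_state: 1 (hypermethylated), 0 (unchanged) or -1 (hypomethylated).
    Expression values are assumed to be positive, normalized intensities.
    """
    log2_fc = math.log2(tumour_expr) - math.log2(normal_expr)
    if abs(log2_fc) < math.log2(FOLD_THRESHOLD):
        return {"disrupted": False, "direction": None, "dna_support": []}

    direction = "over" if log2_fc > 0 else "under"
    dna_support = []
    # Assumed concordance rule: gain or hypomethylation supports overexpression,
    # loss or hypermethylation supports underexpression.
    if direction == "over":
        if copy_state == 1:
            dna_support.append("copy_gain")
        if methylation_state == -1:
            dna_support.append("hypomethylation")
    else:
        if copy_state == -1:
            dna_support.append("copy_loss")
        if methylation_state == 1:
            dna_support.append("hypermethylation")
    return {"disrupted": True, "direction": direction, "dna_support": dna_support}


print(classify_gene(400, 100, copy_state=1))
print(classify_gene(400, 100))  # still disrupted, with no DNA-level support
```

A gene with an expression change but an empty `dna_support` list is still classified as disrupted, mirroring the "with or without associated DNA level changes" wording.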
Table 2.3 Summary of datasets analyzed in this thesis

Chapter(s)  Source    Profiling Platform                 Data Type         Sample Number        Purpose
3           BCCA      Affymetrix SNP 6 Array             DNA copy number   39 CS, 30 NS         Discovery
3           MSKCC     Agilent 4x44K aCGH                 DNA copy number   25 CS, 41 NS         Validation
3           TSP       Affymetrix SNP 250K Array          DNA copy number   72 CS, 37 NS         Validation
4           BCCA      Illumina Infinium HM27 BeadChip    DNA methylation   34 CS, 30 NS         Discovery
4           TCGA      Illumina Infinium HM27 BeadChip    DNA methylation   16 CS, 10 NS         Validation
4           GSE32866  Illumina Infinium HM27 BeadChip    DNA methylation   4 CS, 5 NS           Validation
5           BCCA      Illumina HiSeq 2000                miRNA expression  37 CS, 22 FS, 14 NS  Discovery
5           TCGA      Illumina HiSeq 2000                miRNA expression  9 CS, 12 FS, 2 NS    Validation
6           BCCA      Illumina HT-12 BeadChip            mRNA expression   34 CS, 30 NS         Discovery
6           GSE10072  Affymetrix HG U133A Array          mRNA expression   12 CS, 11 NS         Validation
6           GSE31210  Affymetrix HG U133 Plus 2.0 Array  mRNA expression   111 ES, 115 NS       Validation

Chapter 3: Copy number differences in lung AC of CS and NS

3.1 Introduction

3.1.1 Effects of copy number alterations on gene expression and roles in tumourigenesis

Gene dosage, or copy number, alterations are a prominent mechanism of gene deregulation in cancer cells 57. Their effects on gene expression enable copy number alterations (CNAs) to contribute to tumourigenesis. For instance, deletion of both copies of a protein coding gene would leave a cell without DNA encoding that gene, causing lost expression of its mRNA transcript and protein product, which could potentially deregulate an entire cellular network affecting multiple biological functions. Similarly, DNA amplification results in extra copies of a gene that can drive higher than normal transcription rates and abnormally high protein levels, which may disrupt the balance of a finely tuned signaling cascade in the affected cell. Deletion of tumour suppressor genes and amplification of oncogenes are molecular events known to drive cancer biology 57,58.
CNAs can have significant roles in tumourigenesis due to their biological consequences and their utility as clinical biomarkers 59-65. Like epigenetic markers, CNAs are easy to assess in patient specimens and their biological effects can be exploited therapeutically, demonstrating their clinical utility. For example, DNA amplifications of EGFR and HER2 are prominent features of lung AC and breast cancers, respectively, and identification of these events led to the development of therapies targeted against them which have become routinely used in the clinic for the treatment of these diseases 7,64. In addition to directly identifying potential therapeutic targets, using copy number profiling to reveal the complement of DNA alterations, often referred to as the genomic landscape of a tumour, can provide insight into the mechanisms contributing to tumourigenesis such as genomic instability, which can further improve our understanding of tumour biology.

3.1.2 Rationale for investigating global copy number differences in CS and NS lung AC

Given the potential for copy number profiling to reveal novel therapeutic targets and to provide information about tumour biology, it is not surprising that it has become routine in the genomic analyses of cancer. Interestingly, different cancer subtypes display distinct copy number profiles which are associated with distinct phenotypic and clinical features 66. As smoker and NS lung cancers have been proposed to be distinct diseases, it is possible that their genomic landscapes differ due to selection for different CNAs that provide specific biological advantages in the tumours of each group.

We hypothesized that lung tumour genomes of smokers and NS would exhibit disparate patterns of DNA CNAs throughout the genome that may drive the distinct clinical presentation of these tumour types.
Prior to this study, a group that sampled a small fraction of the genome using a low resolution comparative genomic hybridization technology reported that smoker tumours exhibited a greater extent of CNAs than NS tumours 27. To investigate this previous observation in our cohort and to define smoking related CNAs, we performed a global comparative analysis of copy number changes in lung cancers from CS and NS using a high resolution copy number profiling strategy. We restricted our study to lung AC, since it is the predominant form of lung cancer in NS and because NSCLC subtype specific CNAs are known to exist 67. We identified unique genomic features and NS-specific CNAs that were validated in two additional cohorts. These data may provide insight into the molecular basis for the differential tumour behaviors observed in the clinic.

3.2 Methods

3.2.1 EGFR and KRAS mutation screening

Genomic DNA from each tumour was sequenced to determine KRAS and EGFR mutation status by PCR amplification and product sequencing 68. KRAS and EGFR mutation screening was performed to assess the accuracy of the smoking histories collected for patients in the BCCA cohort, since these mutations are known to segregate with smokers and never smokers, respectively. Exons 19 and 21 of EGFR and exon 2 of KRAS were screened. PCR was performed on 50-100 ng of DNA in 25 µL reactions using the Applied Biosystems GeneAmp PCR System 9700. Applied Biosystems BigDye Terminator v3.1 cycle sequencing kit and capillary instrumentation were used to sequence PCR products. Primer sequences and PCR conditions are published online 68. A Fisher's Exact test was used to assess associations of EGFR and KRAS mutation with smoking status. A p-value < 0.05 was considered significant. The relationships between clinical variables (stage, gender, age, smoking) and EGFR and KRAS mutation status were assessed using Pearson correlation.
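The association test described in Section 3.2.1 can be sketched as a two-sided Fisher's Exact test on a 2x2 table of smoking status by mutation status. The pure-Python implementation below uses one common two-sided definition (summing the probabilities of all tables no more likely than the observed one); the function name and the counts are hypothetical and are not results from the BCCA cohort.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    p_value = sum(
        p
        for p in map(table_probability, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# Hypothetical counts, not the cohort's results:
# rows = smoking group (CS, NS), columns = (mutant, wild type).
p = fisher_exact_two_sided(8, 2, 1, 5)
print(f"p = {p:.4f}")  # p = 0.0350
```

With these invented counts the p-value falls below the 0.05 cutoff used in the text, so the association would be called significant.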
3.2.2 Single nucleotide polymorphism (SNP) arrays

Genomic DNA from tumour and matched non-malignant lung tissues was hybridized to Affymetrix SNP 6.0 arrays according to the manufacturer's instructions and analyzed using Partek Genomics Suite Software. Copy number profiles were generated following the Copy Number, Paired Analysis Workflow such that matched non-malignant profiles were used as a copy number baseline for each respective tumour. Genomic segmentation was applied with significance thresholds to identify segmental regions of CNA (gain and loss) using the following parameters: signal to noise > 0.3, minimum of 50 markers per segment, p-value threshold of 10^-7 for the statistical difference between intensities of adjacent segments, and p-value threshold of 10^-7 for significance of deviation of intensities in tumour tissue from intensities in non-malignant lung. Identified segments were merged using a 1 Mbp window to combine adjacent regions of copy gain or loss. All genomic mapping was based on the March 2006, hg18 genome build. The genomic coordinates for RefSeq genes (hg18) were obtained from the UCSC genome browser 69.

3.2.3 Proportion of genome altered

For each of the 69 CS and NS tumours, the sum of base pairs encompassed by CNAs (both gains and losses) was used to calculate the proportion of genome altered (PGA). Differences in PGA between tumour genomes were investigated using a two-tailed Student's t-test. P-values < 0.05 were considered significant. Tumours with PGAs in the 5th and 95th percentiles were excluded to reduce the effects of outliers. A multivariate analysis of variance (MANOVA) test was performed in R to assess the contributions of clinical (stage, gender, age, race, and smoking status) and genetic (EGFR and KRAS mutation status) variables to the observed PGA.
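The PGA calculation in Section 3.2.3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the analysis code used here: the function names, the (start, end, state) segment representation, the half-open coordinate convention, and all toy values are hypothetical. Only the t statistic is computed; converting it to a p-value requires the t distribution (available in R or SciPy), and the percentile-based outlier exclusion is omitted.

```python
from math import sqrt
from statistics import mean, variance

def proportion_genome_altered(segments, genome_length):
    """Fraction of the genome covered by copy gains or losses.

    segments: (start, end, state) tuples with state 1 (gain), 0 (neutral)
    or -1 (loss). Segments are assumed non-overlapping with half-open
    coordinates, so each segment spans end - start base pairs.
    """
    altered_bp = sum(end - start for start, end, state in segments if state != 0)
    return altered_bp / genome_length

def student_t_statistic(group_a, group_b):
    """Pooled-variance (Student's) two-sample t statistic."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Hypothetical toy tumour on a 1,000 bp "genome": 100 bp gained, 100 bp lost.
toy_segments = [(0, 100, 1), (100, 300, 0), (300, 400, -1)]
toy_pga = proportion_genome_altered(toy_segments, genome_length=1000)

# Hypothetical PGA values for two small groups (not data from this study).
t_stat = student_t_statistic([0.45, 0.38, 0.41], [0.30, 0.33, 0.27])
```

Separate gain and loss fractions, as summarized later in Table 3.1, would follow from the same function by filtering on state == 1 or state == -1 instead of state != 0.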
3.2.4 Differentially altered regions

Segmental alterations identified in each tumour were parsed into typed copy numbers for each SNP array element, such that every array probe was scored as 1 (copy gain), 0 (copy neutral), or -1 (copy loss). Probes with similar copy number states within individual tumours were then collapsed into genomic regions across all tumour samples. The frequency of DNA gains, DNA losses, and neutral copy number were compared in CS (n=39) and NS (n=30) tumour genomes using a Fisher's Exact test with a p-value < 0.05 considered significant. The Fisher's Exact test was performed in R generating a p-value for each genomic region 70. Significant regions within 1 Mbp of each other and with the same copy number status were merged into single regions. Differentially altered regions were defined as having at least a 15% frequency difference between CS and NS and a frequency of at least 15% in one of the groups if both groups showed alteration.

3.2.5 Identification of most prominent CNAs using GISTIC

The GISTIC (Genomic Identification of Significant Targets in Cancer) algorithm was used to investigate high-level and prominent DNA alterations (defined as frequent alterations with high magnitude changes in some samples) in CS and NS in the BCCA cohort as it had the highest resolution copy number data for mapping focal genomic events 71. We performed GISTIC analysis on the segmental alteration data with the following parameters: amplification threshold = 0.848, deletion threshold = 0.737, join segment size = 50, q-value threshold of 0.05, and using the hg18 genome build. Regions identified in CS and NS were then compared to determine regions of overlap and regions of difference based on the region limit boundaries defined by GISTIC.
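The region-merging step in Section 3.2.4 (significant regions within 1 Mbp of each other and with the same copy number status are merged) could be sketched in Python as below. This is a hedged illustration only: the function name and the tuple layout are hypothetical, and "within 1 Mbp" is assumed to mean that the gap between the end of one region and the start of the next is at most 1,000,000 bp.

```python
def merge_regions(regions, max_gap=1_000_000):
    """Merge regions on the same chromosome with the same status.

    regions: (chrom, start, end, status) tuples. Two regions are merged
    when they share chromosome and status and the gap between them is
    at most max_gap base pairs (an assumed reading of "within 1 Mbp").
    """
    merged = []
    # Sort so that regions sharing chromosome and status are adjacent
    # and ordered by start position.
    ordered = sorted((chrom, status, start, end)
                     for chrom, start, end, status in regions)
    for chrom, status, start, end in ordered:
        if merged:
            last_chrom, last_start, last_end, last_status = merged[-1]
            if (last_chrom == chrom and last_status == status
                    and start - last_end <= max_gap):
                merged[-1] = (chrom, last_start, max(last_end, end), status)
                continue
        merged.append((chrom, start, end, status))
    return merged

# Hypothetical regions (not data from this study).
example = [
    ("chr1", 100, 200, "gain"),
    ("chr1", 500_000, 600_000, "gain"),      # 499,800 bp gap: merged
    ("chr1", 5_000_000, 5_100_000, "gain"),  # 4.4 Mbp gap: kept separate
    ("chr1", 300, 400, "loss"),              # different status: kept separate
]
merged_example = merge_regions(example)
```

Note that the output is ordered by chromosome, then status, then start position, so a caller wanting strict genomic order would re-sort the result.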
3.2.6 Validation of CNAs in external cohorts

Publicly available microarray data with accompanying smoking status annotation were accessed to compare DNA copy number findings from the BCCA tumours with external cohorts (Table 2.3). Normalized lung AC Agilent 4x44k array comparative genomic hybridization (aCGH) data generated by the Memorial Sloan Kettering Cancer Centre (MSKCC) were obtained from http://cbio.mskcc.org/Public/lung_array_data/ 72. The MSKCC dataset comprised 25 CS and 41 NS samples. Copy number profiles were generated using the segmentation algorithm FACADE with default parameters and a baseline distribution of 10 kbp 73. EGFR and KRAS mutation data were also available for these tumours from the same source. Affymetrix SNP 250K array data generated by the Database of Genotypes and Phenotypes Tumour Sequencing Project (TSP) were also accessed (Study Accession: phs000144.v1.p1). Array data for 72 CS and 37 NS lung AC and matched non-malignant tissues were processed in Partek with the same normalization and segmentation parameters as the BCCA tumours, with the exception of using a 20 marker minimum for defining segments due to the lower array density.

PGA was calculated for each tumour in the MSKCC and TSP datasets. A Student's t-test was used to compare PGA in CS and NS tumours with a p-value < 0.05 considered significant. Differentially altered regions in CS and NS were identified in the MSKCC and TSP datasets applying the same strategy as that for the BCCA tumours. Minimal common regions (MCRs) of overlap for regions differentially altered in the same direction in all three datasets were mapped. Correlations between the MCRs identified were investigated in the BCCA tumours using a Pearson correlation test.
A multifactor ANOVA was also performed to assess relationships between each of the six MCRs discovered and the clinical and genetic variables discussed above in the BCCA tumours (stage, gender, age, smoking status, race, and EGFR and KRAS mutation status).

3.3 Results

3.3.1 Patient demographics

Lung AC and matched non-malignant tissue specimens were collected from 69 patients, including 39 CS and 30 NS (Table 2.1). Collectively, CS and NS patients for this comparative study were well matched for age, gender and stage of disease; however, ethnic differences existed between CS and NS. Consistent with trends of higher incidence of lung cancer in NS among Asians compared to Caucasians, our NS cohort was significantly enriched for Asian patients (Fisher's Exact test, p = 1.3 x 10^-8), while our CS cohort was enriched for Caucasian patients.

3.3.2 EGFR and KRAS mutations segregate with smoking history

Consistent with reported literature, mutation rates for EGFR and KRAS were associated with smoking status, confirming the accuracy of smoking history classifications. EGFR mutations (exons 19 and 21) were more frequent in NS (17/30 NS versus 1/39 CS, Fisher's Exact test, p = 2.8 x 10^-7) while KRAS mutations were more frequent in CS (3/30 NS versus 24/39 CS, Fisher's Exact test, p = 1.4 x 10^-5) (Figure 3.1a). Tumours arising in Asian patients had significantly more EGFR mutations than those arising in Caucasians (15/23 Asian versus 3/45 Caucasian, Fisher's Exact test, p = 5.7 x 10^-7) while Caucasians had significantly more KRAS mutations than Asians (2/23 Asian versus 25/45 Caucasian, Fisher's Exact test, p = 1.8 x 10^-4). There was no difference in mutation rates in females compared to males. A Pearson correlation analysis confirmed these associations. EGFR mutations were negatively correlated with CS (Pearson's r = -0.61) while KRAS mutations were positively correlated with CS (Pearson's r = 0.52).
EGFR mutations were also positively correlated with Asian ethnicity (Pearson's r = 0.63) and negatively correlated with KRAS mutations (Pearson's r = -0.48). Smoking status was also correlated with race (Pearson's r = -0.68), as our NS cohort was composed predominantly of Asians.

EGFR and KRAS mutation data were also available for the MSKCC lung AC tumours; consistent with the BCCA tumours, EGFR mutations were more prevalent in NS (23/41 NS versus 0/25 CS, Fisher's Exact test, p = 6.3 x 10^-7) and KRAS mutations were more prevalent in CS (5/41 NS versus 8/25 CS, Fisher's Exact test, p = 0.06) (Figure 3.1b). There were no significant associations between EGFR or KRAS mutations and gender in the MSKCC dataset, and ethnic information was not available for analysis. In both the BCCA tumours and the MSKCC tumours, EGFR and KRAS mutations were mutually exclusive. Given the consistency of these data with the literature, we concluded that smoking histories for the BCCA and MSKCC lung tumours were accurate, ensuring we had appropriate samples to perform a CS versus NS comparison.

Figure 3.1 EGFR and KRAS mutation frequencies in the BCCA and MSKCC lung AC cohorts. EGFR and KRAS mutation frequencies in our BCCA cohort (A) and the MSKCC cohort (B) were consistent with the patterns reported for CS and NS in the literature. EGFR mutations were significantly associated with NS while KRAS mutations were significantly associated with CS (Fisher's Exact test, p < 0.05 indicated by an asterisk).

3.3.3 Genomic landscape of copy number alterations in CS and NS

Copy number profiles were generated for each BCCA lung tumour by performing genomic segmentation with stringent significance thresholds to ensure alterations called were non-random genetic events. All CNAs identified were somatic events as opposed to germ-line variants, as each profile was generated using matched non-malignant lung tissue as a reference.
The frequency of CNAs throughout the genome (calculated using a moving average window of 500 SNP array probes) for the 39 CS and 30 NS is depicted in Figure 3.2. Upon plotting the frequencies of alteration at each locus in the genome, we noted that frequencies appeared to differ between CS and NS, despite the similar distribution of CNAs throughout the genome. To quantitatively assess this observation, we calculated the fraction of each tumour genome that was encompassed by CNAs and termed this measure proportion of genome altered (PGA). Comparison of PGA between CS and NS revealed that NS indeed have a greater PGA than CS (Student's t-test, p = 0.03) (Figure 3.3a, Table 3.1). Although there was no significant difference between the fraction of genomes encompassed by copy gains, NS genomes had a larger fraction affected by DNA losses than CS (Student's t-test, p = 0.02) (Table 3.1). PGA was also greater in EGFR mutant (n=18) versus EGFR wild type (n=51) tumours (Student's t-test, p = 4.0 x 10^-3) (Figure 3.3c). To address the possible influences of the mutation and ethnic imbalances in our cohort, we compared PGA in EGFR mutants (n=17) versus EGFR wild types (n=13), and in Asians (n=21) versus non-Asians (n=9), within NS tumours only. There was no significant difference in PGA between mutant and wild type EGFR NS tumours (Student's t-test, p = 0.21). There was also no significant difference in PGA in Asian versus non-Asian NS tumours (Student's t-test, p = 0.06).

To verify our observations were not limited to the BCCA tumours alone, we investigated publicly available data for CS and NS lung ACs from the MSKCC and lung TSP, respectively 32,72. Genomic profiling in these studies was performed using aCGH (MSKCC) and SNP arrays (TSP). Thus, we employed platform-appropriate methodologies to generate copy number profiles for each tumour.
In the MSKCC dataset, for which EGFR and KRAS mutation frequencies were consistent with smoking status classifications (Figure 3.1b), we observed the same global pattern of CNAs; NS had greater PGA than CS (Student's t-test, p = 4 x 10^-3), validating our findings (Figure 3.3b). NS tumour genomes had a larger fraction affected by DNA gains (Student's t-test, p = 0.01) and DNA losses (Student's t-test, p = 0.01) than CS. EGFR mutant tumours also exhibited greater PGA than EGFR wild type tumours (Student's t-test, p = 3 x 10^-4), consistent with our observations in the BCCA tumours (Figure 3.3d).

Figure 3.2 Genomic landscape of lung AC from CS and NS. The frequency of DNA CNAs throughout the genome was plotted for CS (A) and NS (B) for our BCCA lung AC cohort. Across the genome, NS exhibited higher frequencies of CNA. Frequencies displayed were calculated using a moving average window of 500 SNP array probes. Differentially altered regions with a Fisher's Exact test p < 0.05 are indicated in (C), with alterations more prominent in CS in blue and in NS red. Dotted lines represent chromosome centromeres.

Figure 3.3 PGA in the BCCA and MSKCC lung AC cohorts. PGA was greater in NS than CS in both cohorts (Student's t-test, p < 0.05) (A,B). PGA was also significantly greater in tumours with EGFR mutations compared to tumours with wild type EGFR in both cohorts (Student's t-test, p < 0.05) (C,D). Asterisks indicate p-values < 0.05. Only one mutant EGFR case was a CS in the comparison illustrated in (C).
We also interrogated the TSP dataset as an additional validation cohort of lung ACs from CS and NS, albeit acknowledging we were unable to confirm smoking histories with mutation information. Although it did not meet statistical significance, PGA in the TSP tumours was higher in NS than in CS (Student's t-test, p = 0.13). The reproducibility of our results in two additional datasets that were derived using different genomic profiling platforms, from independent research centers is evidence for our finding that large-scale differences in genomic landscapes exist between CS and NS tumours. Moreover, consistent with our results, EGFR mutations were also associated with higher PGA (Figure 3.3c,d), which was expected given that NS lung tumours and EGFR mutations are highly correlated.

Table 3.1 Summary of PGA in lung AC of CS and NS

PGA Type                     Measure  All Tumours  CS           NS           P-val
Copy Gains                   Average  0.19         0.17         0.21         0.06
                             Median   0.19         0.17         0.22
                             Range    0.02 - 0.36  0.03 - 0.35  0.02 - 0.36
Copy Losses                  Average  0.17         0.15         0.20         0.02
                             Median   0.16         0.14         0.20
                             Range    0 - 0.43     0 - 0.32     0 - 0.43
All Copy Number Alterations  Average  0.36         0.32         0.41         0.03
                             Median   0.37         0.29         0.41
                             Range    0.02 - 0.72  0.03 - 0.66  0.02 - 0.72

3.3.4 Smoking is the clinical variable most strongly associated with observed differences

To further determine whether the observed genome differences were associated with clinical parameters other than smoking status and to account for the ethnic bias in our cohort, we investigated the PGA in tumours as a function of all clinical and genetic variables available for our BCCA tumours (stage, gender, age, smoking history, race, and EGFR and KRAS mutation status). A MANOVA analysis revealed that smoking status explained the greatest amount of variance observed in PGA (F = 3.64, p = 0.06) compared to all other factors, followed by EGFR mutation (F = 2.49, p = 0.12) (Table 3.2). Since NS tumours often harbour EGFR mutations this finding was not surprising.
We also performed a multivariate analysis in NS tumours only, and found that EGFR mutations explained the greatest amount of variance in the NS tumours alone (F = 3.22, p = 0.09), although it was less significant than the association of smoking status and PGA in the entire tumour dataset.

Table 3.2 MANOVA results for associations between PGA and clinical variables

Variable   Df  Sum Squares  Mean Squares  F-value  Pr(>F)
Stage      3   0.08302      0.027674      0.7237   0.54185
Gender     1   0.00512      0.005124      0.134    0.71561
Age        1   0.02771      0.027708      0.7246   0.39807
Smoking    1   0.13933      0.139329      3.6438   0.06115
EGFR       1   0.09534      0.095341      2.4934   0.11967
KRAS       1   0.00581      0.005811      0.152    0.69805
RACE       1   0.02434      0.024336      0.6364   0.4282
Residuals  59  2.25601      0.038237      -        -

Df = degrees of freedom

3.3.5 Genomic alterations common to CS and NS

We observed that overall, the global distribution of copy number events identified in smokers and NS in our cohort were similar and consistent with those previously described in the literature 21,25,27,32,72,74,75. For example, recurrent copy number gains on chromosomes 1q, 5p, 7p, 8q and 17q were prominent in both groups (frequency > 30% in CS and NS) and included known oncogenes such as ARNT, TERT, EGFR, MYC, and ERBB2 (Figure 3.4). We observed concurrent mutation and copy number gains of EGFR in 10 of 17 NS with EGFR mutation and in the one CS with an EGFR mutation, consistent with previous reports of mutation accompanied by DNA amplification 76. However, there was no significant difference in the occurrence of EGFR copy number gains in EGFR mutant versus wild type tumours (Fisher's Exact test, p = 0.17). Common regions of copy number loss (frequency > 20% in both groups) included chromosomes 3p, 6q, 8p, 9p, 17p and 19p, and encompassed known tumour suppressors including FHIT, CDKN2A, TP53, and LKB1 (Figure 3.4). A Fisher's Exact test to compare alteration frequencies of these known genes in CS and NS revealed that none were significantly different between the two groups.
The similarity in disruption to common lung AC genes in CS and NS highlights the need to identify novel genomic aberrations that underlie the distinct clinical phenotypes exhibited by CS and NS.

Figure 3.4 Genes commonly disrupted in CS and NS. Frequencies of commonly disrupted genes in lung AC were investigated in our BCCA cohort of CS and NS. Although these genes were frequently disrupted in our dataset, none were significantly differentially altered between CS and NS. Fisher's Exact test p-values comparing frequencies in CS and NS are indicated.

3.3.6 High level DNA alteration patterns in CS and NS

We next sought to determine whether the high-level DNA alteration profiles of CS and NS differed. High-level changes were defined as alterations with magnitudes of change exceeding one copy and/or alterations that are significantly prevalent. Such alterations may indicate the presence of important oncogenes (amplifications) or tumour suppressors (deletions).

We used the GISTIC algorithm to identify significant regions of focal DNA amplification and deletion 71. Using this approach we identified 107 events in CS and 50 events in NS (tables listing regions are published online 68). These findings suggest that although NS have a greater PGA, the lung tumour genomes of CS harbour more high-level DNA alterations than NS. A total of 27 regions overlapped in CS and NS. Of these, 13 were altered in the same direction (amplifications at 5p15.32, 7q11.21, 8p11.1, and 19q12; and deletions at 3p12.3, 5q31.3, 6q12, 6q16.1, 9p21.3, 10q21.1, 11q14.3, 19q12, and 22q13.31) while 14 were altered in opposite directions in CS and NS (7q11.21, 8p11.1, 19q12, 2q33.3, 5q31.3, 7p15.2, 8p23.3, 8q24.3, 9p21.3, 9q33.3, 10q21.1, 16q24.1, 18q23, 19q12).
Some regions were large enough that two individual regions in tumours with the opposite smoking status mapped within them, producing both same and opposite direction alterations for some cytobands (7q11.21, 8p11.1, 19q12, 5q31.3, 9p21.3, 10q21.1 and 19q12). High-level DNA amplifications at 5p15.32 and 19q12 and deletion at 9p21.3 have been reported in lung AC previously 32. The most frequent regions identified by GISTIC in both CS and NS are summarized in Table 3.3.

3.3.7 Differentially altered regions in CS and NS

In addition to observing a global difference in PGA between CS and NS and differences in high-level alteration profiles, several genomic regions were found to have different alteration frequencies between the groups. For example, copy number gains of chromosome 1q were more frequent in CS, while gains on 5q, 7p, 16p and chromosome X were more frequent in NS (Figure 3.2). Additionally, chromosome 3p, 8p, 13q, 17p and 19q losses were more prevalent in NS. To investigate these regions using a statistical approach, we collapsed the CS and NS genomes into discrete regions (as described in Methods section 3.2.4) and compared the frequencies of alteration. We identified 313 genomic regions spanning chromosomes 1-22 and chromosome X that met our criteria for differential alteration status in the BCCA tumours. These regions included both gains and losses specific to NS and CS.

Table 3.3 High level CNAs common to CS and NS lung AC identified by GISTIC

                       Never Smokers                                   Current Smokers
Event Cytoband  Chr    Start      End        Size  q-Val  Freq. HLE    Start      End        Size  q-Val  Freq. HLE
Amp   7q11.21   chr7   57940128   62094909   4.15  0.001  0.27  7/30   57940128   62098219   4.16  0.000  0.23  9/39
Amp   19q12     chr19  24256798   32779216   8.52  0.009  0.23  6/30   24165484   32819473   8.65  0.000  0.26  9/39
Amp   20q13.33  chr20  59185038   62426596   3.24  0.042  0.20  5/30   59283093   62426596   3.14  0.004  0.15  6/39
Del   10q21.1   chr10  58736845   58775582   0.04  0.000  0.37  3/30   58742794   58775016   0.03  0.003  0.21  2/39
Del   11q14.3   chr11  89855999   90686553   0.83  0.003  0.30  0/30   87251347   88115585   0.86  0.014  0.18  1/39
Del   22q13.31  chr22  44621024   45245124   0.62  0.016  0.27  1/30   44424566   44631074   0.21  0.015  0.18  1/39
Del   3p12.3    chr3   76548884   77681632   1.13  0.000  0.37  2/30   76792826   78564868   1.77  0.039  0.15  1/39
Del   4p16.2    chr4   3186614    3520414    0.33  0.011  0.27  1/30   3379182    3520414    0.14  0.026  0.18  1/39
Del   4q13.1    chr4   65154749   65197754   0.04  0.004  0.27  2/30   63422139   63516496   0.09  0.024  0.15  1/39
Del   5q31.3    chr5   140141350  140614343  0.47  0.004  0.23  1/30   140503546  140610282  0.11  0.035  0.15  1/39
Del   6q12      chr6   67706678   67758736   0.05  0.037  0.23  1/30   67179217   68454950   1.28  0.003  0.23  0/39
Del   6q16.1    chr6   94018754   95135848   1.12  0.002  0.30  1/30   94527588   94644289   0.12  0.030  0.18  0/39
Del   9p21.3    chr9   20538755   28805650   8.27  0.000  0.53  2/30   21944954   22008780   0.06  0.002  0.21  1/39

Amp = amplification, Del = deletion, HLE = High Level Events. Size is given in Mbp. Genomic coordinates presented are those for the March 2006, hg18 genome build.

To identify regions of concordance across multiple datasets, we looked for differentially altered regions in the MSKCC tumours that overlapped with those identified in our BCCA dataset. Of the 68 regions that were differentially altered in the MSKCC data set between CS and NS, our analysis revealed 21 distinct regions, spanning 9 different chromosomes, that overlapped with those identified in our BCCA tumours. Three regions, all copy number gains on chromosome 1q, were specific to CS, while the remaining 18 regions, comprising 9 copy number gains and 9 copy number losses, were specific to NS.
The discrepancy in number of regions identified within each dataset is likely due to the higher density and better detection ability of the SNP array platform used for our BCCA analysis; the MSKCC copy number data were generated using lower resolution 4x44K aCGH arrays and the TSP data were generated using lower density SNP 250K arrays. Nevertheless, validation of the BCCA regions in an external dataset with well documented smoking histories demonstrates that in addition to exhibiting global differences in copy number patterns, CS and NS exhibit regional genomic differences as well. A list of the minimal common regions (MCRs) of differential alteration shared by the BCCA and MSKCC lung tumours is provided in Table 3.4.

We also applied this comparison to an additional cohort (TSP) to identify the most prominent differentially altered regions between CS and NS in the combined datasets, acknowledging that these stringent criteria may overlook some regions. This analysis revealed six MCRs concordant in all three independent datasets. All of these regions were copy number gains specific to NS, and included two regions on chromosome 5q, three regions on chromosome 7p, and one region on chromosome 16p, which encompassed a total of 13 genes (Table 3.5, Figure 3.5). The smallest MCR defined was the region at 5q33.3, which was 179 kbp in size, while the largest MCR defined was 16p13.3-13.2, which was 3.6 Mbp. Interestingly, the frequency of copy number gains at 5q33.3 and 5q34 in CS was zero in every dataset. Gains on chromosome 7p (7p14.1 and 7p12.3) were on average 20% more frequent in NS than in CS across the three datasets, as were gains at 16p13.3-13.2. The NS-associated lung cancer oncogene EGFR did not map to the MCRs on chromosome 7p; however, we observed a significant association between both 7p12.3 DNA gains and EGFR gains and mutations (Fisher's Exact test, p = 0.002 and p = 0.010).
This finding is consistent with the fact that EGFR function can be influenced by multiple genetic mechanisms.

Table 3.4 Differentially altered regions common in the BCCA and MSKCC cohorts

Cytoband       Chr  Start (bp)  End (bp)   Size (bp)  Status
1q21.1         1    143628352   143921538  293186     Gain in Smokers
1q21.1         1    146866582   146938873  72291      Gain in Smokers
1q21.3         1    149678259   151582433  1904174    Gain in Smokers
5q33.3         5    158495214   158674554  179340     Gain in NS
5q34           5    162876149   165700950  2824801    Gain in NS
7p21.2         7    13416032    13719486   303454     Gain in NS
7p15.3-p15.2   7    24876559    25666836   790277     Gain in NS
7p14.1         7    40005232    41936324   1931092    Gain in NS
7p12.3         7    46603499    46892619   289120     Gain in NS
7p12.3         7    48447956    49667305   1219349    Gain in NS
8p23.1         8    9933149     10550456   617307     Loss in NS
8p21.2         8    25009640    25349074   339434     Loss in NS
8p21.1         8    28103525    29457984   1354459    Loss in NS
8p12           8    38180851    38440527   259676     Loss in NS
9q33.2         9    122816481   122966826  150345     Loss in NS
10q11.21       10   42955951    43010057   54106      Loss in NS
13q12.11       13   18601703    18653178   51475      Loss in NS
16p13.3-p13.2  16   5278276     8866443    3588167    Gain in NS
16p12.1-p11.2  16   27313417    27917742   604325     Gain in NS
18q21.2        18   46480704    46480763   59         Loss in NS
18q22.3        18   69959421    69976724   17303      Loss in NS

Genomic coordinates presented are those for the March 2006, hg18 genome build.

Table 3.5 Differentially altered regions common to the BCCA, MSKCC and TSP cohorts

Cytoband      Chr  Start      End        Size     Status   Genes in MCR
5q33.3        5    158495214  158674554  179340   Gain NS  IL12B, RNF145, UBLCP1
5q34          5    162876149  165700950  2824801  Gain NS  MAT2B
7p14.1        7    40455608   41936324   1480716  Gain NS  C7ORF10, INHBA
7p12.3        7    46603499   46892619   289120   Gain NS  -
7p12.3        7    48447956   49667305   1219349  Gain NS  ABCA13
16p13.3-13.2  16   5278276    8866443    3588167  Gain NS  A2BP1, ABAT, C16ORF68, CARHSP1, PMM2, TMEM186

Genomic coordinates presented are those for the March 2006, hg18 genome build.
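Mapping a minimal common region amounts to intersecting the differentially altered intervals found in each dataset. The Python sketch below illustrates this for the 7p14.1 gain in NS; it is an assumption-laden illustration and not the procedure used to produce Tables 3.4 and 3.5. The function name is hypothetical, and the TSP interval is hypothetical as well (per-cohort TSP coordinates are not listed here); it was chosen so that the result matches the 7p14.1 row of Table 3.5.

```python
def minimal_common_region(intervals):
    """Intersection of (start, end) intervals on one chromosome.

    Returns the shared (start, end) span, or None when the intervals
    do not all overlap.
    """
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start < end else None

# 7p14.1 gain in NS. The first interval is the BCCA/MSKCC overlap from
# Table 3.4; the TSP interval is hypothetical, chosen for illustration.
bcca_mskcc_7p14 = (40005232, 41936324)
tsp_7p14_hypothetical = (40455608, 42500000)

mcr = minimal_common_region([bcca_mskcc_7p14, tsp_7p14_hypothetical])
mcr_size = mcr[1] - mcr[0]  # size in bp, computed as end - start
```

The size convention (end minus start) matches the Size columns of Tables 3.4 and 3.5, for example 143921538 - 143628352 = 293186 bp for the first 1q21.1 region.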
Figure 3.5 MCRs of overlap for the six regions common to all three datasets. Copy number segments for each dataset and each MCR are illustrated (A). MCRs are defined by hashed lines and base pair positions are indicated. Disruption frequencies of these regions in each dataset are shown in (B), with Fisher's Exact test p-values indicated. All 6 regions are copy gains specific to NS tumours. Genomic coordinates presented are those for the March 2006, hg18 genome build.

3.4 Discussion

3.4.1 Summary of findings

In this study we sought to elucidate global genomic differences in lung ACs from NS and CS. Using a genome-wide comparison approach, we discovered that NS lung tumours have a greater proportion of their genomes altered than CS, we identified both disparate and common regions of genomic alteration in the tumour genomes of these two groups, and we validated our findings in two independent external cohorts from the Memorial Sloan Kettering Cancer Centre (MSKCC) and Tumour Sequencing Project (TSP). Importantly, EGFR and KRAS mutations in the BCCA and MSKCC tumours segregated with NS and CS, respectively, consistent with the reported literature. This confirmed the accuracy of our smoking status classifications and validated that these tumours were appropriate to perform a CS versus NS genome comparison.

3.4.2 Comparison of global genomic differences observed with findings in other studies

Other studies assessing the difference in genomic landscapes of ever smokers (including current and former smokers) and NS using lower resolution copy number profiling strategies reported that CS have more CNAs than NS 27,77. These studies quantified the number of copy number events as opposed to measuring PGA as in our study.
Differences in the clinical features of the tumour samples assessed, in analysis methods, and in the copy number detection platforms used are possible reasons for the discrepancy in findings between our studies.

Furthermore, even though our observation held true in independent datasets, we are mindful of the fact that the contributions of mutational and smoking status cannot be distinguished in our study due to the caveat that our CS and NS groups were not balanced for ethnicity or mutation (which is not surprising given the known clinical and molecular features associated with smoking status). Despite this caveat, our multivariate analysis suggested smoking history was the clinical variable most strongly associated with this observed difference. Nevertheless, it remains a possibility that PGA is associated with mutation, as PGA between CS and NS with no mutations was not significantly different. Our study was not powered to test the effects of smoking or mutation type on PGA while adjusting for all other confounding factors, because smoking and race are correlated with EGFR and KRAS mutation. A non-confounded comparison would require large numbers of patients in each smoking/mutation/race combination to decipher whether PGA differences are truly related to smoking status or if they are the result of mutational or ethnic status.

3.4.3 Concordant and disparate genomic alterations identified in CS and NS and comparison with other studies

Amidst the genomic instability observed in the lung AC tumours, we identified frequent genomic alterations whose recurrent nature signifies their selection in lung tumour genomes. The most frequently altered region detected in our 69 tumours was gain of 5p15.32-15.33 (51% of tumours), which harbours the hallmark cancer gene TERT. Gain of 5p was also the most common genomic alteration observed by Weir et al. in a collection of over 350 lung AC tumours 32.
Other regions concordant in CS and NS included recurrent copy gains on 1q, 7p, 8q and 17q and losses on 3p, 6q, 8p, 9p, 17p and 19p which are prominent regions in lung AC 21,25,27,32,72,74,75. These regions harbour known lung cancer oncogenes and tumour suppressors and their presence in our cohort indicates our samples are representative of lung AC.

Having established that regions commonly altered in lung AC were not associated with smoking status, we proceeded to determine whether CS and NS lung cancer patients exhibit distinct genomic features. We hypothesize that the differentially altered regions that exist may play a role in the distinct clinical phenotypes observed in CS and NS. We identified six MCRs of copy number gain in NS across the three cohorts we studied (5q33.3, 5q34, 7p14.1, two at 7p12.3, and 16p13.3-13.2). Broet et al. described regions differentially altered in East-Asian and Western European lung AC but none overlapped with the smoking-related regions we identified, suggesting our regions are not ethnic specific 21. We cross referenced our differentially altered regions with regions of copy number gain and loss reported in another study of NS lung AC 25, and validated our finding of copy gains at 5q, 7p and 16p in NS. The two robust regions of gain at 5q33.3 and 5q34 that we identified are consistent with other studies which reported 5q gain in NS lung AC 21,25,27. Our analysis revealed three distinct MCRs on 7p; however, none encompassed the lung cancer oncogene EGFR, located on chromosome 7p11.2. The closest region (7p12.3) was situated 5.4 Mbp telomeric of the EGFR locus. The presence of these MCRs could imply that additional oncogenes are responsible for 7p gains, as previously suggested 78. Preferential copy gain at 16p in NS is one of the most consistently replicated NS-specific genetic alterations discovered to date, which implicates this region in NS tumour biology 15,21,25,41,44.
We and others also observed an association between 16p gain and Asian ethnicity; however, this could reflect the fact that a large fraction of NS lung cancer patients are of Asian descent 15,21.

3.4.4 Implications of genomic differences observed between CS and NS

While many known regions of copy number alteration in lung AC were present in both our CS and NS cohorts, our results, along with the well established differences in mutational profiles and clinical features, suggest lung tumours of CS and NS develop through different molecular mechanisms. This may be similar to what has been observed in ovarian cancer, where Type I serous ovarian cancers are typically chromosomally stable and harbour mutations in the Ras signaling pathway, while high-grade serous ovarian cancers (Type II) are RAS wild type and exhibit widespread copy number aberrations 79. Intriguingly, Sidransky and co-workers recently discovered that NS lung AC genomes have a greater number of mitochondrial DNA alterations than CS 46. This finding is consistent with our discovery, providing additional evidence to support the concept that lung cancers in CS and NS are driven by different molecular alterations. We postulate that NS lung tumours acquire specific genetic alterations early in tumourigenesis that compromise genome integrity. For example, we hypothesize that NS could be inherently predisposed to genomic instability, or they could be exposed to non-tobacco related carcinogens that drive genomic instability. Elucidation of the precise mechanism driving this instability phenotype could potentially lead to targeted therapy for NS patients, or to identification of NS at risk of lung cancer development.

Our findings provide a novel line of evidence towards genetic differences between CS and NS lung tumours, namely, that the extent of segmental genomic alterations is greater in NS tumours.
Collectively, our observations demonstrate that these lung tumours are globally and genetically different, which implies they are likely driven by distinct molecular events. Although the biological mechanism underlying our observations in NS remains unknown, elucidation of this mechanism is crucial to early detection and possibly treatment of these patients, as no known risk factors or molecular features exist to assess lung cancer risk in NS besides family history.

4 Chapter: DNA methylation differences in lung AC of CS and NS

4.1 Introduction

4.1.1 DNA methylation and regulation of gene expression

DNA methylation is the addition of a methyl group to cytosines in CpG dinucleotides, generating 5-methylcytosine (5mC). Concentrated regions of CpGs are known as CpG islands, and are typically associated with gene promoters. DNA methylation is an important regulator of gene expression. In normal cells, the promoters of genes expressed in a particular cell type are not methylated, which enables transcription, while gene silencing mediated by promoter methylation occurs at a small fraction of promoters to ensure specific gene expression patterns throughout development and in different differentiated tissues 80. A lack of DNA methylation in CpG islands creates an open chromatin structure that is amenable to binding of factors required for transcription. Conversely, the presence of methylation can repress transcription by blocking access of required proteins to DNA. Methylated DNA recruits methyl-binding domain proteins which themselves recruit factors responsible for remodelling chromatin into a compact state not amenable to transcription 80.

Methylation can also occur at CpG island shores and within gene bodies. CpG island shores are found within 2 kbp of CpG islands and have a lower CpG density than islands.
While methylation of shores is associated with reduced gene expression, methylation in gene bodies is associated with active gene expression 80,81. In addition to its function in regulating gene expression, DNA methylation is involved in maintaining genome stability. CpG methylation is prominent in repetitive sequences of the genome where it serves to prevent activation of transposable elements which can lead to chromosomal instability 80. Thus, the effects of DNA methylation are dependent upon the location of methylation in the genome.

4.1.2 Aberrant DNA methylation in cancer and clinical significance

Disruption of DNA methylation is thought to be an early event in tumourigenesis, as aberrant methylation has been observed in various premalignant lesions and in non-malignant tissues adjacent to tumours 12,80. Tumour epigenomes are typically characterized by global hypomethylation and focal hypermethylation at specific CpG islands and shores. These events can contribute to destabilization of repetitive elements and consequential genomic instability, as well as activation of oncogenes and silencing of tumour suppressor genes (TSGs) 80.

Many studies have revealed great potential for the utility of aberrant DNA methylation marks as clinical biomarkers 82,83. Specific hypermethylation events can be measured in readily available patient samples, such as blood, stool and urine, to assess cancer risk and for cancer detection. Methylation marks also have prognostic utility and can predict response to chemotherapy 82,83. Because DNA methylation is a reversible process, development and application of demethylating agents is a promising therapeutic strategy for cancers characterized by hypermethylation of TSGs.
Encouragingly, two inhibitors of DNA methyltransferases (the enzymes responsible for adding methyl groups to cytosine) have been approved for clinical use in myelodysplastic syndrome and several others are currently being evaluated in clinical trials for use in solid tumours 84.

4.1.3 Rationale for assessing DNA methylation in lung cancer

Like other malignancies, lung cancer exhibits DNA methylation changes affecting several genes with known roles in tumourigenesis, demonstrating its involvement in lung cancer biology 83. Based on the idea that the disparate clinical phenotypes exhibited by CS and NS lung cancer patients are driven by distinct molecular mechanisms, we performed genome wide DNA methylation profiling of lung AC epigenomes from CS and NS to determine whether distinct DNA methylation patterns exist between these two groups. We hypothesize that CS and NS lung tumour epigenomes harbour differentially methylated loci, resulting in the differential deregulation of genes that could underlie the observed differences in tumour biology of these two groups.

4.2 Methods

4.2.1 DNA methylation arrays

Genomic DNA from tumour and matched non-malignant lung tissues for 34 CS and 30 NS was bisulfite converted and hybridized to Illumina Infinium Human Methylation (HM) 27 arrays as previously described 53. Data normalization produced a log2 transformed M-value for every CpG assayed on the array. M-values were subsequently transformed into β-values, which represent the proportion of methylation at a particular CpG site (β-value = methylated signal / total signal (methylated + unmethylated)) 54,55. Probes with a detection p-value > 0.05 were excluded from the methylation analysis.

4.2.2 Aberrantly methylated genes and differentially methylated regions

Differences in the DNA methylation profiles of 34 CS and 30 NS were investigated to identify aberrantly methylated genes and differentially methylated regions.
For every CpG probe, the delta β-value (dβV) was calculated to determine the difference in methylation between each tumour and its matched non-malignant tissue (dβV = tumour β-value minus non-malignant β-value). Probes with a dβV > 0.2 were considered hypermethylated in the tumour while probes with dβV < -0.2 were considered hypomethylated. The frequency of hyper- and hypomethylation was calculated for each probe across the CS and NS groups. A Student's t-test was used to investigate differences in the number of aberrantly methylated probes between the groups, with a p-value < 0.05 considered significant. A multifactor ANOVA was performed to assess whether smoking status was the clinical variable most strongly associated with the observed differences in DNA methylation patterns considering all clinical data that were available (stage, gender, age, smoking status, race, and EGFR and KRAS mutation status). Factors with a MANOVA p-value < 0.05 were considered significantly associated with aberrant DNA methylation.

To investigate the presence of differentially methylated regions (DMRs) in CS and NS epigenomes, the moving average of hyper- and hypomethylation frequencies was calculated for each group using windows of 30 adjacent methylation probes. Windows with a frequency of hyper- or hypomethylation of at least 20% in either CS or NS were manually inspected to identify individual probes with recurrent methylation. Methylated "regions" were defined as i) having a minimum of 10 probes with at least 70% of those probes displaying concordant methylation changes in at least 20% of either CS or NS samples, ii) having an average frequency of aberrant methylation of at least 20% across all probes in the region in one or both groups, and iii) including at least 5 different genes. These criteria were implemented to identify the most robust candidate DMRs.
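The per-probe calls and the windowed frequencies described above can be sketched in a few lines of Python. This is a minimal illustration and not the analysis code used for this thesis: the function names and the toy β-values are hypothetical, and only the dβV cut-offs (> 0.2 and < -0.2) and the 30-probe window come from the text. How windows at the ends of the probe list were treated is not stated, so the sketch simply omits incomplete windows, which is an assumption.

```python
def delta_beta(tumour_beta, normal_beta):
    """dBV for one probe: tumour beta-value minus matched non-malignant beta-value."""
    return tumour_beta - normal_beta


def classify_probe(dbv, threshold=0.2):
    """Call a probe 'hyper', 'hypo' or 'none' using the +/-0.2 dBV cut-offs."""
    if dbv > threshold:
        return "hyper"
    if dbv < -threshold:
        return "hypo"
    return "none"


def call_frequency(calls_per_sample, state):
    """Fraction of samples in a group in which one probe has the given state."""
    hits = sum(1 for call in calls_per_sample if call == state)
    return hits / len(calls_per_sample)


def moving_average(values, window=30):
    """Mean of each run of `window` adjacent probes (ordered by genomic position).

    Incomplete windows at the ends are dropped (an assumption of this sketch).
    """
    if len(values) < window:
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


# Hypothetical beta-values for one probe in three tumour / non-malignant pairs.
pairs = [(0.75, 0.25), (0.5, 0.4), (0.1, 0.6)]
calls = [classify_probe(delta_beta(t, n)) for t, n in pairs]
hyper_freq = call_frequency(calls, "hyper")
```

Applying `call_frequency` to every probe within the CS group and within the NS group, and then passing the resulting per-probe frequencies to `moving_average`, would give the kind of smoothed frequency profile that was inspected for candidate regions.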
Once the methylated regions were defined they were systematically reviewed to determine whether they were commonly or differentially affected in CS and NS. If 60% of the probes comprising the region showed concordant aberrant methylation frequencies over 20% in CS and NS, the region was considered aberrantly methylated in both groups. DMRs were defined as regions for which only one group displayed frequent (>20%) aberrant methylation for at least 60% of the probes in the region, and the difference in methylation frequency of hyper- or hypomethylation between the groups exceeded 15%. To identify differentially methylated genes (DMGs), the frequencies of hyper- and hypomethylation were compared for all array probes between CS and NS using a 2x2 Fisher's Exact test. Since measurements for over 27,000 individual probes were compared, Benjamini-Hochberg (BH) multiple testing correction was performed and probes with a corrected p-value < 0.05 were considered significantly differentially methylated in CS and NS.

4.2.3 Validation of DNA methylation changes in external cohorts

Two external lung AC methylation datasets from the TCGA and GSE32866 were obtained and used as validation cohorts (Table 2.3). Level 3 (normalized and processed β-value data as described in the TCGA Data Primer, https://wiki.nci.nih.gov/display/TCGA/TCGA+Data+Primer) methylation profiles were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/) for 26 lung ACs; these included 16 CS and 10 NS samples. Four CS and four NS had patient matched tumour and non-malignant profiles, while 12 CS and 6 NS did not. GSE32866 methylation β-value data were accessed from the Gene Expression Omnibus (GEO) and included 4 CS and 4 NS with patient matched tumour and non-malignant profiles, and one NS without a matched non-malignant profile. dβVs for cases with matched tumour and non-malignant profiles were calculated as described earlier.
For tumours without patient matched non-malignant profiles, non-malignant profiles of the same smoking history were averaged and used as a reference for calculating dβVs in each dataset. Probes were defined as hyper- or hypomethylated as described in section 4.2.2 above. These datasets were generated using the Illumina HM27 DNA methylation array.

4.3 Results

4.3.1 Genome wide comparison of aberrant DNA methylation in CS and NS

DNA methylation profiles for each tumour were compared to those of patient matched non-malignant lung tissues, to identify aberrantly methylated probes. The frequency of aberrant methylation across the genome in CS and NS is depicted in Figure 4.1. Frequency as indicated was calculated by taking the moving average of hyper- and hypomethylation frequency across 30 probe windows for each group. Across the genome, the frequency of hypermethylation exceeded that of hypomethylation, with the exception of a few regions for which hypomethylation was more prominent (1q, 2p, 7q, 14q, 18q and 21q). This is not surprising given the fact that the majority of probes on the methylation array are located in gene promoters.

While the distribution of aberrant methylation across the genome was very similar between CS and NS, the frequency of hypermethylation appears to be greater in CS across the epigenome. To quantitatively determine whether any genome wide differences in methylation patterns exist between CS and NS, we compared the number of aberrantly methylated probes in each group (Table 4.1). This comparison revealed that CS have significantly more aberrant methylation than NS (Student's t-test, p = 0.01, Figure 4.2 a).

Figure 4.1 DNA methylation landscape of lung AC from CS and NS. The frequency of aberrant DNA methylation across the genome was plotted for CS (A) and NS (B) for our BCCA lung AC cohort.
Across the genome, hypermethylation was more prominent than hypomethylation. Frequencies displayed were calculated using a moving average window of 30 methylation array probes and CS exhibited greater aberrant DNA methylation overall. Several specific regions displayed very high frequencies of aberrant methylation. Dotted lines represent chromosome centromeres.

This observation held true when aberrant methylation was broken down into hypermethylation (p = 0.03) but not for hypomethylation (p = 0.09) (Figures 4.2 b, c). We also compared the amount of aberrant methylation between EGFR mutant and wild type tumours, and KRAS mutant and wild type tumours to assess whether our results were confounded by mutation status. No significant difference in the extent of aberrant methylation was observed in either of these comparisons, suggesting smoking and not mutation was the clinical variable most correlated with the amount of aberrant methylation observed (Figures 4.2 d, e).

Table 4.1 Summary of aberrant DNA methylation in lung AC of CS and NS

Measure                               All Tumours   CS          NS          P-val
Hypermethylation           Average    0.056         0.064       0.047
                           Median     0.048         0.060       0.045       0.031
                           Range      516-4160      539-4160    516-4049
Hypomethylation            Average    0.052         0.059       0.044
                           Median     0.040         0.042       0.038       0.092
                           Range      298-5014      298-5014    335-3973
All Aberrant Methylation   Average    0.108         0.123       0.091
                           Median     0.102         0.109       0.088       0.007
                           Range      837-7750      837-7750    1059-5329

Average and median refer to the proportion of total probes that are aberrantly methylated, and range refers to the number of aberrantly methylated probes.

Figure 4.2 Quantification of aberrant DNA methylation in CS and NS lung AC. For each tumour the number of aberrantly methylated probes was determined.
The numbers of (A) total aberrant probes, (B) hypermethylated probes, and (C) hypomethylated probes were compared in CS and NS tumours of our BCCA cohort, revealing a greater extent of total aberrant methylation and hypermethylation in CS (Student's t-test p < 0.05). Total aberrant methylation was also compared in EGFR mutant versus wild type (D), and KRAS mutant versus wild type tumours (E), but no significant differences were identified. Asterisks indicate p-values < 0.05.

4.3.2 Smoking is the clinical variable most strongly associated with the extent of aberrant methylation in lung AC genomes

To directly address the question of whether or not smoking status was the clinical variable most associated with the genome wide DNA methylation difference we observed, a MANOVA test was performed. Variance in the amount of aberrant DNA methylation observed was measured as a function of tumour stage, gender, age, smoking history, race, and mutation status. This analysis revealed that smoking status explained the greatest amount of variance in total aberrant DNA methylation observed (F = 8.41, p = 5.4 x 10^-3) compared to all other factors (Table 4.2). Smoking status was closely followed by KRAS mutation (F = 3.98, p = 0.05), which is not surprising given that KRAS mutations are characteristic of CS. Aberrant hypermethylation was most strongly associated with smoking status (F = 4.66, p = 0.04), while the extent of hypomethylation was most significantly associated with KRAS mutation (F = 6.01, p = 0.02) followed by smoking status (F = 3.45, p = 0.07). These multivariate analyses confirm that smoking history explains the greatest amount of variance in the amount of methylation alteration observed.
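As a quick arithmetic check on the F statistics quoted above, recall that in an ANOVA table each mean square is a sum of squares divided by its degrees of freedom, and the F statistic for a factor is its mean square divided by the residual mean square. The sketch below recomputes F for smoking status and KRAS mutation in the total aberrant methylation model from the sums of squares listed in Table 4.2. The helper name is hypothetical, and p-values are not recomputed because that would need an F-distribution routine that the Python standard library does not provide.

```python
def f_statistic(ss_factor, df_factor, ss_resid, df_resid):
    """F = (factor sum of squares / factor df) / (residual sum of squares / residual df)."""
    ms_factor = ss_factor / df_factor
    ms_resid = ss_resid / df_resid
    return ms_factor / ms_resid


# Sums of squares and degrees of freedom taken from Table 4.2
# (response: all aberrant methylation; residual df = 54).
RESIDUAL_SS, RESIDUAL_DF = 89_790_981, 54

f_smoking = f_statistic(13_981_636, 1, RESIDUAL_SS, RESIDUAL_DF)
f_kras = f_statistic(6_611_377, 1, RESIDUAL_SS, RESIDUAL_DF)

print(f"Smoking F = {f_smoking:.4f}")
print(f"KRAS    F = {f_kras:.4f}")
```

Both recomputed values agree with the tabulated F values to within rounding.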
Table 4.2 MANOVA results for the association of aberrant DNA methylation with clinical and genetic variables for the BCCA cohort

Response: All Aberrant Methylation
            Df   Sum Squares   Mean Squares   F-val    P-val
Stage        3     5,749,256      1,916,419   1.1525   0.33639
Gender       1       469,181        469,181   0.2822   0.59746
Age          1       327,138        327,138   0.1967   0.65914
Smoking      1    13,981,636     13,981,636   8.4085   0.00539
EGFR         1         1,424          1,424   0.0009   0.97676
KRAS         1     6,611,377      6,611,377   3.9761   0.05121
RACE         1       774,494        774,494   0.4658   0.49785
Residuals   54    89,790,981      1,662,796   -        -

Response: Hypermethylation
            Df   Sum Squares   Mean Squares   F-val    P-val
Stage        3       431,987        143,996   0.1824   0.90791
Gender       1        65,963         65,963   0.0836   0.77365
Age          1         1,255          1,255   0.0016   0.96834
Smoking      1     3,678,948      3,678,948   4.6599   0.03534
EGFR         1       180,074        180,074   0.2281   0.63487
KRAS         1        28,677         28,677   0.0363   0.84956
RACE         1       888,343        888,343   1.1252   0.29352
Residuals   54    42,632,841        789,497   -        -

Response: Hypomethylation
            Df   Sum Squares   Mean Squares   F-val    P-val
Stage        3     3,607,673      1,202,558   1.2517   0.30016
Gender       1       886,989        886,989   0.9233   0.3409
Age          1       368,926        368,926   0.384    0.53807
Smoking      1     3,316,567      3,316,567   3.4522   0.06862
EGFR         1       149,466        149,466   0.1556   0.69481
KRAS         1     5,769,207      5,769,207   6.0052   0.01753
RACE         1         3,902          3,902   0.0041   0.94942
Residuals   54    51,877,957        960,703   -        -

Df = degrees of freedom

4.3.3 Prominently methylated regions in lung AC and differentially methylated regions in CS and NS epigenomes

Our frequency plot revealed several striking regions of aberrant DNA methylation in the lung AC epigenomes we profiled (Figure 4.1). Thus, we next sought to characterize these regions and determine whether any differentially methylated regions (DMRs) exist between CS and NS.
Methylated regions were defined as i) having an average frequency of hyper- or hypomethylation of at least 20% in either group, ii) including at least 10 different array probes spanning at least 5 different genes, and iii) having at least 60% of the probes comprising the region displaying frequent (>20%) and concordant methylation changes in CS or NS. In our 64 lung ACs, we identified 16 methylated regions satisfying these criteria (Table 4.3). We noted that all but one of these regions spanned specific gene families (e.g. homeobox genes), suggesting aberrant DNA methylation at these loci could be selected to deregulate expression of genes with related functions.

We further investigated the 16 methylated regions to assess whether any were preferentially affected in either CS or NS. Five regions were commonly hypermethylated and 6 regions were commonly hypomethylated in CS and NS, and the remaining 5 regions were preferentially hypomethylated in CS (Table 4.3). Of the five candidate DMRs, only one region at 1q21.3 had a significantly different proportion of aberrantly methylated probes in the region (Fisher's Exact test, p = 4.8 x 10^-3). This region, preferentially hypomethylated in CS, contained genes involved in the epidermal differentiation complex, including S100A, LCE and SPRR genes 85.

4.3.4 Aberrantly methylated genes in lung AC of CS and NS

In addition to investigating regional patterns and differences in aberrant DNA methylation, we were interested in assessing the methylation levels of specific genes reported to be hyper- or hypomethylated in lung cancer. Based on the literature, we compiled a list of aberrantly methylated genes in lung cancer, including 13 hypermethylated and 6 hypomethylated genes 22,86. Concordant with previous studies, we observed hypermethylation of RARB, DLEC1, RASSF1, APC, SOX17, CDKN2A, and TIMP3, and hypomethylation of NAV1 in at least 20% of our 64 lung tumours.
We also observed hypermethylation of MGMT (17%), DAPK1 (14%), and GSTP1 (13%), and hypomethylation of ARNT (13%), AKT3 (9%), and ERBB2 (6%) at lower frequencies.

Table 4.3 Most prominent aberrantly methylated regions in lung AC

Region     CS Freq.  CS Freq.  NS Freq.  NS Freq.  Change-     # Genes   P-val  P-val  Gene Family
           Hypo      Hyper     Hypo      Hyper     Group       Affected  Hypo   Hyper
1q21.3     0.78      0.04      0.55      0.03      hypo CS     37        0.00   1.00   Epidermal differentiation complex
1q23.1     0.76      0.11      0.57      0.05      hypo CS     20        0.14   0.67   Fc receptor-like proteins and Class I antigen-presenting glycoproteins
2p12       1.00      0.00      1.00      0.00      hypo both   5         1.00   1.00   Regenerating islet-derived
2q31.1     0.00      1.00      0.00      1.00      hyper both  8         1.00   1.00   Homeobox D
6p22.1     0.00      0.71      0.00      0.67      hyper both  11        1.00   1.00   Histone H1
7p15.2     0.00      0.92      0.00      0.84      hyper both  10        1.00   0.67   Homeobox A
7q34       0.77      0.05      0.64      0.00      hypo both   11        0.51   1.00   None
8p23.1     0.80      0.00      0.55      0.00      hypo CS     9         0.18   1.00   Defensins
11p15.4    1.00      0.00      1.00      0.00      hypo both   9         1.00   1.00   Beta-like globins
11q12.1    0.72      0.06      0.61      0.00      hypo both   9         0.72   1.00   Membrane-spanning 4-domains subfamily A
12q13.13   0.00      1.00      0.00      0.75      hyper both  7         1.00   0.22   Homeobox C
14q32.13   0.75      0.00      0.69      0.00      hypo both   8         1.00   1.00   Serpin peptidase inhibitors, clade A
17q12      0.70      0.00      0.45      0.05      hypo CS     12        0.20   1.00   Chemokine (C-C motif) ligands
17q21.32   0.00      0.80      0.00      0.85      hyper both  8         1.00   1.00   Homeobox B
18q21.33   0.79      0.00      0.58      0.00      hypo CS     8         0.30   1.00   Serpin peptidase inhibitor, clade B
21q21.3    0.97      0.00      0.83      0.00      hypo both   23        0.19   1.00   Keratin associated proteins

hypo = hypomethylation; hyper = hypermethylation. The 5 regions preferentially hypomethylated in CS are labelled "hypo CS". P-val indicates the p-value for a Fisher's Exact test comparing the frequencies of hypo- and hypermethylation for each region in CS compared to NS.
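The two statistical steps used for the CS versus NS frequency comparisons, a 2x2 Fisher's Exact test followed by Benjamini-Hochberg (BH) correction as set out in section 4.2.2, can be illustrated with standard-library Python. This is a sketch under stated assumptions and not the code used for the analyses reported here: the function names are hypothetical, the two-sided p-value is taken as the sum of the probabilities of all tables no more likely than the observed one (a common convention; the text does not say which was used), and the counts in the example are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one.
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """BH-adjusted p-values, returned in the same order as the input."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Invented example: one probe hypomethylated in 12 of 34 CS and 2 of 30 NS tumours.
p_probe = fisher_exact_two_sided(12, 34 - 12, 2, 30 - 2)

# Invented raw p-values for four probes, corrected together.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Because the BH adjustment scales each p-value by the number of tests divided by its rank, correcting across all 27,578 array probes means a probe needs a very small raw p-value to remain below 0.05 after correction.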
We next asked whether any genes were differentially methylated in CS and NS, as DMGs could be involved in CS or NS-specific lung tumourigenesis. We compared the frequencies of hyper- and hypomethylation for all array probes and identified 1022 probes associated with 908 unique genes with a Fisher's Exact test p-value < 0.05. Of the 14 genes found to be aberrantly methylated in our lung AC dataset as described above, only CDKN2A was differentially methylated between CS and NS (Fisher's Exact test, p = 0.04). However, upon BH multiple testing correction for 27,578 statistical comparisons (the number of probes on the methylation array), only one probe corresponding to the CYP3A4 gene was differentially methylated between CS and NS (hypomethylated in CS, corrected p = 0.03).

4.3.5 Validation of observed differences in external cohorts

To determine whether our findings held true in additional datasets, we assessed DNA methylation patterns in external cohorts. Publicly available DNA methylation data for cohorts with annotated smoking history were limited to i) a dataset of 4 CS and 5 NS with patient matched tumour and non-malignant lung tissues (GSE32866) 87, and ii) a dataset of 16 CS and 10 NS tumours with mostly unmatched non-malignant methylation profiles produced by the TCGA (https://tcga-data.nci.nih.gov/tcga/) (Table 2.3).

We first looked to validate our finding that lung ACs from CS harbour a greater number of DNA methylation changes than NS. Comparison of CS and NS in the TCGA and GSE32866 cohorts revealed no significant difference in aberrant methylation levels between the groups in either dataset. On average, CS had more aberrant methylation than NS in the TCGA dataset (Student's t-test, p = 0.66) and the opposite trend was observed in the GSE32866 dataset (Student's t-test, p = 0.73). These observations may suggest that the differences we observed in overall amount of aberrant methylation between CS and NS were limited to our own cohort.
However, the validation datasets we assessed were limited by small sample size (GSE32866) and unmatched non-malignant reference samples (TCGA). Thus, for the probes assayed on the HM27 methylation array, we cannot conclusively determine whether or not CS exhibit a greater extent of aberrant methylation compared to NS.

Of the five candidate DMRs at 1q21.3, 1q23.1, 8p23.1, 17q12, and 18q21.33, none were preferentially hypomethylated in CS in the independent datasets using the stringent thresholds we applied to our own dataset. This could reflect the fact that the amount of aberrant methylation detected in the external datasets was lower than the amount detected in the BCCA cohort, which in turn, could be due to the fact that our tumour samples were micro dissected. On average, the numbers of aberrantly methylated probes detected per tumour were 2900, 2400, and 2000 in the BCCA, TCGA, and GSE32866 cohorts, respectively. Reducing the proportion of probes showing frequent (> 20%) aberrant methylation in the region of interest from 70% to 40% corroborated our finding of hypomethylation at 18q21.33 affecting the SERPINB genes in CS in the TCGA cohort. However, reducing this threshold in the GSE32866 cohort did not result in corroboration of any of the DMRs we identified, as the frequencies of aberrant methylation observed in this dataset were much lower and the sample size was small.

Finally, we investigated our candidate DMGs in the TCGA cohort. The GSE32866 dataset was insufficiently powered to detect DMGs due to its small sample size (4 CS versus 5 NS) and was therefore excluded for this analysis. CYP3A4, the only DMG that passed multiple testing correction in our dataset, was not differentially methylated in the TCGA cohort. We proceeded to examine the 820 candidate DMGs identified before multiple testing correction (uncorrected Fisher's Exact test p < 0.05), acknowledging that they could be false positive DMGs.
Nine genes hypomethylated in CS, 5 genes hypomethylated in NS, and one gene hypermethylated in CS were validated in the TCGA dataset (Table 4.4). These genes may be selected for deregulation in lung tumourigenesis through aberrant methylation in a smoking specific manner. However, like the DMRs we identified, these genes should be further validated in larger cohorts with more stringent significance thresholds applied.

Table 4.4 Differentially methylated genes validated in the TCGA lung AC cohort

                                     BCCA Cohort Frequencies     TCGA Cohort Frequencies
Probe        Gene       Direction    CS     NS     Fisher Pval   CS     NS     Fisher Pval
cg11465372   KRT2B      hypo CS      0.03   0.40   0.0024        0.00   0.60   0.0095
cg13950558   BLOC1S3    hypo CS      0.15   0.40   0.0186        0.13   0.80   0.0095
cg11599505   C20orf102  hypo CS      0.26   0.00   0.0259        0.50   0.00   0.0095
cg17919471   XAGE3      hypo CS      0.50   0.20   0.0294        0.50   0.00   0.0095
cg18118795   RBBP4      hypo CS      0.18   0.00   0.0186        0.50   0.00   0.0227
cg17761453   LOR        hypo CS      0.24   0.03   0.0453        0.50   0.00   0.0227
cg26829529   SPACA3     hypo CS      0.32   0.60   0.0016        0.19   0.70   0.0367
cg18328933   ABHD14A    hypo CS      0.41   0.67   0.0095        0.19   0.70   0.0367
cg04345908   HLA-DQB2   hypo CS      0.50   0.20   0.0244        0.44   0.00   0.0367
cg14972143   EIF4E      hypo NS      0.38   0.13   0.0003        0.44   0.00   0.0009
cg23208152   ZMYND10    hypo NS      0.35   0.03   0.0269        0.56   0.10   0.0010
cg26728422   C16orf28   hypo NS      0.41   0.10   0.0435        0.56   0.10   0.0152
cg26803305   SLC14A1    hypo NS      0.41   0.13   0.0490        0.56   0.10   0.0152
cg00186701   TSPYL5     hypo NS      0.06   0.27   0.0363        0.00   0.30   0.0462
cg21475402   BCAN       hyper CS     0.74   0.40   0.0108        0.75   0.30   0.0426

4.4 Discussion

4.4.1 Summary of findings

The aim of this work was to investigate genome wide differences in DNA methylation profiles of lung AC from CS and NS. In our cohort, we found that the genomic distribution of aberrant methylation is generally similar between the groups, but that CS epigenomes harboured a greater amount of aberrant DNA methylation overall.
Moreover, we identified candidate regions and genes that were preferentially methylated in one group or the other; however, our results did not validate in independent datasets possibly due to some of the reasons described below. Given the lack of validation of our findings at this point, we conclude that the differences in CS and NS lung AC epigenomes may be modest. The differences we observed require further investigation in larger independent cohorts to robustly determine whether they are truly distinct features of these two groups.

4.4.2 Comparison of findings to literature on DNA methylation differences in CS and NS

Early methylation studies were limited by available technologies, and thus, the number of genes assayed. The development of arrays and sequencing strategies to assess genome wide and global methylation has improved our knowledge regarding the methylome in all types of lung cancer. However, studies designed to directly compare genome wide methylation profiles of CS and NS lung tumours are limited. Reports comparing methylation profiles of ever smoker (current and former smokers) and NS lung tumours have mostly investigated only small panels of 1-9 genes and overall methylation index, which represents the methylation status of all loci investigated. Such studies concluded that hypermethylation of CDKN2A, MGMT, and APC, and the overall methylation index, are higher in ever smokers compared with never or light smokers 88-92. This is consistent with our findings of CS exhibiting a greater extent of aberrant hypermethylation than NS. Carcinogens in tobacco smoke are known to deregulate DNA methyltransferase 1 (DNMT1) to promote aberrant hypermethylation in lung tumours of smokers, further supporting our findings 93,94.
The fact that we observed more hypermethylation than hypomethylation across the epigenomes we analyzed was not surprising given that the methylation array we used for profiling measures methylation primarily at gene promoters, which are normally unmethylated. Thus, given its design, this array favours detection of aberrant hypermethylation events in cancer epigenomes.

Attempts to corroborate the candidate DMRs and DMGs identified in our study with findings reported in the literature were challenging given the limited number of published, genome wide methylation comparative studies between CS and NS. CDKN2A hypermethylation is more prominent in ever smokers than NS, and our results were concordant with this observation. Unlike previous studies, we did not observe significant differences in aberrant methylation of RASSF1A, DAPK, MGMT, or APC, as both groups exhibited frequent methylation of these genes in our dataset. This could suggest these genes are not differentially methylated in CS and NS in our cohort, or could be due to the fact that different, differentially methylated CpG sites were assessed in previous studies that were not interrogated by the CpG sites on the methylation array we used. Comparison of our DMGs with NSCLC tumour-specific methylated regions reported by Carvalho et al. revealed four regions of overlap: hypermethylation of GATA3 (10p14) and HOXA9 (7p15.2) and hypomethylation of ZMYND10 (3p21.31) common to NSCLC, and hypermethylation of HOXA1 (7p15.2) specific to AC 22. Additional comparison of our methylated regions (Table 4.3) with those reported by Heller et al. in NSCLC revealed common hypermethylation at 5 regions: 2q31.1 (HOXD genes), 6p22.1 (HIST1H1 genes), 7p15.2 (HOXA genes), 12q13.13 (HOXC genes), and 17q21.32 (HOXB genes) 24. Aberrant DNA methylation is a well established mechanism of HOX gene disruption in multiple malignancies 95.
One group reported preferential methylation of HOXA9 in NS with NSCLC 86,87,96; however, we observed methylation of HOXA genes to be common in CS and NS. Interestingly, whether HOX genes are up- or downregulated and the consequences of their deregulation are dependent upon cancer type 95. Taken together, our results are generally concordant with the few studies that compared CS and NS lung tumour epigenomes.

Of note, a collaborating group performed an independent analysis on our cohort using a strategy substantially different from ours. Selamat et al. grouped CS tumours and NS tumours and then compared the groups to identify differences based on tumour β-values only (i.e. without considering methylation profiles of patient matched non-malignant samples) 87. This method is in stark contrast to our strategy of analyzing each tumour individually using its matched non-malignant profile as a reference for defining aberrant methylation and then comparing aberrant methylation patterns between groups. Selamat et al. reported a very strong positive correlation between median β-values of CS and NS tumours and concluded that CS and NS lung AC have similar methylation patterns; however, they did not compare total aberrant methylation between groups. They identified 6 DMGs with different median β-values in CS and NS tumour tissues, of which 5 overlapped with our candidate DMGs before multiple testing correction (IRF8, IHH, LGALS4, IL18BP, and VTN). The authors concluded that the differences in CS and NS epigenomes were modest, which is consistent with the conclusion from our analysis of the same samples.

4.4.3 Potential roles of aberrant methylation in lung tumourigenesis

Aberrant DNA methylation can cause tumour suppressor gene (TSG) silencing and oncogene activation, demonstrating its potential to contribute to tumourigenesis 80.
Numerous genes we identified as hyper- and hypomethylated are established TSGs and oncogenes, respectively, according to a curated list of TSGs and oncogenes generated by the MSKCC 97. These included hypomethylation of SRC and hypermethylation of CDKN2B and WT1. This suggests the aberrant methylation we detected actually functions to promote tumour biology. Furthermore, SRC was frequently overexpressed and CDKN2B and WT1 were recurrently underexpressed in our dataset, indicating that these methylation changes likely have consequential effects on gene expression.

In addition to the known TSGs and oncogenes identified as aberrantly methylated, many of the candidate DMGs we identified are involved in biological functions relevant to cancer development and progression, including: cellular movement, growth and proliferation, cell death and survival, cell signaling and development. Interestingly, we noted aberrant methylation of several components of the Wnt/β-catenin signaling pathway, including DKKs, SFRPs, SOX15, WNTs and FRZs, implicating DNA methylation as a mechanism of Wnt pathway deregulation, which is prominent in various malignancies 98,99. Collectively, the aberrant methylation we observed affects genes with biological functions that could contribute to lung tumourigenesis. Our observations of DMGs could signify selection for gene deregulation in a smoking specific manner, implicating the involvement of different genes in CS and NS lung tumour biology.

4.4.4 Potential reasons for lack of validation of findings in external cohorts

Despite limited validation of the methylation differences we observed between CS and NS lung AC in two external cohorts, we were not discouraged by these results.
The independent cohorts we assessed were not entirely appropriate for use as validation datasets in our study primarily because: i) we could not confirm whether tumour samples were micro dissected, and the quality of micro dissection could limit detection of aberrant methylation events, which is consistent with our observations of fewer methylation events in these datasets; and ii) not all cases had patient matched tumour and non-malignant tissue, which precluded us from accounting for individual variation in methylation levels when defining aberrant methylation events specifically for individual cases. In addition to these issues, there are several other possible reasons for the poor validation of our findings in external datasets, including: demographic differences in sample populations, the limited sample sizes of our cohort and the external validation cohorts, and technical differences in data processing which could limit the detection of tumour specific molecular alterations. We also acknowledge the possibility that the differences we identified are modest and are features restricted to our own cohort. Moreover, given that we found a greater extent of aberrant hypermethylation in CS, some of the DMGs we identified as hypermethylated could simply reflect the fact that CS showed more aberrant methylation overall in our dataset, as opposed to signifying differential selection for aberrant methylation of specific genes. Thus, assessing the DMRs and DMGs we identified in larger cohorts with patient matched tumour and non-malignant profiles generated for DNA from micro dissected tumour tissue could shed light on whether or not the methylation differences we identified are actually smoking specific.
4.4.5 Limitations and implications of findings regarding CS and NS lung AC DNA methylation patterns

Our study is limited in its ability to determine global DNA methylation patterns because the methylation profiles we generated are based on measurements at ~28,000 individual CpG sites across the genome. In defining methylated regions, we assumed that the methylation status of stretches of DNA between CpG probes is similar, based on observed methylation levels at relatively few CpG sites dispersed throughout the region. Although it is known that methylation at CpG dinucleotides is similar between neighbouring CpGs in CpG island gene promoters, a superior method for accurately defining methylated regions is bisulfite sequencing of genomic DNA. Sequencing approaches will soon be capable of assessing DNA methylation at every CpG site in the genome and thus are a much more robust and comprehensive strategy for defining "regions" of methylation 22,100. Nevertheless, our strategy has provided insight about DNA methylation patterns in lung AC of CS and NS, and suggests genome wide differences may exist.

Our findings warrant further investigation and validation in a larger cohort. Based on the biological roles of genes we identified as differentially methylated, our work confirms the involvement of DNA methylation in lung AC pathogenesis. If the genome wide pattern of increased aberrant hypermethylation in CS relative to NS is confirmed in multiple independent datasets, this could provide a rationale for treating CS lung AC patients with demethylating agents alone or in combination with other targeted therapies or treatment regimes to reverse pathogenic methylation marks in an attempt to restore the normal state.
Chapter 5: MicroRNA expression in the context of smoking and lung AC

5.1 Introduction

5.1.1 MiRNA deregulation in cancer

MicroRNAs (miRNA) are small non-coding RNAs between 18-25 nucleotides in size that negatively regulate gene expression by directly inhibiting translation or inducing mRNA degradation 101. miRNAs are involved in almost all biological processes due to the ability of one miRNA to target up to 200 distinct mRNA transcripts. As such, interest in miRNAs, particularly in cancer, has grown immensely in recent years. miRNA have been shown to contribute directly to tumour development, progression and treatment response in nearly all human cancers, representing promising and biologically relevant biomarkers 102,103. miRNA have also emerged as critical players in lung cancer 104.

5.1.2 Effects of smoking on miRNA expression

The molecular effects of smoking are widespread and are associated with genetic and epigenetic modifications that alter transcriptional regulation of many lung cancer related genes. Like protein coding genes, miRNA are broadly deregulated in response to active cigarette smoke. Animal studies have shown that miRNA disruption occurs in all lung cells exposed to cigarette smoke, in a dose-dependent manner, affecting functions such as anti-oxidant response, DNA repair, inflammation and apoptosis among others 105. Profiling of non-malignant lung tissues has revealed a strong concordance between miRNA altered in animal and human studies in response to cigarette smoke 105.

5.1.3 Rationale for assessing miRNA deregulation in lung AC of CS and NS

Given the disruption of miRNAs in lung cancer and the known deregulation of miRNA in response to cigarette smoke, we hypothesized that analogous to distinct smoking related patterns observed at the level of DNA, miRNA disruption patterns in non-malignant and malignant tissues of CS and NS lung AC patients would be distinct.
Smoking specific miRNA expression patterns may reflect distinct tumourigenic processes selected by the unique tumour promoting environments in CS and NS. Since miRNA are capable of regulating key genes and pathways in cancer, differentially expressed miRNA may represent promising prognostic markers or therapeutic targets. Thus, we aimed to characterize and compare miRNA expression profiles of lung ACs from CS, FS, and NS, to identify differentially expressed miRNA across the smoking groups, and to determine whether miRNA deregulation is yet another mechanism differentially disrupted in lung tumourigenesis of CS and NS.

5.2 Methods

5.2.1 MicroRNA sequencing

Total RNA from matched tumour and non-malignant lung tissues for 37 CS, 22 FS, and 14 NS was subjected to miRNA sequencing. miRNA-seq libraries were constructed, bar-coded for multiplex sequencing, and sequenced on the Illumina HiSeq 2000 platform using a plate-based protocol developed at the British Columbia Genome Sciences Centre (BCGSC); miRNA alignment and quantification were performed using the BCGSC protocol 52. Expression levels for identical miRNAs expressed from different genomic locations were summed (e.g. miR-101 is expressed from loci on chromosomes 1 and 9, so sequence reads attributed to its chromosome 1 locus and sequence reads attributed to its chromosome 9 locus were summed to determine the overall miR-101 expression level). Reads were normalized as reads per kilobase of exon model per million mapped reads (RPKM). miRNA with read counts < 1 were considered not expressed. In total, 919 miRNA were expressed in at least one sample across our cohort and were used for subsequent statistical analyses.

5.2.2 Clustering of miRNA expression profiles

miRNA expression patterns were compared in tumour and patient matched non-malignant lung tissues of CS, FS, and NS.
To examine the overall similarities and differences in miRNA profiles, unsupervised hierarchical clustering was performed on all samples, tumour samples only, and normal samples only using Genesis Software 106. A Fisher's Exact test was performed to examine the association of miRNA expression with the tumour status and smoking histories of the profiles in the clusters identified. A Student's t-test was used to assess differences in pack years for CS and FS among the clusters identified. To determine which clinical factors were most strongly associated with grouping of non-malignant and tumour miRNA expression profiles into distinct clusters, a MANOVA test was performed. For all statistical tests, a p-value < 0.05 was considered significant.

5.2.3 miRNA expression patterns in non-malignant and malignant lung tissues

To identify miRNA in non-malignant lung tissue whose expression is likely modulated in response to smoking, we performed a non-parametric permutation test between non-malignant CS and NS tissues (CSN and NSN, respectively) using 10,000 permutations. Permutation scores were corrected for multiple testing using the BH method. miRNA that had i) a BH corrected p < 0.05 and ii) an average fold change > 2.0 or < 0.5 were considered differentially expressed (DE) between CSN and NSN tissues. To identify smoking related miRNA that may be reversibly expressed upon smoking cessation, permutation tests were similarly run between FS non-malignant tissues (FSN), CSN and NSN groups. miRNA were considered reversibly expressed upon smoking cessation if they i) were DE between CSN and NSN and between CSN and FSN in the same direction, and ii) had a CSN/FSN fold change ≥ 2, but a FSN/NSN fold change < 2. Conversely, miRNA were considered irreversibly expressed upon smoking cessation if they i) were DE between CSN and NSN and DE between FSN and NSN, and ii) had a CSN/FSN fold change < 2, and a FSN/NSN fold change ≥ 2.
To identify aberrantly expressed miRNAs in tumours for each smoking group, we applied a pair-wise Wilcoxon signed-rank test coupled with fold change and frequency criteria to expression levels in our paired tumour and non-malignant samples. miRNAs with Wilcoxon BH corrected p-values < 0.05 and tumour/normal fold changes exceeding 2-fold in > 25% of cases were deemed recurrently aberrantly expressed within a particular smoking group. To determine miRNA whose frequency of disruption was significantly different between CS and NS groups, we applied a Fisher's Exact test with multiple testing correction (BH p < 0.05).

5.2.4 miRNA target gene analysis

To assess the potential biological consequences of miRNAs deregulated in CS and NS, we downloaded a list of all 2283 experimentally supported miRNA:mRNA interactions from miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/) 107. Using this list, miRNAs specifically over- or underexpressed in CS or NS tumours were aligned to their mRNA targets and integrated with mRNA expression fold change data for 26 CS and 14 NS with patient matched mRNA and miRNA expression profiles to identify genes negatively regulated by the miRNA we identified. The frequency of cases exhibiting aberrant miRNA expression with negatively correlated mRNA expression (e.g. overexpression of miRNA and underexpression of its target gene, or vice versa) was determined on a tumour by tumour basis in each smoking group. mRNA targets meeting this inverse expression criterion in at least 20% of CS or NS were subjected to Ingenuity Pathway Analysis (IPA, www.ingenuity.com), which performs a Fisher's Exact test comparing the input gene list against gene lists annotated in hundreds of cellular pathways to identify significantly over-represented pathways in the input gene list. Pathways with a p-value < 0.05 were deemed significant.
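The tumour by tumour inverse expression frequency described in section 5.2.4 can be illustrated with a minimal sketch. The function names and the toy fold changes below are hypothetical, and the 2-fold threshold applied to the mRNA target is an assumption (the threshold used for target genes is not stated in this section); only the 20% frequency cut-off comes from the text.

```python
def call(fold_change, threshold=2.0):
    """Return +1 (overexpressed), -1 (underexpressed) or 0 (unchanged)
    for a tumour/normal fold change. The threshold is an assumed default."""
    if fold_change >= threshold:
        return 1
    if fold_change <= 1.0 / threshold:
        return -1
    return 0


def inverse_frequency(mirna_fc, mrna_fc, direction):
    """Fraction of tumours in which the miRNA is deregulated in `direction`
    (+1 or -1) and its target gene is deregulated in the opposite direction.

    mirna_fc and mrna_fc map tumour IDs to tumour/normal fold changes;
    only tumours with both profiles are counted.
    """
    shared = sorted(set(mirna_fc) & set(mrna_fc))
    if not shared:
        return 0.0
    hits = sum(
        1 for tumour in shared
        if call(mirna_fc[tumour]) == direction
        and call(mrna_fc[tumour]) == -direction
    )
    return hits / len(shared)


def passes_frequency(frequency, minimum=0.20):
    """Targets inversely expressed in at least 20% of cases go on to pathway analysis."""
    return frequency >= minimum


# Toy data (hypothetical): an underexpressed miRNA and its target in five tumours.
mirna = {"T1": 0.3, "T2": 0.4, "T3": 1.1, "T4": 0.2, "T5": 0.9}
target = {"T1": 2.5, "T2": 1.2, "T3": 3.0, "T4": 4.0, "T5": 1.0}
freq = inverse_frequency(mirna, target, direction=-1)
```

In this toy example only T1 and T4 show miRNA underexpression together with target overexpression, so the frequency is 2 of 5 cases and the target would pass the 20% cut-off.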
5.2.5 Validation of miRNA expression changes in the TCGA lung AC dataset

Level 3 miRNA sequencing data (processed and normalized as described in the TCGA Data Primer, https://wiki.nci.nih.gov/display/TCGA/TCGA+Data+Primer) obtained from the TCGA was used as a validation set. This validation cohort was comprised of 12 FSN, 9 CSN, and 2 NSN (Table 2.3). miRNA identified as differentially expressed between CSN and NSN in our dataset were considered validated if the fold change between CSN and NSN observed in the TCGA samples matched the direction we observed in our own cohort. This dataset was also used to validate reversibly and irreversibly expressed miRNA identified in FSN. Reversible miRNA were considered validated if the fold changes were CSN/FSN > 2, CSN/NSN > 2, and FSN/NSN < 2; irreversibly expressed miRNA were considered validated if the fold changes were CSN/FSN < 2, CSN/NSN > 2, and FSN/NSN > 2. As there were very few cases with matched tumour and non-malignant profiles in each smoking group, we were unable to examine miRNA identified as specifically deregulated in lung AC of CS, FS, or NS in the TCGA cohort.

5.3 Results

5.3.1 Clustering of miRNA expression profiles

To examine the similarity in miRNA expression profiles across the 146 tumour and non-malignant tissues, we performed unsupervised hierarchical clustering on the 919 miRNA we detected as expressed in at least one sample. Clustering analysis revealed two distinct clusters (Figure 5.1a): one comprised of only non-malignant ("normal") samples and the other comprised of all tumour samples and four non-malignant profiles from three CS and one FS, indicating a significant difference in the distribution of tumour and non-malignant profiles among the clusters (Figure 5.1d, Fisher's Exact test, p = 2.2 x 10^-16). Given that we were interested in determining the effects of smoking on miRNA expression, we next clustered non-malignant and tumour miRNA profiles independently.
Clustering of non-malignant profiles revealed one cluster predominantly comprised of CS and another comprised of a mixture of CS, FS, and NS (Figure 5.1b,e). The difference in smoking histories between clusters was significant (Fisher's Exact test, p = 0.016). Of note, upon comparing the pack years for CS and FS in the two clusters, we found that the average number of pack years was greater in the cluster dominated by CS, for both CS (Student's t-test, p = 0.008) and FS (Student's t-test, p = 0.009) non-malignant samples. A MANOVA analysis revealed that in non-malignant samples, race was the clinical variable most strongly associated with cluster grouping (F-value = 52.291, p = 5.81 x 10^-8), followed by smoking status (F-value = 15.204, p = 3.05 x 10^-5), and the number of pack years (F-value = 2.455, p = 7.64 x 10^-3) (Table 5.1). Clustering of tumour profiles again revealed two distinct clusters similar to those observed in non-malignant tissues (Figure 5.1c,f). The distribution of CS, FS, and NS was significantly different between clusters (Fisher's Exact test, p = 1.53 x 10^-4), as were the pack years for FS (Student's t-test, p = 9.03 x 10^-4) but not CS (Student's t-test, p = 0.242). Multivariate results determined that EGFR mutation (F-value = 17.422, p = 1.58 x 10^-5) and smoking status (F-value = 12.742, p = 1.39 x 10^-4) were most strongly associated with cluster grouping for the tumour samples (Table 5.1). Collectively, these results suggest miRNA expression profiles are dependent on smoking histories in both tumour and non-malignant lung tissues.

5.3.2 miRNAs are differentially expressed between non-malignant lung tissues of CS and NS with lung AC

Based on the observed clustering patterns, we aimed to identify miRNA differentially expressed in non-malignant tissues of CS and NS, as these two groups represent the most extreme smoking phenotypes.
Since we interrogated non-malignant tissues from individuals with cancer, miRNA associated with specific smoking histories could be indicative of pre-malignant changes important to lung tumourigenesis. We identified 43 miRNA that were significantly differentially expressed between CSN and NSN (current and never smoker non-malignant tissues, respectively), of which 32 were overexpressed and 11 underexpressed in CSN compared to NSN (Table 5.2). To validate our findings, miRNA expression profiles from 9 CSN and 2 NSN tissues from lung AC patients were downloaded from the TCGA. Of the 43 miRNA we identified, 37 were annotated in the TCGA dataset and of the 37 miRNA available for cross cohort validation, 31 had detectable expression (RPKM > 0). Of these 31 miRNA, 87% (20/23) of the miRNA identified as elevated in CSN and 38% (3/8) of the miRNA underexpressed in CSN validated in the TCGA cohort, for a total validation rate of 74%. Thus, particular miRNAs seem to be differentially expressed in non-malignant tissues of lung AC patients in response to smoke exposure.

Figure 5.1 Hierarchical clustering of miRNA expression in tumour and adjacent non-malignant lung tissues and distribution of CS, FS, and NS in miRNA expression clusters. Clustering of all 146 miRNA expression profiles revealed two distinct clusters, one comprised of non-malignant samples, and the other comprised of mostly tumours (A). Clustering of non-malignant tissues only (B) and tumours only (C) also revealed two distinct clusters. The clusters identified based on all samples (A) were associated with malignancy, as clusters 1 and 2 were significantly enriched for tumours and non-malignant profiles, respectively (Fisher's Exact test p < 0.05) (D).
Assessment of the distribution of CS, FS, and NS within the clusters identified in non-malignant samples revealed enrichment for CS and FS with higher pack years of smoking history in cluster 1 compared to cluster 2 (Fisher's Exact test p < 0.05) (E). The same trend was observed in the clusters identified based on tumour profiles (Fisher's Exact test p < 0.05) (F).

Table 5.1 MANOVA results for the effects of clinical variables on miRNA expression clustering

Non-malignant: Cluster 1 vs Cluster 2
Variable      Df    Sum Squares   Mean Squares   F-val      P-val
Stage         3     0.0688        0.0229         0.2665     0.84897
Gender        1     0.3354        0.3354         3.8972     0.057958
Age           1     0.069         0.069          0.8019     0.377892
Smoking       2     2.6167        1.3084         15.2037    3.05E-05
Race          1     4.5           4.5            52.2913    5.81E-08
Pack Years    35    7.394         0.2113         2.4549     0.007638
Residuals     29    2.4956        0.0861         -          -

Tumour: Cluster 1 vs Cluster 2
Variable      Df    Sum Squares   Mean Squares   F-val      P-val
Stage         3     0.0891        0.02972        0.1886     0.903166
Gender        1     0.075         0.07502        0.4762     0.496245
Age           1     0.0791        0.07915        0.5024     0.484735
Smoking       2     4.0144        2.00719        12.7421    0.000139
EGFR          2     5.4887        2.74434        17.4217    1.58E-05
KRAS          1     0.0013        0.00125        0.008      0.92958
Race          1     0.0051        0.00508        0.0322     0.85892
Pack Years    34    4.1517        0.12211        0.7752     0.759548
Residuals     26    4.0956        0.15752        -          -

Df = degrees of freedom

Table 5.2 miRNAs differentially deregulated in non-malignant lung tissue of CS and NS with lung AC

miRNA          Average   Average   Corrected        Status*            Fold Change   Validation
               RPKM CSN  RPKM NSN  P-val            (relative to NSN)  CSN/NSN       Status
hsa-mir-106a   23        8         0                OE in CSN          2.83          Yes
hsa-mir-1248   3         0         1.48 x 10^-5     OE in CSN          3.32          Unknown
hsa-mir-136    32        14        5.07 x 10^-11    OE in CSN          2.32          Yes
hsa-mir-142    3997      2002      1.84 x 10^-8     OE in CSN          2.00          Yes
hsa-mir-146a   245       122       1.43 x 10^-8     OE in CSN          2.01          Yes
hsa-mir-151b   3         0         0                OE in CSN          3.17          Unknown
hsa-mir-154    5         2         1.24 x 10^-11    OE in CSN          2.37          Yes
hsa-mir-184    34        16        8.63 x 10^-5     OE in CSN          2.10          Yes
hsa-mir-185    114       56        5.65 x 10^-14    OE in CSN          2.05          Yes
hsa-mir-19a    26        8         0                OE in CSN          3.33          Yes
hsa-mir-19b    226       87        0                OE in CSN          2.61          Yes
hsa-mir-219    3         1         9.80 x 10^-8     OE in CSN          2.33          No
hsa-mir-339    126       42        1.44 x 10^-7     OE in CSN          3.00          Yes
hsa-mir-342    250       110       1.43 x 10^-10    OE in CSN          2.29          Yes
hsa-mir-3648   3         0         3.22 x 10^-4     OE in CSN          3.32          Unknown
hsa-mir-365a   86        35        3.35 x 10^-9     OE in CSN          2.47          Unknown
hsa-mir-365b   86        35        3.41 x 10^-9     OE in CSN          2.46          Unknown
hsa-mir-3687   2         0         1.47 x 10^-4     OE in CSN          2.34          Unknown
hsa-mir-369    10        5         2.27 x 10^-10    OE in CSN          2.17          Yes
hsa-mir-374c   4         0         0                OE in CSN          3.79          Unknown
hsa-mir-377    4         1         1.45 x 10^-6     OE in CSN          2.87          Yes
hsa-mir-378a   320       109       0                OE in CSN          2.93          Unknown
hsa-mir-378c   6         0         0                OE in CSN          6.22          Yes
hsa-mir-425    346       106       2.12 x 10^-8     OE in CSN          3.27          Yes
hsa-mir-508    29        10        4.61 x 10^-6     OE in CSN          3.04          No
hsa-mir-509    9         2         1.35 x 10^-6     OE in CSN          3.97          No
hsa-mir-514a   19        6         1.48 x 10^-5     OE in CSN          3.23          Unknown
hsa-mir-539    3         1         5.94 x 10^-8     OE in CSN          2.51          No
hsa-mir-627    3         0         1.03 x 10^-7     OE in CSN          3.10          Unknown
hsa-mir-628    32        9         0                OE in CSN          3.73          Yes
hsa-mir-629    67        33        0                OE in CSN          2.00          No
hsa-mir-655    3         1         4.69 x 10^-10    OE in CSN          2.38          No
-----------------------------------------------------------------------------------------------
hsa-mir-107    313       1793      3.37 x 10^-13    UE in CSN          0.17          Yes
hsa-mir-1180   14        28        3.37 x 10^-7     UE in CSN          0.49          No
hsa-mir-148b   213       423       1.87 x 10^-11    UE in CSN          0.50          No
hsa-mir-18b    1         4         2.30 x 10^-4     UE in CSN          0.38          Yes
hsa-mir-23c    1         3         1.87 x 10^-3     UE in CSN          0.45          Unknown
hsa-mir-320b   23        124       1.37 x 10^-12    UE in CSN          0.18          Yes
hsa-mir-326    125       585       7.80 x 10^-11    UE in CSN          0.21          Yes
hsa-mir-3615   3         6         9.71 x 10^-5     UE in CSN          0.50          Yes
hsa-mir-4532   12        33        3.38 x 10^-3     UE in CSN          0.36          Unknown
hsa-mir-592    2         5         5.70 x 10^-5     UE in CSN          0.36          Yes
hsa-mir-934    2         4         2.51 x 10^-9     UE in CSN          0.42          Yes

*OE = overexpressed, UE = underexpressed; horizontal line separates over- and underexpressed genes

5.3.3 miRNA are reversibly or irreversibly expressed in non-malignant tissues of individuals with lung AC

As we also profiled FS, we next wondered whether any miRNA with differential expression in CSN and NSN displayed reversible or irreversible expression levels in FS as this is known to occur for protein coding genes 108,109.
We hypothesized that differential expression in CSN and NSN and similar expression in FSN and NSN would indicate miRNA whose expression levels had returned to those of a NS in response to smoking cessation (i.e. reversible miRNAs). Conversely, miRNA with differential expression in CSN and NSN but similar expression levels in CSN and FSN would indicate miRNA whose expression changes persist upon smoking cessation (i.e. irreversible miRNA). Based on differential expression and fold change criteria outlined in section 5.2.3, we investigated the non-malignant profiles for miRNA exhibiting these expression patterns. This analysis identified 5 miRNA exhibiting patterns consistent with reversible expression and 16 miRNA with irreversible expression in FSN (Figure 5.2, Table 5.3). Considering 12 FSN in addition to the 9 CSN and 2 NSN profiles from the TCGA, we validated one miRNA, miR-592, as irreversibly expressed in FSN. However, as we could only assess 11/21 miRNA in the TCGA cohort due to annotation and detection differences between our miRNA expression profiles, our ability to validate our findings was limited. Interestingly, miR-1248, miR-3648, miR-3687, and miR-377, whose expression in FSN reverted down to levels similar to those observed in NSN, showed frequent (>25%) overexpression in FS tumours. This could suggest that upregulated expression of these miRNA (or, failure to reverse expression levels in FSN) in some FS may contribute to lung tumourigenesis.

Figure 5.2 miRNAs whose expression is reversible or irreversible in non-malignant lung tissues of FS with lung AC. Comparison of miRNA expression patterns in CS, FS, and NS non-malignant tissues revealed several irreversibly and reversibly expressed miRNA. Irreversible miRNA exhibit similar expression levels in CS and FS, and disparate levels in FS and NS (A). miR-4532 and miR-592 exhibited this pattern, suggesting expression changes in these miRNA persist after smoking cessation.
Conversely, reversible miRNA have similar levels in FS and NS, and disparate levels in FS and CS (B). miR-377 and miR-18b showed expression patterns consistent with reversibility upon smoking cessation. Asterisks indicate p-values < 0.05.

Table 5.3 Reversible and irreversible miRNAs identified in non-malignant lung tissues of lung AC patients

miRNA          Class    Average   Average   Average   Fold Change   Fold Change   Fold Change   Validation
               in FS    CSN       FSN       NSN       CSN/NSN       CSN/FSN       FSN/NSN       Status
hsa-mir-106a   IRREV    23.28     18.86     8.24      2.83          1.23          2.29          No
hsa-mir-151b   IRREV    3.17      2.28      0.00      3.17          1.39          2.28          Unknown
hsa-mir-19a    IRREV    25.96     18.93     7.79      3.33          1.37          2.43          No
hsa-mir-23c    IRREV    1.19      1.31      2.67      0.45          0.91          0.49          Unknown
hsa-mir-326    IRREV    124.77    229.78    585.45    0.21          0.54          0.39          No
hsa-mir-374c   IRREV    3.79      2.21      0.00      3.79          1.72          2.21          Unknown
hsa-mir-378c   IRREV    6.22      3.86      0.00      6.22          1.61          3.86          Unknown
hsa-mir-425    IRREV    346.23    220.87    105.90    3.27          1.57          2.09          No
hsa-mir-4532   IRREV    11.91     9.55      32.72     0.36          1.25          0.29          Unknown
hsa-mir-508    IRREV    29.14     21.14     9.57      3.04          1.38          2.21          No
hsa-mir-509    IRREV    8.58      6.46      2.16      3.97          1.33          2.99          No
hsa-mir-514a   IRREV    19.18     12.34     5.95      3.23          1.55          2.08          Unknown
hsa-mir-539    IRREV    3.26      3.32      1.30      2.51          0.98          2.55          Unknown
hsa-mir-592    IRREV    1.93      2.51      5.38      0.36          0.77          0.47          Yes
hsa-mir-628    IRREV    32.07     30.81     8.59      3.73          1.04          3.59          No
hsa-mir-629    IRREV    66.84     67.19     33.44     2.00          0.99          2.01          No
----------------------------------------------------------------------------------------------------------
hsa-mir-1248   REV      3.32      1.24      0.00      3.32          2.67          1.24          No
hsa-mir-18b    REV      1.37      3.37      3.56      0.38          0.41          0.95          No
hsa-mir-3648   REV      3.32      0.00      0.00      3.32          3.32          1.00          Unknown
hsa-mir-3687   REV      2.34      0.00      0.00      2.34          2.34          1.00          Unknown
hsa-mir-377    REV      4.17      1.68      1.46      2.87          2.48          1.16          Unknown

Horizontal line separates irreversible and reversible genes.

5.3.4 CS, FS, and NS lung AC tumours exhibit aberrant expression of common and different miRNAs

After characterizing the effects of smoking on non-malignant lung tissues from lung AC patients, we proceeded to investigate miRNA expression patterns in lung tumour tissue.
We first compared expression profiles for tumour and patient matched non-malignant tissues to identify miRNA recurrently and differentially deregulated in tumours of each smoking group. This revealed: 232 overexpressed and 59 underexpressed miRNA in CS tumours; 271 overexpressed and 45 underexpressed miRNA in FS tumours; and 273 overexpressed and 34 underexpressed miRNA in NS tumours (Figure 5.3). We found that the majority of differentially expressed miRNAs were overexpressed and that most were common between the three smoking groups (Figure 5.3). Oncogenic miRNA overexpressed in all three smoking groups included miR-17, miR-20a, miR-21, and miR-106a, which have a variety of roles in promoting cancer cell growth and inhibiting apoptosis 110. Conversely, tumour suppressive miRNA underexpressed in all three groups included let-7a, let-7c, miR-34c, miR-101, and miR-143, which positively regulate apoptosis and cell cycle arrest 110.

We also identified numerous miRNA specifically deregulated in one smoking group only, which are listed in Table 5.4. These miRNA had significantly different expression levels in tumour and matched non-malignant samples (Wilcoxon signed-rank test, BH corrected p < 0.05) and exhibited a tumour/normal expression fold change greater than 2-fold in at least 25% of the samples in one smoking group only. All miRNA identified as specific to CS and NS were significantly differentially deregulated between these two groups (Fisher's Exact test, BH corrected p-value < 0.05) except for miR-5571. Due to the lack of patient matched tumour and non-malignant miRNA expression profiles in the TCGA database (2 CS, 4 FS, and 1 NS), which precludes us from defining miRNA expression status in individual tumours and from statistically comparing expression levels between groups, we could not validate the smoking specific nature of the miRNAs we identified.
Although the overlap among CS, FS, and NS tumours suggests common mechanisms of lung tumourigenesis are mediated by miRNA deregulation, the smoking-specific miRNA we identified could be involved in deregulating oncogenic processes specific to individual smoking groups.

Figure 5.3 Venn diagram depicting miRNAs commonly and differentially expressed in CS, FS and NS lung AC tissues. miRNAs exhibiting i) 2-fold expression changes between tumour and matched non-malignant samples in at least 25% of CS, FS, or NS, and ii) significantly different expression in tumour and non-malignant samples based on a Wilcoxon signed-rank test, corrected p < 0.05, were assessed to determine the overlap in disruption between the groups. Overexpressed miRNA are depicted in (A) and underexpressed miRNA are depicted in (B). The majority of miRNAs differentially expressed between tumour and non-malignant tissue were overexpressed. We also noted that most of the miRNA we identified were deregulated in all three smoking groups.

Table 5.4 miRNAs differentially deregulated in lung AC tissues of CS, FS, and NS

Overexpressed miRNA Underexpressed miRNA CS FS NS CS FS NS miR-142 miR-105 miR-103a let-7b miR-203 miR-582 miR-18b miR-145 miR-1295a miR-135a miR-607 miR-337 miR-151a miR-150 miR-138 miR-369 miR-155 miR-181c miR-194 miR-411 miR-190b miR-185 miR-195 miR-4746 miR-1910 miR-20b miR-3065 miR-496 miR-221 miR-2114 miR-34b miR-545 miR-2355 miR-216a miR-378a miR-5571* miR-23c miR-217 miR-378c miR-576 miR-26a miR-30c miR-4532 miR-592 miR-28 miR-3150b miR-4536 miR-624 miR-30b miR-3158 miR-511 miR-7 miR-3187 miR-3168 miR-5683 miR-891a miR-320a miR-320e miR-676 miR-330 miR-329 miR-532 miR-3690 miR-340 miR-497 miR-3609 miR-504 miR-3614 miR-767 miR-4443 miR-944 miR-4497 miR-4646 miR-4745 miR-4791 miR-5091 miR-5187 miR-543 miR-5701 miR-612 miR-636 miR-639 miR-660 miR-663a miR-93 miR-532

miRNA expression levels were derived by miRNA sequencing and annotated using miRBase v18.
* miR-5571 was not significantly differentially overexpressed between CS and NS based on a Fisher's Exact test

5.3.5 Potential biological roles of candidate miRNA specific to CS and NS

As we identified miRNA preferentially deregulated in either CS or NS, we were interested in determining how these miRNA could contribute to CS or NS specific lung tumourigenesis. For the miRNA specifically deregulated in CS or NS (regardless of FS), we i) identified their experimentally validated gene targets 107, ii) determined the frequency with which miRNA deregulation was associated with consequential over- or underexpression of its target gene, and iii) performed pathway analyses on those with a frequency of inverse miRNA-mRNA expression levels > 20% in CS or NS. We reason that aberrant miRNA expression with concordant target gene over- or underexpression in the same tumour may signify regulation of the target gene by the miRNA in our particular cohort of lung tumours.

Cross referencing target genes of miRNA specifically deregulated in CS or NS with a list of known cancer genes curated by the MSKCC revealed several oncogenes and TSGs 97. In CS, we observed underexpression of miR-34b and miR-140 with consequential upregulation of their oncogenic target VEGFA, and underexpression of let-7b with upregulation of its target CDC25A. In NS, overexpression of miR-93 and miR-217 was associated with downregulation of the TSGs TUSC2 and FHIT, respectively. Pathway analysis on target genes of miRNA disrupted in CS and NS revealed both common and different pathways. Commonly affected pathways included several cancer pathways such as NSCLC, ovarian cancer, pancreatic AC, and glioma signaling (Figure 5.4). Some of the components in these pathways were commonly disrupted, reflecting the fact that different miRNA disrupted in CS and NS regulate the same target genes and indicating that different miRNAs may be deregulated to achieve the same effect.
Pathways preferentially disrupted through miRNA deregulation in CS included cell cycle regulation by BTG (B-cell translocation gene) family proteins (E2F3), melanoma signaling (MITF), and retinoic acid receptor activation (VEGFA, NCOR2) while those in NS included PI3K/AKT signaling (FOXO1, CDKN1A, BCL2), PPAR signaling (PPARG, TRAF6), and NF-κB signaling (TRAF6, TNFAIP3). Thus, miRNAs disrupted specifically in CS and NS may regulate different oncogenes and TSGs and are involved in cancer related pathways, suggesting miRNA deregulation could play a role in CS and NS specific lung tumourigenesis. However, biological validation of these correlations, specifically by measuring protein levels of the target genes and pathway components identified, is essential to confirm the proposed biological implications of these findings.

5.4 Discussion

5.4.1 Summary of findings

In this chapter we discovered that CS and NS exhibit common and disparate patterns of miRNA expression. We found that smoking modulates miRNA expression in non-malignant lung tissues as numerous miRNA were differentially expressed between CS and NS. FSN miRNA profiles were similar to both CS and NS, and in non-malignant tissues, we identified miRNA that are candidates for reversible or irreversible expression upon smoking cessation. We also identified several miRNA differentially expressed in CS and NS tumours that regulate important cancer genes and are involved in multiple oncogenic pathways, suggesting miRNA disruption is another mechanism that contributes to the differential development of lung cancer in CS and NS.

5.4.2 miRNA expression patterns are influenced by smoking

miRNA are known to be disrupted in response to cigarette smoke. However, to date, most studies aimed at deciphering the effects of smoking on miRNA expression patterns have been done in animals or have been limited by the use of miRNA microarrays, which are only capable of measuring a fraction of human miRNAs 111-116.
We generated miRNA expression profiles by sequencing, which is an unbiased approach capable of measuring all currently annotated miRNAs. Not surprisingly, we found that non-malignant and tumour profiles clustered separately, and primarily based on smoking status, with samples having higher pack-year histories clustering together. Thus, the miRNA expression profiles we generated are consistent with reports of smoking modulation of miRNA expression.

Figure 5.4 Select pathways commonly and differentially affected by miRNA identified as smoking specific. Pathway analyses were performed on biologically validated target genes of the miRNA we identified as differentially expressed in CS and NS. This revealed numerous cancer related pathways that were commonly and differentially disrupted between the groups, indicating both redundancy and specificity in the biological consequences of differential miRNA disruption in CS and NS lung AC. The orange horizontal line indicates the threshold for significance (Fisher's Exact test p < 0.05).

5.5.1 Modulation of miRNA expression in non-malignant lung tissues in response to smoking and further expression deregulation in tumour tissues

miRNA have numerous documented biological functions, including the induction of anti-oxidant, detoxification, inflammatory and apoptotic pathways, indicating that miRNA are important regulators of a broad range of protective cellular mechanisms induced upon cigarette smoke exposure 105. A number of miRNA are up- or downregulated in response to smoking to combat the toxic effects of smoke exposure 112. While expression of many miRNA can revert to normal levels after cessation of short term cigarette smoke exposure, chronic smoke exposure results in persistent activation of many pro-carcinogenic processes and the inactivation of tumour suppressing processes, as evidenced by multiple human and animal studies 105.
In analyzing miRNA expression patterns of non-malignant lung tissues from individuals with lung AC, we discovered several miRNA with differential expression between CSN and NSN and some that exhibited reversible or irreversible phenotypes. Of the miRNAs differentially expressed between CSN and NSN tissues, miR-142, miR-154, and miR-378 exhibited significantly higher expression levels in CSN versus NSN, consistent with the literature 105. miR-378 had irreversibly elevated expression, suggesting its persistence could promote smoker associated tumourigenesis. Interestingly, these three miRNAs have been implicated in inflammation, angiogenesis, proliferation, and drug resistance, all of which are hallmarks of cancer 105.

With respect to the reversibly expressed miRNA we identified, 4 of the 5 whose expression levels reverted down to the level of NSN were frequently overexpressed in FS tumours, possibly implicating their expression in lung tumour development. Although our analyses revealed miRNA with trends of reversible and irreversible expression in FSN, we acknowledge that validation of these miRNA is required, especially given the overlap in miRNA expression levels we observed in CSN, FSN, and NSN.

It is possible that smoking may create a primed environment for lung cancer development through miRNA deregulation. The smoking related miRNA disruption we observed in non-malignant tissues indicates that deregulation of miRNA could be an early event in lung tumourigenesis of smokers. However, we acknowledge that without interrogation of non-malignant lung tissues from individuals without lung cancer, it is difficult to surmise whether differential expression of miRNAs in CSN compared to NSN is due to a biological response to active smoking or related to early tumourigenic processes specific to CS. Nevertheless, we noted three miRNA exhibiting differential expression in CSN and NSN and further expression deregulation in tumour tissues.
These included miR-4532 which was significantly underexpressed in CSN compared to NSN and further underexpressed in CS tumours relative to CSN, and miR-142 and miR-369 which were overexpressed in CSN relative to NSN and further overexpressed in CS tumours relative to CSN. The effect of miR-142 on cancer biology appears to be context dependent as it has been reported to have TSG properties in colon and ovarian cancers and oncogenic properties in T-cell acute lymphoblastic leukemia 117-119. Elevated levels of miR-142 were detected in the serum of early stage lung AC patients who experienced relapse, suggesting this miRNA may be a biomarker for assessing risk of recurrence 120. miR-369 has been shown to negatively regulate the de novo methyltransferases, DNMT3A and DNMT3B, and is predicted to regulate BRCA2, which implicates the potential for this miRNA to impair DNA methylation and repair processes in lung cancer 121. To date, there are no reports describing validated targets or functions of miR-4532, but interestingly, this miRNA was also irreversibly expressed in FS. Based on these findings, aberrant expression of miR-142, miR-369 and miR-4532 induced by smoking could represent premalignant changes that are further exacerbated in established tumours.

5.5.2 miRNA are commonly and differentially deregulated in lung AC smoking groups

The majority of miRNA differentially expressed between tumour and non-malignant lung tissues were common among smoking groups. Many of these miRNAs have been previously reported as tumour suppressors or as having oncogenic properties in lung cancer including: downregulation of miR-101, let-7a, let-7c, miR-1, miR-126, and miR-30a and upregulation of miR-106a, miR-17, miR-21, miR-31 and miR-92b 122,123. Since oncogenes and TSGs are critically involved in tumour development and progression, it is clear that deregulation of these miRNA can promote tumourigenesis.
The fact that frequent, altered expression of some miRNA was observed in all smoking groups suggests disruption of these miRNA represents a common mechanism driving tumour biology in lung AC regardless of smoking status. Our results also provide evidence to the contrary, as the miRNA we identified as differentially deregulated between smoking groups suggest specific miRNA may be involved in mediating smoking specific mechanisms of lung tumourigenesis. We demonstrated that miRNA deregulation was frequently associated with inverse expression of target genes in individual samples, and analysis of the mRNA targets of smoking specific miRNA revealed that different oncogenes and TSGs were affected, and that common and different pathways were disrupted. These findings provide evidence of the potential biological consequences of miRNA deregulation and differential selection for disruption of specific miRNAs in CS and NS lung tumourigenesis. To our knowledge, this is the first study to directly compare miRNA expression patterns in CS and NS lung AC; thus, we could not validate the smoking specific nature of the miRNA deregulation we observed in independent cohorts.

5.5.3 Implications of deregulated miRNAs identified

miRNAs represent promising clinical biomarkers, providing a strong rationale for studying them in cancer 104,123. miRNA serve as diagnostic markers capable of distinguishing malignant and non-malignant lung tissues, detecting lung cancer based on miRNA expression in sputum samples or patient serum, and differentiating histological subtypes 104. The association of high miR-21 and miR-17-92 expression with poor patient outcome exemplifies the use of miRNAs as prognostic biomarkers 104. Furthermore, miRNAs are capable of mediating response to treatment and specific miRNA have been linked to drug sensitivity and resistance, demonstrating their predictive potential 104.
Our finding of smoking associated deregulation of miRNAs in non-malignant lung tissues from individuals with cancer could suggest that smoking induces premalignant changes involved in tumour initiation. Although validation and further investigation of the biological effects of smoking mediated miRNA expression changes are required in human models, miRNA affected by cigarette smoking could represent targets for chemoprevention, another potentially clinically translatable finding 124. Numerous miRNAs that are up- or downregulated in response to chemoprevention agents have been identified. For example, miR-107, which we identified as downregulated in response to smoking, is upregulated in response to vitamin A 124. Thus, potential exists to reverse the effects of smoking induced miRNA changes in smokers at risk of developing lung cancer, and encouragingly, progress has already been made in this area in animal models 125.

Since we have discovered that CS and NS lung AC harbour molecular differences in other genetic and epigenetic dimensions, we were not surprised to identify differential miRNA expression patterns. DNA copy number and methylation changes may account for some of the global gene expression differences observed in CS and NS, and our results provide evidence that miRNAs underlie gene expression changes as well 126,127. We suspect that differential deregulation of miRNAs and their consequential effects on gene and pathway deregulation may contribute to the disparate clinical phenotypes observed in CS and NS lung AC patients. If the miRNA we identified as specific to CS or NS lung cancer are validated in larger comparative studies, they could represent novel therapeutic targets, with tumour response likely dependent on the smoking status of lung AC patients. Given that miRNA can have multi-faceted effects, the biological effects of miRNA deregulated in lung AC should be well characterized before designing therapies to inhibit them.
This represents a significant challenge since in addition to the relatively few mRNA targets that have been proven experimentally, miRNA have hundreds to thousands of predicted targets that could be affected upon therapeutic manipulation of expression levels. Understanding the complement of biological consequences of miRNA inhibition or restoration in lung cells and in cells throughout the body will be critical for therapeutic development.

Chapter 6: Development of an integrative genomics analysis strategy and application to lung AC of CS and NS

6.1 Introduction

6.1.1 Multi-dimensional 'omics profiling of cancer and the need for novel integrative strategies for analyzing individual tumours

Genomic profiling of tumour tissues has become a routine practice in cancer research. These efforts have resulted in numerous discoveries, some of which have been translated into clinical practice to improve the management of cancer patients 12. Like the goal of this thesis, the aim of most profiling studies is to elucidate the molecular mechanisms underlying tumourigenesis to inform treatment decisions and improve patient prognosis. With the increasingly affordable cost and decreasing tissue requirements of most platforms, the interrogation of more than one 'omic level for a single tumour sample is now common. Work by the TCGA and others has led to the generation of thousands of multi-'omics molecular profiles for various human malignancies including: glioblastoma, colon, breast, ovarian, and squamous lung cancers, and represents a major effort towards this aim 9,49-52,128. Since data deposition is now required for publication in most journals, 'omics data in the public domain is becoming increasingly available and abundant. As cancer genomes are disrupted by a variety of genetic mechanisms (i.e. at multiple 'omics levels), strategies to integrate multi-dimensional data are essential to further our understanding of cancer biology 47-49.
Furthermore, since personalized medicine is an emerging reality, understanding the mechanisms driving individual tumours is imperative for tailoring treatment to specific tumour features. Thus, the development of methods to integrate multi-'omics data on an individual tumour level is much needed.

6.1.2 Rationale for developing an algorithm to prioritize candidate genes discovered through integrative genomic analysis

Development of such analysis strategies is complicated by the fact that tumours harbour hundreds to thousands of gene disruptions across multiple 'omics levels, making it difficult to distinguish biologically relevant driver events from those with no biological consequences (i.e. passengers). Currently available algorithms for integrating 'omics data are based on complex correlative or regression based models whereby 'omics levels are analyzed independently or simultaneously, and then alterations are statistically compared between tumour and normal groups 129-133. However, this type of approach can overlook biologically relevant genes disrupted in single tumours. Review of TCGA and other public datasets revealed that few cases have profiles across multiple 'omics dimensions for both patient matched tumour and non-malignant samples, which likely explains why methods to analyze multi-'omics datasets on a per tumour basis are not yet widely available. We saw a need to address these challenges by developing an algorithm for application to individual tumours with multi-'omics data that ranks disrupted genes based on the extent of their disruption and likelihood of biological significance.
Based on the supposition that DNA alterations with concomitant gene expression changes are more likely to be biologically relevant, we have developed a novel multi-'omics integrative gene prioritization method: Multi-dimensional Integrative Tumour gene Ranking Algorithm (MITRA), designed to identify and rank highly disrupted candidate genes in individual cancer genomes based on biological principles. MITRA is a straightforward, user friendly and easily adjustable algorithm that considers the magnitude of DNA and RNA level disruption and generates scores for every gene on a per tumour basis. These scores can be used as input for downstream applications such as pathway and network modeling, and to inform candidate gene selection for biological investigation. As proof of principle, we applied MITRA to a breast cancer dataset from the TCGA and demonstrated that MITRA highly ranks genes known to be involved in breast tumourigenesis while also identifying novel genes that may be important to the biology of breast cancer.

6.1.3 Potential for discovery of novel molecular mechanisms in CS and NS lung AC

Thus far in this thesis, individual data dimensions have been compared between CS and NS lung AC independently. In addition to identifying commonalities, our results have demonstrated that CS and NS lung tumours harbour genome wide differences in copy number, methylation and miRNA profiles. However, integrating these dimensions by analyzing them simultaneously in individual tumour systems holds great potential to provide much greater insight into the coordinated disruption of genes and pathways through different molecular mechanisms than any single dimension alone. A multi-dimensional, integrative genomics analysis of lung AC from CS and NS has not been done and could reveal novel molecular mechanisms driving lung tumourigenesis in these two groups.
6.2 Methods

6.2.1 MITRA: principles and scheme

We aimed to develop an integrative genomics algorithm for ranking disrupted genes by generating scores reflecting the magnitude of concomitant DNA and gene expression alterations. The purpose of scoring genes is to enable the prioritization of gene candidates for further study. For each gene, a directional score is calculated to signify up or down regulation according to its expression levels. Higher magnitude scores indicate prominently disrupted genes. We hypothesize that disrupted genes with the highest scores are the strongest candidates for involvement in tumourigenesis. Since solid tumours exhibit an enormous degree of inter- and intratumoural heterogeneity at the DNA and RNA levels, our algorithm is run on a per-tumour basis and ideally using paired tumour and patient matched non-malignant tissues, as genetic and epigenetic heterogeneity extends to normal tissues (Figure 6.1).

Figure 6.1 Benefits of individual tumour analysis and integrative, multi-'omics tumour profiling. (A) Conventional cancer genomics strategies group tumour samples and normal samples and search for differences between groups using statistical tests. Grouping of samples averages individual features across groups, limiting potential discoveries to only those that are most frequent across the group. Comparing gene alterations in tumour and non-malignant tissues from the same patient, on a case-by-case basis, accounts for individual variation and enables the detection of gene alterations contributing to each individual tumour. In this example, hypothetical gene Y is identified as important in the tumour group using the grouped approach, while genes X, Y, and Z are identified in different tumours using the individual tumour approach.
Thus, the grouping method overlooks genes X and Z because they only occur in a single tumour and the individual tumour strategy reveals more genes unique to each sample. (B) Each circle represents gene status based on various aspects of the tumour genome in one tumour sample. If only two dimensions are assessed, for example copy number and gene expression, gene H is overlooked because the gene is only affected by copy number alteration in one sample and is therefore a low frequency event. Simultaneous addition of the DNA methylation dimension reveals gene H is disrupted in 4/5 tumours, thereby identifying this gene as recurrently altered and signifying its potential importance to tumour biology. Thus, multi-dimensional genomic profiling has greater potential to reveal genes frequently disrupted through different mechanisms. CN = copy number, E = expression, Me = DNA methylation. (C) Grouped versus individual tumour analyses also affect findings regarding pathway disruption. The grouped approach coupled with bi-dimensional profiling does not reveal any genes with significant (frequency based) disruption to input for pathway analysis; thus, disruption of the hypothetical pathway is overlooked. With the individual tumour approach and multi-'omics genomic profiling, different genes are identified through different mechanisms in different tumours, and pathway analysis on each individual tumour leads to identification of the hypothetical pathway as significantly disrupted in tumours.

MITRA is based on three core hypotheses:

1. DNA is a heritable molecule propagated through cellular divisions. Cancer cells will select DNA alterations that confer a clonal expansion advantage; thus, genes with DNA level changes are more likely to be biologically relevant than genes with expression only changes.
2. Larger magnitude alterations (e.g. higher level copy number changes or high expression fold changes) are indicative of selection for advantageous genes.
This assumption is based on the notion that maintaining high level events comes at a considerable energy cost to the cell.
3. DNA alterations, including DNA methylation and copy number changes, that exert a biological effect will be accompanied by consequential mRNA expression changes. DNA alterations without expression changes are more likely to be passenger or reactive alterations.

We designed MITRA to simultaneously assess multiple mechanisms of gene disruption by weighing the impact of a disruption on gene expression, and scoring genes based on their extent of disruption within individual tumours (Figure 6.2). The MITRA score generated by our algorithm ranks genes based on the likelihood that the disruption they undergo is biologically significant based on the hypotheses above. MITRA's scheme is unique in that it integrates multi-'omics data but treats each tumour as its own system, addressing a need that was previously unmet.

The MITRA algorithm is amenable to multiple 'omics platforms and input data types (Table 6.1). As MITRA represents an expandable framework, it is feasible to add additional data dimensions as new 'omics profiling platforms emerge. To introduce and demonstrate MITRA, we incorporated three common, readily available 'omics data types: DNA copy number, methylation and gene expression. Input files consist of Excel files for each tumour sample with a gene identifier column followed by columns containing the copy number, methylation, and expression values for each gene. Table 6.2 provides suggested input types for each data dimension and the values used for our demonstration of MITRA are described below. For each input file, MITRA adds score columns for each data dimension in addition to a total score column. A gene matrix file sorted by the absolute value of scores is also generated and enables easy identification of the most highly disrupted genes (Table 6.2).
Table 6.1 'Omics platforms and data input formats compatible with MITRA

| Dimension | Platforms | Input | Scoring Criteria and Data Interpretation |
|---|---|---|---|
| Copy Number | SNP arrays, array comparative genomic hybridization | Log2 ratios or calculated copy numbers | Bin 1: Single copy gain or loss; Bin 2: Amplification, homozygous deletion |
| Methylation | Methylation bead arrays, MeDIP-array, bisulfite sequencing | Percent methylation (β-value), MeDIP-array log2 ratios, bisulfite-sequencing reads | Bin 1: Low percent methylation; Bin 2: High percent methylation |
| Expression | Expression arrays, RNA sequencing | Log2 ratios or RPKM values | Bin 1: 2-4 fold; Bin 2: 4-10 fold; Bin 3: 10-50 fold; Bin 4: >50 fold |

6.2.2 Scoring alterations in each data dimension

For each data dimension we defined bins (ranges/states; high versus low) based on commonly used thresholds for defining gene alterations and assigned each bin a score 32,50. However, bin definitions can be easily adjusted by users to better suit their needs. For instance, users may derive their own score values and bin thresholds based on analyses of their own datasets or their own specific hypotheses. To reflect our second hypothesis that high level DNA changes are more likely to be biologically relevant, we assigned higher scores to high magnitude changes (Figure 6.2). MITRA classifies every gene's dimensional measurement into the appropriate bin for each dimension, and then assigns every gene a score for each dimension based on the predefined bins and scores.

Figure 6.2 Flowchart depicting the MITRA scheme. DNA and RNA are extracted from patient matched tumour and adjacent non-malignant tissues and subjected to molecular profiling. Gene alterations across various data dimensions are identified for each individual tumour sample.
The MITRA algorithm assesses the various gene disruptions observed for each tumour, and assigns a user-defined score based on the magnitude of DNA and/or RNA expression changes observed. For a DNA alteration to contribute to a gene's score, it must be accompanied by a concordant increase or decrease in expression. Genes also receive a score if they display a gene expression change only. A weight is applied to enhance the scores of DNA disrupted genes based on the hypothesis that genes with alterations at the DNA level are more important to tumourigenesis. MITRA calculates an Integrated Score = ExprScore + Weight * (CNScore + MethScore). By generating scores, MITRA ranks genes based on the level of disruption they display and MITRA scores can be used for downstream applications such as prioritizing candidate genes for biological investigation or pathway and network analyses.

Methylation: DNA methylation was assessed by the delta β-value (dβV), which is the difference in percent methylation between the tumour and matched non-malignant sample. A difference in methylation levels of 20% or greater is commonly considered aberrant 50. Thus, for our analysis, a dβV ≥ 0.2 was defined as hypermethylation and a dβV ≤ -0.2 was defined as hypomethylation. We specified two scoring bins for each of hypermethylation and hypomethylation (Bin 1: ±0.2-0.6 and Bin 2: ±0.6-1) to rank changes, and assigned a two point score difference for the dβV bins (Figure 6.2).

Copy number: For gene dosage, we utilized commonly applied log2 copy number ratio thresholds to call genes as gained (ratio > 0.3) or lost (ratio < -0.3) 32,50. We specified two categories for copy number scores: low level changes (e.g. single copy gains or losses, log2 ±0.3-0.8) and high level changes (e.g. DNA amplifications or deletions, log2 beyond ±0.8) and assigned a two point difference between categories (Figure 6.2).
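The methylation and copy number bins described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the bin scores of 2 and 4 follow the stated two point difference and 0 to 4 range, the treatment of values that fall exactly on a bin edge is a guess, and the sign convention (hypomethylation scored positive and hypermethylation negative, so that each DNA score points in the direction of the expected expression change) is inferred from the worked example in Table 6.2.

```python
def methylation_score(delta_beta):
    """Directional score for a tumour-minus-normal delta beta value (dBV).

    Assumed sign convention: hypomethylation (negative dBV) is scored
    positive because it is expected to increase expression.
    """
    magnitude = abs(delta_beta)
    if magnitude < 0.2:          # below the aberrant-methylation threshold
        return 0
    points = 4 if magnitude >= 0.6 else 2   # Bin 2 versus Bin 1
    return points if delta_beta < 0 else -points


def copy_number_score(log2_ratio):
    """Directional score for a log2 copy number ratio (gain > 0.3, loss < -0.3)."""
    magnitude = abs(log2_ratio)
    if magnitude <= 0.3:         # not called as gained or lost
        return 0
    points = 4 if magnitude > 0.8 else 2    # high level versus low level change
    return points if log2_ratio > 0 else -points


# Values taken from rows of Table 6.2 (CD69, SLC16A5, KMO, WISP1)
print(methylation_score(-0.65), methylation_score(0.64))
print(copy_number_score(0.32), copy_number_score(0.98))
```

Keeping each dimension in its own small function mirrors the statement that users can adjust bin definitions independently for each data type.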
The range of scores for methylation and copy number is the same (0 to 4), as we do not assume the effects of copy number alterations to be more or less important than those of methylation alterations.

Table 6.2 Data input and calculated scores for the TCGA breast tumour, A0B7.

| Gene Symbol | CN log2 intensity | Meth dβV | Expr FC | CN-Score | Meth-Score | Expr-Score | Weight | Total Score |
|---|---|---|---|---|---|---|---|---|
| CAMP | -0.01 | -0.35 | 217.44 | 0 | 2 | 16 | 2.77 | 21.54 |
| C8orf44 | 1.87 | -0.45 | 5.49 | 4 | 2 | 4 | 2.77 | 20.62 |
| ERBB2* | 2.13 | 0.01 | 10.32 | 4 | 0 | 8 | 2.77 | 19.08 |
| C20orf85 | 0.57 | -0.39 | 23.70 | 2 | 2 | 8 | 2.77 | 19.08 |
| COX18 | 1.65 | 0.01 | 17.39 | 4 | 0 | 8 | 2.77 | 19.08 |
| KMO | 0.32 | -0.45 | 11.80 | 2 | 2 | 8 | 2.77 | 19.08 |
| VCPIP1 | 1.87 | 0.06 | 11.22 | 4 | 0 | 8 | 2.77 | 19.08 |
| WISP1 | 0.98 | 0.12 | 11.01 | 4 | 0 | 8 | 2.77 | 19.08 |
| PNMT | 2.31 | -0.22 | 3.50 | 4 | 2 | 2 | 2.77 | 18.62 |
| AKR1B10 | -0.07 | -0.05 | 72.74 | 0 | 0 | 16 | 1 | 16 |
| CCL11 | -0.04 | 0.01 | 58.03 | 0 | 0 | 16 | 1 | 16 |
| CLCA2 | 0.04 | -0.19 | 80.20 | 0 | 0 | 16 | 1 | 16 |
| EDN3 | 0.00 | 0.10 | 0.01 | 0 | 0 | -16 | 1 | -16 |
| ENPP3 | 0.03 | -0.03 | 61.15 | 0 | 0 | 16 | 1 | 16 |
| FABP6 | 0.24 | -0.15 | 92.24 | 0 | 0 | 16 | 1 | 16 |
| HORMAD1 | 0.22 | 0.00 | 62.91 | 0 | 0 | 16 | 1 | 16 |
| LEFTY1 | 0.27 | -0.03 | 166.77 | 0 | 0 | 16 | 1 | 16 |
| MMP13 | 0.06 | 0.03 | 89.76 | 0 | 0 | 16 | 1 | 16 |
| NKX2-2 | -0.01 | 0.02 | 72.82 | 0 | 0 | 16 | 1 | 16 |
| NPTX2 | -0.06 | 0.11 | 55.95 | 0 | 0 | 16 | 1 | 16 |
| S100A8 | 0.22 | -0.02 | 137.87 | 0 | 0 | 16 | 1 | 16 |
| SFRP1 | -0.22 | -0.12 | 0.02 | 0 | 0 | -16 | 1 | -16 |
| TBX22 | 0.07 | -0.07 | 0.02 | 0 | 0 | -16 | 1 | -16 |
| VCX3A | -0.08 | -0.05 | 85.81 | 0 | 0 | 16 | 1 | 16 |
| BMP7 | 0.99 | 0.01 | 5.92 | 4 | 0 | 4 | 2.77 | 15.08 |
| CD69 | -0.11 | -0.65 | 4.44 | 0 | 4 | 4 | 2.77 | 15.08 |
| CHML | 0.32 | -0.25 | 6.13 | 2 | 2 | 4 | 2.77 | 15.08 |
| GRB7* | 2.13 | -0.16 | 5.15 | 4 | 0 | 4 | 2.77 | 15.08 |
| IKZF3 | 2.13 | -0.01 | 4.18 | 4 | 0 | 4 | 2.77 | 15.08 |
| ORMDL3 | 2.13 | -0.02 | 4.03 | 4 | 0 | 4 | 2.77 | 15.08 |
| SLC16A5 | -0.04 | 0.64 | 0.21 | 0 | -4 | -4 | 2.77 | -15.08 |
| STARD3 | 2.31 | 0.03 | 5.72 | 4 | 0 | 4 | 2.77 | 15.08 |
| TCAP | 2.31 | 0.07 | 7.28 | 4 | 0 | 4 | 2.77 | 15.08 |
| TMEM189 | 0.94 | 0.00 | 5.12 | 4 | 0 | 4 | 2.77 | 15.08 |
| ZNF512B | -0.32 | 0.22 | 0.24 | -2 | -2 | -4 | 2.77 | -15.08 |

* genes reported in HER2+ breast cancer; CN = copy number; dβV = delta beta value; FC = fold change

We did not prioritize genes with bi-allelic disruption (i.e.
concurrent copy number and methylation changes affecting the same gene in the same sample) over genes with only one type of DNA alteration because we assume that large magnitude changes in a single dimension (methylation or copy number) may exert gene expression effects similar to those of bi-allelic changes. However, given MITRA's scoring formula, most bi-allelically affected genes will receive higher scores.

mRNA Expression: A tumour-normal fold change of at least two was the minimum threshold used for defining aberrant expression, with a fold change > 2 considered overexpression and a fold change < 0.5 considered underexpression. For both over- and underexpression, we specified four scoring bins spanning the range of fold changes observed (Bin 1: 2-4, Bin 2: 4-10, Bin 3: 10-50, Bin 4: > 50) (Figure 6.2). To limit the inclusion of "expression only" genes in the top ranking genes, we assigned expression scores such that only expression changes falling within the highest magnitude bins would produce a total score comparable to scores for genes with DNA and associated RNA changes. High magnitude expression changes could be caused by an underlying DNA alteration we did not assess or detect and could therefore be important to cancer biology. For this reason we did not want to exclude these genes from potentially ranking among the top genes, even though we aimed to prioritize DNA disrupted genes with our ranking system.

6.2.3 Weighting of DNA alterations with concurrent expression changes

Based on our hypotheses, we applied a weight to genes with concurrent DNA and RNA expression changes to ensure that genes with DNA and consequential expression changes received higher scores. To determine the effect of DNA alterations on gene expression, we analyzed 104 tumours and matched normal samples from 4 different cancer types, for which copy number, methylation, and gene expression data were available 9,51,52,68,87 (Table 6.3).
Data were acquired from the TCGA (https://tcga-data.nci.nih.gov/tcga/) and our BCCA lung AC cohort. Specific platforms and data processing information regarding the TCGA samples are described in Table 6.3 and additional details regarding these data are available online in the TCGA Data Primer (https://wiki.nci.nih.gov/display/TCGA/TCGA+Data+Primer). Across the 104 samples, the average gene expression fold change was 2.77 times higher for genes with DNA disruption versus genes without DNA alterations. Thus, a weight of 2.77 was set as the effect of DNA change on gene expression.

Genes with expression changes only were given a weight of one because such changes i) are often reactive (i.e. passenger changes), ii) may be the result of disrupted upstream genes, or iii) may be due to an alteration in an 'omics level not assayed (e.g. non-coding RNA or histone modifications). Most cancer genomes sustain a considerable amount of DNA disruption, including methylation changes, chromosomal rearrangements, and whole arm amplifications and deletions, resulting in tens of thousands of DNA level alterations. Since the majority of these alterations are likely passengers, genes with DNA alterations only were given a weight of zero and hence no MITRA score. Importantly, users can easily modify the weighting parameters to suit their own tumour systems and hypotheses.

6.2.4 Integration of scores across multiple dimensions

To complete the integration of 'omics data for each sample, we calculated the MITRA score for each gene according to the following formula (Figure 6.2):

$$\text{MITRA score} = \text{expression score} + \text{weight} \times (\text{copy number score} + \text{methylation score})$$

The minimum score an altered gene (i.e. a gene that exhibits an mRNA expression change or DNA alteration) could be assigned for having the lowest ranking methylation or copy number and associated expression change is 2.77*(2) + 2 = 7.54, while the maximum score possible is 2.77*(4+4) + 16 = 38.16.
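The scoring and weighting rules above can be illustrated with a short Python sketch. It is a hedged reading of the text, not the thesis code: the function names are hypothetical, the expression bin scores of 2, 4, 8 and 16 are inferred from Table 6.2 and the stated maximum of 16, and the rule that a DNA score opposing the direction of the expression change is ignored is an assumption, since the text does not spell out how discordant alterations are handled.

```python
DNA_WEIGHT = 2.77  # weight reported in the text for DNA-disrupted genes


def expression_score(fold_change):
    """Directional score for a tumour/normal expression fold change.

    Bins follow the text (2-4, 4-10, 10-50, >50 fold); the scores
    2, 4, 8 and 16 are inferred from Table 6.2 (assumption).
    """
    if fold_change <= 0:
        raise ValueError("fold change must be positive")
    magnitude = fold_change if fold_change >= 1 else 1 / fold_change
    if magnitude < 2:
        return 0
    if magnitude < 4:
        points = 2
    elif magnitude < 10:
        points = 4
    elif magnitude <= 50:
        points = 8
    else:
        points = 16
    return points if fold_change > 1 else -points


def mitra_score(expr, cn, meth, weight=DNA_WEIGHT):
    """Integrate per-dimension scores: expr + weight * (cn + meth)."""
    if expr == 0:
        return 0.0               # DNA-only alterations: weight of zero
    direction = 1 if expr > 0 else -1
    # Assumption: only DNA scores concordant with the expression change count.
    dna = sum(s for s in (cn, meth) if s * direction > 0)
    if dna == 0:
        return float(expr)       # expression-only change: weight of one
    return expr + weight * dna


# Rows of Table 6.2: CAMP (expr 16, CN 0, meth 2) and SLC16A5 (expr -4, CN 0, meth -4)
print(round(mitra_score(16, 0, 2), 2))
print(round(mitra_score(-4, 0, -4), 2))
# Smallest and largest scores for genes with DNA plus expression changes
print(round(mitra_score(2, 2, 0), 2), round(mitra_score(16, 4, 4), 2))
```

Passing the weight as a parameter reflects the statement that users can modify the weighting to suit their own tumour systems.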
Using the bin scores we defined, MITRA scores could range in magnitude from 0 to 38.16, and genes with high magnitude expression only scores (≥ 16) have potential to rank among the top genes. Collectively, our scoring system assigns higher scores to genes with concurrent DNA and expression changes, consistent with the biological principles MITRA is based on.

6.2.5 Application of MITRA to breast cancer

To assess the ability of MITRA to identify known cancer genes, we applied the algorithm to a multi-dimensional breast cancer dataset from the TCGA (Table 6.3). Given that MITRA is designed to integrate multiple data dimensions for individual tumours, we selected only cases with paired, patient matched tumour and non-malignant profiles for copy number, DNA methylation, and gene expression, which amounted to 8 cases. Gene alterations were defined on a patient matched, tumour versus normal basis for each dimension except copy number, for which TCGA used a baseline copy number model generated from a pool of unmatched normal DNA to reveal segmental copy number alterations in each tumour.

Table 6.3 Summary of datasets used to develop and apply MITRA in Chapter 6. TCGA data details can be found at https://tcga-data.nci.nih.gov/tcga/tcgaDataType.jsp

| Cancer Type | Data Dimension | Platform | Data and Description | Samples with Complete Datasets |
|---|---|---|---|---|
| Breast (https://tcga-data.nci.nih.gov/docs/publications/brca_2012/) | Copy Number | Affymetrix SNP 6 | Level 3 - copy number alterations for segmented regions per sample (log2 ratio) | 8 |
| | DNA methylation | Illumina Infinium HM450 (HM27 probes) | Level 3 - methylation value per gene per sample (β-value) | |
| | Gene Expression | Agilent 244K Genome Expression Array | Level 3 - summarized expression calls for genes per sample (log2 ratio) | |
| Squamous Lung (https://tcga-data.nci.nih.gov/docs/publications/lusc_2012/) | Copy Number | Affymetrix SNP 6 | Level 3 - copy number alterations for segmented regions per sample (log2 ratio) | 7 |
| | DNA methylation | Illumina Infinium HM27 | Level 3 - methylation value per gene per sample (β-value) | |
| | Gene Expression | Illumina HiSeq RNA-sequencing | Level 3 - summarized expression calls for genes per sample (log2 ratio) | |
| Colon (https://tcga-data.nci.nih.gov/docs/publications/coadread_2012/) | Copy Number | Affymetrix SNP 6 | Level 3 - copy number alterations for segmented regions per sample (log2 ratio) | 12 |
| | DNA methylation | Illumina Infinium HM27 | Level 3 - methylation value per gene per sample (β-value) | |
| | Gene Expression | Agilent 244K Genome Expression Array | Level 3 - summarized expression calls for genes per sample (log2 ratio) | |
| Lung AC (BCCA) (http://edrn.nci.nih.gov/science-data) | Copy Number | Affymetrix SNP 6 | Copy number calls for each gene | 77 |
| | DNA methylation | Illumina Infinium HM27 | Methylation value per gene per sample (β-value) | |
| | Gene Expression | Illumina HT-12 BeadChip Array | Normalized gene expression for each probe | |

6.2.6 Assessing the robustness of the MITRA algorithm

To assess the robustness of MITRA, we selected three breast cases and performed small manipulations of the bin criteria in each dimension. Score thresholds were changed by 0.02 for one dimension and in one direction at a time (increasing and decreasing copy number log2 ratio thresholds, increasing and decreasing dβV thresholds, and increasing and decreasing expression fold change thresholds). We then ran MITRA using the new scoring bins, compared the genes ranking in the top percentiles between results generated using the original score bins and the modified score bins, and determined the proportion of overlapping genes identified in the results.
We also determined the correlation coefficients between total scores generated using our default scoring bins and the modified scoring bins.

6.2.7 Application of MITRA to lung AC from CS and NS

After exemplifying the utility of MITRA with the breast cancer dataset, we proceeded to analyze our 64 lung AC cases (34 CS and 30 NS) for which we had copy number, methylation and gene expression profiles. Determination of gene copy number status (e.g. copy gain or loss) is described in section 3.2.2 of Chapter 3 and determination of methylation status is described in section 4.2.2 of Chapter 4. Calculation of gene expression fold changes is described in section 2.3.3 of Chapter 2. We used the same scoring bins for methylation dβVs and expression fold changes; however, we altered the copy number bins because the segmentation algorithm we used in Partek Genomics Suite calculates a predicted number of copies. The copy number (CN) bins we used were bin 1 gain: 2.3 < CN < 5; bin 2 gain: CN > 5; bin 1 loss: 0.5 < CN < 1.7; and bin 2 loss: CN < 0.5. We ran MITRA on each individual tumour and considered genes with MITRA scores ranking in the 99th and 1st percentiles most likely to be biologically relevant for that tumour.

6.2.8 Differentially disrupted genes and pathway analyses

To identify genes differentially disrupted between CS and NS based on our integrative MITRA analysis, we performed a Fisher's Exact test on the frequencies of deregulation of the top ranked genes in both groups. A p-value < 0.05 was considered significant. To identify candidate biological mechanisms disrupted by the gene deregulation we observed, we performed pathway analysis on each individual tumour using genes ranking in the top percentiles as input for Ingenuity Pathway Analysis (IPA). A Fisher's Exact test was also performed to identify differentially disrupted pathways in CS and NS, with a p-value < 0.1 considered significant.
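The group comparison described in section 6.2.8 can be sketched with a small, dependency-free Fisher's Exact test. This is a minimal sketch, not the software used in the thesis: the function name is hypothetical, the test is assumed to be two-sided, and the counts in the usage example are invented for illustration (a gene disrupted in 7 of 34 CS tumours and 0 of 30 NS tumours).

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's Exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. CS, NS); columns are disrupted / not disrupted.
    The p-value sums the probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x disrupted tumours in group 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = prob(x)
        # Small relative tolerance guards against floating-point ties
        if p <= p_observed * (1 + 1e-7):
            total += p
    return min(1.0, total)


# Hypothetical counts: disrupted in 7 of 34 CS and 0 of 30 NS tumours
p_value = fisher_exact_two_sided(7, 27, 0, 30)
print(round(p_value, 4))   # 0.0119
print(p_value < 0.05)      # True: significant at the gene-level threshold
```

The same function applies to pathways by counting the tumours in which a pathway was significant and comparing the p-value with the 0.1 threshold instead.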
6.2.9 Validation of differentially disrupted genes in external cohorts

Given the lack of external datasets with multi-'omics profiles for patient matched tumour and non-malignant tissues, we resorted to assessing the differentially disrupted genes we identified using mRNA expression data. Normalized Affymetrix U133 array data were obtained for the GSE31210 and GSE10072 datasets from GEO (Table 2.3). GSE31210 comprised tumour profiles for 111 ever smokers (ES) and 115 NS and non-malignant profiles for 12 ES and 8 NS. A fold change in gene expression was calculated for each tumour, using the average of non-malignant profiles of the same smoking type as a baseline. GSE10072 comprised 12 CS and 11 NS paired tumour and non-malignant profiles, and fold changes were calculated for each tumour using its matched non-malignant sample as a baseline. Genes with fold changes exceeding 2-fold in either direction were considered over- or underexpressed. A Fisher's Exact test was performed on the frequencies of over- and underexpression in CS and NS to identify differentially disrupted genes in these cohorts, with a p-value < 0.05 considered significant.

6.3 Results

6.3.1 MITRA identifies known driver genes in breast cancer

For each breast tumour, 19,954 gene scores were generated. The range of scores varied substantially between tumours (Table 6.4). We hypothesize this could reflect several scenarios: i) tumour to non-malignant cell content varies per sample, affecting the magnitude of changes attributed to tumour DNA and RNA, ii) genomic, epigenomic and transcriptional variation exists between individuals, or iii) tumours exhibit different extents of genetic and epigenetic phenotypes. We did not wish to normalize scores across tumours or apply an arbitrary score threshold across all samples due to the tumour specific score variation we observed; thus, we chose to prioritize MITRA scores in each tumour based on the extreme (highest and lowest) score percentiles observed.
We propose that genes with scores falling into the extreme percentiles (99th and 1st percentiles) are most likely to have important roles in cancer biology for the tumour evaluated.

Table 6.4 Summary of scores generated by MITRA for TCGA breast tumours

Breast Case         A0B7    A0BQ    A0H5    A153    A15M    A0BW    A0DL    A158
Minimum           -16     -19.08  -16     -21.54  -21.54  -21.54  -19.08  -32.62
Maximum            21.54   19.08   21.54   21.54   27.08   21.54   27.08   27.08
Median              0       0       0       0       0       0       0       0
Average             0.27    0.07    0.14    0.24    0.23   -0.07    0.06   -0.56
5th percentile     -2      -2      -2      -8      -2      -7.54   -4      -9.54
1st percentile     -7.54   -7.54   -7.54  -13.54   -8      -9.54   -8     -15.08
95th percentile     4       2       2       8       4       4       4       7.54
99th percentile     8       7.54    8      13.54    9.54    9.54    9.54   13.54

To assess whether MITRA assigned high scores to genes known to be involved in breast tumourigenesis, we compiled a list of genes reported in the TCGA's comprehensive breast cancer profiling report as well as a recent review detailing the molecular subtypes of breast cancer 52,134. We then cross-referenced genes MITRA prioritized as highly disrupted in individual breast tumours (i.e. those with scores in the top percentiles) with the gene list generated based on the literature.

As anticipated, MITRA successfully identified multiple genes previously reported in breast tumourigenesis in each of the 8 paired TCGA breast cancer samples. Highly disrupted breast cancer genes ranking in the 99th and 1st percentiles of MITRA scores included: DNA amplification and upregulation of ERBB2 (1/8), EGFR (1/8), AKT3 (1/8); copy gain or hypomethylation and upregulation of GRB7 (2/8); hypomethylation with upregulation of PTPN22 (2/8); copy loss and hypermethylation leading to downregulation of FABP7 (5/8); copy loss and downregulation of RUNX1 (1/8) and CDKN2A (1/8); and hypermethylation associated with downregulation of TRIM29 (4/8).
Encouragingly, two major genes affected by molecular changes associated with the HER2+ subtype of breast cancer, GRB7 and ERBB2, were detected in the one HER2+ breast tumour we assessed 134. Thus, applying MITRA to this dataset revealed biologically relevant genes with known importance to breast tumourigenesis in each patient, giving us confidence that our algorithm design is well suited for this purpose.

MITRA can also provide 'omics derived evidence that suggests potential involvement of other genes not described by the TCGA or molecular subtype articles, but previously described in other cancer studies. Highly disrupted genes revealed by MITRA that were not described in these reports included: downregulation of SFRP1 (7/8) and SOX10 (6/8) and upregulation of WISP1 (7/8), COL11A1 (6/8), and EXO1 (4/8). SFRP1 is an antagonist of the Wnt/β-catenin signaling pathway which is aberrantly activated in many cancer types including breast malignancies 98,99. WISP1 modulates many oncogenic phenotypes including cell proliferation, survival and differentiation and is reportedly highly expressed in many cancer types 135. EXO1 is a DNA repair gene and polymorphisms in this gene have been associated with cancer risk in several cancer types 136-139. Based on the functions of these genes reported in the literature, it is plausible they are selectively disrupted in breast cancer cells because their deregulation promotes tumourigenesis.

6.3.2 MITRA is robust to modification of scoring bins

We assessed the robustness of MITRA by modifying the score bins for each individual data dimension, re-applying MITRA, and comparing results between original and modified analyses. Across the three tumours, the overlap in genes prioritized between the original and modified analyses ranged from 90-100% (Table 6.5).
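The two robustness measures described in section 6.2.6 (proportion of overlapping prioritized genes and correlation of total scores) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the overlap is expressed relative to the original gene set (the text does not state the denominator), and the gene names and scores in the usage example are toy values rather than thesis data.

```python
from math import sqrt


def percent_overlap(original_genes, modified_genes):
    """Percentage of originally prioritized genes still prioritized after
    the scoring bins were modified (denominator: original set, an assumption)."""
    original = set(original_genes)
    if not original:
        raise ValueError("original gene set is empty")
    return 100.0 * len(original & set(modified_genes)) / len(original)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy example: one gene drops out of the top percentiles after modification
original_top = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
modified_top = {"GENE_A", "GENE_B", "GENE_C"}
print(percent_overlap(original_top, modified_top))   # 75.0

# Toy total scores for the same genes under original and modified bins
original_scores = [8.0, 7.54, -7.54, 0.0, 2.0]
modified_scores = [8.0, 7.54, -9.54, 0.0, 2.0]
print(pearson(original_scores, modified_scores) > 0.9)
```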
The top percentile score thresholds were robust to bi-directional modifications of scoring bins for each data dimension and only changed for one of the 18 modified analyses performed; in the A0BW tumour, the top percentile score threshold changed from 9.54 in the original analysis to 8 in the modified analysis, although the overlap in genes identified was still 96%. Pearson correlation coefficients were calculated for total scores between the original and modified analyses and ranged from 0.97 to 1.00, indicating extremely concordant MITRA scores between the original and modified analyses. These results demonstrate that MITRA is very robust to minor manipulations of thresholds used to define the scoring bins.

6.3.3 Application of MITRA to individual lung AC tumours

Based on its performance in the breast cancer dataset, we were confident in MITRA's ability to prioritize established cancer genes from multi-'omics data input. Thus, we applied MITRA to our multi-'omics lung AC cohort of 34 CS and 30 NS to elucidate candidate genes involved in lung tumourigenesis. We plotted the range of MITRA scores (maximum, median, and minimum) for each tumour and did not observe any striking difference between CS and NS (Figure 6.3). Thus, we were confident no bias in MITRA scoring existed between the CS and NS groups. As with the breast cancer dataset, we used the 99th and 1st percentiles as thresholds to define the most prominently disrupted genes in each tumour sample, again reasoning that genes with the highest MITRA scores are most likely to contribute to tumour biology. Across the 64 tumours, we identified 6,254 unique genes that ranked in the top percentiles of at least one tumour. Of these genes, 361 were detected in at least 20% of the 64 tumours we assessed, including 134 up- and 227 downregulated genes.
The most frequent up- and downregulated genes across the entire cohort of 64 tumours are listed in Table 6.6. We hypothesize that these genes may be selected for disruption to promote oncogenic processes that are common to lung tumourigenesis in CS and NS.

Table 6.5 Comparison of MITRA results for original and modified scoring bins

Columns: Dimension Modified; Top Percentile (99th) Score Threshold; Bottom Percentile (1st) Score Threshold; Maximum Score; Minimum Score; Median Score; Number of Unique Genes in Top Percentiles; Percent of Genes Overlapping with Original Results; Total Score Correlation Coefficient (Original - Modified)

Sample A0B7 (Δ = 0.02)
Original results        8      -7.54   21.54  -16      0   413   -      -
Copy Number bins +Δ     8      -7.54   21.54  -16      0   404   98%    0.99
Copy Number bins -Δ     8      -7.54   21.54  -16      0   398   96%    0.99
Methylation bins +Δ     8      -7.54   21.54  -16      0   402   97%    0.99
Methylation bins -Δ     8      -7.54   21.54  -16      0   409   99%    0.99
Expression bins +Δ      8      -7.54   21.54  -21.54   0   410   99%    0.98
Expression bins -Δ      8      -7.54   21.54  -15.08   0   390   94%    0.99

Sample A0BQ (Δ = 0.02)
Original results        7.54   -7.54   19.08  -19.08   0   447   -      -
Copy Number bins +Δ     7.54   -7.54   19.08  -19.08   0   447   100%   1.00
Copy Number bins -Δ     7.54   -7.54   19.08  -19.08   0   447   100%   1.00
Methylation bins +Δ     7.54   -7.54   19.08  -19.08   0   435   97%    0.99
Methylation bins -Δ     7.54   -7.54   19.08  -19.08   0   437   98%    0.99
Expression bins +Δ      7.54   -7.54   19.08  -19.08   0   444   99%    0.98
Expression bins -Δ      7.54   -7.54   19.08  -15.08   0   427   96%    0.98

Sample A0BW (Δ = 0.02)
Original results        9.54   -9.54   21.54  -21.54   0   297   -      -
Copy Number bins +Δ     9.54   -9.54   21.54  -21.54   0   297   100%   0.99
Copy Number bins -Δ     9.54   -9.54   21.54  -21.54   0   297   100%   1.00
Methylation bins +Δ     9.54   -9.54   21.54  -21.54   0   293   99%    1.00
Methylation bins -Δ     8      -9.54   21.54  -21.54   0   286   96%    1.00
Expression bins +Δ      9.54   -9.54   21.54  -27.08   0   295   99%    0.97
Expression bins -Δ      9.54   -9.54   21.54  -19.08   0   266   90%    0.97
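The per-tumour prioritization used in section 6.3.3 (99th and 1st percentile cut-offs within each tumour, followed by a cohort-wide tally of how often each gene is prioritized) can be sketched as below. This is an illustrative sketch only: the function names and the toy cohort are hypothetical, the percentile interpolation method is an assumption because the thesis does not state one, and excluding zero scores from the prioritized sets is also an assumption, made because most genes score zero.

```python
from collections import Counter
from statistics import quantiles


def extreme_genes(scores):
    """Split one tumour's {gene: MITRA score} map into the genes at or above
    the 99th percentile (up) and at or below the 1st percentile (down)."""
    cuts = quantiles(list(scores.values()), n=100, method="inclusive")
    low, high = cuts[0], cuts[-1]  # 1st and 99th percentile cut points
    # Assumption: unscored genes (score 0) are never prioritized
    up = {g for g, s in scores.items() if s >= high and s > 0}
    down = {g for g, s in scores.items() if s <= low and s < 0}
    return up, down


def recurrence(cohort, min_fraction=0.20):
    """Fraction of tumours in which each gene is prioritized, keeping only
    genes reaching min_fraction (20% of tumours in the text)."""
    up_counts, down_counts = Counter(), Counter()
    for scores in cohort.values():
        up, down = extreme_genes(scores)
        up_counts.update(up)
        down_counts.update(down)
    n = len(cohort)
    up_freq = {g: c / n for g, c in up_counts.items() if c / n >= min_fraction}
    down_freq = {g: c / n for g, c in down_counts.items() if c / n >= min_fraction}
    return up_freq, down_freq


# Toy cohort: 5 tumours, 200 genes each, mostly unscored
cohort = {}
for t in range(5):
    scores = {f"G{i}": 0.0 for i in range(198)}
    scores["UP"] = 10.0
    scores["DOWN"] = -10.0
    cohort[f"T{t}"] = scores

up_freq, down_freq = recurrence(cohort)
print(up_freq)     # {'UP': 1.0}
print(down_freq)   # {'DOWN': 1.0}
```

Selecting genes within each tumour before tallying keeps the thresholds tumour specific, which matches the decision not to normalize scores across tumours.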
Figure 6.3 Range in MITRA scores generated for 64 lung AC tumours. The minimum, median, and maximum MITRA scores for each tumour were plotted. The median score for every tumour was zero. C = CS, N = NS. There was no obvious difference in the range of scores observed between CS and NS lung tumours.

Table 6.6 Most frequent genes disrupted in lung AC common to both smoking groups

Columns: Gene Symbol; Freq. Up-Regulation; Up Freq. Difference (CS-NS); Freq. Down-Regulation; Down Freq. Difference (CS-NS)

FAM83A         0.84   0.15   0.00   0.00
TMEM184A       0.78   0.03   0.00   0.00
CEACAM1        0.59  -0.14   0.02  -0.03
MMP11          0.59  -0.01   0.00   0.00
EEF1A2         0.58   0.02   0.00   0.00
PYCR1          0.50   0.00   0.00   0.00
ZNF750         0.45  -0.09   0.00   0.00
FUT2           0.42   0.10   0.00   0.00
MBTD1          0.42  -0.08   0.00   0.00
ETV4           0.41   0.07   0.00   0.00
WFDC3          0.39   0.11   0.00   0.00
CRABP2         0.38   0.14   0.00   0.00
PITX1          0.38   0.02   0.02  -0.03
SLC39A4        0.38  -0.05   0.00   0.00
ABCA4          0.36  -0.08   0.00   0.00
AGR2           0.36  -0.08   0.00   0.00
PROM2          0.36  -0.08   0.00   0.00
SLC22A18AS     0.36  -0.08   0.00   0.00
SPP1           0.36  -0.14   0.00   0.00
TOP2A          0.36  -0.01   0.00   0.00
----------------------------------------
FAM107A        0.00   0.00   0.86  -0.08
AGER           0.00   0.00   0.70   0.01
CDO1           0.00   0.00   0.69  -0.09
CA4            0.00   0.00   0.67   0.07
C19orf59       0.00   0.00   0.64  -0.05
SOX17          0.00   0.00   0.63   0.11
ITLN2          0.00   0.00   0.59  -0.07
MAMDC2         0.00   0.00   0.59  -0.01
SPARCL1        0.00   0.00   0.58   0.02
GPM6B          0.00   0.00   0.56  -0.07
LTC4S          0.00   0.00   0.55   0.03
LYVE1          0.00   0.00   0.55  -0.10
SOSTDC1        0.00   0.00   0.55  -0.04
CMTM2          0.00   0.00   0.53  -0.13
CYP4B1         0.02  -0.03   0.53   0.12
CYYR1          0.00   0.00   0.52  -0.10
MME            0.00   0.00   0.52  -0.03
SCARA5         0.00   0.00   0.52   0.03
FABP4          0.00   0.00   0.50   0.06
SH3GL3         0.00   0.00   0.50   0.13

Horizontal line separates up- and downregulated genes.

The most well-known lung cancer gene among the 361 genes was RASSF1, which was downregulated in 23% of tumours. Both hypermethylation and copy number loss were mechanisms of RASSF1 underexpression. EGFR also ranked highly across the 64 tumours, although its frequency of disruption was 13%.
EEF1A2 and ETV4 have been previously described as oncogenes, which is consistent with our observations of their upregulation in our cohort 140-142. FAM83A, the most frequently upregulated gene we identified (84% of cases), is highly expressed in lung AC and has been proposed as a lung cancer biomarker 143,144; furthermore, this gene was recently found to confer resistance to EGFR tyrosine kinase inhibitors 143-145. Conversely, SOX17 and CDO1 were among the most frequent downregulated genes. SOX17 is epigenetically silenced in several malignancies including lung cancer, and functions as an antagonist of the Wnt/β-catenin signaling pathway 146-148. CDO1 is also recurrently silenced through DNA methylation and is a TSG in multiple cancers 149. MITRA's prioritization of cancer genes with established, oncogenic roles in our dataset provided confidence that other genes we identified, which are not well characterized in the literature with respect to cancer, may represent novel candidates involved in lung tumourigenesis.

We proceeded to investigate whether recurrently disrupted genes were equally deregulated in lung AC of CS and NS. We identified genes with disruption frequencies of 20% or greater in either the CS or NS group and determined the overlap for up- and downregulated genes (Figure 6.4). This revealed that there were more down- than upregulated genes and that CS exhibit more gene deregulation than NS.

6.3.4 Genes differentially disrupted in CS and NS

Since our comparison of up- and downregulated genes in the two groups suggested there are numerous differentially disrupted genes in CS and NS tumours, we next aimed to characterize those that were significantly differentially altered. Applying the criteria of a Fisher's Exact test p-value < 0.05 and frequency difference between the groups exceeding 15% revealed 115 unique genes as significantly up- or downregulated in either CS or NS tumours.
Of the 115 genes, 45 were concordantly differentially expressed between CS and NS lung AC in at least one of two external cohorts (GSE31210 and GSE10072), demonstrating concordance of our data with other cohorts. The 45 differentially altered genes that validated in external cohorts are listed in Table 6.7.

Figure 6.4 Venn diagram illustrating the overlap in highly disrupted genes in lung AC of CS and NS. Genes exhibiting frequencies of upregulation (A) or downregulation (B) in at least 20% of CS tumours or NS tumours were assessed to determine the overlap in gene disruption between the two groups. Overall, there were more downregulated genes than upregulated genes, and CS harboured more gene disruption than NS using these criteria.

It is possible that we would have observed a higher validation rate if we had been able to analyze DNA level changes for the same cohorts. Cross referencing the 115 differentially disrupted genes with a list of cancer genes compiled by the MSKCC revealed the candidate oncogenes S100P, CYP24A1, and MSMB as preferentially upregulated in CS, while the candidate TSGs IRF8, SCGB3A1, and TSPAN32 were preferentially downregulated in CS and KL, PRKCDBP, and RECK were preferentially downregulated in NS 97. Of note, some genes interesting in the context of lung cancer biology were also disrupted more frequently in one group or the other, although not significantly according to the thresholds we applied. For example, EGFR was upregulated in 20% of NS compared to 6% of CS but was not significantly different based on the Fisher's Exact test (p-value = 0.1328) or frequency difference threshold. Nevertheless, this finding is concordant with the literature as EGFR activation is more prominent in NS lung cancer. The oncogene MET, which was upregulated in 13% of NS and 0% of CS, was significantly differentially disrupted (Fisher's Exact test, p = 0.0431) but did not pass our frequency threshold criteria.
Another example is PTGS2 (COX-2), which is involved in inflammation and mitogenesis. This gene was upregulated in 15% of CS and 0% of NS, but the Fisher's Exact test p-value was not significant (p = 0.0552). Collectively, our findings in lung AC demonstrate that application of the MITRA gene ranking scheme to individual tumours aptly identifies genes involved in lung tumourigenesis. Differential disruption of the genes we identified may implicate their involvement in underlying CS or NS specific mechanisms of lung tumour biology.

Table 6.7 Differentially disrupted genes validated in external cohorts

Columns: Gene Symbol; Freq. in CS; Freq. in NS; Freq. Diff (CS-NS); Direction - Group; P-val

ABCA3          0.56   0.30   0.26   DOWN_CS   0.0463
ADHFE1         0.21   0.00   0.21   DOWN_CS   0.0119
ALDH1A1        0.35   0.13   0.22   DOWN_CS   0.0497
CASP1          0.18   0.00   0.18   DOWN_CS   0.0259
CD37           0.26   0.07   0.20   DOWN_CS   0.0486
ITLN1          0.71   0.40   0.31   DOWN_CS   0.0226
MACROD2        0.18   0.00   0.18   DOWN_CS   0.0259
NTM            0.47   0.13   0.34   DOWN_CS   0.0062
PHACTR2        0.18   0.00   0.18   DOWN_CS   0.0259
SCGB3A1        0.35   0.07   0.29   DOWN_CS   0.0067
SLC22A3        0.24   0.03   0.20   DOWN_CS   0.0294
SLPI           0.29   0.07   0.23   DOWN_CS   0.0259
FLRT2          0.00   0.17  -0.17   DOWN_NS   0.0187
LHX6           0.03   0.27  -0.24   DOWN_NS   0.0096
MSRB3          0.00   0.17  -0.17   DOWN_NS   0.0187
PRKCDBP        0.03   0.20  -0.17   DOWN_NS   0.0444
TEK            0.44   0.73  -0.29   DOWN_NS   0.0237
TUBB6          0.38   0.67  -0.28   DOWN_NS   0.0273
----------------------------------------------------
ASPM           0.38   0.13   0.25   UP_CS     0.0453
BLK            0.18   0.00   0.18   UP_CS     0.0259
CENPF          0.38   0.10   0.28   UP_CS     0.0107
DSP            0.29   0.00   0.29   UP_CS     0.0011
FGB            0.24   0.03   0.20   UP_CS     0.0294
HHIPL2         0.18   0.00   0.18   UP_CS     0.0259
HMGB3          0.26   0.07   0.20   UP_CS     0.0486
KLK12          0.24   0.00   0.24   UP_CS     0.0054
MSMB           0.29   0.07   0.23   UP_CS     0.0259
NAV1           0.18   0.00   0.18   UP_CS     0.0259
NEK2           0.44   0.07   0.37   UP_CS     0.0007
NMNAT2         0.26   0.03   0.23   UP_CS     0.0147
PTGES          0.18   0.00   0.18   UP_CS     0.0259
S100P          0.50   0.20   0.30   UP_CS     0.0186
TRIP13         0.29   0.03   0.26   UP_CS     0.0071
TUBB3          0.50   0.13   0.37   UP_CS     0.0029
UBE2C          0.53   0.20   0.33   UP_CS     0.0095
ABCC3          0.09   0.33  -0.25   UP_NS     0.0270
ACE2           0.03   0.27  -0.24   UP_NS     0.0096
CYP3A5         0.00   0.20  -0.20   UP_NS     0.0079
HSD17B2        0.00   0.27  -0.27   UP_NS     0.0013
ICA1           0.03   0.27  -0.24   UP_NS     0.0096
LGALS4         0.03   0.30  -0.27   UP_NS     0.0043
PDZK1IP1       0.03   0.23  -0.20   UP_NS     0.0211
PODXL2         0.15   0.40  -0.25   UP_NS     0.0269
SULT1C2        0.00   0.27  -0.27   UP_NS     0.0013
SYTL1          0.03   0.27  -0.24   UP_NS     0.0096

Horizontal line separates up- and downregulated genes.

6.3.5 Pathways most prominently disrupted in lung AC

We next aimed to characterize genes prominently disrupted in our lung AC cohort in the context of cellular functions and pathways to understand how their deregulation could act concertedly to promote tumourigenesis. Pathway analysis was performed on genes ranked in the top percentiles for each individual tumour, and results were subsequently merged to identify pathways frequently disrupted across the tumours. In total, 415 unique pathways were affected in at least one tumour, encompassing a wide range of cellular functions including metabolic, immune, inflammation, and cancer-related processes. Pathways affected in greater than 20% of the 64 tumours are presented in Table 6.8.

Table 6.8 Cellular pathways frequently disrupted in both CS and NS lung AC

Columns: Cellular Pathway; Freq. in lung AC; Freq. in CS; Freq. in NS; Freq. Diff. (CS-NS)

Granulocyte Adhesion and Diapedesis                               0.94   0.94   0.93   0.01
Agranulocyte Adhesion and Diapedesis                              0.91   0.88   0.93  -0.05
Atherosclerosis Signaling                                         0.77   0.88   0.63   0.25
Taurine Biosynthesis                                              0.66   0.65   0.67  -0.02
Leukocyte Extravasation Signaling                                 0.63   0.62   0.63  -0.02
Histamine Degradation                                             0.47   0.41   0.53  -0.12
LPS/IL-1 Mediated Inhibition of RXR Function                      0.45   0.44   0.47  -0.03
Hepatic Fibrosis / Hepatic Stellate Cell Activation               0.42   0.41   0.43  -0.02
Sertoli Cell-Sertoli Cell Junction Signaling                      0.42   0.44   0.40   0.04
Role of Osteoblasts, Osteoclasts and Chondrocytes in Arthritis    0.41   0.44   0.37   0.07
Serotonin Degradation                                             0.41   0.44   0.37   0.07
Eicosanoid Signaling                                              0.39   0.38   0.40  -0.02
Inhibition of Matrix Metalloproteases                             0.36   0.35   0.37  -0.01
Bladder Cancer Signaling                                          0.34   0.35   0.33   0.02
Aryl Hydrocarbon Receptor Signaling                               0.33   0.35   0.30   0.05
Role of IL-17A in Psoriasis                                       0.31   0.32   0.30   0.02
Wnt/β-catenin Signaling                                           0.31   0.26   0.37  -0.10
Fatty Acid ?-oxidation                                            0.30   0.32   0.27   0.06
VDR/RXR Activation                                                0.30   0.24   0.37  -0.13
Axonal Guidance Signaling                                         0.27   0.24   0.30  -0.06
Coagulation System                                                0.27   0.29   0.23   0.06
Estrogen Biosynthesis                                             0.27   0.26   0.27   0.00
IL-8 Signaling                                                    0.27   0.21   0.33  -0.13
Hepatic Cholestasis                                               0.25   0.29   0.20   0.09
Tight Junction Signaling                                          0.25   0.29   0.20   0.09
Intrinsic Prothrombin Activation Pathway                          0.23   0.29   0.17   0.13
Xenobiotic Metabolism Signaling                                   0.22   0.18   0.27  -0.09

Five pathways were disrupted in over 60% of both CS and NS: granulocyte adhesion and diapedesis, agranulocyte adhesion and diapedesis, atherosclerosis signaling, taurine biosynthesis, and leukocyte extravasation signaling (Table 6.8). Except for taurine biosynthesis, these pathways were commonly affected by disruption of matrix metalloproteases, claudins, cytokines, and phospholipases, which may contribute to tumour invasiveness, migration, angiogenesis, and modulation of immune and inflammatory response, which are key hallmarks of cancer 36,150,151.
Further investigation of our pathway results revealed 88 pathways with a frequency of disruption exceeding 15% in either CS or NS tumours. Most of these pathways were commonly disrupted between the groups, although some appeared to be specific to CS or NS (Figure 6.5).

Figure 6.5 Venn diagram illustrating the overlap in highly disrupted pathways in lung AC of CS and NS. Pathway analyses were run on each individual tumour and the results compared between CS and NS. A comparison of pathways with disruption frequencies exceeding 15% in either group is shown. Most of the pathways identified were commonly disrupted in CS and NS lung AC tumours.

6.3.6 Pathways differentially disrupted in CS and NS lung AC

Consistent with our analysis for identifying differentially disrupted genes, we performed a Fisher's Exact test to identify pathways exhibiting statistically significant preferential disruption in CS or NS. In addition to our criteria for a p-value < 0.1, we again imposed a frequency difference threshold of 15% for defining pathways as differentially disrupted. We identified 13 pathways satisfying these criteria and most were preferentially disrupted in CS (Table 6.9). Both the anandamide degradation and ephrin receptor signaling pathways specific to NS have been implicated in cancer cell proliferation and cell death 152-154. In CS, the pathways identified could be broadly classified into those that regulate or are involved with metabolism, immune response, inflammation, and specific cancer pathways. Interestingly, three CS pathways seem to be involved in NF-κB signaling: IL-17A signaling in fibroblasts, TREM1 signaling and the NF-κB pathway itself. Since the genes contributing to disruption of these pathways were largely different, overlap in the biology of these processes could suggest crosstalk between pathways occurs to promote tumourigenesis.
These findings demonstrate that, in addition to deregulating common pathways, prominently disrupted genes contribute to differential pathway deregulation, further suggesting different mechanisms of tumourigenesis contribute to the development of lung AC in CS and NS.

Table 6.9 Pathways differentially disrupted in CS and NS lung AC

Columns: Pathway; Freq. in CS; Freq. in NS; Freq. Diff. (CS-NS); P-val; Group

Anandamide Degradation                                         0.00   0.17  -0.17   0.0187   NS
Ephrin Receptor Signaling                                      0.06   0.23  -0.17   0.0709   NS
Crosstalk between Dendritic Cells and Natural Killer Cells     0.24   0.00   0.24   0.0054   CS
Extrinsic Prothrombin Activation Pathway                       0.41   0.10   0.31   0.0095   CS
Bile Acid Biosynthesis, Neutral Pathway                        0.24   0.03   0.20   0.0294   CS
NF-κB Signaling                                                0.24   0.03   0.20   0.0294   CS
Atherosclerosis Signaling                                      0.88   0.63   0.25   0.0363   CS
Communication between Innate and Adaptive Immune Cells         0.32   0.10   0.22   0.0378   CS
IL-12 Signaling and Production in Macrophages                  0.21   0.03   0.17   0.0575   CS
Reelin Signaling in Neurons                                    0.21   0.03   0.17   0.0575   CS
TREM1 Signaling                                                0.29   0.10   0.19   0.0676   CS
IL-17A Signaling in Fibroblasts                                0.32   0.13   0.19   0.0855   CS
Leukotriene Biosynthesis                                       0.38   0.17   0.22   0.0934   CS

6.3.7 Dissection of frequently deregulated pathways

We next wondered if pathways displaying similar disruption frequencies in CS and NS tumours might be altered through disruption of different components. Although we did not notice any pathways with strikingly different patterns of component alteration, we identified two commonly deregulated pathways, axonal guidance signaling and xenobiotic metabolism, that were affected by multiple differentially disrupted genes. These pathways were also affected by genes that exhibited common disruption in both CS and NS, albeit in only a few individual tumours.
Differentially disrupted components of the axonal guidance pathway, which has recently been implicated in cancer cell growth and survival, included upregulation of TUBB3 and MMP13 in CS and downregulation of TUBB6 in NS 155,156. In the xenobiotic metabolism pathway, CS displayed downregulation of ALDH1A1 while NS exhibited upregulation of SULT1C2, CYP3A5, and ABCC3. Thus, disruption of some pathway components in a smoking specific manner is evident, which could have therapeutic implications.

We were also interested in analyzing differences in pathway component disruption on an individual tumour level, as tumours disrupted at distinct points in prominently disrupted pathways could potentially respond differently to therapeutics targeting those pathways; for example, this is known to be true for the EGFR-RAS signaling pathway. One of the most commonly disrupted pathways we identified was the Wnt/β-catenin pathway, which drives cellular proliferation and survival 98,99. Given that the Wnt pathway is so frequently altered in cancer and that it drives several hallmarks of malignancy, numerous inhibitors have been designed to inhibit the pathway and some are currently in pre-clinical trials 98. In our cohort, we found that downregulation of SOX17, SOX7, WNT3A, SFRP1, and WIF1 and upregulation of CDH3 were the most frequent Wnt components disrupted. These components are located at the top of the signaling cascade at the level of receptor ligand binding (WNT3A, SFRP1, WIF1, CDH3), and at the bottom of the cascade at the level of transcription (SOX17, SOX7). Thus, inhibitors targeting these intervention points may be best suited for use in lung AC. However, despite the high disruption frequency of these particular genes, we observed considerable heterogeneity in Wnt pathway component disruption in our lung tumours. For instance, we identified two tumours with very different patterns of Wnt disruption, as indicated in Figure 6.6.
Thus, for Wnt inhibitors to have a chance at being therapeutically efficacious in lung cancer patients, it may be important to discern the mechanisms driving Wnt disruption in individual tumours to inform the selection of specific Wnt pathway inhibitors.

6.3.8 Benefit of the multi-dimensional, individual tumour approach

As previously discussed, using a multi-'omics data analysis approach can provide much more insight into tumour biology than a uni-dimensional analysis alone because it reveals a greater number of disrupted genes and pathways 47,48. This has been demonstrated previously and our integrative analysis results corroborate this concept. Using our multi-dimensional approach, we have identified several genes, and consequently, pathways that would have been otherwise overlooked had we performed a grouped tumour versus non-malignant analysis as opposed to an individual tumour analysis, and had we not profiled multiple genomic and epigenomic dimensions to consider different mechanisms of gene disruption (Figure 6.1). For example, some of the most frequently disrupted genes we identified in Table 6.6 were disrupted primarily through one type of DNA alteration, including: CDO1 (hypermethylated in 76% and copy loss in 16%) and SOX17 (hypermethylated in 87% and copy loss in 14%), which were predominantly disrupted through hypermethylation; and ETV4 (copy gain in 38% and hypomethylation in 3%) and EEF1A2 (copy gain in 53% and hypomethylation in 1%), which were disrupted mainly by copy gains. Conversely, MMP11 exhibited moderate frequencies of upregulation through both hypomethylation (21%) and copy gain (26%) but a much higher cumulative disruption frequency (59%) when both of these mechanisms and gene expression were considered simultaneously. These genes would have been overlooked had we investigated CNAs alone or aberrant DNA methylation alone, and if we had not assessed these alterations simultaneously within individual tumours.
Consequently, the pathways these genes are involved in may also have been overlooked; this is especially evident for the taurine biosynthesis pathway, which was deregulated through CDO1 downregulation in every tumour for which this pathway was significant.

The heterogeneity we observed in pathway disruption, both in the components disrupted and the mechanisms of component disruption, further illustrates the importance of performing individual tumour analysis. For instance, 52 different components of the Wnt/β-catenin signaling pathway were disrupted in our cohort and these 52 genes were disrupted through a variety of different mechanisms in individual tumours (e.g. CNAs, aberrant methylation, or gene expression changes). MITRA prioritized 37 of these 52 genes in less than 20% of tumours, and had we only considered a single 'omics dimension, the frequency of disruption of these genes would have been even lower. If we had employed a group frequency based approach, most of these genes would not have passed the frequency criteria for pathway analysis input and thus, we may have overlooked the prominence of Wnt/β-catenin pathway disruption in our lung AC samples.

Our consideration of multiple 'omic dimensions simultaneously in individual tumours has enabled us to identify genes disrupted primarily by one type of genetic or epigenetic alteration and genes recurrently disrupted but at low, insignificant frequencies by any one mechanism. We performed our pathway analysis on individual tumours because we appreciate the heterogeneity that exists in pathway disruption in different tumours. Assessing the frequency of pathway deregulation based on disruption in individual tumours revealed pathways frequently deregulated through different components in different tumours and through different genetic or epigenetic mechanisms. Focusing on pathway deregulation as a whole (i.e.
considering multiple genes/components and their consequential effects on pathway functionality), as opposed to a gene-centric approach, offers an alternative strategy for discovering new therapeutic intervention points that may have a broader application to lung AC patients. For example, our analysis revealed pathways with disruption frequencies (> 15% of tumours) greater than those of many of the rare tumour mutations that are therapeutically targeted in the 3-5% of patients that harbour them (e.g. ROS and RET fusions, BRAF, HER2, and PIK3CA mutations). Thus, by using an alternative method to the conventional grouped comparison strategy, which only reveals genes with the most prominent disruption frequencies, we were able to discover more molecular alterations and gain insight into their potential biological implications in individual tumours and across our lung AC cohort.  Figure 6.6 Wnt/β-catenin signaling can be disrupted through distinct mechanisms in individual tumours. Disruption of the Wnt pathway is illustrated for two individual tumours, one CS and one NS. The NS tumour is affected primarily through CDH disruption and disruption of cytoplasmic components of the pathway such as SRC and DSH. The CS tumour displays prominent disruption at the level of receptor ligand binding, involving inactivation of multiple inhibitors including DKK, WIF, and SFRP. Distinct mechanisms of pathway disruption are evident in these tumours, and this heterogeneity could be clinically important for informing Wnt inhibitor selection.   6.4 Discussion  6.4.1 Summary of findings  We developed a novel algorithm, called MITRA, to integrate multi-dimensional 'omics data for the analysis of tumour genomes. The purpose of MITRA was to rank genes based on the magnitude of their disruption to prioritize gene candidates likely to be causally involved in tumourigenesis. 
We demonstrated the ability of MITRA to identify well established genes in a multi-'omics breast cancer dataset, and based on this validation of its utility, we applied MITRA to our lung AC cohort. Consistent with the findings described in previous chapters, we observed genes and pathways commonly and differentially disrupted in CS and NS, suggesting lung AC develops through common and different genetic and epigenetic alterations and consequential pathway deregulation that promote lung tumourigenesis. Our results demonstrate the knowledge that can be obtained by employing an individual tumour, systems based analysis strategy.   6.4.2 Utility of MITRA and potential for adaptation and improvement  We developed MITRA, an algorithm designed to integrate multiple 'omics data types and to facilitate the identification of biologically relevant genes in cancer by ranking them based on the magnitude of DNA and RNA level alterations they exhibit. MITRA has several  attractive features including: ease of use, adaptability to different data types, the option for users to define their own scoring criteria, amenability for adding or removing different data dimensions, and the ability to analyze individual tumours as opposed to commonly applied grouped approaches. When applied to a multi-'omics breast cancer cohort from the TCGA, MITRA identified known breast cancer genes and genes with functions likely to contribute to tumourigenesis but not previously described in breast cancer, demonstrating its ability and potential to make biologically relevant findings.  We chose a scoring system because scoring effectively ranks genes based on their extent of disruption, which fit our hypotheses. Prioritized gene lists generated based on MITRA scores can serve as input for downstream applications. 
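The idea of ranking genes by magnitude of disruption within a single tumour can be illustrated with a minimal sketch. This is not the MITRA scoring scheme itself: the additive score, the function names `score_gene` and `rank_genes`, the gene labels and all values are hypothetical and chosen only to show how DNA level and expression level alterations for one tumour might be combined into a ranked list.

```python
from typing import Dict, List, Tuple

# Hypothetical per-gene measurements for ONE tumour, each relative to its
# matched non-malignant tissue. Values are illustrative, not study data.
# copy: log2 copy number ratio; meth: change in methylation;
# expr: log2 expression fold change.
Alterations = Dict[str, Dict[str, float]]


def score_gene(copy: float, meth: float, expr: float) -> float:
    """Assumed additive score: a DNA level change contributes only when its
    direction agrees with the expression change (gain or hypomethylation
    with upregulation; loss or hypermethylation with downregulation)."""
    if expr == 0:
        return 0.0
    direction = 1.0 if expr > 0 else -1.0
    score = abs(expr)
    if copy * direction > 0:       # copy change matches expression direction
        score += abs(copy)
    if -meth * direction > 0:      # methylation change opposes expression
        score += abs(meth)
    return score


def rank_genes(tumour: Alterations) -> List[Tuple[str, float]]:
    """Return (gene, score) pairs sorted from most to least disrupted."""
    scored = [(gene, score_gene(v["copy"], v["meth"], v["expr"]))
              for gene, v in tumour.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


example_tumour: Alterations = {
    "GENE_A": {"copy": 0.0, "meth": 0.4, "expr": -2.0},  # hypermethylated, down
    "GENE_B": {"copy": 0.8, "meth": 0.0, "expr": 1.5},   # copy gain, up
    "GENE_C": {"copy": 0.8, "meth": 0.0, "expr": -0.2},  # gain, discordant
}
ranked = rank_genes(example_tumour)
```

In this toy input the discordant copy gain in `GENE_C` adds nothing to its score, so it ranks below the two genes whose DNA level and expression changes agree.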
For example, this approach can be a highly informative method for pathway analysis, as opposed to traditional pathway enrichment strategies which rely on input from a single dimension (usually gene expression fold change). Using the top ranked genes from MITRA, we performed pathway analyses on all 64 lung AC tumours and revealed numerous pathways that are involved with tumour biology.  It is likely that genes are disrupted through mechanisms we did not incorporate into the MITRA algorithm, such as mutation or miRNA alterations, which are highly relevant tumourigenic events. As these data types become increasingly available and better understood, it will be possible to incorporate them. Some important factors must be considered when integrating additional data dimensions into MITRA. miRNAs, for instance, have many computationally predicted mRNA targets, the vast majority of which remain to be experimentally proven. Additionally, disruption of a single miRNA may affect as many as 200 mRNA targets, further complicating the interpretation of miRNA deregulation. For mutation data, mutation type and predicted consequences on gene function and cell biology require interpretation. One possibility would be to first apply a mutation prediction algorithm to score mutations prior to input into MITRA. The complexity and varied biological consequences of many genetic and epigenetic alterations suggest incorporation of such data types at present would be largely based on speculation or computational predictions which may or may not be biologically relevant. Thus, for the purpose of introducing MITRA, we  restricted the incorporated data types to those with widely characterized and generally accepted biological effects.  6.4.3 Disrupted genes and pathways in the context of lung tumour biology  Our MITRA analysis of lung AC revealed numerous genes and pathways prominently disrupted in both CS and NS or in one group or the other. 
The genes and pathways we identified included previously described oncogenes, TSGs, and lung cancer biomarkers. Moreover, many of the genes and pathways were involved in processes associated with the hallmarks of cancer, suggesting the gene disruption MITRA prioritized is likely relevant to tumour biology. This is consistent with the hypothesis that all causal genes in cancer can be categorized into a finite number of cancer pathways controlling cell fate, survival and genomic integrity 33. The most prominent pathway described in lung AC is the EGFR/RAS/PI3K pathway which drives the transcription of genes involved in proliferation, invasion, angiogenesis, metastasis, and resistance to apoptosis 37. Our pathway analyses revealed this pathway was affected in 3% of the tumours we analyzed, which was not surprising given that critical components of the pathway, EGFR, KRAS, and PIK3CA, are predominantly disrupted through point mutations which we did not incorporate into our MITRA analysis. However, we know that 62% of the CS and 57% of the NS in our study harboured KRAS and EGFR mutations, respectively, providing evidence that this pathway is prominently disrupted in our cohort.   Several pathways that we identified, especially those strongly associated with CS, were involved with modulating immune and inflammatory responses, processes which have recently become hot topics in lung cancer research 150,157-159. The fact that such pathways are prominently disrupted in CS likely reflects the stress imposed on lung epithelial cells by tobacco smoke 157. Lung tumours are known to manipulate the microenvironment through secretion of factors that promote suppression of tumour killing immune cells and recruitment of immunosuppressive cells 158,159. 
The disrupted genes and pathways we identified are consistent with this phenotype; for example, we observed deregulation of numerous genes encoding chemokines and cytokine receptors including interleukins that promote tumour cell  survival 160, and eicosanoid related pathways that produce prostaglandins and leukotrienes that generate a favourable tumour environment 161. Thus, our results are consistent with previous findings in lung cancer. Encouragingly, based on the realization that these processes are integral to lung tumourigenesis, chemoprevention strategies to temper inflammation and immunotherapies have been proposed for the treatment of high risk individuals and patients with lung cancer 158,160,162. Although we did not validate the disruption of these genes and pathways biologically, our results suggest they could be promising targets to investigate in future studies.  Surprisingly, we observed a number of pathways that did not seem relevant to lung epithelial cells, such as bile acid biosynthesis, reelin signaling in neurons, leukocyte extravasation, granulocyte adhesion and diapedesis and the role of osteoblasts, osteoclasts and chondrocytes in rheumatoid arthritis. Since these pathways seemed to be specific to non-lung cells, we examined the genes contributing to their disruption. This revealed numerous genes with roles in cancer associated pathways, erasing any doubts about the relevance of disruption of these processes in the lung tumours we studied. 
For instance, i) the bile acid biosynthesis pathway was disrupted through aldo-keto reductase genes and some of these genes act as prostaglandins to promote tumours 163; ii) reelin signaling has been implicated in breast and esophageal cancer migration and metastasis demonstrating it is not limited to neurons 164,165, and iii) the rheumatoid arthritis related pathway we identified comprised numerous Wnt pathway components, which is not unforeseen given the role of Wnt signaling in bone regeneration and the fact that we also identified Wnt/β-catenin signaling as frequently disrupted. The IPA database contains hundreds of annotated pathways, which contain hundreds of genes themselves; thus, since individual genes may be involved in multiple pathways, it is not unexpected that pathways with seemingly disparate cellular functions or contexts are deemed significant when only one seems relevant to a particular cancer phenotype. Furthermore, tumours may activate cellular processes characteristic of different cell types to enable a specific phenotype. For example, to acquire a cellular morphology capable of facilitating invasion, migration or metastasis, tumour cells may activate or inactivate genes involved in cell adhesion and extracellular matrix remodelling. In this  context, disruption of pathways such as leukocyte extravasation and granulocyte adhesion and diapedesis makes sense.  6.4.4 Challenge of validation in public datasets  To date, there are few lung cohorts with multi-'omics data for patient matched tumour and non-malignant lung tissues. Although the TCGA has generated hundreds of 'omics profiles, the majority are for tumour samples, and of the small number of profiles available for non-malignant lung specimens, very few are from patients with matching tumour data. 
This is not a concern for the TCGA given that they perform analyses using a two group approach (all tumours versus all normals); however, our strategy of treating each tumour as its own system requires patient matched non-malignant profiles for defining tumour acquired alterations and to account for individual variation in order to delineate the specific mechanisms of tumourigenesis in individual lung AC cases. Another drawback of publicly available 'omics datasets is the limited availability of clinical annotation for the specimens studied. Although the TCGA provides this data, few of the hundreds of datasets deposited in the public domain have this information. Since our work revolves around elucidating differences in CS and NS lung cancer, smoking history information is critical for validating our findings. As a result of these limitations, we were unable to investigate the genes and pathways we identified in external cohorts using the strategy we developed. Nevertheless, since the genes we described as differentially disrupted were prioritized based on DNA level alterations associated with consequential expression changes, we looked at mRNA expression data in independent cohorts. In total, 45/115 (39%) genes displayed significant and concordantly different expression changes between CS and NS in one of two external cohorts. Thus, over one third of the genes we identified were corroborated as differentially disrupted in independent datasets at the expression level. However, it is likely that a greater number of genes would have validated if we had been able to consider DNA alterations in these genes simultaneously, using a similar approach to the one we employed to discover differentially disrupted genes.     6.4.5 Potential clinical implications of tumour system based 'omics analyses  Over the past few years, clinical oncologists have been progressing towards personalizing cancer treatment. 
The idea is to screen patient tumours for specific, druggable driver mutations so that appropriate targeted therapies may be used for treatment in hopes of achieving improved patient prognosis compared to standard chemotherapy 12,13. For example, the Norwegian Cancer Genomics Consortium has initiated a project to incorporate genome sequencing into standard clinical practice for cancer patients 166. This emphasizes the acknowledgement that studying individual tumours to understand patient specific mechanisms of tumourigenesis is critical for rational therapeutic decision making and improving patient outcome. We designed the MITRA algorithm with a tumour system based 'omics approach in mind. Our results indicated heterogeneity in gene and pathway disruption in CS and NS lung tumour groups, and across individual tumours. This provides a strong rationale for performing individualized tumour system analyses, as grouped approaches cannot identify genes that are not recurrently disrupted within a group.  By examining different mechanisms of gene disruption simultaneously on a patient by patient basis, we identified numerous genes with very high frequencies of disruption that were not previously implicated in lung cancer, suggesting these genes have been overlooked in previous studies because only one or two 'omics dimensions were assessed. In a clinical setting, failure to detect disruption of a gene that promotes tumour progression could dramatically alter patient outcome. Therefore, it is essential to assess multiple mechanisms of gene and pathway disruption to accurately determine whether they are deregulated or not. Furthermore, determining the location of component disruption within a pathway could be beneficial for informing how and where to target a given pathway.  
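The gain in sensitivity from assessing several mechanisms together can be illustrated with a small sketch. The tumour identifiers, mechanism labels and calls below are invented for illustration and are not study data, and the function `disruption_frequencies` is hypothetical. It counts a gene as disrupted in a tumour if at least one assessed mechanism affects it, so the cumulative frequency can never be lower than the frequency for any single mechanism.

```python
from typing import Dict, Set

# Hypothetical per-tumour calls for a single gene: the set of mechanisms
# by which the gene is disrupted in each tumour (empty set = not disrupted).
calls: Dict[str, Set[str]] = {
    "T1": {"copy_gain"},
    "T2": {"hypomethylation"},
    "T3": {"copy_gain", "hypomethylation"},
    "T4": set(),
    "T5": {"expression_only"},
}


def disruption_frequencies(per_tumour: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of tumours disrupted by each mechanism, plus the cumulative
    fraction disrupted by at least one mechanism ("any_mechanism")."""
    n = len(per_tumour)
    if n == 0:
        raise ValueError("at least one tumour is required")
    mechanisms = set().union(*per_tumour.values())
    freqs = {m: sum(m in s for s in per_tumour.values()) / n
             for m in mechanisms}
    freqs["any_mechanism"] = sum(bool(s) for s in per_tumour.values()) / n
    return freqs


freqs = disruption_frequencies(calls)
```

In this toy cohort each single mechanism affects at most 40% of tumours, whereas 80% carry the gene disrupted by some mechanism, which is the pattern a single-dimension analysis would understate.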
Had we not performed multi-'omic, individual tumour profiling, we may have overlooked the Wnt/β-catenin pathway as it was disrupted through numerous components at low frequencies and through several different mechanisms in different tumours. The Wnt pathway has become a strong candidate for therapeutic intervention in lung and other cancers given the prominence of its activation in solid tumours, and this has spurred the development and  identification of many Wnt inhibitors 98,99. These inhibitors target various levels of the Wnt/β-catenin signaling cascade including receptor ligand binding at the cell membrane, β-catenin complex formation in the cytoplasm, and transcriptional complex function in the nucleus. Based on the heterogeneous molecular disruption we identified in Wnt pathway components, it is likely that different tumours would respond differently to inhibitors targeting different aspects of the pathway. Thus, if currently available Wnt inhibitors were approved for clinical use in lung AC patients in the future, it would be imperative to understand the complement of component disruption contributing to Wnt pathway deregulation in order to select the most beneficial inhibition strategy. Although our lung AC analysis was small scale and the genes and pathways we identified require validation in larger cohorts using a similar approach, the method we have developed and the principles we have demonstrated suggest our individualized, tumour system multi-'omics strategy could provide clinically valuable information to inform patient management if integrated into clinical practice.  7    Chapter: Conclusions  7.1 Summary of study and findings  Lung cancer kills nearly 1.5 million people worldwide each year, emphasizing the fact that novel diagnostic and therapeutic strategies are desperately needed to improve patient outcome 1. 
In order to improve our ability to prevent, treat, and manage this disease, an improved understanding of how molecular alterations and their consequential effects on cellular pathway disruption function to promote tumour development and progression is of utmost importance.  Given the enormous efforts put forth to promote smoking cessation and prevention initiatives, in the next few decades NS (and FS) will constitute a larger proportion of lung cancer patients 15. It is a well established concept that lung tumours in CS and NS are distinct disease entities based on the disparate clinical phenotypes they display 15,41,43,44. At the DNA level, clinically relevant molecular differences discovered to date are gene-specific and cannot account for all of the clinical differences exhibited by CS and NS lung cancer patients 15,41,44. A recent study showed that NS have more mtDNA mutations and mtDNA content than smokers 46; and most recently, genome sequencing studies have identified differences in the mutational spectra of CS and NS lung tumours 23,26,29. Although these new findings suggest the molecular distinction between CS and NS tumours extends beyond EGFR and KRAS mutations and ALK fusions, the driver alterations for up to 50% of cases remain to be elucidated which highlights the need to reveal novel molecular mechanisms of tumourigenesis. The goal of this work was to characterize global molecular differences across multiple genomic and epigenomic dimensions in lung ACs from CS and NS, and to identify prominently disrupted genes and pathways that may represent novel therapeutic targets. During this process, we also developed a multi-'omics, integrative approach to analyze tumours as individual systems in order to facilitate the discovery of molecular mechanisms likely contributing to tumour biology in individual tumours.   In Chapters 3, 4 and 5, genome wide DNA copy number, methylation, and miRNA profiles were examined in lung AC and compared between CS and NS. 
Considering global characteristics, we found that NS tumour genomes had a greater extent of copy number alterations than CS, while CS had a greater extent of DNA methylation changes and aberrant miRNA expression. Considering alterations to specific loci and miRNA, we noted both commonalities and differences. The prominently disrupted genes and miRNA we identified were involved in a variety of cellular processes that have been implicated in cancer in general and in lung cancer specifically, indicating their importance to tumour biology.  In Chapter 6, we aimed to develop a method to enable integration of the multi-'omics data we generated. We hypothesized that considering multiple mechanisms of gene disruption simultaneously for individual tumours would be more informative and would enhance our ability to detect gene and pathway deregulation in lung AC. We designed an algorithm called MITRA to rank genes based on the magnitude of disruption they experience in individual tumours, reasoning that genes exemplifying high level DNA and consequential gene expression changes are more likely to be biologically relevant and to have roles in tumourigenesis. Consistent with our findings in Chapters 3-5, application of MITRA to CS and NS tumours revealed numerous prominently disrupted genes, some of which were preferentially disrupted in one smoking group or the other. Pathway analyses on the most highly ranked genes in each individual tumour identified several cellular processes affected. These pathways, and the genes contributing to their disruption, were involved in numerous hallmarks of cancer including: proliferation, cell death, cell survival, metabolism, inflammation and immune evasion 36.  7.2 Conclusions regarding the study hypotheses  The research question underlying this thesis was, do lung tumours of CS and NS exhibit differential patterns of genome wide molecular alteration? 
We hypothesized that lung cancers in CS and NS arise through different molecular mechanisms, and accordingly, that they would display disparate patterns of gene and pathway deregulation. Furthermore, we hypothesized that performing multi-'omics, integrative analyses on individual tumours would  facilitate the discovery of molecular alterations promoting tumourigenesis in these two groups. The results of this study support these hypotheses. Our characterization of multiple 'omics dimensions revealed that, in addition to exhibiting similarities, CS and NS lung AC displayed numerous differences in the molecular alterations they undergo, both at the level of individual gene disruption and the global extent of disruption. Our finding of differences in copy number, methylation and miRNA expression alterations is consistent with previous reports of differing genome wide mRNA expression profiles of lung AC from CS and NS 127,167,168. Therefore, our results taken together with those of previous studies provide multifaceted evidence that lung AC of CS and NS exhibit genome wide differences in the molecular alterations they sustain. We believe this indicates that lung tumours in these groups develop and progress, at least in part, through the differential selection of genetic alterations that drive oncogenic processes specific to CS and NS tumourigenesis. It is possible that these differences could underlie the different clinical features associated with the smoking histories of lung cancer patients. Of course, we do not overlook the fact that many genes and pathways we identified were commonly disrupted in CS and NS, which suggests that disruption of some critical cellular processes is required for lung tumourigenesis, regardless of smoking history.   7.3 Strengths and limitations of this study  There were a number of strengths and limitations associated with this study. 
To our knowledge, our study represents the first multi-'omics, integrative analysis of lung AC, and the most comprehensive comparison of molecular alteration patterns in CS and NS lung cancers to date. Having patient matched, adjacent non-malignant samples to use as a baseline for defining tumour alterations is a strength of our study, since this method accounts for individual variability, although we do not overlook the fact that this tissue is obtained from a site located within the tumour field and could harbour DNA and RNA changes itself which could potentially influence our results. Another strength is our assessment of multiple mechanisms of gene disruption simultaneously, as this is a novel strategy for identifying complementary alterations that disrupt tumour promoting genes and pathways. Our unique method of analyzing tumours individually, treating each as its own system comprised of  various molecular alterations, enabled the identification of candidate cellular pathways and mechanisms driving tumourigenesis in individual patients, which is important for personalizing medicine.   In terms of limitations, our lung AC cohort was relatively small in size. In light of this, validation of our findings in larger, independent cohorts is necessary. Unfortunately, there are no publicly available cohorts with multi-'omics profiles for patient matched tumour and non-malignant samples, precluding us from validating our findings using the same approach we applied to our cohort. Furthermore, of the uni-dimensional datasets available in the public domain, very few have smoking history annotation. Thus, we may need to develop a novel approach for validating our findings by integrating uni-dimensional 'omics datasets from different cohorts and using tumour and non-malignant profiles from different patients. Another limitation of our study is the potential for mutation and ethnic status to confound our results, as described in Chapter 3. 
Since EGFR mutation and Asian ethnicity are strongly associated with NS, and KRAS mutation with CS, it is difficult to attribute the differences we identified solely to smoking history. For this reason, we performed multivariate analyses to demonstrate that smoking was the variable most strongly associated with the findings we observed. A cohort ten times the size of the cohort we studied would be required to fully decipher the effects of these variables on the differences we observed. Another challenge associated with this study is the interpretation of pathway analysis results. We identified pathways with significantly different frequencies of disruption in CS and NS but we were cautious not to make claims about pathway activation or inactivation. For most pathways to be identified as statistically significant in IPA, multiple components of the pathway must be disrupted. Thus, for pathways affected by numerous genes, it is difficult to hypothesize the cumulative biological effect of concerted up- and downregulation of various pathway components disrupted, especially for complex pathways comprised of dozens to hundreds of components.      7.4 Overall significance and clinical implications of research findings  Our finding of molecular differences in CS and NS lung tumours provides a rationale for the stratification of patients based on smoking status in future studies, which will facilitate the discovery of additional differences between CS and NS lung cancer. Such prospective findings will have significant implications and may lead to the development of clinical tools that could be utilized to improve the prognosis of both CS and NS patients. Recent lung cancer research has demonstrated the clinical importance of classifying lung tumours based on histology and molecular pathology, as different histological or molecular types of lung cancer have different clinical phenotypes such as prognosis, aggressiveness, and response to treatment 5,169. 
We propose that pending further validation of our findings, CS and NS lung cancer patients would likely benefit from separate management strategies tailored to the distinct features their tumours exhibit.  We have also demonstrated a unique strategy involving multi-'omics integrative analyses of individual tumour genomes. This method identified several prominently disrupted genes that are strong candidates for causal involvement in lung tumourigenesis. We described the importance of assaying multiple mechanisms of disruption simultaneously to accurately determine the alteration status of a gene. It may be important to examine multiple modes of disruption for well-known lung cancer genes in the clinic to ensure that aberrant gene activation or inactivation is not overlooked, and that treatment decisions are made based on accurate disruption status. Our analyses identified numerous genes not previously characterized with respect to lung cancer, which could be candidates for lung cancer biomarkers considering the high disruption frequencies they displayed.   Lastly, we showed the utility of our analysis method for dissecting disruption of cellular pathways. Given that our multi-'omics integrative strategy is capable of detecting more disrupted genes because it considers multiple mechanisms of disruption, it has greater sensitivity to identify disrupted pathways 47. We demonstrated the heterogeneity that individual tumours can display at the pathway level, using the Wnt/β-catenin pathway as an example, and suggested that understanding the location of pathway disruption may be  important for informing treatment decisions. The EGFR/RAS/PI3K signaling pathway is a lung cancer pathway for which this concept is already known to be important. EGFR and KRAS are both frequently activated in lung AC; however, mutations in these genes are mutually exclusive and their deregulation disrupts different levels of the signaling pathway. 
Patients harbouring EGFR mutations respond to EGFR targeting tyrosine kinase inhibitors and therapeutic antibodies, but KRAS mutant patients respond adversely to these therapies 170. Therefore, understanding the component disruption that contributes to pathway deregulation is critical for selecting therapies. Furthermore, since some of the pathways we identified were so frequently disrupted, we suggest that more emphasis should be placed on identifying pathway level inhibitors in addition to the current paradigm which focuses on targeting specific tumour mutations and gene fusions. Such pathway targeting therapies could potentially benefit a larger fraction of lung AC patients relative to those targeting some of the rare mutations in lung tumours.  7.5 Future research directions  There are several avenues of research that should be pursued to further evaluate the significance of the discoveries made in this study and to address additional questions arising from the results of this work. These include: i) external cohort validation of disrupted genes identified, ii) biological validation of the candidate genes identified, iii) assessment of associations between gene disruption and clinical phenotypes, iv) characterizing the heterogeneity in gene and pathway disruption within CS and NS tumour groups, and v) assessing the therapeutic potential of prominently disrupted pathways discovered. First and foremost, the molecular alterations that we identified as prominently disrupted across the entire lung AC cohort, and those that were specific to CS or NS should be robustly validated in prospective, independent cohorts as they become available. This is especially true for the genes we identified that do not have well characterized roles in tumourigenesis based on the literature, as such novel genes could be promising targets to follow up and characterize in the context of lung cancer.    
Another future direction is the biological and functional characterization of the many genes and pathways we identified to confirm that they are in fact involved in lung tumourigenesis. If disruption of a gene contributes to tumour biology, it should provide a selective advantage that cancer cells exploit to promote growth and proliferation. Consequently, manipulation of candidate genes in in vitro and in vivo models should alter the tumourigenic properties of cancer cells which can be measured and quantified using functional assays that assess hallmark properties of cancer cells. Numerous lung cancer cell lines have been developed and well characterized, providing useful tools for this purpose 171. Importantly, EGFR and KRAS mutant cell lines are available, providing models for NS and CS lung tumour biology, respectively. Conversely, the ability of a gene to transform non-malignant cells can also be assessed in lung epithelial cell models 172. Pending biological validation in vitro, candidate genes can then be assessed for their roles in tumourigenesis in animal models. Several mouse strains engineered to express or repress lung cancer genes such as EGFR, KRAS, ROS1, ALK, PTEN, and P53 are available for assessing candidate gene manipulation in specific genetic contexts 173-175.  In addition to biological and functional characterization, candidate genes can also be investigated for clinical significance. For instance, genes should be assessed in well annotated clinical cohorts to determine whether their expression holds any prognostic or predictive value. Genes associated with tumour invasiveness, relapse, or therapeutic efficiency may represent clinically useful biomarkers, strengthening the evidence supporting their involvement in tumourigenesis. If such associations are identified, they can be investigated in model systems to validate predicted involvement in tumour phenotypes like metastasis and drug response.  
Our findings suggest that the disrupted genes and pathways are heterogeneous, both across the lung AC cohort and within the CS and NS tumour groups. With the exception of five pathways that were disrupted in over 60% of both CS and NS tumour groups, many of the pathways identified were disrupted in only a small number of CS or NS tumours. If a large enough cohort could be assembled, it would be interesting to determine whether clinically relevant subgroups within CS and NS tumours exist based on gene and pathway disruption discovered through multi-'omics integration. For example, it could be hypothesized that specific subtypes of NS lung cancer exist because these tumours may be initiated by different carcinogens. While on the topic of heterogeneity, it is worth mentioning that heterogeneity is also prominent within individual tumours 176. Thus, it would be interesting to study lung AC in this context, and to compare heterogeneity in CS and NS samples and how it relates to clinical phenotypes. One way to address this would be to perform single cell analysis 177. This approach has been applied to the study of haematological malignancies, which are highly amenable to flow sorting platforms 178. Single cell analysis of a large population of tumour cells would provide insight about different tumour clones with different patterns of gene and pathway disruption. This in turn could be useful for understanding aspects of tumour biology such as therapeutic response and drug resistance, and hence, for informing therapeutic selection. However, application of this strategy to solid tumours would require development of methods to dissociate tumour cells without modifying their biological features during the process.
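As a rough illustration of the subgroup question raised above, the sketch below groups tumours by the similarity of their pathway disruption profiles. It is not the method used in this work: the tumour identifiers, the pathway labels, the Jaccard distance, the single-linkage grouping rule and the `max_distance` threshold are all assumptions chosen for illustration, and a real analysis would need a much larger cohort and a formal assessment of cluster stability.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two sets of disrupted pathways."""
    union = a | b
    if not union:
        return 0.0  # two empty profiles are treated as identical
    return 1.0 - len(a & b) / len(union)


def group_tumours(profiles, max_distance=0.5):
    """Single-linkage grouping of tumours (hypothetical helper).

    Two tumours fall in the same group when their profiles are within
    max_distance of each other, directly or through a chain of tumours.
    """
    parent = {name: name for name in profiles}

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if jaccard_distance(profiles[a], profiles[b]) <= max_distance:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(members) for members in groups.values())


# Toy data: invented tumour IDs and pathway labels, not results of this study.
profiles = {
    "NS_01": {"Wnt", "Axon guidance"},
    "NS_02": {"Wnt", "Axon guidance", "Tight junction"},
    "NS_03": {"Eicosanoid"},
    "NS_04": {"Eicosanoid", "Immune response"},
}
subgroups = group_tumours(profiles)
print(subgroups)
```

With this toy input, tumours sharing most of their disrupted pathways end up in the same group. Whether such groups would track clinical features such as carcinogen exposure or drug response is exactly the open question that a larger cohort would be needed to answer.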
Lastly, another future direction that could prove very clinically relevant is assessment of the therapeutic potential of the strongest gene and pathway candidates. Genes and pathways whose disruption is i) validated in external cohorts, ii) demonstrated to contribute to tumourigenesis, and iii) associated with clinical phenotypes such as drug response would be rational candidates to investigate therapeutically. This research would, of course, depend on drug availability. The Wnt/β-catenin pathway was prominently disrupted in our lung AC cohort, with a frequency of disruption of 26% in CS and 37% in NS. Analysis of this pathway in a larger cohort could reveal that disruption of Wnt signaling is significantly more frequent in NS. Regardless, we observed disparate patterns of Wnt pathway component deregulation in different tumours, suggesting that the sensitivity of cancer cells exhibiting Wnt activation likely depends on the location of pathway disruption and the mode of action of the inhibitor selected. Given that several Wnt inhibitors currently exist and are being evaluated in preclinical models, a comprehensive dissection of Wnt pathway disruption in a large cohort of lung AC could reveal distinct patterns of Wnt component disruption in subsets of tumours 179,180. Thus, characterization of Wnt and other disrupted pathways in individual tumours could reveal the most beneficial therapeutic intervention points, and this information could be used to select inhibitors targeting these points of the pathway.

Bibliography

1. Jemal, A. et al. Global cancer statistics. CA Cancer J Clin 61, 69-90 (2011).
2. Nana-Sinkam, S.P. & Powell, C.A. Molecular Biology of Lung Cancer: Diagnosis and Management of Lung Cancer, 3rd ed: American College of Chest Physicians Evidence-Based Clinical Practice Guidelines. Chest 143, e30S-9S (2013).
3. Herbst, R.S., Heymach, J.V. & Lippman, S.M. Lung cancer. N Engl J Med 359, 1367-80 (2008).
4. Pao, W. & Girard, N.
New driver mutations in non-small-cell lung cancer. Lancet Oncol 12, 175-80 (2011).
5. Travis, W.D., Brambilla, E. & Riely, G.J. New pathologic classification of lung cancer: relevance for clinical practice and clinical trials. J Clin Oncol 31, 992-1001 (2013).
6. Pao, W., Iafrate, A.J. & Su, Z. Genetically informed lung cancer medicine. J Pathol 223, 230-40 (2011).
7. Pao, W. & Chmielecki, J. Rational, biologically based treatment of EGFR-mutant non-small-cell lung cancer. Nat Rev Cancer 10, 760-74 (2010).
8. Gerber, D.E. & Minna, J.D. ALK inhibition for non-small cell lung cancer: from discovery to therapy in record time. Cancer Cell 18, 548-51 (2010).
9. Network, T.C.G.A. Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519-25 (2012).
10. Rudin, C.M. et al. Comprehensive genomic analysis identifies SOX2 as a frequently amplified gene in small-cell lung cancer. Nat Genet 44, 1111-6 (2012).
11. Chen, R. & Snyder, M. Systems biology: personalized medicine for the future? Curr Opin Pharmacol 12, 623-8 (2012).
12. Vucic, E.A. et al. Translating cancer 'omics' to improved outcomes. Genome Res 22, 188-95 (2012).
13. Kris, M.G.L., C. Y. et al. Initial results of LC-MAP: An institutional program to routinely profile tumor specimens for the presence of mutations in targetable pathways in all patients with lung adenocarcinoma. in 2010 ASCO Annual Meeting Vol. 28 (2010).
14. Hecht, S.S. Tobacco carcinogens, their biomarkers and tobacco-induced cancer. Nat Rev Cancer 3, 733-44 (2003).
15. Sun, S., Schiller, J.H. & Gazdar, A.F. Lung cancer in never smokers--a different disease. Nat Rev Cancer 7, 778-90 (2007).
16. Besaratinia, A. & Pfeifer, G.P. Second-hand smoke and human lung cancer. Lancet Oncol 9, 657-66 (2008).
17. Organization, W.H. Radon and Cancer. WHO handbook on indoor radon (2009).
18. Hajdu, S.I. Much overlooked causes of lung cancer. Ann Clin Lab Sci 41, 97-101 (2011).
19. Marshall, A.L. & Christiani, D.C.
Genetic susceptibility to lung cancer--light at the end of the tunnel? Carcinogenesis 34, 487-502 (2013).
20. Bailey-Wilson, J.E. et al. A major lung cancer susceptibility locus maps to chromosome 6q23-25. Am J Hum Genet 75, 460-74 (2004).
21. Broet, P. et al. Genomic profiles specific to patient ethnicity in lung adenocarcinoma. Clin Cancer Res 17, 3542-50 (2011).
22. Carvalho, R.H. et al. Genome-wide DNA methylation profiling of non-small cell lung carcinomas. Epigenetics Chromatin 5, 9 (2012).
23. Govindan, R. et al. Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 150, 1121-34 (2012).
24. Heller, G. et al. Genome-wide CpG island methylation analyses in non-small cell lung cancer patients. Carcinogenesis 34, 513-21 (2013).
25. Job, B. et al. Genomic aberrations in lung adenocarcinoma in never smokers. PLoS One 5, e15145 (2010).
26. Liu, P. et al. Identification of somatic mutations in non-small cell lung carcinomas using whole-exome sequencing. Carcinogenesis 33, 1270-6 (2012).
27. Massion, P.P. et al. Smoking-related genomic signatures in non-small cell lung cancer. Am J Respir Crit Care Med 178, 1164-72 (2008).
28. Pleasance, E.D. et al. A small-cell lung cancer genome with complex signatures of tobacco exposure. Nature 463, 184-90 (2010).
29. Seo, J.S. et al. The transcriptional landscape and mutational profile of lung adenocarcinoma. Genome Res 22, 2109-19 (2012).
30. Shedden, K. et al. Gene expression-based survival prediction in lung adenocarcinoma: a multi-site, blinded validation study. Nat Med 14, 822-7 (2008).
31. Ding, L. et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 455, 1069-75 (2008).
32. Weir, B.A. et al. Characterizing the cancer genome in lung adenocarcinoma. Nature 450, 893-8 (2007).
33. Vogelstein, B. et al. Cancer genome landscapes. Science 339, 1546-58 (2013).
34. Pao, W. & Hutchinson, K.E. Chipping away at the lung cancer genome. Nat Med 18, 349-51 (2012).
35. Toyooka, S.
et al. Molecular oncology of lung cancer. Gen Thorac Cardiovasc Surg 59, 527-37 (2011).
36. Hanahan, D. & Weinberg, R.A. Hallmarks of cancer: the next generation. Cell 144, 646-74 (2011).
37. Brambilla, E. & Gazdar, A. Pathogenesis of lung cancer signalling pathways: roadmap for therapies. Eur Respir J 33, 1485-97 (2009).
38. Ray, M.R., Jablons, D. & He, B. Lung cancer therapeutics that target signaling pathways: an update. Expert Rev Respir Med 4, 631-45 (2010).
39. Parkin, D.M., Bray, F., Ferlay, J. & Pisani, P. Global cancer statistics, 2002. CA Cancer J Clin 55, 74-108 (2005).
40. Sanchez-Cespedes, M. et al. Chromosomal alterations in lung adenocarcinoma from smokers and nonsmokers. Cancer Res 61, 1309-13 (2001).
41. Subramanian, J. & Govindan, R. Lung cancer in never smokers: a review. J Clin Oncol 25, 561-70 (2007).
42. Toyooka, S. et al. Mutational and epigenetic evidence for independent pathways for lung adenocarcinomas arising in smokers and never smokers. Cancer Res 66, 1371-5 (2006).
43. Samet, J.M. et al. Lung cancer in never smokers: clinical epidemiology and environmental risk factors. Clin Cancer Res 15, 5626-45 (2009).
44. Rudin, C.M. et al. Lung cancer in never smokers: molecular profiles and therapeutic implications. Clin Cancer Res 15, 5646-61 (2009).
45. Lee, Y.J. et al. Lung cancer in never smokers: change of a mindset in the molecular era. Lung Cancer 72, 9-15 (2011).
46. Dasgupta, S. et al. Mitochondrial DNA mutations in respiratory complex-I in never-smoker lung cancer patients contribute to lung cancer progression and associated with EGFR gene mutation. J Cell Physiol (2011).
47. Chari, R., Coe, B.P., Vucic, E.A., Lockwood, W.W. & Lam, W.L. An integrative multi-dimensional genetic and epigenetic strategy to identify aberrant genes and pathways in cancer. BMC Syst Biol 4, 67 (2010).
48. Chari, R. et al. Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer Metastasis Rev 29, 73-93 (2010).
49.
Network, T.C.G.A. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061-8 (2008).
50. Network, T.C.G.A. Integrated genomic analyses of ovarian carcinoma. Nature 474, 609-15 (2011).
51. Network, T.C.G.A. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330-7 (2012).
52. Network, T.C.G.A. Comprehensive molecular portraits of human breast tumours. Nature 490, 61-70 (2012).
53. Bibikova, M. et al. Genome-wide DNA methylation profiling using Infinium(R) assay. Epigenomics 1, 177-200 (2009).
54. Du, P., Kibbe, W.A. & Lin, S.M. lumi: a pipeline for processing Illumina microarray. Bioinformatics 24, 1547-8 (2008).
55. Du, P. et al. Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics 11, 587 (2010).
56. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-60 (2009).
57. Tang, Y.C. & Amon, A. Gene copy-number alterations: a cost-benefit analysis. Cell 152, 394-405 (2013).
58. Gordon, D.J., Resio, B. & Pellman, D. Causes and consequences of aneuploidy in cancer. Nat Rev Genet 13, 189-203 (2012).
59. Lockwood, W.W. et al. DNA amplification is a ubiquitous mechanism of oncogene activation in lung and other cancers. Oncogene 27, 4615-24 (2008).
60. Feder, M. et al. Clinical relevance of chromosome abnormalities in non-small cell lung cancer. Cancer Genet Cytogenet 102, 25-31 (1998).
61. Lockwood, W.W., Chari, R., Chi, B. & Lam, W.L. Recent advances in array comparative genomic hybridization technologies and their applications in human genetics. Eur J Hum Genet 14, 139-48 (2006).
62. Pegram, M.D., Konecny, G. & Slamon, D.J. The molecular and cellular biology of HER2/neu gene amplification/overexpression and the clinical development of herceptin (trastuzumab) therapy for breast cancer. Cancer Treat Res 103, 57-75 (2000).
63. Pollack, J.R. et al.
Microarray analysis reveals a major direct role of DNA copy number alteration in the transcriptional program of human breast tumors. Proc Natl Acad Sci U S A 99, 12963-8 (2002).
64. Slamon, D.J. & Clark, G.M. Amplification of c-erbB-2 and aggressive human breast tumors? Science 240, 1795-8 (1988).
65. Stuart, D. & Sellers, W.R. Linking somatic genetic alterations in cancer to therapeutics. Curr Opin Cell Biol 21, 304-10 (2009).
66. Albertson, D.G., Collins, C., McCormick, F. & Gray, J.W. Chromosome aberrations in solid tumors. Nat Genet 34, 369-76 (2003).
67. Lockwood, W.W. et al. Divergent genomic and epigenomic landscapes of lung cancer subtypes underscore the selection of different oncogenic pathways during tumor development. PLoS One 7, e37775 (2012).
68. Thu, K.L. et al. Lung adenocarcinoma of never smokers and smokers harbor differential regions of genetic alteration and exhibit different levels of genomic instability. PLoS One 7, e33003 (2012).
69. Rhead, B. et al. The UCSC Genome Browser database: update 2010. Nucleic Acids Res 38, D613-9 (2010).
70. Coe, B.P. et al. Differential disruption of cell cycle pathways in small cell and non-small cell lung cancer. Br J Cancer 94, 1927-35 (2006).
71. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc Natl Acad Sci U S A 104, 20007-12 (2007).
72. Chitale, D. et al. An integrated genomic analysis of lung cancer reveals loss of DUSP4 in EGFR-mutant tumors. Oncogene 28, 2773-83 (2009).
73. Coe, B.P., Chari, R., MacAulay, C. & Lam, W.L. FACADE: a fast and sensitive algorithm for the segmentation and calling of high resolution array CGH data. Nucleic Acids Res 38, e157 (2010).
74. Broet, P., Tan, P., Alifano, M., Camilleri-Broet, S. & Richardson, S. Finding exclusively deleted or amplified genomic areas in lung adenocarcinomas using a novel chromosomal pattern analysis. BMC Med Genomics 2, 43 (2009).
75. Newnham, G.M. et al.
Integrated mutation, copy number and expression profiling in resectable non-small cell lung cancer. BMC Cancer 11, 93 (2011).
76. Soh, J. et al. Oncogene mutations, copy number gains and mutant allele specific imbalance (MASI) frequently occur together in tumor cells. PLoS One 4, e7464 (2009).
77. Huang, Y.T. et al. Cigarette smoking increases copy number alterations in nonsmall-cell lung cancer. Proc Natl Acad Sci U S A 108, 16345-50 (2011).
78. Campbell, J.M. et al. Integrative genomic and gene expression analysis of chromosome 7 identified novel oncogene loci in non-small cell lung cancer. Genome 51, 1032-9 (2008).
79. Bowtell, D.D. The genesis and evolution of high-grade serous ovarian cancer. Nat Rev Cancer 10, 803-8 (2010).
80. Portela, A. & Esteller, M. Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-68 (2010).
81. Irizarry, R.A. et al. The human colon cancer methylome shows similar hypo- and hypermethylation at conserved tissue-specific CpG island shores. Nat Genet 41, 178-86 (2009).
82. Laird, P.W. The power and the promise of DNA methylation markers. Nat Rev Cancer 3, 253-66 (2003).
83. Heller, G., Zielinski, C.C. & Zochbauer-Muller, S. Lung cancer: from single-gene methylation to methylome profiling. Cancer Metastasis Rev 29, 95-107 (2010).
84. Baylin, S.B. & Jones, P.A. A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 11, 726-34 (2011).
85. Kypriotou, M., Huber, M. & Hohl, D. The human epidermal differentiation complex: cornified envelope precursors, S100 proteins and the 'fused genes' family. Exp Dermatol 21, 643-9 (2012).
86. Anglim, P.P., Alonzo, T.A. & Laird-Offringa, I.A. DNA methylation-based biomarkers for early detection of non-small cell lung cancer: an update. Mol Cancer 7, 81 (2008).
87. Selamat, S.A. et al. Genome-scale analysis of DNA methylation in lung adenocarcinoma and integration with mRNA expression. Genome Res 22, 1197-211 (2012).
88. Divine, K.K. et al.
Multiplicity of abnormal promoter methylation in lung adenocarcinomas from smokers and never smokers. Int J Cancer 114, 400-5 (2005).
89. Liu, Y., Lan, Q., Siegfried, J.M., Luketich, J.D. & Keohavong, P. Aberrant promoter methylation of p16 and MGMT genes in lung tumors from smoking and never-smoking lung cancer patients. Neoplasia 8, 46-51 (2006).
90. Scesnaite, A. et al. Similar DNA methylation pattern in lung tumours from smokers and never-smokers with second-hand tobacco smoke exposure. Mutagenesis 27, 423-9 (2012).
91. Toyooka, S. et al. Smoke exposure, histologic type and geography-related differences in the methylation profiles of non-small cell lung cancer. Int J Cancer 103, 153-60 (2003).
92. Gu, J. et al. Aberrant promoter methylation profile and association with survival in patients with non-small cell lung cancer. Clin Cancer Res 12, 7329-38 (2006).
93. Damiani, L.A. et al. Carcinogen-induced gene promoter hypermethylation is mediated by DNMT1 and causal for transformation of immortalized bronchial epithelial cells. Cancer Res 68, 9005-14 (2008).
94. Lin, R.K. et al. The tobacco-specific carcinogen NNK induces DNA methyltransferase 1 accumulation and tumor suppressor gene hypermethylation in mice and lung cancer patients. J Clin Invest 120, 521-32 (2010).
95. Shah, N. & Sukumar, S. The Hox genes and their roles in oncogenesis. Nat Rev Cancer 10, 361-71 (2010).
96. Kim, D.S. et al. Epigenetic inactivation of Homeobox A5 gene in nonsmall cell lung cancer and its relationship with clinicopathological features. Mol Carcinog 48, 1109-15 (2009).
97. Higgins, M.E., Claremont, M., Major, J.E., Sander, C. & Lash, A.E. CancerGenes: a gene selection resource for cancer genome projects. Nucleic Acids Res 35, D721-6 (2007).
98. Barker, N. & Clevers, H. Mining the Wnt pathway for cancer therapeutics. Nat Rev Drug Discov 5, 997-1014 (2006).
99. MacDonald, B.T., Tamai, K. & He, X. Wnt/beta-catenin signaling: components, mechanisms, and diseases.
Dev Cell 17, 9-26 (2009).
100. Hansen, K.D. et al. Increased methylation variation in epigenetic domains across cancer types. Nat Genet 43, 768-75 (2011).
101. Bartel, D.P. MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116, 281-97 (2004).
102. Baer, C., Claus, R. & Plass, C. Genome-wide epigenetic regulation of miRNAs in cancer. Cancer Res 73, 473-7 (2013).
103. Nana-Sinkam, S.P. & Croce, C.M. Clinical applications for microRNAs in cancer. Clin Pharmacol Ther 93, 98-104 (2013).
104. Enfield, K.S., Pikor, L.A., Martinez, V.D. & Lam, W.L. Mechanistic Roles of Noncoding RNAs in Lung Cancer Biology and Their Clinical Implications. Genet Res Int 2012, 737416 (2012).
105. De Flora, S. et al. Smoke-induced microRNA and related proteome alterations. Modulation by chemopreventive agents. Int J Cancer 131, 2763-73 (2012).
106. Sturn, A., Quackenbush, J. & Trajanoski, Z. Genesis: cluster analysis of microarray data. Bioinformatics 18, 207-8 (2002).
107. Hsu, S.D. et al. miRTarBase: a database curates experimentally validated microRNA-target interactions. Nucleic Acids Res 39, D163-9 (2011).
108. Chari, R. et al. Effect of active smoking on the human bronchial epithelium transcriptome. BMC Genomics 8, 297 (2007).
109. Bosse, Y. et al. Molecular signature of smoking in human lung tissues. Cancer Res 72, 3753-63 (2012).
110. Wang, D. et al. Human microRNA oncogenes and tumor suppressors show significantly different biological patterns: from functions to targets. PLoS One 5(2010).
111. Perdomo, C., Spira, A. & Schembri, F. MiRNAs as regulators of the response to inhaled environmental toxins and airway carcinogenesis. Mutat Res 717, 32-7 (2011).
112. Russ, R. & Slack, F.J. Cigarette-Smoke-Induced Dysregulation of MicroRNA Expression and Its Role in Lung Carcinogenesis. Pulm Med 2012, 791234 (2012).
113. Izzotti, A. et al. Downregulation of microRNA expression in the lungs of rats exposed to cigarette smoke. FASEB J 23, 806-12 (2009).
114.
Izzotti, A., Calin, G.A., Steele, V.E., Croce, C.M. & De Flora, S. Relationships of microRNA expression in mouse lung with age and exposure to cigarette smoke and light. FASEB J 23, 3243-50 (2009).
115. Izzotti, A. et al. Modulation of microRNA expression by budesonide, phenethyl isothiocyanate and cigarette smoke in mouse liver and lung. Carcinogenesis 31, 894-901 (2010).
116. Izzotti, A. et al. Dose-responsiveness and persistence of microRNA expression alterations induced by cigarette smoke in mouse lung. Mutat Res 717, 9-16 (2011).
117. Mackenzie, T.N. et al. Triptolide Induces the Expression of miR-142-3p: a Negative Regulator of Heat Shock Protein 70 and Pancreatic Cancer Cell Proliferation. Mol Cancer Ther (2013).
118. Shen, H. & Laird, P.W. Interplay between the Cancer Genome and Epigenome. Cell 153, 38-55 (2013).
119. Lv, M. et al. An oncogenic role of miR-142-3p in human T-cell acute lymphoblastic leukemia (T-ALL) by targeting glucocorticoid receptor-alpha and cAMP/PKA pathways. Leukemia 26, 769-77 (2012).
120. Kaduthanam, S. et al. Serum miR-142-3p is associated with early relapse in operable lung adenocarcinoma patients. Lung Cancer 80, 223-7 (2013).
121. Pastrello, C., Polesel, J., Della Puppa, L., Viel, A. & Maestro, R. Association between hsa-mir-146a genotype and tumor age-of-onset in BRCA1/BRCA2-negative familial breast and ovarian cancer patients. Carcinogenesis 31, 2124-6 (2010).
122. Guan, P., Yin, Z., Li, X., Wu, W. & Zhou, B. Meta-analysis of human lung cancer microRNA expression profiling studies comparing cancer tissues with normal tissues. J Exp Clin Cancer Res 31, 54 (2012).
123. Du, L. & Pertsemlidis, A. microRNAs and lung cancer: tumors and 22-mers. Cancer Metastasis Rev 29, 109-22 (2010).
124. Yi, B., Piazza, G.A., Su, X. & Xi, Y. MicroRNA and Cancer Chemoprevention. Cancer Prev Res (Phila) 6, 401-9 (2013).
125. Izzotti, A. et al. Chemoprevention of cigarette smoke-induced alterations of MicroRNA expression in rat lungs.
Cancer Prev Res (Phila) 3, 62-72 (2010).
126. Dutu, T. et al. Differential expression of biomarkers in lung adenocarcinoma: a comparative study between smokers and never-smokers. Ann Oncol 16, 1906-14 (2005).
127. Miura, K. et al. Laser capture microdissection and microarray expression analysis of lung adenocarcinoma reveals tobacco smoking- and prognosis-related molecular profiles. Cancer Res 62, 3244-50 (2002).
128. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346-52 (2012).
129. Dees, N.D. et al. MuSiC: identifying mutational significance in cancer genomes. Genome Res 22, 1589-98 (2012).
130. Gevaert, O. & Plevritis, S. Identifying master regulators of cancer and their downstream targets by integrating genomic and epigenomic features. Pac Symp Biocomput, 123-34 (2013).
131. Louhimo, R., Lepikhova, T., Monni, O. & Hautaniemi, S. Comparative analysis of algorithms for integration of copy number and expression data. Nat Methods 9, 351-5 (2012).
132. Youn, A. & Simon, R. Identifying cancer driver genes in tumor genome sequencing studies. Bioinformatics 27, 175-81 (2011).
133. Zhang, S. et al. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Res 40, 9379-91 (2012).
134. Eroles, P., Bosch, A., Perez-Fidalgo, J.A. & Lluch, A. Molecular biology in breast cancer: intrinsic subtypes and signaling pathways. Cancer Treat Rev 38, 698-707 (2012).
135. Berschneider, B. & Konigshoff, M. WNT1 inducible signaling pathway protein 1 (WISP1): a novel mediator linking development and disease. Int J Biochem Cell Biol 43, 306-9 (2011).
136. Bayram, S., Akkiz, H., Bekar, A., Akgollu, E. & Yildirim, S. The significance of Exonuclease 1 K589E polymorphism on hepatocellular carcinoma susceptibility in the Turkish population: a case-control study. Mol Biol Rep 39, 5943-51 (2012).
137. Luo, X., Hong, X.S., Xiong, X.D., Zeng, L.Q. & Lim, C.E.
A single nucleotide polymorphism in EXO1 gene is associated with cervical cancer susceptibility in Chinese patients. Int J Gynecol Cancer 22, 220-5 (2012).
138. Song, F. et al. Exonuclease 1 (EXO1) gene variation and melanoma risk. DNA Repair (Amst) 11, 304-9 (2012).
139. Tran, P.T., Erdeniz, N., Symington, L.S. & Liskay, R.M. EXO1-A multi-tasking eukaryotic nuclease. DNA Repair (Amst) 3, 1549-59 (2004).
140. Lee, M.H. & Surh, Y.J. eEF1A2 as a putative oncogene. Ann N Y Acad Sci 1171, 87-93 (2009).
141. Dhulipal, P.D. Ets oncogene family. Indian J Exp Biol 35, 315-22 (1997).
142. Futreal, P.A. et al. A census of human cancer genes. Nat Rev Cancer 4, 177-83 (2004).
143. Lee, S.Y. et al. FAM83A confers EGFR-TKI resistance in breast cancer cells and in mice. J Clin Invest 122, 3211-20 (2012).
144. Li, Y. et al. BJ-TSA-9, a novel human tumor-specific gene, has potential as a biomarker of lung cancer. Neoplasia 7, 1073-80 (2005).
145. Liu, L. et al. Detection of circulating cancer cells in lung cancer patients with a panel of marker genes. Biochem Biophys Res Commun 372, 756-60 (2008).
146. Yin, D. et al. SOX17 methylation inhibits its antagonism of Wnt signaling pathway in lung cancer. Discov Med 14, 33-40 (2012).
147. Fu, D.Y. et al. Sox17, the canonical Wnt antagonist, is epigenetically inactivated by promoter methylation in human breast cancer. Breast Cancer Res Treat 119, 601-12 (2010).
148. Zhang, W. et al. Epigenetic inactivation of the canonical Wnt antagonist SRY-box containing gene 17 in colorectal cancer. Cancer Res 68, 2764-72 (2008).
149. Brait, M. et al. Cysteine dioxygenase 1 is a tumor suppressor gene silenced by promoter methylation in multiple human cancers. PLoS One 7, e44951 (2012).
150. Vendramini-Costa, D.B. & Carvalho, J.E. Molecular link mechanisms between inflammation and cancer. Curr Pharm Des 18, 3831-52 (2012).
151. Soini, Y. Tight junctions in lung cancer and lung metastasis: a review. Int J Clin Exp Pathol 5, 126-36 (2012).
152.
Genander, M. & Frisen, J. Ephrins and Eph receptors in stem cells and cancer. Curr Opin Cell Biol 22, 611-6 (2010).
153. Patsos, H.A. et al. The endogenous cannabinoid, anandamide, induces cell death in colorectal carcinoma cells: a possible role for cyclooxygenase 2. Gut 54, 1741-50 (2005).
154. De Petrocellis, L. et al. The endogenous cannabinoid anandamide inhibits human breast cancer cell proliferation. Proc Natl Acad Sci U S A 95, 8375-80 (1998).
155. Mehlen, P., Delloye-Bourgeois, C. & Chedotal, A. Novel roles for Slits and netrins: axon guidance cues as anticancer targets? Nat Rev Cancer 11, 188-97 (2011).
156. Biankin, A.V. et al. Pancreatic cancer genomes reveal aberrations in axon guidance pathway genes. Nature 491, 399-405 (2012).
157. Milara, J. & Cortijo, J. Tobacco, inflammation, and respiratory tract cancer. Curr Pharm Des 18, 3901-38 (2012).
158. Houghton, A.M. Mechanistic links between COPD and lung cancer. Nat Rev Cancer 13, 233-45 (2013).
159. Jadus, M.R. et al. Lung cancer: a classic example of tumor escape and progression while providing opportunities for immunological intervention. Clin Dev Immunol 2012, 160724 (2012).
160. Dasanu, C.A., Sethi, N. & Ahmed, N. Immune alterations and emerging immunotherapeutic approaches in lung cancer. Expert Opin Biol Ther 12, 923-37 (2012).
161. Wang, D. & Dubois, R.N. Eicosanoids and cancer. Nat Rev Cancer 10, 181-93 (2010).
162. Thomas, A. & Hassan, R. Immunotherapies for non-small-cell lung cancer and mesothelioma. Lancet Oncol 13, e301-10 (2012).
163. Penning, T.M. & Byrns, M.C. Steroid hormone transforming aldo-keto reductases and cancer. Ann N Y Acad Sci 1155, 33-42 (2009).
164. Stein, T. et al. Loss of reelin expression in breast cancer is epigenetically controlled and associated with poor prognosis. Am J Pathol 177, 2323-33 (2010).
165. Yuan, Y., Chen, H., Ma, G., Cao, X. & Liu, Z. Reelin is involved in transforming growth factor-beta1-induced cell migration in esophageal carcinoma cells.
PLoS One 7, e31802 (2012).
166. Skotheim, R.I. et al. [Genome sequencing for personalized cancer treatment]. Tidsskr Nor Laegeforen 132, 2406-8 (2012).
167. Takeuchi, T. et al. Expression profile-defined classification of lung adenocarcinoma shows close relationship with underlying major genetic changes and clinicopathologic behaviors. J Clin Oncol 24, 1679-88 (2006).
168. Lam, D.C. et al. Establishment and expression profiling of new lung cancer cell lines from Chinese smokers and lifetime never-smokers. J Thorac Oncol 1, 932-42 (2006).
169. Levy, M.A., Lovly, C.M. & Pao, W. Translating genomic information into clinical medicine: lung cancer as a paradigm. Genome Res 22, 2101-8 (2012).
170. Eberhard, D.A. et al. Mutations in the epidermal growth factor receptor and in KRAS are predictive and prognostic indicators in patients with non-small-cell lung cancer treated with chemotherapy alone and in combination with erlotinib. J Clin Oncol 23, 5900-9 (2005).
171. Gazdar, A.F., Girard, L., Lockwood, W.W., Lam, W.L. & Minna, J.D. Lung cancer cell lines as tools for biomedical discovery and research. J Natl Cancer Inst 102, 1310-21 (2010).
172. Ramirez, R.D. et al. Immortalization of human bronchial epithelial cells in the absence of viral oncoproteins. Cancer Res 64, 9027-34 (2004).
173. Kwon, M.C. & Berns, A. Mouse models for lung cancer. Mol Oncol 7, 165-77 (2013).
174. Arai, Y. et al. Mouse model for ROS1-rearranged lung cancer. PLoS One 8, e56010 (2013).
175. Soda, M. et al. A mouse model for EML4-ALK-positive lung cancer. Proc Natl Acad Sci U S A 105, 19893-7 (2008).
176. Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N Engl J Med 366, 883-92 (2012).
177. Irish, J.M., Kotecha, N. & Nolan, G.P. Mapping normal and cancer cell signalling networks: towards single-cell proteomics. Nat Rev Cancer 6, 146-55 (2006).
178. Kornblau, S.M. et al.
Dynamic single-cell network profiles in acute myelogenous leukemia are associated with patient response to standard induction therapy. Clin Cancer Res 16, 3721-33 (2010).
179. Anastas, J.N. & Moon, R.T. WNT signalling pathways as therapeutic targets in cancer. Nat Rev Cancer 13, 11-26 (2013).
180. Gehrke, I., Gandhirajan, R.K. & Kreuzer, K.A. Targeting the WNT/beta-catenin/TCF/LEF1 axis in solid and haematological cancers: Multiplicity of therapeutic options. Eur J Cancer 45, 2759-67 (2009).

Appendix

This appendix lists all of the publications I was part of during my degree, whether published, accepted, currently in submission, or prepared for submission. Co-first authorships are underlined.

1. Thu KL, Vucic EA, Kennett JY, Heryet C, Brown CJ, Lam WL, Wilson IM (2009) Methylated DNA immunoprecipitation. Journal of Visualized Experiments 23. pii: 935.
2. Lam K, Thu K, Moore M, Gries G (2009) Bacteria on house fly eggs, Musca domestica, suppress fungal growth in chicken manure through nutrient depletion or antifungal metabolites. Naturwissenschaften. 96(9): 1127-1132.
3. Thu KL, Lam WL, Coe BP. (2009) The emerging role of copy number variation in cancer. Cell Science. 6(2): 98, 1-23.
4. Thu KL, Pikor L, Kennett J, Alvarez C, Lam WL (2010) Methylation analysis by DNA immunoprecipitation. Journal of Cellular Physiology. 222(3): 522-31.
5. Chari R, Thu KL, Wilson IM, Lockwood WW, Lonergan KM, Coe BP, Malloff CA, Gazdar AF, Lam S, Garnis C, MacAulay CE, Alvarez CE, Lam WL (2010) Integrating the multiple dimensions of genomic and epigenomic landscapes of cancer. Cancer and Metastasis Reviews. 29(1):73-93.
6. Vucic EA, Thu KL, Williams AC, Lam WL, Coe BP (2010) Copy number variations in the human genome and strategies for analysis. Methods in Molecular Biology. 628: 103-117.
7.
Lockwood WW, Chari R, Coe BP, Thu KL, Garnis C, Malloff CA, Campbell J, Williams AC, Hwang D, Buys TPH, Yee J, English JC, MacAulay C, Tsao MS, Gazdar AF, Minna JD, Lam S, Lam WL (2010) Integrative genomic analyses identify BRF2 as a novel lineage-specific oncogene in lung squamous cell carcinoma. PLoS Medicine 7(7): e1000315, 1-14.
8. Thu KL, Chari R, Lockwood WW, Lam S, Lam WL (2011) miR-101 DNA copy loss is a prominent subtype specific event in lung cancer. Journal of Thoracic Oncology. 6: 1594-1598.
9. Thu KL, Pikor LA, Chari R, Wilson IM, MacAulay CE, English JC, Tsao MS, Gazdar AF, Lam S, Lam WL, Lockwood WW (2011) Genetic disruption of KEAP1/CUL3 E3 ubiquitin ligase complex components is a key mechanism for NF-kappaB pathway activation in lung cancer. Journal of Thoracic Oncology. 6:1521-1529.
10. Vucic EA, Thu KL, Robison K, Rybaczyk LA, Chari R, Alvarez CE, Lam WL (2012) Translating cancer 'omics' to improved outcomes. Genome Research. 22:188-195.
11. Thu KL, Vucic EA, Chari R, Zhang W, Lockwood WW, English JC, Fu R, Wang P, Feng Z, MacAulay CE, Gazdar AF, Lam S, Lam WL (2012). Lung adenocarcinomas of never smokers and smokers are genomically distinct. PLoS ONE. 7(3):e33003, 1-10. This publication is included as Chapter 3 of this thesis.
12. Lockwood WW, Thu KL, Lin L, Pikor LA, Chari R, Lam WL, Beer DG (2012) Integrative genomics identified RFC3 as an amplified candidate oncogene in esophageal adenocarcinoma. Clinical Cancer Research. 18(7):1936-1946.
13. Lockwood WW, Wilson IM, Coe BP, Chari R, Pikor LA, Thu KL, Yee J, English J, Murray N, Tsao MS, Minna J, Gazdar AF, MacAulay CE, Lam S, Lam WL (2012). Divergent genomic and epigenomic landscapes of lung cancer subtypes underscore the selection of different oncogenic pathways during tumor development. PLoS ONE. 7(5):e37775, 1-18.
14.
Thu KL, Radulovich N, Becker-Santos DD, Pikor LA, Pusic A, Lockwood WW, Lam WL, Tsao MS (2013)  SOX15 is a candidate tumor suppressor in pancreatic cancer with a potential role in Wnt/?-catenin signaling.  Oncogene. 2013 Jan 14. doi: 10.1038/ onc.2012.595. [Epub ahead of print] 15. Shien K, Toyooka S, Yamamoto H, Soh J, Jida M, Thu KL, Hashida S, Maki Y, Ichihara E, Asano H, Tsukuda K, Takigawa N , Kiura K, Gazdar AF, Lam WL, Miyoshi S (2013) Acquired resistance to EGFR inhibitors is associated with manifestation of stem cell-like properties in cancer cells. Cancer Research. 73(10):3051-61. 16. Pikor LA, Thu KL, Vucic EA, Lam WL (2013). The detection and implication of genome instability in cancer.  Cancer and Metastasis Reviews. 2013 May 1. [Epub ahead of print]  14517. Hubaux R, Thu KL, Coe BP, MacAulay C, Lam S, Lam WL (2013). EZH2 promotes E2F driven SCLC tumorigenesis through modulation of apoptosis and cell cycle regulation. Journal of Thoracic Oncology. 8(8):1102-6. 18. Hubaux R, Thu KL, Lam WL (2013) Response to:  Do mutations of the enhancer of zeste homolog 2 gene exist in small-cell lung cancer? Submitted. 19. Wilson IM, Vucic EA, Enfield KSS, Zhang YA, Chari R, Thu KL, Lockwood WW, Radulovich N, Starczynowski D, Banath JP, Zhang M, Pusic A, Fuller M, Lonergan KM, Yee J, English JC, Buys TPH, Selamat SA, Laird-Offringa IA, Liu P, Anderson M, You M, Tsao MS, Brown CJ, Bennewith KL, MacAulay CE, Karson A, Gazdar AF, Lam S, Lam WL (2013) EYA4 is inactivated biallelically at a high frequency in sporadic lung cancer and is associated with familial lung cancer risk. Oncogene. In press. 20. Coe BP, Thu KL, Aviel-Ronen S, Vucic EA, Gazdar AF, Lam S, Tsao MS, Lam WL (2013). Genomic deregulation of the E2F/Rb pathway leads to activation of the oncogene EZH2 in small cell lung cancer. PLOS ONE. 8(8):e71670. 21. 
Tam KW, Zhang W, Soh J, Stastny V, Chen M, Sun H, Thu K, Rios JJ, Yang C, Marconett CN, Selamat SA, Laird-Offringa IA, Taguchi A, Hanash S, Shames D, Ma X, Zhang MQ, Lam WL, Gazdar AF, (2013) CDKN2A/p16 inactivation mechanisms and their relationship to smoke exposure and molecular features in non-small cell lung cancer. Journal of Thoracic Oncology. In press. 22. Vucic EA, Chari R, Wilson IM, Cotton AM, Thu KL, Kennett JY, Zhang M, Lonergan KM, Steiling K, Brown CJ, McWilliams A, Ohtani K, Lenburg ME, Sin DD, Spira A, MacAulay CE, Lam S, Lam WL (2013) DNA methylation is globally disrupted and associated with expression changes in COPD small airways. Submitted. 23. Pikor LA, Lockwood WW, Thu KL, Vucic EA, Chari R, Lam S, Gazdar AF, Lam WL (2013) Integrative analysis identifies YEATS4 as a novel oncogene in NSCLC that regulates the p53 pathway. Submitted. 24. Vucic EA, Thu KL, Pikor LA, Enfield KS, MacAulay CE, Jurisica I, Lam S, Lam WL (2013). Lung cancer from smoker and non-smoker patients show distinct disruption patterns of miRNA mediated gene networks. Submitted.  A version of this manuscript is included as Chapter 5 of this thesis.  14625. Mosslemi M, Thu KL, Vucic EA, Pikor LA, Ng RT, MacAulay CE, Lam WL (2013). Development of a Multi-dimensional Integrative Tumor gene Ranking Algorithm (MITRA) for the identification of candidate genes in cancer. Submitted.  Part of this manuscript is included as Chapter 6 of this thesis. 26. Becker-Santos DD, Thu KL, Pikor LA, Vucic EA, MacAulay CE, Jurisica I, Robinson WP, Lam S, Lam WL (2013). miRNA expression in human lung cancer and fetal lung: a comparative study. In preparation. 27. Poon AH, Madore AM, Vucic EA, Chouiali F, Thu KL, Lam S, Hamid Q, Lam WL, Laprise C (2013). Genetic risk factors of lung cancer contributed to asthma risk. Submitted. 28. Martinez VD, Thu KL, Vucic EA, Hubaux R, Adonis M, Gil L, MacAulay C, Lam S, Lam WL (2013). 
Whole genome sequencing analysis identifies a distinctive mutational spectrum in an arsenic-related lung tumor. Journal of Thoracic Oncology. In press. 29. Martinez VD, Vucic EA, Thu KL, Pikor LA, Lam S, Lam WL (2013). Disruption of KEAP1/CUL3/RBX1 E3-ubiquitin ligase complex components by multiple genetic mechanisms is associated with poor prognosis in head and neck cancer. Submitted. 30. Thu KL, Mosslemi M, Vucic EA, Pikor LA, Zhang W, Selamat S, Laird-Offringa I, Gazdar AF, Ng RT, English JC , Macaulay CE, Lam S, Lam WL (2013) Integrative genomic analysis reveals common and differential pathway disruption in current and never smoker lung adenocarcinoma. In preparation.  Part of this manuscript is included as Chapter 6 of this thesis. 31. Mosslemi M, Martinez VD, Thu KL, Lonergan KM, Lam WL (2013) Integrative multidimensional analysis of cancer genomes. In preparation. 32. Martinez VD, Vucic EA, Pikor LA, Thu KL, Hubaux R, Lam WL (2013) Frequent concerted genetic mechanisms disrupt multiple components of the NRF2 inhibitor KEAP1/CUL3/RBX1 E3-ubiquitin ligase complex in thyroid cancer. Submitted. 33. Martinez VD, Vucic EA, Thu KL, Pikor LA, Lam WL (2013). High frequency of DNA-level alterations affecting NRF2 inhibitory complex in ovarian cancer. Submitted.   
