UBC Theses and Dissertations
Advancing the serial analysis of gene expression technique and its application to the study of the development of squamous cell lung cancer
Zuyderduyn, Scott Dorjan
Lung cancer is one of the most common and deadliest forms of cancer. Squamous cell lung carcinomas (SCC), a common lung cancer subtype, feature a series of identifiable premalignant and early malignant forms that progress sequentially into full-blown tumours. This thesis describes a sophisticated and statistically rigorous analysis of global gene expression profiles taken from samples of several key stages of progression. This dataset was generated using serial analysis of gene expression (SAGE), a powerful transcriptome profiling technique that captures small sequence tags from each transcript in an mRNA population. These tags can then be counted and mapped back to a matching transcript sequence to quantitatively determine the expression of a given gene. The analysis identified several genes that show changes in expression that are highly correlated with the progressive steps of SCC. In addition, gene expression changes were identified in samples of bronchial epithelium that correspond to an acute response to tobacco smoke exposure, a major contributor to SCC development. The use of multiple sample types, the presence of extensive cellular heterogeneity, and the rarity of biological material for the purpose of validation introduced an additional layer of complexity for which conventional methods of SAGE analysis are not well suited. To address these challenges, this thesis describes the development of two methodological improvements to SAGE data analysis. The first is a computational strategy to identify additional sequence information that effectively increases the length of SAGE tag sequences, greatly enhancing the fidelity of tag-to-gene mapping. The second is a new statistical method that shows improved performance in modelling SAGE data. The Poisson mixture model used in this work provides better estimates of statistical significance, is highly effective when using multiple sample types, and is a flexible framework for more complex meta-analyses.
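As a rough illustration of the counting and mapping step described in the abstract, the following Python sketch tallies tags and sums them per gene. It is not the thesis's pipeline: the tag sequences, the gene names, the `TAG_TO_GENE` lookup, and both function names are made up for this example, and a real analysis would take its mappings from a transcript database.

```python
from collections import Counter

# Hypothetical tag-to-gene lookup (made-up sequences and gene names).
TAG_TO_GENE = {
    "AAATTTCCCG": "GENE_A",
    "GGGCCCAAAT": "GENE_B",
}


def count_by_gene(tags, tag_to_gene):
    """Count each observed tag, then sum the counts per mapped gene.

    Tags with no entry in the lookup are skipped.
    """
    gene_counts = Counter()
    for tag, n in Counter(tags).items():
        gene = tag_to_gene.get(tag)
        if gene is not None:
            gene_counts[gene] += n
    return gene_counts


def tags_per_million(gene_counts, total_tags):
    """Scale raw counts by library size so samples can be compared."""
    return {gene: n / total_tags * 1_000_000 for gene, n in gene_counts.items()}


# A toy library of four sequenced tags; the last one maps to no known gene.
tags = ["AAATTTCCCG", "AAATTTCCCG", "GGGCCCAAAT", "TTTTTTTTTT"]
gene_counts = count_by_gene(tags, TAG_TO_GENE)
normalized = tags_per_million(gene_counts, total_tags=len(tags))
```

Scaling by the total number of sequenced tags, unmapped ones included, is one common convention assumed here. The short length of each tag is what makes the lookup ambiguous in practice, which is the problem the tag-lengthening strategy in the abstract is meant to reduce.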
Attribution-NonCommercial-NoDerivatives 4.0 International