UBC Theses and Dissertations

Mining of differential expression across thousands of conditions

Lim, Nathaniel Zhin-Loong

Abstract

Differential expression (DE) analysis is performed to identify genes associated with a phenotype based on changes in RNA expression levels. The result of such analyses is a hit list of genes that requires further interpretation to identify the functions of these genes and to prioritize them for further study; there is currently a lack of objective metrics for gene prioritization. The ease of generating transcriptomic data has resulted in the accumulation of massive amounts of data in repositories such as NCBI GEO. In my thesis, I investigate means of harnessing this archived data for interpreting hit lists. First, I describe the development of Gemma, a large corpus containing over 10,000 curated and reprocessed datasets made suitable for data mining. I contributed by establishing the curation guidelines for using ontology concepts during dataset annotation, and by characterizing Gemma’s features. Next, I describe the evaluation of Connectivity Map (CMap), a hit list interpretation framework designed for in silico repositioning of previously approved drugs for treating human diseases. Through a series of analyses, I demonstrated that drug repositioning results from two versions of CMap are discordant, and that this discordance is caused by low reproducibility of DE profiles both between and within the CMap versions. This demonstrates the importance of high-quality data and careful evaluation of hit list interpretation frameworks. Finally, in a collaboration, we showed that there are large differences in how often genes are differentially expressed (the “DE prior”) across a large corpus of human datasets. We proposed that the prior could be used to facilitate hit list interpretation by identifying genes that are more specifically DE in a studied phenotype. I expanded this work by examining variables that may influence the DE prior, such as microarray platform gene coverage; I found the DE prior to be robust to these variables. I also demonstrate that, given enough data, context-specific (e.g. tissue-specific) or topic-specific DE priors can be developed for the corresponding applications. My work contributes to our knowledge of patterns of gene differential expression and their utility in addressing questions related to gene function in human health and disease.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International