UBC Theses and Dissertations



Large-scale mining of differential expression data for insight into gene function

Sicherman, Jordan


A persistent challenge in genetics and genomics is the interpretation of “hit lists” of genes, which has led to the development and almost universal application of methods such as Gene Ontology (GO) enrichment analysis. While these methods have been of unquestionable utility, GO enrichment and similar approaches based on gene annotations leave much to be desired, and they are often used as a “sanity check” rather than as a way to make discoveries. To offer a complementary perspective with the potential to remedy some existing challenges, I developed and evaluated an algorithm that helps put hit lists of genes into biological context by performing large-scale mining of patterns of differential expression (DE). In this work, I present the development and evaluation of this algorithm, which mines over 10,000 transcriptomic datasets in a process I term “condition enrichment”. The output of the algorithm is a list of biological condition comparisons (drug treatments, diseases, etc.) scored according to their relatedness (in terms of DE) to the query genes. I show that performing searches on gene sets of a priori interest enables the algorithm to effectively identify known gene-condition relationships in real and simulated data, providing a useful summary of the condition comparisons most highly associated with the differential expression of the gene set. Finally, I present a powerful open-source web application that gives researchers access to Gemma DE, in the hope that it will aid future research.



Attribution-NonCommercial-ShareAlike 4.0 International