UBC Theses and Dissertations

Identification and exploration of gene product annotation instability and its impact on current usages

Sedeño Cortés, Adriana Estela

Abstract

Proteins are macromolecules responsible for a wide range of activities in the structure and function of cells. Their activities have been described in different contexts as a means to elucidate their "function". These descriptions have been captured across biological databases in a standardized format called Gene Ontology Annotations (GOA), both to disseminate the knowledge and to extrapolate the information to other proteins whose function is still unknown. Furthermore, the annotations are used to analyse and interpret data from high-throughput studies, and as a benchmark for assessing protein function prediction algorithms. Constant changes occur in GOA that can potentially impact such usages, but only limited effort has been put into exploring this instability or into assessing the impact that these changes have on the reproducibility or interpretation of previous analyses. In the present work, I performed the most comprehensive analysis of annotation instability for 14 representative model organisms (E. coli, fruit fly, mouse, etc.). The results showed important instability patterns that were species-specific. Because such information would help the community trace the instability of the annotations of interest to them, a web-based visualization tool was built to track these changes on a protein-, functional term- and species-specific basis. Additionally, we identified artifacts in the annotation data that can be attributed to curation patterns. We propose that such artifacts be taken into account for a more accurate assessment of function prediction algorithms. The impact that changes in the annotations have on common settings such as gene set enrichment analyses was also explored. In particular, 2,000 datasets were used to assess the robustness of enrichment results over time. On average, the results displayed a 60% similarity after only 2 years. However, cases were found where the similarity dropped 80% within the same year, demonstrating the impact that the instability has on such applications. In conclusion, the results of this work will prove useful to those who rely on the annotations to interpret their studies, allowing them to assess the reliability of those annotations on a case-by-case basis.

Rights

Attribution-NonCommercial 2.5 Canada