UBC Theses and Dissertations
Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences
Rogic, Sanja
This thesis presents work in one of the main research areas in Computational Biology: computational gene-finding in higher eukaryotic genomic DNA. Programs for identifying gene structures have existed for more than a decade, but today they are used more extensively than ever to analyze the enormous amount of sequence data produced by genome sequencing projects. Consequently, their impact on research in genomics and beyond is substantial. The thesis has two distinct parts. The first presents an evaluation and comprehensive analysis of the current generation of gene-finding programs. For this purpose, a new, thoroughly filtered and biologically validated test dataset of genomic sequences was assembled. The basic prediction accuracy of the tested programs was calculated, and the relationships between various sequence and prediction features and the programs' accuracy were analyzed. The second part presents the development and results of methods for combining the predictions of two gene-finding programs. Three methods were developed, each with some advantages over the other two, and each achieving higher prediction accuracy on the test dataset than any currently available gene-finding program.
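The abstract does not state which accuracy measures were used. As a hedged illustration only, the sketch below computes the commonly used nucleotide-level sensitivity (Sn = TP / (TP + FN)) and specificity (Sp = TP / (TP + FP)) of a predicted gene structure against an annotated one. The exon representation as half-open `(start, end)` intervals and the function names `coding_positions` and `nucleotide_accuracy` are hypothetical choices for this example, not taken from the thesis.

```python
def coding_positions(exons):
    """Expand half-open (start, end) exon intervals into a set of coding positions.

    Hypothetical helper: assumes exons are given as 0-based, half-open intervals.
    """
    positions = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) at the nucleotide level.

    Sn = TP / (TP + FN): fraction of truly coding bases that were predicted coding.
    Sp = TP / (TP + FP): fraction of predicted coding bases that are truly coding.
    """
    real = coding_positions(annotated)
    pred = coding_positions(predicted)
    tp = len(real & pred)   # coding bases correctly predicted as coding
    fn = len(real - pred)   # coding bases the program missed
    fp = len(pred - real)   # non-coding bases wrongly predicted as coding
    sn = tp / (tp + fn) if real else 0.0
    sp = tp / (tp + fp) if pred else 0.0
    return sn, sp


# Toy example with made-up coordinates (not data from the thesis).
annotated = [(100, 200), (300, 400)]
predicted = [(150, 200), (300, 450)]
sn, sp = nucleotide_accuracy(annotated, predicted)
print(f"Sn = {sn:.2f}, Sp = {sp:.2f}")  # prints "Sn = 0.75, Sp = 0.75"
```

Here 150 of the 200 annotated coding bases are predicted (Sn = 0.75), and 150 of the 200 predicted coding bases are correct (Sp = 0.75). Exon-level measures, which count exactly matching exon boundaries, are a common complement to these base-level figures.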