Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences

Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences Rogic, Sanja

Abstract

This thesis presents work in one of the main research areas of computational biology: computational gene-finding in higher eukaryotic genomic DNA. Programs for identifying gene structures have existed for more than a decade, but they are now used more extensively than ever to analyze the enormous amount of sequence data produced by genome sequencing projects. Consequently, their impact on genomics research and beyond is substantial. The thesis has two parts. The first presents an evaluation and comprehensive analysis of the current generation of gene-finding programs. For this purpose, a new, thoroughly filtered and biologically validated test dataset of genomic sequences was assembled. The basic prediction accuracy of the tested programs was calculated, and the relationships between various sequence and prediction features and the programs' accuracy were analyzed. The second part presents the development and results of methods for combining the predictions of two gene-finding programs. Three methods were developed, each with some advantages over the other two, and each offering higher prediction accuracy on the test dataset than any gene-finding program currently available.

Item Metadata

Title	Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences
Creator	Rogic, Sanja
Publisher	University of British Columbia
Date Issued	2000
Extent	4941340 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-07-13
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051314
URI	http://hdl.handle.net/2429/10747
Degree (Theses)	Master of Science - MSc
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2000-11
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2000-0554.pdf -- 4.71MB
