UBC Theses and Dissertations
Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences
Rogic, Sanja
Abstract
This thesis presents work in one of the main research areas in Computational Biology: computational gene-finding in higher eukaryotic genomic DNA. Programs for identifying gene structures have existed for more than a decade, but today they are used more extensively than ever to analyze the enormous volume of sequence data produced by genome sequencing projects. Consequently, their impact on research in genomics and beyond is substantial. The thesis has two distinct parts. The first presents an evaluation and comprehensive analysis of the current generation of gene-finding programs. For this purpose, a new, thoroughly filtered and biologically validated test dataset of genomic sequences was assembled. The basic prediction accuracy of each program tested was calculated, and the relationships between the programs' accuracy and various sequence and prediction features were analyzed. The second part presents the development and results of methods for combining the predictions from two gene-finding programs. Three methods were developed; each has some advantages over the other two, and each achieves higher prediction accuracy on the test dataset than any gene-finding program currently available.
Item Metadata
Title: Evaluating and improving the accuracy of computational gene-finding on mammalian DNA sequences
Creator: Rogic, Sanja
Publisher: University of British Columbia
Date Issued: 2000
Extent: 4941340 bytes
File Format: application/pdf
Language: English
Date Available: 2009-07-13
Provider: Vancouver: University of British Columbia Library
Rights: For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI: 10.14288/1.0051314
Degree Grantor: University of British Columbia
Graduation Date: 2000-11
Scholarly Level: Graduate
Aggregated Source Repository: DSpace