HMMCONVERTER: A tool-box for hidden Markov models with two novel, memory efficient parameter training algorithms

by

TIN YIN LAM
B.Sc., The Chinese University of Hong Kong, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

October, 2008

© TIN YIN LAM 2008

Abstract

Hidden Markov models (HMMs) are powerful statistical tools for biological sequence analysis. Many recently developed Bioinformatics applications employ variants of HMMs to analyze diverse types of biological data. It is typically fairly easy to design the states and the topological structure of an HMM. However, it can be difficult to estimate parameter values which yield a good prediction performance. As many HMM-based applications employ similar algorithms for generating predictions, it is also time-consuming and error-prone to have to re-implement these algorithms whenever a new HMM-based application is to be designed. This thesis addresses these challenges by introducing a tool-box, called HMMCONVERTER, which only requires an XML input file to define an HMM and to use it for sequence decoding and parameter training. The package not only allows for rapid proto-typing of HMM-based applications, but also incorporates several algorithms for sequence decoding and parameter training, including two new, linear memory algorithms for parameter training. Using this software package, even users without programming knowledge can quickly set up sophisticated HMMs and pair-HMMs and use them with efficient algorithms for parameter training and sequence analyses. We use HMMCONVERTER to construct a new comparative gene prediction program, called ANNOTAID, which can predict pairs of orthologous genes by integrating prior information about each input sequence probabilistically into the gene prediction process and into parameter training. ANNOTAID can thus be readily adapted to predict orthologous gene pairs in newly sequenced genomes.
Table of Contents

Abstract
Table of Contents
List of Figures
Acknowledgements

1 HMMs in Bioinformatics applications
  1.1 Introduction
  1.2 Introduction to hidden Markov models
    1.2.1 Theoretical background
    1.2.2 Variants of HMMs for Bioinformatics applications
  1.3 Challenges in designing and implementing HMMs
  1.4 Discussion and conclusion

2 Decoding and parameter training algorithms for HMMs
  2.1 Introduction and motivation
  2.2 Decoding algorithms
    2.2.1 The Viterbi algorithm
    2.2.2 The Hirschberg algorithm
  2.3 Parameter training
    2.3.1 The Baum-Welch algorithm
    2.3.2 The Viterbi training algorithm
  2.4 Discussion and conclusion

3 HMMCONVERTER
  3.1 Motivation
  3.2 Main features
    3.2.1 Input and output files
    3.2.2 Special features
    3.2.3 Sequence decoding algorithms
    3.2.4 Training algorithms
  3.3 Training using sampled state paths
    3.3.1 Sampling a state path from the posterior distribution
    3.3.2 A new, linear memory algorithm for parameter training using sampled state paths
    3.3.3 A new linear memory algorithm for Viterbi training
    3.3.4 Pseudo-counts and pseudo-probabilities
    3.3.5 Training free parameters
  3.4 Availability
  3.5 Discussion and conclusion

4 Specifying an HMM or pair-HMM with HMMConverter
  4.1 Introduction
  4.2 The XML-input file
    4.2.1 The <model> component of the XML file
    4.2.2 The <sequence_analysis> component of the XML file
  4.3 Other text input files
    4.3.1 The input sequence file
    4.3.2 The free emission parameter file
    4.3.3 The free transition parameter file
    4.3.4 The tube file
    4.3.5 The prior information file
  4.4 The output files of HMMCONVERTER
  4.5 Examples

5 Implementation of HMMConverter
  5.1 Introduction
  5.2 Converting XML into C++ code
    5.2.1 The model_parameters class
    5.2.2 Other classes
  5.3 Decoding and training algorithms
    5.3.1 The Viterbi and Hirschberg algorithms
    5.3.2 The parameter training algorithms

6 Annotaid
  6.1 Introduction and Motivation
  6.2 Main features of Annotaid
    6.2.1 The pair-HMM of ANNOTAID
    6.2.2 Banding and sequence decoding algorithms
    6.2.3 Integrating prior information along the input sequences
    6.2.4 Parameter training in ANNOTAID
  6.3 Availability
  6.4 Discussion and Future Work

Bibliography

List of Figures

1.1 A Markov model for the CpG-Island process
1.2 An HMM for the CpG-Island
1.3 A 5-state simple pair-HMM
1.4 A simple pair-HMM for predicting related protein coding regions in two input DNA sequences
2.1 A simple generalized pair-HMM for predicting related protein coding regions in two DNA sequences
2.2 Illustration of the Viterbi algorithm
2.3 Illustration of the Hirschberg algorithm (1)
2.4 Illustration of the Hirschberg algorithm (2)
3.1 The XML-input file for defining an HMM in HMMCONVERTER
3.2 Using BLAST to generate a tube for HMMCONVERTER
3.3 Defining a tube for HMMCONVERTER explicitly
3.4 A state which carries labels from two different annotation label sets
3.5 Illustration of the special emission feature in HMMCONVERTER
3.6 Using the Hirschberg algorithm inside a tube
3.7 Illustration of the forward algorithm
3.8 Sampling a state path from the posterior distribution
3.9 Illustration of the novel, linear memory algorithm for parameter training using sampled state paths
3.10 A part of the pair-HMM of DOUBLESCAN
4.1 Defining a state of an HMM or pair-HMM in HMMCONVERTER
4.2 A part of the pair-HMM of DOUBLESCAN
4.3 Specifying a tube explicitly in a flat text file
4.4 A simple 5-state pair-HMM
6.1 Combining the Hirschberg algorithm with a tube
6.2 Special emissions in PROJECTOR and ANNOTAID
6.3 Combining the parameter training with a tube

Acknowledgements

I would like to thank my supervisor, Prof. Irmtraud Meyer, for her advice, support and encouragement during the course of this project. A special thank you goes to Prof. Anne Condon for agreeing to be the second reader of my thesis. I thank Nick Wiebe, Rodrigo Goya, Chris Thachuk and all other members of the BETA lab who have made my life on campus enjoyable, as they are always helpful and nice.

Chapter 1

HMMs in Bioinformatics applications

1.1 Introduction

Hidden Markov models (HMMs) are flexible and powerful statistical tools for many different kinds of sequence analysis. HMMs were first introduced in a series of papers by Leonard E. Baum in the 1960s, and an early application of HMMs was speech recognition in the mid 1970s. Today, HMM-based methods are widely used in many different fields such as speech recognition and speech synthesis [42, 45], natural language processing [32], hand-writing recognition [27] and Bioinformatics [11]. Among the above applications, HMMs are especially popular for sequence analysis in Bioinformatics. In Bioinformatics, HMMs are used for sequence annotation such as gene prediction [8, 23], sequence analysis such as transcription binding site recognition [17] and peptide identification [22], sequence alignment such as protein alignment [21, 24] and other applications such as protein-protein interaction prediction [41]. Many variations of HMMs have been developed in the recent two decades for different Bioinformatics applications. In addition, new algorithms for sequence decoding and parameter training have been introduced.

Although HMMs are a powerful and popular concept, several practical challenges often have to be overcome in practice. It is not easy to design an HMM, especially to assign values to its parameters. The time and memory requirements for analyzing long input sequences can be prohibitive. Finally, it can be very time-consuming and error-prone to implement an HMM and the corresponding prediction and parameter training algorithms.

This thesis presents an HMM generation tool called HMMCONVERTER, which only requires an XML file and simple text input files for defining an HMM and for using the HMM for predictions and for training its parameters. Users of HMMCONVERTER are not required to have programming skills. The tool provides the most commonly used Viterbi algorithm [47], and the more memory efficient Hirschberg algorithm [16] for sequence decoding. For parameter training, HMMCONVERTER offers not only a new, linear memory algorithm for Baum-Welch training, but also a novel linear memory variation of the Viterbi training algorithm as well as a new and efficient algorithm for training with sampled state paths.
Furthermore, HMMCONVERTER supports many special features for constructing various types of HMMs, such as higher order HMMs, pair-HMMs, as well as HMMs which take external information on the input sequences into account for parameter training and generating predictions.

In this chapter, we give some theoretical background on HMMs, and describe variants of HMMs and their applications in Bioinformatics. We also further discuss the challenges of designing and implementing an HMM. Existing sequence decoding and parameter training algorithms for HMMs are introduced in chapter 2. Chapter 3 describes the main features of HMMCONVERTER, and includes a detailed explanation of the novel parameter training algorithms. The specification of an HMM within HMMCONVERTER and the input and output formats for the tool are described in chapter 4, and some implementation details of HMMCONVERTER are provided in chapter 5. Finally, we used this tool to implement a novel comparative gene prediction pair-HMM called ANNOTAID, which can take various types of prior information on each of the two input sequences into account in order to give more accurate predictions. The features and applications of this new model are described in chapter 6.

1.2 Introduction to hidden Markov models

1.2.1 Theoretical background

A hidden Markov model (HMM) is a variation of a Markov model, where the state sequence of a Markov chain is hidden from the observation. We first explain what a Markov chain is and then use a simple biological example to introduce HMMs.

Markov model

A Markov chain is a random process with the Markov property. In brief, a random process is a process for which the next state is determined in a probabilistic way. The Markov property implies that given the current state in the random process, the future state is independent of the past states. Let X_i be a random variable for all positive integers i, representing the state of the process at time i. The sequence X_1, X_2, ..., X_n, ... is a Markov chain if:

P(X_n = x_n | X_{n-1} = x_{n-1}, ..., X_1 = x_1) = P(X_n = x_n | X_{n-1} = x_{n-1})

A time-homogeneous Markov chain means that the next state of the process is independent of the time, but only depends on the current state:

P(X_n = x | X_{n-1} = y) = P(X_2 = x | X_1 = y)

One can also define higher order Markov chains. A Markov chain of n-th order is a Markov chain for which the next state depends on the current state and the previous n - 1 states. Note that when we say Markov chain, we refer to the simplest case, i.e. a 1st order Markov chain. A Markov chain can be represented by a finite state machine [11].

CpG-Islands

Now we use the CpG-Island [4] example from the book Biological Sequence Analysis [11] to introduce hidden Markov models. In a genomic DNA sequence, the dinucleotide CG (i.e. a neighboring C and G on the same strand) appears more frequently around the transcription start sites of genes than in other regions of the genome. We call regions which have an increased frequency of CG pairs CpG-Islands. They are usually a few hundred to a few thousand bases long. Suppose we want to identify CpG-Islands in a long genomic sequence. We can do this by defining a Markov chain. Each state of the model in figure 1.1 represents a DNA nucleotide in a CpG-Island or not in a CpG-Island. The Markov property holds for this model as the probability of going from a given state to the next state only depends on the current state, but not the previous states.
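To make the finite-state-machine view of a first-order Markov chain concrete, the following sketch scores a DNA sequence under such a chain. It is only an illustration, not part of the thesis or of HMMCONVERTER; the transition values are hypothetical, and real values would be estimated from annotated sequence data.

```cpp
// Minimal sketch (hypothetical values): scoring a DNA sequence under a
// first-order Markov chain over the alphabet {A, C, G, T}.
#include <array>
#include <cmath>
#include <iostream>
#include <string>

// Map an upper-case nucleotide to an index 0..3 (A, C, G, T); input is assumed valid.
static int idx(char c) {
    switch (c) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        default:  return 3;  // 'T'
    }
}

int main() {
    // Hypothetical transition matrix t[i][j] = P(next = j | current = i); each row sums to 1.
    std::array<std::array<double, 4>, 4> t = {{
        {0.30, 0.20, 0.30, 0.20},
        {0.25, 0.25, 0.25, 0.25},
        {0.20, 0.30, 0.30, 0.20},
        {0.25, 0.20, 0.30, 0.25},
    }};
    std::array<double, 4> start = {0.25, 0.25, 0.25, 0.25};  // P(first letter)

    std::string x = "ACGCGT";
    // Work in log space so that long sequences do not underflow.
    double logp = std::log(start[idx(x[0])]);
    for (std::size_t i = 1; i < x.size(); ++i)
        logp += std::log(t[idx(x[i - 1])][idx(x[i])]);

    std::cout << "log P(" << x << ") = " << logp << "\n";
    return 0;
}
```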
Figure 1.1 shows the Markov model of the CpG-Island "process", but this model is lacking something, as we cannot distinguish whether an observed nucleotide is from a CpG-Island (+) or not (-). We can modify this model a little, see figure 1.2. In this new model, each observed nucleotide can be read by two states, one state with label "+" representing a nucleotide from a CpG-Island and one state with label "-" representing a nucleotide outside a CpG-Island. This is a hidden Markov model for a CpG-Island process, where the "hidden feature" is the underlying "+" or "-" state associated to nucleotides in the observed sequence. The transition probability from the C+ state to the G+ state should be larger than the transition probability from the C- state to the G- state.

The start state and the end state in the two models are both silent states which do not read any character from the input sequence. The model always starts with the start state before reading any character from the input sequence. On the other hand, every state path ends at the end state. In the above example, the start state is connected to all the other states (except the end state), and all the states except the start state are connected to the end state.

Figure 1.1: A Markov model for the CpG-Island process
This is a Markov model for the CpG-Island process, but it cannot solve the CpG-Island annotation problem.

Figure 1.2: An HMM for the CpG-Island
This figure shows an HMM which can model CpG-Islands. The states of the model are all connected to each other, i.e. it is a complete graph (K_8) where each state also has a transition back to itself.

Now we are ready to give a more formal definition of an HMM. An HMM is unambiguously defined by the following components:

• A set of N states S = {0, ..., N - 1}, where 0 is the start state and N - 1 is the end state.

• An alphabet A which contains the symbols of observations of the HMM (e.g. {A, C, G, T} is the DNA alphabet for the CpG-Island HMM in the above example).

• A set of transition probabilities T = {t_{i,j} | i, j ∈ S}, where t_{i,j} ∈ [0, 1] is the transition probability of going from state i in the model to state j, and Σ_{j ∈ S} t_{i,j} = 1 for all states i in S.

• A set of emission probabilities E = {e_i(γ) | i ∈ S, γ ∈ A}, where e_i(γ) ∈ [0, 1] is the emission probability of state i for symbol γ, and Σ_{γ ∈ A} e_i(γ) = 1 for all states i in S.

An HMM as defined above corresponds to a probabilistic model. It can be used in a number of ways to generate predictions. In the following,

• let M denote an HMM,

• X = (x_1, ..., x_L) an input sequence of length L, where x_i is the i-th letter of the sequence, and

• π = (π_0, ..., π_l) a state path of length l + 1 in the model M, where π_k is the state at the k-th position in the state path, i.e. π_0 is always the start state and π_l is always the end state.

We can use M in a number of ways to generate predictions:

• Sequence decoding: For a given input sequence X and model M, we can for example calculate the state path with the highest overall probability in the model, i.e. the state path π which maximizes P(π | X, M). In that case, model M analyzes the sequence X and we will, in all of the following, say that the states of model M read the symbols of observations of input sequence X.
• Parameter training: For a given set of training sequences X and a given model M, we can train the set of free parameters θ of M, for example by using training algorithms which maximize P(X | θ, M).

• Generating sequences of observations: As any HMM defines a probability distribution over sequences of observations, we can use any given HMM M to generate sequences of observations. In that case, we say that the states of model M emit symbols of observations. In this thesis, this way of using an HMM is not employed. We will, in all of the following, therefore refer to a state as reading one or more symbols from an input sequence.

The CpG-Island finding problem can be solved with a decoding algorithm, which computes the most probable state path π for an input DNA sequence X and model M. Before we can use an HMM for decoding, we first have to determine good values for its parameters, which may require parameter training. Algorithms for decoding and parameter training will be discussed in more detail in this chapter and in chapter 2. In the following section, we first describe variants of HMMs that have been developed recently for Bioinformatics applications.

1.2.2 Variants of HMMs for Bioinformatics applications

Pair-HMMs and n-HMMs

A pair-HMM is a simple extension of an HMM that reads two rather than only one sequence at a time. One of the main applications of pair-HMMs is to generate sequence alignments, so pair-HMMs are particularly prevalent in Bioinformatics [39]. For example, a pair-HMM called WABA [21] can be used to identify pairs of protein coding exons in two orthologous input DNA sequences. This pair-HMM generates a global alignment between the two input sequences, unlike TBLASTX which generates local alignments [1]. Pair-HMMs are also used for comparative gene prediction [26, 34, 36]. These pair-HMMs take a pair of homologous DNA sequences as input and predict pairs of homologous gene structures. Comparative gene prediction methods using pair-HMMs have been shown [26, 36] to perform better than non-comparative approaches [8]. Figure 1.3 shows a simple 5-state pair-HMM. In an n-HMM, each state can read letters from up to n input sequences. n-HMMs can be used for multiple sequence alignment, but because of their high memory and time requirements, they are typically used only for short sequences, such as protein sequences.

Generalized HMMs

Generalized HMMs [8, 48] allow states to read more than one letter at a time from an input sequence. In Bioinformatics applications, a state of an HMM often corresponds to a biological entity, e.g. a codon consisting of 3 nucleotides; such a state would read 3 letters from the input sequence at a time. Figure 1.4 shows a simple generalized pair-HMM for finding related protein-coding exons in two input DNA sequences. In particular, any HMM used for gene prediction should be a generalized HMM.

Figure 1.3: A 5-state simple pair-HMM
This is a 5-state simple pair-HMM, where the Match state reads one character from each of the two input sequences and aligns them according to the emission probability of the combination of the two characters read by the state. The Emit_X state reads one character from input sequence X, and the Emit_Y state does the same thing as the Emit_X state, but it reads one character from the input sequence Y at a time.
Figure 1.4: A simple pair-HMM for predicting related protein coding regions in two input DNA sequences
This simple pair-HMM can be used for predicting related protein coding regions in two input DNA sequences. There are 3 states to model the protein coding part and 3 states to model the non-coding part of the two input sequences. The states which model the protein coding part read one codon at a time, e.g. the Match_Codon state reads 3 nucleotides from each input sequence at a time.

Explicit state duration HMMs

Explicit state duration HMMs extend the concept of HMMs by keeping track of the state's duration, i.e. the "time" that the state path has stayed in the same state is also remembered in the decoding process. In this type of HMM, a set of parameters or functions for modeling the state duration is needed in addition to the set of transition and emission probabilities, and the probability of the next state is conditional on both the current state and the state duration. GENSCAN [8] and SNAP [25] are two ab initio gene prediction models which are based on an explicit state duration HMM.

Evolutionary HMMs

For many types of biological sequence analysis, it is important to know the evolutionary correlations between the sequences. [24] describe a method that uses evolutionary information to derive the transition probabilities of the insertion and deletion states of sequence alignment pair-HMMs. EVOGENE [40] is an evolutionary HMM (EHMM), which incorporates the evolutionary relationship between the input sequences into the sequence decoding process. EVOGENE consists of an HMM and a set of evolutionary models. In the model, each state of the HMM consists of a set of alphabets over fixed alignment columns and a state-specific evolutionary model, and the emission probabilities of each state are calculated using the corresponding evolutionary model for each column in the input alignment. The Felsenstein algorithm [13] is used to calculate the likelihood of an alignment column given an evolutionary input tree and an evolutionary model.

HMMs which take external evidence into account

Because of significant improvements in biological sequencing technology, large amounts of genome-wide data such as cDNAs, ESTs and protein sequences [3, 18, 37, 38] are currently being generated. It is desirable to incorporate these types of additional evidence into computational sequence analysis methods to get a better performance. In recent years, many HMMs that can incorporate prior information on the input sequences into the prediction process have been developed. They are especially popular in gene prediction because of the highly available data. TWINSCAN [26] is one of the first HMMs that incorporates external information into the gene prediction process. It extends the classical gene prediction HMM GENSCAN [8] by incorporating matches between several homologous genomic sequences and the input DNA sequence into the predictions. TWINSCAN first generates a "conserved sequence" from the matches between homologous DNA sequences and the DNA sequence of interest. This conserved sequence is then used to bias the emission probabilities of the underlying GENSCAN HMM.
DOUBLESCAN [34] is a comparative gene prediction pair-HMM which performs ab initio gene prediction and sequence alignment simultaneously by taking two un-annotated and unaligned homologous genomic sequences as input. It incorporates splice site information along each input sequence using the predictions of STRATASPLICE [29], which generates log-odds scores and posterior probabilities for potential splice sites. These scores are then integrated into the prediction process, and greatly improve the prediction accuracy of splice sites. PROJECTOR [36] extends DOUBLESCAN by using the known genes in one of the two input sequences to predict the gene structures in the other, un-annotated DNA input sequence. It uses so-called special emissions, which integrate the information on the annotated sequence into the prediction process by biasing the emission probabilities. In its published version, PROJECTOR is only capable of incorporating prior probabilities with discrete values (i.e. either 0 or 1). More recently, several gene prediction HMMs have been developed for integrating various types of prior information into the prediction process [7, 15, 19, 44]. These programs (they all employ HMMs but not pair-HMMs) all allow taking prior information with different levels of confidence into account. For example, in JIGSAW [19], a feature vector is used to store prior information on different features from various sources as probabilities. Simple multiplication rules are then used to incorporate the vector into the nominal emission probabilities. EXONHUNTER [7] calls a piece of external evidence an "advisor"; different "advisors" are combined to form a "superadvisor" by quadratic programming (i.e. the problem of minimizing or maximizing a quadratic function of several variables subject to linear constraints on these variables), which is then incorporated into the prediction process. All of these HMMs are capable of integrating various kinds of prior information with different confidence scores into the prediction process, but they all have some limitations. For example, they cannot deal with contradicting pieces of information, and one state of the model can only incorporate one type of prior information.

1.3 Challenges in designing and implementing HMMs

As discussed previously, HMMs are a widely used, powerful concept in many applications. However, it is often not easy to design and implement an HMM. Various algorithms exist for generating predictions with the model and for training the model's parameters. The main challenges are:

• Long input sequences lead to large memory and time requirements for predictions and parameter training.

• It is difficult to set up the transition and emission parameters of the model such that they yield a good performance.

• It is time consuming and tricky to implement the model and to test several HMMs against each other ("proto-typing").

Dealing with long input sequences

In Bioinformatics, the length of input sequences for HMMs varies between applications. For example, the average gene size (before splicing) of eukaryotic genes varies from around 1 kb (yeast) to 50 kb (human) or longer. Before discussing the sequence decoding and parameter training algorithms in detail, we first specify the time and memory requirements of these algorithms to give readers a better impression of the challenges.
For a pair-HMM with N states, the Viterbi algorithm [47], which is the most widely used algorithm for sequence decoding, requires O(N²L²) time and O(NL²) memory for analyzing a pair of sequences of length L each. For a gene prediction pair-HMM consisting of 10 states, it would thus require between 10^8 and 10^10 time units and between 10^7 and 10^9 units of memory to analyze typical input sequences which contain only a single gene. These constraints are even more serious for parameter training, as most training algorithms are iterative processes which have to consider many input sequences. These challenges for parameter estimation of HMMs are described below. New, efficient algorithms are needed to make decoding and parameter training practical even for long input sequences.

Parameter training

The set of transition probabilities and emission probabilities constitutes the parameters of an HMM. There are additional parameters for explicit state duration HMMs and evolutionary HMMs. As the performance of an HMM critically depends on the choice of the parameter values, it is important to find good parameter values. As the states of an HMM closely reflect the biological problem that is being addressed, it is typically fairly easy to design the states of the model. However, it can be difficult to manually derive good parameter values. There are two common ways to define the parameters of an HMM: (1) Users can manually choose the parameters. This is time consuming and also requires a very good understanding of the data. Also, this strategy may not result in values that optimize the performance. (2) Another strategy is to use an automatic parameter training algorithm. This strategy works best if we have a large and diverse set of annotated training sequences. Being able to use automatic parameter training for a pair-HMM for comparative gene prediction means that it can be readily trained to analyze different pairs of genomes. However, as we explain now, parameter training is not an easy task.

There are two common types of parameter training strategies for HMMs, maximum likelihood methods (ML) and expectation maximization methods (EM) [10]. The corresponding training algorithms are usually iterative processes, whose convergence depends on the set of initial values and the amount of training data. Depending on the training algorithm, the outcome of the training may depend on the initial parameter values and may also not necessarily maximize the resulting prediction performance. Pseudo-counts for the parameters may be needed to avoid over-fitting, especially if the training data is sparse or biased.

We have discussed the importance of parameter training for setting up an HMM. Because of the difficulties of implementing training algorithms that can be used efficiently on realistic data sets, parameter training algorithms are not implemented in most HMM applications (i.e. the parameters are set up manually for one particular data set). An exception are some HMMs for gene prediction [25, 30, 46]. This is because (1) the set of parameters is very important for the gene prediction performance, (2) parameters need to be adapted to predict genes for different organisms, and (3) due to the great improvement in biological sequencing techniques, there are large enough sets of known genes to allow parameter training. For gene prediction methods, the main challenge is to implement efficient parameter training algorithms. The existing parameter training algorithms are described in chapter 2.
Proto-typing

Designing an HMM for a new Bioinformatics application is fairly easy given a good understanding of the underlying biological problem, whereas the implementation of several possible HMMs can be a tedious task. Proto-typing several HMMs and implementing the sequence decoding and parameter training algorithms is time-consuming and error-prone. Although there are many variations of HMMs, they all use the same prediction and training algorithms. It is particularly inefficient if the same algorithms have to be implemented repeatedly whenever a new HMM application is created. If there was a software tool for generating an HMM with the corresponding algorithms, then users would be able to focus on designing the HMM, i.e. the states of the model, which is the only task that requires human expertise and insight into the biological problem.

1.4 Discussion and conclusion

HMMs are a very powerful statistical concept which allows many different kinds of sequence analysis applications. In this chapter, we have introduced the theoretical background of HMMs and some popular variants of HMMs and also described the popularity of these models in Bioinformatics. As we explained, it is typically not easy to design and implement an HMM in practice. We have discussed several challenges, such as long input sequences which lead to large memory and time requirements, difficulties with setting up the parameters of the model, and the time consuming and error-prone task of implementing efficient algorithms.

Even though many different HMMs have been developed for different applications, they use similar prediction and parameter training algorithms. It is therefore not efficient if users spend most of their time implementing and re-implementing these algorithms for different HMMs, when they could be focusing on the design and parameterization of the HMM.

We here propose an HMM-generating tool, HMMCONVERTER, which takes an XML input file for defining an HMM, its states and parameters and the algorithms to be used for generating predictions and training the parameters. Using this tool, a user is not required to have computer programming knowledge to set up an HMM and to use it for data analyses and parameter training. This tool provides several widely used sequence decoding and parameter training algorithms, including two novel parameter training algorithms. Furthermore, HMMCONVERTER supports many special features, such as incorporating external information on each input sequence with confidence scores, and several heuristic algorithms for reducing the run time and memory requirements greatly while keeping the performance essentially unchanged. These features are especially useful for Bioinformatics applications.

In the following chapter, we introduce several widely used existing sequence decoding and parameter training algorithms. The HMM generating tool HMMCONVERTER is described in chapter 3, including details on all special features and novel algorithms. Chapter 4 explains how to specify an HMM using HMMCONVERTER and chapter 5 describes some implementation details of HMMCONVERTER. Chapter 6 describes a novel pair-HMM for comparative gene prediction called ANNOTAID, which is constructed using the HMMCONVERTER framework.
Chapter 2

Decoding and parameter training algorithms for HMMs

2.1 Introduction and motivation

In biological sequence analysis or biological sequence annotation, we are interested in finding good annotations for the raw biological sequences. This is the sequence decoding problem discussed in chapter 1. Every HMM assigns an overall probability to each state path which is equal to the product of the encountered transition and emission probabilities, and every state path corresponds to an annotation of the input sequence. For an HMM with well-chosen parameters, high-quality annotations should correspond to high-probability state paths, so the task for sequence decoding is to find a state path with high probability. The Viterbi algorithm [47] is one of the most widely used sequence decoding algorithms; it calculates the most probable state path for a given input sequence and HMM.

For many Bioinformatics applications, it is often the case that several state paths yield the same annotation. For example, in a simple pair-HMM which generates global alignments of protein coding regions in two input DNA sequences (see figure 2.1), the states Emit_Codon_X and Match_Codon both assign the annotation label Codon to the letters they read from sequence X. In this pair-HMM, several different state paths that correspond to different global alignments between the two sequences could yield the same gene annotation for the two input sequences. From this example, we can see that it is always possible to translate a state path in the model into an annotation of the input sequence.

Figure 2.1: A simple generalized pair-HMM for predicting related protein coding regions in two DNA sequences
This simple generalized pair-HMM has been presented in chapter 1. In particular, the Match_Codon state, the Emit_Codon_X state and the Emit_Codon_Y state all assign the annotation label Codon to the letters they read from the input sequences. The three other states (except the start and the end state) assign the annotation label non_coding.

2.2 Decoding algorithms

In this section, we describe two existing sequence decoding algorithms for HMMs: the most commonly used Viterbi algorithm [47] and the Hirschberg algorithm [16]. The Hirschberg algorithm can be regarded as a more memory efficient version of the Viterbi algorithm. For a given HMM and an input sequence, both algorithms derive the state path with the highest overall probability.

2.2.1 The Viterbi algorithm

The Viterbi algorithm is one of the most widely used sequence decoding algorithms for HMMs. It first calculates the maximum probability that a state path in a given model M can have for a given input sequence, and then retrieves the corresponding optimal state path by a back-tracking process. In this section, we introduce the algorithm for an HMM, i.e. a single tape HMM where only the start and end states are silent and all other states read one letter from the input sequence at a time. The algorithm consists of two parts:

• Part 1: Calculate the optimum probability of any state path using dynamic programming.

• Part 2: Retrieve the corresponding optimal state path by a back-tracking process.

We first introduce some notation before explaining the algorithm with pseudo-code; some of these notations have already been introduced in chapter 1.

• X = (x_1, ..., x_L) denotes an input sequence of length L, where x_i is the i-th letter of the sequence.

• S = {0, ..., N - 1} denotes the set of N states in the HMM, where 0 is the start state and N - 1 is the end state.
• T = {t_{i,j} | i, j ∈ S}, where t_{i,j} is the transition probability from state i to state j.

• E = {e_i(γ) | i ∈ S, γ ∈ A}, where e_i(γ) is the emission probability of state i to read symbol γ from alphabet A.

• v denotes the two-dimensional Viterbi matrix, where v_s(i) is the probability of the most probable path that ends in state s and that has read the input sequence up to and including position i.

• ptr denotes the two-dimensional pointer matrix, where ptr_i(s) is the previous state from which the maximum probability at state s and sequence position i was derived.

• π = (π_0, ..., π_{L+1}) represents a state path in the model for a given input sequence of length L, where π_i is the i-th state of the state path. In particular, π_0 is the start state and π_{L+1} is the end state.

• π* denotes the optimal state path; it is the state path with the highest overall probability on sequence X, i.e. π* = argmax_π P(π | X, M). We also call this state path the Viterbi path.

In order to calculate the optimum probability among all state paths for a given HMM and a given input sequence, the Viterbi algorithm uses the following dynamic programming approach:

Part 1:

Initialization before reading the first letter from the input sequence:

v_s(0) = 1, if s = start
v_s(0) = 0, if s ≠ start

Recursion:

For i = 1, ..., L
  For s' = 1, ..., N - 2

    v_{s'}(i) = e_{s'}(x_i) · max_{s ∈ S} {v_s(i - 1) · t_{s,s'}}     (2.1)
    ptr_i(s') = argmax_{s ∈ S} {v_s(i - 1) · t_{s,s'}}               (2.2)

Termination at the end of the sequence and in the end state:

v_{N-1}(L) = max_{s ∈ S} {v_s(L) · t_{s,end}}
ptr_L(N - 1) = argmax_{s ∈ S} {v_s(L) · t_{s,end}}

From the definition of v, we know that the probability of the optimal state path is P(x, π*) = v_{N-1}(L). The corresponding optimal state path can be obtained by a back-tracking procedure.

Part 2: Back-tracking starts at the end of the sequence in the end state and finishes at the start of the sequence in the start state. It is defined by the recursion:

π*_{i-1} = ptr_i(π*_i)

Figure 2.2 illustrates the first part of the Viterbi algorithm. The above algorithm can be easily generalized to pair-HMMs by using a three-dimensional matrix and looping over both the X and Y dimensions in the recursion. For a generalized HMM, where states can read more than one character at a time from the input sequence, the entries of the Viterbi matrix are calculated by:

v_{s'}(i) = e_{s'}(x_{i-Δ(s')+1}, ..., x_i) · max_{s ∈ S} {v_s(i - Δ(s')) · t_{s,s'}}

where Δ(s') is the number of characters read by state s'.

Figure 2.2: Illustration of the Viterbi algorithm
The left part shows how the Viterbi algorithm calculates the Viterbi matrix from left to right in the recursion. The algorithm first loops along the input sequence X and then loops over the states in the model. It terminates in the top right corner of the matrix at the end of the input sequence and in the end state. The right part demonstrates the calculation of entry v_{s'}(i), where the matrix element for state s' and sequence position i is derived from state s at sequence position (i - 1), i.e. v_{s'}(i) = e_{s'}(x_i) · max_{s ∈ S} {v_s(i - 1) · t_{s,s'}}.

Let Tmax be the maximum number of transitions into any state in the model. The Viterbi algorithm takes O(N · Tmax · L^n) operations and O(N · L^n) memory for an n-HMM which reads n input sequences at a time.
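To make the two parts of the algorithm concrete, here is a minimal sketch of the Viterbi recursion and back-tracking for a single-tape HMM. It is written independently of HMMCONVERTER and uses a hypothetical layout: states 0..N-1 with 0 the start state and N-1 the end state, and dense transition and emission containers; log-probabilities are used to avoid numerical underflow on long sequences.

```cpp
// Minimal sketch (not HMMCONVERTER code): Viterbi decoding for a single-tape HMM.
// t[s][s2] = P(s -> s2), e[s][sym] = P(s reads sym); the model is assumed to admit
// at least one complete state path for the given sequence.
#include <vector>
#include <cmath>
#include <limits>

std::vector<int> viterbi(const std::vector<int>& x,                     // encoded input sequence
                         const std::vector<std::vector<double>>& t,     // transition probabilities
                         const std::vector<std::vector<double>>& e) {   // emission probabilities
    const int N = static_cast<int>(t.size());
    const int L = static_cast<int>(x.size());
    const double NEG_INF = -std::numeric_limits<double>::infinity();

    // v[i][s] = log-probability of the best path that has read x[0..i-1] and ends in state s.
    std::vector<std::vector<double>> v(L + 1, std::vector<double>(N, NEG_INF));
    std::vector<std::vector<int>> ptr(L + 1, std::vector<int>(N, -1));
    v[0][0] = 0.0;  // start state, before any letter is read

    // Part 1: dynamic programming recursion (equations 2.1 and 2.2).
    for (int i = 1; i <= L; ++i) {
        for (int s2 = 1; s2 < N - 1; ++s2) {          // non-silent states
            for (int s = 0; s < N - 1; ++s) {
                double cand = v[i - 1][s] + std::log(t[s][s2]);
                if (cand > v[i][s2]) { v[i][s2] = cand; ptr[i][s2] = s; }
            }
            v[i][s2] += std::log(e[s2][x[i - 1]]);
        }
    }
    // Termination: transition into the silent end state N-1 after the last letter.
    for (int s = 1; s < N - 1; ++s) {
        double cand = v[L][s] + std::log(t[s][N - 1]);
        if (cand > v[L][N - 1]) { v[L][N - 1] = cand; ptr[L][N - 1] = s; }
    }

    // Part 2: back-tracking from the end state to the start state.
    std::vector<int> path(L);
    int state = ptr[L][N - 1];        // state that read the last letter
    for (int i = L; i >= 1; --i) {
        path[i - 1] = state;
        state = ptr[i][state];        // state that read the previous letter (start state at i = 1)
    }
    return path;                      // path[i] is the state that read letter x[i]
}
```

Note that the full pointer matrix ptr is what causes the O(N · L^n) memory requirement discussed next; the Hirschberg algorithm of the following section avoids storing it.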
These time and memory requirements impose serious constraints when employing this algorithm with n-HMMs and long input sequences.

2.2.2 The Hirschberg algorithm

From equation 2.1, we can see that the Viterbi algorithm can be continued if we keep the Viterbi elements for the previous sequence position in memory. This means that only O(N · L^{n-1}) memory is required for an n-HMM with N states and n input sequences of identical length L if we only want to calculate the probability of the Viterbi path. However, in order to obtain the Viterbi path itself, O(N · L^n) memory is required for storing the back-tracking pointers (see equation 2.2) for the back-tracking process. The memory needed for the back-tracking pointers thus dominates the memory requirements of the Viterbi algorithm.

The Hirschberg algorithm is a more memory efficient version of the Viterbi algorithm. It requires O(N · L^{n-1}) memory instead of O(N · L^n) for an n-HMM. Furthermore, the time requirement of the Hirschberg algorithm is of the same order as for the Viterbi algorithm. It uses the same recurrences as the Viterbi algorithm, but searches the Viterbi matrix in a smarter way.

We now describe the Hirschberg algorithm for a pair-HMM. Instead of filling the Viterbi matrix in a single direction along one of the two input sequences, the Hirschberg algorithm traverses the matrix from both ends using two Hirschberg strips of O(NL) memory each (see figure 2.3). We call them the forward strip and the backward strip, respectively. In order to perform the backward traversal, a "mirror model" of the original HMM has to be set up by reversing the transitions and exchanging the end state and start state with respect to the original HMM. This mirror model does not necessarily define an HMM because the sum of the transition probabilities from a state may no longer be 1. The forward strip calculates the 3-dimensional matrix in the same direction as the Viterbi algorithm, while the backward strip uses the mirror model to move in the opposite direction, i.e. it starts at the end state and at the end of one sequence. The matrix element (i, j, s) in the backward strip stores the probability of the optimal state path that has read the input sequences from their ends up to and including position i of sequence X and position j of sequence Y and that finishes in state s. Figure 2.3 shows how the two Hirschberg strips move in opposite directions and meet at the middle position of one of the two input sequences (i.e. sequence X in this example).

Figure 2.3: Illustration of the Hirschberg algorithm (1)
This figure shows the projection of the 3-dimensional Viterbi matrix onto the X–Y plane. The two Hirschberg strips traverse the matrix from both ends of sequence X and meet at the middle of sequence X. For a generalized HMM, the two strips have a width of max_{s ∈ S}(Δ(s)) + 1, so they will overlap instead of "touching" each other. The additional columns of the strip are used to store the values calculated for the previous sequence position which are needed in order to continue the calculation.

Suppose the two strips meet at position i of sequence X, where the forward strip stores the Viterbi matrix elements for position i of sequence X and the backward strip stores the corresponding probabilities for position i + 1 of sequence X.
At that sequence position in X, the probability of the optimal state path P(X, Y | π*) can be calculated by finding the transition probabilities from states in the forward strip to states in the backward strip, such that the product of this transition probability and the two entries in the strips is maximum. Using this criterion, we can identify the Viterbi matrix element (i, j, s) through which the optimal state path π* goes.

In summary, when the two Hirschberg strips meet, the optimum probability and a coordinate on the optimal state path can be determined. The process then only continues on the left sub-matrix and the right sub-matrix, which together correspond to only half of the search space of the previous iteration (see figure 2.4). From the above example, the position i of sequence X defines the right boundary of the left sub-matrix, and the coordinate (i, j, s) defines its end position; this coordinate also defines the start position of the right sub-matrix.

Figure 2.4: Illustration of the Hirschberg algorithm (2)
The coordinate (i, j, s) on the optimal state path was determined in the first iteration of the Hirschberg algorithm (see figure 2.3). The matrix is then split into left and right sub-matrices and the next round of iterations is started for each of the two sub-matrices. This recursive procedure continues until the complete state path is determined.

Figure 2.4 shows that two smaller Hirschberg strips search in each of the two sub-matrices. The recursive process continues until the optimal state path is completely determined. The figure also shows that in each iteration, the total search space for the Hirschberg strips is reduced by a factor of two. So the number of operations performed by the Hirschberg algorithm is bounded by t + t/2 + t/4 + ... < 2t, where t denotes the number of operations required for searching the entire matrix, i.e. the number of operations in the Viterbi algorithm.

2.3 Parameter training

In order to calculate the most probable state path for a given input sequence or sequences, a set of parameters must be provided. The parameters are the set of transition probabilities and emission probabilities. HMMs of the same topology but different parameter values can be used for different applications. For example, the average length of an exonic stretch in human genes is different from that in mouse genes, so the transition probabilities from the exon state to the intergenic state in the HMM should not be the same. The values of these parameters determine the prediction result and the performance of the HMM. In general, the set of parameters of the HMM should be adjusted according to the application, so parameter estimation is crucial. However, it is often difficult to obtain good parameter estimates, because the parameter training process is expensive to run and training data is usually inadequate.

Parameter training is essential in order to set up HMMs for novel biological data sets and to optimize their prediction performance. First of all, a set of training sequences is required for the training process. If the state paths for all the training sequences are known, then the parameter estimation becomes a simple counting process where the performance is optimized via a maximum likelihood approach. Suppose there are M training sequences. Let:

• X = {X_1, ..., X_M} denote the set of M training sequences.
• θ denote the set of parameters of the HMM. This set refers to the set of transition and emission probabilities, i.e. θ = {T, E}.

• r_{s,s'} denote the pseudo-count for the number of times the transition s → s' is used.

• r_s(γ) denote the pseudo-count for the number of times state s reads symbol γ.

We can then derive the parameters that maximize the likelihood P(X | θ) by setting:

t_{s,s'} = T_{s,s'} / Σ_{s''} T_{s,s''}     (2.3)

where T_{s,s'} is the number of transitions from s to s' in all the state paths, including any pseudo-counts r_{s,s'}. Similarly,

e_s(γ) = E_s(γ) / Σ_{γ' ∈ A} E_s(γ')     (2.4)

where E_s(γ) is the number of times state s reads letter γ in all the state paths, including any pseudo-counts r_s(γ).

Often, the state paths that correspond to a known annotation are not known or too numerous. In that case, parameter estimation becomes more difficult. In this situation, we can employ two frequently used parameter training algorithms: the Baum-Welch algorithm [2] and Viterbi training [11].

2.3.1 The Baum-Welch algorithm

The Baum-Welch algorithm is a special case of an expectation maximization (EM) algorithm, which is a general method for probabilistic parameter estimation. The Baum-Welch algorithm iteratively calculates new estimates for all parameters, and the overall log likelihood of the HMM can be shown to converge to a local maximum. We introduce a few more notations, and then describe the Baum-Welch algorithm:

• x^k_i denotes the letter at the i-th position of sequence X_k in the set of training sequences X.

• f^q_s(i, X_k) denotes the forward probability for state s and sequence position i in iteration q of the Baum-Welch algorithm. It is equal to the sum of probabilities of all state paths that have read the input sequence X_k up to and including sequence position i and are in state s, i.e. f^q_s(i, X_k) := P(x^k_1, ..., x^k_i, π_i = s).

• b^q_s(i, X_k) denotes the backward probability for state s and sequence position i in iteration q of the Baum-Welch algorithm. It is equal to the sum of probabilities of all state paths that have read the input sequence from the end up to and including sequence position i + 1, given that the state at sequence position i is s, i.e. b^q_s(i, X_k) := P(x^k_{i+1}, ..., x^k_L | π_i = s).

• t^q_{s,s'} is the transition probability of going from state s to state s' in iteration q of the Baum-Welch algorithm.

• e^q_s(γ) is the emission probability of reading symbol γ in state s in iteration q of the Baum-Welch algorithm.

• θ^q is the estimated set of parameters, i.e. the set of transition and emission probabilities of the model in iteration q of the Baum-Welch algorithm.

So, in each iteration, we derive a new value for every transition probability from the expected number of times that this transition is used in any state path:

t^{q+1}_{s,s'} = Σ_{k=1}^{M} T^q_{s,s'}(X_k) / Σ_{k=1}^{M} Σ_{ŝ ∈ S} T^q_{s,ŝ}(X_k)

where T^q_{s,s'}(X_k) := Σ_i f^q_s(i, X_k) · t^q_{s,s'} · e^q_{s'}(x^k_{i+1}) · b^q_{s'}(i + 1, X_k) / P(X_k)     (2.5)

Similarly for the emission probabilities:

e^{q+1}_s(γ) = Σ_{k=1}^{M} E^q_s(γ, X_k) / Σ_{k=1}^{M} Σ_{γ' ∈ A} E^q_s(γ', X_k)

where E^q_s(γ, X_k) := Σ_i δ_{x^k_i, γ} · f^q_s(i, X_k) · b^q_s(i, X_k) / P(X_k)     (2.6)

and δ is the delta function such that δ_{i,j} = 1 if i = j and δ_{i,j} = 0 if i ≠ j. P(X_k) is the probability of the sequence X_k in the HMM of iteration q. From the definition of the forward probability, we can see that P(X_k) = f^q_{end}(L_{X_k}, X_k). This value can be calculated by the forward algorithm [11].
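As a concrete illustration of how P(X_k) is obtained, the following sketch computes the forward probabilities for a single-tape HMM, using the same hypothetical state and parameter layout as the Viterbi sketch in section 2.2.1; it is not the HMMCONVERTER implementation, and for clarity it works with raw probabilities rather than the scaled or log-space arithmetic a real implementation would need.

```cpp
// Minimal sketch (not HMMCONVERTER code): the forward algorithm for a single-tape
// HMM with states 0..N-1 (0 = start, N-1 = end). Returns P(X) = f_end(L, X).
#include <vector>

double forward_probability(const std::vector<int>& x,                    // encoded input sequence
                           const std::vector<std::vector<double>>& t,    // t[s][s2] = P(s -> s2)
                           const std::vector<std::vector<double>>& e) {  // e[s][sym] = P(s reads sym)
    const int N = static_cast<int>(t.size());
    const int L = static_cast<int>(x.size());

    // f[i][s] = sum of probabilities of all paths that have read x[0..i-1] and are in state s.
    std::vector<std::vector<double>> f(L + 1, std::vector<double>(N, 0.0));
    f[0][0] = 1.0;  // start state before any letter is read

    for (int i = 1; i <= L; ++i)
        for (int s2 = 1; s2 < N - 1; ++s2) {          // non-silent states read one letter each
            double sum = 0.0;
            for (int s = 0; s < N - 1; ++s)
                sum += f[i - 1][s] * t[s][s2];
            f[i][s2] = e[s2][x[i - 1]] * sum;
        }

    // Termination: silent end state after the last letter, f_end(L, X) = P(X).
    double p = 0.0;
    for (int s = 1; s < N - 1; ++s)
        p += f[L][s] * t[s][N - 1];
    return p;
}
```

The backward probabilities b^q_s(i, X_k) are computed analogously, sweeping from the end of the sequence towards its start.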
In equation 2.5, the value f^q_s(i, X_k) · t^q_{s,s'} · e^q_{s'}(x^k_{i+1}) · b^q_{s'}(i + 1, X_k) / P(X_k) is the probability that a transition from state s to state s' is used at sequence position i, given the set of parameters θ^q. So, T^q_{s,s'}(X_k) is the expected number of times that a transition from state s to state s' is used in sequence X_k given the set of parameters θ^q. It can be calculated by summing over all sequence positions of sequence X_k. Similarly, the value f^q_s(i, X_k) · b^q_s(i, X_k) / P(X_k) in equation 2.6 is the probability that state s reads symbol γ at position i in sequence X_k given the set of parameters θ^q. So, E^q_s(γ, X_k) is the expected number of times that state s reads symbol γ in the sequence X_k given the set of parameters θ^q. It can be calculated by summing over all sequence positions of the sequence X_k.

In Baum-Welch training, the iterations in 2.5 and 2.6 are continued until a stopping criterion is reached. There are typically two stopping criteria for Baum-Welch training: (1) the difference between the estimated parameters in the q-th iteration and the estimated parameters in the (q - 1)-th iteration is smaller than a pre-defined threshold; this difference can be estimated by the log-odds ratio of the likelihoods of the two successive iterations. (2) The number of iterations exceeds a maximum allowed number of iterations. Suppose that c is the number of parameters to train; each iteration of the Baum-Welch training algorithm requires O(N · L^n) memory and O(c · M · N · Tmax · L^n) operations for an n-HMM with N states and M training sequences of identical length L.

2.3.2 The Viterbi training algorithm

The Viterbi training algorithm is another popular training algorithm which uses the Viterbi state paths to derive parameter estimates in an iterative way. Given a set of initial parameters, the Viterbi algorithm is first used to obtain the optimal state paths for all the training sequences. Then the counting procedure (equation 2.3 and equation 2.4) described at the beginning of this section is performed to calculate new parameter values. This procedure is iterated until the Viterbi paths of all training sequences remain unchanged. Viterbi training is popular because (1) the Viterbi algorithm is usually implemented for sequence decoding in many HMM applications anyway, so only the simple counting procedures in equations 2.3 and 2.4 need to be implemented in addition, and (2) as the Viterbi algorithm is usually used for sequence decoding, it can be argued that Viterbi training results in a better set of parameters for decoding with the Viterbi algorithm.

Viterbi training consists of the following steps:

(i) The optimal state paths for all the training sequences in the training set are obtained by using the Viterbi algorithm and the HMM with its parameters θ^q of iteration q.

(ii) Suppose T_{s,s'} and E_s(γ) have the same meaning as after equation 2.3 and equation 2.4, with the Viterbi paths assigned to all the training sequences. The new parameter values can then be calculated using equations 2.3 and 2.4, i.e.

t^{q+1}_{s,s'} = T_{s,s'} / Σ_{s''} T_{s,s''}
e^{q+1}_s(γ) = E_s(γ) / Σ_{γ' ∈ A} E_s(γ')

The above two steps are repeated until the Viterbi state paths for all the training sequences remain unchanged. This can always be achieved as the state paths correspond to discrete values. Viterbi training is not an EM algorithm and it cannot be shown to always maximize the likelihood of the model. However, it maximizes the likelihood for both the parameters and the optimal state paths for all the training sequences, i.e. P(X_1, ..., X_M | θ, π*(X_1), ..., π*(X_M)).
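The counting step (ii) is essentially a tallying pass over the Viterbi paths. A minimal sketch is given below, again with a hypothetical layout (path[i] is the state that read letter x[i], as returned by the Viterbi sketch above) rather than HMMCONVERTER's actual data structures; pseudo-counts as in equations 2.3 and 2.4 could be incorporated by initializing the count arrays with them.

```cpp
// Minimal sketch (not HMMCONVERTER code): one counting pass of Viterbi training.
// Given the Viterbi path of each training sequence, re-estimate transition and
// emission probabilities by normalized counts (equations 2.3 and 2.4).
// The output containers t (N x N) and e (N x A) must be pre-sized by the caller.
#include <vector>

void reestimate_from_paths(const std::vector<std::vector<int>>& seqs,    // encoded training sequences
                           const std::vector<std::vector<int>>& paths,   // paths[k][i] = state reading seqs[k][i]
                           int N, int A,                                  // number of states, alphabet size
                           std::vector<std::vector<double>>& t,           // out: t[s][s2]
                           std::vector<std::vector<double>>& e) {         // out: e[s][sym]
    // Count matrices; pseudo-counts r could be used as the initial values here.
    std::vector<std::vector<double>> T(N, std::vector<double>(N, 0.0));
    std::vector<std::vector<double>> E(N, std::vector<double>(A, 0.0));

    for (std::size_t k = 0; k < seqs.size(); ++k) {
        int prev = 0;  // start state
        for (std::size_t i = 0; i < seqs[k].size(); ++i) {
            int s = paths[k][i];
            T[prev][s] += 1.0;            // transition prev -> s used once
            E[s][seqs[k][i]] += 1.0;      // state s read this letter once
            prev = s;
        }
        T[prev][N - 1] += 1.0;            // final transition into the end state
    }

    // Normalize counts into probabilities; rows without any counts are left unchanged.
    for (int s = 0; s < N; ++s) {
        double ts = 0.0, es = 0.0;
        for (int s2 = 0; s2 < N; ++s2) ts += T[s][s2];
        for (int a = 0; a < A; ++a)     es += E[s][a];
        for (int s2 = 0; s2 < N; ++s2) if (ts > 0.0) t[s][s2] = T[s][s2] / ts;
        for (int a = 0; a < A; ++a)     if (es > 0.0) e[s][a] = E[s][a] / es;
    }
}
```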
For each iteration, the Viterbi training algorithm requires O(N · L^n) memory and O(c · M · N · Tmax · L^n) operations for an n-HMM with N states and M training sequences of identical length L, which are the same requirements as for the Baum-Welch algorithm.

2.4 Discussion and conclusion

We have discussed the roles of sequence decoding and parameter estimation for HMM-based applications, and have introduced several commonly used algorithms. For sequence decoding, the Viterbi algorithm calculates the optimal state path for each input sequence. One disadvantage is that it requires O(N · L^n) memory for an n-HMM, and this imposes a serious constraint if the input sequences are long. The Hirschberg algorithm reduces the memory usage by a factor of L compared to the Viterbi algorithm, while increasing the run time by at most a factor of two.

For parameter estimation, both the Baum-Welch algorithm and Viterbi training have the same memory and time requirements for each iteration. The implementation of the Baum-Welch algorithm is more complicated, as both the forward and backward values have to be calculated. However, it is guaranteed to converge to a local maximum of the likelihood and produces better parameter estimates than Viterbi training, as it considers all state paths and weighs them according to their probabilities in order to estimate new parameters, whereas Viterbi training only considers a single state path for every training sequence (the Viterbi path).

Chapter 3

HMMCONVERTER

3.1 Motivation

We have already introduced the challenges of designing and implementing HMMs in chapter 1. It is both time-consuming and error-prone to implement these models and efficient algorithms. As HMMs are used for many Bioinformatics applications, it would be extremely useful to be able to use the same source code and efficient algorithms for quickly setting up different kinds of HMMs and data analyses. One early effort is the DYNAMITE software package [5], a dynamic-programming code generator which can be used to implement Viterbi-like sequence analysis algorithms. The tool is a compiler which produces highly efficient C code. As this is a tool for dynamic programming, it is not especially convenient for constructing HMMs for different applications: users still have to implement the HMM, and it is also not flexible enough to incorporate special features into the provided dynamic programming framework. There are several powerful profile HMM tools such as HMMER [12] and SAM [20] which provide sequence decoding and parameter training algorithms. But these tools have limitations on the models that can be implemented, as all the algorithms provided are for profile HMMs only; this type of HMM is mainly used for analyzing fixed multiple sequence alignments. MAMOT [43] and HMMoC [31] are two recently published HMM generating tools. MAMOT takes a simple text file as input for defining the HMM and provides the Viterbi and Baum-Welch training algorithms. But MAMOT can only implement HMMs of order 1.
However, it is not easy to define an HMM within HMMoC, as it requires an XML description and user-supplied programming code to define an HMM. For example, C code fragments are required for setting up the transition and emission probabilities, the position-dependent probabilities and the banding features. This requires users to have programming knowledge.

We here present an HMM generator software tool called HMMCONVERTER, which takes an XML file and simple text files as input and translates them into C++ code. The converter supports generalized pair-HMMs with many special features such as banding, which allows users to specify a sub-search space for sequence decoding. It is much more flexible and easier to use than HMMoC; the feature of special emissions allows prior information along each input sequence to be integrated into the prediction process, and the parameterization of transition and emission probabilities makes it possible to efficiently train a large number of transition and emission probabilities from a small set of free parameters. Four unique features of our software package are the memory-efficient Hirschberg algorithm [16] for decoding, a linear-memory version of the Baum-Welch algorithm [35], a novel linear-memory version of Viterbi training, and a novel, computationally efficient algorithm for training with sampled state paths. All of these decoding and training algorithms can be combined with banding and special emissions, making HMMCONVERTER an extremely powerful and useful software package that can be used to swiftly set up sophisticated HMMs for real-life data sets.

In this chapter, we focus on introducing the features of HMMCONVERTER. Section 3.2 introduces the main features of HMMCONVERTER, including the special features mentioned above and the different algorithms. Section 3.3 introduces the two novel training algorithms and other special features for parameter training with HMMCONVERTER. The specifications and formats of the input and output files are described in chapter 4, and some implementation details of HMMCONVERTER are discussed in chapter 5.

3.2 Main features

3.2.1 Input and output files

In order to build an HMM with HMMCONVERTER, users have to supply several input files for describing the HMM and setting up special features. The format of these input files is designed to be simple, easy to construct, and easy to read, even for users who may not have any programming experience. Depending on the type of analysis requested, HMMCONVERTER generates output files for storing the results of sequence decoding and parameter training. The format of the output files is also designed to be easily readable. In particular, the output files that store the sequence decoding results are designed to be straightforward and readable, and can easily be converted into other popular formats such as GTF (Gene Transfer Format, a file format used to store information about gene structures). In this section, we briefly describe the input and output files of HMMCONVERTER; the details of their formats are described in chapter 4.

Input files — Depending on the analysis to be performed, HMMCONVERTER takes up to 5 input files:

• XML input file (compulsory): This file defines the different components and the topological structure of the HMM; it specifies the algorithms to be used for sequence decoding or parameter training as well as any free parameters required by these algorithms.
The parameters for setting up the set of free transition parameters and the set of free emission probabilities are also defined in this file, while the values of the free transition parameters and the free emission probabilities are specified in separate files. These parameters include the id and the size of the set of free transition parameters and of the set of free emission probabilities, and the name of the file that stores the values of these free transition parameters and free emission probabilities. This is a pure XML-based file, and users are not required to have knowledge of any computer programming language. The names of all the other compulsory or optional input files are also specified in this XML input file, see figure 3.1.

Figure 3.1: The XML-input file for defining an HMM in HMMCONVERTER. The XML-input file references all other flat text files which store parameters for the HMM or specify special features in HMMCONVERTER (the free emission parameter file, the free transition parameter file, the prior information file and the tube file). The dashed lines correspond to optional input files which are required only if the corresponding feature is used.

• free emission parameter file (compulsory): This is a flat text file which specifies the set of free emission probabilities of the HMM. We call a group of emission probabilities stored in this file a free emission parameter if it defines the emission probabilities of a state in the HMM. We use a separate text file to specify the free emission parameters in order to keep the XML input file short and readable. For example, for a higher-order pair-HMM with a state that reads 6 characters from the two input sequences, we would need to specify $4^6$ emission probabilities for a DNA alphabet for this state alone. The file with the free emission parameters can easily be generated using a small script.

• free transition parameter file (optional): This file stores the values of the free transition parameters. It is required only if the transition probabilities are parameterized.

• tube file (optional): This file specifies the restricted search space along the sequence dimensions. It is required if banding is to be used.

• prior-information file (optional): This file stores all prior information on the input sequences. It is required if special emissions are to be used.

The last three files are optional input files which are only required if the corresponding feature of HMMCONVERTER is to be used in the algorithm for decoding or parameter training. The concepts and theoretical details of these features are explained in the next section.

Output files — HMMCONVERTER generates up to 5 output files for storing the results of sequence decoding and parameter training:

• Two output files for sequence decoding:

— The first file contains the optimal state path for all the input sequences. In this file, each line specifies one state of the state path with the corresponding positions in the input sequence(s), as well as information on the state.

— If annotation label sets are defined for the HMM, a second file is generated. This file shows the resulting annotation of the input sequence(s) which corresponds to the optimal state path.

• Three output files for parameter training:

— XML output file: This file has the same format as the XML input file, but contains the trained values of the transition probabilities.
Given this file, it is easy for users to compare the HMM before and after training.

— output file with trained free emission parameters: This file stores the trained free emission parameters of the HMM. It has the same format as the input file with the free emission parameters.

— output file with trained free transition parameters: This file stores the trained free transition parameters of the HMM. It has the same format as the input file with the free transition parameters.

3.2.2 Special features

Three special features of HMMCONVERTER are introduced in this section: parameterization of transition and emission probabilities, banding, and special emissions. These features are particularly useful for constructing complex pair-HMMs. Special emissions can be used to integrate prior information on each input sequence probabilistically into all prediction and parameter training algorithms, which is a very desirable feature for many recent Bioinformatics applications.

Parameterization of transition and emission probabilities

Complex HMMs with a large number of states may comprise hundreds of transition and emission probabilities. For many applications, the transition or emission probabilities of different states are correlated. In that case, specifying all the probabilities as independent parameters is not desirable. Furthermore, the large number of parameters would entail serious constraints for parameter training, as enormous amounts of training data would be needed to avoid over-fitting. As a particular case, two states may share the same set of emission probabilities; if the training process views them as independent parameters, this not only doubles the training time, but may also result in different parameter values.

In order to solve the above problem, HMMCONVERTER allows the parameterization of transition and emission probabilities. Besides specifying the free parameters, users also have to define derivation rules which express the probabilities as functions of the free parameters (required for decoding and training) as well as derivation rules which express the free parameters as functions of the probabilities (required for training only). The rules for deriving transition probabilities and emission probabilities differ, and we describe them separately in the following.

Parameterization of transition probabilities — We first illustrate the power of parameterization by looking at the complex generalized pair-HMM of DOUBLESCAN [33, 34] for comparative gene prediction. It comprises 153 transitions, but can be efficiently parameterized using only 19 free transition parameters. In HMMCONVERTER, a transition probability can be defined by an arithmetic formula. The formula may use any of the basic arithmetic operators '+', '−', '*', '/' and brackets, with the ids of the free transition parameters and numerical values as operands. The free transition parameters need not correspond to probabilities, i.e. their values are not restricted to the range [0,1]. However, users have to ensure that the parameterization of all transition probabilities implies that $\sum_j t_{i,j} = 1$ for all states $i$ in the HMM. For example, for a state $i$ with three transitions $t_{i,i_1}$, $t_{i,i_2}$ and $t_{i,i_3}$ and the derivation formulae (functions of the free parameters) $t_{i,i_1} = f_1$, $t_{i,i_2} = f_2$ and $t_{i,i_3} = f_3$, respectively, the sum of the three transition probabilities as a function of the free parameters always has to be 1 (i.e. $f_1 + f_2 + f_3 = 1$).
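The following small C++ sketch illustrates this normalization requirement: a hypothetical derivation rule computes a state's transition probabilities from three free parameters, and a check verifies that they sum to one. The function names and the derivation rule are invented for illustration and do not reflect HMMConverter's XML syntax or internal code.

    #include <cmath>
    #include <stdexcept>
    #include <vector>
    using std::vector;

    // Hypothetical derivation rule: state i has three outgoing transitions whose
    // probabilities are simple copies of the free parameters f1, f2, f3.
    vector<double> derive_transitions(double f1, double f2, double f3) {
        vector<double> t;
        t.push_back(f1);   // t_{i,i1} = f1
        t.push_back(f2);   // t_{i,i2} = f2
        t.push_back(f3);   // t_{i,i3} = f3
        return t;
    }

    // Consistency check corresponding to sum_j t_{i,j} = 1 for this state.
    void check_normalized(const vector<double>& t, double tolerance = 1e-9) {
        double sum = 0.0;
        for (size_t j = 0; j < t.size(); ++j) sum += t[j];
        if (std::fabs(sum - 1.0) > tolerance)
            throw std::runtime_error("transition probabilities of this state do not sum to 1");
    }

With free parameter values such as f1 = 0.2, f2 = 0.5 and f3 = 0.3 the check passes; with f3 = 0.4 it fails, which is the situation in which an error should be reported and the program terminated.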
HMMCONVERTER applies consistency checks to ensure that $\sum_j t_{i,j} = 1$ and $\sum_\gamma e_i(\gamma) = 1$ for all states $i$ in the HMM, and reports error messages and terminates the program otherwise.

Derivation of emission probabilities — For the emission probabilities, we only allow free parameters that correspond to probabilities, i.e. values in [0,1]. In HMMCONVERTER, a group of emission probabilities that defines the set of emission probabilities for a state is regarded as one emission parameter. All the free emission parameters are specified in the free emission parameter file. HMMCONVERTER provides two functions, SumOver and Product, for deriving emission probabilities. Consistency checks on the correct dimensions of the parameters and on the normalization are enforced by HMMCONVERTER. There are four ways of defining the emission probabilities of a state:

(i) Directly defining the emission probability — An emission parameter can be used directly by several states. This means that the emission probabilities of a state can be defined by pointing directly to an emission parameter in the free emission parameter file.

(ii) SumOver — This function can be used to derive a lower-dimensional emission probability from a higher-dimensional emission probability. The dimension of an emission probability is the sequence dimension, i.e. the total number of letters read by the corresponding state at a time. Consider as example three states match_xy, emit_x and emit_y of dimension 6, 3 and 3, respectively. The emission probabilities of emit_x and emit_y can be derived from those of match_xy by summing over several sequence dimensions:

$P_{emit\_x}(x_1, x_2, x_3) = \sum_{(y_1,y_2,y_3)} P_{match\_xy}(x_1, x_2, x_3, y_1, y_2, y_3)$

$P_{emit\_y}(y_1, y_2, y_3) = \sum_{(x_1,x_2,x_3)} P_{match\_xy}(x_1, x_2, x_3, y_1, y_2, y_3)$

(iii) Product — This function allows the user to express a high-dimensional emission probability as a product of several lower-dimensional emission probabilities. Consider as example two states, triple and single, of dimension 3 and 1, respectively. The emission probabilities of triple can be expressed as a function of those of single in the following way:

$P_{triple}(x_1, x_2, x_3) = P_{single}(x_1) \cdot P_{single}(x_2) \cdot P_{single}(x_3)$

(iv) SumOver & Product — The two functions can be combined to form more complex rules, for example for three states of dimension 3, 1 and 4, respectively:

$P_{3\_tuple}(x_1, x_2, x_3) = \left( \sum_{(y_1,y_2)} P_{4\_tuple}(x_1, x_2, y_1, y_2) \right) \cdot P_{single}(x_3)$

The four rules above are easy to apply and flexible enough for many applications. It is easy to prove mathematically that these rules derive a normalized set of emission probabilities which fulfill $\sum_\gamma e_i(\gamma) = 1$ for all states $i$ in the HMM.

Banding

As discussed previously, it takes $O(N L_1 L_2)$ time and $O(N L_1 L_2)$ memory to run the Viterbi algorithm on a pair-HMM with $N$ states and two input sequences of length $L_1$ and $L_2$. These requirements are often too high for many Bioinformatics applications where the input sequences are very long.

Banding is a feature of HMMCONVERTER for addressing this problem. This feature is particularly important for pair-HMMs and allows sequence decoding and parameter training algorithms to be run on a restricted search space along the sequence dimensions. We call the restricted search space a "tube". If the tube is well chosen, the memory and time requirements of the algorithms can be greatly improved while keeping the performance essentially unchanged.
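To illustrate what a tube amounts to computationally, the following C++ sketch represents a tube as one allowed interval of positions in the second sequence per position in the first sequence, and runs a dynamic-programming sweep only over the cells inside it. This is a simplified illustration of the idea of banding, not HMMConverter's actual data structure.

    #include <vector>
    using std::vector;

    // A tube stores, for every position x in the first sequence, the interval
    // [y_min[x], y_max[x]] of positions in the second sequence that may be visited.
    struct Tube {
        vector<int> y_min;
        vector<int> y_max;
    };

    // Visit only the cells of the dynamic-programming matrix that lie inside the
    // tube; 'process_cell' stands for the Viterbi or forward update of one cell.
    template <typename CellUpdate>
    void banded_sweep(const Tube& tube, CellUpdate process_cell) {
        for (size_t x = 0; x < tube.y_min.size(); ++x)
            for (int y = tube.y_min[x]; y <= tube.y_max[x]; ++y)
                process_cell(static_cast<int>(x), y);
    }

If the tube has an average width $w$, the number of cells processed drops from roughly $L_1 \cdot L_2$ to roughly $L_1 \cdot w$, which is where the large savings in time and memory come from.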
In HMMCONVERTER, there are two ways to construct a tube:

(i) derive a tube from local matches generated by TBLASTX or BLASTN [1];

(ii) specify the restricted search space explicitly in the tube file.

(i) In order to derive a tube from BLAST matches, the user has to specify a parameter called radius which defines the width of the tube around the selected matches. As many local BLAST matches can be generated for the same pair of input sequences, HMMCONVERTER uses a dynamic programming routine to derive a highest-scoring subset of mutually compatible matches from the original set of BLAST matches. Only this subset of matches is used to construct the tube. Figure 3.2 illustrates a tube derived from two matches.

(ii) We now consider specifying the tube explicitly in the tube file. HMMCONVERTER defines a tube via a set of user-defined 4-tuples $\{(x_1, y_1, x_2, y_2) \mid 1 \le x_1 \le x_2 \le L_x,\ 1 \le y_1 \le y_2 \le L_y\}$. Each 4-tuple defines two points $(x_1, y_1)$ and $(x_2, y_2)$ in the sequence plane. These two points can either define the end points of a match, or a rectangular block in the sequence plane of the Viterbi matrix. Users can therefore explicitly define a tube either by specifying matches and a radius or by specifying blocks of restricted area. Figure 3.3 shows the two different ways of explicitly defining a tube using the same set of 4-tuples. It also shows how HMMCONVERTER derives the final tube, which connects the specified blocks via maximal areas such that the tube reaches from the start of the sequences to the end of the sequences.

Figure 3.2: Using BLAST to generate a tube for HMMCONVERTER. The two thick lines in the middle correspond to the subset of matches that were selected by the dynamic programming routine as the highest-scoring subset of mutually compatible BLAST matches (the discarded BLAST matches are not shown here). The radius is specified as 30 (by the user). The Viterbi algorithm will then only search inside the tube derived from these two matches.

Figure 3.3: Defining a tube for HMMCONVERTER explicitly. The right part of the figure shows the matches defined by the tube file, with the radius set to 50. The tube around the matches is derived in the same way as for matches generated by BLAST. The left part shows the same 4-tuples interpreted as blocks, and how the final tube is derived from these blocks.

Annotation label sets and special emissions

The performance of many recent Bioinformatics applications can be vastly improved by integrating prior information on the input sequences into the prediction process. HMMCONVERTER supports this with the feature of special emissions, where the basic idea is to bias the nominal values of the emission probabilities with position-specific prior probabilities along the input sequences. The concept of special emissions was first employed by TWINSCAN [26]. It is also used by PROJECTOR [36]. The special emissions in PROJECTOR can be regarded as restricting the state paths along the state dimension according to discrete, sequence position-specific prior information, i.e. the state path is only allowed to pass through states that reproduce the known annotation of input sequence X. HMMCONVERTER extends this concept by allowing prior probabilities with any value in [0,1]. Furthermore, arbitrary annotation label sets can be defined by the user to describe different types of prior information.
We first describe the concept of multiple annotation label sets, and then explain special emissions.

Multiple annotation label sets — In order to integrate prior probabilities along the input sequences into the prediction process, a mapping has to be defined between the labels of the states in the HMM and the prior information. We call the prior information along the sequences the "annotation of the input sequences" and the labels of the states the "annotation labels of the states". The mapping has to specify how an annotation label of the sequence is mapped to the annotation labels of the states. An annotation label set defines a number of annotation labels; for example, an HMM for gene prediction may have an annotation label set Feature = {Exon, Intron, Intergenic}, where the annotation labels represent the different parts of gene structures. Users can define arbitrary annotation label sets for an HMM; every non-silent state in the HMM assigns to each letter that it reads from each input sequence exactly one annotation label from each annotation label set.

Let us consider as an example a pair-HMM for comparative gene prediction with two arbitrary annotation label sets, Set_1 = {Exon, Intron, Intergenic, Start_Codon, Stop_Codon} and Set_2 = {Match, Emit_X, Emit_Y}, and consider the Match_Exon state (see figure 3.4). This state reads three letters from each of the two input sequences at a time, and assigns the labels Exon ∈ Set_1 and Match ∈ Set_2 to each letter that it reads. Two types of prior information, Set_1 and Set_2, can thus be specified for every position in each input sequence. Note that the annotation labels for different letters of one state can differ. For example, we could define a third annotation label set Set_3 = {NoPhase, Phase0, Phase1, Phase2} which represents the reading frame of a nucleotide in the input sequence. The Match_Exon state would then assign the annotation labels (Phase0, Phase1, Phase2) to the sequence positions $(x_1, x_2, x_3)$, respectively, and the same labels to the sequence positions $(y_1, y_2, y_3)$, respectively.

Figure 3.4: A state which carries labels from two different annotation label sets. A state in a generalized pair-HMM which carries labels from two different annotation label sets: Exon ∈ Set_1 and Match ∈ Set_2. This state reads six letters, three from input sequence X and three from input sequence Y, and assigns to all six letters the two labels Exon and Match.

In order to use special emissions, users have to specify the prior information along the input sequences using the labels of the different annotation label sets. The labels of the states in the model are then matched to the labels specified along the input sequences. The following paragraphs explain in detail how special emissions are used.

Special emissions — HMMCONVERTER supports a very flexible way of integrating prior information. Users can define any number of annotation label sets for an HMM. The labels of different annotation label sets have to correspond to different types of information, see the previous example. Consider the annotation label set Set_1: every Start_Codon is also an Exon.
So, if prior probabilities are specified for the label Start_Codon for an interval of sequence positions, the same interval may also have a prior probability for the label Exon (in this particular case, the prior probability for Exon must be larger than or equal to the one for Start_Codon). Figure 3.5 illustrates the assignment of different prior probabilities to an input sequence. It also shows that the user has to take care of the normalization of the prior probabilities. This means that the sum of the prior probabilities for mutually exclusive annotation labels of one annotation label set assigned to the same interval of sequence positions has to be one. The requirement of normalized prior probabilities becomes clearer once we explain the mechanism for biasing the emission probabilities with prior probabilities. We illustrate the mechanism by considering the state Match_Exon of figure 3.4 and the prior probabilities for an input sequence X as shown in figure 3.5. We consider four cases during the prediction or parameter training process for sequence X:

(i) reading sequence position 89

(ii) reading sequence position 90

(iii) reading sequence position 220

(iv) reading sequence position 186, with an additional prior probability of 70% for the label Match in annotation label set Set_2

In case (i), there is no prior information assigned to sequence position 89 of sequence X. In this case, no bias of the nominal emission probabilities is introduced for the corresponding sequence position, i.e. for this sequence position, the Viterbi state path is calculated using the nominal emission probabilities of the HMM.

Figure 3.5: Illustration of the special emission feature in HMMCONVERTER. Example of prior information for input sequence X of length L for one annotation label set S = {Exon, Intron, Intergenic}. For sequence interval [1,89], no prior information on the annotation of the sequence is available. This stretch of the sequence would thus be analyzed with an HMM whose nominal emission probabilities are not biased by any prior probabilities. For sequence interval [90,184], there is prior information concerning the likelihood of being Exon, Intron and Intergenic. For the rest of the input sequence, i.e. sequence interval [185,370], we know with certainty that the sequence positions in [185,214] are exonic and that the remainder of the sequence is intergenic. Note that the prior probabilities add up to 1 for every sequence position for which any prior information is supplied, reflecting the fact that in this case the three labels Exon, Intron and Intergenic are mutually exclusive and that each sequence position has to fall into exactly one of these three categories.

In case (ii), sequence position 90 of input sequence X has a 60% probability of being an Exon. The decoding or training algorithm then integrates this prior probability into the nominal emission probabilities. When the state path passes through this sequence position and a state which carries the annotation label Exon (in particular, the Match_Exon state), the nominal value of the emission probabilities is multiplied by an additional factor of 0.6. If the state path passes through a state with annotation label Intron, a factor of 0.3 is multiplied with the nominal emission probability.
The nominal emission probabilities of states which carry the label Intergenic are biased similarly, by multiplying them with a factor of 0.1 when the state path passes through this sequence position. Note that the three annotation labels Exon, Intron and Intergenic of the same annotation label set are mutually exclusive, so the sum of the prior probabilities assigned to these labels at any sequence position should be 1. This additional responsibility for the user allows a highly flexible way of integrating prior probabilities: within the same annotation label set, users can have annotation labels that are related to each other (e.g. Exon and Start_Codon) as well as mutually exclusive labels (e.g. Exon and Intron).

In case (iii), sequence position 220 has a 100% probability of being an Intergenic region. As this is the only prior probability specified for sequence position 220, the prior probabilities of the other annotation labels (i.e. Exon and Intron) of this annotation label set (i.e. Set_1) are set to 0 for this sequence position by default. A factor of 0 is then multiplied with the nominal value of the emission probabilities of the Match_Exon state when the state path passes through this sequence position. In other words, the Match_Exon state is prohibited from passing through sequence position 220. This case differs from case (i), where no prior probability is assigned to any annotation label of this annotation label set. In HMMCONVERTER, as soon as a prior probability is specified for one annotation label at a sequence position (in this case, Intergenic in Set_1), all other annotation labels of the same annotation label set without a specified prior probability for the same sequence position are assigned a prior probability of 0 (in this case Exon and Intron) by default. It is therefore very important for users to supply normalized prior probabilities for any sequence position.

Finally, in case (iv), sequence position 186 has a 100% probability of being an Exon, so a factor of 1 is multiplied with the nominal value of the emission probabilities of the Match_Exon state when the state path passes through this sequence position (i.e. the nominal value of the emission probability of this state is not biased for this sequence position). But we also have another prior probability of 70% for the label Match in annotation label set Set_2 for this sequence position, so another factor of 0.7 is multiplied with the nominal emission probabilities of the Match_Exon state for sequence position 186. More precisely, let the symbols read by the Match_Exon state at sequence position 186 be $\gamma$, and the corresponding emission probability of the state be $e(\gamma)$. When the state path passes through sequence position 186, the emission probability for passing through the Match_Exon state becomes $e'(\gamma) = e(\gamma) \cdot 1 \cdot 0.7$. More factors are multiplied with the nominal emission probability of a state at a sequence position if more types of prior probabilities are assigned to that sequence position. Note that users are required to provide normalized prior probabilities for all the different annotation label sets defined in the HMM at any sequence position.
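The biasing mechanism just described amounts to multiplying the nominal emission probability by one prior factor per annotation label set. The following C++ sketch, with hypothetical data structures, illustrates this calculation; it is not HMMConverter's actual code.

    #include <map>
    #include <string>
    #include <vector>
    using std::map; using std::string; using std::vector;

    // prior[k] maps an annotation label of label set k to its prior probability at
    // one particular sequence position. An empty map means that no prior information
    // is available for this label set at this position; in a non-empty map, labels
    // that are absent are treated as having prior probability 0.
    double biased_emission(double nominal_emission,
                           const vector<string>& state_labels,        // one label per label set
                           const vector< map<string, double> >& prior) {
        double e = nominal_emission;
        for (size_t k = 0; k < state_labels.size(); ++k) {
            if (prior[k].empty()) continue;                    // case (i): no bias
            map<string, double>::const_iterator it = prior[k].find(state_labels[k]);
            e *= (it == prior[k].end()) ? 0.0 : it->second;    // cases (ii)-(iv)
        }
        return e;
    }

For case (iv) above, state_labels would be {"Exon", "Match"}, the priors at position 186 would contain Exon → 1.0 and Match → 0.7, and the function returns $e(\gamma) \cdot 1.0 \cdot 0.7$, exactly as described.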
3.2.3 Sequence decoding algorithms

HMMCONVERTER incorporates several sequence decoding algorithms. They can all be regarded as variations of the Viterbi algorithm [47], and all of them retrieve the state path with the highest overall probability within the specified search space.

The Viterbi and Hirschberg algorithms — The Viterbi algorithm is a widely used sequence decoding algorithm for HMMs which calculates the optimal state path for the input sequences. The Hirschberg algorithm [16] is a more memory-efficient version of the Viterbi algorithm (see chapter 2). It is especially useful for pair-HMMs, as the memory requirement of the Viterbi algorithm usually imposes a serious constraint for long input sequences. The details of these algorithms are described in chapter 2. In practice, the Hirschberg algorithm takes up to twice as long to run as the Viterbi algorithm, so HMMCONVERTER uses a smart approach to determine which algorithm to use: the user can specify the maximum amount of memory to be allocated, and the Hirschberg iterations are stopped as soon as the sub-spaces fit into memory and can be calculated with the Viterbi algorithm.

The Viterbi and Hirschberg algorithms combined with a tube — Both the Viterbi and Hirschberg algorithms can be combined with a tube:

• Algorithm 1: Run the decoding algorithm inside the tube created from matches generated by the program BLASTN [1].

• Algorithm 2: Run the decoding algorithm inside the tube created from matches generated by the program TBLASTX [1].

• Algorithm 3: Run the decoding algorithm inside a user-specified tube.

Algorithms 1 and 2 were introduced in DOUBLESCAN [33, 34], where they are called the stepping stone algorithm. Algorithm 3, combined with the Hirschberg algorithm (see figure 3.6), is a new feature which is not implemented in any existing HMM generating tool. An optimal or almost optimal state path can thus be calculated in an extremely time and memory efficient way. HMMoC also supports the banding feature, but does not combine it with the Hirschberg algorithm, and it also does not allow the tube to be derived automatically from BLAST matches using a separate dynamic programming routine.

Figure 3.6: Using the Hirschberg algorithm inside a tube. In every iteration, the Hirschberg strips only search inside the reduced, tube-like search space.

3.2.4 Training algorithms

HMMCONVERTER supports several algorithms for parameter training, all of which can be combined with the tube and with special emissions. At present, HMMCONVERTER is the only software package which provides not only the linear-memory Baum-Welch algorithm [35] but also a new linear-memory algorithm for training with sampled state paths and a novel linear-memory algorithm for Viterbi training. These three training algorithms reduce the memory requirement by a factor of $L$ with respect to the nominal requirement. This renders parameter training feasible for the first time even for complex pair-HMMs. The special emissions and banding discussed previously can be combined with any training algorithm to improve the time requirements (banding) and the performance (special emissions). Furthermore, it is possible to train only the free parameters by specifying corresponding derivation rules.

In this section, we give a brief introduction to the linear-memory Baum-Welch algorithm which has already been published in [35].
The novel linear-memory algorithm for training with sampled state paths and the new linear-memory algorithm for Viterbi training are then explained in detail.

The linear-memory Baum-Welch algorithm

We have already introduced the traditional Baum-Welch algorithm in chapter 2. This algorithm takes $O(N L^n)$ memory for an $n$-HMM with $N$ states and $n$ input sequences of length $L$ each. Miklós and Meyer [35] introduced a clever way to perform Baum-Welch training using linear memory; a small correction to this algorithm was made by Churbanov in [9]. The term "linear memory" means that the algorithm only takes $O(N)$ memory for an HMM with $N$ states, and $O(N L^{n-1})$ memory for an $n$-HMM with $N$ states and $n$ input sequences of length $L$ each. A brief description of the algorithm and the pseudo-code are presented here; the reader should refer to the paper for more detailed explanations and a proof of the correctness of the algorithm. We use the same notation as in chapter 2 unless specified otherwise.

Recall that we can derive a new estimate $t_{i,j}^{q+1}$ for the transition probability for going from state $i$ to state $j$ in the $(q+1)$-th iteration, given the estimated counts $T_{i,j}^q(X)$:

$t_{i,j}^{q+1} = \frac{\sum_{X \in \mathcal{X}} T_{i,j}^q(X)}{\sum_{X \in \mathcal{X}} \sum_{j'} T_{i,j'}^q(X)}$

Similarly, we can derive a new estimate $e_i^{q+1}(\gamma)$, the probability of reading letter $\gamma$ from the input sequence in state $i$ in the $(q+1)$-th iteration, given the estimated counts $E_i^q(\gamma, X)$:

$e_i^{q+1}(\gamma) = \frac{\sum_{X \in \mathcal{X}} E_i^q(\gamma, X)}{\sum_{X \in \mathcal{X}} \sum_{\gamma'} E_i^q(\gamma', X)}$

We now define two new counts for the transition and emission probabilities which can also be used to derive the corresponding probabilities:

$\tilde{T}_{i,j}(X) := \sum_{k=1}^{L} f_i(k)\, t_{i,j}\, e_j(x_{k+1})\, b_j(k+1)$        (3.1)

$\tilde{E}_i(\gamma, X) := \sum_{k=1}^{L} \delta_{x_k, \gamma}\, f_i(k)\, b_i(k)$        (3.2)

The new estimates for $t_{i,j}$ and $e_i(\gamma)$ can then be derived by the following formulae, respectively:

$t_{i,j}^{q+1} = \frac{\sum_{X \in \mathcal{X}} \tilde{T}_{i,j}(X)/P(X)}{\sum_{X \in \mathcal{X}} \sum_{j'} \tilde{T}_{i,j'}(X)/P(X)}$,    $e_i^{q+1}(\gamma) = \frac{\sum_{X \in \mathcal{X}} \tilde{E}_i(\gamma, X)/P(X)}{\sum_{X \in \mathcal{X}} \sum_{\gamma'} \tilde{E}_i(\gamma', X)/P(X)}$

The main idea of the algorithm is that the value $\tilde{T}_{i,j}(X)$ in equation 3.1 can be interpreted as a weighted sum of probabilities of state paths for sequence X that contain the transition $i \to j$, where the weight is the number of times this transition is used in each state path. The value $\tilde{E}_i(\gamma, X)$ in equation 3.2 can likewise be interpreted as a weighted sum of probabilities of state paths for sequence X, where the weight is the number of times letter $\gamma$ is read in state $i$ in each state path. Using this observation, Miklós and Meyer show that the $\tilde{T}_{i,j}(X)$ and $\tilde{E}_i(\gamma, X)$ values can be calculated using only the forward probabilities and a "slice" of memory along one input sequence. We first present the algorithm, and then give further explanations.

Let $\tilde{T}_{i,j}(k, m)$ denote the weighted sum of probabilities of all state paths that contain the transition $i \to j$ and which finish in sequence position $k$ and state $m$, and let $\tilde{E}_i(\gamma, x_k, m)$ denote the weighted sum of probabilities of all state paths that finish at sequence position $k$ in state $m$ and in which state $i$ reads letter $\gamma$ at least once. Then $\tilde{T}_{i,j}(X) = \tilde{T}_{i,j}(L, end)$ and $\tilde{E}_i(\gamma, X) = \tilde{E}_i(\gamma, x_L, end)$.

The algorithm uses a dynamic programming approach for calculating the values $\tilde{T}_{i,j}(X)$ and $\tilde{E}_i(\gamma, X)$ for an HMM with $N$ states and one training sequence of length $L$:
Initialization: before reading the first letter of the input sequence

$f_m(0) = 1$ if $m = start$, and $0$ otherwise
$\tilde{T}_{i,j}(0, m) = 0$ for all $m \in \{1, \dots, N-1\}$
$\tilde{E}_i(\gamma, x_0, m) = 0$ for all $m \in \{1, \dots, N-1\}$

Recursion: for $k = 0, \dots, L-1$ and, for each sequence position, for all states $m$ from 1 to $N-2$ ($N-1$ is the end state)

$f_m(k+1) = e_m(x_{k+1}) \sum_n f_n(k)\, t_{n,m}$

$\tilde{T}_{i,j}(k+1, m) = \sum_n \tilde{T}_{i,j}(k, n)\, t_{n,m}\, e_m(x_{k+1}) + \delta_{m,j}\, f_i(k)\, t_{i,j}\, e_j(x_{k+1})$

$\tilde{E}_i(\gamma, x_{k+1}, m) = \sum_n \tilde{E}_i(\gamma, x_k, n)\, t_{n,m}\, e_m(x_{k+1}) + \delta_{m,i}\, \delta_{\gamma, x_{k+1}}\, f_m(k+1)$

Termination: at the end of the sequence and in the end state

$P(X) = f_{end}(L) = \sum_n f_n(L)\, t_{n,end}$

$\tilde{T}_{i,j}(X) = \tilde{T}_{i,j}(L, end) = \sum_n \tilde{T}_{i,j}(L, n)\, t_{n,end} + \delta_{j,end}\, f_i(L)\, t_{i,end}$

$\tilde{E}_i(\gamma, X) = \tilde{E}_i(\gamma, x_L, end) = \sum_n \tilde{E}_i(\gamma, x_L, n)\, t_{n,end}$

where $\delta_{i,j}$ is the delta function, with $\delta_{i,j} = 1$ if $i = j$ and $\delta_{i,j} = 0$ if $i \neq j$.

In the recursion step, the value of $\tilde{T}_{i,j}(k+1, m)$ is increased by $f_i(k)\, t_{i,j}\, e_j(x_{k+1})$ if $m = j$, i.e. by the sum of the probabilities of all state paths that use a transition $i \to j$ up to and including sequence position $k+1$. In the algorithm, only the values of the previous sequence position in the $\tilde{T}_{i,j}$ table and in the $f$ table are needed in order to continue the calculation (and similarly for calculating the $\tilde{E}_i(\gamma)$ table), so the memory requirement of the algorithm is $O(N)$. On the other hand, the run time of this algorithm is $O(|\mathcal{X}| \cdot N \cdot T_{max} \cdot L)$ for training one parameter ($\mathcal{X}$ is the set of training sequences), so in total this linear-memory Baum-Welch algorithm requires $O(p \cdot |\mathcal{X}| \cdot N \cdot T_{max} \cdot L)$ time to train $p$ parameters in one iteration.

The procedure for training emission probabilities is very similar; the reader may refer to the paper for more details.

3.3 Linear-memory parameter training using sampled state paths

This is a novel strategy for parameter training. Unlike Viterbi training, which considers only a single state path, and unlike Baum-Welch training, which considers all possible state paths, this new training algorithm derives new parameter values from state paths that are sampled from the posterior distribution. Similar to Viterbi and Baum-Welch training, the new algorithm works in an iterative way. In each iteration, a new set of parameter values is derived from the observed transitions and emissions in the sampled state paths of all sequences in the training set. The iteration is stopped after a fixed number of iterations or as soon as the change in the log-likelihood is sufficiently small.

3.3.1 Sampling a state path from the posterior distribution

Sampled state paths are high-probability state paths which can be derived from the posterior distribution $P(\pi \mid X)$. These state paths are especially popular for pair-HMMs that generate global sequence alignments. As different state paths can yield the same annotation of an input sequence, it is often more useful to obtain several sub-optimal alignments rather than just one optimal global alignment (the Viterbi state path). In order to better understand how the new algorithms work, we first explain how a state path can be sampled from the posterior distribution using existing algorithms. According to [11], this can be done by first using the forward algorithm and then a back-tracking procedure. We introduce some notation before explaining the algorithm. Let

• $f_i(k)$ denote the forward probabilities, as defined previously

• $p_i(k, m)$ denote the probability of selecting state $m$ as the previous state when being in state $i$ at sequence position $k$ (i.e. sequence position $k$ has already been read by state $i$).
For fixed $k$ and $i$, $p_i(k, m)$ defines a probability distribution over previous states, as $\sum_m p_i(k, m) = 1$.

We consider sampling a state path for an HMM with $N$ states which reads a single input sequence of length $L$. The algorithm is as follows:

Initialization: at the start of the input sequence

$f_m(0) = 1$ if $m = start$, and $0$ otherwise

Recursion: loop over all positions $k$ from 1 to $L$ in the input sequence and, for each sequence position, over all states $m$ from 1 to $N-2$ ($N-1$ is the end state) in the model

$f_m(k) = e_m(x_k) \sum_n f_n(k-1)\, t_{n,m}$        (3.3)

Termination: at the end of the input sequence

$P(X) = f_{end}(L) = \sum_n f_n(L)\, t_{n,end}$

Once we have calculated all forward probabilities $f_i(k)$, i.e. for all states $i$ in the model and all positions $k$ in the input sequence, we can use the following back-tracking procedure to sample a state path from the posterior distribution $P(\pi \mid X)$. We start the back-tracking at the end of the input sequence, $k = L$, in the end state, $i = end$, and select $m$ as the state for the previous sequence position $k-1$ with probability

$p_i(k, m) = \frac{e_i(x_k)\, f_m(k-1)\, t_{m,i}}{f_i(k)}$        (3.4)

The back-tracking finishes once we have reached the start of the sequence and the start state. The resulting succession of chosen previous states corresponds to one sampled state path. As the back-tracking procedure requires the entire matrix of forward values for all states and all sequence positions, the above algorithm for sampling a state path requires $O(NL)$ memory and $O(N T_{max} L)$ time to first calculate the matrix of forward values, and then $O(L)$ memory and time to sample a single state path from the matrix. Note that more state paths can be sampled without having to recalculate the matrix of forward values. For sampling $K$ state paths for the same input sequence, we thus need $O(NL + KL)$ memory and $O(N T_{max} L + KL)$ time.
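As an illustration of the procedure just described, the following C++ sketch first fills the complete forward matrix and then samples one state path by back-tracking. It assumes a simple HMM whose only silent states are start and end, a non-empty sequence and integer-coded symbols; all names are illustrative and the code is not part of HMMConverter.

    #include <cstdlib>
    #include <vector>
    using std::vector;

    // Draw an index with probability proportional to the given non-negative weights.
    int sample_from_weights(const vector<double>& w) {
        double total = 0.0;
        for (size_t i = 0; i < w.size(); ++i) total += w[i];
        double r = total * (std::rand() / (RAND_MAX + 1.0));
        for (size_t i = 0; i < w.size(); ++i) { if (r < w[i]) return (int)i; r -= w[i]; }
        return (int)w.size() - 1;
    }

    // Forward matrix: f[k][i] = P(x_1..x_k, state reading position k = i).
    // t[m][i]: transition m -> i between emitting states, t_start[i]: start -> i,
    // e[i][c]: probability that state i emits symbol c.
    vector< vector<double> > forward_matrix(const vector<int>& x,
            const vector< vector<double> >& t, const vector<double>& t_start,
            const vector< vector<double> >& e) {
        size_t L = x.size(), N = e.size();
        vector< vector<double> > f(L + 1, vector<double>(N, 0.0));
        for (size_t i = 0; i < N; ++i) f[1][i] = t_start[i] * e[i][x[0]];
        for (size_t k = 2; k <= L; ++k)
            for (size_t i = 0; i < N; ++i) {
                double s = 0.0;
                for (size_t m = 0; m < N; ++m) s += f[k-1][m] * t[m][i];
                f[k][i] = e[i][x[k-1]] * s;
            }
        return f;
    }

    // Sample one state path from the posterior P(pi | x) by back-tracking through f;
    // t_end[i] is the probability of moving from state i to the end state.
    vector<int> sample_path(const vector< vector<double> >& f,
            const vector< vector<double> >& t, const vector<double>& t_end) {
        size_t L = f.size() - 1, N = f[1].size();
        vector<int> path(L);
        vector<double> w(N);
        for (size_t m = 0; m < N; ++m) w[m] = f[L][m] * t_end[m];   // state at position L
        path[L-1] = sample_from_weights(w);
        for (size_t k = L - 1; k >= 1; --k) {                       // positions L-1 .. 1
            for (size_t m = 0; m < N; ++m) w[m] = f[k][m] * t[m][path[k]];
            path[k-1] = sample_from_weights(w);
        }
        return path;
    }

Because the whole forward matrix f is kept, this version needs $O(NL)$ memory; avoiding exactly this matrix is the point of the algorithm introduced in the next section.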
3.3.2 A new, linear-memory algorithm for parameter training using sampled state paths

We first describe a new algorithm for training the parameters of an HMM using state paths sampled from the posterior distribution. The algorithm corresponds to an iterative procedure. In each iteration, a new set of transition and emission probability values is derived from the number of times that these transitions and emissions were observed in state paths sampled from the posterior distribution using the HMM with the parameter values from the previous iteration. The iterations are stopped once a maximum number of iterations has been reached or once the change in the log-likelihood is sufficiently small.

In the following, let $E_i^q(\gamma, X, \pi)$ denote the number of times that state $i$ reads symbol $\gamma$ from input sequence $X$ in state path $\pi$, given the HMM with the parameters of the $q$-th iteration. Likewise, let $T_{i,j}^q(X, \pi)$ denote the number of times that a transition from state $i$ to state $j$ is used in state path $\pi$, given the HMM with the parameters of the $q$-th iteration. The superscript $q$ indicates from which iteration the underlying parameters derive. If we consider all sequences of a training set $\mathcal{X}$ of $M$ training sequences and sample $K$ state paths for each sequence, the recursion which updates the values of the transition and emission probabilities can be written as:

$t_{i,j}^{q+1} = \frac{\sum_{n=1}^{M} \sum_{k=1}^{K} T_{i,j}^q(X_n, \pi_k)}{\sum_{n=1}^{M} \sum_{k=1}^{K} \sum_{j'} T_{i,j'}^q(X_n, \pi_k)}$        (3.5)

$e_i^{q+1}(\gamma) = \frac{\sum_{n=1}^{M} \sum_{k=1}^{K} E_i^q(\gamma, X_n, \pi_k)}{\sum_{n=1}^{M} \sum_{k=1}^{K} \sum_{\gamma'} E_i^q(\gamma', X_n, \pi_k)}$        (3.6)

Equations 3.5 and 3.6 assume that we know the values of $T_{i,j}^q(X, \pi_k)$ and $E_i^q(\gamma, X, \pi_k)$, i.e. how often each transition and emission was used in each sampled state path $\pi_k$ for every training sequence $X$. If our computer has enough memory to use the forward algorithm and the back-tracking procedure described above, each iteration of the training algorithm takes $O(N \cdot \max_n\{L_n\} + K \cdot \max_n\{L_n\})$ memory and $O(N T_{max} \sum_n L_n + K \sum_n L_n)$ time, where $\sum_n L_n$ is the sum of the $M$ sequence lengths in the training set $\mathcal{X}$ and $\max_n\{L_n\}$ is the length of the longest sequence in the training set $\mathcal{X}$. However, for many Bioinformatics applications where the number of states $N$ in the model is large, the connectivity $T_{max}$ of the model is high, or the training sequences are long, these memory and time requirements are too large to allow automatic parameter training with this algorithm.

We now introduce a novel, linear-memory algorithm for training the parameters of an HMM using sampled state paths. The idea for this algorithm comes from the following observations:

(i) If we consider the description of the forward algorithm above, in particular the recursion in equation 3.3, we realize that the calculation of the forward values can be continued by keeping only the values of the previous sequence position in memory (see figure 3.7).

(ii) If we have a close look at the description of the back-tracking algorithm, in particular the sampling step in equation 3.4, we observe that sampling the previous state only requires the forward values of the current and the previous sequence position (see figure 3.8). So, provided we are at a particular sequence position and state, we can sample the state at the previous sequence position if we know the forward values of the current and the previous sequence position.

(iii) If we want to sample a state path $\pi$ from the posterior distribution $P(\pi \mid X)$, we have to start the back-tracking at the end of the sequence in the end state, see the description above and equation 3.4. (The only alternative for sampling state paths from the same posterior distribution would be to use the backward algorithm [11] instead of the forward algorithm and to start the back-tracking procedure at the beginning of the sequence in the start state.)

Figure 3.7: Illustration of the forward algorithm. Part (1) of this figure demonstrates the calculation of one forward value (the forward value of (m, k)). This calculation only requires the forward values of the previous sequence position. So, only a "strip" of memory (of size O(N)) is needed to continue the forward algorithm by filling the forward matrix from left to right, as shown in part (2) of this figure.

Figure 3.8: Sampling a state path from the posterior distribution. This figure shows that sampling a previous state only requires the forward values of the current and the previous sequence position, i.e. this "sampling operation" can also be carried out with a memory "strip" which only stores the forward values of two sequence positions, as shown in figure 3.7.
Observations (i) and (ii) imply that local information suffices to continue the calculation of the forward values (i) and to sample a previous state (ii) if we are already in a particular state and sequence position, whereas observation (iii) reminds us that, in order to sample from the correct probability distribution, we have to start the sampling at the end of the training sequence. Given these three observations alone, it is not obvious how to come up with a computationally more efficient algorithm for training with sampled state paths. In order to realize that a more efficient algorithm exists, one also needs to note that:

(iv) While calculating the forward values in the memory-efficient way outlined in (i), we can simultaneously sample one (or more) potential previous states for each state and sequence position that we encounter in the calculation of the forward values. This is possible because of observation (ii) above.

(v) In every iteration $q$ of the training procedure, we only need to know the values of $T_{i,j}^q(X, \pi)$ and $E_i^q(\gamma, X, \pi)$, i.e. how often each transition and emission was used in the sampled state path $\pi$ for every training sequence $X$, but we do not need to remember where along the training sequence each transition and emission was used.

Given observations (i) to (v), we can now formally write down a new, linear-memory algorithm which calculates $T_{i,j}^q(X, \pi)$ and $E_i^q(\gamma, X, \pi)$. As the description of the following algorithm is for one training sequence $X$ and one iteration $q$, which both stay the same throughout the algorithm, we use a simplified notation, where $T_{i,j}(k, m)$ denotes the number of times the transition from state $i$ to state $j$ is used in a sampled state path that finishes at sequence position $k$ in state $m$, and where $E_i(\gamma, k, m)$ denotes the number of times state $i$ reads symbol $\gamma$ in a state path that finishes at sequence position $k$ in state $m$. As introduced earlier, $f_i(k)$ denotes the forward probability for sequence position $k$ and state $i$, and $p_i(k, m)$ the probability of selecting state $m$ as the previous state while being in state $i$ at sequence position $k$.

Initialization: at the start of training sequence $X = (x_1, \dots, x_L)$

$f_m(0) = 1$ if $m = start$, and $0$ otherwise
$T_{i,j}(0, m) = 0$ for all $m \in \{1, \dots, N-1\}$
$E_i(\gamma, 0, m) = 0$ for all $m \in \{1, \dots, N-1\}$

Recursion: loop over all positions $k$ from 0 to $L-1$ in the training sequence $X$ and, for each sequence position, over all states $m$ from 1 to $N-2$ ($N-1$ is the end state) in the model

$f_m(k+1) = e_m(x_{k+1}) \sum_n f_n(k)\, t_{n,m}$

$p_m(k+1, n) = \frac{f_n(k)\, t_{n,m}\, e_m(x_{k+1})}{f_m(k+1)}$

$T_{i,j}(k+1, m) = T_{i,j}(k, l) + \delta_{l,i}\, \delta_{m,j}$        (3.7)

$E_i(\gamma, k+1, m) = E_i(\gamma, k, l) + \delta_{m,i}\, \delta_{\gamma, x_{k+1}}$        (3.8)

where $l$ is the previous state sampled for state $m$ at sequence position $k+1$ (chosen with probability $p_m(k+1, l)$) and $x_{k+1}$ is the symbol at sequence position $k+1$.

Termination: at the end of the input sequence

$f_{end}(L) = \sum_n f_n(L)\, t_{n,end}$

$p_{end}(L, n) = \frac{f_n(L)\, t_{n,end}}{f_{end}(L)}$

$T_{i,j}(X, \pi) = T_{i,j}(L, end) = T_{i,j}(L, l) + \delta_{l,i}\, \delta_{end,j}$

$E_i(\gamma, X, \pi) = E_i(\gamma, L, end) = E_i(\gamma, L, l)$

where $l$ is the previous state sampled for the end state (chosen with probability $p_{end}(L, l)$).

We now prove the correctness of the algorithm for training the transition probabilities $t_{i,j}$.

Theorem 1: The above algorithm calculates $T_{i,j}(X, \pi)$ in $O(N)$ memory and $O(N T_{max} L)$ time. $T_{i,j}(X, \pi)$ is the number of times that a transition from state $i$ to state $j$ is used in state path $\pi$.
Proof:

(1) The recursion in the above algorithm requires only $O(N)$ memory, as the calculation of $f_m(k+1)$ only requires the values $f_n(k)$ with $n \in \{1, \dots, N-2\}$, and similarly for the calculation of $T_{i,j}(k+1, m)$. Therefore, only $O(N)$ memory is needed to store one column of each table in order to proceed with the calculation. It is also clear that the run time is $O(N T_{max} L)$: in order to calculate $T_{i,j}(X, \pi)$, $NL$ entries of the forward table and of the $T_{i,j}$ table have to be calculated, and to calculate each entry, at most $T_{max}$ previous entries have to be examined, which amounts to $O(N T_{max} L)$ operations in total.

(2) We now prove by induction (on sequence position $k$) that $T_{i,j}(k, m)$ is the number of times that a transition from state $i$ to state $j$ is used in the sampled state path $\pi$ that finishes at sequence position $k$ in state $m$. Let this statement be $s(k)$.

For $k = 0$: in the initialization step, $T_{i,j}(0, m) = 0$ for all $m \in \{1, \dots, N-1\}$. This is correct, as the number of times a transition from state $i$ to state $j$ is used in a state path that finishes at sequence position 0 is 0 for every state $m$.

Assume $s(k)$ is true for some non-negative integer $k$; we now show that $s(k)$ implies $s(k+1)$. In the recursion step, there are two cases, where $l$ denotes the previous state sampled for state $m$ at sequence position $k+1$:

Case (i): $l = i$ and $m = j$. Then $T_{i,j}(k+1, m) = T_{i,j}(k, l) + 1$. By the induction assumption, $T_{i,j}(k, l)$ is the number of times the transition from state $i$ to state $j$ is used in the state path $\pi$ that finishes at sequence position $k$ in state $l$. As $l$ is the sampled previous state for state $m$ at sequence position $k+1$, state $l$ is visited in $\pi$ at sequence position $k$ if state $m$ is visited at sequence position $k+1$. If the condition $l = i$ and $m = j$ is satisfied, the transition from state $i$ to state $j$ is used in this step, so the number of times this transition is used in the state path $\pi$ that finishes at sequence position $k+1$ in state $m (= j)$ is equal to the corresponding count at the previous sequence position $k$ and the chosen state $l$, plus 1.

Case (ii): $l \neq i$ or $m \neq j$. Then $T_{i,j}(k+1, m) = T_{i,j}(k, l)$. Again by the induction assumption, $T_{i,j}(k, l)$ is the number of times the transition from state $i$ to state $j$ is used in the state path $\pi$ that finishes at sequence position $k$ in state $l$. As $l \neq i$ or $m \neq j$, the transition from state $i$ to state $j$ is not used when the state path $\pi$ moves from sequence position $k$ to sequence position $k+1$. So the number of times a transition from state $i$ to state $j$ is used in the state path $\pi$ that finishes at sequence position $k+1$ in state $m$ is equal to the corresponding count at the previous sequence position $k$ and the chosen state $l$.

In both cases, the value $T_{i,j}(k+1, m)$ stores the number of times a transition from state $i$ to state $j$ is used in the state path $\pi$ that finishes at sequence position $k+1$ in state $m$, so we have proved $s(k) \Rightarrow s(k+1)$. Finally, it is clear that $T_{i,j}(L, end) = T_{i,j}(X, \pi)$ by the definition of $T_{i,j}(k, m)$ and $T_{i,j}(X, \pi)$, and we have proved theorem 1.

By theorem 1 and equation 3.5, this new algorithm completes one iteration of training a transition probability of an HMM with $O(N)$ memory and $O(N T_{max} L)$ time.
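The following C++ sketch shows the strip idea behind this algorithm for the transition counts: only the previous and current columns of the forward table and of the count table are kept while one pass is made over the sequence. It is a simplified illustration for one fixed transition (i_tr, j_tr), with emitting states only and integer-coded symbols, and is not HMMConverter code. (In practice one would also rescale or work in log space to avoid numerical underflow.)

    #include <vector>
    using std::vector;

    int sample_from_weights(const vector<double>& w); // weighted draw, as in the earlier sketch

    // One pass over the sequence that simultaneously computes the forward values,
    // samples a previous state for every cell, and propagates the count of the
    // transition i_tr -> j_tr along the sampled predecessors. Only two columns
    // ("strips") per table are kept, so the memory requirement is O(N).
    double count_transition_linear_memory(const vector<int>& x, int i_tr, int j_tr,
            const vector< vector<double> >& t, const vector<double>& t_start,
            const vector<double>& t_end, const vector< vector<double> >& e) {
        size_t L = x.size(), N = e.size();
        vector<double> f_prev(N, 0.0), f_cur(N, 0.0);
        vector<double> c_prev(N, 0.0), c_cur(N, 0.0);   // c[m]: count T_{i_tr,j_tr}(k, m)
        // position 1: previous state is the start state, so all counts are still 0
        for (size_t m = 0; m < N; ++m) f_cur[m] = t_start[m] * e[m][x[0]];
        vector<double> w(N);
        for (size_t k = 2; k <= L; ++k) {               // positions 2 .. L
            f_prev.swap(f_cur); c_prev.swap(c_cur);
            for (size_t m = 0; m < N; ++m) {
                double s = 0.0;
                for (size_t n = 0; n < N; ++n) { w[n] = f_prev[n] * t[n][m]; s += w[n]; }
                f_cur[m] = e[m][x[k-1]] * s;
                int l = sample_from_weights(w);         // sampled previous state for (k, m)
                c_cur[m] = c_prev[l] + ((l == i_tr && (int)m == j_tr) ? 1.0 : 0.0);
            }
        }
        // termination: sample the state at the last position from f(L) * t_end
        for (size_t m = 0; m < N; ++m) w[m] = f_cur[m] * t_end[m];
        int last = sample_from_weights(w);
        return c_cur[last];                             // T_{i_tr,j_tr}(X, pi) for this sampled path
    }

Emission counts are handled with an analogous strip, and several parameters can be counted in parallel simply by keeping one such count strip per parameter.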
For the training of emission probabilities, the proof is exactly analogous to the one above, so we just state the theorem.

Theorem 2: The above algorithm calculates $E_i(\gamma, X, \pi)$ in $O(N)$ memory and $O(N T_{max} L)$ time. $E_i(\gamma, X, \pi)$ is the number of times that state $i$ reads symbol $\gamma$ from input sequence $X$ in state path $\pi$.

Figure 3.9 gives a visual illustration of the algorithm. In summary, we have designed a novel, memory-efficient parameter training algorithm using sampled state paths, which in each iteration requires $O(N)$ memory and $O(N T_{max} L)$ time to train one parameter with one sampled state path for an HMM with $N$ states that reads one input sequence of length $L$.

3.3.3 A new linear-memory algorithm for Viterbi training

The traditional Viterbi training algorithm [11] has already been described in chapter 2; in each iteration it requires $O(NL)$ memory and $O(N T_{max} L)$ time for an HMM with $N$ states that reads one input sequence of length $L$. We employ ideas similar to those of the previous section to develop a new linear-memory algorithm for Viterbi training, which we call linear-memory Viterbi training.

Figure 3.9: Illustration of the novel, linear-memory algorithm for parameter training using sampled state paths. Part (1) of this figure shows that three "strips" of memory (of size O(N) each) are used for computing the counts $T_{i,j}(X, \pi)$ and $E_i(\gamma, X, \pi)$ for the input sequence X and the sampled state path $\pi$. These strips move in parallel, starting at the left of the matrix and finishing at its right end. Part (2) of this figure shows how the value of the count $T_{i,j}(X, \pi)$ is calculated with the $T_{i,j}$ strip and the f-strip (the forward strip). At each recursion step of the algorithm, a forward value is calculated and a previous state is sampled using the forward value just calculated and the forward values of the previous sequence position (the back-tracking arrows for a cell in the table). Then, the value of the cell in the $T_{i,j}$ table (the number of times the transition from state i to state j is used in this sampled state path) is calculated by accumulating the corresponding counts. In particular, at sequence position 1 and state j, the sampled previous state is i, which implies that the transition i → j is used at this position, so the value in this cell (the count) is increased by one with respect to the value at the previous position. At sequence position k and state i, the transition i → j is not used according to the sampled previous state, so the value of the cell remains the same as at the previous position. At the last sequence position, the cell of state j is sampled as the last state of the sampled state path, and the count $T_{i,j}(X, \pi)$ for this sampled state path is 4 (see the four dark back-tracking arrows). Note that the dark box at the right end of the matrix denotes the $T_{i,j}$ strip. It means that at the termination step, this strip only stores the values of the last two sequence positions, so the sampled state path itself that gives rise to the count $T_{i,j}(X, \pi)$ is not remembered.

The linear-memory Viterbi training algorithm is very similar to the training algorithm using sampled state paths. To describe the new Viterbi training algorithm, we use similar notation, where $T_{i,j}(k, m)$ denotes the number of times the transition from state $i$ to state $j$ is used in the Viterbi path that finishes at sequence position $k$ in state $m$, and where $E_i(\gamma, k, m)$ denotes the number of times state $i$ reads symbol $\gamma$ in such a path that finishes at sequence position $k$ in state $m$.
We further denote by $v_i(k)$ the probability of the most probable state path ending in state $i$ at sequence position $k$.

Initialization: at the start of training sequence $X = (x_1, \dots, x_L)$

$v_m(0) = 1$ if $m = start$, and $0$ otherwise
$T_{i,j}(0, m) = 0$ for all $m \in \{1, \dots, N-1\}$
$E_i(\gamma, 0, m) = 0$ for all $m \in \{1, \dots, N-1\}$

Recursion: loop over all positions $k$ from 0 to $L-1$ in the training sequence $X$ and, for each sequence position, over all states $m$ from 1 to $N-2$ ($N-1$ is the end state) in the model

$v_m(k+1) = e_m(x_{k+1}) \max_n \left( v_n(k)\, t_{n,m} \right)$

$T_{i,j}(k+1, m) = T_{i,j}(k, l) + \delta_{l,i}\, \delta_{m,j}$

$E_i(\gamma, k+1, m) = E_i(\gamma, k, l) + \delta_{m,i}\, \delta_{\gamma, x_{k+1}}$

where $l = \arg\max_n \left( v_n(k)\, t_{n,m} \right)$ is the chosen previous state and $x_{k+1}$ is the symbol at sequence position $k+1$.

Termination: at the end of the input sequence

$v_{end}(L) = \max_n \left( v_n(L)\, t_{n,end} \right)$

$T_{i,j}(X) = T_{i,j}(L, end) = T_{i,j}(L, l) + \delta_{l,i}\, \delta_{end,j}$

$E_i(\gamma, X) = E_i(\gamma, L, end) = E_i(\gamma, L, l)$

where $l = \arg\max_n \left( v_n(L)\, t_{n,end} \right)$ is the chosen previous state.

This algorithm can update a transition or emission probability with $O(N)$ memory and $O(N T_{max} L)$ time; the proof is exactly analogous to the one for training with sampled state paths.

In summary, the two new algorithms both require $O(N)$ memory and $O(M N T_{max} L)$ time to train one parameter (with one sampled state path, in the case of the linear-memory training with sampled state paths) for an HMM with $N$ states that reads one input sequence of length $L$. Suppose there are $p$ parameters to be trained; the memory and time requirements of these two parameter training algorithms can then be expressed in terms of $c$, the number of parameters estimated in parallel, as:

• $O(c \cdot N)$ memory and $O\left( (p/c) \cdot M \cdot N \cdot (T_{max} + c) \cdot L \right)$ time.

If we train $c$ parameters in parallel, $c$ strips of size $O(N)$ each are needed, so the memory requirement increases by a factor of $c$. On the other hand, to calculate each cell of the corresponding matrix, $O(T_{max})$ operations are needed for computing the forward value (a sum over the forward values of the previous sequence position) or the Viterbi value (selecting the maximum over the states of the previous sequence position), and $O(c)$ operations are needed to update the counts of the parameters trained in parallel. So $O(M N (T_{max} + c) L)$ operations are needed at a time, and in order to train all $p$ parameters, we need $p/c$ such parallel computations. HMMCONVERTER organizes these linear-memory parameter training algorithms in a convenient way: users specify the maximum amount of memory to be allocated for the training algorithms, and the maximum possible number of parameters is then estimated in parallel accordingly.

3.3.4 Pseudo-counts and pseudo-probabilities

Parameter training can suffer from over-fitting if the training data are insufficient. In order to avoid this problem, it is preferable to add predetermined pseudo-counts to the counts from which the transition and emission probabilities are derived. Users can use pseudo-counts or pseudo-probabilities for training; these are the two different kinds of "counts" in HMMCONVERTER. Typically, a pseudo-count $c_{i,j}$ for transition $i \to j$ is added to the corresponding weight of the transition in the training process [11], for example in Baum-Welch training (see chapter 2):

• The weight of the transition $i \to j$ in the $q$-th iteration: $\hat{T}_{i,j} = T_{i,j} + c_{i,j}$

• The corresponding probability in the $(q+1)$-th iteration: $t_{i,j}^{q+1} = \frac{\hat{T}_{i,j}}{\sum_{j'} \hat{T}_{i,j'}}$
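As a small numerical illustration with invented counts: suppose state $i$ has two outgoing transitions with estimated counts $T_{i,1} = 8$ and $T_{i,2} = 0$, and pseudo-counts $c_{i,1} = c_{i,2} = 1$ are used. Then

$\hat{T}_{i,1} = 9, \quad \hat{T}_{i,2} = 1, \quad t_{i,1}^{q+1} = \frac{9}{10} = 0.9, \quad t_{i,2}^{q+1} = \frac{1}{10} = 0.1,$

so the second transition retains a small non-zero probability instead of being driven to zero by the sparse training data.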
Often it is difficult to provide an appropriate pseudo-count, and it is more intuitive to provide a pseudo-probability. HMMConverter therefore also allows "pseudo-probabilities" for training the transition and emission probabilities. Suppose we know a priori that the probability of transition i → j is roughly p_{i,j}; we may then use this value as the pseudo-probability for transition t_{i,j}. These pseudo-probabilities are added to t_{i,j} instead of to the estimated counts, and a normalization step is performed to yield a set of valid transition probabilities that fulfill Σ_j t_{i,j} = 1 for every state i in the HMM. The same strategy can be applied for training emission probabilities.

For the free transition parameters, which need not correspond to probabilities, pseudo-counts can be used; they are added to the free parameter after each training iteration. Note that such a pseudo-count can take any value ≥ 0. No normalization step is needed for pseudo-counts of free parameters, as the normalization is implicitly imposed by the derivation rules.

3.3.5  Training free parameters

HMMConverter supports the parameterization of transition and emission probabilities in order to avoid over-fitting during parameter training. The training algorithms therefore have to be capable of training the free parameters. Recall that HMMConverter allows different kinds of parameterizations for transition and emission probabilities. In the following, the training of free transition parameters and of free emission parameters is explained in separate subsections.

Training of free transition parameters

For the training of free transition parameters, users have to supply a reverse formula which expresses each free transition parameter as a function of transition probabilities. These formulae correspond to the inverses of the functions which express each transition probability as a function of the free transition parameters. In each iteration, the normal training process is performed for the transitions involved in the inverse functions, and the values of the free parameters are then derived by evaluating the corresponding inverse functions with the updated transition probabilities, as sketched below.
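To make this concrete, the following is a small, purely illustrative sketch of the inverse-function step. The parameterization used here (two free parameters p and q that determine three outgoing transitions of one state) is an assumption made for the example and is not taken from HMMConverter's input files.

  // Assume, for illustration only, that state i has three outgoing transitions
  //     t_{i,j} = p * q,   t_{i,k} = p * (1 - q),   t_{i,l} = 1 - p.
  // The user-supplied inverse formulae are then
  //     p = t_{i,j} + t_{i,k}      and      q = t_{i,j} / (t_{i,j} + t_{i,k}),
  // which are evaluated after each training iteration with the updated
  // transition probabilities.
  struct FreeParams { double p; double q; };

  FreeParams update_free_params(double t_ij, double t_ik) {
      FreeParams f{0.0, 0.0};
      f.p = t_ij + t_ik;                       // inverse formula for p
      f.q = (f.p > 0.0) ? t_ij / f.p : 0.0;    // inverse formula for q
      return f;
  }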
In more complicated cases, a free transition parameter cannot be expressed by an arithmetic function of this kind. Figure 3.10 shows a part of the pair-HMM of DOUBLESCAN [34]. The states shown in figure 3.10 model pairs of aligned codons (match_exon state) in a pair of orthologous exons which are either followed by a pair of orthologous introns in one of 3 possible phases (states 14 to 16), or followed by an intron in only one of the two input sequences (states 17 to 19 model an intron in X only, states 20 to 22 model an intron in Y only). Two particular parameters are involved in these transitions, namely Phase0 and Phase1; they represent the relative probability of starting an intron or intron pair after codon position 3 (Phase 0), 1 (Phase 1) and 2 (Phase 2). Phase2 can be expressed as 1 − Phase0 − Phase1.

We now consider training the parameter Phase0. From figure 3.10, the value of Phase0 should correspond to the fraction of the transitions 3 → 14, 3 → 17 and 3 → 20, normalized by all transitions from state 3 to any of the states 14 to 22. As state 3 also has other transitions, we cannot easily write Phase0 as an arithmetic function of transition probabilities. To solve this problem, HMMConverter supports so-called "group transitions". These allow users to define a group of transitions which are regarded as a single transition during training. In this example, the parameter Phase0 would be trained by calculating the weight of the group of transitions 3 → 14, 3 → 17 and 3 → 20, normalized by the weight of the group of transitions from state 3 to all the states 14 to 22. Chapter 4 explains how "group transitions" are specified in HMMConverter.

Figure 3.10: A part of the pair-HMM of DOUBLESCAN, showing the match_exon state and the states for matched, X-only and Y-only 5' splice sites in phases 0, 1 and 2. In order to train the free transition parameter Phase0, transitions from state 3 to all the states representing splice sites of phase 0 have to be counted; these are 3 → 14, 3 → 17 and 3 → 20.

Training of free emission parameters

We now describe the training of free emission parameters. As previously discussed, every free emission parameter has to be a group of emission probabilities that defines the emission probabilities for some states. Furthermore, an emission parameter specified in the emission probability file can be shared by several states and can also be used for expressing the emission probabilities of other states. In the most general case, it is therefore not trivial to define rules for deriving the free emission parameters from the emission probabilities of the states. Because of this, HMMConverter supports one simple way of training free emission parameters: for each free parameter, the states which share their emission probabilities directly with the parameter are determined, and the parameter is trained by taking the contributions from all of those states into account. This strategy allows efficient and successful training even for complex models, as long as their number of free emission parameters remains small.

3.4  Availability

HMMConverter is available under the Gnu Public License. The software comprising HMMConverter will be available for download, free of charge, from http://www.cs.ubc.ca/irmtraud/HMMConverter once the paper submission is accepted. The package includes a detailed manual, the software itself and the examples described in the last section of chapter 4.

3.5  Discussion and conclusion

We have discussed the main features of HMMConverter in this chapter; many of them are novel among existing HMM-generating packages. Among these features, using banding together with the Hirschberg algorithm is both very memory- and time-efficient for sequence decoding. The special emission feature is useful for the many Bioinformatics applications in which prior information on the input sequences has to be integrated into the decoding algorithms. The linear-memory Baum-Welch algorithm and the new linear-memory algorithm for training with sampled state paths are both memory efficient, and banding can also be applied to parameter training, which can greatly accelerate the process. Furthermore, special emissions can be used to integrate prior information about the training sequences into the training process, and together with the capability of training free parameters this can greatly improve the overall performance of the parameter estimation. The specifications for constructing an HMM with HMMConverter are described in detail in chapter 4.
Some implementation details of important classes and the interactions between the classes are discussed in chapter 5.  77  Chapter 4  Specifying an HMM or pair-HMM with HMMConverter 4.1  Introduction  HMMC0NvER’rER takes up to six input files if all special features are used. We first give a brief description of these files: • XML-input file: This is a compulsory XML file which describes the states and connec tions of an 11MM or pair-HMM. • Free emission parameter file: This is a compulsory fiat text file which specifies the emission probabilities of the 11MM. • Free transition parameter file: This file is only required if the transition probabilities are parameterized. • Tube file: This file is only required if banding is used with a user-defined tube. • Prior information file: This file is only required if special emissions are switched on. • Input sequence file: This compulsory file stores the input sequences.  78  4.2. The XML-input file  In this chapter, the format of all the input files and the output files generated by HMM CONVERTER are described in detail. In the last section of this chapter, we describe several HMMCONVERTER applications included in the software package. Their main purpose is to demonstrate how to set different HMMs with HMMC0NvERTER.  4.2  The XML-input file  HMMCONVERTER takes an XML-input file, and translates it into efficient C++ code. The XML-input file consists of two large components, defined by the <model> and the <sequence.analysis> tag. In this section, we explain the XML tags by first showing an example of how to specify the tag, and then describe its attributes. The term “boolean attribute” used in this section refers to an attribute that only accepts two discrete values “0” and “1” where “0” means “No” and “1” means “Yes”.  4.2.1  The <model> component of the XML file  This component consists of five compulsory tags for specifying the HMM, and two optional tags for using special features. [i] <Model_Type>  This tag defines some general parameters of the HMM.  Example tag: <Model_Type name=” Gene Prediction” pair=” 1” SpecialEmission=” 0” /> Attributes of <ModeLType>: • name: This compulsory attribute defines the name of the model.  • pair: This optional boolean attribute defines whether the model is a pair-HMM or an 11MM. The default value is “0” (HMM). 79  4.2. The XML-input file • SpecialEmission: This optional boolean attribute defines whether the model uses special emission or not. The default value of this attribute is “0” (no special emissions). [ii] <Alphabets>  This tag defines the set of symbols, i.e. the alphabet, that the HMM  can read from the input sequences Example tag: <Alphabets set=” ACGT” cases=” 0> Attributes of <Alphabets>: • set: This compulsory attribute defines the set of symbols, i.e. the alphabet, that the HMM can read. • cases: This optional boolean attribute defines whether the model is case sensitive  (“ 1”)  or not (“0”). The default value of this attribute is “0”. [iii] <Emission....Probs>  This tag defines the attributes of the set of free emission param  eters on which the emission probabilities of the HMM depend. The free emission parameters themselves are defined in a separate input file, we call it the free emission parameter file. Example tag: <EmissionYrobs id=” FEP” size=” 7” file=” emissionprobs.zmousehuman.txt” > Attributes of <EmissionProbs>: • id: This compulsory attribute defines the id of the set of free emission parameters. 
This id is used to find the corresponding emission parameters in the free emission parameter file. • file: The name of the free emission parameter file. 80  4.2. The XML-input file • size: Compulsory attribute which specifies the number of free emission parameters contained in the free emission parameter file. In this example, “FEP” is the id of the set of 7 free emission parameters (size=” 7”), and the ids of the free emission parameters specified in the free emission parameter file would be FEP.O” ,“ FEP. 1”,. [iv] <States>  .  ,“  FEP.6”.  This tag defines the attributes of all states in the HMM. For an HMM with  N states, the start state should be defined with id S.O and the end state should be defined with id S.(N  —  1).  Example tag: <State id=” S. 14” name=” sample..state” xdim=” 2” ydim=” 3” special=” 1” > <Setl> <label idref=” Set 1.2” /> </Setl> <Set 2> <label idref=” Set2.O” /> <label idref=” Set2.O”/> <label idref=”Set2.1”/> <label idref=”Set2.1”/> <label idref=” Set2.2” /> </Set2> <StateEmissionYrobs  ..  <StateEmission..Probs /> </State> 81  4.2. The XML-input file Attributes of <State>: • id: This compulsory attribute takes a string value. The format of the string starts with “S.”, followed by an integer number in the set {O, 1,... , N  —  1}, where N is the number  of states in the model.  • name: This compulsory attribute defines the name of the state. • xdim: This optional attribute defines the number of characters that this state reads from input sequence x. The default value is “0”. • ydim: This optional attribute defines the number of characters that this state reads from sequence y, the default value is “0”. This attribute is only required for states in pair-HMMs.  • special: This optional boolean attribute defines whether the special emission of this state is switched on (“1”) or not (“0”). The state (namely “sample_state”) described in the above example tag is shown in figure 4.2.1. This state reads five characters, two from input sequence X and three from input sequence Y, we call this a state of dimension five. For the annotation label set “Seti” i defined for the HMM, only one label “Setl.2” (this is the id of the annotation label defined in this annotation label set) is specified, which means that all the five sequence positions of the state have the same annotation label “Setl.2” from this set. For “Set2” (another annotation label set defined for the HMM), different labels are specified for each of the five sequence positions  (xl,x2,yl,y2,y3), i.e.  the annotation label in “Set2” for  Y2  is “Set2.1”.  The name of the <Seti> (and <Set2>.) tag has to match with the name of the correspond ing annotation label set specified in the <AnnotationLabel> tag. The annotation labels de fined in this <state> tag are only for specifying the annotation labels for this particular state, 82  4.2. The XML-input file  Figure 4.1: Defining a state of an HMM or pair-HMM in HMMC0NvERTER The figure shows the state defined in the above example tag. This state reads five letters at a time, two from input sequence X and three from input sequence Y. Each of the sequence position are assigned with two annotation labels from the annotation label set “Seti” and “Set2”. In this example, all the five sequence positions are assigned with the same annotation label “Seti .2” from the annotation label set “Set 1”, while different annotation labels from annotation label set “Set2” are assigned to each sequence positions. 
For example, the sequence position Y2 is assigned with “Setl.2” from annotation label set “Seti” and “Set2.2” from annotation label set “Set2”.  83  4.2. The XML-input file the complete annotation label sets for the HMM and the annotation labels defined in each annotation label set are specified in the <AnnotationLabel> tag. The <Annotation.Iabel> tag is an optional tag inside the <model> component which will be explained after describing all the compulsory tags. The attribute of the <label> tag inside the <state> tag is: • idref: This attribute stores the id of an annotation label in a string. The string has to start with the name of the corresponding annotation label set, followed by a  “.“  character and then a number between 0 to the total number of labels defined in this annotation label set. The <Stat&EmissionYrob> tag defines the emission probabilities of a state. There are two ways to define this tag, the simpler way is the following: <Stat&EmissionYrobs GetFrom=” S.2” > The attribute GetFrom specifies the id of a free emission parameter defined in the free emission parameter file or the id of a state in the HMM. If a state is specified, this means the two states share the same set of emission probabilities. The second way to define this tag is to specify how the emission probabilities are derived from free emission parameters. In that case, the <StateEmissionYrob> tag has to be defined in the following way: <StateEmission_Probs SumOver=” 1” Product=” 1”> <SumOver From=”S.3” FromPos=”O 1 3 4” ThisPos=”O 1 4 5”/> <Product From=”S.14” FromPos=”O 1 2 3” ThisPos=”2 3 6 7”/> </StateEmissionYrobs> The attributes of the <StateEmissionYrob> tag are: 84  4.2. The XML-input file • SumOver: This boolean attribute defines if the SumOver function is used  (“  1”) or not  (“  1”) or not  (“0”). The default value is “O”. • Product: This boolean attribute defines if the Product function is used (“0”). The default value is “0”. The concept of these two functions has already been explained in detail in chapter 3, we now describe how to set up the functions. Consider an example where we use the SumOver function to derive the emission probabilities of state S.3 from those of state S.6:  (xi,x 3 P. , 2 ) x  Ps.6(xi,x2,x3,yi,y2,y3)  =  (?Ji ,Y2,Y3)  This could be specified by: <StateEmissionYrobs SumOver=” 1”> <SumOver From=”S.6” FromPos=”O 1 2” ThisPos=”O 1 2”/> </Stat&EmissionYrobs> ThisPos=”O 1 2” and FromPos=”O 1 2” means that the 0-th, 1-th and 2-th character read by S.3 match with the 0-th, 1-th and 2-th character read by S.6 respectively, i.e. that we sum over the 3-rd, 4-th and 5-th character of state S.6 to obtain the emission probabilities  of state S.3. The attributes of the <SumOver> tag are: • From: The id of a state or an emission parameter. • FromPos and ThisPos: FromPos is a vector of positions of the state or emission pa rameter specified in the From attribute. The FromPos vector and the ThisPos vector have to have the same length as they map the sequence positions in the From state to 85  4.2. The XML-input file the sequence position in this state. Here, “0” represents the first character read by the state,” 1” and “2” represents the second and third character respectively. For the Product function, consider an example with states S.3 and S.1 of dimension 3 and 1 respectively where the emission probabilities of state S.3 depend in the following way on those of state S.1:  (xi,x 3 P. , 2 ) x  =  (x 1 P. 
) Pi(xi) Psj(x) 3  (4.1)  The corresponding specification is: <StateEmissionProbs Product=” 1”> <Product From=”S.l” FromPos=”O” ThisPos=”O”/> <Product From=” S.1” FromPos=” 0” ThisPos=” 1” /> <Product From=” S.1” FromPos=” 0” ThisPos=” 2” /> </StateEmissionYrobs> The attributes of the <Product> tag are the same and have the same meaning as those of <SumOver> tag. In the above example, the 0-th, 1-th and 2-th character read by state 5.3  are all mapped to the 0-th character read by state Si. The Product and SumOver functions can be combined, as the following example shows where we have three states 5.3, S.1 and S.4 with dimension 3, 1 and 4 respectively.  (yi,y 3 P. , 2 ) y  =  (  (yi,y 4 (P. , 2 ) xi,x )) P.(y) .  (x1,x2)  The corresponding specification would be: <State.EmissionYrobs SumOver=” 1” Product=” 1”> 86  4.2.  The XML-input file  <SumOver From=”S.4” FromPos=”2 3” ThisPos=”O 1” /> <Product From=” S.1” FromPos=” 0” ThisPos=” 2” /> </StateEmissionProbs> Users can also use the combinations of Product and SumOver with one sequence position involved in more than one operation. The <Stat&EmissionYrob> tag is not required in two cases: (i) The state is a silent state which does not read any symbols from any input sequence. (ii) All the emission probabilities of the HMM are specified explicitly in the free emission parameter file, the numbering of emission probability sets start with “FEP.O”  (“ FEP”  is the id of the set of free emission parameter). Regarding case (ii), if this state (S.14) is not defined with the <State_EmissionYrob> tag, then the emission probabilities of this state will be assigned with free emission parameter “FEP.13” defined in the free emission parameter file (as S.0 is the start state which does not read any letter, and the emission probabilities of S.1 will be assigned with “FEP.O”  ).  This is implemented for convenient of setting up the model if the emission probabilities of the states in the HMM are not pararneterized. In this way, users oniy require to specifies the emission probabilities of the states in order in the free emission parameter files, and no <State.EmissionYrobs> tag is needed to define for any state. [vj <Transitions>  This tag defines the topology of the HMM, i.e. the transitions between  states in the HMM and the corresponding probabilities. Example tag: <Transitions train=” 1”> 87  4.2. The XML-input file  <from idref=” S0’> <to idref=”All” exp=” 1/53”/> </from>  <from idref=” S.3” > <to idref=” S .3” exp=” 0.6” pseudoprob=” 0.2” />  <to idref=” S .53” exp=0.01” /> <from idref=”S.4” train=” 1” > <to idref=” S.3” exp=” FTP. 15* (1 —FTP.2)” /> <to idref=” S.4” exp=” (1—FTP. 15) *(1 —FTP.2)” /> <to idref=”S.53” exp=”FTP.2”/> </from>  </Transitions> The attributes of the <Transitions> tag are: • train: This optional attribute takes the value “0”, “1” or “All”. The default value of  this attribute is “0” which means that none of the transition probabilities is trained. The value train  =“  1” means that some transitions are to be trained. These transitions  have to be matched by setting the train value in the corresponding “from” state to “1”. A train=” All” values implies that all transitions of the HMM are to be trained. The attributes of the <from> tag are: • idref: This compulsory attribute defines the state from which this transition is. The  value of idref has to match the id of the “from” state. 88  4.2. 
The XML-input file • train: This optional boolean attribute defines whether the transition probabilities of the “from state” need to be trained or not. The default value is “0”. The attributes of the <to> tag: • idref: This compulsory attribute specifies the “to” state of the transition. The value of idref has to either match the id of the “to” state, or is set to “All” which means that the “from” state is connected to all other states of the HMM except the start and the end state using the same transition probability. • exp: This compulsory attribute defines the transition probability. It takes a string with either of the following three formats: —  —  Decimal number Arithmetic formula with decimal numbers as operands. The formula accepts op erators:  ‘‘, ‘— ‘, ‘*‘  and ‘/‘ representing addition, subtraction, multiplication and  division respectively. Brackets are also allowed in the expression. —  Arithmetic formula with decimal numbers and free transition parameters id as operands.  • pseudoprob: This optional attribute defines a pseudo-probability to be used for this transition during parameter training. All the five tags described above, i.e. <ModeLType>, <Alphabets>, <EmissionYrobs>, <States> and <Transitions>, are compulsory for specifying an HMM. Two optional tags can be used for setting up special features with HMMCONVERTER.  [vi] <Annotation.Labels>  HMMCONVERTER can use special emissions to integrate  prior information along the input sequences into the sequence decoding and the training 89  4.2. The XML-input file algorithms. Prior probabilities are specified with the name of the annotation labels along the input sequences, those names should match with the name of the annotation labels defined in the states of the HMM. The details of these annotation label sets for the HMM are defined in this <Annotation.iabels> tag. Example tag: <AnnotationLabels> <AnnotationLabel name=” Feature” score=” 1”  >  <label id” Feature.O” name=” Undefined” /> <label id=” Feature. 1” name=” Intergenic” />  <label id=”Feature.6” name=”StopCodon” /> </AnnotationLabel> </AnnotationLabels> The attributes of the <Annotationiabel> tag are: • name: This compulsory attribute defines the name of the annotation label. • score: This optional boolean attribute defines whether the annotation label set accepts a prior probability in [0, 1]  (“  1”) or only discrete values “on” or “off” (“0”). The default  value is “0”. The attributes of the <label> tag are: • id: This compulsory attribute defines the id of the annotation label. The string has to start with “AnnotationLabeLname.”, followed by an integer in the set {0, L  —  where L is the size of the annotation label set. • name: This compulsory attribute defines the name for the label. 90  4.2. The XML-input file  [vii] <Transition_Probs>  This tag defines a set of free transition parameters, the values of  the free transition parameters are defined in a separate input file, we call it the free transition parameter file. Example tag: <Transition_Probs id=” FTP” size=” 19” file=” transition_parametersmouse_human.txt” /> The attributes of the <Transition_Probs> tag are: • id: This compulsory attribute defines the id of this set of free transition parameters. This id is used to find the corresponding values in the free transition parameter file. • size: This compulsory attribute specifies the number of parameters in the free transition parameter file. • file: The file name of the free transition parameter file.  
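As an aside, the exp formulae shown in the <Transitions> example above can be checked easily once the free transition parameters are known. The following small sketch evaluates the three expressions of state S.4 for made-up values of FTP.2 and FTP.15 and verifies that the resulting outgoing probabilities sum to one; the numerical values are illustrative only and would normally come from the free transition parameter file.

  #include <cassert>
  #include <cmath>
  #include <cstdio>

  int main() {
      const double FTP_2  = 0.05;   // made-up value for free parameter FTP.2
      const double FTP_15 = 0.40;   // made-up value for free parameter FTP.15

      // exp attributes of state S.4 in the example above:
      const double t_4_3  = FTP_15 * (1.0 - FTP_2);          // "FTP.15*(1-FTP.2)"
      const double t_4_4  = (1.0 - FTP_15) * (1.0 - FTP_2);  // "(1-FTP.15)*(1-FTP.2)"
      const double t_4_53 = FTP_2;                           // "FTP.2"

      // A valid parameterization keeps the outgoing probabilities of S.4
      // normalized for any free parameter values in [0,1].
      assert(std::fabs((t_4_3 + t_4_4 + t_4_53) - 1.0) < 1e-12);
      std::printf("t(4->3)=%.4f t(4->4)=%.4f t(4->53)=%.4f\n", t_4_3, t_4_4, t_4_53);
      return 0;
  }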
4.2.2  The <sequenceanalysis> component of the XML file  The purpose of this tag is to specify the decoding and training algorithms that are to be used with the model and to specifr the algorithms parameters. This tag consists of two main tags, the <sequencedecoding> and the <parametertraining> tag. We explain these tags in order in this section.  The <sequencedecoding> tag All parameters for sequence decoding algorithms are specified in the this tag. It consists of three sub-tags: • <inputfiles>  91  4.2. The XML-input file • <algorithm> • <outputfiles> <input_files>  This tag specifies the names of different input files for the sequence decoding  algorithm. Example tag: <inputilles SeqFile=” seqi .fasta” AnnFile=” priorinfol .txt” TubeFile=” tube.txt” /> The attributes of the <input_files> tag are: • SeqFile: This compulsory attribute specifies the name of the input sequence file. • AnnFile: This optional attribute specifies the name of the prior information file. This attribute is not required if special emissions are not used. • ThbeFile: This optional attribute specifies the name of the tube file that defines the restricted search space. This tag is only required if algorithm 3 is chosen. <algorithm>  This tag specifies the sequence decoding algorithm to be used and its pa  rameters. Example tag: <algorithm alg=” 1” MaxVolume=” 300000” radius=” 30” /> The attributes of the <algorithm> tag are: • aig: This compulsory attribute defines the decoding algorithm being used, it has four options:  92  4.2. The XML-input file  —  0: A combination of normal Viterbi algorithm [47] and the Hirschberg algorithm [16] will be used depending on the maximum amount of memory specified by the user.  —  1: Derives a tube from  BLASTN  [1] matches and run algorithm 0 inside the resulting  tube. —  2: Derives a tube from TBLASTX[11 matches and run algorithm 0 inside the resulting tube.  —  3: Runs algorithm 0 inside a user-defined tube-defined by an input file.  • MaxVolume: This compulsory attribute defines the maximum  amount of memory in  bytes to be allocated for running any decoding algorithm. • radius: This attribute specifies the radius of the tube generated from matches given by the blast programs, see figure 3.2.2, i.e. this attribute is only required if algorithm 1 or algorithm 2 above are chosen. <outputJiles>  This tag specifies the name of the output files which store the results  generated by the sequence decoding algorithm. Example tag: <outputiiles OutFile=” viterbLresult.txt” /> The attribute of the <outpuLffles> tag is: • OutFile: This compulsory attribute specifies the name of the output file. The file will be created if it does not exist in the output directory, otherwise, the existing file will be overwritten.  93  4.2. The XML-input file HMMC0NvERTER generates at most two output files for storing the result of sequence de  coding algorithm. The output files generated are both in simple text formats. The first output file shows the optimal state path explicitly, and the second one shows the resulting annotation of the sequences. If no annotation label set is defined, the second file is not gener ated. If both output files are generated, the output files name are viterbi_result_1.txt and viterbi_result_2.txt respectively in the above example. The <parametertraining> tag This tag specifies the parameters of the training algorithms. 
The general format is similar to that of the <sequencedecoding> tag, except this tag has an additional sub-tag while specifies the derivation rules for training free transition parameters. For this reason, only the tags and attributes that are different from the <sequence_decoding> tag are explained in detail. The <parametentraining> tag consists of four sub-tags: • <inputilles> • <algorithm> • <train_free_parameters> • <outputfiles> <input_files>  This is the same tag as for the <sequence_decoding> tag, it specifies the  names of the different input files for parameter training. The attribute ThbeFile in this <inputfiles> tag is different from that in the <sequencedecoding> tag.  In addition to  specifying the name of the user-defined tube file, it also accepts the value “Blastn” and TBlastX”, which means the tubes for the parameter training would be generated from the matches provided by the specified BLAST program. 94  4.2. The XML-input file <algorithm>  The <algorithm> tag for parameter training is similar to that for sequence  decoding. We only describe the attributes that are different from those for sequence decoding: Example tag: <algorithm alg=” 1” MaxVolume=” 100000” Maxiter=” 5” threshold=” 0.01” SamplePaths=” 10” /> The attributes for the <algorithm> tag are: • aig: This attribute specifies which training algorithm is to be used, it accepts two values: —  —  —  0: The linear memory Baum-Welch aigorithm[9, 35]. 1: The linear memory training using sampled state paths 2: The linear memory Viterbi training [11]  • threshold: This attribute specifies a threshold score for terminating the Baum-Welch training algorithm. Its default value is 0, the algorithm will be terminated if the average log-odd ratio in the current iteration is less than this threshold value. Note that this attribute does not apply to training algorithm 1 and 2. • Maxiter: This attribute specifies the maximum number of iterations for running the training algorithm, the default value is 10. • SamplePaths: This attribute specifies the number of state paths that are to be sampled for every training sequence and every iteration in algorithm 1. The training algorithm stops as soon as the first of the two convergence criteria are met. parameter_training  This tag is needed for training of the free transition parameters. As  discussed in the previous section, the user has to specify rules for deriving free transition parameters from transition probabilities. We use the same example as in chapter 3 to illustrate 95  4.2. The XML-input file  how we can set up “Group transition probabilities” with this tag. Take a look at figure 4.2.2 again, we want to to train the “PhaseO” parameter by counting the transitions 3 and 3  —  —*  14, 3  —*  17  20, and then normalizing by all transitions from state 3 to any of the states 14— 22.  The following example shows how to specify the above “Group transition probabilities”: <Parameters...training> <FreeTransitionParameters> <FTP idref=” FTP.0” exp=” GTP. 1” />  <FTP idref=” FTP. 2” exp=” GTP. 2+GTP.3” /> </FreeTransitionParameters> <GroupTransitions id=” GTP”>  <GTP id=”GTP.l”> <from idref=” S.3”> <to idref=”S.14”> <to idref=” S.17”> <to idref=”S.20”> </from> <Overfrom idref=” S. 14”> <Overto idref=” All”> </Overfrom> <Overfrom idref=” S.17”> <Overto idref=” All”> </Overfrom>  96  4.2. 
The XML-input file  <Overfrom idref=” S.20”> <Overto idref=” All”> </Overfrom> </GTP>  </GroupTransitions> </Parameters_training> In the above example, GTP stands for “Group transition probabilities”. For example, the parameter GTP.1 specifies the group of transitions for training PhaseO. In the above ex ample, another free transition parameter FTP.2 is derived by adding two group transition probabilities. One important point to mention here is that in order to train a free transition parameter, the user has to specify this in the tag <FreeTransitionParameters>. Furthermore, an appropriate expression, which should be an arithmetic formula using group transition prob abilities and numerical values as operands should be given. In the above example, FTP.1 is not specified so the parameter is not being included in the parameter training. No rule is needed to specify the training of the free emission parameters, as discussed previously. The counts from appropriate states for an emission parameter can be determined within the parameter training process. For a more detailed explanation of this strategy, please refer to chapter 3. output_files  This tag specifies the output file for storing the result of parameter training.  At most three output files are generated. Example tag: <outputiiles XMLFi1e=” trainedHMM.xml” 97  4.2. The XML-input file  match 5’ splice site Phase 0  Phase 1 Phase 2  (•) () ()  ernitx 5’ splice site match exon  PhaseD  Phase 1 Phase 2  (E) () ()  y 5’ splice site Phase 0 Phase 1  Phase 2  () () ()  Figure 4.2: A part of the pair-HMM of DOUBLESCAN In order to train the free transition parameter PhaseO with id FTP.O, all transitions from state 3 to any state presenting a splice site of phase 0 have to be counted, i.e. transitions 3 —* 14, 3 —* 17 and 3 —* 20. The group transition GTP.1 is set up correspondingly.  98  4.3. Other text input files  TProbFile=” trained_transition.txt” EProbFile=” trained_emission.txt” /> The attributes for the <outpuLfiles> tag are: • XMLFi1e: This attribute specifies the name of the xml output file for describing the HMM after the training process. i.e. if transition probabilities are trained, the updated values will appear in this output XML file. This is convenient for users to compare the HMM before and after parameter training and to set up to new HMM for data analyses using one of the decoding algorithms. • TProbFile: This attribute specifies the name of output file that stores the trained free transition parameters. This file has the same format as that of the input free transition parameter file. • EProbFile: This attribute specifies the name of output file for storing the trained emis sion parameters.  This file has the same format as that of the input free emission  parameter file.  4.3  Other text input files  HMMCONVERTER takes several fiat text files as input for setting up several features of an HMM. In this section, we describe the format of these text input files in detail.  4.3.1  The input sequence file  This file stores the input sequences for decoding or training algorithms in a fasta-like format (Note that HMMCONVERTER is not limited to Bioinformatics applications). All sequences  99  4.3. Other text input files in this input sequence file are considered for the specified algorithm. 
The following is an example of an input sequence file: >Mm.X13235.5  1—637  gggaatgaagttttt ctgcaggatttaaatgtggtctttaagagacaccgcatgcaaaga atagctggggcttgctagc caatgaaaacatt cagatt ccaatgacgcatcctttttt Ct ccacccccttccaagacccggattcggaaaccccgcctaacgctctagttttcaaccagg t ccgcagaaggc ctatttaaagggacgattgctgtct ccctgctgtcataaccatgtctg gacgtggcaagggtggtaaaggccttgggaaaggcggcgctaagcgccaccgtaaggttc tccgcgataacatccagggcatcaccaagcctgccatccgccgcctggcccggcgcgggg gagtgaagcgcatct C cggc ctcat ctacgaggagacccgcggtgtgctgaaggtgtt cc tggagaacgtgatccgcgacgccgtcacctacacggagcacgccaagcgcaagaccgtca ccgccatggacgtggtctacgcgctcaagcgccagggccgcactctctacggattcggcg gttaatcgactaacaaacgattttccactgtcaacaaaaggcccttttcagggccaccca caaattcctagaaggagttgtt cacttac cgaagctt >Hs M16707 H4F2 .8 .  .  85—1098  ccgtcctccttcggcgaagatccctggcgcgcgtccttgaggtcgccttcggtgttgacc t catcgtcggaacggcgctt c ctgaagctttatataagcacggct ctgaat ccgct cgt c ggattaaatcctgcgctggcgtcctgccagtctctcgctccatttgctcttcctgaggct ccctccagagacctttcccttagcctcagtgcgaatgcttccgggcgtcctcagaaccag agcacagccaaagccactacagaatccggaagcccggttgggatctgaattctcccgggg accgttgcgtaggcgttaaaaaaaaaaaagagtgagagggacctgagcagagtggaggag gagggagaggaaaacagaaaagaaatgacgaaatgt cgagagggcggggacaattgagaa cgcttcccgccggcgcgctttcggttttcaatctggtccgatactcttgtatatcagggg aagacggtgct cgc cttgacagaagctgt ctatcgggct ccagcggt catgt ccggcaga  100  4.3. Other text input files ggaaagggcggaaaaggcttaggcaaagggggcgctaagcgccaccgcaaggt cttgaga gacaacatt cagggcatcaccaagcctgc catt cggcgtctagct cggcgtggcggcgtt aagcggatctctggc ctcatttacgaggagacccgcggtgtgctgaaagtgtt cttggag aatgtgattcgggacgcagt cac ctacac cgagcacgccaagcgcaagaccgt cacagcc atggatgtggtgtacgcgctcaagcgccaggggagaaccct ctacggcttcggaggctag gcgccgctccagctttgcacgtttcgatcccaaaggccctttttgggccgaccacttgct cat cctgaggagttggacacttgactgcgtaaagtgcaacagtaacgatgttggaaggta actttggcagtggggcgacaat cggat ctgaagttaacggaaagacataac cgc  If a pair-HMM is used for an analysis, the entries in the input sequence file will be interpreted and analyzed in pairs, i.e. the first two sequence are regarded as a pair, and then the third and the fourth, etc. The header line has to start with the character  “>“,  followed by the name of  the sequence (a string without breaks) and the start and end position of the sequence below that header line, the start position always has to be smaller or equal to the end position and both numbers have to be integers. The name and the positions are separated by space or tab character. Other information about the sequence can be put on the header line after the range, which is for users reference only.  4.3.2  The free emission parameter file  This fiat text format file is required for setting up the emission probabilities of an HMM. We will call a grouped set of emission probabilities which defines the emission probabilities for a state in the HMM a free emission parameter, these free emission parameters are all stored in this file. We specify a free emission parameter with a header line with the following format:  101  4.3. Other text input files FEP.1  Stopexon  6  train  The first three columns contain the id, the name of the emission parameter and the dimension of the parameter, respectively, the forth column tells whether this parameter is to be included in the training process. The four columns are separated by a space or a tab character. The id has to be identical to the one that was defined in the XML-input file, the name of the emission parameter is optional and defaults to “Not defined”. 
Similarly, the dimension specified in the header line has to be consistent with the dimensions of the emission proba bilities follow the header line. If the parameter set does not needed to be trained, then the word “train” should be omitted from the header line. The program can correctly interpret a header line with 3 entries by recognizing the position of the key word “train” and the position of the dimension parameter, which is the only parameter with numerical value. Below the header line, the emission probabilities are defined using the following format: taataa  0.5082690  0.10  taatag  0.0632332  0.05  taatga  0.0943092  0.05  tagtaa  0.0632332  0.05  tagtag  0.0607797  0.05  tagtga  0.0160113  0.01  tgataa  0.0943092  0.05  tgatag  0.0160113  0.01  tgatga  0.0838441  0.05  The first column is the string of characters that corresponds to the emission probability in 102  4.3. Other text input files the second column. The third column is the value of an optional pseudo-probability to be used in training which defaults to 0 if it is not specified. The three columns are separated by a space or a tab character. Emission probabilities of unspecified character combinations are set to zero. We can use this feature to specify the emission probabilities of some states in a very compact way. For example, if a Match.Start_Codon state reads only “atgatg” with emission probability larger than zero, we can specify its emission probabilities by: FEP.1 atgatg  MatchStartCodon  6  1  After specifying one free emission parameter, there needs to be an empty line before specifying another parameter.  4.3.3  The free transition parameter file  This ifie defines the free transition parameters. This input file is not required if the transition probabilities are not parameterized. Each line in the file defines a free transition parameter using the following format: FTP.0  PhaseO  0.4387  0.3333  FTP.1  Phasel  0.3870  0.3333  FTP.16  Matchexonto.stopexon  0.003  0.01  FTP. 17  Matchexonto_emitexon.x  0.01  FTP. 18  Match_exon_to_emit_exon_y  0.01  103  4.3. Other text input files The four columns contain the id, the name, the value and the pseudo-probability of the transition parameter respectively. The id has to be identical to the one that was defined in the XML-input file. The name is an optional attribute where default value is “Not defined”. The default value for the pseudo-probabilities is 0. In this free transition parameter file, each line represents one free transition parameter, there is no need to have an empty line between two parameter entries.  4.3.4  The tube file  This input file stores the user-defined restricted search space in the sequence dimension(s) to be used for sequence decoding or parameter training. Note that the banding feature only applies to pair-HMMs. The tubes for all input sequence pairs have to be specified in a single tube file. A header line is required for specifying a tube for each sequence pair, using the following format: >Mm.X13235.5&Hs.X60484.H4/e.6 Each header line starts with the character  “>“  radius:50 followed by the name of the sequence pair to  which the tube will be applied to. The name of the two sequences are connected with the char acter  ‘&‘,  i.e. in the above example, the names of the two input sequences are Mm.X13235.5  and Hs.X60484.H4/2.6. The radius is defined in the next field of the header line if the specified coordinates below represent matches. 
The coordinates for specifying the tube itself are given in the following format: 300  400  450  550  600  650  800  750  The four columns of each line are “xstart”, “ystart”, “xend” and “yend”, respectively, which together define two points in the X-Y plane: (xstart,ystart) and (xend,yend). Depending on 104  4.3. Other text input files the desired interpretation, each line thus either specifies a rectangular area (see left of figure 4.3.4) or a match (see right of figure 4.3.4). Figure 4.3.4 also shows two possible ways of defining a tube from the same set of coordinates. Note that if radius is not defined in the header line, the coordinates are interpreted as rectangles rather than matches.  (80 .750)  [-I  (450,554)  7?  a  (30 .400)  (60$,650)  H  Figure 4.3: Specifying a tube explicitly in a fiat text file This figure has already been introduced in chapter 3. The tube in the right part is constructed by interpreting the coordinates as matches (with radius around them 50), while the tube on the left is constructed by interpreting the coordinates as rectangles.  4.3.5  The prior information file  This input file stores prior information along the input sequences to be used with the special emissions feature of HMMC0NvERTER. Each line of this input file corresponds to one piece of information for one sequence interval along one input sequence. It is specified using the following format:  SeqX  Sourcel  100  150  Feature:Exon:Intron  0.3:0.7 105  4.3.  Other text input files  Phase:NoPhase:PhaseO The above example actually corresponds to a single line in the prior information file, we only split it into two lines for better readability. Each line contains one piece of prior information for one sequence interval, it consists of several columns separated by a space or a tab character. The first four columns specify: • SeqX: The name of the sequence to which this piece of information applies. • Sourcel: The source of this piece of information, e.g. a database name or the name of a human annotation expert. • 100: The start of the sequence interval to which this information applies. • 150: The end of the sequence interval to which this information applies. The prior information itself is defined starting in the fifth column. The above example defines two types of prior information, information of type “Feature” and of type “Phase”: (i) Feature:Exon:Intron  0.3:0.7  (ii) Phase:NoPhase:Phase0 For each type of prior information, the first entry specifies the annotation label set and the annotation labels, e.g. “Feature:Exon:Intron”, and following entry specifies the corresponding prior probabilities, e.g. 0.3 for “Exon” and 0.7 for “Intron”. Users can define any number of types of prior information within a line as long as they concern the same sequence interval. We now consider prior information for annotation label set “Feature” in (i) above. It means that sequence interval [100, 150j has a prior probability of 0.3 for being “Exon” and a prior probability of 0.7 for being “Intron”. The prior information for annotation label set 106  4.4.  The output files of HMMCONvERTER  “Phase” in (ii) has the same format, but the prior probability is  ‘.‘  instead of a numerical  value, because the annotation label set “Phase” is an on/off label set. In this example it means that the sequence positions in the interval [100, 150] has value “NoPhase” or “PhaseO”. Finally, suppose there was a third annotation label set “Stat&Type” defined for the HMM. 
As no information for this label set is defined for sequence interval [100, 150], there the nominal emission will not be biased by this label set. The concept of special emissions in HMMC0NVERTER  and how to specify valid prior probabilities are discussed in detail in chapter 3.  The output files of HMMC0NvERTER  4.4  HMMCONVERTER provides two simple output files for storing the results of sequence decod  ing. The first file shows the sequence of states in the optimal state path, and the second optional file shows the resulting annotation of the input sequences.  Figure 4.4 shows a 5-states simple pair-HMM. This pair-HMM consists of two annotation label sets, “Set 1”  =  {“align” ,“gap”} and “Set2”= {“match” ,“emit...X” ,“emit_Y”}. Each state  (except the start state and the end state) of the pair-HMM is assigned with one annotation label from each of the annotation label set as shown in the figure. In this example, two output files are generated after running the Viterbi algorithm on a pair of input sequences SeqX and SeqY of length 10 and 15 respectively, the first file looks like: SeqX&SegY:  o  o  start  xdim= 0  ydim= 0  1  3  EmitX  xdim= 0  ydim= 1  SetLyO=gap  Set2..yO=emitY  2  3  EmitX  xdim= 0  ydim= 1  SetLyO=gap  Set2jO=emitY  3  3  EmitX  xdim= 0  ydim= 1  SetLyO=gap  Set2jO=emitY  4  3  EmitX  xdim= 0  ydim= 1  Setl..yO=gap  Set2..yO=emitY  5  3  Emit..Y  xdim= 0  ydim= 1  SetLyO=gap  Set2yO=emitY  6  1  Match  xdim= 1  ydim= 1  SetLxO=align  SetLyO=align  Set2xO=match  Set2yO=match  107  4.4.  The output files of HMMC0NvERTER  0 Figure 4.4: A simple 5-state pair-HMM A 5-state simple pair-HMM which contains two annotation label sets “Set 1” and “Set2”. Each sequence position of a state in the pair-HMM is assigned with one annotation label from each of the annotation label sets. For example, the sequence position of X and Y in the Match state is assigned with label “align” and “match” from annotation label set “Seti”, and annotation label “Set2” respectively.  108  4.4.  The output files of HMMCONvERTER  7  2  Emit..X  xdim= 1  ydim 0  SetLxO=gap  Set2xO=emitX  8  1  Match  xdim= 1  ydim= 1  SetLxO=align  Setl_yO=align  Set2..xO=match  Set2_yO=match  9  1  Match  xdim= 1  ydim= 1  SetLxO=align  Setl.y0=align  Set2xO=match  Set2jO=match  10  2  Emit..X  xdim= 1  ydim= 0  Set LxO=gap  Set2.x0=emitX  11  2  Em1LX  xdim= 1  ydim= 0  SetLxO=gap  SetixO=emitX  ydim= 1  SetLxO=align  SetLyO=align  Set2.jc0match  Set2..yO=match  12  1  Match  xdim= 1  13  1  Match  xdim= 1  ydim= 1  SetLxO=align  Setl..yO=align  Set2xO=match  Set2_yO=match  14  1  Match  xdim= 1  ydim 1  SetLxO=align  SetLyO=align  Set2..xO=match  Set2.y0=match  15  3  EmitX  xdim= 0  ydim= 1  SetLy0=gap  Set2jO=emitY  16  3  EmitX xdim= 0  ydim= 1  SetLyO=gap  Set2..yO=emitY  17  3  EmitX  xdim= 0  ydim= 1  SetLyO=gap  Set2.yO=emitY  SetLx0=a1ign  Set1_y0=a1ign  Set2..xO=match  Set2yO=match  18  1  Match  xdim= 1  ydim= 1  19  3  end  xdim= 0  ydim= 0  Let the optimal state path S  :=  (So, S ,• ,SN’), the first output file then contains N’+ 1 2  lines, one line for each state in the state path, the first line always represents the start state and the last line always represents the end state. The first column shows the line number, which is also the index in the state path. The second column and third column show the number and the name of the state in the HMM. The fourth and fifth column specifies the number of characters read from sequence x and the number of characters read from sequence y respectively. 
Starting from the fifth column, each column contains the annotation label for the corresponding sequence position read by the state. For example, the position 6 (the 7-th line in the output file) of the optimal state path is a Match state, which the sequence position X and sequence position Y of the state are both assigned with annotation label “align” from annotation label set “Set 1” and “match” from annotation label set “Set2”. The second format shows the annotation of the sequence. If no annotation label set is defined for the HMM, this output file will not be generated. The following is the second output file of the same state path for the same pair of input sequences:  SeqX  SimplePair-HMM  1  1  Setl=align  Set2=match 109  4.4.  The output files of HMMCONVERTER  SeqX  SimplePair-HMM  1  1  Set 1=align  Set2=match  SeqX  SimplePair-HMM  2  2  Setl=gap  Set2=emitX  SeqX  SimplePair-HMM  3  4  Setl=align  Set2=:match  SeqX  SimplePair-HMM  5  6  Setl=gap  Set2=emitX  SeqX  SimplePair-HMM  7  10  Setl=align  Set2=match  SeqY  SimplePair-HMM  1  5  Set 1=gap  Set2=emitY  SeqY  SimplePair-HMM  6  11  Setl=align  Set2=zmatch  SeqY  SimplePair-HMM  12  14  Setl=gap  Set2=emitY  SeqY  SimplePair-HMM  15  15  Setl=align  Set2=match  Each line presents a contiguous sequence interval that is assigned the same annotation, where all the annotation labels have the same values. Regarding the format, the first column is the sequence name, second column is the name of the HMM specified by users in the XML-input file. The third and forth columns indicate the interval of the sequence positions that have the same annotations, the corresponding annotation labels are showed after that. If the HMM is used with any of the parameter training algorithms, HMMC0NvERTER writes three output files to store the results of the parameter training, one XML-file and two fiat text files. The XML-output file is the same as the XML-input file that describes the HMM, but contains the trained transition probabilities. The two flat text files are the files with the trained free transition parameters and the files with the trained free emission parameters. They have the same format as the corresponding input files, but contain the trained parameter values.  110  4.5. Examples  4.5  Examples  The HMMCONVERTER software package comprises several examples with all the input and output files. These examples illustrate the range of features supported by HMMC0NvERTER.  CpG-Island prediction  This simple HMM is introduced in [11] for predicting the CpG  Islands in a potentially long genomic DNA sequence. This simple example shows how to specify a simple HMM with annotation labels. Dishonest Casino  The example of a dishonest casino HMM is introduced in [11], a vari  ation of this HMM using special emissions is also included. The special emissions version of the HMM can be interpreted as the expert knowledge of a casino-insider who provides additional information on when the loaded rather than the fair dice was used. This simple example illustrates how to use special emissions. p53-transcription site prediction  This HMM can be used to predict p53 transcription  factor binding sites given genomic DNA input sequences [17].  This example shows how  parameter training with HMMCONVERTER works. We use the same training set and the test set as MAMOT [43j. Simple pair-HMM  This is a simple pair-HMM as shown in 4.4. This simple example illus  trates how to specify a pair-HMM, and how to train its parameters. 
This example comprises an artificially created training set.  111  Chapter 5  Implementation of HMMConverter 5.1  Introduction  HMMCONVERTER is an executable file in C++ that is executed from the command line.  Users can start it using the following conunand: HMMCONVERTER  inputdirectory/  outputdirectory/  HMM .xml  where HMMCONVERTER is the name of the executable file and HMM.xml is the XML-input file specifying the 11MM and the analyses to be performed. Note that the name of the XML file is not restricted to HMM.xml. All input files including this XML input file must be contained in the input_directory. Similarly, all output files will be written to the output_directory. In order to construct an 11MM with HMMCONVERTER, the user only has to supply the corresponding input files, no compilation is needed. We provide the source code files including a Makefile with the software package for users’ convenience. HMMCONVERTER is based on the original implementation of DOUBLEBUILD [33j which corresponds to a set of C++ classes for constructing pair-HMMs. These classes provide a good framework for constructing HMMC0NvERTER, but as the code has been designed for the gene prediction pair-HMMs underlying DOUBLESCAN and PROJECTOR, the original code had to be modified. In addition, new classes were added for special features, and some re dundancies were removed. 112  5.2. Converting XML into C++ code  In this chapter, we first give a very brief introduction of the XML language and the connection between HMMCONVERTER and the XML input file. We then describe several important classes, and finally focus on discussing the implementations of the algorithms for sequence decoding and parameter training. The interplay between the classes is also described in detail and their structure explained.  5.2  Converting XML into C++ code  We chose XML for describing the 11MM because it is a popular, easy to understand and powerful language. XML is very popular for structuring, storing and exchanging information, it is an HTML-like language which uses tags or markup language. There is no restriction on the XML tags as they can all be defined by users. This provides great flexibility for representing data structures. Because of these reasons, XML has been used widely in many Bioinformatics applications, including genomic annotations, protein specification, etc. HMMC0NvERTER uses the TINYXML [28] package to convert the XML-input file into C++ code. Most of the attributes of the 11MM described in the XML file are first read into  the modeLparameter class. 5.2.1  The model_parameters class  A model_parameters object stores almost all the attributes defined in the <model> tag of the XML file, On the first step of building the specified HMM with HMMCONVERTER, all the information about the HMM from the XML-input file, except its topological structure and parameters, is read and stored in this object. In other stages of constructing the HMM and for invoking the different algorithms, this object interacts with other classes by supplying  113  5.2. Converting XML into C++ code information about the HMM, such as the name of an annotation label or the name of a state. In the following, we list some private variables in the model_parameters class for storing information about the specified HMM: • char* Model_Name: The name of the HMM. • boolean pair: A boolean variable indicates whether it is an HMM or a pair-HMM.  • mt  Total_Number_of_AnnotationLabels: The number of annotation label sets.  
• Label* Annotation_Label: The array of annotation label sets and the corresponding labels in each annotation label set. Label is a structure variable with four attributes which describes one annotation label set: —  mt  size: The number of annotation labels in the annotation label set.  —  char* tname: The name of the annotation label set.  —  char** name: The array of names of each label in the set.  —  boolean score: This boolean variable indicates whether this annotation label set can be used with prior probabilities (TRUE) or only with on/off values (False).  • Letter Alphabet: The alphabet set that can be read from the HMM. Letter is a structure variable with three attributes: —  —  mt  size: The number of alphabets in the alphabet set.  boolean cases: This boolean variable indicates whether the HMM is case sensitive or not when reading characters from the input sequences.  —  char* name: The alphabet in the alphabet set, e.g. if the alphabet set of the HMM is {A, C, G, T}, then the value of this variable name is “ACGT”.  114  5.2. Converting XML into C++ code  • mt  Number_of_States: The number of states in the 11MM, including the start and  end state. • boolean SpecialEmit: The boolean variable indicates whether the special emissions are use (TRUE) or not (FALSE). 5.2.2  Other classes  The Hmm class, the Hmm_State class and the Sequence class are three other key classes in HMMCONVERTER which were derived from the Pairhmm class, the Pairhmm...State class and the Sequence class in  DOUBLEBUILD  respectively. These classes contain most of  the functional components of HMMC0NvERTER. We are going to describe each of them in more detail in the following. The Hmm class An Hmm object basically knows everything about a specified HMM. It contains an array  of Hmm_State objects and the sequence decoding and parameter training algorithms are implemented as public functions. The Hmm class consists of the following private variables: • boolean mirrored: This boolean variable indicates whether this is a “mirror” HMM (TRUE) or not (FALSE). The “mirror model” is needed in the Hirschberg algorithm.  • boolean pair: This boolean variable indicates whether this is a pair-HMM or not.  • mt  number_of_states: The number of states in the 11MM.  • mt  alphabet: The number of symbols in the alphabet.  • Pairhmm_State* model: An array with pointers to each state of the HMM. 115  5.2. Converting XML into C++ code The Hmm object oniy knows the size of the alphabet but not the symbols that it con tains. When algorithms are used, this class gets the corresponding information from the model_parameters object used when necessary. The Sequence class The Sequence class stores all the information about one input sequence. If special emissions are used, it also stores the prior information for the different annotation label sets for all positions in this input sequence, they are read from the prior information file into this object. A mapping and scoring function is used to calculate the special emission probabilities for each state according to the position-specific prior probabilities, and these probabilities are finally stored in the corresponding Hmm_State object. The most important private variables in the Sequence class are: • int* sequence: The sequence is represented by an integer, the corresponding characters can be derived from the modelparameters class.  • mt  start_position: The start position of the sequence as indicated in the header line  of the input file with sequences.  
The variables listed above are always used when creating a Sequence object. If special emissions are switched on, three additional arrays are needed to store the corresponding information. The class "array" used to define some of the variables listed below is provided by DOUBLEBUILD for constructing multi-dimensional arrays.

• boolean**** annotation_labels: This four-dimensional boolean array indicates which types of prior information exist for the sequence. The four indices of this array are [source, sequence position, annotation label set, annotation label], respectively. A value of "on" (TRUE) indicates that the corresponding label exists for that particular position and that particular source. The source could be a database name or the name of a human annotation expert.

• double**** annotation_label_scores: This four-dimensional array stores the prior information for the sequence. The four indices of this array are [source, sequence position, annotation label set, annotation label], respectively. The corresponding entry contains the corresponding prior probability.

• array<Score> scores_for_special_emissions: This two-dimensional array stores the scores of the special emissions. The two indices are [index of a state with special emissions, sequence position], respectively.

The Hmm_State class

The Hmm_State objects form the building blocks of any HMM. Every Hmm_State object contains the complete implementation of one state of an HMM as well as limited information about the states to which this state is connected by transitions. Some of the private variables are only used for special emissions. We first describe the most important private variables and then the variables for special emissions.

• boolean mirrored: This boolean variable indicates whether this is a state in a "mirror" model (TRUE) or not (FALSE).

• boolean special_emission: This boolean variable indicates whether this state uses special emissions (TRUE) or not (FALSE).

• int number_of_state: The number of this state in the HMM.

• int number_of_states: The number of states of the HMM.

• int number_of_letters_to_read: The total number of characters read by this state.

• int number_of_letters_to_read_x: The number of characters read by this state from sequence x.

• int number_of_letters_to_read_y: The number of characters read by this state from sequence y.

• int number_of_previous_states: The number of states that have a transition to this state.

• array<int> numbers_of_previous_states: Array of the numbers of the states that have a transition to this state.

• int number_of_next_states: The number of states that this state has a transition to.

• array<int> numbers_of_next_states: Array of the numbers of the states that this state has a transition to.

• array<char*> transition_probs_expressions: Array of strings storing the derivation rules for deriving the transition probabilities. The index of this array corresponds to the state number of the respective next state.

• array<Prob> transition_probs: Array of transition probabilities of this state. The index of this array corresponds to the state number of the respective next state.
• array<Score> transition_scores: Array of transition scores of this state. The index of this array corresponds to the state number of the respective next state.

• array<Prob> emission_probs: Array of emission probabilities of this state. The index of this array corresponds to the combination of letters read by this state. For example, if the alphabet consists of 4 letters and this state reads 3 letters at a time, then this array has length 4^3 = 64. The different combinations of letters are converted into an index for this array (a short code sketch of this conversion is given at the end of section 5.3.1).

• array<Score> emission_scores: Array of emission scores of this state. The index of this array has the same meaning as that of the array emission_probs.

The data types Prob and Score are used for variables representing probabilities and logarithms of probabilities, respectively; a further explanation of Score is given in the last section of this chapter. The following variables are only used to store information about special emissions:

• int number_of_annotation_labels: The number of annotation label sets defined for this state.

• array<int>* annotation_labels: Array of the annotation labels of all the annotation label sets and for each letter read by the state. The first index indicates the annotation label set, and the second index indicates the position of the letter that this annotation label is defined for. Each value in this two-dimensional array corresponds to an annotation label.

• char** annotation_labels_type: Array of the names of the annotation label sets defined for this state. The index indicates the number of the annotation label set, and the value is the name of the corresponding annotation label set.

• int number_of_child_state_x: The number of the state that stores the prior information for input sequence x for this state. This variable is only used for states that read characters from input sequence x; otherwise it is set to 0.

• int number_of_child_state_y: The number of the state that stores the prior information for input sequence y for this state. This variable is only used for states that read characters from input sequence y; otherwise it is set to 0.

5.3 Sequence decoding and parameter training algorithms

HMMCONVERTER supports several prediction and training algorithms, among them the Hirschberg algorithm [16] and several linear memory algorithms for parameter training, two of which are novel. In this section, we give the implementation details for these algorithms.

5.3.1 The Viterbi and Hirschberg algorithms

The Viterbi [47] and Hirschberg [16] algorithms are implemented based on the corresponding implementations in DOUBLEBUILD. There are several issues worth mentioning in this section.

Banding  The sequence decoding algorithms are implemented using a linearized dynamic programming table, where each entry corresponds to an object containing a score and a back-tracking pointer. The index into this table is calculated in a very time-efficient way using bit-shifting, and the table itself is stored in the most memory-efficient way. If any decoding or training algorithm is used with a tube, only matrix elements inside this tube are allocated memory.
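The following sketch illustrates the two kinds of index computations mentioned above: converting a combination of letters into an index of the emission array, and computing a linearized index into the dynamic programming table with bit-shifting. Both functions are hypothetical simplifications; the actual index layout used by HMMCONVERTER, for example how many bits are reserved for each dimension and how tube-restricted entries are packed, may differ.

#include <vector>

// Convert the letters read by a state into one index of the emission array.
// With an alphabet of size 4 and 3 letters read at a time, the indices range
// over 0 .. 4^3 - 1 = 63, matching the array length described above.
int emission_index(const std::vector<int>& letters, int alphabet_size) {
    int index = 0;
    for (int letter : letters) {
        index = index * alphabet_size + letter;
    }
    return index;
}

// Compute a linearized index into a dynamic programming table over
// (position in x, position in y, state) using bit-shifting. Here y_bits and
// state_bits must be large enough to hold the length of sequence y and the
// number of states; the real implementation may order the dimensions
// differently and only allocates the entries that lie inside the tube.
inline long linear_dp_index(long x, long y, long state, int y_bits, int state_bits) {
    return (((x << y_bits) | y) << state_bits) | state;
}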
Numerical difficulties  In the Viterbi algorithm, the entries of the Viterbi matrix are filled according to the following recurrence relation:

v_i(k) = e_i(k) · max_j { v_j(k-1) · t_{j,i} }

The (i, k) entry of the Viterbi matrix, for sequence position k and state i, stores the probability of the state path with the highest overall probability that ends in state i at that sequence position. For high values of k (i.e. long sequences) or low emission and transition probabilities, this number can become very small and cause underflow errors. A simple way to deal with this numerical difficulty is to perform the Viterbi algorithm in logarithm space. The recurrence relation of the Viterbi algorithm then becomes:

log(v_i(k)) = log(e_i(k)) + max_j { log(v_j(k-1)) + log(t_{j,i}) }

This strategy works perfectly for the Viterbi algorithm as it avoids the underflow problem, and as the logarithm is a strictly monotonic function, it preserves the choice of the maximizing state. For this purpose, transition and emission scores are pre-computed from the corresponding probabilities and stored in the corresponding arrays of the Hmm_State object, so the scores need not be computed repeatedly during the Viterbi algorithm.

Special emissions  In any decoding algorithm, the emission probability of a state is obtained via the public function get_emission_score in the Hmm_State class. This function returns the appropriate emission score, i.e. it returns the nominal emission score if special emissions are not used or if no prior information is supplied for the particular sequence position, and it returns the modified special emission score in the other cases.

5.3.2 The parameter training algorithms

All training algorithms can also be combined with banding and special emissions, and the corresponding implementation of these two features is similar to the one discussed above. But as these algorithms already correspond to linear-memory versions, there is no need to linearize the dynamic programming table.

On the other hand, any training algorithm can potentially generate underflow errors because many probabilities are multiplied (and added). The situation for training is a bit more complicated than for the Viterbi algorithm: for all parameter training algorithms, the calculation of the forward values involves the summation of probabilities. If we perform the algorithm in logarithm space, we have to compute values of the form log(Σ_i p_i). In HMMCONVERTER, we use the following identity to do this:

log(Σ_i p_i) = log(p_0) + log(1 + Σ_{i>0} exp(log(p_i) - log(p_0)))

This is convenient and efficient to implement because the scores of all transition and emission parameters are already stored in the corresponding state objects. There also exist alternative strategies to address the underflow problem, such as rescaling or creating a specific data type.
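As a concrete illustration of this identity, the following is a small, self-contained sketch of a log-space summation as it could be used when accumulating forward values. It is a simplified stand-in: the function names and the use of double instead of the Score data type are assumptions, and the actual arithmetic in HMMCONVERTER may be organized differently.

#include <cmath>
#include <limits>
#include <utility>
#include <vector>

// Return log(exp(log_a) + exp(log_b)) without leaving logarithm space.
double log_add(double log_a, double log_b) {
    const double neg_inf = -std::numeric_limits<double>::infinity();
    if (log_a == neg_inf) return log_b;            // adding a zero probability
    if (log_b == neg_inf) return log_a;
    if (log_a < log_b) std::swap(log_a, log_b);    // keep the larger term outside
    return log_a + std::log1p(std::exp(log_b - log_a));
}

// Sum a set of probabilities given as log-probabilities, e.g. the terms
// v_j(k-1) * t_{j,i} * e_i(k) of a forward recursion.
double log_sum(const std::vector<double>& log_terms) {
    double result = -std::numeric_limits<double>::infinity();
    for (double term : log_terms) result = log_add(result, term);
    return result;
}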
The TransitionProb and EmissionProb objects  The TransitionProb class and the EmissionProb class are implemented for the convenience of the parameter training algorithms.

A TransitionProb object stores the values of the free transition parameters with the corresponding pseudo-counts, as well as the group transitions defined for training the free transition parameters. It also stores the inverse derivation rules for deriving the free parameters from the exact transitions. If no free transition parameter is defined, this class serves as temporary storage for the transition probabilities during the training process. It also stores the pseudo-probabilities for the transitions.

An EmissionProb object stores the free emission parameters and the corresponding pseudo-probabilities. Hmm_State objects get their emission probabilities either directly from, or derived from, these parameters. A function that automatically determines the states involved in training a particular emission parameter is included as a public function in this class, so the user only needs to specify which emission parameters to train.

Chapter 6

Annotaid

6.1 Introduction and Motivation

HMMs are widely used in Bioinformatics. One important area of research where HMMs are used is gene prediction. We have already discussed in chapter 1 many variants of HMMs that have been developed for this particular application. These HMM-based methods gradually changed from alignment-based methods (e.g. GENEWISE [6] and PROCRUSTES [14]) and ab initio methods (e.g. GENSCAN [8]) to a combination of these two approaches, which we called hybrid methods (e.g. TWINSCAN [26], DOUBLESCAN [34] and PROJECTOR [36]). In recent research, as more genomic data becomes available, many HMMs have been developed to integrate prior information about the input sequences into the prediction process. JIGSAW [19], AUGUSTUS [44] and EXONHUNTER [7] are examples of gene prediction methods that can take various types of prior information into account. They are also capable of integrating information with different confidence scores into the gene prediction process, but they have a number of limitations. For example, they cannot deal with conflicting pieces of information, and one state in the underlying HMM can typically only capture prior information of one type. Also, as even more genomic data becomes available, there are many low-sequence-coverage genomes with erroneous or incomplete genome data. ENSEMBL [18] is an integrated analysis pipeline for gene prediction whose most current version tackles this particular problem.

With HMMCONVERTER, we built a novel gene prediction pair-HMM called ANNOTAID, which is capable of integrating various types of prior information along two input sequences into the gene prediction and parameter training process. The pair-HMM underlying ANNOTAID is the same as in DOUBLESCAN [34] and PROJECTOR [36]; however, the new model extends PROJECTOR in the following aspects:

(i) It can incorporate prior information for both input sequences instead of just one of the two input sequences as in PROJECTOR.

(ii) Every sequence position can be assigned more than one type of prior information, and the level of confidence for each type of information can be expressed in terms of a prior probability in [0, 1].

(iii) Several memory-efficient algorithms for parameter training can be used, two of which are novel.

All features provided by HMMCONVERTER, such as the Hirschberg algorithm [16] and the parameter training algorithms, can be used in ANNOTAID. ANNOTAID uses several annotation label sets, and the prior information is incorporated via the special emission feature of HMMCONVERTER. Furthermore, ANNOTAID checks for conflicting pieces of prior information. The capability of integrating prior information on both input sequences is very important for parameter training, as the known annotation of the training sequences can thus considerably improve the outcome of the training.
Using the efficient training algorithms provided by HMMCONVERTER, ANNOTAID can be easily adapted to the analysis of any pair of genomes if corresponding training data of known orthologous gene pairs is available. In the following sections, we describe the features and advantages of ANNOTAID.

6.2 Main features of Annotaid

ANNOTAID is a novel comparative gene prediction method which significantly extends DOUBLESCAN and PROJECTOR. It was constructed using the HMMCONVERTER framework. ANNOTAID can predict partial, complete and multiple pairs of genes in the two genomic input DNA sequences. It can also align more diverged pairs of genes which differ by exon fusion or exon splitting. The main feature that leads to its superior performance is that the method aligns the two sequences while it simultaneously predicts genes. In this section, we describe the main features of ANNOTAID, some of which are provided by HMMCONVERTER, such as banding and special emissions. For more details on those features, the reader is referred to chapter 3.

6.2.1 The pair-HMM of ANNOTAID

The states and topological structure of ANNOTAID are the same as for DOUBLESCAN and PROJECTOR. The states can be sub-divided into three groups, each of which models a specific part of a gene structure: (1) protein-coding exons, (2) introns, (3) intergenic regions. The HMM consists of 54 states and 153 transitions. The transition probabilities are parameterized by 19 free transition parameters, and all emission probabilities are derived from the emission probabilities of 7 states, i.e. there are only 7 free emission parameters.

In DOUBLESCAN and PROJECTOR, the splice site signal is captured by an external splice site prediction program, STRATASPLICE [29], which predicts potential splice sites and assigns a confidence score to each such site. These scores are integrated into the prediction process using so-called special transitions, which are a special feature of DOUBLESCAN and PROJECTOR. Since ANNOTAID is built with HMMCONVERTER, this special transition concept is not used. Instead, we convert the predictions of STRATASPLICE into prior probabilities and integrate them using special emissions. In this way, the splice site signal can be integrated into the prediction process and we can perform a fair comparison between ANNOTAID and PROJECTOR. For more details on the topological structure of the HMM and the special transition concept, the reader can refer to the DOUBLESCAN and PROJECTOR papers [34, 36].

6.2.2 Banding and sequence decoding algorithms

ANNOTAID employs a generalized pair-HMM in order to annotate two input sequences simultaneously. It thus has to deal with considerable memory and time requirements if the input sequences are long. As all parameter training algorithms provided in HMMCONVERTER are implemented as linear memory algorithms, the memory requirement is not a problem for parameter training. For sequence decoding, the memory problem can also be solved by using the Hirschberg algorithm [16]. To address the time constraint, users can use the banding feature of HMMCONVERTER. This forces the prediction and parameter training algorithms to run inside a tube-like sub-space of the search space. If the tube is well designed, it can greatly improve the speed of the algorithms while keeping the performance unchanged.
As the Hirschberg algorithm can be run inside a tube, this combination allows predictions to be generated with minimal time and memory requirements, see figure 6.1. The concept of banding is described in detail in chapters 3 and 4; a discussion of this feature for the parameter training of ANNOTAID is given below.

Figure 6.1: Combining the Hirschberg algorithm with a tube

6.2.3 Integrating prior information along the input sequences

ANNOTAID extends the special emission feature of PROJECTOR for integrating prior information on the input sequences into the prediction and parameter training. There are three main modifications: (1) ANNOTAID is capable of integrating prior information on both input sequences instead of only one input sequence. (2) ANNOTAID can incorporate prior information in terms of probabilities instead of only discrete "on/off" values, see figure 6.2. In addition, ANNOTAID can use prior information from different sources, and consistency checks are applied to detect conflicting information. (3) In PROJECTOR, the splice site signal is integrated into the prediction process using special transitions, while in ANNOTAID, this signal is converted into prior probabilities which are integrated using special emissions.

With the first two improvements, ANNOTAID is much more amenable to parameter training, as the known annotation of both training sequences can be automatically integrated into the training. These improvements also make ANNOTAID better suited for predicting genes in low-sequence-coverage genomes, as partial information on one input sequence may be able to compensate for the missing part of the other sequence, which can improve the resulting performance. The third modification allows a fair comparison between the published version of PROJECTOR and ANNOTAID, which has been implemented using HMMCONVERTER.

Figure 6.2: Special emissions in PROJECTOR and ANNOTAID. The left part shows how the special emissions are used in PROJECTOR to incorporate complete information about the known gene in one of the two input sequences. The right part shows ANNOTAID, which can accept incomplete and uncertain prior information on each input sequence. The dashed lines refer to evidence for exons with a prior probability of less than 100%.

In this section, we describe each of the above features of ANNOTAID in greater detail; the theoretical details of the special emissions in HMMCONVERTER are discussed in chapter 3. The special transitions are introduced in the PROJECTOR paper [36].

In order to incorporate the prior probabilities into the gene prediction process, ANNOTAID uses two annotation label sets:

(i) Feature: {Intergenic, Intron, Exon, CDS, Start_Codon, Stop_Codon}

(ii) Phase: {., 0, 1, 2}

where the four labels of the Phase set denote NoPhase, Phase 0, Phase 1 and Phase 2, respectively. The different types of prior information are assigned to these annotation labels, in terms of probabilities, in the prior probability file. In ANNOTAID, only the annotation label set Feature is set to be a "score" label set, while Phase is set as an "on/off" label set. The concept of how the special emissions bias the normal emission probabilities of a state is discussed in depth in chapter 3. In brief, figure 6.2 illustrates the differences between the special emission concepts of ANNOTAID and PROJECTOR.
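Since the exact weighting scheme is defined in chapter 3, the following sketch shows only one simple way in which a position-specific prior probability could bias a nominal emission probability. It is a generic illustration with a hypothetical function name, not necessarily the formula used by HMMCONVERTER or ANNOTAID.

// Generic illustration: scale the nominal emission probability of a state by
// the prior probability that the state's annotation label (e.g. "Exon") is
// correct at this sequence position; leave it unchanged if no prior
// information is available (a negative prior is used here to mean "no information").
double biased_emission(double nominal_emission_prob, double prior_prob_for_label) {
    if (prior_prob_for_label < 0.0) {
        return nominal_emission_prob;                       // no prior information supplied
    }
    return nominal_emission_prob * prior_prob_for_label;    // simple multiplicative bias
}

In a complete implementation the biased values would additionally have to be renormalized, or kept as scores, so that the modified emission probabilities of a state remain comparable across states; how this is done in HMMCONVERTER is described in chapter 3.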
6.2.4 Parameter training in ANNOTAID

As discussed in chapter 1, most HMM-based applications do not provide algorithms for parameter training, except for a few gene prediction HMMs such as [25, 30, 46]. Generally, it is much more challenging to train a pair-HMM than an HMM, for the following reasons. (1) Large amounts of training data are required to reliably train a larger number of transition and emission parameters. (2) Even if the annotation of each training sequence is known, the pair of sequences can still be aligned in many ways, i.e. many valid state paths need to be considered. In other words, even with complete annotations, the training process for pair-HMMs is not just a simple counting process, as many state paths can yield the same annotation of the input sequences. Due to these challenges, relatively large training sets would be required to train ANNOTAID, which would impose significant time and memory requirements. In order to alleviate these problems, three special features are used for parameter training in ANNOTAID:

(i) Three linear memory parameter training algorithms are implemented, two of which are novel (see chapter 3 for details).

(ii) The special emissions of HMMCONVERTER can be used to integrate prior information about the annotation of the training sequences into the parameter training.

(iii) Tubes can be used to restrict the search space along the two sequence dimensions to accelerate the parameter training.

We describe each of these features in the following.

Linear memory parameter training algorithms  As discussed in chapter 3, HMMCONVERTER provides the linear memory Baum-Welch algorithm [35], a novel linear memory algorithm for training with sampled state paths, and a new linear memory implementation of Viterbi training. Because ANNOTAID is built with HMMCONVERTER, users can select any of these training algorithms in the XML-input file. The three algorithms require only O(NL) memory to train one parameter for an HMM with N states and two input sequences of length L, which essentially removes the memory problem. In addition, the user can specify the maximum amount of memory available for parameter training, and HMMCONVERTER will then optimize the run time accordingly by training as many parameters as possible in parallel.

Special emissions used for parameter training  ANNOTAID integrates the information on the annotations of the training sequences into the parameter training process using the special emission feature of HMMCONVERTER, and can even incorporate partial and incomplete annotations. This particular feature is very useful for adapting ANNOTAID to predict genes in low-sequence-coverage genomes.

Banding used for parameter training  The constraint of a large memory requirement is removed by using the linear memory training algorithms, but the time constraint still exists. In order to also address this problem, a user can combine the banding feature of HMMCONVERTER with the training algorithms. A tube can either be derived from local TBLASTX or BLASTN [1] matches, see figure 6.3, or specified explicitly by the user. Please refer to chapter 4 for more details on how a tube can be specified. Banding can greatly accelerate the training while retaining the same performance.
Figure 6.3: Combining the parameter training with a tube. This figure shows the projection of the three-dimensional search space onto the X-Y plane. It shows the tube that has been derived from local TBLASTX or BLASTN matches between the two input sequences; in particular, this tube was derived from matches with gaps. The training algorithm only considers the space inside the tube, thereby greatly accelerating the training process.

The parameterization of the transition and emission probabilities of ANNOTAID is the same as for DOUBLESCAN and PROJECTOR. With 53 states, it has only 19 free transition parameters and 7 free emission parameters (each free emission parameter corresponds to a set of emission probabilities). In ANNOTAID, only the free parameters are trained, so a corresponding inverse formula has to be defined for each parameter. The default setting of ANNOTAID trains 18 free transition parameters (excluding the To_End parameter) and 3 free emission parameters; the Match_Start_Codon, Match_5'_Splice_Site and Match_3'_Splice_Site parameters are not trained as they correspond to particular consensus sequences. The Match_Exon parameter is also not trained as it reads 6 letters at a time, corresponding to 4096 emission probabilities which cannot easily be trained reliably. In order to define the emission probabilities of the Match_Exon state, a user can specify a table of amino acid alignment frequencies as well as two tables of codon frequencies (one for each organism) from which these probabilities are then derived. Once these values have been set, they can be further improved by training. The three emission parameters trained by default are Match_Intergenic, Match_Intron and Match_Stop_Codon; users can change the default setting by specifying which parameters are to be trained in the free emission parameter file.

In summary, with the features discussed above, efficient parameter training can be performed for ANNOTAID so that it can be adapted quickly to predict genes on new pairs of eukaryotic genomes.

6.3 Availability

ANNOTAID is available under the GNU Public License. The ANNOTAID software package includes the HMMCONVERTER software, the XML file specifying ANNOTAID, and several parameter files. The software package can be downloaded free of charge from http://www.cs.ubc.ca/irmtraud/HMMConverter/Annotaid. In order to use ANNOTAID, the user has to run HMMCONVERTER with the ANNOTAID XML file. The execution of HMMCONVERTER is described in chapter 3.

6.4 Discussion and Future Work

We have presented here ANNOTAID, a novel method for comparative gene prediction which significantly extends DOUBLESCAN [34] and PROJECTOR [36]. It has been built using the HMMCONVERTER framework. ANNOTAID takes a pair of homologous DNA sequences and performs the sequence alignment and gene prediction simultaneously on both input sequences. It can predict pairs of complete genes, partial genes and multiple genes on the input sequences. ANNOTAID integrates different kinds of prior information, together with their prior probabilities, into the prediction and the parameter training using the concept of special emissions in HMMCONVERTER. This feature is important for successful parameter training as it allows information on the known annotation of both input sequences to be integrated in order to derive better parameter values. We can thus expect ANNOTAID to generate high-quality results even for low-sequence-coverage genome data.
This is especially relevant for annotating many of the genomes sequenced today. The HMMCONVERTER framework provides all its features and efficient algorithms, such as the combination of the Hirschberg algorithm [16] with a tube. The scores generated by STRATASPLICE for predicting potential splice sites are converted into prior probabilities and integrated into the prediction process using special emissions. A Perl script is also included in the ANNOTAID package, which can be used to generate STRATASPLICE predictions for the input sequences and which converts the results into an ANNOTAID input file with the prior probabilities.

With all the features discussed previously, ANNOTAID can be readily adapted to new pairs of eukaryotic genomes. At this stage, we have developed the model with all the required algorithms and we have obtained an annotated training set for two recently sequenced pathogens. The pre-processing of the training set is now in progress; an in-depth analysis of the adaptation of ANNOTAID with parameter training will then follow, and the overall performance of ANNOTAID will be investigated.

Bibliography

[1] S.F. Altschul, W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. Basic local alignment search tool. Journal of Molecular Biology, 215:403–410, 1990.

[2] L.E. Baum. An equality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities, 3:1–8, 1972.

[3] D.A. Benson, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. GenBank. Nucleic Acids Research, 34:D16–D20, 2006.

[4] A. Bird. CpG islands as gene markers in the vertebrate nucleus. Trends in Genetics, 3:342–347, 1987.

[5] E. Birney and R. Durbin. Dynamite: a flexible code generating language for dynamic programming methods used in sequence comparison. In Fifth International Conference on Intelligent Systems in Molecular Biology, pages 56–64, Menlo Park, 1997. AAAI Press.

[6] E. Birney and R. Durbin. Using GeneWise in the Drosophila annotation experiment. Genome Research, 10(4):547–548, 2000.

[7] B. Brejová, D.G. Brown, M. Li, and T. Vinar. ExonHunter: a comprehensive approach to gene finding. Bioinformatics, 21(Suppl 1):57–65, 2005.

[8] C. Burge and S. Karlin. Prediction of Complete Gene Structures in Human Genomic DNA. Journal of Molecular Biology, 268(1):78–94, 1997.

[9] A. Churbanov and S. Winters-Hilt. Implementing EM and Viterbi algorithms for Hidden Markov Model in linear memory. BMC Bioinformatics, 9:224, 2008.

[10] A.P. Dempster, N.M. Laird, and D.B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.

[11] R. Durbin, S.R. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge University Press, Cambridge, UK, 1998.

[12] S.R. Eddy. Profile hidden Markov models. Bioinformatics, 14(9):755–763, 1998.

[13] J. Felsenstein. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17:368–376, 1981.

[14] M.S. Gelfand, A.A. Mironov, and P.A. Pevzner. Gene recognition via spliced sequence alignment. Proceedings of the National Academy of Sciences of the USA, 93:9061–9066, 1996.

[15] B.J. Haas, S.L. Salzberg, W. Zhu, M. Pertea, J.E. Allen, J. Orvis, O. White, C.R. Buell, and J.R. Wortman.
Automated eukaryotic gene structure annotation using Evidence Modeler and the Program to Assemble Spliced Alignments. Genome Biology, 9:R7, 2008.

[16] D.S. Hirschberg. A linear space algorithm for computing maximal common subsequences. Communications of the ACM, 18:341–343, 1975.

[17] J. Hoh, S. Jin, T. Parrado, J. Edington, A.J. Levine, and J. Ott. The p53MH algorithm and its application in detecting p53-responsive genes. Proceedings of the National Academy of Sciences of the USA, 99(13):8467–8472, 2002.

[18] T.J.P. Hubbard, B.L. Aken, K. Beal, B. Ballester, M. Caccamo, Y. Chen, L. Clarke, G. Coates, F. Cunningham, T. Cutts, T. Down, S.C. Dyer, S. Fitzgerald, J. Fernandez-Banet, S. Graf, S. Haider, M. Hammond, J. Herrero, R. Holland, K. Howe, K. Howe, N. Johnson, A. Kahari, D. Keefe, F. Kokocinski, E. Kulesha, D. Lawson, I. Longden, C. Melsopp, K. Megy, P. Meidl, B. Overduin, A. Parker, A. Prlic, S. Rice, D. Rios, M. Schuster, I. Sealy, J. Severin, G. Slater, D. Smedley, G. Spudich, S. Trevanion, A. Vilella, J. Vogel, S. White, M. Wood, T. Cox, V. Curwen, R. Durbin, X.M. Fernandez-Suarez, P. Flicek, A. Kasprzyk, G. Proctor, S. Searle, J. Smith, A. Ureta-Vidal, and E. Birney. Ensembl 2007. Nucleic Acids Research, 35:D610–D617, 2007.

[19] J.E. Allen and S.L. Salzberg. JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics, 21(18):3596–3603, 2005.

[20] K. Karplus, C. Barrett, and R. Hughey. Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14(10):846–856, 1998.

[21] W.J. Kent and A.M. Zahler. Conservation, Regulation, Synteny, and Introns in a Large-scale C. briggsae–C. elegans Genomic Alignment. Genome Research, 10(8):1115–1125, 2000.

[22] J. Khatun, E. Hamlett, and M.C. Giddings. Incorporating sequence information into the scoring function: a hidden Markov model for improved peptide identification. Bioinformatics, 24(5):674–681, 2008.

[23] K. Knapp and Y.P. Chen. SURVEY AND SUMMARY: An evaluation of contemporary hidden Markov model genefinders with a predicted exon taxonomy. Nucleic Acids Research, 35(1):317–324, 2007.

[24] B. Knudsen and M.M. Miyamoto. Sequence Alignments and Pair Hidden Markov Models Using Evolutionary History. Journal of Molecular Biology, 333(2):453–460, 2003.

[25] I. Korf. Gene finding in novel genomes. BMC Bioinformatics, 5:59, 2004.

[26] I. Korf, P. Flicek, D. Duan, and M.R. Brent. Integrating genomic homology into gene structure prediction. Bioinformatics, 17(Suppl 1):S140–S148, 2001.

[27] A. Kundu, Y. He, and M.Y. Chen. Alternatives to Variable Duration HMM in Handwriting Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1275–1280, 1998.

[28] T. Lee, Y. Berquin, and A. Ellerton. TinyXml Documentation, 2006. http://www.grinninglizard.com/tinyxmldocs/index.html.

[29] A. Levine. Bioinformatics approaches to RNA splicing. Master's thesis, University of Cambridge, UK, 2001.

[30] A. Lomsadze, V. Ter-Hovhannisyan, Y.O. Chernoff, and M. Borodovsky. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Research, 33(20):6494–6506, 2005.

[31] G. Lunter. HMMoC - a compiler for hidden Markov models. Bioinformatics, 23(18):2485–2487, 2007.

[32] B. Merialdo. Tagging English Text with a Probabilistic Model. Computational Linguistics, 20(2):155–171, 1994.

[33] I.M. Meyer. Mathematical methods for comparative ab initio gene prediction.
PhD thesis, University of Cambridge, UK, 2002.

[34] I.M. Meyer and R. Durbin. Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics, 18(10):1309–1318, 2002.

[35] I. Miklós and I.M. Meyer. A linear memory algorithm for Baum-Welch training. BMC Bioinformatics, 6:231, 2005.

[36] I.M. Meyer and R. Durbin. Gene structure conservation aids similarity based gene prediction. Nucleic Acids Research, 32(2):776–783, 2004.

[37] N.J. Mulder, R. Apweiler, T.K. Attwood, A. Bairoch, A. Bateman, D. Binns, P. Bork, V. Buillard, L. Cerutti, R. Copley, E. Courcelle, U. Das, L. Daugherty, M. Dibley, R. Finn, W. Fleischmann, J. Gough, D. Haft, N. Hulo, S. Hunter, D. Kahn, A. Kanapin, A. Kejariwal, A. Labarga, P.S. Langendijk-Genevaux, D. Lonsdale, R. Lopez, I. Letunic, M. Madera, J. Maslen, C. McAnulla, J. McDowall, J. Mistry, A. Mitchell, A.D. Nikolskaya, S. Orchard, C. Orengo, R. Petryszak, J.D. Selengut, C.J.A. Sigrist, P.D. Thomas, F. Valentin, D. Wilson, C.H. Wu, and C. Yeats. New developments in the InterPro database. Nucleic Acids Research, 35:224–228, 2007.

[38] The EMBL Outstation, Protein Information Resource, and Swiss Institute of Bioinformatics. The Universal Protein Resource (UniProt). Nucleic Acids Research, 36:D190–D195, 2008.

[39] L. Pachter, M. Alexandersson, and S. Cawley. Applications of Generalized Pair Hidden Markov Models to Alignment and Gene Finding Problems. Journal of Computational Biology, 9(2):389–399, 2002.

[40] J.S. Pedersen and J. Hein. Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics, 19(2):219–227, 2003.

[41] Y. Qi, J. Klein-Seetharaman, and Z. Bar-Joseph. A mixture of feature experts approach for protein-protein interaction prediction. BMC Bioinformatics, 8(Suppl 10):S6, 2007.

[42] L.R. Rabiner. A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[43] F. Schutz and M. Delorenzi. MAMOT: hidden Markov modeling tool. Bioinformatics, 24(11):1399–1400, 2008.

[44] M. Stanke, O. Schöffmann, B. Morgenstern, and S. Waack. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics, 7:62, 2006.

[45] M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi. Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR. In 2001 IEEE International Conference on Acoustics, Speech and Signal Processing, volume 2, pages 805–808. IEEE Press, 2001.

[46] V. Ter-Hovhannisyan, A. Lomsadze, Y.O. Chernoff, and M. Borodovsky. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Research, in press, 2008.

[47] A. Viterbi. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Transactions on Information Theory, pages 260–269, 1967.

[48] R.F. Yeh, L.P. Lin, and C.B. Burge. Computational Inference of Homologous Gene Structures in the Human Genome. Genome Research, 11(5):803–816, 2001.
