Open Collections
UBC Theses and Dissertations
Inference of rates across sites via an expectation maximization algorithm
Zhao, Tingting
Abstract
Rates of nucleotide substitution can differ from gene to gene, and different regions of the same gene can also mutate at different rates. Many approaches have been proposed to allow rates to vary across nucleotide sites. One approach introduces a rate factor drawn from a continuous distribution; however, for computational reasons, this method scales only to fewer than a dozen sequences. Later studies use a discrete gamma distribution to approximate the continuous gamma distribution. The main contribution of our work is a discrete distribution over the rate factor that is more flexible while preserving attractive computational properties. We infer the rate factor and its distribution via an Expectation Maximization (EM) algorithm. We evaluate our method using both simulations and a real dataset. The results on the real dataset indicate that the method is useful for large phylogenies, even those with thousands of sequences. We analyze the identifiability of our model for a pair of DNA sequences under certain conditions. We also prove that, for certain types of rate matrices, the model is non-identifiable.
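To make the idea of an EM algorithm over a discrete rate distribution concrete, here is a minimal sketch. It is not the thesis's implementation. It assumes a single pair of aligned sequences, the Jukes-Cantor substitution model, a fixed branch length `t`, and a fixed grid of candidate rates, and it estimates only the category weights. All function and parameter names are hypothetical.

```python
import math


def jc69_site_likelihood(differs, rate, t):
    """Likelihood of one aligned site pair under Jukes-Cantor (assumed model).

    differs: True if the two sequences show different bases at this site.
    rate:    rate factor of the category; the effective distance is rate * t.
    """
    decay = math.exp(-4.0 * rate * t / 3.0)
    p_same = 0.25 + 0.75 * decay   # P(second base equals the first)
    p_diff = 0.25 - 0.25 * decay   # P(second base is one specific other base)
    # 0.25 is the stationary probability of the first base.
    return 0.25 * (p_diff if differs else p_same)


def em_rate_weights(sites, rates, t, n_iter=50):
    """Estimate weights of a fixed grid of rate categories by EM.

    sites: list of bools, one per alignment column (True = mismatch).
    Returns (weights, trace), where trace holds the log-likelihood
    evaluated at the start of each iteration.
    """
    k = len(rates)
    weights = [1.0 / k] * k  # start from a uniform distribution
    # Per-site likelihood under each category; fixed because rates and t are fixed.
    lik = [[jc69_site_likelihood(d, r, t) for r in rates] for d in sites]
    trace = []
    for _ in range(n_iter):
        counts = [0.0] * k
        loglik = 0.0
        # E-step: posterior probability of each category at each site.
        for row in lik:
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            loglik += math.log(total)
            for j in range(k):
                counts[j] += joint[j] / total
        trace.append(loglik)
        # M-step: new weights are the average posterior memberships.
        weights = [c / len(lik) for c in counts]
    return weights, trace


if __name__ == "__main__":
    sites = [False] * 70 + [True] * 30   # toy data: 30 mismatches in 100 sites
    weights, trace = em_rate_weights(sites, rates=[0.5, 1.0, 2.0], t=0.3)
    print(weights, trace[-1])
```

With only two sequences, each site contributes just a match or a mismatch, so different weight vectors can give the same likelihood. The sketch therefore illustrates the E-step and M-step updates rather than a well-determined estimate.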
Item Metadata
Title |
Inference of rates across sites via an expectation maximization algorithm
|
Creator |
Zhao, Tingting
|
Publisher |
University of British Columbia
|
Date Issued |
2013
|
Language |
eng
|
Date Available |
2013-08-29
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0074174
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2013-11
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|