UBC Theses and Dissertations
Inference of rates across sites via an expectation maximization algorithm
Zhao, Tingting
The rate of nucleotide substitution can differ from gene to gene. Moreover, different regions of the same gene can have different mutation rates. Many attempts have been made to allow for variable rates across nucleotide sites. A rate factor drawn from a continuous distribution has been introduced to deal with the problem. However, for computational reasons, this method can only scale to fewer than a dozen sequences. Later studies use a discrete gamma distribution to approximate the gamma distribution. The main contribution of our work is a discrete distribution over the rate factor that is more flexible while preserving attractive computational properties. We make inferences about the rate factor and its distribution via an Expectation Maximization (EM) algorithm. We evaluate our method on both simulations and a real dataset. The results on the real dataset indicate that the method is useful for large phylogenies, even those with thousands of sequences. We analyze the identifiability of our model for a pair of DNA sequences under certain conditions. We also prove that, for certain types of rate matrices, the model is non-identifiable.
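To make the idea of EM over a discrete rate distribution concrete, the following is a minimal Python sketch. It is not the thesis's implementation. It assumes a fixed grid of rate categories, a single pair of aligned sequences, and the Jukes-Cantor (JC69) substitution model, none of which are specified in the abstract. The names `jc69_site_likelihood`, `em_rate_weights`, `rates`, and the toy alignment are hypothetical. The sketch only estimates the mixture weights of the rate categories and illustrates the mechanics of the E-step and M-step.

```python
import math

def jc69_site_likelihood(is_match, rate, t):
    """Likelihood of one aligned site pair under JC69 (an assumed model)
    with effective branch length rate * t."""
    decay = math.exp(-4.0 * rate * t / 3.0)
    p = 0.25 + 0.75 * decay if is_match else 0.25 - 0.25 * decay
    return 0.25 * p  # uniform stationary frequency times transition probability

def em_rate_weights(site_lik, n_iter=50):
    """EM for the mixture weights of fixed rate categories.

    site_lik[i][k] is the likelihood of site i given rate category k.
    Returns the estimated weights and the log-likelihood at each iteration.
    """
    n_sites = len(site_lik)
    n_cat = len(site_lik[0])
    weights = [1.0 / n_cat] * n_cat  # start from a uniform distribution
    loglik_trace = []
    for _ in range(n_iter):
        expected_counts = [0.0] * n_cat
        loglik = 0.0
        for row in site_lik:
            # E-step: posterior probability of each rate category at this site
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            loglik += math.log(total)
            for k in range(n_cat):
                expected_counts[k] += joint[k] / total
        loglik_trace.append(loglik)
        # M-step: new weights are the average posterior memberships
        weights = [c / n_sites for c in expected_counts]
    return weights, loglik_trace

# Toy data (hypothetical): 80 matching and 20 mismatching sites, three rate categories.
rates = [0.1, 1.0, 3.0]
sites = [True] * 80 + [False] * 20
site_lik = [[jc69_site_likelihood(m, r, 0.3) for r in rates] for m in sites]
weights, trace = em_rate_weights(site_lik)
```

The weights always sum to one and the log-likelihood never decreases across iterations, which is the standard EM guarantee. With only a pair of sequences the fitted weights should not be over-interpreted, in line with the identifiability caveats stated in the abstract.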
Attribution-NonCommercial-NoDerivatives 4.0 International