UBC Research Data

Data from: A Poissonian model of indel rate variation for phylogenetic tree inference

Zhai, Yongliang; Bouchard-Côté, Alexandre

Description

Abstract
While indel rate variation has been observed and analyzed in detail, it is not taken into account by current indel-aware phylogenetic reconstruction methods. In this work, we introduce a continuous-time stochastic process, the geometric Poisson indel process, that generalizes the Poisson indel process by allowing insertion and deletion rates to vary across sites. We design an efficient algorithm for computing the probability of a given multiple sequence alignment based on our new indel model. We describe a method to construct phylogeny estimates from a fixed alignment using neighbor joining. Using simulation studies, we show that ignoring indel rate variation may have a detrimental effect on the accuracy of the inferred phylogenies, and that our proposed method can sidestep this issue by inferring latent indel rate categories. We also show that our phylogenetic inference method may be more stable to taxa subsampling than methods that ignore either indels or indel rate variation.

Usage notes

Molluscan RNA data
This dataset was obtained from http://www.rna.icmb.utexas.edu/SIM/4D/Mollusk/alignment.gb. The NEXUS version provided here was converted from the original GenBank-format file using EMBOSS. The FASTA version used in the data analysis section of the paper is available, together with the source code, from https://github.com/yzhai220/geopip.
molluscan.nexus.txt
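As a rough illustration of how a FASTA alignment such as the one mentioned above might be inspected for site-to-site variation in gaps, the following is a minimal sketch in plain Python. It is not the authors' code and does not implement the geometric Poisson indel process; the function names, the toy sequences, and the assumption that gaps are written as "-" are all hypothetical choices made for this example.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA records from a text handle into a dict of name -> sequence."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def gap_fraction_per_column(alignment, gap_chars="-"):
    """Return, for each alignment column, the fraction of sequences with a gap.

    Assumes (hypothetically) that gaps are encoded with the characters in
    gap_chars and that all sequences have the same aligned length.
    """
    seqs = list(alignment.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences differ in length; input is not an alignment")
    return [
        sum(seq[i] in gap_chars for seq in seqs) / len(seqs)
        for i in range(length)
    ]


# Toy alignment invented for this example; it is not taken from the dataset.
toy = StringIO(">a\nAC-GU\n>b\nA--GU\n>c\nACCGU\n")
alignment = read_fasta(toy)
fractions = gap_fraction_per_column(alignment)
```

To apply the same functions to a real file, the StringIO object could be replaced by an opened file handle, for example `open("alignment.fasta")`, where the file name is a placeholder for wherever the FASTA alignment has been saved locally.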

Licence

CC0 Waiver
