UBC Theses and Dissertations

Overcoming missing data in phylogenetic analysis of shotgun sequencing to detect HIV adaptation to immune response

Nguyen, Thuy


DNA sequencing gives us insight into how viruses adapt to their hosts' immune systems. Studies of viral populations typically employ deep amplicon sequencing with next-generation reads to capture a detailed sample of genetic variation in a population. The large number of overlapping sites in a multiple sequence alignment of amplicon reads forms ideal input for phylogenetic reconstruction, a necessary step for studying evolutionary relationships in a population. However, the short reads (typically < 600 bp) produced by the next-generation sequencing technologies with the lowest error rates severely limit the width of genomic regions for which evolutionary relationships can be analyzed. Shotgun sequencing, in which DNA is fragmented at random positions, is an efficient alternative to amplicon sequencing for covering wider regions of a genome with sufficient depth. Because shotgun reads start at random, staggered positions in the genome, a multiple sequence alignment of shotgun reads can contain an extremely high percentage of missing data. The absence of sequence homology across the entire set of short reads makes it impossible to reconstruct a phylogenetic tree, limiting the utility of shotgun data for phylogenetic analysis. We developed the Umberjack software pipeline, which employs a 'sliding window' approach to minimize the effect of missing data during phylogenetic reconstruction and to obtain evolutionary statistics for detecting sites under selection. Using Umberjack to measure a new metric of directional selection, I, we detected significant directional selection in treatment-naive HIV populations at sites with previously documented associations with cytotoxic T-lymphocyte (CTL) response. Further, substitutions towards wild-type amino acids were found to occur early within the population's history, but rarely occurred at a site after the appearance of a CTL escape mutation.
When the same metric I was measured in drug-treated HIV populations, the directional selection due to the constant pressure of drug treatment was much greater than the directional selection from the immune system.
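
The 'sliding window' idea described above can be illustrated with a minimal sketch. This is not Umberjack's actual implementation: the function name `sliding_windows`, its parameters, the gap symbols, and the toy reads are all hypothetical, chosen only to show how restricting each window to reads that sufficiently cover it reduces the share of missing data passed on to tree reconstruction.

```python
from typing import Dict, Iterator, List, Tuple

# Assumed symbols for missing data in the alignment (hypothetical choice).
GAP_CHARS = {"-", "N", "?"}


def sliding_windows(
    alignment: Dict[str, str],
    window_size: int,
    step: int,
    min_coverage: float,
) -> Iterator[Tuple[int, List[str]]]:
    """Yield (window start, names of reads that cover enough of the window)."""
    width = len(next(iter(alignment.values())))
    for start in range(0, width - window_size + 1, step):
        kept = []
        for name, seq in alignment.items():
            segment = seq[start:start + window_size]
            # Count positions in this window where the read has real data.
            covered = sum(1 for base in segment if base not in GAP_CHARS)
            if covered / window_size >= min_coverage:
                kept.append(name)
        yield start, kept


# Toy alignment of three staggered shotgun-style reads (made-up data).
reads = {
    "read1": "ACGTAC------",
    "read2": "---TACGTA---",
    "read3": "------GTACGT",
}
windows = list(sliding_windows(reads, window_size=6, step=3, min_coverage=0.5))
for start, names in windows:
    print(start, names)
```

In this toy case no read spans the full 12-column alignment, yet each 6-column window retains at least two reads that overlap within it, so a per-window tree could in principle be built from reads that share sites. The coverage threshold and window size are illustrative only.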


Attribution-NonCommercial 4.0 International