UBC Theses and Dissertations

A bioinformatic workflow for analyzing whole genomes in rare Mendelian disease

Couse, Madeline Hazel

Abstract

The vast majority of the human genome (~98%) is non-coding. A symphony of non-coding sequences resides in the genome, interacting with genes and the environment to tune gene expression. Functional non-coding sequences include enhancers, silencers, promoters, non-coding RNAs, and insulators. Variation in these non-coding sequences can cause disease, yet clinical sequencing in patients with rare Mendelian disease currently focuses mostly on variants in the ~2% of the genome that codes for protein. Indeed, whole exome sequencing (WES) identifies variants in protein-coding genes that can explain a phenotype in fewer than half of patients with suspected genetic disease. With the dramatic reduction in the cost of whole genome sequencing (WGS), the development of algorithms to detect structural variants (SVs; variants longer than 50 bp), and improved annotation of the non-coding genome, it is now possible to interrogate the entire spectrum of genetic variation to identify a pathogenic mutation. A comprehensive pipeline is needed to analyze non-coding variation and structural variation from WGS. In this thesis, I developed and benchmarked a bioinformatics workflow to detect pathogenic non-coding single nucleotide variants (SNVs) and insertions/deletions (indels) as well as pathogenic SVs, and applied this workflow to unsolved patients with rare Mendelian disorders. The pipeline detected ~80-90% of deletions, ~90% of duplications, ~65% of inversions, and ~50% of insertions in a simulated genome and the NA12878 genome. The pipeline captured the majority of known pathogenic non-coding SNVs and indels, and selectively prioritized a spiked-in known pathogenic non-coding SNV. Several interesting candidate variants were detected in patients, but none could be convincingly implicated as pathogenic. The bioinformatic workflow described in this thesis is complementary to sequencing pipelines that analyze only protein-coding variants from whole genomes. Application of this workflow to larger cohorts of patients with rare Mendelian diseases should identify pathogenic non-coding variants and SVs, increasing the diagnostic yield of clinical sequencing studies, assisting the management of genetic diseases, and contributing knowledge of novel pathogenic variants to the scientific community.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International