UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A bioinformatic workflow to analyze single cell template strand sequencing data Mattsson, Carl-Adam

Abstract

Structural variants (SVs) contribute greater diversity at the nucleotide level between two human genomes than any other form of genetic variation and are three-fold more likely to correlate in genome-wide association studies (GWAS) than single nucleotide variants (SNVs). Using short-read, high-throughput sequencing technologies to uncover such variation has proven to be troublesome and the methods to detect SVs depend on indirect inferences. However, while larger (>5kb) copy number variations (CNVs) could be characterized using read-depth-based algorithms, this approach often fails for smaller and balanced events. Another fundamental problem for detection of SVs from short-read sequencing is inherent to the predominant data type and typical SV detection algorithm that is effective in unique sequences often fails within complex genomic regions, which have been proven to be highly enriched for SVs. In addition, most SV discovery methods do not indicate the haplotype-origin for a given SV and require parental sequencing for this information. For a more complete description and interpretation of human genomic information in relation to phenotypes such as e.g. cancer predisposition and response to therapies, it will, therefore, be necessary to arrange sequence data into parental haplotypes and ascertain polymorphic inversions with respect to such haplotypes. All this can be achieved using Strand-seq. Strand-seq complements other sequencing approaches by providing crucial information about the genetic make-up of individuals that cannot be obtained in any other way. To make Strand-seq available for human studies worldwide is an immense challenge. Library construction, as well as data analysis, needs to be further developed, integrated and made user-friendly to allow accurate and rapid interpretation of results. Here we present a custom bioinformatics pipeline for analyzing Strand-seq data that streamlines the workflow of raw sequence read alignment, putative variant calling, variant call refinement and haplotype assembly by integrating current available Strand-seq specific tools. In addition, relevant metric data are compiled and visualized, ensuring and reinforcing the potential of Strand-seq as a robust sequencing method for uncovering clinically significant SVs and the assembly of WGH without additional parental genomic data.

Item Media

Item Citations and Data

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International