UBC Theses and Dissertations


Structural variant calling and resolution from long reads sequencing data

Fan, Jeremy

Abstract

Structural variants (SVs) are large-scale DNA modifications exceeding 50 base pairs. Despite their relatively low abundance, they can play a large functional role in the progression of diseases like cancer. Traditionally, SVs have been studied using short-read sequencing, but this approach is limited by read length and misses events such as large insertions. In cancer, specific SVs are clinically actionable, so it is imperative to detect them accurately. This thesis aims to assess read-based SV callers, aligners, and reference genomes for long reads to establish best practices; to investigate methods for reducing false positives in SV detection by integrating various reference genomes and tools; and to illustrate the benefits of SV calling with long reads using the Personalized OncoGenomics (POG) cohort. Key findings include the need for a minimum coverage of 15X for accurate SV detection in germline samples and the importance of an ensemble SV calling approach to capture diverse signals. We demonstrate the ability of long reads to enhance resolution for complex SV events overlooked by Illumina sequencing, illustrated by examples involving predicted pathogenic cancer genes like SMG1 and HIRA. The study confirms findings from the biological literature, such as high insertion signals in microsatellite-instable tumours and distinctive inversion counts indicative of a particular SV pattern, "tyfonas". However, challenges such as insufficient coverage persist, leading to false negatives. The Nanopore POG dataset holds promise for future SV calling software development. Overall, the research highlights the crucial role of long-read sequencing in somatic SV calling in cancer research and emphasizes the intricacies involved in accurately characterizing SV events in tumour genomes.


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International