UBC Theses and Dissertations


Simulating chromoanagenesis for tool development and testing

Jenkins, Kyle Leu

Abstract

The human genome is large and complex. Variations in the genome of an organism can have drastic health implications, ranging from cancer to constitutional disease. Most variants involve changes of just one or a few nucleotides, but structural variants can cause significant changes to larger sections of the genome. One rare group of structural variants is chromoanagenesis, in which a catastrophic rearrangement of a large section of the genome occurs during a single event. Whereas simpler events involve one or a few breakpoints and may result in localized duplications, inversions, or deletions of genetic fragments within a section of the genome, a single chromoanagenesis event can have hundreds of breakpoints, and each broken segment of the chromosome may be left unchanged, inverted, duplicated, or deleted in whole or in part before the pieces reassemble in a different order. Chromoanagenesis has most often been described in cancer, among other signs of genomic instability, but such events have also been reported in patients with other diseases. Because of the complexity of chromoanagenesis and the genomic context in which it is often found, obtaining accurate sequence-level characterization of cases has been difficult. Developing bioinformatics tools to detect and fully resolve chromoanagenesis in sequence data sets is challenging and expensive; new technologies provide new avenues of detection but pose different difficulties to resolve. In this thesis, I report Muddler, a simulator I have developed for chromoanagenesis events that can be used with various available software tools to produce data sets that resemble those obtained with different genomic technologies (next-generation sequencing, optical maps, etc.).
These simulated data sets can then be evaluated with sequence analysis tools to understand the strengths and limitations of genomic technologies and new software for characterizing chromoanagenesis, and to assist in the development of better tools for this purpose. The method for generating simulated data is presented along with five complex simulated events and their analysis using technology-specific analytical tools and pipelines. To illustrate the utility of Muddler in the use cases provided, each simulated event has at least 150 breakpoints and around 100 combined duplications or deletions.
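
The kind of event described in the abstract (many breakpoints, with each segment left unchanged, inverted, duplicated, or deleted before the pieces rejoin in a new order) can be illustrated with a short toy sketch. This is not Muddler's implementation, which the abstract does not detail. The function name `simulate_event`, its parameters, and the per-segment probabilities are all hypothetical choices made for illustration, and the sketch only handles whole-segment changes, not the partial duplications or deletions the abstract mentions.

```python
import random

# Translation table for complementing DNA bases (assumes an ACGT alphabet).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_event(sequence, n_breakpoints, rng,
                   p_invert=0.25, p_duplicate=0.15, p_delete=0.15):
    """Toy rearrangement of `sequence` (hypothetical, not Muddler's method).

    Returns the rearranged sequence, the order of the original segment
    indices in that sequence, and a log of what happened to each segment.
    """
    if not 0 < n_breakpoints < len(sequence):
        raise ValueError("n_breakpoints must be between 1 and len(sequence) - 1")

    # Choose distinct interior cut positions and split into segments.
    cuts = sorted(rng.sample(range(1, len(sequence)), n_breakpoints))
    bounds = [0] + cuts + [len(sequence)]
    segments = [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

    pieces = []  # (original segment index, sequence) pairs that survive
    log = []     # (original segment index, operation) for every segment
    for index, seg in enumerate(segments):
        roll = rng.random()
        if roll < p_delete:
            log.append((index, "deleted"))
        elif roll < p_delete + p_duplicate:
            pieces.extend([(index, seg), (index, seg)])
            log.append((index, "duplicated"))
        elif roll < p_delete + p_duplicate + p_invert:
            pieces.append((index, reverse_complement(seg)))
            log.append((index, "inverted"))
        else:
            pieces.append((index, seg))
            log.append((index, "unchanged"))

    # Reassemble the surviving pieces in a random order.
    rng.shuffle(pieces)
    derivative = "".join(seg for _, seg in pieces)
    order = [index for index, _ in pieces]
    return derivative, order, log


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the toy event is reproducible
    reference = "".join(rng.choice("ACGT") for _ in range(2000))
    derivative, order, log = simulate_event(reference, 150, rng)
    print(len(reference), len(derivative), len(order))
```

Returning the segment order and the operation log alongside the rearranged sequence gives a ground truth against which a detection tool's calls could be compared, which is the role a simulator plays in the evaluation workflow the abstract describes.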

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International