UBC Theses and Dissertations

Database-driven whole genome profiling for stratifying Triple Negative Breast Cancers (TNBC)

Asiimwe, Rebecca

Abstract

Whole genome sequencing of cancers for variant discovery and patient stratification generates vast amounts of data, on the order of 10^6 relevant features per sample. The current practice is to store these data in flat files, whose structure complicates storage, querying, and integrative mining and analysis alongside orthogonally collected data such as phenotype and clinical outcomes. In this study we designed, developed and optimized an object-relational database to support storage, integration, querying, analysis and visualization of large-scale whole genome profiling data at the level of individual genome-wide somatic variants (CNAs, SNVs, SVs and indels). We structured variant data from analytics pipelines and implemented a PostgreSQL database into which we bulk-loaded clinical outcomes and somatic variants from 88 triple negative breast cancers (TNBCs). We focused on TNBC because of the urgent need for better characterization of the genetic, molecular and clinical biomarkers of this heterogeneous, more aggressive and difficult-to-treat breast cancer subtype, for which treatment options are limited. We chose whole genome sequencing (WGS) because it can provide an in-depth view of the landscape of mutations across the genome, which may reflect specific mutational processes that are targetable vulnerabilities in human cancers. However, a whole genome sequencing study of TNBC at scale that investigates genomic properties as a stratification tool has not been undertaken.
Building on these observations, we applied the database and demonstrate its utility in supporting access to, and exploration, analysis and visualization of, the genomic contents of patient tumours. These capabilities support quality control, inference of the mutation patterns and genomic events underpinning a patient’s disease, population-level aggregation analysis, gene mutation visualization and patient stratification. Furthermore, we developed Genome-Miner, a web-based user interface to the database that supports interactive and convenient access, sharing, interrogation and visualization of the collected data across research groups. We anticipate that the database infrastructure we present will have utility in other whole genome studies and will push the field beyond the use of flat files for managing whole genome datasets in cancer.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International