UBC Theses and Dissertations

Visualizing heterogeneous data in genomic epidemiology

Crisan, Anamaria


Technological innovations have allowed for a greater variety of data, most notably microbial genomic data, to be collected, integrated, analyzed, and visualized for epidemiological investigations. While analytic methods have evolved in light of this technological change, data visualization systems have lagged behind. I take a novel approach that integrates methods from information visualization, human-computer interaction, machine learning, and statistics to address unmet data visualization needs in microbial genomic epidemiology (genEpi). This approach also enables me to generate study artifacts that can be used to address regulatory and organizational constraints arising in domains where the use of data is highly restricted. I first present a mixed-methods approach to understanding the needs, data, tasks, and constraints of public health stakeholders who are charged with interpreting the findings of these data. I demonstrate how this approach can be used to communicate new and heterogeneous types of data in a clinical report that is read by stakeholders in different roles. I next present a novel method for systematically reviewing data visualizations, which I use to develop a Genomic Epidemiology Visualization Typology (GEViT) that enables others to explore and characterize the ways the data could be visualized. Finally, I use these collective findings to inform the design and implementation of four data visualization tools: Adjutant, the GEViT Gallery, minCombinR, and GEViTRec. Adjutant enables rapid and unsupervised topic clustering of PubMed article corpora to aid systematic and literature reviews. The GEViT Gallery is a browsable interface for exploring data visualizations specific to the microbial genEpi domain. minCombinR lowers the burden on stakeholders of generating combinations of data visualizations for heterogeneous data. GEViTRec takes a novel approach to the automatic generation of data visualizations that can help stakeholders familiarize themselves with new data. All of these tools integrate with analytic methods. This research makes novel contributions to the design and implementation of data visualization systems that impact microbial genomic epidemiological data collected for public health investigations. The challenges addressed here are not unique to this domain, and my contributions are extensible to other domains grappling with heterogeneous, multidimensional, and restricted data.


Attribution-ShareAlike 4.0 International