UBC Theses and Dissertations

Towards accurate compound annotation in mass spectrometry-based global metabolomics

Xing, Shipei

Abstract

Metabolomics is an emerging omics field that aims to characterize the entire metabolome of a biological system. Mass spectrometry (MS) is a preferred analytical technique for metabolomics research owing to its high sensitivity and highly specific structural information content. However, accurately translating MS signals into chemical language remains a longstanding challenge, which hinders downstream biological interpretation. This dissertation presents computational strategies that contribute to the interpretation of tandem mass (MS/MS) spectra with the aid of machine learning and statistical approaches. Chapter 1 provides a holistic introduction to MS-based metabolomics and to the bioinformatic tools developed for uncovering unidentified metabolic features in untargeted metabolomics. Chapter 2 describes a novel MS/MS spectral comparison algorithm, Core Structure-based Search (CSS), which searches for structural analogs of unknown MS/MS spectra within existing MS/MS reference libraries. CSS shows improved correlations with structural similarity in large-scale benchmarking. In Chapter 3, a deep learning-based tool is developed for automated extraction of steroid-like metabolic features from untargeted metabolomics data by classifying MS/MS fragmentation patterns. This biology-driven metabolomics pipeline enables metabolite characterization and discovery at the compound class level. Chapter 4 describes the purification of chimeric MS/MS spectra using a random forest model. Purified MS/MS spectra are demonstrated to yield better spectral matching results against MS/MS reference libraries. Chapter 5 describes the systematic analysis of radical fragment ions in MS/MS through MS/MS database mining. Larger-than-expected percentages of radical ions are present in collision-induced dissociation-based MS/MS; relationships between radical ion percentages and compound classes, chemical substructures, and collision energies are also investigated.
Chapter 6 discusses a standalone platform, BUDDY, for molecular formula discovery via bottom-up MS/MS interrogation and experiment-specific global peak annotation. BUDDY further integrates machine-learned ranking and significance control, showing improved formula annotation accuracy and lower computational cost than the other tools benchmarked. Applying BUDDY to repository-scale recurrent unidentified MS/MS spectra, we discovered >5,000 molecular formulae that are not archived in chemical databases, with high confidence. Overall, this dissertation demonstrates computational contributions to enriching structural insights into MS-based untargeted metabolomics data, thus paving the way for understanding the biological mechanisms behind various health disorders and diseases from the perspective of small molecules.

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International