An investigation into the utility of guilt by association machine learning algorithms for the prioritization of autism spectrum disorder candidate risk genes

UBC Theses and Dissertations

Gunning, Margot Patricia Rainbow

Abstract

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impairments in social interaction and communication, and by restrictive, repetitive behaviours or interests, with extreme phenotypic and genetic heterogeneity. To date, genetic association studies have identified 90 risk genes with high confidence out of an estimated 1000. Researchers have begun to use machine learning methods that leverage heterogeneous biological network data to aid in the discovery of ASD risk genes. However, the real-world utility of these studies is questionable: network-based machine learners are often biased towards well-studied genes because they operate on a principle called “guilt by association.” In this thesis, I evaluate and compare genetic and computational approaches to ASD risk gene prioritization. I demonstrate that network-based computational approaches add little useful information beyond what genetic approaches provide for prioritization. Furthermore, I demonstrate that gene expression profiles and generic measures of disease gene likelihood may provide less biased contextual information that can be used to supplement genetic association data to prioritize ASD risk genes. Lastly, I discuss how data quality and data dependence impact the evaluation of machine learning algorithms and genetic association studies.

Item Metadata

Title	An investigation into the utility of guilt by association machine learning algorithms for the prioritization of autism spectrum disorder candidate risk genes
Creator	Gunning, Margot Patricia Rainbow
Publisher	University of British Columbia
Date Issued	2019
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2020-01-03
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0387455
URI	http://hdl.handle.net/2429/73091
Degree (Theses)	Master of Science - MSc
Program (Theses)	Bioinformatics
Affiliation	Science, Faculty of
Degree Grantor	University of British Columbia
Graduation Date	2020-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
