UBC Theses and Dissertations
An investigation into the utility of guilt by association machine learning algorithms for the prioritization of autism spectrum disorder candidate risk genes
Gunning, Margot Patricia Rainbow
Abstract
Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by impairments in social interaction and communication and by restricted, repetitive behaviours or interests, with extreme phenotypic and genetic heterogeneity. To date, genetic association studies have identified 90 high-confidence risk genes out of an estimated 1000. Researchers have begun to use machine learning methods that leverage heterogeneous biological network data to aid in the discovery of ASD risk genes. However, the real-world utility of these studies is questionable: network-based machine learners are often biased towards well-studied genes because they operate on a principle called "guilt by association." In this thesis, I evaluate and compare genetic and computational approaches to ASD risk gene prioritization. I demonstrate that network-based computational approaches add little useful information beyond what genetic approaches already provide for prioritization. Furthermore, I demonstrate that gene expression profiles and generic measures of disease gene likelihood may provide less biased contextual information that can supplement genetic association data when prioritizing ASD risk genes. Lastly, I discuss how data quality and data dependence impact the evaluation of machine learning algorithms and genetic association studies.
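To illustrate why a guilt-by-association learner can favour well-studied genes, here is a minimal neighbour-voting sketch on a toy network. It is not the method used in the thesis. The gene names (HUB, NICHE, R1 to R3, A to D), the edge list, and the choice of plain neighbour counting are all hypothetical. HUB stands in for a heavily annotated, highly connected gene. NICHE stands in for a gene whose few connections all go to known risk genes. Raw neighbour counts rank HUB first, matching a label-free degree baseline. Dividing by degree reverses the top two.

```python
from collections import defaultdict

# Hypothetical toy network: R1-R3 play the role of known risk genes,
# HUB is a highly connected ("well-studied") gene, NICHE is sparsely connected.
KNOWN = {"R1", "R2", "R3"}
EDGES = [
    ("HUB", "R1"), ("HUB", "R2"), ("HUB", "R3"),
    ("HUB", "A"), ("HUB", "B"), ("HUB", "C"), ("HUB", "D"),
    ("NICHE", "R1"), ("NICHE", "R2"),
    ("A", "B"), ("C", "D"), ("R1", "R2"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (gene, gene) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def neighbor_vote(graph, known, normalize=False):
    """Score each unlabelled gene by its neighbours in the known set.

    With normalize=False the score is the raw count of known neighbours,
    which grows with degree; with normalize=True it is the fraction of
    neighbours that are known genes.
    """
    scores = {}
    for gene, nbrs in graph.items():
        if gene in known:
            continue
        hits = sum(1 for n in nbrs if n in known)
        # Every node came from an edge, so len(nbrs) is never zero.
        scores[gene] = hits / len(nbrs) if normalize else hits
    return scores


def degree_baseline(graph, known):
    """Score each unlabelled gene by degree alone, ignoring all labels."""
    return {gene: len(nbrs) for gene, nbrs in graph.items() if gene not in known}


def rank(scores):
    """Sort genes by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(rank(neighbor_vote(graph, KNOWN)))                  # ['HUB', 'NICHE', 'A', 'B', 'C', 'D']
    print(rank(degree_baseline(graph, KNOWN)))                # ['HUB', 'A', 'B', 'C', 'D', 'NICHE']
    print(rank(neighbor_vote(graph, KNOWN, normalize=True)))  # ['NICHE', 'HUB', 'A', 'B', 'C', 'D']
```

Degree normalization is shown only as one simple adjustment. It does not by itself remove study bias from real networks, where edges themselves accumulate around well-studied genes.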
Item Metadata
Title |
An investigation into the utility of guilt by association machine learning algorithms for the prioritization of autism spectrum disorder candidate risk genes
|
Creator |
Gunning, Margot Patricia Rainbow
|
Publisher |
University of British Columbia
|
Date Issued |
2019
|
Language |
eng
|
Date Available |
2020-01-03
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0387455
|
Degree Grantor |
University of British Columbia
|
Graduation Date |
2020-05
|
Scholarly Level |
Graduate
|
Aggregated Source Repository |
DSpace
|