The Impact of Multifunctional Genes on "Guilt by Association" Analysis
Gillis, Jesse; Pavlidis, Paul
Description
Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that provide a more meaningful baseline when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies.
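To make the central claim concrete, here is a minimal sketch of using multifunctionality alone as a predictor. Everything in it is an assumption made for illustration and is not taken from the deposited data or the authors' code: the gene and function names are toy values, multifunctionality is approximated as a plain count of annotations per gene, and performance is scored with a standard AUROC. The count includes the function being evaluated, so this shows the idea rather than a held-out evaluation.

```python
from typing import Dict, Set


def multifunctionality_scores(annotations: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score each gene by how many functions it is annotated with (assumed proxy)."""
    return {gene: len(funcs) for gene, funcs in annotations.items()}


def auroc(scores: Dict[str, float], positives: Set[str]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative gene count as half a win.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy annotation table: gene -> set of function labels.
annotations = {
    "geneA": {"f1", "f2", "f3", "f4"},
    "geneB": {"f1", "f2", "f3"},
    "geneC": {"f2", "f4"},
    "geneD": {"f3"},
    "geneE": set(),
}

# One ranking of genes, reused unchanged for every function.
scores = multifunctionality_scores(annotations)

functions = sorted(set().union(*annotations.values()))
per_function_auroc = {
    f: auroc(scores, {g for g, fs in annotations.items() if f in fs})
    for f in functions
}
mean_auroc = sum(per_function_auroc.values()) / len(per_function_auroc)
```

The same gene ranking is used for every function, with no information about which gene interacts with which. An average AUROC well above 0.5 under these conditions is the kind of effect the description attributes to multifunctionality.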
Item Metadata
Title | The Impact of Multifunctional Genes on "Guilt by Association" Analysis
Creator |
Contributor |
Date Created | 2011
Date Issued | 2019-03-11
Subject |
Type |
Notes |
Date Available | 2019-03-11
Provider | University of British Columbia Library
License | CC0 Waiver
DOI | 10.14288/1.0363909
URI |
Publisher DOI |
Rights URI |
Aggregated Source Repository | Dataverse