Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA)

UBC Faculty Research and Publications

Gillis, Jesse; Pavlidis, Paul

Abstract

The assignment of gene function remains a difficult but important task in computational biology. The establishment of the first Critical Assessment of Functional Annotation (CAFA) was aimed at increasing progress in the field. We present an independent analysis of the results of CAFA, aimed at identifying challenges in assessment and at understanding trends in prediction performance. We found that well-accepted methods based on sequence similarity (i.e., BLAST) have a dominant effect. Many of the most informative predictions turned out to be either recovering existing knowledge about sequence similarity or were "post-dictions" already documented in the literature. These results indicate that deep challenges remain in even defining the task of function assignment, with a particular difficulty posed by the problem of defining function in a way that is not dependent on either flawed gold standards or the input data itself. In particular, we suggest that using the Gene Ontology (or other similar systematizations of function) as a gold standard is unlikely to be the way forward.

Item Metadata

Title	Characterizing the state of the art in the computational assignment of gene function: lessons from the first critical assessment of functional annotation (CAFA)
Creator	Gillis, Jesse; Pavlidis, Paul
Publisher	BioMed Central
Date Issued	2013-04-22
Genre	Article
Type	Text
Language	eng
Date Available	2016-01-09
Provider	Vancouver : University of British Columbia Library
Rights	Attribution 4.0 International (CC BY 4.0)
DOI	10.14288/1.0223203
URI	http://hdl.handle.net/2429/56334
Affiliation	Medicine, Faculty of; Psychiatry, Department of; Other UBC; Non UBC
Citation	BMC Bioinformatics. 2013 Apr 22;14(Suppl 3):S15
Publisher DOI	10.1186/1471-2105-14-S3-S15
Peer Review Status	Reviewed
Scholarly Level	Faculty
Copyright Holder	Gillis and Pavlidis; licensee BioMed Central Ltd.
Rights URI	http://creativecommons.org/licenses/by/4.0/
Aggregated Source Repository	DSpace
