Open Collections

UBC Faculty Research and Publications

The use of Gene Ontology terms for predicting highly-connected 'hub' nodes in protein-protein interaction… Hsing, Michael; Byler, Kendall G; Cherkasov, Artem Sep 16, 2008

Full Text

BioMed Central
BMC Systems Biology
Open Access
Research article

The use of Gene Ontology terms for predicting highly-connected 'hub' nodes in protein-protein interaction networks

Michael Hsing*†1, Kendall Grant Byler†2 and Artem Cherkasov2

Address: 1Bioinformatics Graduate Program, Faculty of Graduate Studies, University of British Columbia. 100-570 West 7th Avenue. Vancouver, BC, V5T 4S6, Canada. and 2Division of Infectious Diseases, Department of Medicine, Faculty of Medicine, University of British Columbia. D 452 HP, VGH. 2733 Heather Street. Vancouver, BC, V5Z 3J5, Canada.

Email: Michael Hsing* - mhsing@interchange.ubc.ca; Kendall Grant Byler - kbyler@interchange.ubc.ca; Artem Cherkasov - artc@interchange.ubc.ca
* Corresponding author    † Equal contributors

Abstract
Background: Protein-protein interactions mediate a wide range of cellular functions and responses and have been studied rigorously through recent large-scale proteomics experiments and bioinformatics analyses. One of the most important findings of those endeavours was the observation that 'hub' proteins participate in significant numbers of protein interactions and play critical roles in the organization and function of cellular protein interaction networks (PINs) [1,2]. It has also been demonstrated that such hub proteins may constitute an important pool of attractive drug targets.
Thus, it is crucial to be able to identify hub proteins based not only on experimental data but also by means of bioinformatics predictions.
Results: A hub protein classifier has been developed based on the available interaction data and Gene Ontology (GO) annotations for proteins in the Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens genomes.
In particular, by utilizing the machine learning method of boosting trees, we were able to create a predictive bioinformatics tool for the identification of proteins that are likely to play the role of a hub in protein interaction networks. Testing the developed hub classifier on external sets of experimental protein interaction data in Methicillin-resistant Staphylococcus aureus (MRSA) 252 and Caenorhabditis elegans demonstrated that our approach can predict hub proteins with a high degree of accuracy.
A practical application of the developed bioinformatics method has been illustrated by the effective protein bait selection for large-scale pull-down experiments that aim to map complete protein-protein interaction networks for several species.
Conclusion: The successful development of an accurate hub classifier demonstrated that highly-connected proteins tend to share certain relevant functional properties reflected in their Gene Ontology annotations. It is anticipated that the developed bioinformatics hub classifier will represent a useful tool for the theoretical prediction of highly-interacting proteins, the study of cellular network organizations, and the identification of prospective drug targets, even in those organisms that currently lack large-scale protein interaction data.

Published: 16 September 2008
BMC Systems Biology 2008, 2:80 doi:10.1186/1752-0509-2-80
Received: 1 May 2008
Accepted: 16 September 2008
This article is available from: http://www.biomedcentral.com/1752-0509/2/80
© 2008 Hsing et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background
A broad range of cellular functions are mediated through complex protein-protein interactions, which are commonly visualized as two-dimensional networks connecting thousands of proteins by their physical interactions. Such a network perspective suggests that cellular effects and functions of proteins can only be fully understood in context with their interacting partners in a protein interaction network (PIN).

The study of PINs has been made possible through recent advancements in high-throughput proteomics that have detected protein-protein interactions on a genome-wide scale and have generated large amounts of interaction data for several species including Saccharomyces cerevisiae [3-7], Escherichia coli [8], Drosophila melanogaster [9], Caenorhabditis elegans [10], and Homo sapiens [11,12]. The corresponding protein interaction networks have been made publicly accessible through open access databases such as IntAct [13] and DIP [14].

The accumulated protein interaction data have further supported recent protein network analyses that demonstrated the scale-free organization of PINs, where the majority of proteins have a low number of interactions in the network, with a few highly-connected proteins (also called hubs) having a significant number of interacting partners [1,2]. Such inhomogeneous network topology allows a PIN to be robust against random removal of protein nodes, but vulnerable to targeted removal of network hubs [15].
In addition, previous studies have shown defined relationships between the degree of connectivity of proteins in PINs, their sequence conservation, and cellular essentiality properties [16,17]. Those studies indicated that highly-connected proteins (or hubs) represent very attractive subjects for understanding cellular functions, identifying novel drug targets, and for use in the rational design of large-scale pull-down experiments.

Although large-scale PINs have already been experimentally determined for several species (and thus represent suitable training sets for hub-predicting bioinformatics approaches), in general, protein interaction data are still lacking for many organisms. Thus, several computational approaches have been developed to predict protein-protein interactions utilizing existing bioinformatics data such as gene proximity information [18,19], gene fusion events [20,21], gene co-expression data [22-24], phylogenetic profiling [25], orthologous protein interactions [26] and identification of interacting protein domains [27-30]. Several bioinformatics approaches have also been developed to identify hypothetical interactions between proteins based on their three-dimensional structures [31,32] or by applying text-mining techniques [33,34]. Traditionally, such computational predictions have focused on the identification of pairwise protein-protein interactions with varying degrees of accuracy [35]; however, none of them have been explicitly focused on predicting hypothetical hub proteins.

At the same time, it is reasonable to hypothesize that hub proteins should share certain common sequence or structural features that not only enable them to participate in multitudes of protein interactions, but also can be utilized for the theoretical identification of such hub proteins without prior knowledge of the corresponding PINs. Therefore, the goal of this study is to develop such a 'hub predictor' (or classifier), capitalizing on experimental and bioinformatics data available to date for proteins in several model organisms with already-determined PINs.

We have focused the construction of the hub classifier on Gene Ontology (GO) data, which provide functional annotations for individual proteins using an expert knowledge base [36-38]. The advantage of applying GO annotation to hub prediction lies in the readily available information for proteins in hundreds of species. Importantly, the GO annotations have been shown to reflect certain properties that can mediate protein-protein interactions [35], but the annotation itself does not rely on the availability of corresponding experimental data. Thus, the GO-based hub classifier should be suitable for predicting highly-connected proteins, even in organisms that lack protein interaction data.

Here we present the development of such a hub protein classifier, trained on the existing GO and protein-protein interaction data for Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens. The generated models were cross-validated and tested on two external protein interaction data sets: Methicillin-resistant Staphylococcus aureus (MRSA) 252 and Caenorhabditis elegans. The developed bioinformatics approach has not only demonstrated an improved accuracy in identifying highly-connected PIN nodes (as compared to homology- or protein domain-based prediction methods), but has also shown an improved speed and a lower demand on computational resources.

To illustrate a possible application of the developed tool, we have used it for rationalizing a bait selection strategy for a large-scale protein complex pull-down experiment.

Methods
Data acquisition
Protein-protein interaction data
Protein interaction data used for the training and testing of the hub protein classifier were obtained from the IntAct database [13] for the following species: Escherichia coli K12 (taxonomy ID: 83333), Saccharomyces cerevisiae (taxonomy ID: 4932), Drosophila melanogaster (taxonomy ID: 7227), and Homo sapiens (taxonomy ID: 9606) (acquisition date: Sep. 25th, 2007). Two external validation data sets were collected for protein interactions in MRSA252 (provided by the PREPARE project in Vancouver B.C. Canada [39]) and Caenorhabditis elegans (obtained from the IntAct database on Sep. 25th, 2007). Table 1 lists the total number of proteins and their interactions for the four species in the training and testing sets, which were combined into a single data set for the subsequent analyses. Similar information on the external validation sets is shown in Table 2.

Hub proteins were identified based on their numbers of protein interactions and their percentile ranking relative to other proteins in the same species. Proteins of the same species were divided into different percentile groups, sorted by the number of protein-protein interactions in decreasing order (i.e., higher percentile proteins have more interactions than lower percentile proteins). It is clear that hub proteins have more interactions than non-hubs, but currently there is no consensus on exactly how many interactions a hub protein should have. Often, hubs are defined arbitrarily to have at least a certain number of interactions [40]. In our study, the hub selection criterion was based on the position of a sharp turn (or inflection point) on an accumulative protein interaction distribution plot from each of the four species.
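The percentile ranking described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `label_hubs`, the toy network, and the tie-breaking rule (alphabetical by protein name at the cut-off) are assumptions, and the default `top_fraction` of 0.10 simply anticipates the 90th-percentile cut-off adopted in this study.

```python
from collections import Counter

def label_hubs(interactions, top_fraction=0.10):
    """Rank proteins by interaction count and label the top fraction as hubs.

    `interactions` is an iterable of (protein_a, protein_b) pairs.
    Returns the set of hub proteins and the minimum interaction count among them.
    """
    degree = Counter()
    for a, b in interactions:
        degree[a] += 1
        if a != b:                      # count a self-interaction only once
            degree[b] += 1
    if not degree:
        return set(), 0
    # Sort by decreasing degree; ties are broken by name (an arbitrary assumption).
    ranked = sorted(degree, key=lambda p: (-degree[p], p))
    n_hubs = max(1, int(len(ranked) * top_fraction))
    hubs = set(ranked[:n_hubs])
    cutoff = degree[ranked[n_hubs - 1]]
    return hubs, cutoff

# Toy network (hypothetical): P0 interacts with nine other proteins.
toy_edges = [("P0", f"P{i}") for i in range(1, 10)]
hubs, cutoff = label_hubs(toy_edges)
print(hubs, cutoff)
```

Applied per species, the returned cut-off plays the role of the "minimum # of interactions per hub" row of Table 1, although the exact handling of ties in the original study is not stated.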
As shown in Figure 1, the protein interactions followed a power law distribution, such that a sharp turn is visible around the 90th protein percentile position on the interaction plots.

To achieve a consistent hub definition across the four studied species, hub proteins were defined as proteins at or above the 90th percentile of interactors; in other words, the hubs represented the top 10 percent of highly-connected interactors, and the non-hubs consisted of the bottom 90 percent of the proteins. Using this definition, hub proteins were determined from each of the four PINs individually. At the 90th protein percentile, E. coli hubs have at least 20 protein interactions, S. cerevisiae hubs have at least 33 protein interactions, D. melanogaster hubs have at least 16 protein interactions, and H. sapiens hubs have at least 13 interactions. The number of assigned hub and non-hub classifications is shown in Table 1.

Figure 2 illustrates the subsequent steps involved in the development of the hub protein classifiers and their corresponding bioinformatics analyses.

Gene Ontology (GO) data
Each protein obtained from the IntAct database was identified by a unique UniProt accession number, which enabled a fast collection of GO annotation data from the UniProt Retrieval System [37,41] (UniProt protein data obtained on Oct. 1st, 2007). The complete UniProt protein annotation pages were downloaded as flat text files, which were then parsed by Perl scripts to extract the GO annotations in the three categories: biological process, molecular function, and cellular component. Because each GO term could be assigned to a different level of the annotation hierarchy, we established a fixed general GO level that represented all of the specific GO terms of the proteins in the study. This general GO annotation level was determined based on the GO slim project, which provides a list of generic GO terms on which many bioinformatics analyses can be performed [42].
Table 1: A summary of protein interaction and GO annotation data used in the training and testing of the hub classifiers.

Training/Testing set                       E. coli   S. cerevisiae   D. melanogaster   H. sapiens   total of 4 species
# of proteins                              2860      5397            6935              6592         21784
# of hubs (10% of total proteins)          286       535             628               620          2069
# of non-hubs (90% of total proteins)      2574      4862            6307              5972         19715
# of protein interactions                  13888     37167           19994             19115        90164
minimum # of interactions per hub          20        33              16                13

# of proteins with at least one GO term    1378      4738            5931              5097         17144
# of proteins without any GO term          1482      659             1004              1495         4640
% of proteins with at least one GO term    48.18%    87.79%          85.52%            77.32%       78.70%
# of different GO terms – process          30        41              48                49           50
# of different GO terms – function         21        37              38                37           40
# of different GO terms – component        4         27              31                29           35
# of different GO terms – total            55        105             117               115          125

The top part of the table lists the protein interactions and hubs in each of the four species, and the bottom part of the table lists the number of unique GO terms for each annotation category.

Importantly, the GO slim generic terms provided a reasonable number of protein 'predictors' for a machine learning method to effectively operate. The tool 'map2slim' [43] was used to map specific GO terms to the 'GO slim' generic terms (GO annotation files were obtained from [44] on Oct. 17th, 2007; GO format-version: 1.2, GO date: 16:10:2007 16:19, GO revision: 5.514; GO slim format-version: 1.2, GO slim date: 01:10:2007 16:53, GO slim revision: 1.682).
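The idea of collapsing specific GO terms onto generic GO slim terms can be illustrated with a toy example. The sketch below is not the map2slim algorithm itself: it simply walks up parent links until a slim term is reached, and then writes one 0/1 value per slim term for each protein. The term names, the parent relationships and the function names are all hypothetical placeholders, not real GO identifiers.

```python
def map_to_slim(term, parents, slim_terms):
    """Return the slim terms reachable from `term` by following parent links."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)          # stop climbing once a slim term is reached
            continue
        stack.extend(parents.get(current, ()))
    return found

def binary_features(annotations, parents, slim_terms):
    """Build one 0/1 row per protein, with one column per slim term."""
    columns = sorted(slim_terms)
    rows = {}
    for protein, terms in annotations.items():
        slim = set()
        for term in terms:
            slim |= map_to_slim(term, parents, slim_terms)
        rows[protein] = [1 if c in slim else 0 for c in columns]
    return columns, rows

# Hypothetical toy hierarchy (child -> list of parents).
parents = {
    "protein_kinase_activity": ["kinase_activity"],
    "kinase_activity": ["catalytic_activity"],
    "catalytic_activity": ["molecular_function"],
    "dna_binding": ["binding"],
    "binding": ["molecular_function"],
}
slim_terms = {"binding", "catalytic_activity"}
annotations = {
    "protA": ["protein_kinase_activity"],
    "protB": ["dna_binding", "kinase_activity"],
    "protC": [],                        # protein without any GO term
}
columns, rows = binary_features(annotations, parents, slim_terms)
print(columns)
print(rows)
```

A protein without any GO term ends up with an all-zero row, which is one way to picture why sparsely annotated species give the classifier less to work with.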
This generic version of GO slim contained 53 [biological process] terms, 42 [molecular function] terms and 37 [cellular component] terms.

Tables 1 and 2 list the number of GO slim terms used to annotate the proteins in each species and the number of proteins with or without a GO annotation term.

Table 2: A summary of protein interaction and GO annotation data used in the external validation of the hub classifiers.

External validation set                    MRSA252   C. elegans
# of proteins                              133       2890
# of hubs (10% of total proteins)          13        276
# of non-hubs (90% of total proteins)      120       2614
# of protein interactions                  2401      4594
minimum # of interactions per hub          45        7

# of proteins with at least one GO term    109       2403
# of proteins without any GO term          24        487
% of proteins with at least one GO term    81.95%    83.15%
# of different GO terms – process          27        46
# of different GO terms – function         19        34
# of different GO terms – component        5         22
# of different GO terms – total            51        102

The top part of the table lists the protein interactions and hubs in each of the two species, and the bottom part of the table lists the number of unique GO terms for each annotation category.

All protein interaction data and GO annotations were stored in a local MySQL database for fast data searching and reporting.

Hub protein classification by boosting trees
To train models that classify a protein as a hub or a non-hub, the protein interaction data from the four species were combined into a single data set (90,164 interactions involving 2,069 hubs and 19,715 non-hubs). A four-fold cross-validation strategy was used in which four non-overlapping testing sets (each 25% of the total protein set) and four training sets (each 75% of the total protein set) were utilized for building the hub classifiers. Each training and testing set maintained the same hub to non-hub (1:9) ratio. In addition, the proteins in the training sets maintained the same distribution of GO annotation terms as the proteins in the testing sets. Figure 3 illustrates the distribution of each of the 125 GO terms, represented by the percentage of proteins with this term in the training sets vs. the testing sets of the four cross-validation samples. High correlation R2 values of 0.9981 to 0.9983 indicated an equal GO distribution between the training and testing sets. It is also shown that the majority of the GO terms were associated with less than 10% of the proteins in a given data set.

Figure 3: Distribution of GO annotation terms between the training and testing sets in the four cross-validation samples. Each point on a graph represents the percentage of proteins annotated with a given GO term in the training set (x-axis), and the percentage of proteins annotated with the same GO term in the testing set (y-axis). All four plots were fitted with linear regression lines, with high R2 values of 0.998. This indicates an equal distribution of the GO terms between the training and testing sets of the four samples.

We focused the machine-learning effort on hub classification by applying boosting trees, which is one of the best methods for classifying complex data and providing interpretable results [45]. The training and testing of the hub-predicting classification trees were performed on 125 GO terms as predictor variables by using the boosting tree application as implemented in STATISTICA version 8 [46]. The input data were formatted as tables of binary data, where each column represented a GO term variable (1 = present, 0 = absent) and each row represented a sample protein.

Four classifiers were built (one for each of the four training sets) and compiled in the C++ language under Linux. In addition to the four testing sets in the cross-validation study, the best of the four hub classifiers has been validated on two external data sets, which consisted of experimentally-determined PINs in MRSA252 and C. elegans. The classifier predicted each protein in the data sets as either a hub or a non-hub, and the classification statistics were calculated as follows:

Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
Accuracy = (TP + TN)/(TP + TN + FP + FN)
PPV (Positive Predictive Value) = TP/(TP + FP)
NPV (Negative Predictive Value) = TN/(TN + FN)

where TP = True Positive, FP = False Positive, TN = True Negative, and FN = False Negative.

A useful output feature of the boosting tree method is the relative predictor importance, which measures the average influence of a predictor variable on the prediction outcome over all of the trees [45]. The most important predictor is assigned a value of 100, and the other variables are scaled accordingly.

Comparison of the hub classifiers with other existing protein interaction prediction approaches
To further assess the performance of the hub classifier against other existing approaches for predicting hub proteins, we applied three different types of bioinformatics methods to construct hypothetical PINs in MRSA252, where hub proteins were determined by the number of predicted pairwise protein-protein interactions.

Hypothetical PIN – pathway maps
The first type of hypothetical PIN represented the known protein-protein interactions available for MRSA252. A total of 513 protein interactions were manually extracted from the pathway maps in the KEGG database [47] (acquisition date: May 3rd, 2006).

Hypothetical PIN – orthologous interactions
The second type of PIN was constructed based on known protein-protein interactions between orthologs from three other species: Helicobacter pylori, Saccharomyces cerevisiae, and Escherichia coli. The experimental PIN in H. pylori was obtained from the BIND database [48] (acquisition date: Aug. 11th, 2005). Two sources were used to build the S. cerevisiae PIN: the BIND database (acquisition date: Aug. 11th, 2005) and Gavin's study [6] (acquisition date from the IntAct database [13]: Feb. 7th, 2006). We extracted the E. coli PIN in Butland's study [8] from the IntAct database [13] (acquisition date: Apr. 13th, 2006). A total of 2656 protein sequences in MRSA252 were obtained from the RefSeq databases at NCBI [49] (acquisition date: Feb. 4th, 2006). The orthologs of the interacting proteins from each of the above species were identified in MRSA252 by using the program InParanoid [50] (version 1.35). If the orthologs of a pair of MRSA252 proteins interacted in one of the three species, the pair was assigned as an interacting protein pair. A total of 3258 protein interactions were predicted for this type of MRSA252 PIN reconstruction.

Hypothetical PIN – interacting domains
The third type of MRSA PIN was predicted based on protein domain-domain interactions. First, the presence of Pfam domains [51] in each of the 2656 MRSA252 proteins was determined by scanning the Pfam domain profiles (version 19.0) with the program HMMER [52] (version 2.3.2). Second, domain-domain interaction data were acquired from two sources: InterDom [53] (version: 1.2) and iPfam [54] (version: 19.0). If a pair of MRSA252 proteins contained interacting domains according to one of the two sources, the pair was assigned as an interacting protein pair. A total of 11,608 protein interactions were predicted by this method.

Figure 1: Accumulative protein interaction distribution plots. a) E. coli, b) S. cerevisiae, c) D. melanogaster, d) H. sapiens. On each plot, the (x, y) coordinate of the sharp turn or the inflection point is shown.

Validating the prediction on an experimental MRSA252 PIN
The experimental MRSA252 PIN provided by the PREPARE project contained interaction data for 133 proteins and was used as the external validation set for measuring the prediction performance of the hub classifier and the different types of hypothetical PINs.

We have compared the prediction results in two different ways. In the first type of comparison, both the hub classifier and the combined hypothetical PINs classified the 133 MRSA proteins as hubs or non-hubs, while the same 133 proteins were also classified as hubs or non-hubs based on the experimental results provided by PREPARE. In the case of the hub classifier, hubs and non-hubs were reported explicitly by the prediction program. In the cases of hypothetical and experimental PINs, hubs were defined as proteins at or above the 90th percentile when ranked by the number of interactions (the same criterion as for the hub classifier). The following classification statistics were calculated: sensitivity, specificity, accuracy, PPV and NPV.

Figure 2: A flow chart of the development of the hub protein classifiers and their corresponding bioinformatics analyses.

In the second type of comparison, we compared ranked lists of proteins based on their 'hub-likeness' property. In the case of the hub classifier, the proteins were ranked based on the differences between predicted hub probabilities and non-hub probabilities as computed by the boosting tree method. In the case of the hypothetical and experimental PINs, the proteins were ranked by their numbers of protein interactions. The ranked lists were compared to the list of proteins ranked by the number of experimental interactions in MRSA252 by using a Spearman rank order correlation as implemented in STATISTICA 8.

Validating the prediction on an experimental C. elegans PIN
In addition to MRSA252, we have tested the hub protein classifier on an external set of protein interaction data in C. elegans. The same procedure was applied to determine hub prediction statistics, as described above.

Test of significance
To test the hub protein classifier against a null hypothesis, which claims there is no difference in GO term distribution between hubs and non-hubs, we have randomized the protein interaction data in the following ways. Firstly, the same 5445 proteins in the testing set (25% of the total protein set comprising the four species) for the hub classifier were used in the construction of a randomized data set. Secondly, 10% of those proteins were randomly assigned as hubs, while the other 90% of proteins were randomly assigned as non-hubs. Thirdly, the GO terms originally associated with those proteins were randomly distributed within the data set. The combination of the above two randomization methods ensured that there was no significant difference in GO term distribution between the hub and non-hub proteins.
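The randomization steps above might be sketched as follows. This is a hedged illustration rather than the procedure actually used: the text does not say whether GO terms were redistributed individually or as whole annotation profiles, so the sketch assumes the simpler reading, in which complete 0/1 annotation rows are shuffled among proteins. The function name, the toy data and the fixed seed are all hypothetical.

```python
import random

def randomize_dataset(go_rows, hub_fraction=0.10, seed=0):
    """Randomly assign hub labels and shuffle GO annotation rows among proteins.

    `go_rows` maps each protein to its list of 0/1 GO term indicators.
    Returns (labels, shuffled_rows); labels use 1 = hub, 0 = non-hub.
    """
    rng = random.Random(seed)
    proteins = sorted(go_rows)
    # Step 1: pick the requested fraction of proteins as hubs at random.
    n_hubs = round(len(proteins) * hub_fraction)
    hubs = set(rng.sample(proteins, n_hubs))
    labels = {p: int(p in hubs) for p in proteins}
    # Step 2: redistribute the annotation rows at random (assumed: whole rows).
    rows = [go_rows[p] for p in proteins]
    rng.shuffle(rows)
    shuffled_rows = dict(zip(proteins, rows))
    return labels, shuffled_rows

# Toy data set (hypothetical): 20 proteins with three GO term columns each.
toy_rows = {f"P{i:02d}": [i % 2, int(i % 3 == 0), int(i < 5)] for i in range(20)}
labels, shuffled = randomize_dataset(toy_rows)
print(sum(labels.values()), "hubs out of", len(labels))
```

Because both the labels and the annotation rows are assigned at random, any association between GO terms and hub status in the randomized set can only arise by chance, which is the point of the null comparison.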
Finally, the hub classifier was used to predict hubs and non-hubs in the randomized data set, and prediction statistics were obtained.

Simulation of protein bait selections and network coverage
The effectiveness of protein bait selections assisted by the hub classifier has been simulated by using yeast protein-protein interaction data determined by protein-complex pull-down and mass spectrometry experiments, available from Gavin's study [6]. One major goal of such large-scale experiments is to maximize the number of protein interactions identified by using a small set of proteins as 'baits' to pull down their interactors (preys). Therefore, it is crucial to select protein baits based on properties that will produce the best network coverage, as measured by the ratio between the number of protein interactions identified by an experiment and the total number of interactions in an organism.

In our simulation experiments, 18,028 interactions, involving 2551 proteins from Gavin's yeast data set (acquisition date from the IntAct database [13]: Feb. 7th, 2006), were hypothetically treated as the total number of protein interactions in Saccharomyces cerevisiae. To simulate the bait selection process, we selected a subset of proteins (ranging from 5% up to 100% of the 2551 yeast proteins) as baits, calculated the number of interactions such baits would "pull out" from the yeast interaction data set, and computed the overall network coverage. Two selection criteria were used. In one simulation, the baits were randomly selected from the total pool of the yeast proteins. In the other simulation, the baits were selected from the pool of hub proteins predicted by the hub classifier.

In addition to the bait selection strategy described above (referred to as one-round selection), we simulated the network coverage results by applying a second round of selections.
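A minimal sketch of the one-round coverage calculation is given below. It assumes that an interaction counts as "pulled out" when at least one of its two partners is a bait, which is one plausible reading of the description and may differ from the authors' implementation; the function names and the toy network are hypothetical.

```python
import random

def coverage(interactions, baits):
    """Fraction of interactions recovered when `baits` are used for pull-downs.

    Assumption: an interaction is recovered if either partner is a bait.
    """
    baits = set(baits)
    pulled = sum(1 for a, b in interactions if a in baits or b in baits)
    return pulled / len(interactions)

def random_baits(proteins, fraction, seed=0):
    """Pick a random subset of proteins of the requested relative size."""
    rng = random.Random(seed)
    k = max(1, round(len(proteins) * fraction))
    return rng.sample(sorted(proteins), k)

# Toy network (hypothetical): H is a hub, E-F is an isolated pair.
toy_edges = [("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "B"), ("E", "F")]
proteins = {p for edge in toy_edges for p in edge}

hub_cov = coverage(toy_edges, ["H"])                                 # bait chosen as the hub
rand_cov = coverage(toy_edges, random_baits(proteins, fraction=1 / 7))  # one random bait
print(f"hub bait: {hub_cov:.2f}, random bait: {rand_cov:.2f}")
```

In this toy network a single hub bait recovers four of the six interactions, whereas no other single protein recovers more than two, which mirrors the motivation for choosing predicted hubs as baits.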
In this type of selection, baits were divided into two sets: one-third as the first round of baits, and two-thirds as the second round of baits. The first-round baits were chosen by either random selection or by hub prediction. The second round of baits was selected from the most abundant preys pulled down by the first round of baits. Such an approach is also referred to as the "name your friend" method and has been applied to maximize the effectiveness of vaccinations against infectious diseases [55,56], as well as in some protein complex experiments [8].

Results and Discussion
Prediction performance of the hub prediction classifier
One prediction model was constructed for each of the four cross-validation samples; therefore, a total of four hub classifiers were generated. The executable files of the classifiers were compiled by the GNU C++ compiler in Linux.

Table 3: Prediction performance of the hub classifier in the combined data set of the four species

Hub classifier (# of nodes in each tree = 15, FN:FP penalty = 1:1.9, total # of trees = 187)

Training
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    13381               1405            36.51%        90.50%        85.37%     28.75%   93.14%
hub        986                 567

Testing
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    4415                514             28.10%        89.57%        83.75%     22.00%   92.25%
hub        371                 145

All
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    17796               1919            34.41%        90.27%        84.96%     27.06%   92.91%
hub        1357                712

The observed vs. predicted hubs and non-hubs and their corresponding classification statistics are shown for the best classifier based on the training, testing and all (training + testing) data sets.
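As a worked check, the classification statistics defined in the Methods can be recomputed from the testing-set counts in Table 3 (TP = 145, FP = 514, TN = 4415, FN = 371). The short sketch below is an illustration only; the function name is an assumption and the code is not part of the original analysis.

```python
def classification_stats(tp, fp, tn, fn):
    """Compute the five statistics used in this study from a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

# Testing-set counts taken from Table 3.
testing = classification_stats(tp=145, fp=514, tn=4415, fn=371)
for name, value in testing.items():
    print(f"{name}: {value:.2%}")
# sensitivity: 28.10%
# specificity: 89.57%
# accuracy: 83.75%
# ppv: 22.00%
# npv: 92.25%
```

The recomputed values agree with the testing row of Table 3, and the same function can be applied to the training and combined rows.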
The classifier programs used a list of query proteins and their corresponding GO term occurrences as the input file, and produced the same list of proteins with hub prediction results and probability scores. The running time was only a few seconds for predicting hubs from over 21,000 proteins on a 3.0 GHz Pentium D personal computer.

Table 4: Hub prediction comparison of the classifier and the hypothetical PINs in MRSA252.

Hub classifier
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    109                 11              30.77%        90.83%        84.96%     26.67%   92.37%
hub        9                   4

Hypothetical PIN – pathway maps
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    111                 9               0.00%         92.50%        83.46%     0.00%    89.52%
hub        13                  0

Hypothetical PIN – orthologous interactions
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    110                 10              23.08%        91.67%        84.96%     23.08%   91.67%
hub        10                  3

Hypothetical PIN – interacting domains
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    117                 3               0.00%         97.50%        87.97%     0.00%    90.00%
hub        13                  0

Combined hypothetical PIN – (pathway maps + orthologous interactions)
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    110                 10              23.08%        91.67%        84.96%     23.08%   91.67%
hub        10                  3

Combined hypothetical PIN – (pathway maps + orthologous interactions + interacting domains)
observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    108                 12              7.69%         90.00%        81.95%     7.69%    90.00%
hub        12                  1

The prediction performance of the hub classifier is compared to that of the hypothetical PINs in MRSA252.
The classification statistics are reported.

Table 5: Comparing ranked lists of hub-likeness properties between the classifier and the hypothetical PINs in MRSA252.

Hub prediction methods                                                                          correlation coefficient
Hub classifier                                                                                  0.320523
Hypothetical PIN – pathway maps                                                                 0.108682
Hypothetical PIN – orthologous interactions                                                     0.27396
Hypothetical PIN – interacting domains                                                          -0.291846
Combined hypothetical PIN – (pathway maps + orthologous interactions)                           0.23882
Combined hypothetical PIN – (pathway maps + orthologous interactions + interacting domains)     -0.011494

The ranked protein lists based on hub-likeness properties, produced by either the classifier or the hypothetical PINs, have been compared to that of the experimental PIN in MRSA252. The coefficient of Spearman rank order correlation is reported with p-value < 0.05.

Overall, the classification statistics were consistent between the training and testing sets for the four classifiers. Within the training sets, the sensitivity of the classifiers ranged from 33.33% to 36.51%, the specificity ranged from 90.50% to 90.94%, and the accuracy ranged from 85.21% to 85.58%; PPV (positive predictive value) varied from 27.40% to 29.12%, and NPV (negative predictive value) varied from 92.86% to 93.14%. Within the testing sets, the sensitivity ranged from 25.87% to 30.89%, the specificity ranged from 89.45% to 91.09%, and the accuracy ranged from 83.75% to 85.37%; PPV varied from 21.51% to 26.71% and NPV varied from 92.04% to 92.61%. The classification statistics for the best of the four hub classifiers are shown in Table 3.

We have further validated the prediction accuracy of the best hub classifier in the external MRSA252 data set.
As indicated in Table 4, the hub classifier achieved the highest sensitivity (30.77%), PPV (26.67%) and NPV (92.37%) among the compared prediction methods, with 90.83% specificity and 84.96% accuracy. The next best hub prediction result was achieved by the hypothetical MRSA PIN based on orthologous interactions. On the other hand, the predicted PINs based on pathway maps and interacting domains performed poorly, as neither produced any true positives.

In the other comparison, we correlated a ranked list of proteins based on their 'hub-likeness' (determined from either the hub classifier or the hypothetical PINs) with that of the experimental MRSA PIN. As shown in Table 5, the hub classifier had a correlation coefficient of 0.32, the highest among all methods. The next best correlation was achieved by the hypothetical PIN of orthologous interactions.

In addition to MRSA252, the hub protein classifier achieved comparable prediction results in the C. elegans validation data set, with 32.97% sensitivity, 86.84% specificity, 81.70% accuracy, 20.92% PPV and 92.46% NPV, as shown in Table 6.

The prediction statistics of the hub classifier on the randomized data set are summarized in Table 7. The results show that the hub classifier was not able to achieve a significant hub prediction when the GO terms and protein hubs were randomly assigned. The prediction reached only 11.43% sensitivity and 8.39% PPV in the randomized set, compared to 28.10% sensitivity and 22.00% PPV in the testing set before randomization. The specificity and NPV were comparable before and after randomization because of the inherent 1:9 ratio between the number of hubs and non-hubs, which makes it easier to predict non-hub proteins correctly than hub proteins.
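The five statistics reported in Tables 4, 6 and 7 all derive from the four cells of a confusion matrix, and a short sketch can make the effect of the 1:9 class ratio concrete. The Python example below is an illustration only, not part of the published classifier: the `ConfusionMatrix` class and its method names are hypothetical, and the counts are those of Table 7 (randomized data set). For comparison it also evaluates a trivial rule that labels every protein as a non-hub.

```python
from dataclasses import dataclass
import math


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts of a binary hub/non-hub prediction (hypothetical helper)."""
    tn: int  # observed non-hub, predicted non-hub
    fp: int  # observed non-hub, predicted hub
    fn: int  # observed hub, predicted non-hub
    tp: int  # observed hub, predicted hub

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def accuracy(self) -> float:
        return _ratio(self.tp + self.tn, self.tn + self.fp + self.fn + self.tp)

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)


# Counts from Table 7 (randomized data set).
randomized = ConfusionMatrix(tn=4285, fp=644, fn=457, tp=59)

# Trivial rule for comparison: call every protein a non-hub.
hubs = randomized.tp + randomized.fn        # observed hubs
non_hubs = randomized.tn + randomized.fp    # observed non-hubs
all_non_hub = ConfusionMatrix(tn=non_hubs, fp=0, fn=hubs, tp=0)

for name, cm in [("randomized", randomized), ("all non-hub", all_non_hub)]:
    print(f"{name}: sensitivity={cm.sensitivity():.2%} "
          f"specificity={cm.specificity():.2%} NPV={cm.npv():.2%}")
```

With the Table 7 counts this reproduces the reported 11.43% sensitivity, 86.93% specificity and 90.36% NPV, while the all-non-hub rule reaches about 90.5% NPV with zero sensitivity. This is why sensitivity and PPV are the more informative statistics for such an unbalanced data set.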
The comparison of the prediction results between the testing set and the randomized set indicates that hub proteins have a distinct distribution of GO terms, which contributes to the predictive ability of the hub classifier.

Overall, the hub classifier built on the Gene Ontology annotations achieved high specificity and NPV, but had lower than expected sensitivity and PPV. We attribute this to the lack of GO annotations for certain proteins in the training sets, as the level of annotation varied among the four species. For instance, S. cerevisiae had the highest percentage of proteins with GO annotations (87.8%), while only 48.2% of the proteins in E. coli had any GO annotation. The performance of the current hub classifier therefore depended primarily on the number of GO annotations available for each species. We expect the sensitivity of the hub classifier to improve as more annotation data become available for the four species in the training sets.

Table 6: Hub prediction result in C. elegans.

observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    2270                344             32.97%        86.84%        81.70%     20.92%   92.46%
hub        185                 91

The prediction performance of the hub classifier was validated, based on the experimental PIN in C. elegans.

Table 7: Hub prediction result in the randomized data set.

observed   predicted non-hub   predicted hub   sensitivity   specificity   accuracy   PPV      NPV
non-hub    4285                644             11.43%        86.93%        79.78%     8.39%    90.36%
hub        457                 59

The prediction performance of the hub classifier was tested on the null hypothesis that there is no difference in GO term distribution between hubs and non-hubs.

Table 8: Top 20 important GO term predictors.

GO ID       GO name                                                                GO type             predictor importance
GO:0005730  nucleolus                                                              cellular component  100
GO:0003723  RNA binding                                                            molecular function  97
GO:0005515  protein binding                                                        molecular function  96
GO:0006412  translation                                                            biological process  95
GO:0006139  nucleobase, nucleoside, nucleotide and nucleic acid metabolic process  biological process  90
GO:0006996  organelle organization and biogenesis                                  biological process  89
GO:0030246  carbohydrate binding                                                   molecular function  87
GO:0005840  ribosome                                                               cellular component  86
GO:0005777  peroxisome                                                             cellular component  85
GO:0009719  response to endogenous stimulus                                        biological process  82
GO:0007049  cell cycle                                                             biological process  81
GO:0004871  signal transducer activity                                             molecular function  77
GO:0005654  nucleoplasm                                                            cellular component  77
GO:0008219  cell death                                                             biological process  75
GO:0006118  electron transport                                                     biological process  73
GO:0006259  DNA metabolic process                                                  biological process  73
GO:0050789  regulation of biological process                                       biological process  73
GO:0006950  response to stress                                                     biological process  72
GO:0005811  lipid particle                                                         cellular component  71
GO:0008135  translation factor activity, nucleic acid binding                      molecular function  70

Figure 4: Network coverage of different bait selection strategies in protein complex pull-down experiments, simulated in Saccharomyces cerevisiae.

GO term predictor importance

The relative importance of predictors reported in the training output indicates the contribution of each GO term used in the boosted trees classifiers. The importance value ranged from 0 to 100, where 100 indicated that a predictor had the most influence on the hub prediction outcome, and 0 meant that a predictor had the least influence. The top 20 GO annotation terms that were likely to be shared among hub proteins are listed in Table 8.

The top GO terms included several annotations, such as 'RNA binding', 'translation' and 'ribosome', that are commonly used to annotate ribosomal proteins, which were often identified as the top interacting proteins in other experiments [6,8]. The list of important predictors indicated that hub proteins tend to participate in several common cellular processes, including translation, nucleotide metabolism, organelle biogenesis, cell cycle, signal transduction, cell death, and electron transport.

Applying the hub classifier to protein bait selection

The bait selection strategy, assisted by the hub classifier, was simulated in the experimental PIN of Saccharomyces cerevisiae. In the case of one-round selection, choosing baits that were predicted as hubs by the classifier greatly increased the network coverage in comparison to random selection. For instance, as illustrated in Figure 4, when 15% of all proteins were selected as baits based on the result of the hub classifier, a network coverage of 42.39% was achieved. On the other hand, random bait selection produced a network coverage of only 26.53%.

In the case of the two-round selection, the network coverage produced by either random or hub bait selection improved greatly over the one-round selection.
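The coverage comparison above can be illustrated with a small simulation. The sketch below is not the authors' simulation code, and this section does not give the details of their procedure. It assumes that a bait recovers all of its direct interaction partners, that network coverage is the fraction of interactions recovered, and that in a second round every prey found in the first round is used as a bait. The function names and the seven-protein toy network are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from a list of interactions."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def one_round_coverage(adjacency: Dict[str, Set[str]],
                       baits: Iterable[str]) -> float:
    """Fraction of interactions that have at least one endpoint used as a bait."""
    total = {frozenset((a, b)) for a, nbrs in adjacency.items() for b in nbrs}
    found = {frozenset((bait, prey))
             for bait in baits
             for prey in adjacency.get(bait, set())}
    return len(found) / len(total)


def two_round_coverage(adjacency: Dict[str, Set[str]],
                       baits: Iterable[str]) -> float:
    """Assumed two-round scheme: first-round preys become second-round baits."""
    first_round = set(baits)
    preys = {p for b in first_round for p in adjacency.get(b, set())}
    return one_round_coverage(adjacency, first_round | preys)


# Hypothetical toy network: H is a hub with four partners; F is peripheral.
toy_edges: List[Tuple[str, str]] = [
    ("H", "A"), ("H", "B"), ("H", "C"), ("H", "D"), ("A", "E"), ("E", "F"),
]
toy = build_adjacency(toy_edges)

for bait in ("H", "F"):
    print(bait,
          f"one-round={one_round_coverage(toy, [bait]):.2f}",
          f"two-round={two_round_coverage(toy, [bait]):.2f}")
```

In this toy network, choosing the hub H as the single bait recovers four of the six interactions in one round, compared with one of six for the peripheral protein F. A second round raises both figures, to five and two of six respectively, so the gap between the two strategies narrows in relative terms.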
The hub bait selection performed slightly better than random in the two-round selection.

The results suggest that the hub classifier is a useful tool for selecting baits and prioritizing proteins for protein interaction experiments. Although it was not explored in the present study, we expect that the hub classifier can also assist in the identification of highly-interacting proteins in pathogens as potential drug targets.

Conclusion

We have studied the available interaction and Gene Ontology data for proteins in the Escherichia coli, Saccharomyces cerevisiae, Drosophila melanogaster and Homo sapiens genomes. By utilizing the boosting trees classification method, we have shown that highly-connected proteins in the studied PINs share certain common GO terms; this observation enabled the development of a hub classifier capable of distinguishing hub proteins from non-hubs. This classifier has improved accuracy for hub prediction relative to other traditional approaches for protein interaction prediction. It is anticipated that the hub classifier can serve as a useful tool to identify highly-interacting proteins in species without any available protein interaction data, with potential applications in optimizing protein pull-down experiments and identifying new drug targets against pathogens.

Availability

The source code and executable program of the hub classifier are freely available for download at: http://www.cnbi2.com/hub/

Authors' contributions

MH acquired and analyzed protein interaction and Gene Ontology data, designed and developed the hub classifiers, built the hypothetical PINs, simulated the protein bait selection experiments, and drafted and revised the manuscript. KGB analyzed the statistical models and tools of boosting trees, and revised the manuscript. AC conceived and designed the study, and revised the manuscript.

Acknowledgements

MH was supported by the Michael Smith Foundation for Health Research (MSFHR) and the Natural Sciences and Engineering Research Council (NSERC). KGB and AC were funded by Genome Canada and Genome BC through the PRoteomics for Emerging PAthogen REsponse (PREPARE) project.

References

1. Barabasi AL, Oltvai ZN: Network biology: understanding the cell's functional organization. Nat Rev Genet 2004, 5(2):101-113.
2. Albert R: Scale-free networks in cell biology. J Cell Sci 2005, 118(Pt 21):4947-4957.
3. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, Qureshi-Emili A, Li Y, Godwin B, Conover D, Kalbfleisch T, Vijayadamodar G, Yang M, Johnston M, Fields S, Rothberg JM: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature 2000, 403(6770):623-627.
4. Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA 2001, 98(8):4569-4574.
5. Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, Yang L, Wolting C, Donaldson I, Schandorff S, Shewnarane J, Vo M, Taggart J, Goudreault M, Muskat B, Alfarano C, Dewar D, Lin Z, Michalickova K, Willems AR, Sassi H, Nielsen PA, Rasmussen KJ, Andersen JR, Johansen LE, Hansen LH, Jespersen H, Podtelejnikov A, Nielsen E, Crawford J, Poulsen V, Sorensen BD, Matthiesen J, Hendrickson RC, Gleeson F, Pawson T, Moran MF, Durocher D, Mann M, Hogue CW, Figeys D, Tyers M: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 2002, 415(6868):180-183.
6. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, Edelmann A, Heurtier MA, Hoffman V, Hoefert C, Klein K, Hudak M, Michon AM, Schelder M, Schirle M, Remor M, Rudi T, Hooper S, Bauer A, Bouwmeester T, Casari G, Drewes G, Neubauer G, Rick JM, Kuster B, Bork P, Russell RB, Superti-Furga G: Proteome survey reveals modularity of the yeast cell machinery. Nature 2006, 440(7084):631-636.
7. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrin-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MH, Butland G, Altaf-Ul AM, Kanaya S, Shilatifard A, O'Shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A, Greenblatt JF: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006, 440(7084):637-643.
8. Butland G, Peregrin-Alvarez JM, Li J, Yang W, Yang X, Canadien V, Starostine A, Richards D, Beattie B, Krogan N, Davey M, Parkinson J, Greenblatt J, Emili A: Interaction network containing conserved and essential protein complexes in Escherichia coli. Nature 2005, 433(7025):531-537.
9. Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, Vijayadamodar G, Pochart P, Machineni H, Welsh M, Kong Y, Zerhusen B, Malcolm R, Varrone Z, Collis A, Minto M, Burgess S, McDaniel L, Stimpson E, Spriggs F, Williams J, Neurath K, Ioime N, Agee M, Voss E, Furtak K, Renzulli R, Aanensen N, Carrolla S, Bickelhaupt E, Lazovatsky Y, DaSilva A, Zhong J, Stanyon CA, Finley RL Jr, White KP, Braverman M, Jarvie T, Gold S, Leach M, Knight J, Shimkets RA, McKenna MP, Chant J, Rothberg JM: A protein interaction map of Drosophila melanogaster. Science 2003, 302(5651):1727-1736.
10. Li S, Armstrong CM, Bertin N, Ge H, Milstein S, Boxem M, Vidalain PO, Han JD, Chesneau A, Hao T, Goldberg DS, Li N, Martinez M, Rual JF, Lamesch P, Xu L, Tewari M, Wong SL, Zhang LV, Berriz GF, Jacotot L, Vaglio P, Reboul J, Hirozane-Kishikawa T, Li Q, Gabel HW, Elewa A, Baumgartner B, Rose DJ, Yu H, Bosak S, Sequerra R, Fraser A, Mango SE, Saxton WM, Strome S, Heuvel S Van Den, Piano F, Vandenhaute J, Sardet C, Gerstein M, Doucette-Stamm L, Gunsalus KC, Harper JW, Cusick ME, Roth FP, Hill DE, Vidal M: A map of the interactome network of the metazoan C. elegans. Science 2004, 303(5657):540-543.
11. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, Li N, Berriz GF, Gibbons FD, Dreze M, Ayivi-Guedehoussou N, Klitgord N, Simon C, Boxem M, Milstein S, Rosenberg J, Goldberg DS, Zhang LV, Wong SL, Franklin G, Li S, Albala JS, Lim J, Fraughton C, Llamosas E, Cevik S, Bex C, Lamesch P, Sikorski RS, Vandenhaute J, Zoghbi HY, Smolyar A, Bosak S, Sequerra R, Doucette-Stamm L, Cusick ME, Hill DE, Roth FP, Vidal M: Towards a proteome-scale map of the human protein-protein interaction network. Nature 2005, 437(7062):1173-1178.
12. Stelzl U, Worm U, Lalowski M, Haenig C, Brembeck FH, Goehler H, Stroedicke M, Zenkner M, Schoenherr A, Koeppen S, Timm J, Mintzlaff S, Abraham C, Bock N, Kietzmann S, Goedde A, Toksoz E, Droege A, Krobitsch S, Korn B, Birchmeier W, Lehrach H, Wanker EE: A human protein-protein interaction network: a resource for annotating the proteome. Cell 2005, 122(6):957-968.
13. Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, Margalit H, Armstrong J, Bairoch A, Cesareni G, Sherman D, Apweiler R: IntAct: an open source molecular interaction database. Nucleic Acids Res 2004:D452-455.
14. Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D: The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 2004:D449-451.
15. Albert R, Jeong H, Barabasi AL: Error and attack tolerance of complex networks. Nature 2000, 406(6794):378-382.
16. Jeong H, Mason SP, Barabasi AL, Oltvai ZN: Lethality and centrality in protein networks. Nature 2001, 411(6833):41-42.
17. He X, Zhang J: Why do hubs tend to be essential in protein networks? PLoS Genet 2006, 2(6):e88.
18. Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci 1998, 23(9):324-328.
19. Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N: The use of gene clusters to infer functional coupling. Proc Natl Acad Sci USA 1999, 96(6):2896-2901.
20. Marcotte EM, Pellegrini M, Ng HL, Rice DW, Yeates TO, Eisenberg D: Detecting protein function and protein-protein interactions from genome sequences. Science 1999, 285(5428):751-753.
21. Enright AJ, Iliopoulos I, Kyrpides NC, Ouzounis CA: Protein interaction maps for complete genomes based on gene fusion events. Nature 1999, 402(6757):86-90.
22. Ge H, Liu Z, Church GM, Vidal M: Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae. Nat Genet 2001, 29(4):482-486.
23. Grigoriev A: A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae. Nucleic Acids Res 2001, 29(17):3513-3519.
24. Jansen R, Greenbaum D, Gerstein M: Relating whole-genome expression data with protein-protein interactions. Genome Res 2002, 12(1):37-46.
25. Pellegrini M, Marcotte EM, Thompson MJ, Eisenberg D, Yeates TO: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA 1999, 96(8):4285-4288.
26. Matthews LR, Vaglio P, Reboul J, Ge H, Davis BP, Garrels J, Vincent S, Vidal M: Identification of potential interaction networks using sequence-based searches for conserved protein-protein interactions or "interologs". Genome Res 2001, 11(12):2120-2126.
27. Gomez SM, Rzhetsky A: Towards the prediction of complete protein-protein interaction networks. Pac Symp Biocomput 2002:413-424.
28. Ng SK, Zhang Z, Tan SH: Integrative approach for computationally inferring protein domain interactions. Bioinformatics 2003, 19(8):923-929.
29. Obenauer JC, Yaffe MB: Computational prediction of protein-protein interactions. Methods Mol Biol 2004, 261:445-468.
30. Reiss DJ, Schwikowski B: Predicting protein-peptide interactions via a network-based motif sampler. Bioinformatics 2004, 20(Suppl 1):I274-282.
31. Lu L, Lu H, Skolnick J: MULTIPROSPECTOR: an algorithm for the prediction of protein-protein interactions by multimeric threading. Proteins 2002, 49(3):350-364.
32. Aloy P, Russell RB: Interrogating protein interaction networks through structural biology. Proc Natl Acad Sci USA 2002, 99(9):5896-5901.
33. Daraselia N, Yuryev A, Egorov S, Novichkova S, Nikitin A, Mazo I: Extracting human protein interactions from MEDLINE using a full-sentence parser. Bioinformatics 2004, 20(5):604-611.
34. Hoffmann R, Krallinger M, Andres E, Tamames J, Blaschke C, Valencia A: Text mining for metabolic pathways, signaling cascades, and protein networks. Sci STKE 2005, 2005(283):pe21.
35. Qi Y, Bar-Joseph Z, Klein-Seetharaman J: Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Proteins 2006, 63(3):490-500.
36. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25-29.
37. Camon E, Magrane M, Barrell D, Lee V, Dimmer E, Maslen J, Binns D, Harte N, Lopez R, Apweiler R: The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology. Nucleic Acids Res 2004:D262-266.
38. Rhee SY, Wood V, Dolinski K, Draghici S: Use and misuse of the gene ontology annotations. Nat Rev Genet 2008, 9(7):509-515.
39. PRoteomics for Emerging PAthogen REsponse (PREPARE) [http://www.prepare.med.ubc.ca/]
40. Haynes C, Oldfield CJ, Ji F, Klitgord N, Cusick ME, Radivojac P, Uversky VN, Vidal M, Iakoucheva LM: Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput Biol 2006, 2(8):e100.
41. UniProt batch retrieval system [http://beta.uniprot.org/?tab=batch]
42. Go Slim [http://www.geneontology.org/GO.slims.shtml]
43. map2slim [http://search.cpan.org/~cmungall/go-perl/scripts/map2slim]
44. the Gene Ontology [http://www.geneontology.org/]
45. Hastie T, Tibshirani R, Friedman J: The elements of statistical learning; data mining, inference, and prediction. New York: Springer; 2001.
46. STATISTICA [http://www.statsoft.com/]
47. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, Yamanishi Y: KEGG for linking genomes to life and the environment. Nucleic Acids Res 2008:D480-484.
48. Alfarano C, Andrade CE, Anthony K, Bahroos N, Bajec M, Bantoft K, Betel D, Bobechko B, Boutilier K, Burgess E, Buzadzija K, Cavero R, D'Abreo C, Donaldson I, Dorairajoo D, Dumontier MJ, Dumontier MR, Earles V, Farrall R, Feldman H, Garderman E, Gong Y, Gonzaga R, Grytsan V, Gryz E, Gu V, Haldorsen E, Halupa A, Haw R, Hrvojic A, Hurrell L, Isserlin R, Jack F, Juma F, Khan A, Kon T, Konopinsky S, Le V, Lee E, Ling S, Magidin M, Moniakis J, Montojo J, Moore S, Muskat B, Ng I, Paraiso JP, Parker B, Pintilie G, Pirone R, Salama JJ, Sgro S, Shan T, Shu Y, Siew J, Skinner D, Snyder K, Stasiuk R, Strumpf D, Tuekam B, Tao S, Wang Z, White M, Willis R, Wolting C, Wong S, Wrong A, Xin C, Yao R, Yates B, Zhang S, Zheng K, Pawson T, Ouellette BF, Hogue CW: The Biomolecular Interaction Network Database and related tools 2005 update. Nucleic Acids Res 2005:D418-D424.
49. Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 2007:D61-65.
50. Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol 2001, 314(5):1041-1052.
51. Finn RD, Mistry J, Schuster-Bockler B, Griffiths-Jones S, Hollich V, Lassmann T, Moxon S, Marshall M, Khanna A, Durbin R, Eddy SR, Sonnhammer EL, Bateman A: Pfam: clans, web tools and services. Nucleic Acids Res 2006:D247-251.
52. HMMER [http://hmmer.janelia.org/]
53. Ng SK, Zhang Z, Tan SH, Lin K: InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes. Nucleic Acids Res 2003, 31(1):251-254.
54. Finn RD, Marshall M, Bateman A: iPfam: visualization of protein-protein interactions in PDB at domain and amino acid resolutions. Bioinformatics 2005, 21(3):410-412.
55. Kretzschmar M, van Duynhoven YT, Severijnen AJ: Modeling prevention strategies for gonorrhea and Chlamydia using stochastic network simulations. Am J Epidemiol 1996, 144(3):306-317.
56. Muller J, Schonfisch B, Kirkilionis M: Ring vaccination. J Math Biol 2000, 41(2):143-171.
