UBC Faculty Research and Publications

Ensemble-based prediction of RNA secondary structures Aghaeepour, Nima; Hoos, Holger H Apr 24, 2013

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


52383-12859_2012_Article_6014.pdf [ 324.11kB ]
JSON: 52383-1.0215979.json
JSON-LD: 52383-1.0215979-ld.json
RDF/XML (Pretty): 52383-1.0215979-rdf.xml
RDF/JSON: 52383-1.0215979-rdf.json
Turtle: 52383-1.0215979-turtle.txt
N-Triples: 52383-1.0215979-rdf-ntriples.txt
Original Record: 52383-1.0215979-source.json
Full Text

Full Text

Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139http://www.biomedcentral.com/1471-2105/14/139METHODOLOGY ARTICLE Open AccessEnsemble-based prediction of RNA secondarystructuresNima Aghaeepour1 and Holger H Hoos2*AbstractBackground: Accurate structure prediction methods play an important role for the understanding of RNA function.Energy-based, pseudoknot-free secondary structure prediction is one of the most widely used and versatileapproaches, and improved methods for this task have received much attention over the past five years. Despite theimpressive progress that as been achieved in this area, existing evaluations of the prediction accuracy achieved byvarious algorithms do not provide a comprehensive, statistically sound assessment. Furthermore, while there isincreasing evidence that no prediction algorithm consistently outperforms all others, no work has been done toexploit the complementary strengths of multiple approaches.Results: In this work, we present two contributions to the area of RNA secondary structure prediction. Firstly, we usestate-of-the-art, resampling-based statistical methods together with a previously published and increasingly widelyused dataset of high-quality RNA structures to conduct a comprehensive evaluation of existing RNA secondarystructure prediction procedures. The results from this evaluation clarify the performance relationship between tenwell-known existing energy-based pseudoknot-free RNA secondary structure prediction methods and clearlydemonstrate the progress that has been achieved in recent years. Secondly, we introduce AveRNA, a generic andpowerful method for combining a set of existing secondary structure prediction procedures into an ensemble-basedmethod that achieves significantly higher prediction accuracies than obtained from any of its component procedures.Conclusions: Our new, ensemble-based method, AveRNA, improves the state of the art for energy-based,pseudoknot-free RNA secondary structure prediction by exploiting the complementary strengths of multiple existingprediction procedures, as demonstrated using a state-of-the-art statistical resampling approach. In addition, AveRNAallows an intuitive and effective control of the trade-off between false negative and false positive base pairpredictions. Finally, AveRNA can make use of arbitrary sets of secondary structure prediction procedures and cantherefore be used to leverage improvements in prediction accuracy offered by algorithms and energy modelsdeveloped in the future. Our data, MATLAB software and a web-based version of AveRNA are publicly available athttp://www.cs.ubc.ca/labs/beta/Software/AveRNA.BackgroundRNAs are amongst the most versatile and oldestbiomolecules; they play crucial roles in many biologi-cal processes. As in the case of proteins, the functionof many types of RNAs critically depends on the three-dimensional structure of the molecules. However, the 3Dstructure of RNAs is determined to a larger degree by their*Correspondence: hoos@cs.ubc.ca2Department of Computer Science, University of British Columbia, Vancouver,BC, V6T 1Z4, CanadaFull list of author information is available at the end of the articlesecondary structure, which arises from base-pairing inter-actions within an RNA strand and stacking of the resultingbase pairs.Since the direct determination of 3D structures isdifficult and costly, computational structure predictionmethods, and in particular, secondary structure predic-tion methods, are widely used. A prominent and versa-tile approach for predicting RNA secondary structuresis based on thermodynamic models, such as the Turnermodel [1], and uses dynamic programming algorithms(such as the Zuker & Stiegler algorithm [2]), to find astructure with minimum free energy (MFE) for a spe-cific RNA sequence. Over the last five years, considerable© 2013 Aghaeepour and Hoos; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of theCreative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,distribution, and reproduction in any medium, provided the original work is properly cited.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 2 of 16http://www.biomedcentral.com/1471-2105/14/139improvements in the predictions obtained by such algo-rithms have been achieved.It is important to note that, while it might seem natu-ral to use experiments to determine the parameters of athermodynamic model and machine learning and optimi-sation to determine those of a stochasticmodel, because ofthe equivalence between the free energy and probability ofRNA structures, in principle, both approaches can be usedin either setting. Indeed, the largest improvement in pre-diction accuracy has resulted from the use of sophisticatedmethods for estimating the thermodynamic parameters ofa given energy model (in particular, the Turner model),based on a set of reliable RNA secondary structures [3-5].Particularly good results have been achieved for meth-ods in which parameter estimation additionally takes intoaccount thermodynamic data from optical melting exper-iments, such as CG, LAM-CG and BL, [4,5] and expandthe standard energy model with probabilistic relation-ships between structural features (e.g., hairpin loops ofdifferent lengths), such as BL-FR [5]. Improved predic-tion accuracy has also been reported for an approach thatdetermines structures with maximum expected accuracy(MEA) rather than minimum free energy, based on basepairing probabilities obtained from a partition-functioncalculation [3,6,7]. CONTRAfold implements a condi-tional log-linearmodel (which generalizes upon stochasticcontext-free grammars) for structure prediction. Max-Expect starts from base-pair probabilities calculated bypartition functions [8] and uses dynamic programming(similar to CONTRAfold) to predict the MEA struc-ture [6]. And finally, CentroidFold uses a similar strategyexcept that it uses a weighted some of true positives andtrue negatives as the objective function [7].While the overall improvement in accuracy achievedover the baseline provided by the Zuker & Stiegleralgorithm using the Turner model is clearly significant,there is less certainty about the more modest perfor-mance relationships between some of the more recentmethods. For example, Lu et al. reported a differenceof only 0.3% in average sensitivity between their Max-Expect procedure and CONTRAfold 2.0 [3]. Similarly,Andronescu et al. found a 0.5% difference in averageF-measure between CONTRAfold 2.0 and their CG∗ pro-cedure [5]. Whether such small performance differencescan be considered significant is an open question; in fact,a cross-validation experiment for the BL and LAM-CGparameter estimation methods suggests that 3% differ-ences in accuracy may be statistically significant, butthe evidence is far from conclusive [5]. This suggeststhat there is a need for methods that make it possi-ble to assess the statistical significance of differences inprediction accuracy observed between RNA secondarystructure prediction methods. In this work we presentsuch methods, based on two well-established resamplingtechniques from statistics, bootstrapped confidence inter-vals and permutation tests. Using these methods and awell-studied, large set of trusted RNA secondary struc-tures, we assess progress and the state of the art inenergy-based, pseudoknot-free RNA secondary structureprediction.Also, it has been demonstrated that the accuraciesof predictions based on their BL∗, CG∗ and Turner99parameter sets (see their Supplementary Results C) arenot consistent across large and diverse sets of RNAs, andthat differences in accuracy for many individual RNAsoften deviate markedly from the average accuracy valuesmeasured across the entire set [5]. This suggests that bycombining the predictions obtained from different proce-dures, better results can be achieved than by using anyone of the given procedures in isolation. This general ideahas been previously applied to a wide range of problemsin computing science (where it underlies the fundamentalapproaches of boosting and bagging [9]). More recently,it has been used successfully for solving various problemsfrom computational biology, including gene prediction[10], clustering protein-protein interaction networks [11],as well as analysis of data from microarrays [12] and flowcytometry [13].Here, we introduce a generic RNA secondary structureprediction procedure that, given an RNA sequence, usesan ensemble of existing prediction procedures to obtain aset of structure predictions, which are then combined ona per-base-pair-basis to produce a combined prediction.Empirical analysis demonstrate that this ensemble-basedprediction procedure, which we dub AveRNA, outper-forms the previous state-of-the-art secondary structureprediction procedures on a broad range of RNAs. Onthe S-STRAND2 dataset [14], AveRNA obtained an aver-age F-measure of 71.6%, compared to the previous bestvalue of 70.3% achieved by BL-FR∗ [5]. AveRNA caneasily be extended with new prediction procedures; fur-thermore, it provides an intuitive way of controlling thetrade-off between false positive and false negative predic-tions. This is useful in situations where high sensitivityor high PPV may be required and allows AveRNA toachieve a sensitivity of over 75% and a PPV of over 83% onS-STRAND2.MethodsIn this section, we first describe the data set and pre-diction accuracy measures used in our work. Next, weintroduce the statistical methodology for the empiricalassessment of RNA secondary structure prediction algo-rithms we developed in this work. This is followed by abrief summary of the set of procedures for MFE-basedpseudoknot-free RNA secondary structure prediction weused in this work. Finally, we present AveRNA, our novelRNA secondary structure prediction approach, whichAghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 3 of 16http://www.biomedcentral.com/1471-2105/14/139combines predictions obtained from a diverse given set ofprocedures by means of weighted per-base-pair voting.Data setsIn this work, we used the S-STRAND2 dataset [14],which consists of 2518 pseudoknot-free secondary struc-tures from a wide range of RNA classes, includingribosomal RNAs, transfer RNAs, transfer messengerRNAs, ribonuclease P RNAs, SRP RNAs, hammerheadribozymes and group 1/2 introns [15-20]. This large anddiverse set is comprised of highly accurate structuresand has been used for the evaluation of secondary struc-ture prediction accuracy in the literature [5]. For theparts of our work involving the optimization of pre-diction accuracy, in order to avoid overfitting, we useda subset of the S-STRAND2 dataset obtained by sam-pling 500 structures uniformly at random as the basisfor the optimization process, and the full S-STRAND2dataset for assessing the resulting, optimized predictionprocedures.Existing secondary structure prediction methodsWe used 10 secondary prediction procedures knownfrom the literature. The SimFold-V2.0 procedure [21],which is based on Zuker and Stiegler’s classic dynamicprogramming algorithm, was used to predict secondarystructures using six different sets of free energy param-eters: Turner99 [1]; NOM-CG [4]; DIM-CG [22]; CG∗,BL∗ and BL-FR∗ [5]. Furthermore, we used CON-TRAfold-v1.1, CONTRAfold-v2.0 [3], MaxExpect-v5.1 [6]and CentroidFold-v0.0.9 [7]. The two versions of CON-TRAfold as well as CentroidFold are based on probabilisticmethods that do not make use of physically plausiblethermodynamic models of RNA secondary structure,while the seven other procedures are all based on(variations of) the widely used free energy model by theTurner group [1].While we originally also considered taveRNA [23] andSARNA-Predict [24], it turned out to be infeasible torun these procedures on the the longer sequences fromthe S-STRAND2 dataset (due to runtime and memoryrequirements).Accuracy measuresConsistent with existing work on RNA secondary struc-ture prediction, we assessed the prediction accuracyachieved by a given RNA secondary structure predic-tion procedure based on a given set of references struc-tures, using sensitivity, positive predictive value (PPV)and the F-measure. We define a correctly predicted base-pair to be a predicted base-pair, exactly identical to oneof the base-pairs in the reference structure. For a singleRNA (sequence, structure) pair, sensitivity is the ratio ofnumber of correctly predicted base-pairs to the numberof base-pairs in the reference structure:Sensitivity = #Correctly Predicted Base-Pairs#Base-Pairs in the Reference Structure ;(1)PPV is the ratio of number of correctly predicted base-pairs to the number of base-pairs in the predictedstructure:PPV = #Correctly Predicted Base-Pairs#Base-Pairs in the Predicted Structure ; (2)and the F-measure is defined as the harmonic mean ofsensitivity and PPV:F-measure = 2 × sensitivity × PPVsensitivity + PPV . (3)If there are no base-pairs in the predicted structure andthe reference structure, we define PPV and Sensitivity tobe 1 and otherwise 0. The F-measure, sensitivity, and PPVfor the prediction of any individual structure are always inthe interval [0, 1], where 1 characterizes a perfect predic-tion. When assessing the prediction accuracy on a givenset of structures, we usually report the average F-measure,sensitivity, and PPV achieved over the entire set.Statistical analysis of prediction accuracyTo formally assess the degree to which prediction accu-racy results measured for a given set of RNAs depend onthe precise choice of this set, we employ two well-knownstatistical resampling techniques: bootstrap confidenceintervals and permutation tests (see, e.g., [25]). Details onthe respective procedures developed and used in the con-text of this work are described in the following. Here, weapplied these statistical analysis procedures to the aver-age F-measure determined for predictions on a given setof RNAs, but we note that they generalize in a straight-forward manner to other measures of accuracy and otherstatistics of these over the given benchmark set. We notethat these statistical techniques are limited to assessingthe impact of different samples from the same under-lying distribution – an important issue, considering thelarge variation of prediction accuracy within the sets ofRNAs commonly used for evaluating structure predictionprocedures – but do not allow assessment of predic-tion accuracy might vary as the underlying distribution ischanged (e.g., by modifying the relative representation ofRNAs from different families or of different provenance);to address this latter question, we use a different approachdescribed later.To investigate the consistency of predictions obtainedfrom different RNA secondary structure prediction pro-cedures, we used scatter plots as well as the Spearmancorrelation coefficient (which, unlike the more widelyAghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 4 of 16http://www.biomedcentral.com/1471-2105/14/139used Pearson product moment correlation coefficient,also captures non-linear relationships).Bootstrap percentile confidence intervalsFollowing common practice (see, e.g., [25]), for a vector fof F-measure values achieved by a given prediction proce-dure on the structures contained in a given set S of RNAs(here, S-STRAND2), we perform the following steps todetermine the 95% confidence interval (CI) for the meanF-measure:(1) Repeat for 104 times: from f, draw a uniform randomsample of size |f| with replacement, and calculate theaverage F-measure of the sample.(2) Report the 2.5th and 97.5th percentiles of thedistribution of F-measures from Step 1 as the lowerand upper bounds of the CI, respectively.The choice of 104 samples in Step 1 follows standard prac-tice for bootstrap CIs (as illustrated by the results shownin Figure S2 in the Supporting Information, the resultsobtained using different sample sizes are very similar).Permutation testFollowing common practice (see, e.g., [25]), for vectors fAand fB of F-measure values achieved by given predictionprocedures A and B, respectively, on the structures con-tained in a given set S of RNAs (here, S-STRAND2), weperform the following steps to perform a permutation testfor the null hypothesis that the mean F-measure achievedby A and B is the same:(1) Repeat for 104 times: For each RNA in S, swap theF-measures of A and B with probability 1/2, resultingin vectors f ′A and f ′B, respectively.(2) Construct the cumulative distribution function (CDF)of avg(f ′A) − avg(f ′B) from the 104 permuted pairs ofvectors f ′A, f ′B from Step 1, where avg(·) denotes theaverage over the elements of a given vector.(3) Determine the percentile c of the CDF from Step 2that is equal to avg(fA)−avg(fB), as determined fromthe original, unpermuted performance vectors for Aand B; p = (100 − c)/100 is the p-value of the test.(4) Reject the null hypothesis of equal performance if,and only if, p from Step 3 is smaller than a givensignificance threshold α.The choice of 104 repetitions in Step 1 follows standardpractice for permutation tests. In this work, we used thistest with a standard significance threshold of α = 0.05.AveRNAAs explained earlier, the key idea behind AveRNA is toexploit complementary strengths of a diverse set of predic-tion algorithms by combining their respective secondarystructure predictions for a given RNA sequence. OurAveRNA procedure can make use of an arbritrary set ofprediction procedures, A := {A1,A2, . . . ,Ak}, which ituses in a black-box manner to obtain predictions for agiven input sequence, s. To emphasise the fact that thesubsidiary structure prediction procedures in A are effec-tively just an input to AveRNA that can be varied freely bythe user, we use AveRNA(A) to denote AveRNA with setA of component structure prediction procedures.Applied to input RNA sequence s, AveRNA(A) firstruns each Al ∈ A on s, resulting in predicted structuresS(A1, s), S(A2, s), . . . , S(Ak , s). Let each of these structuresS be represented by a base-pairing matrix BP(S) definedby BP(S)i,j = 1 if i and j form a base-pair in S and BPi,j = 0otherwise, where i, j ∈ {1, 2, . . . , n}. We note that anyRNA secondary structures, pseudo-knotted or not, corre-sponds to exactly one such binary matrix, but not everybinary matrix represents a valid secondary structure.We now consider the normalised sum of these binarymatrices:P =k∑l=1BP(S(Al, s))k . (4)Each entry Bi,j of this matrix can be interpreted as theprobability of a base pair between bases i and j in inputsequence s, under the assumption that the predictionsobtained from each of the Al should be considered equallylikely to be correct. This is equivalent to tallying votes foreach possible base pair, where each predictor has one voteper candidate pair i, j.However, it may well be that some predictors are gener-ally more accurate than others, as is known to be the casefor the set of secondary structure predictors we considerin this work. Therefore, we associate a weight (in the formof a real number between 0 and 1) with each predictor andconsider the weighted normalised sum of the individualsecondary structure matrices:P(w) =k∑l=1wl · BP(S(Al, s)), (5)wherew = (w1,w2, . . . ,wk), eachwl is the weight assignedto predictor l, and∑ki=1 wi = 1. We note that theunweighted case from above corresponds to wl = 1/k foreach l. Before discussing the interesting question of howto determine appropriate weights, we describe in the fol-lowing how we infer the pseudoknot-free RNA structureultimately returned by AveRNA from the entries in theweighted probability matrix P(w).Structure inferenceThe final structure prediction returned by AveRNA(A) fora given sequence can be obtained in different ways. First,Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 5 of 16http://www.biomedcentral.com/1471-2105/14/139we note that the problem of extracting a pseudoknot-free structure from the resulting probability matrix can besolved using a Nussinov-style dynamic programming (DP)algorithm to infer maximum expected accuracy (MEA)structures [6]. We refer to the variant of AveRNA thatuses this procedure asAveRNADP. Unfortunately, this DPprocedure requires (n3) running time, which becomesproblematic in the context of the parameter optimisa-tion described later. Therefore, we designed the followinggreedy algorithm as an alternative way for estimatingMEA structures. Let p = (p1, p2, . . .) be the sorted list ofbase-pair probabilities in P(w) in decreasing order andV = (v1, v2, . . .) be the respective set of base-pairs. Fora given threshold θ (a parameter of the procedure whosevalue we discuss later), we begin with an empty set of base-pairs S, set i := 1, and repeat as long as pi ≥ θ : (1) Add vito S if (and only if ) it is compatible with all other pairs inS, i.e., does not involve a base already paired with anotherposition or introduce a pseudoknot in S; (2) incrementi. We refer to the variant of AveRNA using this greedyinference method as AveRNAGreedy.We note that, while the greedy inference method isnot guaranteed to find a MEA structure, as we will showlater, it performs very well compared to the exact DPinference algorithm and is computationally much moreefficient. When either variant of AveRNA is applied to aset of RNA sequences, prediction and structure inferenceare performed for each given RNA sequence indepen-dently, and the results are independent of the compo-sition of the set or the order in which sequences areconsidered.Parameter optimizationClearly, the performance of AveRNA(A) depends on theset A of component prediction procedures as well as onthe previously mentioned parameters, namely the weightswl and, forAveRNAGreedy, the pairing threshold θ . Beforeusing AveRNA(A) for prediction tasks, we would like tofind settings for these parameters that would result inoptimised prediction accuracy obtained on a set of ref-erence RNAs (in terms of mean F-measure over the set).We solved the resulting numerical optimisation problemusing a well-known procedure called covariance matrixadaptation evolution strategy (CMA-ES) [26,27]. CMA-ES is a non-convex, gradient-free parameter optimizationprocedure that has proven to be empirically successfulin many real-world applications and appeared to be themost appropriate tool for finding performance-optimisingparameters of AveRNA. We used the MATLAB imple-mentation of CMA-ES with default settings, except thatwe had to increase the maximum number of iterations to100, since in some cases we observed clear evidence thata global optimum was not reached with the lower defaultsetting for this parameter [28].Time complexityThe running time required to run AveRNA(A) (with afixed set of parameters) is essentially the sum of therunning times of the component prediction proceduresA1, . . . ,Ak (where we note that in principle, these can berun in parallel and the time required for inferring the out-put structure from these results). While for AveRNADP,the latter time is of order (n3), and therefore no worsethan the complexity of most RNA secondary structureprediction methods based on dynamic programming, forAveRNAGreedy, it is O(n2) in the (unrealistic) worst caseand negligible in practice.Parameter optimisation requires substantially morecomputational effort, but has to be performed only once,off-line, very much like optimisation of the parameters ofa given energy model. In the context of AveRNADP , eachiteration of this optimisation process involves running the(n3)DP procedure on all sequences in the given trainingset of RNAs, and as we will demonstrate later, it turns outto be important to use reasonably large and diverse train-ing sets. In our experiments, using a training set of 500sequences, one iteration of CMA-ES on AveRNADP took653 250 seconds (i.e.,more than 750 CPU days for the fulloptimization). Each iteration of optimising AveRNAGreedy,on the other hand, took only 2 880 seconds (i.e., thefull optimization required less than 4 CPU days). Notethat these runtimes do not include the time required bythe individual algorithms for predicting the structures,which are the same for both approaches and need to beexpended only once at the beginning of the optimisationprocess. Once the parameters of AveRNA are optimised,it runs efficiently, as described at the beginning of thissection.Ablation analysisMeasuring the contribution of each algorithm to AveR-NAs performance can help us answer a wide range ofquestions, including the following: Which componentprediction procedure contributes the most to the over-all performance of AveRNA? Is there a certain number ofcomponent prediction procedures that must be includedbefore the ensemble method outperforms the individualones? Are there prediction procedures that can com-pensate for each other, in the sense that including oneprocedure from a certain set is important, but adding oth-ers from the same set does bring significant further gains?For AveRNA(A) with A = {A1 , A2, ...,Ak} we assessedthe contribution of each Al using the following ablationprocedure:(1) Determine the Al ∈ A for which AveRNA(A \ {Al})performs worst1, i.e., whose average F-measure onthe give set of RNAs is lowest.(2) Remove Al from Step 1 from A.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 6 of 16http://www.biomedcentral.com/1471-2105/14/139(3) If A still contains more than two algorithms, go toStep 1 and iterate.Step 1 involves re-optimising the parameters ofAveRNAfor each set of component algorithms, starting from thevalues of AveRNA(A).ResultsIn our computational experiments, we pursued twomajor goals: firstly, to critically assess the state ofthe art in predicting pseudoknot-free MFE RNA sec-ondary structures, and secondly, to demonstrate that ourAveRNA ensemble-based structure prediction methoddoes indeed achieve significantly better results than pre-vious algorithms.Performance of existing prediction methodsTable 1 shows the the mean F-measure value for eachmethod on the S-STRAND2 dataset, along with bootstrapconfidence intervals calculated as explained in the previ-ous section, which are also shown graphically in Figure 1.Table 2 shows the results (p-values) obtained from permu-tation tests for each pair of methods. As can be seen fromthis table, the only statistically insignificant performancedifferences were observed between T99 andCONTRAfold1.1, and between CONTRAfold 2.0 and NOM-CG.Consistent with previous work [5], we found that theoldest algorithm, T99, achieves a mean F-measure justbelow 0.6. CONTRAfold 1.1 performs slightly better thanT99 on our benchmark set, but the performance advan-tage is not statistically significant; we believe that the rea-son for this lies primarily in the fact that it was trained ona small set of RNAs not representative of the broad rangeof structures found in S-STRAND2.MaxExpect and Cen-troidfold do perform significantly better than T99, but fallshort of the performance achieved by CONTRAfold 2.0.The latter method was trained on the S-STRAND2dataset, which partly explains why it, exactly like NOM-CG, achieves an average F-measure that is 0.026 higherthan that of CONTRAfold 1.1.The methods recently developed by Andronescu et al.,DIM-CG,CG∗, BL∗ and BL-FR∗, each achieve significantlybetter performance than any of the previously mentionedmethods; although the confidence intervals obtainedfor these methods show some overlap, the respectivedifferences in mean F-measure are all significant. Thebest of these methods, BL-FR∗, represents an improve-ment of more than 0.1 in average F-measure over T99,and of almost 0.05 over CONTRAfold 2.0.Performance correlationFor an ensemble-based approach like AveRNA to workwell, the set of component prediction algorithms need tohave complementary strengths, as reflected in less-thanperfect correlation of prediction accuracy over sets ofRNA sequences. As can be seen in Table 2, the pairwiseperformance correlation between the procedures we con-sidered in our study is not very strong (as indicated bySpearmann corelation coefficients between 0.66 and 0.86).Figures 2 and 3 illustrate this further by showing the cor-relation in F-measure across our set of RNAs for the twopairs of algorithms whose average performance does notdiffer significantly, T99 and CONTRAfold 1.1, and CON-TRAfold 2.0 and NOM-CG, respectively. (In these scatterplots, each data point corresponds to one RNA from ourS-STRAND2 set.)Performance of AveRNAAfter optimizing the weights on our training set ofRNAs, we found that there was no statistically signif-icant difference between the predictions of AveRNADPTable 1 Prediction accuracy for various prediction algorithmsMean (CI) S-STRAND2 F-measure Mean testset F-measure Mean testset 2 F-measure CitationAveRNA 0.716 (0.707, 0.725) 0.711 (0.701, 0.721) 0.725 (0.713, 0.737) -BL-FR* 0.703 (0.694, 0.712) 0.698 (0.687, 0.708) 0.717 (0.706, 0.729) [5]BL* 0.688 (0.678, 0.698) 0.686 (0.675, 0.696) 0.704 (0.692, 0.715) [5]CG* 0.676 (0.666, 0.685) 0.673 (0.662, 0.684) 0.690 (0.677, 0.702) [4]DIM-CG 0.668 (0.658, 0.678) 0.664 (0.654, 0.674) 0.681 (0.668, 0.695) [5]NOM-CG 0.656 (0.646, 0.667) 0.653 (0.643, 0.663) 0.667 (0.655, 0.680) [5]CONTRAfold2.0 0.656 (0.647, 0.665) 0.650 (0.640, 0.660) 0.657 (0.644, 0.668) [3]CentroidFold 0.643 (0.633, 0.652) 0.638 (0.627, 0.648) 0.643 (0.630, 0.655) [7]MaxExpect 0.625 (0.615, 0.635) 0.619 (0.607, 0.630) 0.633 (0.620, 0.646) [6]CONTRAfold1.1 0.601 (0.591, 0.610) 0.595 (0.584, 0.605) 0.605 (0.592, 0.619) [3]T99 0.597 (0.587, 0.607) 0.591 (0.581, 0.602) 0.606 (0.593, 0.619) [1]F measures and 95% confidence intervals, calculated using bootstrapping, and shown in parentheses.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 7 of 16http://www.biomedcentral.com/1471-2105/14/139|||||||||||0.55 0.60 0.65 0.70 0.75Confidence IntervalT99CONTRAFold1.1MaxExpectCentroidFoldCONTRAFold2.0NOM−CGDIM−CGCG*BL*BL−FR*AveRNAFigure 1 F-measure confidence intervals. 95% Confidence Intervals for the F-measure of different prediction algorithms (red circles) and themean F-measure (black crosses). The red rectangles indicate algorithms with statistically insignificant performance differences, as determined by apermutation test.Table 2 Spearman correlation for pairs of prediction algorithmsAveRNA BL-FR∗ BL∗ CG∗ DIM-CG NOM-CG CONTRAfold2.0 CentroidFold MaxExpect CONTRAfold1.1 T99AveRNABL-FR* 0.942BL* 0.886 0.857CG* 0.814 0.774 0.821DIM-CG 0.828 0.764 0.819 0.897NOM-CG 0.788 0.747 0.801 0.899 0.877CONTRAfold2.0 0.769 0.707 0.716 0.733 0.749 0.722CentroidFold 0.758 0.698 0.714 0.715 0.741 0.715 0.937MaxExpect 0.749 0.689 0.730 0.732 0.769 0.751 0.755 0.759CONTRAfold1.1 0.720 0.660 0.685 0.707 0.733 0.719 0.799 0.818 0.780T99 0.703 0.665 0.691 0.687 0.697 0.728 0.670 0.684 0.749 0.691Spearman correlation coefficients for the F-measure values of of pairs of prediction algorithms over the S-STRAND2 dataset.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 8 of 16http://www.biomedcentral.com/1471-2105/14/139Spearman Correlation: 0.69T99CONTRAFold1.10.0 0.2 0.4 0.6 0.8 2 Scatter plot of F-measures of T99 and CONTRAfold 1.1. Correlation between the F-measure achieved by T99 and CONTRAfold 1.1 on theRNAs from the S-STRAND2 dataset. The mean F-measures of these algorithms are not significantly different, but prediction accuracy on individualRNAs is only weakly correlated.and AveRNAGreedy on the S-STRAND2 set (as determinedusing a permutation test, which yielded a p-value of 0.51).Because of its substantially lower run-time requirements,especially during training, we therefore decided to focuson AveRNAGreedy for the remainder of our study, and werefer to this variant simply as AveRNA.As can be seen in Table 1, AveRNA achieved an averageF-measure of 0.716 on S-STRAND2, compared to 0.703obtained by the best previous method, BL-FR∗.Moreover, even when assessing AveRNA on a testset obtained by excluding the 500 sequences used forparameter optimisation from S-STRAND2, it achievessignificantly higher prediction accuracy than any of itsconstituent algorithms. We note that although this per-formance improvement might appear to be modest, it isabout as much as the difference between BL∗ and BL-FR∗and, according to a permutation test, statistically highlysignificant (see Table 3).To study AveRNA’s performance on sets of RNAs of dif-ferent types and provenance, we optimised the parametersfor AveRNA on subsets of S-STRAND2, from which oneof the 7 classes that make up the RNA STRAND databasehad been excluded, and then tested on the excluded classonly, such that there was not only no overlap betweentraining and test set, but also very little similarity. Thisis a situation where many machine learning techniquesare known to perform quite poorly. The results from thisexperiment, shown in Table 4, indicate clearly that, even inthis very challenging setting, AveRNA performs very well:only on 2 of the 7 classes, AveRNA performs significantlyworse if trained under exclusion of that class, and in thetwo remaining cases, the loss in accuracy was only aboutAghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 9 of 16http://www.biomedcentral.com/1471-2105/14/139Spearman Correlation: 0.72NOM−CGCONTRAFold2.00.0 0.2 0.4 0.6 0.8 3 Scatter plot of F-measures of NOM-CG and CONTRAfold 2.0. Correlation between the F-measure achieved by NOM-CG and CONTRAfold2.0 on the RNAs from the S-STRAND2 dataset. The mean F-measures of these algorithms are not significantly different, but prediction accuracy onindividual RNAs is only weakly correlated.2% (Additional file 1: Table S1 for detailed results from therespective permutation tests).We further note that, as per the results shown in Table 4,prior to AveRNA, the best energy-based prediction algo-rithm varied between RNA classes. On the other hand,AveRNA was found to not perform significantly worsethan the previous best method on any of the 7 classes,and in 2 of them (CRW and RFA - see Additional file 1:Table S1), it performed significantly better. This sug-gests (but of course cannot guarantee) that AveRNA islikely to perform at least as well as other general purposeenergy-based secondary structure prediction algorithmson previously unseen classes of RNAs. Furthermore,we also optimised AveRNA on a small part of each of the7 classes and then evaluated it on the entire class; theresults of this experiment, also shown in Table 4, indicatethat by training a generic version on the broader set ofsequences described earlier gives surprisingly good androbust performance – only for 3 of the 7 classes (ASE, SPR,and SRP) the respective class-specific version of AveRNAperforms significantly better and in one class (PDB) itperforms worst. Table 4 also shows the mean sequencelength for every class of RNAs and provides clear evidencethat AveRNA’s performance relative to its constituentalgorithms does not deteriorate with increasing sequencelength.One interesting property of AveRNA(A) is that thetrade-off between sensitivity and PPV can be easily andintuitively controlled by the threshold θ ∈ [0, 1]: For highθ , only base pairs are predicted for which there is highagreement between the procedures in A, and therefore,we expect relatively few false positive predictions at theAghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 10 of 16http://www.biomedcentral.com/1471-2105/14/139Table 3 Pairwise permutation tests between prediction algorithmsAveRNA BL-FR∗ BL∗ CG∗ DIM-CG NOM-CG CONTRAfold2.0 CentroidFold MaxExpect CONTRAfold1.1 T99AveRNABL-FR* 0BL* 0 0CG* 0 0 0.0001DIM-CG 0 0 0 0.0002NOM-CG 0 0 0 0 0CONTRAfold2.0 0 0 0 0 0.0001 0.4193CentroidFold 0 0 0 0 0 0.0001 0MaxExpect 0 0 0 0 0 0 0 0CONTRAfold1.1 0 0 0 0 0 0 0 0 0T99 0 0 0 0 0 0 0 0 0 0.1317P-values obtained from permutation tests to determine the statistical significance of performance differences (in terms of F-measure over the S-STRAND2 dataset)between prediction algorithms. All p-values larger than a standard significance threshold of 0.05 are bolded, indicating cases where the performance differences areinsignificant.cost of relatively many false negatives, while for low θ ,even base pairs predicted by very few procedures in Atend to be included in the overall prediction, leading torelatively many false positive, but few false negatives.CONTRAfold 1.1, CONTRAfold 2.0, Centroidfoldand MaxExpect also afford control of this trade-off,via the parameter γ ∈ [−5, 6], but in a less intuitivemanner.Figure 4 illustrates the trade-off between sensitivity andPPV for all of these algorithms and shows clearly thatoverall, AveRNA dominates all previous methods, and inparticular, gives much better results than the previousbest algorithm that afforded control over this trade-off,CONTRAfold 2.0. We note that, in all cases, as a proce-dure becomes increasingly more conservative in predict-ing base pairs, eventually, both sensitivity and PPV drop(see Additional file 1: Figure S1); we believe this to be aresult of the high detrimental impact of even a small num-ber of mispredicted base pairs when overall very few pairsare predicted.Ablation analysisThe results of the ablation analysis we conducted to studythe relative impact of the various component predictionprocedures in A on the performance of AveRNA(A) areshown in Table 5. The top 11 rows contain the weightsassigned to each algorithm; cases in which a procedurefrom A was dropped during the optimisation process areindicated by a value of zero. The bottom three rows showthe value of threshold θ and the average performance onthe training and test sets, respectively.It is interesting to note that although BL-FR∗ has aweight of over 40% in the full ensemble, excluding itleads to a rather modest drop of only 0.011 in averageF-measure, and this drop in performance is the highestcaused by removing any single procedure from the full setA. Similarly, the decreases in performance as additionalprocedure are removed, are mostly quite small. This indi-cates that, within the set of prediction procedures we con-sidered here, there is not only sufficient complementarityin the strength of individual procedures to obtain bene-fits from the ensemble-based approach, but also enoughsimilarity in strength between some of the procedures topermit compensating for the removal of one by increasingthe weight of others.As seen in Table 5, up to the point where only one pro-cedure is left in A, the performance of AveRNA (A) isalways higher than that of any of its constituents, indi-cating the efficacy and robustness of our ensemble-basedprediction approach.Training set selectionClearly, AveRNA’s performance depends on the trainingset that is used as a basis for optimising its weight parame-ters. To study the effect of training set size on performance(in terms of mean F-measure), we generated 11 train-ing sets of size 100 and 200, as well as one training setof size 500 and one set of size 1000. We then optimisedAveRNA(A) for each of these sets and measured the per-formance obtained on the full S-STRAND2 test set. Ascan be seen from the results of this experiment shown inthe Table 6, decreasing the training set size from 500 to200 lead to a modest drop in mean F-measure by 0.004,and further decrease to 100 caused a larger drop by 0.007.On the other hand, increasing the size of the training setfrom 500 to 1000 merely resulted in a very small per-formance improvement of less than 0.001. This indicatesthat, while it is important to use a reasonably large anddiverse training set, at least for the set of prediction proce-dures considered here, there is only very limited value inAghaeepourandHoosBMCBioinformatics2013,14:139Page11of16http://www.biomedcentral.com/1471-2105/14/139Table 4 Class-specific prediction accuracy for various prediction algorithmsALL ASE CRW PDB RFA SPR SRP TMRn 2511 386 411 311 257 526 350 269Testset contribution 0.8 0.83 0.79 0.76 0.78 0.78 0.80 0.87Mean sequence length 332 959 75 129 116 77 226 362BL-FR* 0.703 0.606 (0.592, 0.620) 0.613 (0.590, 0.637) 0.900 (0.878, 0.920) 0.674 (0.633, 0.713) 0.780 (0.761, 0.800) 0.734 (0.712, 0.755) 0.589 (0.569, 0.607)BL* 0.688 0.604 (0.589, 0.618) 0.583 (0.561, 0.603) 0.894 (0.871, 0.915) 0.667 (0.627, 0.704) 0.763 (0.742, 0.782) 0.717 (0.693, 0.738) 0.568 (0.550, 0.587)CG* 0.676 0.601 (0.588, 0.615) 0.576 (0.556, 0.597) 0.891 (0.868, 0.911) 0.640 (0.604, 0.675) 0.791 (0.771, 0.809) 0.675 (0.651, 0.698) 0.496 (0.477, 0.515)DIM-CG 0.668 0.605 (0.592, 0.618) 0.559 (0.540, 0.577) 0.885 (0.863, 0.906) 0.661 (0.625, 0.696) 0.785 (0.765, 0.804) 0.655 (0.630, 0.680) 0.470 (0.451, 0.488)NOM-CG 0.656 0.602 (0.588, 0.616) 0.568 (0.547, 0.587) 0.885 (0.862, 0.905) 0.637 (0.603, 0.674) 0.739 (0.719, 0.760) 0.660 (0.635, 0.685) 0.457 (0.438, 0.476)CONTRAfold2.0 0.656 0.651 (0.639, 0.664) 0.550 (0.532, 0.568) 0.869 (0.846, 0.891) 0.607 (0.569, 0.645) 0.746 (0.729, 0.763) 0.609 (0.587, 0.633) 0.509 (0.488, 0.527)CentroidFold 0.643 0.642 (0.630, 0.654) 0.537 (0.517, 0.556) 0.860 (0.833, 0.885) 0.607 (0.568, 0.646) 0.705 (0.683, 0.724) 0.623 (0.600, 0.646) 0.492 (0.473, 0.512)MaxExpect 0.625 0.577 (0.564, 0.589) 0.508 (0.488, 0.527) 0.858 (0.828, 0.883) 0.644 (0.611, 0.680) 0.695 (0.673, 0.715) 0.634 (0.608, 0.659) 0.435 (0.417, 0.452)CONTRAfold1.1 0.601 0.590 (0.578, 0.602) 0.440 (0.421, 0.459) 0.841 (0.817, 0.866) 0.597 (0.565, 0.630) 0.690 (0.669, 0.712) 0.619 (0.594, 0.643) 0.392 (0.374, 0.410)T99 0.597 0.546 (0.531, 0.560) 0.502 (0.481, 0.522) 0.860 (0.833, 0.885) 0.625 (0.594, 0.657) 0.583 (0.563, 0.604) 0.689 (0.666, 0.710) 0.389 (0.371, 0.406)AveRNA 0.716 0.653 (0.641, 0.665) 0.618 (0.600, 0.638) 0.906 (0.884, 0.925) 0.683 (0.645, 0.719) 0.794 (0.776, 0.812) 0.732 (0.707, 0.753) 0.592 (0.575, 0.608)AveRNA-I 0.676 (0.663, 0.687) 0.619 (0.602, 0.639) 0.901 (0.878, 0.922) 0.673 (0.640, 0.707) 0.808 (0.789, 0.825) 0.736 (0.715, 0.757) 0.590 (0.569, 0.608)AveRNA-E 0.650 (0.637, 0.663) 0.617 (0.597, 0.637) 0.907 (0.885, 0.926) 0.683 (0.646, 0.718) 0.794 (0.774, 0.811) 0.710 (0.688, 0.733) 0.573 (0.555, 0.589)F-measure values for different algorithms for different classes in the S-STRAND2 dataset. AveRNA-I has been trained on 20% of the given class sampled uniformly at random, and the overall F-measure for the entire class isreported. AveRNA-E has been trained on 20% of S-STRAND2 excluding the given class, and the F-measure for the given class is reported.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 12 of 16http://www.biomedcentral.com/1471-2105/14/1390.4 0.5 0.6 0.7 0.8 Predictive ValueSensitivityBL*CG*T99CONTRAFold1.1DIM−CGNOM−CGBL−FR*MaxExpectCONTRAFold2.0CentroidFoldAveRNAFigure 4 Sensitivity versus PPV. Sensitivity vs positive predictive value (PPV) for different prediction algorithms; for AveRNA, the points along thecurve were obtained by adjusting the pairing threshold θ , and for CONTRAfold 1.1, CONTRAfold 2.0, Centroidfold andMaxExpect by adjusting theparameter γ .Table 5 Ablation analysis results0 1 2 3 4 5 6 7 8 9BL-FR* 40.8030BL* 3.4339 36.1240CG* 0.5814 28.3500 23.6200DIM-CG 13.3610 2.2809 18.8470 25.2980NOM-CG 0 1.4514 7.5372 19.4300 29.6720CONTRAfold2.0 7.9964 20.2750 24.4660 25.1060 34.6240 48.6310CentroidFold 6.7425 0.0103 16.4620 15.9370 4.8337 11.1500 48.8500MaxExpect 18.0520 0 3.8522 14.2290 18.5270 24.7580 5.6026 24.0080CONTRAfold1.1 1.8412 8.4554 0 0 3.3164 5.2330 16.9320 42.9650 62.8050T99 7.1883 3.0532 5.2156 0 9.0275 10.2280 28.6160 33.0280 37.1950 100Threshold 42.7290 38.8610 35.6670 36.8770 31.2980 34.4810 31.6520 50 50 50F (train) 0.7350 0.7163 0.7106 0.7052 0.7002 0.6889 0.6798 0.6640 0.6271 0.6188F (test) 0.7158 0.7050 0.6948 0.6886 0.6842 0.6718 0.6629 0.6423 0.6011 0.5967Each data column corresponds to one stage of the ablation analysis, with the (optimised) weights of each prediction algorithm included in the ensemble shown in thetop part of the table, followed by the (optimised) pairing threshold and the training and testing performance (in terms of mean F-measure) in the bottom part.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 13 of 16http://www.biomedcentral.com/1471-2105/14/139using sets larger than that of size 500 we used for all otherexperiments.We note that we did not use the training set devel-oped by Andronuescu et al. (2010) in the context ofenergy parameter estimation, primarily because many ofthe prediction procedures we study here have been opti-mised on that set (which could have biased AveRNAto assign higher weights to those algorithms and leadto poor generalization to test data). We also notethat all training sets we considered were obtained byrandom uniform sampling from the full S-STRAND2set.Additionally, in Table 2 we have reported the F-measures of testset2, a new testset which consists of allmembers of S-STRAND2 which have not been used byAveRNA or any of the individual algorithms for trainingpurposes. Permutation tests on this new test set (Table S2)confirm that AveRNA remains significantly more accuratethan the other algorithms.DiscussionTo no small extent, our work presented here was moti-vated by the observation that in many cases, the differ-ences in accuracy achieved by RNA secondary structureprediction methods are quite small on average, but tendto vary very significantly between individual RNAs [5,6].While this is not surprising, it suggests that care shouldbe taken when assessing different prediction methods toensure statistically meaningful results, and that poten-tially, benefits could be derived from combining predic-tions obtained from different methods. The statisticalprocedures we use in this work make it possible toassess statistical significance in a well-established, quan-titative and yet computationally affordable way, and ourAveRNA procedure provides a practical way for realisingthe benefits inherent in a set of complementary predictionmethods.Our results demonstrate that there has, indeed, beensteady progress in the prediction accuracy obtainedfrom energy-based RNA secondary structure predictionTable 6 Impact of training set size on prediction accuracyTraining set size F-measure CI1000 0.7175 (0.7095, 0.7278)500 0.7167 (0.7075, 0.7269)200 0.7131 (0.7108, 0.7140) (0.7041, 0.7236)100 0.7061 (0.7050, 0.7127) (0.6943, 0.7184)Mean F-measure on S-STRAND2, with 95% confidence intervals shown in the lastcolumn. For the bottom two rows, training set size was sampled 11 timesuniformly at random, and the median (20-, 80-percentiles) of the predictionaccuracies from these samples are reported, along with confidence intervals forthe medians.methods. The fact that CONTRAfold 1.1 provides no sta-tistically significant improvement in accuracy over thestandard T99 energy model when both are evaluated onour large and diverse set of reference structures needsto be viewed in light of the fact that CONTRAfold 1.1was trained on a limited set of RNA structures from theRFam database. The fact that CONTRAfold 2.0, whichwas trained on the the same larger and richer set usedby Andronescu et al. [4], performs much better furtherhighlights the importance of the training set used as abasis for empirically optimising the performance of pre-diction methods. It is interesting to observe that theperformance difference between CONTRAfold 2.0 andNOM-CG, which are trained on the same set of referencesstructures, are insignificant, which indicates that bothmethods are equally effective in making use of the infor-mation inherent in this set. However, NOM-CG, thanksto its additional use of thermodynamic data, produces aphysically plausible energy model, while the probabilisticmodel underlying CONTRAfold 2.0 does not producerealistic free energy values.We further interpret the fact that DIM-CG, CG∗, BL∗and BL-FR∗ all perform significantly better than CON-TRAfold 2.0 as evidence that the thermodynamic dataused by the former methods can effectively inform meth-ods for optimising prediction accuracy based on data. Ourstatistical analysis provides further support for the claimthat the computationally more expensive Boltzmann Like-lihood parameter estimation method leads to betterresults than the Constraint Generation method, and thatthe additional use of probabilistic feature relationshipsenables further significant improvements [5].The accuracy results we obtained for the MaxExpectprocedure [6] and for Centroidfold [7] are markedly lowerthan those reported in the respective original studies,mainly because our evaluation is based on a more exten-sive set of reference structures. However, we note that theunderlying approaches of maximizing expected base-pairaccuracy and γ−centroid estimators can in principle beapplied to any prediction method that produces probabil-ity distributions over the secondary structures of a givensequence. We therefore expect that these ideas can even-tually be used in combination with parameter estimationmethods, such as the ones that gave rise to the CG∗, BL∗and BL-FR∗ parameter sets.The results of our correlation analysis revealed thatprediction methods whose accuracy over the entirebenchmark set does not differ much (such as T99 andCONTRAfold 1.1) show large differences in accuracy onmany individual RNAs. Consistent with earlier observa-tions that predictions that are slightly suboptimal accord-ing to a given energy model can sometimes be muchmore accurate (see, e.g., [6]), we conjecture that this is aconsequence of systematic weaknesses (such as the lackAghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 14 of 16http://www.biomedcentral.com/1471-2105/14/139of accounting for interactions between non-neighbouringbases or the use of an overly simplistic energy modelfor multiloops) and inaccuracies (for example, in thermo-dynamic measurements) in the energy models underly-ing these procedures. Particularly when using automatedmethods for optimising the parmaters of a given energymodels, such weaknesses and inaccuracies could easilylead to multiple solutions that show similar performanceon average, but give very different results on many indi-vidual RNAs.This situation, while at the first glance somewhat unsat-isfactory, provides the basis for our AveRNA approach,which obtains more accurate predictions by means ofweighted combination of the predictions obtained froma set of given prediction procedures. While our studyis focussed on the prediction of pseudoknot-free MFEstructures, we note that the weighted sum calculation per-formed by AveRNA on base pairing matrices naturallyextends to methods that produce base pairing probabil-ities and to pseudoknotted prediction methods. In thelatter case, the calculation of the weighted probabilitymatrix P(w) proceeds exactly as in the pseudoknot-freecase, but the procedure used for structure inference wouldhave to be modified to produce pseudoknotted MEAstructures. In the former case, probability matrices areused instead of Boolean matrices, and the result of thecalculation would be normalised to yield a well-formedbase pairing probability matrix. (We note that, in lightof very recent empirical results based on the statisti-cal approach first developed in the context of the workpresented here, it is not clear that MEA structures deter-mined from individual base pairing probability matricesare generally more accurate than MFE structures for thesame energy model [29]; however, it is possible that higheraccuracies can be obtained via ensemble-based MEA pre-dictions from weighted combinations of multiple basepairing matrices.) We pursued neither of these directionshere, because currently, the number of high-accuracy pre-diction procedures for pseudoknotted RNA structures ofbase-pair probabilities is more limited and because thedevelopment and assessment of extensions of AveRNA tothose cases pose challenges that are beyond the scopeof this work, but we strongly believe that these direc-tions are very promising and should be explored further inthe future.We note, however, that AveRNA as presented herealready realises an advantage usually found only inapproaches that produce base pairing probabilities: aneasy and intuitive way for assessing the confidence withwhich certain bases are predicted to pair or remainunpaired, by means of inspecting the entries of the proba-bility matrix P(w). Values close to one indicate base pairsthat are predicted consistently by many of the underlyingprediction procedures, and values close to zero indicatebases that are consistently predicted to be unpaired. Inter-mediate values indicate base pairings for which there ismore disagreement between the given prediction proce-dures. From the fact that by thresholding these values, thesensitivity and specificity (PPV) for predicting base pairscan be increased quite substantially (as seen in Figure 4),we conclude that the set of prediction procedure used byAveRNA in this work is sufficiently diverse to allow forthis interpretation. The threshold parameter θ controlsthe trade-off between sensitivity and PPV in an intuitiveway. It is conceivable that even higher sensitivity and PPVvalues can be obtained by optimising the weight param-eters of AveRNA specifically for that purpose (somethingwe did not attempt in this work).ConclusionsThe ensemble-based RNA secondary structure predic-tion method AveRNA introduced in this work not onlyimproves over existings state-of-the-art energy-basedmethods, but also holds much promise for the future.AveRNA can make use of arbitrary secondary struc-ture prediction procedures; in particular, as demonstratedhere, it can be used to combine both MEA and MFEstructures. We expect that by adding new predictionprocedures to the set used by AveRNA, even betterensemble-based predictions can be obtained. It is con-ceivable that eventually, a prediction procedure becomesavailable that dominates all previous methods, in thesense that it provides predictions as least as accurateas these on all RNAs of interest, and in that case, theensemble-based prediction approach of AveRNA wouldnot realise any additional gains. Based on our assess-ment of existingmethods, and considering the weaknessesand inaccuracies known to exist in all current energymodels, we do not expect this situation to arise in theforeseeable future. The results of our ablation analysisfurther supports the view that further increases in pre-diction accuracy achieved by the ensemble-based predic-tion approach underlying AveRNA are likely to arise asnew prediction procedures become available, since – asseen in Table 5 – that was the case when adding newprocedures to sets of previously known procedures inthe past.In fact, BL-FR∗ was introduced when AveRNA wasunder development and achieved an F-measure of higherthan the version of AveRNA available at that time. Includ-ing BL-FR∗ in AveRNA produced the version of AveRNAstudied here, which – as expected – performs significantlybetter than BL-FR∗. This suggests that AveRNA not onlyrepresents the state of the art in secondary structure pre-diction at the time of this writing, but is likely to remainso, as improved prediction algorithms and energy modelsare developed and added to the generic ensemble-basedapproach.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 15 of 16http://www.biomedcentral.com/1471-2105/14/139It should be noted, however, that in cases where addi-tional information about the specific secondary structureof a particular RNA is available (e.g., in the form ofSHAPE or other footprinting data), prediction methodsthat utilise this information should be expected to achievehigher accuracies (see, e.g., [30]).We see several avenues for future work: Here, wefocused on pseudoknot free structures, but the generalframework (except the dynamic programming) can beapplied to pseudoknotted structures as well once a widerrange of these algorithms are developed. Similarly, ourframework can be applied to algorithms that are ableto calculate base-pair probabilities (e.g., based on parti-tion functions) or to algorithms that are able to predictseveral sub-optimal structures. New algorithms (e.g., non-energy-based methods) or different configurations of theexisting algorithms (using different training strategies)can be included in AveRNA. We showed that the corre-lation between the predictions of different algorithms isnot very strong. These algorithms can be studied to iden-tify their strengths and weaknesses to provide guidanceto the end-users. Alternatively, this information could beused to design an instance-based selection algorithm thatinstead of combining the predictions of all of the algo-rithms, either selects the most suitable algorithm for eachsequence or selects a number of candidates for AveRNA tocombine.Additional fileAdditional file 1: Supplemental Information. A PDF file withsupplementary figures and tables as described in the main text.Competing interestsBoth authors declare that they have no competing interests.Authors’ contributionsHH conceived the original idea. NA and HH designed the methodology,conceived the experiments, interpreted the results, and wrote the manuscript.NA implemented the methodology and performed the experiments. Allauthors read and approved the final manuscript.AcknowledgementsWe thank Anne Condon and Mirela Andronescu for their insightful commentson this work, and Dave Brent for help with setting up the web server for theAveRNA software. This work was supported by a MSFHR/CIHR scholarship toNA, a University of British Columbia’s graduate fellowship to NA, and by anNSERC discovery grant held by HH.Received: 7 May 2012 Accepted: 21 March 2013Published: 24 April 2013References1. Mathews D, Sabina J, Zuker M, Turner D: Expanded sequencedependence of thermodynamic parameters improves prediction ofRNA secondary structure1. J Mol Biol 1999, 288(5):911–940.2. Zuker M, Stiegler P: Optimal computer folding of large RNAsequences using thermodynamics and auxiliary information. NucleicAcids Res 1981, 9:133.3. Do C, Woods D, Batzoglou S: CONTRAfold: RNA secondary structureprediction without physics-based models. Bioinformatics 2006,22(14):e90.4. Andronescu M, Condon A, Hoos H, Mathews D, Murphy K: Efficientparameter estimation for RNA secondary structure prediction.Bioinformatics 2007, 23(13):i19.5. Andronescu M, Condon A, Hoos HH, Mathews DH, Murphy KP:Computational approaches for RNA energy parameter estimation.RNA 2010, 16:2304–2318.6. Lu Z, Gloor J, Mathews D: Improved RNA secondary structureprediction by maximizing expected pair accuracy. RNA 2009,15(10):1805.7. Hamada M, Kiryu H, Sato K, Mituyama T, Asai K: Prediction of RNAsecondary structure using generalized centroid estimators.Bioinformatics 2009, 25(4):465.8. Mathews DH: Using an RNA secondary structure partition function todetermine confidence in base pairs predicted by free energyminimization. Rna 2004, 10(8):1178–1190.9. Quinlan J: Bagging, boosting, and C4. 5. In Proceedings of the 13thNational Conference on Artificial Intelligence AAAI Press, (1996):725–730.10. Rogic S, Ouellette B, Mackworth A: Improving gene recognitionaccuracy by combining predictions from two gene-findingprograms. Bioinformatics 2002, 18(8):1034.11. Asur S, Ucar D, Parthasarathy S: An ensemble framework for clusteringprotein–protein interaction networks. Bioinformatics 2007,23(13):i29.12. Avogadri R, Valentini G: Fuzzy ensemble clustering based on randomprojections for DNAmicroarray data analysis. Artif Intell Med 2009,45(2–3):173–183.13. Aghaeepour N, Finak G, Hoos H, Mosmann TR, Brinkman R, Gottardo R,Scheuermann RH etal: Critical assessment of automated flowcytometry data analysis techniques. Nature Methods 2013,10(3):228–238.14. Andronescu M, Bereg V, Hoos H, Condon A: RNA STRAND: the RNAsecondary structure and statistical analysis database. BMCBioinformatics 2008, 9:340.15. Zwieb C, Gorodkin J, Knudsen B, Burks J, Wower J: tmRDB (tmRNAdatabase). Nucleic Acids Res 2003, 31:446–447.16. Sprinzl M, Vassilenko K: Compilation of tRNA sequences and sequencesof tRNA genes. Nucleic Acids Res 2005, 33(suppl 1):D139–D140.17. Cannone J, Subramanian S, Schnare M, Collett J, D’Souza L, Du Y, Feng B,Lin N, Madabusi L, Müller K etal: The comparative RNA web (CRW) site:an online database of comparative sequence and structureinformation for ribosomal, intron, and other RNAs. BMCBioinformatics 2002, 3:2.18. Brown J: The ribonuclease P database. Nucleic Acids Res 1999,27:314–314.19. Andersen E, Rosenblad M, Larsen N, Westergaard J, Burks J, Wower I,Wower J, Gorodkin J, Samuelsson T, Zwieb C: The tmRDB and SRPDBresources. Nucleic Acids Res 2006, 34(suppl 1):D163–D168.20. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy S, Bateman A:Rfam: annotating non-coding RNAs in complete genomes. NucleicAcids Res 2005, 33(suppl 1):D121–D124.21. Andronescu M: Algorithms for predicting the secondary structure of pairsand combinatorial sets of nucleic acid strands. Vancouver, British Columbia,Canada. The University of British Columbia 2003.22. Andronescu M: Computational approaches for RNA energy parameterestimation. Vancouver, British Columbia, Canada. The University of BritishColumbia 2008.23. Aksay C, Salari R, Karakoc E, Alkan C, Sahinalp S: taveRNA: a web suite forRNA algorithms and applications. Nucleic Acids Res 2007,35(suppl 2):W325.24. Tsang H, Wiese K: SARNA-Predict: A study of RNA secondary structureprediction using different annealing schedules. In IEEE Symposium onComputational Intelligence and Bioinformatics and Computational Biology(CIBCB’07): IEEE; 2007:239–246.25. Hesterberg T, Moore D, Monaghan S, Clipson A, Epstein R: Bootstrapmethods and permutation tests. Introduction Pract Stat 2005,47(4):1–70.26. Hansen N, Ostermeier A: Completely derandomized self-adaptation inevolution strategies. Evol Comput 2001, 9(2):159–195.Aghaeepour and Hoos BMC Bioinformatics 2013, 14:139 Page 16 of 16http://www.biomedcentral.com/1471-2105/14/13927. Hansen N, Auger A, Ros R, Finck S, Pošík P: Comparing results of 31algorithms from the black-box optimization benchmarkingBBOB-2009. In Proceedings of the 12th Annual Conference Comp onGenetic and Evolutionary Computation: ACM; 2010:1689–1696.28. Hansen N, Kern S: Evaluating the CMA evolution strategy onmultimodal test functions. In Parallel Problem Solving fromNature-PPSNVIII: Springer; 2004:282–291.29. Hajiaghayi M, Condon A, Hoos H: Analysis of energy-based algorithmsfor RNA secondary structure prediction. BMC Bioinformatics 2012,13:22.30. Deigan K, Li T, Mathews D, Weeks K: Accurate SHAPE-directed RNAstructure determination. Proc Natl Acad Sci 2009, 106:97.doi:10.1186/1471-2105-14-139Cite this article as: Aghaeepour and Hoos: Ensemble-based prediction ofRNA secondary structures. BMC Bioinformatics 2013 14:139.Submit your next manuscript to BioMed Centraland take full advantage of: • Convenient online submission• Thorough peer review• No space constraints or color figure charges• Immediate publication on acceptance• Inclusion in PubMed, CAS, Scopus and Google Scholar• Research which is freely available for redistributionSubmit your manuscript at www.biomedcentral.com/submit


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items