Jabbari and Condon BMC Bioinformatics 2014, 15:147http://www.biomedcentral.com/1471-2105/15/147RESEARCH ARTICLE Open AccessA fast and robust iterative algorithm forprediction of RNA pseudoknotted secondarystructuresHosna Jabbari* and Anne CondonAbstractBackground: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNAsecondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be veryslow and are limited in the types of pseudoknots that they can predict. Incorporating known structural informationcan improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structuresthat can incorporate structural information as input. There is even less understanding of the relative robustness ofthese methods with respect to partial information.Results: We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. IterativeHFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy isat least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFoldleverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on thehierarchical folding hypothesis, and the energy parameters of HotKnots V2.0.Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partialinformation, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as theuser provides up to 40% of the input structure.Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also hassignificantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets.Conclusions: Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whoseaccuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and withabout 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while beingsignificantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php.Keywords: RNA, Secondary structure prediction, Pseudoknot, Hierarchical folding, Minimum free energyBackgroundRNA molecules are crucial in different levels of cellularfunction, ranging from translation and regulation of genesto coding for proteins [1-6]. Understanding the structureof an RNA molecule is important in inferring its function[7-10]. Since experimental methods for determining RNAstructure, such as X-ray, crystallography and NMR, aretime consuming, expensive and in some cases infeasible,*Correspondence: hjabbari@cs.ubc.caDepartment of Computer Science, University of British Columbia, 2366 MainMall, Vancouver, Canadacomputational methods for prediction of RNA structureare valuable.Currently computational RNA structure predictionmethods mainly focus on predicting RNA secondarystructure—the set of base pairs that form when RNAmolecules fold. When multiple homologous (evolutionar-ily related) RNA sequences are available, the secondarystructure of the sequences can be predicted using multi-ple sequence alignment and comparative sequence anal-ysis [11-22]. Alternative approaches, which can be usedto predict secondary structure of a single sequence, are© 2014 Jabbari and Condon; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of theCreative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use,distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons PublicDomain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in thisarticle, unless otherwise stated.Jabbari and Condon BMCBioinformatics 2014, 15:147 Page 2 of 17http://www.biomedcentral.com/1471-2105/15/147based on thermodynamic parameters derived in part fromexperimental data [23]. While thermodynamics-basedapproaches can be less accurate than comparative-basedalgorithms, thermodynamics-based approaches are appli-cable in cases of novel RNAs such as the many RNAsof unknown function recently reported by the ENCODEconsortium [24]. Thermodynamics-based approaches canalso be easier to apply to prediction of the structure ofinteracting RNA molecules, for example, in gene knock-down studies.Many computational thermodynamics-based methodsfind the structures with the minimum free energy (MFE)from the set of all possible structures, when each structurefeature is assigned a free energy value and the energy ofa structure is calculated as the sum of the features’ ener-gies. There has been significant success in prediction ofpseudoknot-free secondary structures (structures with nocrossing base pairs) [23,25,26]. While many small RNAsecondary structures are pseudoknot-free, many biologi-cally important RNA molecules, both in the cell [27,28],and in viral RNA [29] are found to be pseudoknotted.Since finding the MFE pseudoknotted secondary struc-ture is NP-hard [30-32], polynomial time MFE-basedmethods for prediction of pseudoknotted secondarystructures predict a restricted class of pseudoknottedstructures [33-35]. These methods trade off run-timecomplexity and the generality of the class of structuresthey can predict. For example, the most general algorithmof Rivas and Eddy [33], whose running time is (n6) oninputs of length n, is not practical for RNA sequencesof length more than 100 nucleotides. This has been themain reason for development of heuristic methods forprediction of pseudoknotted structures [36-41]. Althoughheuristic methods may not find the MFE structure, theyusually run faster than the MFE-based methods that han-dle the same class of structures. For example, HotKnotsV2.0 [36,41] is a heuristic approach that uses carefullytrained energy parameters, is guided by energy minimiza-tion and can handle kissing hairpin structures. However,HotKnots is still slow on long sequences.Other methods for prediction of pseudoknotted struc-tures, such as the IPknot method of Sato et al. [42], aremotivated by the finding of Mathews [43] that base pairswith high base pairing probabilities in the thermodynamicensemble aremore likely to be in the known structure. In acomprehensive comparison performed by Puton et al. [44]on the performance of publicly available non-comparativeRNA secondary structure prediction methods that canhandle pseudoknotted structures, IPknot ranks first forgeneral length RNA sequences.Incorporating known structural information canimprove the accuracy of structure prediction. For exam-ple, Mathews et al. [45] used SHAPE reactivity data toimprove the prediction accuracy from 26.3% to 86.8% for5S rRNA of E. coli. Roughly, the larger the SHAPE reac-tivity value for a given nucleotide, the more likely it is thatthe nucleotide is unpaired in the structure. However, lim-ited SHAPE reactivity data is available, and the data doesnot unambiguously determine whether a base is paired ornot or, if it is paired, to what other nucleotide. Deigan et al.[46] created pseudo energy terms from SHAPE reactivitydata, as a means of integrating such data into predictionsoftware. They reported prediction accuracy of 96% to100% for three moderate-sized RNAs (< 200 nucleotides)and for 16S rRNA ( 1500 nucleotides). ShapeKnots [47]is a new method for incorporating SHAPE reactivitydata for pseudoknotted structures that incorporates thepseudo energy terms into a heuristic method similar tothat of Ren et al. [41].We previously presented HFold [48], an approach forprediction of pseudoknotted structures, motivated by twogoals, namely to avoid the high running time complexityof other methods for pseudoknotted secondary structureprediction and to leverage the hierarchical folding hypoth-esis. This hypothesis posits that an RNA molecule firstfolds into a pseudoknot-free structure; then additionalbase pairs are added that may form pseudoknots with thefirst structure so as to lower the structure’s free energy[49]. Given a pseudoknot-free structure as input, HFoldpredicts a possibly pseudoknotted structure from a broadclass that contains the given input structure and, relativeto that constraint, has minimum free energy. HFold’s run-ning time isO(n3), significantly faster than other methodsfor predicting pseudoknotted structures. Several expertshave provided evidence for, and support, the hierarchi-cal folding hypothesis [49-52]. The class of structures thatHFold can handle, density-2 structures, is quite generaland includes many important pseudoknots including H-type pseudoknots, kissing hairpins and infinite chains ofinterleaved bands, with arbitrary nested (pseudoknotted)substructures. (Roughly, a structure is density-2 if no baseis enclosed by more than two overlapping pseudoknottedstems.)Another advantage of HFold over heuristic methodssuch as HotKnots or ShapeKnots is that unlike thesemethods, HFold minimizes the free energy of the possiblypseudoknotted output structure relative to the given inputstructure. Therefore HFold’s method of adding pseudo-knotted stems is better motivated energetically than thatof HotKnots or ShapeKnots.While HFold is fast, our earlier implementation ofHFold had its own shortcomings. First, due to a high pseu-doknot initiation penalty in its underlying energy model,many of its predicted structures did not have pseudo-knots. Also low band penalty (i.e., penalty for additionof pseudoknotted stems or bands) in its energy modelencouraged addition of pseudoknotted stems when apseudoknot was predicted. Second, if the first structureJabbari and Condon BMC Bioinformatics 2014, 15:147 Page 3 of 17http://www.biomedcentral.com/1471-2105/15/147input to HFold contains base pairs that are not in the truepseudoknot-free structure for the given RNA sequenceor is not the complete pseudoknot-free structure (i.e., itdoes not include all the base pairs in the pseudoknot-freestructure), HFold is often unable to predict the knownpseudoknotted structure as output.To summarize, existing methods for prediction of pseu-doknotted structures suffer from one or both of the fol-lowing shortcomings: 1) slow running time, or 2) poorprediction accuracy. Moreover there is limited oppor-tunity for the user to provide structural information,or constraints, that can guide prediction. In cases of aprediction method that incorporates user-defined con-straints, it is also useful to understand the degree to whichthe method’s accuracy persists as the input informationdegrades. We use the term robustness with respect to par-tial information or robustness to refer to this property ofa method. (We note that in our definition of robustnesswe do not mean robust with respect to noise.) To thebest of our knowledge, the concept of robustness in sec-ondary structure predictionmethods has not been studiedbefore.In this work we present a new method that addressesthese shortcomings. Our method, Iterative HFold, takes apseudoknot-free input structure and produces a possiblypseudoknotted structure whose energy is at least as lowas that of any (density-2) pseudoknotted structure con-taining the input structure. Iterative HFold incorporatesfour different methods and reports as its final structurethe structure with the lowest energy, among all structuresproduced by these methods. While one of its underlyingmethods, HFold, strictly adheres to the hierarchical fold-ing hypothesis, the other three use iterations to extend orremove the base pairs of input structure, with the goal offinding a structure that has lower energy than the struc-ture found by HFold. Thus, unlike HFold, iterative HFoldis able to modify the input structure (while the class ofstructures handled by both methods is the same). Thisis valuable since 1) computationally produced structuresmay not be completely accurate and 2) while the hierarchi-cal folding hypothesis is a useful guiding principle, there isevidence that allowing for disruption of some base pairs inthe initially formed pseudoknot-free secondary structurecan improve prediction [53,54].All of Iterative HFold’s underlying methods use theenergy model of HotKnots V2.0 DP09 [36]; with thismodel, HFold obtained predictions with higher accuracythan those obtained with our earlier implementation ofHFold. One of Iterative HFold’s underlying methods isHFold-PKonly, which given the input structure only addspseudoknotted base pairs. HFold-PKonly is especially use-ful for cases when the user has either complete informa-tion about the true pseudoknot-free structure or wants tocheck whether a single stem of the input structure canbe part of a pseudoknot since, if the input structure onlyhas the specific stem in question, the output structure ofHFold-PKonly will determine if the given stem can be partof a pseudoknot.Based on our experiments on our HK-PK and HK-PK-free data sets that include 88 pseudoknotted structures,and 337 pseudoknot-free structures respectively, rangingin length from 10 to 400 nucleotides, a single run of Iter-ative HFold does not take more than 9 seconds time and62 MB of memory. In contrast, one of the best heuristicmethods, HotKnots V2.0, takes 1.7 hours and 91 GB ofmemory for a sequence with 400 nucleotides. Thereforeour method is practical for prediction of long RNA struc-tures. Iterative HFold bootstrap 95% percentile confidenceinterval for average accuracy of pseudoknotted structuresof the HK-PK data set is significantly higher than thatof IPknot, ((72.83%, 83.37%) vs. (54.56%, 66.25%)) andis comparable to that of HotKnots V2.0, (vs. (73.60%,83.35%)) two of the best prediction methods available.Iterative HFold’s accuracy is significantly higher than thatof IPknot andHotKnots on our IP-pk168 data set. IterativeHFold also has higher accuracy thanHFold evenwhen justpartial information about the true pseudoknot-free struc-ture is provided, so it is more robust than HFold. Specifi-cally, Iterative HFold’s average accuracy on pseudoknottedstructures steadily increases from roughly 54% to 79% asthe user provides up to 40% of the input structure, andimproves with a more modest but still positive improve-ment in accuracy when further structural information isprovided.MethodsWe represent an RNA molecule by a sequence, S, of itsfour bases, Adenine (A), Cytosine (C), Guanine (G) andUracil (U). We denote the length of the RNA molecule byn and refer to each base by its index i, 1 ≤ i ≤ n.When an RNAmolecule folds, bondsmay form betweencanonical pairs of bases (A-U, C-G, and G-U) (seeFigure 1). Throughout this work, we consider only caseswhere each base may pair at most with one other base, andrepresent base pairing between i and j by i.j. We definea secondary structure, R, as a set of pairs i.j, 1 ≤ i <j ≤ n; i.j and k.j can belong to the same set if and only ifi = k.If i.j and k.l are two base pairs of a secondary structure,R, and 1 ≤ i < k < j < l ≤ n, we say i.j crosses k.l.We refer to a secondary structure with crossing base pairsas a pseudoknotted secondary structure and a secondarystructure with no crossing base pairs as a pseudoknot-freesecondary structure (see Figure 1). Figure 1 shows differentkinds of loops in a secondary structure. We refer the read-ers to Jabbari et al. [48] or Rastegari et al. [56] for precisedefinition and illustration of terms used in the figure.Jabbari and Condon BMCBioinformatics 2014, 15:147 Page 4 of 17http://www.biomedcentral.com/1471-2105/15/147Kissing Hairpin (pseudoknot)CGGGCUCUAGUGCCG CUCGACUAGAGCCCUG U A A C A G U UAUGGACCACG AGCAGUCCAUG110304060615020GInternal LoopStackMulti LoopHairpin LoopGGGUAGCUCAAAAGG UAGAGUGCAGGCCAAC AUAGCCAGCAGAUCUCGG UUCAAACCCGAGCCCUGACC11020304050607073GUCGFigure 1 Pseudoknotted and pseudoknot-free secondary structures. Examples of loops and canonical base pairs in a pseudoknotted and apseudoknot-free secondary structure. The blue base pairs belong to the Gbig structure and the green base pairs belong to the Gsmall structure, asdefined in Section ‘Definition of Gbig and Gsmall ’. This figure was produced using the VARNA software [55].Energy modelMany computational methods for predicting the sec-ondary structure of an RNA (or DNA) molecule are basedon models of the free energy of loops [23,25,26,33-36,48].Table 1 summarizes the energy constants and functionsused in our energy model for pseudoknotted structures.The values of these energy parameters are those of theDP09 parameter set of Andronescu et al. [36], used by theHotKnots V2.0 prediction software.Data setsWeuse three data sets to analyze performance of our algo-rithms. Our first data set is the test data set of Andronescuet al. [36], that contains 446 distinct RNA sequences andtheir reference structures, of which 348 are pseudoknot-free and 98 are pseudoknotted. This set has four struc-tures that are not in the class of structures our methodscan handle (i.e., have densities higher than 2 [48]). Sincethe number of such structures is too small to be useful inan experimental analysis, we removed them from our setof pseudoknotted structures, resulting in a set of size 442.There are eight cases in this data set for whichthe original sequence and structure were shortened toaccommodate restrictions in length. We removed themfrom our data set, resulting in a set of size 425. Fromnow on we use “HK-PK” to refer to the pseudoknottedstructures in this set (with 88 structures) and “HK-PK-free” to refer to the pseudoknot-free structures in thisset (with 337 structures). RNA sequences in HK-PK andHK-PK-free have length between 10 and 400 nucleotides.Our second data set is the pk168 data set of Sato et al.[42]. This set contains 168 pseudoknotted structures from16 categories of pseudoknots. The sequences in this sethave at most 85% similarity and have length of at most 140nucleotides. We refer to this data set as “IP-pk168”.Our third data set is the test data set of Sperschneideret al. [57]. This set contains 16 pseudoknotted structureswith strong experimental support. RNA sequences in thisset have length between 34 and 363 nucleotides. We referto this data set as “DK-pk16”.Definition of Gbig and GsmallTo test the robustness of our methods on a given RNAsequence, we need to provide partial information aboutthe true pseudoknot-free structure as input structurefor that sequence. To obtain the true pseudoknot-freeJabbari and Condon BMC Bioinformatics 2014, 15:147 Page 5 of 17http://www.biomedcentral.com/1471-2105/15/147Table 1 Energy parametersName Description Value (Kcal/mol)Ps Exterior pseudoloop −1.38initiation penaltyPsm Penalty for introducing pseudoknot 10.07inside a multiloopPsp Penalty for introducing pseudoknot 15.00inside a pseudoloopPb Band penalty 2.46Pup Penalty for unpaired base 0.06in a pseudoloopPps Penalty for closed subregion 0.96inside a pseudoloopeH(i, j) Energy of a hairpin loop closed by i.jeS(i, j) Energy of stacked pair closed by i.jestP(i, j) Energy of stacked pair that 0.89 × eS(i, j)spans a bandeint(i, r, r′, j) Energy of a pseudoknot-freeinternal loopeintP(i, r, r′, j) Energy of internal loop 0.74× eint(i, r, r′, j)that spans a banda Multiloop initiation penalty 3.39b Multiloop base pair penalty 0.03c Penalty for unpaired base 0.02in a multiloopa′ Penalty for introducing a multiloop 3.41that spans a bandb′ Base pair penalty for a multiloop 0.56that spans a bandc′ Penalty for unpaired base in a multiloop 0.12that spans a bandThis table provides the names, description and values of the energy parametersand functions that we used in our methods. The names and definitions are thesame as in our original HFold [48], and the values were updated based on thework of Andronescu et al. [36]. These parameters were derived for a temperatureof 37°C and 1 M salt concentration.structure, Gbig , we remove the minimum number ofpseudoknotted base pairs from the reference struc-ture to make the reference structure pseudoknot-free. Ifthe reference structure is pseudoknot-free, then Gbig isthe same as the reference structure itself. We call theremoved base pairs from the reference structure Gsmall.Blue base pairs in Figure 1 represent base pairs of theGbig structure and green base pairs represent the Gsmallstructure.Accuracy measuresFollowing common practice [36,58], wemeasure the accu-racy of a predicted RNA secondary structure relative to areference secondary structure by F-measure, which is theharmonic mean of sensitivity and positive predictive value(PPV ). We define these values as follows:Sensitivity = Number of correctly predicted base pairsNumber of base pairs in the reference structurePPV = Number of correctly predicted base pairsNumber of predicted base pairsandF-measure = 2 × sensitivity × PPVsensitivity + PPVWe also define these values as 0 when their denomina-tors are 0. When a prediction agrees with the referencestructure, the value of F-measure is equal to 1 (so are thevalues of sensitivity and PPV). When the values of sensi-tivity or PPV is equal to 0, the predicted structure doesnot have any base pairs in common with the referencestructure.Bootstrap percentile confidence intervalsTo formally assess the dependency of measured predic-tion accuracy of results of a method on a given set of RNAwe use bootstrap confidence intervals, a well-known sta-tistical resampling technique [59,60]. Following the recentwork of Aghaeepour and Hoos [61] and Hajiaghayi etal. [58] we calculate the bootstrap 95% percentile confi-dence interval of average F-measure as follows. For eachvector f of F-measures (where, for example, f may bethe F-measures of predictions obtained by Iterative HFoldon pseudoknotted structures) we first take 104 resam-ples with replacement, where the resamples have the samelength as the original sample vector f (|f |), and then cal-culate their average F-measures. These 104 calculatedaverage F-measures represent the bootstrap distributionfor the vector f . We then report the 2.5th and 97.5th per-centile of this distribution (i.e., the bootstrap distributionof the 104 average F-measures calculated above) as thelower and upper bounds of the confidence interval respec-tively, and call it the bootstrap 95% percentile confidenceinterval. By reporting the bootstrap 95% percentile confi-dence interval for average F-measure of a method, A, ona data set, D, we say that we are 95% confident that theaverage F-measure of method A on data set D is in thereported interval. All calculations are performed using the“boot” package of the R statistics software environment[62].Permutation testFollowing the recent work of Hajiaghayi et al. [58], we usea two sided permutation test to assess the statistical signif-icance of the observed performance differences betweentwomethods. The test proceeds as follows, given a data setand two structure prediction procedures, A and B. First,we calculate the differencemean(fA)−mean(fB) in meansJabbari and Condon BMCBioinformatics 2014, 15:147 Page 6 of 17http://www.biomedcentral.com/1471-2105/15/147between sets of F-measure values obtained by A and B.Then we combine the two sets fA and fB and record thedifference in sample means for 104 randomly chosen waysof choosing two sets with the same size as |fA| and |fB|from the combined set. The p-value is the proportion ofthe sampled permutations where the absolute differencewas greater than or equal to that of absolute difference ofthe means of sets fA and fB. Then, if the p-value of thistest is less than the 5% significance level, we reject the nullhypothesis that methods A and B have equal accuracy andthus accept the alternative hypothesis that the differencein accuracy of method A and B is significant. Otherwise,we cannot reject the null hypothesis. All calculations areperformed using the “perm” package of the R statisticssoftware environment.Iterative HFoldWe provide a high level description of our Iterative HFoldalgorithm.Pseudocode of our Iterative HFold algorithm is availablein Additional file 1. The algorithm builds on two simplermethods, the first being our original HFold algorithm [48]:HFold: Given an RNA sequence, S, and a pseudoknot-free input structure, G, find a pseudoknot-free structure,G′ such thatG∪G′ is the lowest energy structure that con-tains G. We note thatG∪G′ might not be pseudoknotted.The second method on which Iterative HFold builds,called HFold-PKonly, is similar to HFold except that G′may only contain base pairs that cross base pairs in G.The prediction provided by HFold-PKonly can be usefulin cases where HFold does not produce a pseudoknottedstructure.HFold-PKonly: Given an RNA sequence, S, and apseudoknot-free input structure, G, find a pseudoknot-free structure, G′ such that every base pair in G′ crossessome base pair of G and such that G ∪ G′ is the lowestenergy structure that contains G among all suchG′s. Notethat G′ may contain no base pairs.Iterative HFold also uses the SimFold RNA secondarystructure predictionmethod [63], which predicts themin-imum free energy pseudoknot-free secondary structurefor a given RNA sequence. SimFold uses a dynamic pro-gramming method similar to Zuker’s MFold method [64].In this work we used the HotKnots energy parameterswhen running SimFold. In addition to an RNA sequence,S, SimFold can also take a pseudoknot-free secondarystructure, G, as input and predict the MFE pseudoknot-free secondary structure that contains all base pairs of G.Iterative HFold is distinguished from the above threemethods, namely HFold, HFold-PKonly and SimFold, intwo important ways. First, the output of HFold, HFold-PKonly and SimFold methods must contain the givenpseudoknot-free input structure, G, whereas IterativeHFold may modify the input structure. This can be use-ful when the given input structure is not a high-accuracyestimate of Gbig , the true pseudoknot-free substructure ofthe reference structure. Second, while HFold and HFold-PKonly can add base pairs that cross those in G, theycannot add base pairs that cross each other, and nei-ther can SimFold. In contrast, Iterative HFold can addbase pairs that cross each other. This is particularly use-ful when the input structure contains limited informa-tion about Gbig , and so it is necessary both to predictbase pairs in Gbig and in Gsmall in order to get a goodprediction.Iterative HFold is comprised of four different itera-tive methods. Following the description of each method,we motivate why we chose to include it as part of ouroverall algorithm. Iterative HFold takes as input bothan RNA sequence, S and a pseudoknot-free secondarystructure, G; later we show that structure G can be pro-duced by computational methods, for example, HotKnotshotspots or SimFold suboptimal structures, when only thesequence S is initially available.Iterative HFold: Given an RNA sequence, S, and a pseu-doknot free input structure, G, run the following fourmethods and pick the structure with the lowest freeenergy among these four as the output structure.Iterative HFold runs in O(n3) time, as it runs fourmethods sequentially, when each one is O(n3).Method 1: Run HFold on S and G, and store theresulting G ∪ G′.Motivation: This is the core HFoldmethod, motivated by the hierarchicalfolding hypothesis.Method 2: First run HFold-PKonly on S and G. IfHFold-PKonly results in a structure G ∪ G′such that G′ is not the empty structure,then run HFold with sequence S andstructureG′, and store the result.Otherwise, simply store G as the result. Seethe following example. (We note thatrunning HFold with S and G′ results in astructureG′ ∪ G′′, where it may be the casethat G′′ = G (i.e., Gmay not be part of theresult of method 2).)Motivation: When input structureG doesnot agree with the reference Gbig structure,it may still be the case that HFold-PKonlyfinds the pseudoknotted structureGsmall(or a good approximation to Gsmall). A callto HFold with input Gsmall may then find abetter approximation to Gbig . In thisJabbari and Condon BMC Bioinformatics 2014, 15:147 Page 7 of 17http://www.biomedcentral.com/1471-2105/15/147Example 1: Example of results ofmethod 1 and method 2 of Iterative HFold.S = GGGCUCUGGAGCCCCCCCCGAGCCCCGGCUAACUCUAUCUGUAGGGGGGCG = ...........(((((...........................)))))..Method 1: HFold on S and G[[[[[[.[[..(((((]].]]]]]]..................))))).. -12.75Method 2:HFold Pkonly on S and G[[[[[[.[[..(((((]].]]]]]]..................)))))..G1[[[[[[.[[.......]].]]]]]].........................HFold on S and G1[[[[[[.[[.((((((]].]]]]]]...................)))))) -14.67example, method 2 of Iterative HFoldoutperforms method 1: although bothHFold and HFold PKonly produce the sameresult on sequence S and input structureG,namely the structureG ∪ G′ , the additionaliteration in method 2, in which HFold isrun with S and G′ , finds a structure withlower energy than that of G ∪ G′ .Method 3: First run SimFold on S and G to obtainresult G′—a pseudoknot-free structure thatcontains G. Then let Gupdated be thesecondary structure of S containing therelaxed stems of G′ that include the basepairs of G. By a relaxed stem, we mean asecondary structure containing stackedbase pairs, bulges of size 1 and internalloops of maximum size of 3 (i.e., either thesymmetric loop of 1 × 1 or thenon-symmetric loop of 1 × 2 or 2 × 1 butno other loop types; this is motivated bycommon practice [65]). Then run method2 on S and Gupdated, and store the result.See Example 2.Motivation: This method can work wellwhen the given input structure has a smallnumber of base pairs from Gbig , becauseGupdated contains stems that includes thesebase pairs, but avoids “overcrowding” withfurther base pairs that might preventHFold-PKonly from finding pseudoknottedstems.In this example, method 3 of IterativeHFold outperforms the other methods.Because the input structure G consists ofjust one base pair, method 1 (HFold)outputs a pseudoknot-free structurecontaining G. The output of both methods2 and 4 are pseudoknotted but do notcontain the base pair of the input structureG. In contrast, method 3 first adds basepairs to G, resulting in the pseudoknot-freestructureGupdated, and then addsExample 2: Example of result ofmethod 3 compared to all four methods of Iterative HFold.S = GUUUGUUAGUGGCGUGUCCGUCCGCAGCUGGCAAGCGAAUGUAAAGACUGACG = ............(..........)............................Method 1: HFold on S and G((((((((((.(((........))).))))))))))................ -11.01Method 2 and 4:(((.......))).[[[[.[[.(((.]].]]]].)))............... -5.08Method 3:Simfold on S and G((((((((((.(((........))).))))))))))................Gupdated((((((((((.(((........))).))))))))))................Method 2 on S and Gupdated((((((((((.(((.[[[.[[[))).)))))))))).........]]].]]] -11.26Jabbari and Condon BMCBioinformatics 2014, 15:147 Page 8 of 17http://www.biomedcentral.com/1471-2105/15/147additional pseudoknotted base pairs viamethod 2.Method 4: Let S1 be the subsequence of S obtained byremoving bases that are external unpairedbases with respect to input structureG.Run SimFold on S1 and G (with baseindices renumbered to agree with S1), toobtain pseudoknot-free structureG′. Thencontinue exactly as in method 3. SeeExample 3.Motivation: This method is very similar tomethod 3, but further constrains G′ sincethe base pairs in G′ cannot involve basesthat are removed from S to obtain S1. Thispotentially increases the possibilities forpseudoknotted base pairs to be added bymethod 2.Example 3: Example of result of method 4 compared to all four methods of Iterative HFold.S = CCGAGCUCUGUAGCGAGUGCUUGUAACCCGAGCGGGGGCG = .(..................................)..Method 1 and 3:((..((((.((.((((....)))).))..))))...)). -5.10Method 2:.(..................................).. +3.04Method 4:S1 and G1S1 = CGAGCUCUGUAGCGAGUGCUUGUAACCCGAGCGGGGG1 = (..................................)Simfold on S1 and G1:(..((((.((.((((....)))).))..))))...)Gupdated.(..((((.((.((((....)))).))..))))...)..Method 2 on S and Gupdated((..((((.((.((((..[[)))).))..))))..))]] -6.05In this example, method 4 of IterativeHFold outperforms the other methods.The input structureG has a high energyvalue and neither method 1 (HFold) normethod 2 (HFold-PKonly) can expand thepseudoknot-free structure to add thepseudoknotted stem. Also, by adding toomany pseudoknot-free base pairs, method3 fails to find the pseudoknotted base pairs.Thus, method 4 performs better thanmethods 1, 2 and 3.Experimental settingsIn this section we explain details of our computationalexperiments.Robustness testOne of our goals is to understand the degree to which ourmethods are robust with respect to partial information,that is, provide a reliable prediction even when lim-ited information about the true pseudoknot-free struc-ture, Gbig , is available. For this purpose we generatesubset structures of the correspondingGbig , for each RNAsequence in the HK-PK and HK-PK-free data sets. Foreach α, 0.05 ≤ α ≤ 0.95 with 0.05 steps, we choose eachbase pair of Gbig structure with probability α. We alsogenerate 1% information and 99% information about theGbig structure (i.e., α = 0.01 and α = 0.99). We repeatthis step 100 times to generate 100 substructures of Gbigfor each value of α for each RNA sequence in our datasets. We then run our methods on all 100 substructuresfor each RNA sequence in our data sets and α value andcalculate the bootstrap 95% percentile confidence intervalfor average F-measure of these 100 cases as the accuracyinterval for each method and each RNA sequence and αvalue in our data set.We also compare our methods when the truepseudoknot-free structure, Gbig is provided.Accuracy comparison testsWe compare the accuracy of HFold, HFold-PKonly andIterative HFold with each other on different input struc-tures, and with other methods, namely SimFold [63],HotKnots V2.0 [36,41] and IPknot [42]. We first describethe latter two methods and the settings we choose forour experiments. We then describe the ways in which wechoose input structures for HFold and its variants.HotKnots HotKnots is a heuristic program that given anRNA sequence, first finds about 20 lowest energy stems(from the set of all stems for the given RNA sequence),called hotspots. Then keeping all these stems, it adds otherJabbari and Condon BMC Bioinformatics 2014, 15:147 Page 9 of 17http://www.biomedcentral.com/1471-2105/15/147non-overlapping low energy stems to the stems foundin the first step, so as to minimize the energy of theoverall structure, eventually producing up to 20 outputstructures. In our experiments, we choose the structurewith the lowest energy value among the 20 output struc-tures as the final structure predicted by HotKnots. Whenreporting prediction accuracy for HotKnots, we report thebootstrap 95% percentile confidence interval for the aver-age F-measure of the lowest energy structure for all RNAsequences in our data set.IPknot IPknot is a secondary structure predictionmethod based onMaximum Expected Accuracy (MEA) ofthe base pairs. In addition to the RNA sequence, IPknotgets several parameters as input. Following, we describeeach of these parameters and settings briefly.• level: If structureG can be decomposed into kdisjoint pseudoknot-free structures, G1,G2, . . . ,Gk ,such that every base pair in Gi crosses the base pairsof Gj, 1 ≤ i ≤ j ≤ k, Sato et al. say that structureGhas k levels. For example, a pseudoknot-free structurehas level 1, and an H-type pseudoknot has level 2. Inanother example, when representing the secondarystructure in dot bracket format, the number ofdifferent brackets used to represent the structure isthe level of the structure. IPknot can handlestructures up to level 3.• scoring model: The energy model used to produceposterior probabilities for each base pair is called“scoring model”. IPknot has 3 different scoringmodels, namely “CONTRAfold”, “McCaskill” and“NUPACK”.• refining parameters: The procedure of recalculatingthe base pair probabilities based on the originalprediction results is referred to as “refiningparameters”.• base pair weights for each level: Positive numbersrepresenting the rate of true base pairs in each level.We run IPknot using the provided source code and thedefault parameters for scoring model and level (i.e., scor-ing model = McCaskill and level = 2). The default valuesprovided for base pair weights are not the same on theIPknot website (i.e., γ1 = 2 and γ2 = 16), its source code(i.e., for some cases γ1 = 2 and γ2 = 4 and for others γ1 =1 and γ2 = 1) and the provided perl script (i.e., γ1 = 4 andγ2 = 8). We run IPknot with all of these values with andwithout refinement and provide IPknot’s bootstrap 95%confidence intervals for average F-measures for all of ourdata sets as a table in the Additional file 2. Based on itsperformance we present IPknot’s results with default set-tings (i.e., no refinement, scoring model = McCaskill andlevel = 2) and γ1 = 4 and γ2 = 8, for comparison withother methods.Different versions of HFold We compare the averageaccuracy of HFold, HFold-PKonly and Iterative HFoldwith different input structures.To determine which input structures are good to usewhen Gbig is not known, we compare two differentoptions. Since HFold (HFold-PKonly and Iterative HFold)cannot accept pseudoknotted input structures we usethe following methods to produce pseudoknot-free inputstructures to HFold (HFold-PKonly and Iterative HFold).First, we use HotKnots hotspots [36], i.e., the 20 low-est energy pseudoknot-free stems produced in the firstphase of HotKnots. We choose the lowest free energystructure predicted by each of our methods as theirfinal prediction given these hotspots. Second, we useSimFold’s MFE structure [63] where the energy parame-ters of SimFold are changed to match that of HotKnotsV2.0.Running timeWe ran all methods on the same platform (Macbook pro.OS X 10.5.8 with 2.53 GHz Intel Core 2 Duo processorand 4 GB 1067 MHz DDR3 RAM). We use the time com-mand to measure the running time of our methods oneach sequence, and record the wall clock time.Memory usageTo find the memory usage of the programs, we use theValgrind package [66] and record the total heap usage asmemory usage of each program. IPknot and HotKnots arecompletely written in C and so we can easily find theirmemory usage by running Valgrind. However, IterativeHFold program is a perl script that runs a few C programs(HFold, HFold-pkonly and SimFold) sequentially. So wefind the memory usage of each C component using Val-grind and assign the maximum as the memory usage ofIterative HFold.ResultsAs mentioned in Section ‘Background’, in the literatureon hierarchical folding, there are reports of counterexamples to the hierarchical folding hypothesis wherebases that are initially part of the pseudoknot-free struc-ture for a molecule later change as the pseudoknot forms.This motivates a comparison of HFold versus IterativeHFold, in order to see how a method that sticks strictlywith the hypothesis (i.e., HFold) compares with a methodthat allows for some base changes (i.e., Iterative HFold).In Section ‘Robustness comparison’, we compare therobustness of HFold and Iterative HFold with respectto partial information; that is, the degree to which theyprovide accurate predictions as a function of how muchJabbari and Condon BMCBioinformatics 2014, 15:147 Page 10 of 17http://www.biomedcentral.com/1471-2105/15/147information about Gbig , the true pseudoknot-free sec-ondary structure, is provided as input. Then in Section‘Accuracy comparison of different versions of HFold’we compare HFold, HFold-PKonly and IterativeHFold when a (possibly inaccurate) computationalprediction of Gbig is provided as input. In Section‘Accuracy comparison with existing methods’ we com-pare Iterative HFold—the method that performsbest overall in Sections ‘Robustness comparison’ and‘Accuracy comparison of different versions of HFold’ —with existing methods for pseudoknotted secondarystructure prediction. Sections ‘Running time comparison’and ‘Memory consumption comparison’ report on therunning time and memory usage of our methods.Robustness comparisonOne of our goals is to learn what is the accuracy of eachof our methods when partial information about Gbig isavailable (see Section ‘Robustness test’ for experimen-tal settings). Figure 2 shows the results of this robust-ness evaluation, for pseudoknotted structures (Figure 2A),pseudoknot-free structures (Figure 2B) and the over-all results (Figure 2C). Since HFold-PKonly cannot addpseudoknot-free base pairs to the given input structure,we do not compare its performance here with HFoldand Iterative HFold. However we provide detailed perfor-mance of all versions of HFold including HFold-PKonly inAdditional file 3.As shown in Figure 2A, which pertains to pseudoknot-ted structures of the HK-PK data set, when providedwith ≈ 1% of the Gbig structure as input, Iterative HFold’sbootstrap 95% percentile confidence interval of averageF-measures has higher accuracy than those of HFold. Iter-ative HFold continues to be significantly superior toHFolduntil approximately 90% of Gbig is available, after whichHFold is more accurate. Iterative HFold is most success-ful when little information about Gbig is known becauseit can add both pseudoknot-free and pseudoknotted basepairs. In particular, using methods 3 and 4 (see Section‘Iterative HFold’) Iterative HFold first finds a low energypseudoknot-free structure that includes the given inputstructure (by extending the stems of the given structure),and then adds pseudoknotted base pairs to further lowerthe energy of the overall structure. However,when the vastmajority of base pairs ofGbig are provided as input, HFolddominates as it keeps the base pairs of the input struc-ture, thereby often adding base pairs ofGsmall.When 100%of Gbig is provided as input, HFold’s bootstrap 95% per-centile confidence interval is (85.74%, 91.87%), comparedwith (79.36%, 87.41%) for Iterative HFold.As shown in Figure 2A, Iterative HFold’s average accu-racy on pseudoknotted structures steadily increases fromabout 54% to 79% as the user provides 1% to 40% ofthe input structure. This improvement in accuracy slows++++++++ ++ ++ ++ ++++++++A) pseudoknottedBootstrap 95% confidence interval+++++ +++ ++ ++ ++ ++++++++o+Iterative HFoldHFold+++++++++ ++ ++ + ++ + + + + ++B) pseudoknot freeBootstrap 95% confidence interval+++++++++ ++ + ++ + + ++ + + +++++++++++ ++ ++ ++ ++ +++ ++0 20 40 60 80 1000 20 40 60 80 1000 20 40 60 80 1004050607080901007580859095100707580859095100C) combinedBootstrap 95% confidence interval++++++++ ++ ++ ++ ++ ++ ++ ++Figure 2 Comparison of robustness of HFold and Iterative HFold.Robustness results for pseudoknotted structures of the HK-PK data set(2A), pseudoknot-free structures of the HK-PK-free data set (2B) andall structures (2C). The X axes show the available information aboutGbig structure in percentage format, and the Y axes show bootstrap95% percentile confidence intervals for average F-measure. Dashedlines show the borders of the bootstrap 95% percentile for averageF-measure and solid lines show the average F-measure itself.down but still persists when further structural informa-tion is provided. If we compare the slope of the curvefor Iterative HFold’s average accuracy to that of HFoldin Figure 2A, we can see that HFold’s slope is steeperthan that of Iterative HFold, making Iterative HFold morerobust than HFold.For pseudoknot-free structures of the HK-PK-free dataset, as shown in Figure 2B, HFold performs better thanIterative HFold. Even with 1% information about Gbig ,HFold results in (79.76%, 84.32%) 95% bootstrap confi-dence interval in comparison with (79.14%, 83.84%) forIterative HFold with the same inputs. Roughly, HFold’ssuccess for pseudoknot-free structures is because it oftenJabbari and Condon BMC Bioinformatics 2014, 15:147 Page 11 of 17http://www.biomedcentral.com/1471-2105/15/147adds base pairs that do not cross those provided as part ofthe input, and thus are likely to be in Gbig .When 100% of Gbig is provided as input, the over-all bootstrap 95% confidence interval for HFold is(96.11%, 97.24%) compared with (93.85%, 96.07%) forIterative HFold.Accuracy comparison of different versions of HFoldOften, partial information about Gbig is not available; thisis the case for many RNAs of unknown function reportedby the ENCODE consortium [24]. Therefore, we nextcompare the quality of results obtained by HFold, HFold-PKonly and Iterative HFold when given a pseudoknot-freeinput,G that is predicted by existing computational meth-ods. One way to produce an input structure is to use anMFE pseudoknot-free structure prediction method, suchas MFold. We chose SimFold as it is an implementation ofMFold and, because of its energy parameters, gives moreaccurate predictions than MFold. Of course, when com-parative information is available, the user can input suchinformation as structural constraint as a pseudoknot-freestructure to Iterative HFold and expect a better predictionresult. Here we compare two methods for predicting G,namely SimFold and the hotspots produced by HotKnotsV2.0. Table 2 reports the bootstrap 95% percentile con-fidence intervals of average F-measures. The accuracy ofHFold-PKonly is significantly worse than that of HFoldand Iterative HFold, both with the output of SimFold, andwith the HotKnots hotspots as input, so we do not discussHFold-PKonly further.For pseudoknotted structures, using HotKnots hotspotsas input is far superior to using SimFold as input, for bothHFold and Iterative HFold. This appears to be becauseMFE structures predicted by SimFold tend to have morebase pairs than the true pseudoknot free structure,Gbig , sothat HFold and Iterative HFold are unlikely to add pseudo-knotted base pairs to the input structure. For pseudoknot-free structures, using SimFold as input is somewhat betterthan using HotKnots hotspots, but the permutation testindicates that the difference is not significant.The confidence intervals for HFold and Iterative HFoldwithHotKnots hotspots are (73.35%, 83.53%) and (72.83%,83.37%), respectively, and on pseudoknot-free structuresthey are (75.53%, 80.79%) and (74.93%, 80.26%) respec-tively. Again, based on the result of the permutation test,the difference in the results of HFold and Iterative HFoldon pseudoknotted and pseudoknot-free structures are notsignificant. Similarly, the permutation test shows that thedifference in prediction accuracy of HFold and IterativeHFold on SimFold input is not significant.Accuracy comparison with existingmethodsFor comparisons with other methods already in the lit-erature, we choose to use our Iterative HFold methodwith HotKnots hotspots as input structure, based on itsoverall good accuracies in Section ‘Accuracy comparisonof different versions of HFold’. We compare this methodwith two of the best-performing methods [44] for predic-tion of pseudoknotted structures, namely HotKnots V2.0[36], a MFE-based heuristic method, and IPknot [42], amethod that is based on maximum expected accuracy.(Prepared by Puton et al. [44], CompaRNA, is the websitefor continuous comparison of RNA secondary structuremethods on both PDB data set and RNA strand. Wechose IPknot because it was the best-performing non-comparative pseudoknot predictionmethod that can han-dle long RNA sequences, based on the ranking on theirwebsite as of March 25, 2014. We also noticed that Putonet al. used HotKnots V1 for their comparison, and notthe more recently available and better performing Hot-Knots V2.0. Therefore we chose to include HotKnots inour comparisons as well. Since the focus of this paperis on prediction of pseudoknotted structures, we do notcompare our results with that of Co-Fold [26] or othermethods for prediction of pseudoknot free structures.)Table 3 presents the bootstrap 95% percentile con-fidence interval of average F-measure for IterativeHFold with hotspots as input, HotKnots V2.0, Sim-Fold and IPknot with default setting (see Section‘Accuracy comparison tests’) on the HK-PK and HK-PK-free data sets. For pseudoknotted structures, our per-mutation tests show that the difference in accuracy ofIterative HFold and HotKnots is not significant. How-ever, the superior accuracy of Iterative HFold comparedwith SimFold and IPknot is significant. For pseudoknot-free structures, the difference in accuracy between IPknot,Iterative HFold, HotKnots and SimFold is not significant.Table 4 presents the bootstrap 95% percentile confi-dence interval for average F-measure for Iterative HFold(with hotspots as input), HotKnots and IPknot (withTable 2 Comparison of bootstrap 95%percentile confidence interval of average F-measure of different versions of HFoldwhen given SimFold structure as input vs. when given HotKnots hotspots structures as inputInputHotspots SimFold (MFE)PKonly HFold Iter. HFold PKonly HFold Iter. HFoldHK-PK (55.54, 71.06) (73.35, 83.53) (72.83, 83.37) (50.57, 63.53) (50.69, 63.54) (51.42, 64.39)HK-PK-free (31.37, 38.52) (75.53, 80.79) (74.93, 80.26) (78.42, 83.21) (78.33, 83.27) (78.31, 83.17)Jabbari and Condon BMCBioinformatics 2014, 15:147 Page 12 of 17http://www.biomedcentral.com/1471-2105/15/147Table 3 Comparison of bootstrap 95%percentileconfidence interval of average F-measure with existingmethodsInputIter. HFoldHotKnots SimFoldIPknot(hotspots) (default)HK-PK (72.83, 83.37) (73.60, 83.35) (45.34, 57.73) (54.56, 66.25)HK-PK-free (74.93, 80.26) (76.74, 81.95) (78.78, 83.55) (77.31, 81.79)default setting) on the IP-pk168 and DK-pk16 data sets.Our permutation tests show that the difference in accu-racy of Iterative HFold, HotKnots and IPknot on theDK-pk16 data set is not significant. However, the superioraccuracy of Iterative HFold compared with HotKnots andIPknot on the IP-pk168 data set is significant.Running time comparisonSince prediction of pseudoknotted structures are of inter-est to us, we only report running time comparison onpseudoknotted structures of our HK-PK data set. Figure 3presents result of time comparison between IterativeHFold and HotKnots in a log plot (when log is in base10). The X axis shows log(time) for HotKnots data pointsand the Y axis shows log(time) for Iterative HFold onthe HK-PK data set. RNA sequences in this data setare between 26 and 400 bases long. HFold runs signifi-cantly faster than HotKnots and finishes under 1.5 sec-onds for even the longest RNA sequence in our data set(400 bases). HotKnots is faster than Iterative HFold onsequences of up to 47 bases, where Iterative HFold startsbeing faster than HotKnots. Iterative HFold runs in lessthan 8.3 seconds for all RNA sequences in this data setwhereas HotKnots runs for over 6000 seconds (about 1.7hours) on the longest RNA sequence in our data set. Therunning time of both HFold and Iterative HFold growswith sequence length, whereas HotKnots’ running timeis not directly correlated with RNA length. For exam-ple, HotKnots runs for 1665.94 seconds for one RNAsequence of length 195 (ASE-00360), while it runs for203.12 seconds for another RNA sequence of length 195(ASE-00131).IPknot is significantly faster than both HFold and Iter-ative HFold. For all sequences in this data set, IPknotproduces output in less than 0.8 seconds. For detailedTable 4 Comparison of bootstrap 95%percentileconfidence interval of average F-measure with existingmethods on the DK-pk16 and the IP-pk168 data setsInputIter. HFoldHotKnotsIPknot(hotspots) (default)DK-pk16 (68.05, 81.85) (69.11, 83.81) (65.42, 75.81)IP-pk168 (72.65, 79.86) (65.51, 72.96) (58.20, 66.09)Figure 3 Time comparison. Comparison of running times ofIterative HFold and HotKnots in a log plot. The X axis showslog10(time) for HotKnots data points and the Y axis shows log10(time)for Iterative HFold. Time is measured in seconds.information about performance of each method seeAdditional file 4.Memory consumption comparisonHere we present memory consumption of HFold, Itera-tive HFold and HotKnots on our HK-PK pseudoknottedstructures. Since HotKnots predicts and keeps about 20structures in memory, its memory consumption can varysignificantly from one sequence to another, and is not pre-dictable. Up until 47 bases, HotKnots some times usesless memory than HFold or Iterative HFold, but for RNAsequences with 47 bases or longer, HotKnots uses muchmore memory than HFold and Iterative HFold. Itera-tive HFold’s memory usage is very similar to HFold’s andincreases at a very low rate by the length of the RNAsequence. It starts from 48.69 MB for RNA sequences oflength 26 and increases to 61.33 MB for the longest RNAsequence in this data set (400 bases long). HotKnots, how-ever, uses as little as 16.53 MB for an RNA of length 30bases (LP-PK1) and as much as 93419 MB for the longestRNA sequence in this data set.Jabbari and Condon BMC Bioinformatics 2014, 15:147 Page 13 of 17http://www.biomedcentral.com/1471-2105/15/147IPknot uses much less memory than all other methods.For the longest RNA sequence in this data set, IPknotuses less than 5.5 MB of memory in comparison to 61.33MB of HFold and Iterative HFold and 93419 MB of Hot-Knots. For detailed information about memory usage ofeach method see Additional file 4.DiscussionIn Section ‘Comparison with Hotknots and IPknot’we provide more insight on the differences and mer-its of Iterative HFold, HotKnots and IPknots. Thenin Section ‘Comparison with ShapeKnots’ we com-pare accuracy of Iterative HFold with ShapeKnots, amethod that incorporates SHAPE reactivity data to pre-dict RNA pseudoknotted secondary structure. In Section‘Iterative HFold with SimFold’s suboptimal structures’ wecompare performance accuracy of Iterative HFold withtwo inputs: HotKnots hotspots and suboptimal structures.Section ‘Energy model’ provides more insight into theenergy model used in this work.Comparison with Hotknots and IPknotComparing accuracy of Iterative HFold and HotKnotsV2.0 on HK-PK, HK-PK-free, DK-pk16 and IP-pk168,we found that the difference in their accuracies isinsignificant on HK-PK, HK-PK-free and DK-pk16 datasets when Iterative HFold is provided with HotKnotshotspots as input. Based on our results on the HK-PKdata set, with only about 15% information about thetrue pseudoknot-free structures, Iterative HFold’s 95%percentile confidence interval is (65.08%; 73.36%) (datashown in Additional file 3). If the user has about 35% infor-mation about the true pseudoknot-free structure, IterativeHFold’s accuracy is comparable with that of HotKnots (i.e.,(74.18%; 82.30%) vs. (73.60%; 83.35%)). However IterativeHFold’s accuracy (with hotspots as input) is significantlybetter than that of HotKnots on the IP-pk168 data set.One of the advantages of Iterative HFold over HotKnots isthat in Iterative HFold base pairs are added to lower theenergy of the given structure while in HotKnots stems areadded in a way that does not take into account the energyof stems in the previous steps.When reporting on time and memory consumption ofIterative HFold and HotKnots on the HK-PK data set, wedid not include the time and memory required to get theinput structures to Iterative HFold. Since we only run Hot-Knots V2.0 partially to produce hotspots it does not takeas long as running HotKnots and does not consume asmuch memory. For example, for the 400 nucleotides longRNA sequence in our data set (A.tum.RNaseP), it onlytakes 0.5 seconds time and 4 MB of memory to producethe hotspots. (The time required to get the hotspots for allRNA sequences in this data set is provided in Additionalfile 4.) We also note that since calculating hotspots andrunning Iterative HFold are done sequentially, the mem-ory consumption is calculated as the maximum of thetwo, so the memory consumption of Iterative HFold forthis sequence is still the same even including the memoryneeded for calculating hotspots. As we can see based onthis example, even including time and memory require-ments of calculating hotspots, Iterative HFold is still fasterthan HotKnots and uses less memory.We also compared Iterative HFold with IPknot [42].While IPknot is faster than Iterative HFold and uses lessmemory, we found that for the HK-PK and IP-pk168 datasets, Iterative HFold provides significantly more accuratepredictions of pseudoknotted structures, compared withIPknot. Based on our results on the HK-PK data set,Iterative HFold’s performance with more than 5% infor-mation about the true pseudoknot-free structure, is betterthan that of IPknot with default settings (data shown inAdditional file 3). We note that Sato et al. [42] find theperformance of IPknot with predictions using “NUPACK”superior to all versions of IPknot, but since this model canbe used for RNA sequences of length< 80 nucleotides, wedid not compare our results with this version of IPknot.Among all different versions of IPknot we tested, we foundall but γ1 = 1 and γ2 = 1 setting producing similarconfidence intervals for all but the HK-PK-free data sets,for which γ1 = 4 and γ2 = 8 produces the best result(data shown in Additional file 2). While running parame-ter refinement with one iteration improved the confidenceintervals in the HK-PK data set, it did not result in anyimprovement in accuracy in the rest of our data sets as inmany cases IPknot failed to produce results. We note thatour results on the IP-pk168 data set with different weightparameters perfectly match the results of Sato et al. [42].A disadvantage of IPknot over Iterative HFold is thatbeing an MEA-based method, IPknot does not pro-duce free energy of the predicted structure. Also to getthe best prediction, the user needs to provide someguidance as to what type of structure to predict forthe given sequence, e.g., whether pseudoknot-free orpseudoknotted.Comparison with ShapeKnotsSimilar to HotKnots, the ShapeKnots method of Hajdinet al. [47] is a heuristic algorithm for prediction of pseu-doknotted structures. This method incorporates SHAPEreactivity data as a pseudo energy term into the pre-diction method. SHAPE reactivity data is only availablefor a limited number of RNA sequences, so we cannotcompare Iterative HFold with ShapeKnots on our dataset. Therefore, we use data set of Hajdin et al. to com-pare these two methods. In their data set Hajdin et al.have 18 RNA sequences in their training set and 6 RNAsequences in their test set. We run Iterative HFold withhotspots for each RNA sequence and choose the lowestJabbari and Condon BMCBioinformatics 2014, 15:147 Page 14 of 17http://www.biomedcentral.com/1471-2105/15/147energy structure as the final output of our program. ForShapeknots, we use the sensitivity and positive predictivevalues reported in the work of Hajdin et al. [47] to com-pare with Iterative HFold. Table 5 shows the results ofthis comparison. In all but one sequence of the test set,Iterative HFold obtains higher accuracy than ShapeKnots.The exception is the HIV-1 5’ pseudoknot domain; Hajdinet al. note that the accepted structure of HIV-1 5’ pseudo-knot domain is based on a SHAPE directed prediction andthus an accuracy comparison between ShapeKnots andIterative HFold may be biased towards ShapeKnots. In thetraining set, however, Iterative HFold does not perform aswell as ShapeKnots. This might be because parameters ofShapeKnots were tuned on the training set to achieve thehighest possible accuracy. Since both the training and testdata sets are small, we cannot make more general state-ments about the significance of the differences in accuracybetween the two methods.Iterative HFold with SimFold’s suboptimal structuresTo further investigate which input structures are goodto use when Gbig is not known, we use the first 50 sub-optimal structures produced by SimFold (including theMFE structure). Then for each RNA sequence we runour methods on all 50 suboptimal structures and choosethe one with the lowest free energy as the final result forthat RNA sequence. With this approach, the bootstrap95% percentile confidence interval of average F-measureof HFold and Iterative HFold is (61.80%, 80.63%) and(67.70%, 79.57%) respectively for pseudoknotted struc-tures and (77.17%, 82.35%) and (76.27%, 81.46%) respec-tively for pseudoknot-free structures. The permutationtest indicates that the difference between these resultsand the corresponding results when input structures arehotspots is not significant. We also test the significance ofresults of HFold with first 50 suboptimal structures ver-sus Iterative HFold with the same input structures andTable 5 Comparison of Iterative HFold F-measure with ShapeKnots on SHAPE dataTraining set Len PKIter. HFold ShapeKnotssen ppv F sen ppv FPre-Q1 riboswitch, B. subtilis 34 1 62.5 100 76.9 100 100 100Telomerase pseudoknot, human 47 1 100 100 100 100 100 100tRNA(asp), yeast 75 0 81.0 100 89.5 95.2 95.2 95.2TPP riboswitch, E. coli 79 0 46.5 47.6 47.1 95.4 87.5 91.3SARS corona virus pseudoknot 82 1 69.2 86.3 69.2 84.6 88.0 86.3cyclic-di-GMP riboswitch, V. cholerae 97 0 85.5 81.0 83.2 89.3 86.2 87.7SAM I riboswitch, T. tengcongenis 118 1 79.5 91.2 84.9 92.3 97.3 94.7M-Box riboswitch, B. subtilis 154 0 87.5 91.3 89.4 87.5 91.3 89.3P546 domain, bI3 group I intron 155 0 55.4 57.4 56.4 94.6 96.4 95.5Lysine riboswitch, T. maritima 174 1 85.7 94.7 90.0 87.3 88.7 88.0Group I intron, Azoarcus sp. 214 1 52.4 54.1 53.2 92.1 95.1 93.5Signal recognition particle RNA, human 301 0 70.0 73.7 71.8 55.0 53.9 54.4Hepatitis C virus IRES domain 336 1 71.2 74.0 72.5 92.3 96.0 94.1RNase P, B. subtilis 405 1 55.7 59.3 57.4 75.6 79.8 77.7Group II intron, O. iheyensis 412 1 87.9 95.9 91.7 93.2 97.6 95.3Group I intron, T. thermophila 425 1 83.2 85.2 84.2 93.9 91.2 92.55’ domain of 23S rRNA, E. coli 511 0 84.0 72.5 77.8 92.4 76.4 83.65’ domain of 16S rRNA, E. coli 530 0 73.6 69.0 71.2 89.9 80.6 84.9Test set Len PKIter. HFold ShapeKnotssen ppv F sen ppv FFluoride riboswitch, P. syringae 66 1 100 100 100 93.7 93.7 93.7Adenine riboswitch, V. vulnificus 71 0 100 100 100 100 100 100tRNA(phe), E. coli 76 0 100 100 100 100 84.0 91.35S rRNA, E. coli 120 0 91.4 91.4 91.4 85.7 76.9 81.15’ domain of 16S rRNA, H. volcanii 473 0 90.3 82.3 86.1 89.6 82.7 86.0HIV-1 5’ pseudoknot domain 500 1 45.4 50.4 47.7 100 100 100Jabbari and Condon BMC Bioinformatics 2014, 15:147 Page 15 of 17http://www.biomedcentral.com/1471-2105/15/147Iterative HFold with hotspots for both pseudoknotted andpseudoknot-free structures. Although the bootstrap 95%percentile confidence intervals for average F-measuresseem different, the permutation test indicates that thedifference is not significant. Similarly, results of IterativeHFold with the first 50 suboptimal structures are not sig-nificantly better or worse than the result of HFold withhotspots as input structures for both pseudoknotted andpseudoknot-free structures.Energy modelIn this paper we use the HotKnots V2.0 DP09 [36] energyparameters in our implementation of Iterative HFold. Toinvestigate the degree to which the energy model maybe causing mis-predictions by HotKnots V2.0 or IterativeHFold, we considered the degree to which the maxi-mum accuracy structures produced by these methods,i.e., the structure with highest F-measure, is better thanthe minimum free energy structures. Table 6 presents thedifference in bootstrap 95% percentile confidence inter-vals of average F-measure. If we choose the maximumaccuracy structure among the 20 output structures pre-dicted by HotKnots for each RNA sequence, the bootstrap95% percentile confidence intervals of average F-measureof HotKnots will increase to (84.50%, 91.48%) for pseu-doknotted structures of the HK-PK data set (vs. (73.6%,83.35%) when choosing the lowest energy structure) and(88.32%, 91.08%) for pseudoknot-free structures of theHK-PK-free data set (vs. (76.74%, 81.95%) when choosingthe lowest energy structure).Similarly, if we compare the maximum accuracy struc-ture output by Iterative HFold with the minimum freeenergy structure, whether given HotKnots hotspots or thefirst 50 suboptimal structures to Iterative HFold as input,the bootstrap 95% percentile confidence intervals of aver-age F-measure also show improvement - see Table 6. Thedifference in improvements is significant in all but onecase, namely Iterative HFold on hotspots structures asinput for pseudoknotted structures of the HK-PK data set.We conclude that improvements on the energy parametervalues for pseudoknotted structures may further improveaccuracy of both HotKnots and Iterative HFold.ConclusionsIn this work we present Iterative HFold, a fast and robustiterative algorithm that matches the accuracy of the bestexisting pseudoknot prediction methods. Iterative HFoldis significantly more accurate than IPknot while matchingthe accuracy of HotKnots on the HK-PK data set. IterativeHFold is superior to both IPknot and HotKnots on the IP-pk168 data set. Moreover both Iterative HFold and IPknotuse less memory and run much faster than HotKnots onlong sequences.Iterative HFold also has lower rate of accuracy deteri-oration than HFold with loss of information about thetrue pseudoknot-free structure, so it is more robust thanHFold. This is particularly helpful when the given inputstructure may be unreliable and/or limited informationabout the true pseudoknot-free structure is available. Iter-ative HFold is also more accurate than ShapeKnots [47] onthe test set of Hajdin et al. [47].In this work, we compared two different ways to gener-ate pseudoknot-free input structures for input to IterativeHFold, namely the first 50 suboptimal structures pro-duced by SimFold, and HotKnots hotspots. On theHK-PKand HK-PK-free data sets, accuracy of Iterative HFold isnot significantly different on each of these. An alterna-tive approach that may be worth exploring in future workwould be to use the most highly probable base pairs, ascalculated using the partition function [43]. Even bettermay be to calculate base pair probabilities for base pairsof pseudoknotted RNA structures; however this requires(n5) time. Since HFold finds minimum free energystructure in O(n3) time, conditional on the given inputstructure, we are currently investigating ways to developan O(n3)-time partition function version of HFold thatcan produce pseudoknotted base pair probabilities thatare conditional on the given input structure.Comparing accuracy of the minimum free energy struc-tures with the maximum accuracy structures in this work,Table 6 Comparison of bootstrap 95%percentile confidence interval of average F-measure between theminimumenergy structures and themaximum accuracy structures of the HK-PK and the HK-PK-free data setsInput structures Min energy Max accuracy Permutation testIter. HFold - hotspots PKed (72.83, 83.37) (78.56, 87.05) Not significantIter. HFold - hotspots PK-free (74.93, 80.26) (87.70, 90.57) SignificantIter. HFold - 50 suboptimals PKed (67.70, 79.57) (80.41, 88.14) SignificantIter. HFold - 50 suboptimals PK-free (76.27, 81.46) (90.05, 93.00) SignificantHotKnots PKed (73.60, 83.35) (84.50, 91.48) SignificantHotKnots PK-free (76.74, 81.95) (88.32, 91.08) SignificantJabbari and Condon BMCBioinformatics 2014, 15:147 Page 16 of 17http://www.biomedcentral.com/1471-2105/15/147we found that, on average, theminimum free energy struc-ture has significantly poorer F-measure than the maxi-mum accuracy structure. This suggests that an improvedenergymodel for pseudoknotted structure predictionmayimprove accuracy of prediction algorithms for pseudo-knotted structures.Another direction for future work can be to use Itera-tiveHFold for structure prediction of two interacting RNAmolecules. Iterative HFold may be well suited for this pur-pose because, given input structures for each individualinput molecule, it allows for modification of these inputstructures as it explores potential base pairing interactionsbetween the two molecules.Data and software availabilityIterative HFold and all data used in this work are freelyavailable at http://www.cs.ubc.ca/~hjabbari/software.php.Additional filesAdditional file 1: Pseudocode. We provide pseudocode of our IterativeHFold algorithm in this section.Additional file 2: IPknot Performance. Table 1 provides the bootstrap95% confidence intervals for average F-measure of IPknot on different datasets and different weight parameters. The energy model in all theseexperiments is set to McCaskill and level is set to 2 (both default values).Additional file 3: Robustness Comparison and Correlation to FalsePositives. Tables 1 and 2 provide complete presenting robustnesscomparison of HFold-PKonly, HFold, and Iterative HFold, when providedwith different percentage of Gbig information of HK-PK and HK-PK-free datasets. Note that the reported interval in each case is the bootstrap 95%confidence interval for F-measure of the 100 structures with 1 ≤ α ≤ 99percent information about the Gbig structure. The 100% information is thebootstrap 95% confidence interval for F-measure when input structure isGbig . Table 3 provides Pearson correlation coefficient of HFold and IterativeHFold with HotKnots hotspots, SimFold MFE and SimFold first 50suboptimal structures. Here FP represents false positive rate and Frepresents the F-measure as the accuracy measure.Additional file 4: Time andMemory Comparison. Tables 1 and 2provide complete data presenting running time comparison of HFold,Iterative HFold, HotKnots V2.0 and IPknot on the HK-PK data set. Timing ispresented in seconds. Tables 3 and 4 provide complete data presentingmemory (total heap usage) comparison of HFold, Iterative HFold, HotKnotsV2.0 and IPknot on the HK-PK data set. Memory usage is presented in MegaBytes.Competing interestsThe authors declare that they have no competing interests.Authors’ contributionsHJ designed and implemented the algorithms, acquired, analyzed andinterpreted the data and drafted the manuscript. AC supervised the research,participated in analysis of data and in revising the manuscript. Both authorsread and approved the final manuscript.AcknowledgementsThe authors thank the reviewers for their very constructive suggestions. Theauthors also thank and acknowledge Dr. Holger Hoos for his helpful commentsand early discussion about the paper. This research was funded by a grant fromthe Natural Sciences and Engineering Research Council of Canada (NSERC).Received: 7 January 2014 Accepted: 8 May 2014Published: 18 May 2014References1. Hale BJ, Yang C-X, Ross JW: Small RNA regulation of reproductivefunction.Mol Reprod Dev 2014, 81(2):148–159.2. Deryusheva S, Gall JG: Novel small cajal-body-specific RNAs identifiedin drosophila: probing guide RNA function. RNA 2013,19(12):1802–1814.3. Holt CE, Schuman EM: The central dogma decentralized: Newperspectives on RNA function and local translation in neurons.Neuron 2013, 80(3):648–657.4. Mattick JS, Makunin IV: Non-coding RNA. HumMol Genet 2006,15(suppl 1):17–29.5. The FANTOM Consortium, Carninci P, Kasukawa T, Katayama S, Gough J,Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R,Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest ARR, Zavolan M,Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, ApweilerR, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, et al.:The transcriptional landscape of the mammalian genome. Science2005, 309(5740):1559–1563.6. Dennis C: The brave new world of RNA. Nature 2002,418(6894):122–124.7. Lee K, Varma S, Santalucia J, Cunningham PR: In vivo determination ofRNA structure-function relationships: analysis of the 790 loop inribosomal RNA. J Mol Biol 1997, 269(5):732–743.8. Abdi NM, Fredrick K: Contribution of 16S rRNA nucleotides formingthe 30S subunit a and p sites to translation in escherichia coli. RNA2005, 11(11):1624–1632.9. Saraiya AA, Lamichhane TN, Chow CS, SantaLucia J, Cunningham PR:Identification and role of functionally important motifs in the 970loop of escherichia coli 16S ribosomal RNA. J Mol Biol 2008,376(3):645–657.10. Calidas D, Lyon H, Culver GM: The N-terminal extension of S12influences small ribosomal subunit assembly in Escherichia coli. RNA2014. [http://dx.doi.org/10.1261/rna.042432.113]11. Sato K, Kato Y, Akutsu T, Asai K, Sakakibara Y: DAFS: simultaneousaligning and folding of RNA sequences via dual decomposition.Bioinformatics 2012, 28(24):3218–3224.12. Hamada M, Sato K, Asai K: Improving the accuracy of predictingsecondary structure for aligned RNA sequences. Nucleic Acids Res2011, 39(2):393–402.13. Hamada M, Yamada K, Sato K, Frith MC, Asai K: CentroidHomfold-LAST:accurate prediction of RNA secondary structure using automaticallycollected homologous sequences. Nucleic Acids Res 2011,39(suppl 2):100–106.14. Xu Z, Mathews DH:Multilign: an algorithm to predict secondarystructures conserved in multiple RNA sequences. Bioinformatics 2011,27(5):626–632.15. Wiebe NJP, Meyer IM: Transat - a method for detecting the conservedhelices of functional rna structures, including transient,pseudo-knotted and alternative structures. PLoS Comput Biol 2010,6(6):1000823.16. Bernhart S, Hofacker I, Will S, Gruber A, Stadler P: RNAalifold: improvedconsensus structure prediction for RNA alignments. BMCBioinformatics 2008, 9(1):474.17. Meyer IM, Miklós I: SimulFold: simultaneously inferring RNAstructures including pseudoknots, alignments, and treesusing a Bayesian MCMC framework. PLoS Comput Biol 2007,3(8):149.18. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, LanderES, Kent J, Miller W, Haussler D: Identification and classification ofconserved RNA secondary structures in the human genome. PLoSComput Biol 2006, 2(4):33.19. Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A:Rfam: annotating non-coding RNAs in complete genomes. NucleicAcids Res 2005, 33(Database issue). [http://view.ncbi.nlm.nih.gov/pubmed/15608160]20. Touzet H, Perriquet O: CARNAC: folding families of related RNAs.Nucleic Acids Res 2004, 32(Web Server issue). [http://dx.doi.org/10.1093/nar/gkh415]Jabbari and Condon BMC Bioinformatics 2014, 15:147 Page 17 of 17http://www.biomedcentral.com/1471-2105/15/14721. Knudsen B, Hein J: RNA secondary structure prediction usingstochastic context-free grammars and evolutionary history.Bioinformatics 1999, 15(6):446–454.22. Durbin R, Eddy SR, Krogh A, Mitchison G: Biological Sequence Analysis:Probabilistic Models of Proteins and Nucleic Acids. Cambridge: CambridgeUniversity Press; 1998.23. Mathews DH, Sabina J, Zuker M, Turner DH: Expanded sequencedependence of thermodynamic parameters improves prediction ofRNA secondary structure. J Mol Biol 1999, 288(5):911–940.24. The ENCODE Project Consortium: Identification and analysis offunctional elements in 1% of the human genome by the ENCODEpilot project. Nature 2007, 447(7146):799–816.25. Hofacker IL, Fontana W, Stadler PF, Bonhoeffer LS, Tacker M, Schuster P:Fast folding and comparison of RNA secondary structures.Monatshefte für Chemie / ChemMonthly 1994, 125(2):167–188. [http://dx.doi.org/10.1007/bf00818163]26. Proctor JR, Meyer IM: CoFold: an RNA secondary structure predictionmethod that takes co-transcriptional folding into account. NucleicAcids Res 2013, 41(9):102.27. Staple DW, Butcher SE: Pseudoknots: RNA structures with diversefunctions. PLoS Biol 2005, 3(6):e213+. [http://dx.doi.org/10.1371/journal.pbio.0030213]28. van Batenburg FH, Gultyaev AP, Pleij CW: Pseudobase: structuralinformation on RNA pseudoknots. Nucleic Acids Res 2001,29(1):194–195.29. Deiman BALM, Pleij CWA: Pseudoknots: A vital feature in viral RNA.Semin Virol 1997,s, 8(3):166–175.30. Akutsu T: Dynamic programming algorithms for RNA secondarystructure prediction with pseudoknots. Disc AppMath 2000,104(1–3):45–62.31. Lyngsø RB: Complexity of pseudoknot prediction in simple models.In ICALP. Automata, Languages and Programming. Lecture Notes inComputer Science, vol. 3142. Edited by Díaz J, Karhumäki J, Lepistö A,Sannella D. Heidelberg: Springer Berlin; 2004:919–931.32. Lyngsø RB, Pedersen CN: RNA pseudoknot prediction in energy-basedmodels. J Comput Biol 2000, 7(3–4):409–427.33. Rivas E, Eddy SR: A dynamic programming algorithm for RNAstructure prediction including pseudoknots. J Mol Biol 1999,285(5):2053–2068.34. Dirks RM, Pierce NA: A partition function algorithm for nucleic acidsecondary structure including pseudoknots. J Comput Chem 2003,24(13):1664–1677.35. Reeder J, Giegerich R: Design, implementation and evaluation of apractical pseudoknot folding algorithm based on thermodynamics.BMC Bioinformatics 2004, 5:104+. [http://dx.doi.org/10.1186/1471-2105-5-104]36. Andronescu MS, Pop C, Condon AE: Improved free energy parametersfor RNA pseudoknotted secondary structure prediction. RNA 2010,16(1):26–42.37. Sperschneider J, Datta A, Wise MJ:Heuristic RNA pseudoknotprediction including intramolecular kissing hairpins. RNA 2011,17(1):27–38.38. Sperschneider J, Datta A: DotKnot: pseudoknot prediction using theprobability dot plot under a refined energy model. Nucleic Acids Res2010, 38(7):103.39. Sperschneider J, Datta A: KnotSeeker: Heuristic pseudoknot detectionin long RNA sequences. RNA 2008, 14(4):630–640.40. Huang C-H, Lu CL, Chiu H-T: A heuristic approach for detecting RNAh-type pseudoknots. Bioinformatics 2005, 21(17):3501–3508.41. Ren J, Rastegari B, Condon A, Hoos HH: Hotknots: Heuristic predictionof rna secondary structures including pseudoknots. RNA 2005,11(10):1494–1504.42. Sato K, Kato Y, Hamada M, Akutsu T, Asai K: IPknot: fast and accurateprediction of RNA secondary structures with pseudoknots usinginteger programming. Bioinformatics 2011, 27(13):85–93.43. Mathews DH: Using an RNA secondary structure partition function todetermine confidence in base pairs predicted by free energyminimization. RNA 2004, 10(8):1178–1190.44. Puton T, Kozlowski LP, Rother KM, Bujnicki JM: CompaRNA: a server forcontinuous benchmarking of automatedmethods for RNAsecondary structure prediction. Nucleic Acids Res 2013,41(7):4307–4323.45. Mathews DH, Disney MD, Childs JL, Schroeder SJ, Zuker M, Turner DH:Incorporating chemical modification constraints into a dynamicprogramming algorithm for prediction of RNA secondary structure.Proc Natl Acad Sci U S A 2004, 101(19):7287–7292.46. Deigan KE, Li TW, Mathews DH, Weeks KM: Accurate SHAPE-directedRNA structure determination. Proc Natl Acad Sci 2009,s, 106(1):97–102.47. Hajdin CE, Bellaousov S, Huggins W, Leonard CW, Mathews DH, WeeksKM: Accurate SHAPE-directed RNA secondary structuremodeling,including pseudoknots. Proc Natl Acad Sci U S A 2013,110(14):5498–5503.48. Jabbari H, Condon A, Zhao S: Novel and efficient RNA secondarystructure prediction using hierarchical folding. J Comput Biol 2008,15(2):139–163.49. Tinoco I, Bustamante C: How RNA folds. J Mol Biol 1999, 293(2):271–281.50. Mathews DH: Predicting RNA secondary structure by free energyminimization. Theor Chem Acc: Theory, Computation, andModeling(Theoretica Chimica Acta) 2006:1–9.51. Cho SS, Pincus DL, Thirumalai D: Assembly mechanisms of RNApseudoknots are determined by the stabilities of constituentsecondary structures. Proc Natl Acad Sci 2009, 106(41):17349–17354.52. Bailor MH, Sun X, Al-Hashimi HM: Topology links RNA secondarystructure with global conformation, dynamics, and adaptation.Science 2010, 327(5962):202–206. [http://dx.doi.org/10.1126/science.1181085]53. Wilkinson KA, Merino EJ, Weeks KM: RNA SHAPE chemistry revealsnonhierarchical interactions dominate equilibrium structuraltransitions in tRNAasp transcripts. J AmChem Soc 2005,127(13):4659–4667.54. Ding F, Sharma S, Chalasani P, Demidov VV, Broude NE, Dokholyan NV: Abinitio RNA folding by discrete molecular dynamics: From structureprediction to folding mechanisms. RNA 2008, 14(6):1164–1173.55. Darty K, Denise A, Ponty Y: VARNA: interactive drawing and editing ofthe RNA secondary structure. Bioinformatics 2009, 25(15):1974–1975.56. Rastegari B, Condon A: Parsing nucleic acid pseudoknotted secondarystructure: algorithm and applications. J Comput Biol 2007, 14(1):16–32.57. Sperschneider J, Datta A, Wise MJ: Predicting pseudoknottedstructures across two RNA sequences. Bioinformatics 2012,28(23):3058–3065.58. Hajiaghayi M, Condon A, Hoos H: Analysis of energy-based algorithmsfor RNA secondary structure prediction. BMC Bioinformatics 2012,13(1):22.59. Varian H: Bootstrap tutorial.Math J 2005, 9(4):768–775.60. Hesterberg T, Monaghan S, Moore DS, Cipson A, Epstein R: Bootstrapmethods and permutation tests. In The practice of business statistics.Edited by Farace P, Ward T, Swearengin D, Donnellan B. New York: W. H.Freeman and Company; Chap. 18.61. Aghaeepour N, Hoos H: Ensemble-based prediction of RNA secondarystructures. BMC Bioinformatics 2013, 14(1):139.62. R Core Team: R: A Language and Environment for Statistical Computing.Vienna, Austria: R Foundation for Statistical Computing; 2013.[http://www.R-project.org/]63. Andronescu M, Chuan Z, Condon A: Secondary structure prediction ofinteracting RNAmolecules. J Mol Biol 2005, 345(5):987–1001.64. Zuker M:Mfold web server for nucleic acid folding and hybridizationprediction. Nucleic Acids Res 2003, 31:3406–3415.65. Bellaousov S, Mathews DH: ProbKnot: fast prediction of RNAsecondary structure including pseudoknots. RNA 2010,16(10):1870–1880.66. Nethercote N, Seward J: Valgrind: a framework for heavyweightdynamic binary instrumentation. SIGPLANNot 2007, 42(6):89–100.doi:10.1186/1471-2105-15-147Cite this article as: Jabbari and Condon: A fast and robust iterativealgorithm for prediction of RNA pseudoknotted secondary structures.BMC Bioinformatics 2014 15:147.
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Faculty Research and Publications /
- A fast and robust iterative algorithm for prediction...
Open Collections
UBC Faculty Research and Publications
A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures Jabbari, Hosna; Condon, Anne May 18, 2014
pdf
Page Metadata
Item Metadata
Title | A fast and robust iterative algorithm for prediction of RNA pseudoknotted secondary structures |
Creator |
Jabbari, Hosna Condon, Anne |
Publisher | BioMed Central |
Date Issued | 2014-05-18 |
Description | Background: Improving accuracy and efficiency of computational methods that predict pseudoknotted RNA secondary structures is an ongoing challenge. Existing methods based on free energy minimization tend to be very slow and are limited in the types of pseudoknots that they can predict. Incorporating known structural information can improve prediction accuracy; however, there are not many methods for prediction of pseudoknotted structures that can incorporate structural information as input. There is even less understanding of the relative robustness of these methods with respect to partial information. Results We present a new method, Iterative HFold, for pseudoknotted RNA secondary structure prediction. Iterative HFold takes as input a pseudoknot-free structure, and produces a possibly pseudoknotted structure whose energy is at least as low as that of any (density-2) pseudoknotted structure containing the input structure. Iterative HFold leverages strengths of earlier methods, namely the fast running time of HFold, a method that is based on the hierarchical folding hypothesis, and the energy parameters of HotKnots V2.0. Our experimental evaluation on a large data set shows that Iterative HFold is robust with respect to partial information, with average accuracy on pseudoknotted structures steadily increasing from roughly 54% to 79% as the user provides up to 40% of the input structure. Iterative HFold is much faster than HotKnots V2.0, while having comparable accuracy. Iterative HFold also has significantly better accuracy than IPknot on our HK-PK and IP-pk168 data sets. Conclusions Iterative HFold is a robust method for prediction of pseudoknotted RNA secondary structures, whose accuracy with more than 5% information about true pseudoknot-free structures is better than that of IPknot, and with about 35% information about true pseudoknot-free structures compares well with that of HotKnots V2.0 while being significantly faster. Iterative HFold and all data used in this work are freely available at http://www.cs.ubc.ca/~hjabbari/software.php . |
Subject |
RNA Secondary structure prediction Pseudoknot Hierarchical folding Minimum free energy |
Genre |
Article |
Type |
Text |
Language | eng |
Date Available | 2015-11-05 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution 4.0 International (CC BY 4.0) |
DOI | 10.14288/1.0215949 |
URI | http://hdl.handle.net/2429/55182 |
Affiliation |
Computer Science, Department of Science, Faculty of |
Citation | BMC Bioinformatics. 2014 May 18;15(1):147 |
Publisher DOI | 10.1186/1471-2105-15-147 |
Peer Review Status | Reviewed |
Scholarly Level | Faculty |
Copyright Holder | Jabbari and Condon; licensee BioMed Central Ltd. |
Rights URI | http://creativecommons.org/licenses/by/4.0/ |
Aggregated Source Repository | DSpace |
Download
- Media
- 52383-12859_2014_Article_6443.pdf [ 680.28kB ]
- Metadata
- JSON: 52383-1.0215949.json
- JSON-LD: 52383-1.0215949-ld.json
- RDF/XML (Pretty): 52383-1.0215949-rdf.xml
- RDF/JSON: 52383-1.0215949-rdf.json
- Turtle: 52383-1.0215949-turtle.txt
- N-Triples: 52383-1.0215949-rdf-ntriples.txt
- Original Record: 52383-1.0215949-source.json
- Full Text
- 52383-1.0215949-fulltext.txt
- Citation
- 52383-1.0215949.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.52383.1-0215949/manifest