ralssBioMed CentBMC BioinformaticsOpen AcceResearch articleAn ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problemAlena Shmygelska and Holger H Hoos*Address: Department of Computer Science, University of British Columbia, Vancouver, B.C., V6T 1Z4, CanadaEmail: Alena Shmygelska - oshmygel@cs.ubc.ca; Holger H Hoos* - hoos@cs.ubc.ca* Corresponding author AbstractBackground: The protein folding problem is a fundamental problems in computational molecularbiology and biochemical physics. Various optimisation methods have been applied to formulationsof the ab-initio folding problem that are based on reduced models of protein structure, includingMonte Carlo methods, Evolutionary Algorithms, Tabu Search and hybrid approaches. In our work,we have introduced an ant colony optimisation (ACO) algorithm to address the non-deterministicpolynomial-time hard (NP-hard) combinatorial problem of predicting a protein's conformationfrom its amino acid sequence under a widely studied, conceptually simple model – the 2-dimensional (2D) and 3-dimensional (3D) hydrophobic-polar (HP) model.Results: We present an improvement of our previous ACO algorithm for the 2D HP model andits extension to the 3D HP model. We show that this new algorithm, dubbed ACO-HPPFP-3,performs better than previous state-of-the-art algorithms on sequences whose nativeconformations do not contain structural nuclei (parts of the native fold that predominantly consistof local interactions) at the ends, but rather in the middle of the sequence, and that it generally findsa more diverse set of native conformations.Conclusions: The application of ACO to this bioinformatics problem compares favourably withspecialised, state-of-the-art methods for the 2D and 3D HP protein folding problem; our empiricalresults indicate that our rather simple ACO algorithm scales worse with sequence length butusually finds a more diverse ensemble of native states. Therefore the development of ACOalgorithms for more complex and realistic models of protein structure holds significant promise.BackgroundAnt Colony Optimisation (ACO) is a population-basedstochastic search method for solving a wide range of com-binatorial optimisation problems. ACO is based on theconcept of stigmergy – indirect communication betweenmembers of a population through interaction with theenvironment. An example of stigmergy is the communica-trails on the ground and thereby influencing the decisionprocesses of other ants. This simple form of communica-tion between individual ants gives rise to complex behav-iours and capabilities of the colony as a whole.From the computational point of view, ACO is an iterativeconstruction search method in which a population of sim-Published: 14 February 2005BMC Bioinformatics 2005, 6:30 doi:10.1186/1471-2105-6-30Received: 16 July 2004Accepted: 14 February 2005This article is available from: http://www.biomedcentral.com/1471-2105/6/30© 2005 Shmygelska and Hoos; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Page 1 of 22(page number not for citation purposes)tion of ants during the foraging process: ants indirectlycommunicate with each other by depositing pheromoneple agents ('ants') repeatedly constructs candidate solu-tions to a given problem; this construction process isBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30probabilistically guided by heuristic information on thegiven problem instance as well as by a shared memorycontaining experience gathered by the ants in previousiterations ('pheromone trails'). Following the seminalwork by Dorigo et al. [1,2], ACO algorithms have beensuccessfully applied to a broad range of hard combinato-rial problems, including the traveling salesman problem,the graph colouring problem, the quadratic assignmentproblem and vehicle routing problems (see, e.g., [3-5]).The research presented in this paper builds on an ACOalgorithm first proposed in [6] (and later improved in [7])for ab-initio protein folding under a widely studiedabstract model – the hydrophobic polar (HP) model. Inparticular, we extend our previous ACO algorithm to the3D HP model and improve its performance by modifyingthe subsidiary local search procedure.The protein folding problem is one of the most challeng-ing problems in computational biology, molecular biol-ogy, biochemistry and physics. Even under simplifiedlattice models, the protein folding problem is non-deter-ministic polynomial-time hard (NP-hard) [8]. The ab-ini-tio protein folding problem can be broken down intothree sub-problems: 1) design of a model (with a desiredlevel of accuracy); 2) definition of an energy function thatcan effectively discriminate between native and non-native states; and 3) design of a search algorithm that canefficiently find minimal-energy conformations. A numberof search (or sampling) methods have been proposed inthe literature to solve the protein folding problem, includ-ing Monte Carlo algorithms, Evolutionary Algorithms,Tabu Search and hybrid approaches. ACO, which hasbeen very successfully applied to other combinatorialproblems, appears to be a very attractive computationalmethod for solving the protein folding problem, since itcombines aspects of chain-growth and permutation-basedsearch with ideas closely related to reinforcement learn-ing. These concepts and ideas apply rather naturally toprotein folding: By folding from multiple initial foldingpoints, guided by the energy function and experiencefrom previous iterations of the algorithm, an ensemble ofpromising, low-energy complete conformations isobtained. These conformations are further improved by asubsidiary local search procedure and then evaluated toupdate the accumulated pheromone values that are usedto bias the generation of conformations in future itera-tions of the algorithm.In this paper, we ask and address the following questions:Is ACO a competitive method for solving the ab-initio pro-tein folding problem under the 2D and 3D HP models?How does its performance scale with sequence length?classes of structures (if any) are solved more efficiently byACO than by any other known algorithms? Finally, itshould be noted that our ACO algorithm for this problemis based on very simple design choices, in particular withrespect to the solution components reinforced in the phe-romone matrix and of the subsidiary local search proce-dure. We discuss which of the many design choicesunderlying our algorithm should be reconsidered in orderto achieve further performance improvements.The hydrophobic polar modelDue to the complexity of the protein folding problem,simplified models such as Dill's hydrophobic-polar (HP)model have become one of the major tools for studyingprotein structure [9]. The HP model is based on the obser-vation that the hydrophobic force is the main force deter-mining the unique native conformation (and hence thefunctional state) of small globular proteins [9,10].In the HP model, the primary amino acid sequence of aprotein (which can be represented as a string over atwenty-letter alphabet) is abstracted to a sequence ofhydrophobic (H) and polar (P) residues that is repre-sented as a string over the letters H and P. The conforma-tions of such an HP sequence are restricted to self-avoiding walks on a lattice. For the 2D HP model, a 2-dimensional square lattice is typically used, and the 3DHP model is generally based on a 3-dimensional cubic lat-tice. An example of a protein conformation under the 2DHP model is shown in Figure 1. The energy of a conforma-tion is defined as the number of topological contactsbetween hydrophobic amino acids that are not neigh-bours in the given sequence. More specifically, a confor-mation c with exactly n such H-H contacts has energy E(c)= n·(-1); for example, the 2D HP conformation shown inFigure 1 has energy -9.The HP Protein Folding Problem can be formally definedas follows: Given an HP sequence s = s1 s2...sn, find anenergy-minimising conformation of s, i.e., find c* ∈ C(s)such that E(c*) = min{E(c) | c ∈ C}, where C(s) is the setof all valid conformations for s. It has been provedrecently that this problem and several variations of it areNP-hard [8].Existing 2D and 3D HP protein folding algorithmsA number of well-known heuristic optimisation methodshave been applied to the 2D and 3D HP Protein FoldingProblem, including Evolutionary Algorithms (EAs) [11-15] and Monte Carlo (MC) algorithms [16-22]. The latterhave been found to be particularly robust and effective forfinding high-quality solutions to the HP Protein FoldingProblem [18].Page 2 of 22(page number not for citation purposes)What is the role of the parameters of the ACO algorithmfor the efficiency of the optimisation process? WhichBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30Besides general optimisation methods, there are otherheuristic methods that rely on specific heuristics that arebased on intuitions or assumptions about the foldingprocess, such as co-operativity of folding or the existenceof a hydrophobic core. Co-operativity is believed to arisefrom local conformational choices that result in a globallyoptimal state without exhaustive search [23]. Amongthese methods are the hydrophobic zipper method (HZ)[23], the contact interactions method (CI) [24], the core-directed chain growth method (CG) [25], and the con-straint-based hydrophobic core construction method(CHCC) [26].The hydrophobic zipper (HZ) strategy developed by Dillet al. is based on the hypothesis that once a hydrophobiccontact is formed it cannot be broken, and other contactsare formed in accordance with already folded parts of thechain (co-operativity of folding) [23]. The contact interac-tions (CI) algorithm by Toma and Toma [24] combinesresidues in the chain, and thus allows previously formedcontacts to be modified according to their computedmobilities. The core-directed chain growth method (CG)by Beutler and Dill [25] biases construction towards find-ing a good hydrophobic core by using a specificallydesigned heuristic function and by approximating thehydrophobic core with a square (in 2D) or a cube (in 3D).The constraint-based hydrophobic core constructionmethod (CHCC) by Yue and Dill [26] is complete, i.e.,always guaranteed to find a global optimum; it attemptsto find the hydrophobic core with the minimal possiblesurface area by systematically introducing geometric con-straints and by pruning branches of a conformationalsearch tree. A similar, but more efficient complete con-straint satisfaction search method has been proposed byBackofen et al. [27] for the more complex face-centredcubic lattice.An early application of Evolutionary Algorithms to pro-tein structure prediction was presented by Unger andMoult [14,15]. Their non-standard EA incorporates char-acteristics of Monte Carlo methods. Currently among thebest known algorithms for the HP Protein Folding prob-lem are various Monte Carlo algorithms, including the'pruned-enriched Rosenbluth method' (PERM) of Grass-berger et al. [16,18]. PERM is a biased chain growth algo-rithm that evaluates partial conformations and employspruning and enrichment strategies to explore promisingpartial solutions.Other methods for solving protein folding problemsinclude the dynamic Monte Carlo algorithm by Ram-akrishnan et al. [21], which introduced long-range movesinvolving disconnection of the chain, and the evolution-ary Monte Carlo (EMC) algorithm by Liang and Wong[19], which works with a population of individuals thateach perform Monte Carlo optimisation; a variant of EMCalso reinforces certain secondary structures (alpha-helicesand beta-sheets).Finally, Chikenji et al. introduced the multi-self-overlapensemble (MSOE) Monte Carlo method [17], which con-siders overlapping chain configurations.Other Monte Carlo methods that have been particularlyuseful in off-lattice protein folding include generalisedensemble methods, such as umbrella sampling [28] (withreplica exchange sampling [29,30] being the most com-mon variant) and multi-canonical (entropic) sampling[30,31]. Replica exchange Monte Carlo (parallel temper-ing) has also been applied to the off-lattice HP model[32].A sample protein conformation in the 2D HP modelFigure 1A sample protein conformation in the 2D HP model. The underlying protein sequence (Sequence S1-1 from Table 1)is HPHPPHHPHPPHPHHPPHPH; black circles represent hydrophobic amino acids, while white circles symbolise polar amino acids. The dotted lines represents the H-H contacts underlying the energy calculation. The energy of this confor-mation is -9, which is optimal for the given sequence.Page 3 of 22(page number not for citation purposes)the idea of HZ with a Monte Carlo search procedure thatassigns different conformational freedom to the differentCurrently, when applied to the square and cubic latticeHP model, none of these algorithms appears toBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30completely dominate the others in terms of solution qual-ity and run-time.Our ACO algorithm for the 2D and 3D HP protein folding problemIn previous work, we have applied ACO to the 2D HP Pro-tein Folding Problem [6,7]; in the following, we brieflysummarise the main features of our ACO algorithm andthe improvements introduced in this work. Details on ourACO framework and the new ACO-HPPFP-3 algorithmdeveloped in the context of this work are given in the'Methods' section.As usual, the ants in our ACO algorithm iterativelyundergo three phases: the construction phase – duringwhich each ant constructs a candidate solution by sequen-tially growing a conformation of the given HP sequence,starting from a folding point that is chosen uniformly atrandom among all sequence positions; the local searchphase – when ants further optimise protein conformationsfolded during the construction phase; and the pheromoneupdate phase – when ants update the pheromone matrix(representing the collective global memory of the colony)based on the energies of the conformations obtained afterthe construction and the local search phases. A generaloutline of ACO is shown in Figure 2.The solution components used during the constructionprocess, the local search phase and the pheromone updateare local structure motifs (or relative folding directions)straight (S), left (L), right (R) in 2D, and straight (S), left (L),right (R), up (U), down (D) in 3D, which for each aminoacid indicate its position on the 2D or 3D lattice relativeto its direct predecessors in the given sequence (see Figure3). In 3D, the relative folding directions are defined as in[33]: A local coordinate system is associated with everysequence position, such that S corresponds to the direc-tion of the x axis, L to the direction of the y axis, and U tothe direction of the z axis. Each local motif corresponds toa relative rotation of this coordinate system (for the for-ward construction: S = no rotation, L = 90° counter-clock-wise around the z axis, R = 90° clockwise around the zaxis, U = 90° clockwise around the y axis, D = 90° coun-ter-clockwise around the y axis).Since conformations are rotationally invariant, the posi-tion of the first two amino acids can be fixed without lossof generality. Hence, we represent candidate conforma-tions for a protein sequence of length n by a sequence oflocal structure motifs of length n - 2. For example, theconformation of Sequence S1-1 shown in Figure 1 corre-sponds to the motif sequence LSLLRRLRLLSLRRLLSL.ACO outlineFigure 2ACO outline. Generic outline of Ant Colony Optimisation (for static combinatorial problems).procedure ACOinitialise pheromone trails;while (termination condition not satisfied) doconstruct candidate conformations;perform local search;update pheromone values;endendLocal structure motifsFigure 3Local structure motifs. The local structure motifs which form the solution components underlying the construction and local search phases of our ACO algorithm in 3D.i i+1ii−1i−1S S SSS LS i i ii+1i−1 i−1i−1S S S S S SSS i+1US i+1R DSi+1UP DOWNSTRAIGHT LEFT RIGHTPage 4 of 22(page number not for citation purposes)BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30During the construction phase, ants fold a protein froman initial folding point by probabilistically adding oneamino acid at a time based on the two sources of informa-tion: pheromone matrix values τ (which represent previ-ous search experience and reinforce certain structuralmotifs) and heuristic function values η (which reflect cur-rent energy of the considered structural motif); details ofthis process are given in the 'Methods' section. The relativeimportance of τ and η is determined by parameters α andβ, respectively, whose settings are detailed in the 'Discus-sion' section. Similar to other ACO algorithms knownfrom the literature, our algorithm for the HP ProteinFolding Problem incorporates a local search phase thattakes the initially built protein conformation andattempts to optimise its energy further, using probabilisticlong-range moves that are described in detail in the 'Meth-ods' section.Finally, the pheromone update procedure is based on twomechanisms: Uniform pheromone evaporation is mod-elled by decreasing all pheromone levels by a constant fac-tor ρ (where 0 <ρ ≤ 1), and pheromone reinforcement isachieved by increasing the pheromone levels associatedwith the local folding motifs used in a fraction of the bestconformations (in terms of energy values) obtained dur-ing the preceding construction and local search phase.Furthermore, to prevent search stagnation when all of thepheromone is accumulated on very few structural motifs,we introduce an additional renormalisation mechanismfor the pheromone levels (controlled by a parameter θwhere 0 ≤ θ < 1; details are given in the 'Methods' section).Different from our previous ACO algorithms for the HPProtein Folding Problem, our new algorithm, ACO-HPPFP-3, supports the 3D HP cubic lattice model in addi-tion to the 2D HP square lattice model. Furthermore, ituses a different iterative improvement strategy, a modifiedlong-range move operator and a less restrictive termina-tion criterion in its local search phase. ACO-HPPFP-3 wasused in all ACO experiments described in the following.ResultsTo compare ACO-HPPFP-3 with algorithms for the 2Dand 3D HP Protein Folding Problem described in the lit-erature, we tested it on a number of standard benchmarkinstances as well as on two newly created data sets, one ofwhich was obtained by randomly generating amino acidsequences with hydrophobicity value characteristic ofglobular proteins, while the other consists of biologicalsequences that were translated into HP strings using astandard hydrophobicity scale. (These new data sets willbe described in more detail later in this section.)Results for standard benchmark instancesThe 21 standard benchmark instances for 2D- and 3D-HPprotein folding shown in Table 1 have been widely usedin the literature [6,12,14-17,19,25]. Experiments on thesestandard benchmark instances were conducted by per-forming a number of independent runs for each probleminstance (in 2D: 500 runs for sequence length n ≤ 50, 100runs for 50 <n ≤ 64, and 20 runs for n >64; in 3D: 100 runsfor each sequence). Unless explicitly indicated otherwise,we used the following parameter settings for all experi-ments: α: = 1, β: = 2, ρ: = 0.8 and θ: = 0.05. Furthermore,all pheromone values were initialised to 1/3 in 2D and to1/5 in 3D, and a population of 100 ants was used, 50% ofwhich were allowed to perform local search. The localsearch procedure was terminated when no improvementin energy had been obtained after between 1 000 (for n ≤50) and 10 000 (for n > 50) scans through the proteinsequence. We used an elitist pheromone updating schemein which only the best 1% of all ants was allowed to per-form pheromone updates. The probability of keepingthe previous direction when feasible during the long-range mutation move was set to 0.5 (see 'Methods' sec-tion). These settings were determined in a series of exper-iments in which we studied the influence of differentparameter settings and will be further discussed later. Allexperiments were performed on PCs with 2.4 GHz Pen-tium IV CPUs, 256 Kb cache and 1 MB RAM, running Red-hat Linux (our reference machine), and run-time wasmeasured in terms of CPU time.Most studies of EA and MC methods in the literature,including [12,14,15,19], report the number of valid con-formations scanned during the search. This makes a per-formance comparison difficult, since run-time spent forbacktracking and the checking of partial or infeasible con-formations, which may vary substantially between differ-ent algorithms, is not accounted for. We thereforecompared ACO to the best-performing algorithm fromthe literature for which performance data in terms of CPUtime is available – PERM [18] (we used the most recentimplementation, which was kindly provided by P. Grass-berger). We note that the most efficient PERM variant forthe HP Protein Folding Problem uses an additional pen-alty of 0.2 for H-P contacts [34]. Since this corresponds toan energy function different from that of the standard HPmodel underlying our ACO algorithm as well as otheralgorithms developed in literature, we used the best per-forming variant of PERM [18] based on the standardenergy function in our experiments. It may be noted thatthe chain growth process in PERM can start from the N- orC-terminus of the given HP sequence, and in many cases,this results in substantial differences in the performanceof the algorithm. To capture this effect, we always ranpˆPage 5 of 22(page number not for citation purposes)PERM in both directions, and in addition to the respectiveaverage run-times, t1 and t2, we report the expected timeBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30for solving a given problem instance when performingboth runs concurrently, texp = 2·(1/t1 + 1/t2)-1. For all runsof PERM, the following parameter settings were used:inverse temperature γ: = 26 and q: = 0.2.The results obtained on standard 2D benchmark instances(see Table 2) indicate that ACO-HPPFP-3 is competitivewith the EA and MC methods described in the literature;it works very well on sequences of sizes up to 64 aminoacids and produces high quality suboptimal configura-tions for the longest sequences considered here (85 and100 amino acids). On average, ACO requires less CPUtime than PERM for finding best known conformationsfor Sequence S1-8; but PERM performs better forSequences S1-6 and S1-7 as well as for the longersequences of 85 to 100 residues (Sequence S1-9 to S1-11).Sequence S1-8 has a very symmetrical optimal state (seeFigure 4), which – as argued in [18] – would be difficultto find for any chain growing algorithm. All algorithmsis able to handle this instance quite well, since a numberof ants folding from different starting points in conjunc-tion with a local search procedure that involves large-scalemutations originating from different sequence positionscan produce good partial folds for various parts of thechain. In comparison with other algorithms for the 2D HPProtein Folding Problem considered here (EA, EMC,MSOE), ACO-HPPFP-3 generally shows very good per-formance on standard benchmark instances.In case of the 3D HP Protein Folding Problem (see Table3), the majority of algorithms for which we were able tofind performance results in the literature use heuristicsthat are highly specialised for this problem. Unlike HZ,CG and CI, ACO-HPPFP-3 finds optimal (or best known)solution qualities for all sequences. However, PERM(when folding from the N-terminus) and CHCC consist-ently outperform ACO-HPPFP-3 on these standard 3D HPbenchmark instances, and CG reaches best known solu-tion qualities substantially faster in many cases. We noteTable 1: 2D and 3D HP standard benchmark instances. Benchmark instances for the 2D and 3D HP Protein Folding Problem used in this study with optimal or best known energy values E*. Most instances for 2D and 3D HP can also be found in [44]; Sequence S1-9 (2D) is taken from [45], and the last two instances (2D) are from [21]. Hi and Pi indicate a string of i consecutive H's and P's, respectively; likewise, (s)i indicates an i-fold repetition of string s.ID Length E* Protein Sequence2D HPS1-1 20 -9 (HP)2PH2PHP2HPH2P2HPHS1-2 24 -9 H2(P2H)7HS1-3 25 -8 P2HP2(H2P4)3H2S1-4 36 -14 P3H2P2H2P5H7P2H2P4H2P2HP2S1-5 48 -23 P2H(P2H2)2P5H10P6(H2P2)2HP2H5S1-6 50 -21 H2(PH)3PH4PH(P3H)2P4H(P3H)2PHPH4(HP)3H2S1-7 60 -36 P2H3PH8P3H10PHP3H12P4H6PH2PHPS1-8 64 -42 H12(PH)2(P2H2)2P2HP2H2PPH2P2HP2(H2P2)2(HP)2H12S1-9 85 -53 H4P4H12P6(H12P3)3HP2(H2P2)2HPHS1-10 100 -50 P3H2P2H4P2H3(PH2)2PH4P8H6P2H6P9HPH2PH11P2H3PH2PHP2HPH3P6H3S1-11 100 -48 P6HPH2P5H3PH5PH2P4H2P2H2PH5PH10PH2PH7p11H7P2HPH3P6HPH23D HPS2-1 48 -32 HPH2P2H4PH3P2H2P2HPH2PHPH2P2H2P3HP8H2S2-2 48 -34 H4PH2PH5P2HP2H2P2HP6HP2HP3HP2H2P2H3PHS2-3 48 -34 PHPH2PH6P2HPHP2HPH2(PH)2P3H(P2H2)2P2HPHP2HPS2-4 48 -33 PHPH2P2HPH3P2H2PH2P3H5P2HPH2(PH)2P4HP2(HP)2S2-5 48 -32 P2HP3HPH4P2H4PH2PH3P2(HP)2HP2HP6H2PH2PHS2-6 48 -32 H3P3H2PH(PH2)3PHP7HPHP2HP3HP2H6PHS2-7 48 -32 PHP4HPH3PHPH4PH2PH2P3HPHP3H3(P2H2)2P3HS2-8 48 -31 PH2PH3PH4P2H3P6HPH2P2H2PHP3H2(PH)2PH2P3S2-9 48 -34 (PH)2P4(HP)2HP2HPH6P2H3PHP2HPH2P2HPH3P4HS2-10 48 -33 PH2P6H2P3H3PHP2HPH2(P2H)2P2H2P2H7P2H2Page 6 of 22(page number not for citation purposes)from the literature which we are aware of have problemsfolding this sequence; ACO-HPPFP-3, on the other hand,that for Sequence S2-3 and S2-7, PERM'S performance isgreatly dependent on the folding direction.BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30Result for new biological and random data setsTo thoroughly test the performance of ACO-HPPFP-3, wecreated two new data sets of random and biologicalsequences of length ≈ 30 and ≈ 50 amino acids (tensequences for each length; for details, see additional datafile 1). Random sequences were generated based on theobservation that most globular proteins have a fairly uni-form amino acid profile, and the percent of hydrophobicresidues of majority of globular proteins falls in the rangeof 40–50% [35]. Thus, the probability of generatingcharacter H at each position of a sequence was chosen tobe 0.45, and in the remaining cases (i.e., with probability0.55), we generated a P.For the biological test-sets, ten sequences were taken fromthe PDBSELECT data set with homology < 25% from theProtein Data Bank (PDB) in order to obtain a non-redun-dant representative set of proteins. These proteinsequences were translated into HP strings using the hydro-phobicity scale classification of RASMOL [36], accordingto which the following amino acids were consideredhydrophobic: Ala, Leu, Val, Ile, Pro, Phe, Met, Trp, Gly andTyr. Non-standard amino acid symbols, such as X and Z,were skipped in this translation.Figures 5 and 6 illustrate the performance of ACO-HPPFP-3 vs PERM in terms of mean CPU time over 10 runs perinstance and algorithm; for practical reasons, each runTable 2: Performance comparison of various algorithms for the 2D HP Protein Folding Problem. Comparison of the solution quality obtained in 2D by the evolutionary algorithm of Unger and Moult (EA) [14], the evolutionary Monte Carlo algorithm of Liang and Wong (EMC) [19], the multi-self-overlap ensemble algorithm of Chickenji et al. (MSOE) [17], the pruned-enriched Rosenbluth method (PERM) and ACO-HPPFP-3. For EA and EMC, the reported energy values are the lowest among five independent runs, and the values in parentheses are the numbers of valid conformations scanned before the lowest energy values were found. Missing entries indicate cases where the respective method has not been tested on a given instance. The CPU times reported in parentheses for MSOE have been determined on a 500 MHz CPU, and those for PERM and ACO-HPPFP-3 are based on 100 – 200 runs per instance on our reference 2.4 GHz Pentium IV machine. The energy values shown in bold face correspond to currently best known solution qualities.ID E GA EMC MSOE PERM t1 PERM t2 PERM texp ACOS1-1 -9 -9 (30 492) -9 (9 374) -9 (< 1 sec) -9 (< 1 sec) -9 (< 1 sec) -9 (< 1 sec)S1-2 -9 -9 (30 491) -9 (6 929) -9 (< 1 sec) -9 (< 1 sec) -9 (< 1 sec) -9 (< 1 sec)S1-3 -8 -8 (20 400) -8 (7 202) -8 (6 sec) -8 (< 1 sec) -8 (2 sec) -8 (< 1 sec)S1-4 -14 -14 (301 339) -14 (12 447) -14 (< 1 sec) -14 (< 1 sec) -14 (< 1 sec) -14 (4 sec)S1-5 -23 -23 (126 547) -23 (165 791) -23 (3 min) -23 (< 1 sec) -23 (2 sec) -23 (1 min)S1-6 -21 -21 (592 887) -21 (74 613) -21 (3 sec) -21 (3 sec) -21 (3 sec) -21 (15 sec)S1-7 -36 -34 (208 781) -35 (203 729) -36 (7 sec) -36 (3 sec) -36 (4 sec) -36 (20 min)S1-8 -42 -37 (187 393) -39 (564 809) -39 -42 (78 hrs) -42 (78 hrs) -42 (78 hrs) -42 (1.5 hrs)S1-9 -53 -52 (44 029) -53 (64 sec) -53 (60 sec) -53 (1 min) -53 (20% of runs 1 days)S1-10 -50 -50 (50 hrs) -50 (50% of runs 1 hrs) -50 (20 min) -50 -49 (12 hrs)S1-11 -48 -47 -48 (9 min) -48 (7 min) -48 (8 min) -47 (10 hrs)The 2D native state of the standard Sequence S1-8Figure 4The 2D native state of the standard Sequence S1-8. The native conformation of Sequence S1-8 from Table 1 (64 amino acids; energy -42), found by ACO-HPPFP-3 in an aver-age CPU time of 1.5 hours and by PERM in t1 = t2 = texp = 78 hours.Page 7 of 22(page number not for citation purposes)was restricted to 1 CPU hour on our reference machine,BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30and the lowest energies obtained in these runs (listed inadditional data file 1) are not necessarily optimal.As can be seen from these results, in 2D, ACO-HPPFP-3performs roughly comparably to PERM (PERM'S texp wascalculated as described in the previous subsection): ACO-HPPFP-3 reaches the same energies as PERM, but on someinstances, particularly of length 50, requires more run-time. In 3D, ACO-HPPFP-3 generally requires a compara-ble amount of run-time on sequences of length 30 andoutperforms PERM on one random sequences of length30, but performs noticeably worse on sequences of length50 and in some cases does not reach the same energy. Wealso generated longer sequences of length 75; for these,ACO-HPPFP-3 failed to reach the minimal energy valuesobtained by PERM in a number of cases. The run-times forboth algorithms are reported in detail in Additional file 1;we note that on some sequences, the performance ofPERM depends significantly on the direction of folding.Interestingly, there is no significant difference in perform-ance between the biological and random test-sets foreither PERM or ACO-HPPFP-3.In summary, the performance of ACO-HPPFP-3 is compa-rable with that of PERM (the best known algorithm for the2D and 3D HP Protein Folding Problem) on biologicaland random sequences of length 30–50, but worse onlonger sequences. This scaling effect is significantly morepronounced in 3D than in 2D. We note that neither ACO-HPPFP-3 nor PERM were optimised for short sequences (n≤ 30), but by using parameter settings different from theCharacteristic performance differences between ACO and PERMTo further investigate the conditions under which ACOperforms well compared to PERM, we visually examinednative conformations found by both algorithms, payingspecial attention to conformations for which one of thetwo algorithms does not perform well (see Figures 7 and9). Based on our observations, we hypothesised thatPERM usually performs well on sequences that have astructural nucleus in the native conformation at one of theends of the sequence (particularly the end from whichPERM starts folding the sequence); on the other hand, ithas trouble folding sequences whose nativeconformations have structural nuclei in the middle of thesequence. In comparison, ACO is not significantlyaffected by the location of the structural nucleus (or mul-tiple nuclei) in the sequence, since it uses constructionfrom different folding points as well as the long-rangemutation moves in local search, which can initiate refold-ing from arbitrary sequence positions. Here, we use theterm 'structural nucleus' to refer to a predominantlylocally folded part of the chain that can be relatively easilyfolded sequentially based on local sequence information[37]. For most sequences considered in this study, weobserved a single structural nucleus, which is not surpris-ing, given their relatively short length; however, it is gen-erally believed that longer sequences have multiplefolding nuclei [37].The left side of Figure 7 shows an example of a relativelyshort biological sequence (B50-7, 45 amino acids) with aTable 3: Performance comparison of various algorithms for the 3D HP Protein Folding Problem. Comparison of the solution quality obtained in 3D by the hydrophobic zipper (HZ) algorithm [23], the constraint-based hydrophobic core construction method (CHCC) [26], the core-directed chain growth algorithm (CG) [25], the contact interactions (CI) algorithm [24], the pruned-enriched Rosenbluth method (PERM) and ACO-HPPFP-3. For CI, only the best energies obtained are shown. For HZ, CHCC and CG, the reported CPU times are taken from [25]; these are the expected times for finding optimal solutions on a Sparc 1 workstation. In the case of HZ, the reported CPU times are based on an extrapolation from the measured times required for finding suboptimal conformations with the energy values listed here. The CPU times for PERM and ACO-HPPFP-3 were determined on our reference 2.4 GHz Pentium IV machine based on 50 – 100 runs per instance. The energy values shown in bold face correspond to currently best known solution qualities.ID E HZ CHCC CG CI PERM t1 PERM t2 PERM texp ACOS2-1 -32 -31(4 hrs) -32 (30 min) -32 (9.4 min) -32 -32 (0.1 min) -32 (0.5 min) -32 (0.2 min) -32 (30 min)S2-2 -34 -32 (18 hrs) -34 (2.3 min) -34 (35 min) -33 -34 (0.3 min) -34 (48 min) -34 (0.6 min) -34 (420 min)S2-3 -34 -31 (23 hrs) -34 (30 min) -34 (62 min) -32 -34 (0.1 min) -34 (4 days) -34 (0.2 min) -34 (120 min)S2-4 -33 -30 (19 days) -33 (71 min) -33 (29 min) -32 -33 (2 min) -33 (4 min) -33 (3 min) -33 (300 min)S2-5 -32 -30 (1.3 days) -32 (32 min) -32 (12 min) -32 -32 (0.5 min) -32 (19 min) -32 (1 min) -32 (15 min)S2-6 -32 -29 (2.1 days) -32 (80 min) -32 (460 min) -30 -32 (0.5 min) -32 (0.1 min) -32 (0.2 min) -32 (720 min)S2-7 -32 -29 (2.5 days) -32 (110 min) -32 (64 min) -30 -32 (0.5 min) -32 (2 days) -32 (1 min) -32 (720 min)S2-8 -31 -29 (4 hrs) -31 (530 min) -31 (38 min) -30 -31 (0.3 min) -31 (8 min) -31 (0.6 min) -31 (120 min)S2-9 -34 -31(4.5 hrs) -34 (8.3 min) -33 -32 -34 (5 min) -34 (10 min) -34 (7 min) -34 (450 min)S2-10 -33 -33 (1.1 hr) -33 (4.8 min) -33(1.1 min) -32 -33 (0.01 min) -33 (0.01 min) -33 (0.01 min) -33 (60 min)Page 8 of 22(page number not for citation purposes)ones specified earlier, the performance of both algorithmscan be significantly improved in this case.unique native hydrophobic core in the 2D HP model.(This is rare for HP sequences, which usually have a highBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30ground state and hydrophobic core degeneracy: Accordingto our observations, of the 11 standard benchmarkinstances in 2D, only Sequences S1-1, S1-3, S1-4 have aunique hydrophobic core; in 3D, none of the sequencesstudied here have a unique hydrophobic core.) Thissequence has no structural nuclei at its ends; instead, thetwo ends interact with each other. ACO-HPPFP-3 outper-forms PERM by a factor of 2 on this sequence in terms ofCPU time: using a cut-off time of 1 CPU hour per run,PERM found the optimum with energy -17 in an averagerun-time of 284.06 CPU seconds (t1 = 271 sec, t2 = 299sec), while using the same cut-off time and machine,ACO-HPPFP-3 found the optimum in an average run-time of 130 CPU seconds.We also designed two additional sequences, D-1 and D-2,of length 50 and 60, respectively, that have a uniquenative state in which both ends of the sequence interactperformance of PERM and ACO-HPPFP-3 on thesesequences, we found that on D-1, ACO-HPPFP-3 requiresa mean run-time of 236 CPU seconds, compared to t1 = 3795, t2 = 1, texp = 2 CPU seconds for PERM (values arebased on 100 successful runs). When this sequence wasreversed, PERM started folding the sequence from thestructural nucleus, and its mean run-time dropped to 1CPU second. A result similar to that for sequence B50-7was obtained for Sequence D-2, which has no structuralnuclei at the ends, but a native state in which the endsinteract with each other. Here, ACO-HPPFP-3 was foundto require a mean run-time of 951 CPU seconds (again,mean run-times were obtained from 100 successful runs),compared to t1 = 9 257, t2 = 19 356, texp = 12 525 CPU sec-onds for PERM; as expected, in this case, reversing thefolding order of the sequence did not cause a decrease inPERM'S run-time.Performance comparison of ACO-HPPFP-3 and PERM on biological and random instances in 2DFigure 5Performance comparison of ACO-HPPFP-3 and PERM on biological and random instances in 2D. Mean CPU time (natural log transformed) required by ACO-HPPFP-3 vs PERM for reaching the best solution quality, as observed over 10 runs with a cut-off time of 1 CPU hour for sequences of length 30 and 50 in 2D. The left and right plots show the results for the biological and random test-sets, respectively. Performance results for instances of size 30 are indicated by circles, while stars mark results for instances of size 50. The dashed lines indicate the band within which performance differences are not statistically significant. Mean run-times were obtained from 10 runs per instance and algorithm, and we only show data points for the runs where the best known solution quality was reached at least in some runs out of 10 by both algorithms (when unsuccessful runs were present, the expected time was calculated as in [43]); results for both successful and unsuccessful runs are given in the Additional file 1.-6-4-2 0 2 4 6 8-6 -4 -2 0 2 4 6 8ACO CPU time, log [sec]PERM CPU time, log [sec]-6-4-2 0 2 4 6 8-6 -4 -2 0 2 4 6 8ACO CPU time, log [sec]PERM CPU time, log [sec]Page 9 of 22(page number not for citation purposes)with each other (see Figure 8). Sequence D-1 also has astructural nucleus near its C-terminus. When testing theBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30We also analysed native conformations of sequences onwhich PERM outperforms ACO and observed that the endfrom which PERM starts folding is relatively compact andforms a structural nucleus in the resulting conformation.An example of a conformation with the structural nucleusat the beginning of the sequence (near the N-terminus,i.e., residue 1) is shown in the right panel of Figure 7. Forthis biological sequence (B50-5, 53 amino acids), PERMfinds an optimal conformation with an energy of -22 in t1= 5, t2 = 118, texp = 9 CPU seconds, while the average run-time for ACO-HPPFP-3 is 820 CPU seconds. Our ACOalgorithm generally performs worse than PERM onsequences that have structural nuclei at the ends, becauseit tends to spend substantial amounts of time compactinglocal regions in the interior of the sequence, while PERMfolds more systematically from one end. These observa-tions also hold in 3D, as seen from two random sequencesfolded in 3D (see Figure 9).found by ACO-HPPFP-3 and PERM, respectively. For thispurpose, we introduced the notion of relative H-H contactorder, which captures arrangement of H residues in thecore of the folded protein, and thus determines thetopology of the conformation (the closely related conceptof contact order was first defined in [38]). Relative H-Hcontact order is defined as follows:where l is the number of H-H contacts, n is the number ofH residues in the sequence, and i and j are interacting Hresidues that are not neighbours in the chain. Intuitively,COH-H specifies the average sequence separation betweenH-H residues in contact per H in the sequence.Figure 10 shows cumulative frequency distributions of rel-ative H-H contact order values for sets of native conforma-tions of a 2D (the left panel) and 3D (the right panel)Performance comparison of ACO-HPPFP-3 and PERM on biological and random instances in 3DFigure 6Performance comparison of ACO-HPPFP-3 and PERM on biological and random instances in 3D. Mean CPU time (natural log transformed) required by ACO-HPPFP-3 vs PERM for reaching the best solution quality, as observed over 10 runs with a cut-off time of 1 CPU hour for sequences of length 30 and 50 in 3D. The left and right plots show the results for the biological and random test-sets, respectively. Performance results for instances of size 30 are indicated by circles, while stars mark results for instances of size 50. Mean run-times were obtained from 10 runs per instance and algorithm, and we only show data points for the runs where the best known solution quality was reached at least in some runs out of 10 by both algorithms (when unsuccessful runs were present, the expected time was calculated as in [43]); results for both successful and unsuccessful runs are given in the Additional file 1.-4-2 0 2 4 6 8-4 -2 0 2 4 6 8ACO CPU time, log [sec]PERM CPU time, log [sec]-4-2 0 2 4 6 8-4 -2 0 2 4 6 8ACO CPU time, log [sec]PERM CPU time, log [sec]COl ni jH Hi j−< −=⋅− ( )∑: ,1 11Page 10 of 22(page number not for citation purposes)To further investigate our hypothesis, we studied differ-ences between the distributions of native conformationsstandard benchmark instance, respectively, found byACO-HPPFP-3 and PERM over 500 independent runs,BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30each of which was terminated as soon as a native confor-mation had been found. These results show that the ACOalgorithm finds a set of native conformations with a widerrange of H-H contact order values than PERM; in particu-lar, ACO-HPPFP-3 finds conformations with high relativeH-H contact oder as compared to PERM (more distantparts of the chain interact; for example, relative COH-H =0.324 for Sequence S1-7 in 2D and relative COH-H = 0.75for Sequence S2-5 in 3D are not found by PERM; similarresults were obtained for other sequences), which furthersupports our hypothesis that both, in 2D and 3D, PERMis biased toward a more restricted set of native conforma-tions. We performed analogous experiments for the casewhere PERM is allowed to keep certain statistics from onerun to another as in [18] (runs are no longer independent)and found no significant differences in the set of confor-mations obtained (data not shown).To further examine the topological differences betweenensembles of native conformations found by the twoalgorithms, we also looked at the hydrophobic solventaccessible area (defined as SAH-H: = ∑hEh, where Eh is thenumber of unoccupied lattice sites around H residue h),from the N-terminus of the sequence – where PERM startsfolding). In this analysis, we calculated the properties ofinterest mentioned above for the native conformationsfound in 100 independent runs by ACO-HPPFP-3 andPERM, and plotted the mean values of the respectivequantities as functions of the sequence prefix length (seeFigures 11, 12 and 13).As seen in Figure 11, ACO-HPPFP-3 is less greedy thanPERM, both in 2D (left side) and in 3D (right side), and ittends to leave more lattice sites around H residues accessi-ble for future contacts with other H residues that appearlater in the chain. This is also reflected in the meannumber of H-H contacts formed when folding prefixes ofincreasing length; ACO-HPPFP-3 tends to form fewer H-Hcontacts than PERM for short and medium size prefixes(see Figure 12). By examining the dependence of absoluteH-H contact order (defined as , the aver-age sequence separation per H-contact) on prefix length,we furthermore observed that different from PERM, ACO-HPPFP-3 realises the bulk of its local H-H interactions inthe middle of the given sequence (see Figure 13). This fur-Illustration and comparison of difficult structures for PERM and ACO-HPPFP-3 in 2DFigu e 7Illustration and comparison of difficult structures for PERM and ACO-HPPFP-3 in 2D. Left side: Lowest energy conformation of a biological sequence (B50-7, 45 amino acids, energy -17) that is harder for PERM (t1 = 271, t2 = 299, texp = 284 CPU seconds) than for ACO-HPPFP-3 (texp = 130 CPU seconds; cut-off time 1 CPU hour). Right side: Lowest energy confor-mation of a biological sequence (B50-5, 53 amino acids, energy -22) that is much harder for ACO-HPPFP-3 than for PERM; within a cut-off time of 1 CPU hour, both ACO-HPPFP-3 and PERM reached this energy in 10 out of 10 runs in tavg = 820 and t1 = 5, t2 = 118, texp = 9 CPU seconds on average, respectively.11li ji j−< −∑Page 11 of 22(page number not for citation purposes)the number of H-H contacts, and the H-H contact order asa function of the length of the sequence prefix (startingther confirms that ACO is capable of finding native con-formations with structural folding nuclei that are notBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30located at or near the end of a given protein sequence. Theresults illustrated in Figures 11, 12 and 13 are typical forall 2D and 3D HP instances we studied.DiscussionAlthough conceptually rather simple, our ACO algorithmis based on a number of distinct components and mecha-nisms. A natural question to ask is whether and to whichextent each of these contributes to the performancereported in the previous section. A closely relatedquestions concerns the impact of parameter settings onthe performance of ACO-HPPFP-3; further details con-cerning parameters can be found in the 'Methods' section.To address these questions, we conducted several series ofexperiments. In this context, we primarily used threestandard test sequences: Sequence S1-7 of length 60 andSequence S1-8 of length 64 (long sequences) in 2D, aswell as Sequence S2-5 of length 48 in 3D (all standardbenchmark sequences for 3D are 48 amino acids inlength); these sequences were chosen because the CPUtime required to find the best known solutions was suffi-Following the methodology of Hoos and Stützle [39], wemeasured run-time distributions (RTDs) of our ACO algo-rithm, which represent the (empirical) probabilitydistribution over the run-time required to reach (orexceed) a given solution quality; the solution qualitiesused here are the known optima or best known energiesfor the respective sequences.Pheromone values and heuristic informationTwo important components of any ACO algorithm are theheuristic function, which indicates the desirability ofusing particular solution components during the con-struction phase, and the pheromone values, which repre-sent information learned over multiple iterations of thealgorithm. Three parameters control the influence of thepheromone information versus heuristic information onthe construction of candidate solutions: the relativeweight of the pheromone information, α; the relativeweight of the heuristic information, β; and the pherom-one persistence, ρ (see also 'Methods' section).Performance of ACO-HPPFP-3 and PERM on designed sequences in 2D HPFigure 8Performance of ACO-HPPFP-3 and PERM on designed sequences in 2D HP. Left side: Unique minimal energy con-formation of a designed sequence, D-1 (length 50, energy -19); ACO-HPPFP-3 reaches this conformation much faster than PERM when folding from the left end (mean run-time over 100 successful runs for ACO-HPPFP-3: 236 CPU seconds, com-pared to t1 = 3 795, t2 = 1, texp = 2 CPU seconds for PERM). Right side: Unique native conformation of another designed sequence, D-2 (length 60, energy -17); ACO-HPPFP-3 finds this conformation much faster than PERM folding from either end (mean run-time over 100 successful runs for ACO-HPPFP-3: 951 CPU seconds, compared to t1 = 9 257, t2 = 19 356, texp = 12 524 CPU seconds for PERM).Page 12 of 22(page number not for citation purposes)ciently small to perform a large number of runs (100–200per instance).In the first experiment, we investigated the impact of phe-romone (α) and heuristic information (β), and their rela-BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30Illustration and comparison of difficult structures for PERM and ACO-HPPFP-3 in 3DFigu e 9Illustration and comparison of difficult structures for PERM and ACO-HPPFP-3 in 3D. Left side: Lowest energy conformation of random sequence R50-9 (50 amino acids, energy -30), which is harder for PERM when folding from the left end than for ACO-HPPFP-3; with a cut-off time of 1 CPU hour, ACO-HPPFP-3 reached this energy in 10 out of 10 runs with texp = 1000 CPU seconds, while PERM failed to find a conformation with this energy in 7 out of 10 runs when folding from the left end (t1 = 9 892, t2 = 2, texp = 3 CPU seconds). Right side: Lowest energy conformation of random sequence R50-7 (50 amino acids, energy -38), which is much harder for ACO-HPPFP-3 than for PERM; with a cut-off time of 1 CPU hour, PERM reached this energy in two out of 10 runs when folding from the left and in 10 of 10 runs when folding from the right end in t1 = 15 322, t2 = 46, texp = 92 CPU seconds, while the lowest energy reached by ACO-HPPFP-3 over ten runs was -37.Comparison of distributions of H-H contact order of native astructures found by ACO-HPPFP-3 and PERM in 2D and 3DFigure 10Comparison of distributions of H-H contact order of native structures found by ACO-HPPFP-3 and PERM in 2D and 3D. Distributions of H-H contact order for 500 conformations of Sequence S1-7 from Table 1 (60 amino acids) in 2D (left side) and Sequence S1-5 from Table 1 (48 amino acids) in 3D (right side) found by ACO-HPPFP-3 and PERM. 0 50 100 150 200 250 300 350 400 450 500 0.2 0.22 0.24 0.26 0.28 0.3 0.32 0.34Frequency(COHH)COHHACOPERM 0 50 100 150 200 250 300 350 400 450 500 0.45 0.5 0.55 0.6 0.65 0.7 0.75Frequency(COHH)COHHACOPERMPage 13 of 22(page number not for citation purposes)BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30tive importance for the performance of our ACOalgorithm. As can be seen from the results shown in Figure14, both the pheromone values and the heuristic informa-tion are important in 2D and 3D; when ignoring either ofthem (α: = 0 or β: = 0, respectively), the algorithm per-forms worse, particularly for longer 2D sequences (n > 50;for short 2D sequences with n ≤ 50, the pheromone matrixdoes not appear to play a significant role, since sequencesare generally easily solved by the subsidiary local searchprocedure alone). The optimal settings for α and β formost problem instances seem to be around α = 1 and β =2, as shown in Figure 14. It should be noted that in 3D,pheromone information appears to be less importantthan in 2D, which suggests that the specific solutioncomponents used in our algorithms are somewhat lessmeaningful in 3D.The goal of the next experiment was to further explore therole of experience accumulated over previous iterations inthe form of pheromone values. To this end, we varied thepheromone persistence, ρ, while keeping other parame-choose ρ > 0), but also to weaken its impact over time(i.e., to use ρ < 1). At the same time, closer examinationrevealed that for ρ > 0, attrition, i.e., the construction ofinextensible partial conformations, is a major problem,which is a result of the accumulation of pheromone frommultiple conformations. This is why the backtrackingmechanism described in the 'Methods' section isextremely important for the performance of our algorithmin 2D. In 3D, for the previously stated reasons andbecause of the fact that the attrition problem is much lesssevere, the impact of the persistence parameter is generallysmaller than in 2D.Ant colony size and length of local search phaseDuring the initial empirical evaluation of our algorithm,we observed that ant colony size (i.e., the number of antsused in each iteration) and the duration of local search(expressed as a number of non-improving search steps weare willing to consider before terminating the local searchprocedure) are correlated and significantly affect its per-formance. To further investigate this phenomenon, wePlot of mean hydrophobic solvent accessible area, ACO-HPPFP-3 vs PERM in 2D and 3DFigure 11Plot of mean hydrophobic solvent accessible area, ACO-HPPFP-3 vs PERM in 2D and 3D. Mean hydrophobic sol-vent accessible area as a function of prefix length for a biological sequence (B50-4, 50 amino acids) in 2D (left side) and Sequence S2-6 from Table 1 (48 amino acids) in 3D. Crosses and circles represent mean values for an ensemble of 100 native structures found by ACO-HPPFP-3 and PERM, respectively.Page 14 of 22(page number not for citation purposes)ters constant. The results shown in Figure 15 show that in2D, it is important to utilise past experience (i.e., toconducted additional experiments in which we fixed theant colony size and varied the maximal number of non-BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30improving steps during local search, and vice versa. In thisseries of experiments, different colony sizes were consid-ered, from a single ant up to a population of 5 000 ants,and the number of non-improving steps in local searchwas varied from 100 to 10 000. The results, shown in Fig-ure 16, indicate that there is an optimal colony size ofabout 100 ants for both, 2D and 3D; ACO-HPPFP-3 isquite robust with respect to colony size, but performancedecreases for very small or very large colony sizes.Intuitively, this is the case because the use of a populationof ants provides diversification to the search process,which enables it to explore different regions of the under-lying search space; very small populations provideinsufficient diversification, and the search stagnates easily,while for very large populations, the additional timerequired for running the search phases for each ant on thesame sequential machine is not amortised any longer byincreased efficiency of the overall search process.Our results also indicate that the performance of ACO-HPPFP-3 is more sensitive to the number of non-improv-ing steps than to ant colony size. The optimal value for thebe around 1 000 for short 2D sequences (n ≤ 50) andaround 10 000 for long 2D sequences (n > 50); the lattervalue also appeared to be optimal for all 3D sequencesconsidered here. This observation follows the intuitionthat more degrees of freedom, as present for longersequences and in higher dimensions, require more timefor local optimisation, since for any conformation,improving neighbours tend to be rarer and hence harderto find.Selectivity and persistence of local searchAs described in the 'Methods' section, our ACO algorithmuses selective local search, i.e., local search is onlyperformed on a certain fraction of the lowest energyconformations. We observed that ACO-HPPFP-3 is fairlyrobust with respect to the fraction of conformations towhich local search is applied; good performance wasobtained for local search selectivity values between 5%and 50%, but performance was found to deteriorate whenlocal search is performed by all ants. Intuitively, similar tocolony size, local search selectivity has an impact onsearch diversification. If too few ants perform local search,Plot of mean number of H-H contacts, ACO-HPPFP-3 vs PERM in 2D and 3DFigure 12Plot of mean number of H-H contacts, ACO-HPPFP-3 vs PERM in 2D and 3D. Mean number of H-H contacts as a function of prefix length for a biological sequence (B50-4, 50 amino acids) in 2D (left side) and Sequence S2-6 from Table 1 (48 amino acids) in 3D. Crosses and circles represent mean values for an ensemble of 100 native structures found by ACO-HPPFP-3 and PERM, respectively.Page 15 of 22(page number not for citation purposes)maximum number of non-improving steps tolerated (perant) before the local search phase terminates was found toinsufficient diversification is achieved, which typicallyleads to premature stagnation of the search process. OnBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30Plot of mean H-H contact order, ACO-HPPFP-3 vs PERM in 2D and 3DFigure 13Plot of mean H-H contact order, ACO-HPPFP-3 vs PERM in 2D and 3D. Mean H-H contact order as a function of prefix length for a biological sequence (B50-4, 50 amino acids) in 2D (left side) and Sequence S2-6 from Table 1 (48 amino acids) in 3D. Crosses and circles represent mean values for an ensemble of 100 native structures found by ACO-HPPFP-3 and PERM, respectively.Impact of parameter settings on ACO-HPPFP-3 performance in 2D and 3D: relative weights of pheromone and heuristic informationFigure 14Impact of parameter settings on ACO-HPPFP-3 performance in 2D and 3D: relative weights of pheromone and heuristic information. Effect of the relative weights of pheromone information, α, and heuristic information, β, on the average CPU time required for obtaining minimal energy conformations of Sequence S1-8 in 2D (length 64, left side) and Sequence S2-5 in 3D (length 48, right side). 0 0.2 0.4 0.6 0.8 1 10 100 1000 10000 100000P(solve)CPU time,sec α = 1, β = 2α = 1, β = 0α = 0, β = 1 0 0.2 0.4 0.6 0.8 1 1 10 100 1000 10000 100000P(solve)CPU time,sec α = 1, β = 2α = 1, β = 0α = 0, β = 1Page 16 of 22(page number not for citation purposes)BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30the other hand, if local search is performed by too manyants, the resulting substantial overhead in run-time canno longer be amortised by increased search efficiency.Similarly to selective local search, pheromone update wasperformed only by a certain fraction of so-called 'elitistsearch selectivity, ACO-HPPFP-3 shows robustly high per-formance for elitist fractions between 1% and 50%(results are not shown here), but performance deterioratesmarkedly when all ants in the colony are allowed toupdate the pheromone matrix.Impact of parameter settings on ACO-HPPFP-3 performance in 2D and 3D: pheromone persistenceFigure 15Impact of parameter settings on ACO-HPPFP-3 performance in 2D and 3D: pheromone persistence. Effect of the pheromone persistence parameter, ρ, on the average CPU time required for obtaining minimal energy conformations of Sequence S1-8 in 2D (length 64, left side) and Sequence S2-5 in 3D (length 48, right side).Parameter settings influence on ACO-HPPFP-3 performance in 2D and 3D: ant colony size and maximum number of non-improving local search stepsFigur 16Parameter settings influence on ACO-HPPFP-3 performance in 2D and 3D: ant colony size and maximum number of non-improving local search steps. Mean CPU time required for finding minimum energy conformations of Sequence S1-7 in 2D (length 60, left side) and Sequence S2-5 in 3D (length 48, right side), as a function of ant colony size and the maximum number of non-improving local search steps. 0 0.2 0.4 0.6 0.8 1 10 100 1000 10000 100000P(solve)CPU time,sec ρ = 0.5ρ = 0.8ρ = 1ρ = 0 0 0.2 0.4 0.6 0.8 1 1 10 100 1000 10000P(solve)CPU time,sec ρ = 0.8ρ = 0.5ρ = 1ρ = 0 1000 10000 2000 4000 6000 8000 10000 12000 14000mean CPU time, secnoImpr steps ants = 5000ants = 1ants = 1000ants = 100 10 100 1000 100 1000 10000mean CPU time, secnoImpr steps ants = 5000ants = 1ants = 10ants = 1000ants = 100Page 17 of 22(page number not for citation purposes)ants' whose solution quality after the local search phase ishighest within the population. As in the case of localIn a final experiment, we studied the impact of the persist-ence of local search, i.e., of the probability of retainingpˆBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30(feasible) previous relative directions during long-rangemutation moves. As can be seen in Figure 17, good per-formance is generally obtained for values between 0.3and 0.7. Both extreme cases, = 0, which corresponds toan extremely H-contact greedy mutation operator, and = 1, in which refolding always follows previous directionswhen feasible, result in a substantial decrease in perform-ance. When = 0, the decrease of performance in 3D isformation of very compact partial conformations, whichoften cannot be extended into valid complete conforma-tions. The performance decrease for high values is dueto insufficient ability of the chain to fold into a newconformation that accommodates well the local change instructure which triggered the refolding.ConclusionsIn this work, we have shown that ant colony optimisation(ACO) can be applied in a rather straight-forward way tothe 2D and 3D HP Protein Folding Problems. Eventhough our ACO-HPPFP-3 algorithm is based on verysimple structure components (single relative directions)and a simple subsidiary local search procedure (iterativefirst improvement), it performs fairly well compared toother algorithms and specialised heuristics on the bench-mark instances considered here, particularly in 2D. Theonly non-specialised algorithm that typically performsbetter than our ACO algorithm, both in 2D and 3D, isPERM. We observed that, particularly in 3D, the run-timerequired by ACO-HPPFP-3 for finding minimum (or bestknown) energy conformations scales worse with sequencelength than PERM. However, our results show that ourACO algorithm finds a different ensemble of nativeconformations compared to PERM, and has less difficultyfolding sequences whose native states contain structuralnuclei located in the middle rather than at the ends of agiven sequence, as well as sequences with structures inwhich the ends interact. We found that two major compo-Parameter settings influence on ACO-HPPFP-3 performance in 2D and 3D: probability of retaining previous directions in local searchFigur 17Parameter settings influence on ACO-HPPFP-3 performance in 2D and 3D: probability of retaining previous directions in local search. Mean CPU time required for finding minimum energy conformations of Sequence S1-8 in 2D (length 64, left side) and Sequence S2-5 in 3D (length 48, right side), as a function of the probability of retaining previous direc-tions ( ) during long-range mutation moves. 1000 10000 100000 0 0.2 0.4 0.6 0.8 1mean CPU time, secprobability of previous direction 1000 10000 100000 0 0.2 0.4 0.6 0.8 1mean CPU time, secprobability of previous direction pˆOutline of the subsidiary local search procedureFigur 18Outline of the subsidiary local search procedure. The iterative first improvement local search procedure that is performed by selected ants after the construction phase.procedure IterativeImprovementLS(c)input: candidate conformation coutput: candidate conformation c′while (termination condition not satisfied) doi := random({1, . . . , n});c′ := longRangeMove(c, i);if E(c′) ≤ E(c) thenc := c′;endendreturn(c)endpˆpˆpˆpˆpˆPage 18 of 22(page number not for citation purposes)smaller than in 2D, because there is no severe attrition asin 2D, where greedy placement of H residues leads to earlynents of ACO-HPPFP-3 – the pheromone values, whichcapture experience accumulated over multiple iterationsBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30of the search process and from multiple conformations, aswell as the heuristic information that provides myopicguidance to the folding process – play a significant role forlonger 2D sequences and, to a lesser extent, for 3Dsequences, which suggests that in 3D, it may be preferableto associate pheromone values with more complex solu-tion components.We also found that the subsidiary local search procedureis crucial for the performance of the algorithm; in particu-lar, to ensure that high-quality conformations areobtained, it is very important to allow the local searchprocedure to run sufficiently long. In an earlier version ofour algorithm [7], we used substantially more stringenttermination criteria, which forced us to additionally usenon-greedy local search (probabilistic iterativeimprovement, which accepts worsening steps) in additionto the greedy local search procedure used here. The resultspresented in this study indicate that by using a new andsimpler local search procedure, ACO-HPPFP-3 achievesbetter performance; this is probably due to the fact thatthe new local search procedure is based on a type of long-range move that leads to a larger effective searchneighbourhood.We have shown that all components of our ACO algo-rithms contribute to its performance. In particular, itsperformance is affected by the following components andparameters (listed in the order of decreasing impact): phe-romone values, termination criterion for local search, per-sistence of long-range moves, ant colony size, pheromonepersistence, heuristic function, selectivity of local search,and selectivity of pheromone update (i.e., fraction of elit-ist ants).Because of its ability to find more balanced ensembles ofminimum (or close to minimum) energy conformations,our new ACO algorithm can greatly facilitateinvestigations of the topology and location of structuralnuclei, which we plan to undertake in future work.Finally, while HP protein folding problems are ofconsiderable interest because of their conceptual simplic-ity, ultimately, most applications of protein folding algo-rithms require the use of more realistic models of proteinstructure. Since it does not rely on heuristics and proper-ties that are specific to the HP model and yet performsvery well on this restrictive, but not entirely unrealisticabstract model, we believe that relatively straight-forwardextensions of our ACO algorithm to more complex andrealistic models of protein structure hold significantpromise.Methodsuntil a termination condition is satisfied; in the context ofthis work, we mostly terminated the algorithm uponreaching a given energy threshold. In the following, wedescribe the three search phases in detail.Construction phase, pheromone and heuristic valuesDuring the construction phase of ACO-HPPFP-3, each antfirst determines a starting point within the given proteinsequence; this is done by uniform random choice. Fromthis starting point, the sequence is folded in bothdirections, adding one residue at a time. Each antperforms probabilistic chain-growth construction of theprotein conformation, where in every step, the structure isextended either to the left or to the right, such that theratio of unfolded residues at each end of the proteinremains (roughly) unchanged.Here, we assume that folding is performed in 3D (the 2Dcase is handled analogously by considering three relativedirections {S, L, R} instead of five {S, L, R, U, D}, see also[6]). The relative directions in which the conformation isextended in each construction step are determinedprobabilistically based on a heuristic function ηi,d andpheromone values τi,d, according to the formula:The pheromone values τi,d indicate the desirability ofusing the local structure motif with relative direction d ∈{S, L, R, U, D} at sequence position i. Initially, all τi,d areequal, such that local structure motifs are chosen in anunbiased way; but throughout the search process, thepheromone values are updated to bias folding towards theuse of local motifs that occur in low-energy structures (theupdating mechanism will be described in more detaillater). The heuristic values ηi,d are based on the energyfunction E. They are defined according to the Boltzmandistribution as ηi,d: = , where γ is a parametercalled the inverse temperature (as in [18]), and hi,d is thenumber of new H-H contacts achieved by placing aminoacid i at the position specified by direction d.During construction, it may happen that the chain cannotbe extended without running into itself. This situation iscalled attrition, and our algorithm overcomes it as follows:First, starting at the end at which attrition occurred, half ofthe sequence that has been folded up to this point isunfolded. Then, this segment of the chain is refolded; thefirst residue (i.e., the last one that was unfolded) is placedsuch that its relative direction differs from what it hadpi di d i di ee S L R U D i e,, ,,, , , , ,:= (∈{ }∑τ ητ ηα βα β 2 )e hi d− ⋅γ ,Page 19 of 22(page number not for citation purposes)Our new ACO algorithm, ACO-HPPFP-3, iterates con-struction, local search, and pheromone update phasesbeen when attrition occurred, while all of the subsequentresidues are folded in a feasible direction that is chosenBMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30uniformly at random. This backtracking mechanism isparticularly important for longer protein sequences in 2D,where infeasible conformations are frequently encoun-tered during the construction phase.Local searchThe local search phase is based on a long-range mutationmove that has been designed to avoid infeasibleconformations. It also has a number of important advan-tages over the more commonly used point mutationmoves or Monte Carlo moves (i.e., the end, crankshaftand corner moves [40]): It is easy to implement; itdecreases the number of infeasible conformationsencountered, even when the protein is very compact (athigh densities); it considers a larger neighbourhood thatsubsumes the single point mutation neighbourhood; andit has some validity in terms of the physical processes tak-ing place during the protein folding process. Similarattempts have been previously undertaken, but theseinvolved disconnection of the chain [21].From studies of protein folding dynamics, it is known thatproteins display a broad range of motions that range fromlocalised motions to slow large-scale movements [37].Inspired by this complex process, we designed a long-range mutation move that starts by selecting a residuewhose relative direction is randomly mutated and thenadapts the rest of the chain by probabilistically changingrelative directions starting from this initial position [7].During this adaptation, for each residue, with a probabil-ity (0 ≤ ≤ 1) its previous relative direction, if it is stillfeasible, is left unchanged, and otherwise (i.e., with prob-ability 1 - , or if the previous direction has becomeinfeasible), a different relative direction is chosen, wherethe probability for each direction d is proportional to thecorresponding heuristic value ηi,d. Formally, this can bewritten as follows:where P[di : = ] is the probability of choosing direction as the relative direction di at sequence position i. Unlikein our previous implementation [7], the local searchphase of our new ACO algorithm is a simple iterative firstimprovement procedure that is based on this long-rangemutation move. The outline of this local search procedureis shown in Figure 18. Iterative first improvement acceptsimproves over the current solution quality (energy) of c.This search process is greedy in the sense that it does notallow worsening steps, and it is terminated when noimproving steps have been found after a specific numberof scans through the chain (this number is a parameter ofthe algorithm). Since this local search procedure has a rel-atively high time-complexity, in each iteration of ACO-HPPFP-3 it is only applied to a certain fraction of thehighest-quality conformations constructed by the ants inthe preceding construction phase.Update of the pheromone valuesAfter each construction and local search phase pherom-ones are updated according toτi,d:= ρ·τi,d, (4)where 0 <ρ ≤ 1 is the pheromone persistence, a parameterthat determines how much of the information gathered inprevious iterations is retained. Subsequently, selected antswith low-energy conformations update the pheromonevalues according toτi,d:= τi,d + ∆i,d,c, (5)where ∆i,d,c is the relative solution quality of the given ant'scandidate conformation c if that conformation containslocal structure motif d at sequence position i, and zerootherwise.As a further mechanism for preventing search stagnation,we use an additional renormalisation of the pheromonevalues that is conceptually similar to the method used inMAX - MIN Ant System [41]: After the standard pherom-one updates according to Equations 3 and 4, all τ valuesare normalised such that ∑d∈{S,L,R,U,D} τi,d = 1 for every res-idue i; additionally, whenever for a given sequence posi-tion i the minimal normalised pheromone value(mind∈{S,L,R,U,D} τi,d)/(∑d∈{S,L,R,U,Dr} τi,d) falls below athreshold θ (which is a parameter of the algorithm), theminimal τi,d value is set to θ, while the maximal τi,d valueis decreased by θ - mind∈{S,L,R,U,D} τi,d. (If there is morethan one minimal τi,d value, all of these are increased to θ,and if there is more than one maximal τi,d value, one ofthem is chosen uniformly at random.) This guaranteesthat the probability of selecting an arbitrary local structuremotif for the corresponding sequence position does notbecome arbitrarily small, and hence ensures theprobabilistic approximate completeness of our algorithm(see [42]).Implementation details and availabilityACO-HPPFP-3 has been implemented in C++ and com-pˆ pˆpˆP d dp d ds diprevprevi di ee: :,,,= ==( )∈if andfeasibleηηS L R U Dprevprevd ds d, , , , , ,{ }∑≠( )(if orinfeasible3 )dˆdˆPage 20 of 22(page number not for citation purposes)a new conformation generated via long-range mutationonly if the solution quality of a new conformation c'piled using gcc (version 3.3.3) for the Linux operating sys-BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/30tem; a Linux executable is available from http://www.cs.ubc.ca/labs/beta/Projects/ACO-HPPFP.Authors' contributionsBoth authors contributed to the development of ideas,design of experiments, analysis and interpretation ofresults, and the writing of the paper. AS implemented theproposed method and performed the computationalexperiments.Additional materialAcknowledgementsThis work has been supported by an NSERC Postgraduate Scholarship (PGS-A) held by AS and by HH's NSERC Individual Research Grant #238788. We thank Peter Grassberger for kindly providing us with his implementation of PERM and for very useful feedback on earlier versions of this paper. We also thank the anonymous reviewers for helpful suggestions.References1. Dorigo M, Maniezzo V, Colorni A: Positive feedback as a searchstrategy. Tech rep., 91-016, Dip Elettronica, Politecnico di Milano, Italy1991.2. Dorigo M, Maniezzo V, Colorni A: The Ant System: Optimiza-tion by a colony of cooperating agents. IEEE Transactions on Sys-tems, Man, and Cybernetics-Part B 1996, 26:29-41.3. Dorigo M, Di Caro G: New Ideas in Optimization. In New Ideasin Optimization Edited by: Corne D, Dorigo M, Glover F. McGraw-Hill;1999. 4. Dorigo M, Di Caro G, Gambardella LM: Ant Algorithms for Dis-crete Optimization. Artificial Life 1999, 5(2):137-172.5. Dorigo M, Stützle T: Ant Colony Optimization The MIT Press; 2004. 6. Shmygelska A, Hernandez R, Hoos HH: An Ant Colony Optimiza-tion Algorithm for the 2D HP Protein Folding Problem. InProc of the 3rd Intl Workshop on Ant Algorithms, ANTS LNCS 2463Springer Verlag; 2002:40-52. 7. Shmygelska A, Hoos HH: An Improved Ant Colony Optimisa-tion Algorithm for the 2D HP Protein Folding Problem. InProc of the 16th Canadian Conference on Artificial Intelligence, LNCS 2671Springer Verlag; 2003:400-17. 8. Unger R, Moult J: Finding the lowest Free-Energy Conforma-tion of a protein is an NP-hard problem – Proof andImplications. Bull Math Biol 1993, 55(6):1183-1198.9. Lau KF, Dill KA: lattice statistical mechanics model of the con-formation and sequence space of proteins. Macromolecules1989, 22:3986-3997.10. Richards FM: Areas, volumes, packing, and protein structures.Annu Rev Biophys Bioeng 1977, 6:151-176.11. Krasnogor N, Pelta D, Lopez PM, Mocciola P, de la Canal E: Geneticalgorithms for the protein folding problem: a critical view. InProc of Engineering of Intelligent Systems Edited by: Alpaydin C. ICSC12. Krasnogor N, Hart WE, Smith J, Pelta DA: Protein structure pre-diction with evolutionary algorithms. Proc of the Genetic and Evo-lutionary Computation conference 1999:1596-1601.13. Patton AWP, Goldman E: A standard GA approach to nativeprotein conformation prediction. In Proc of the 6th Intl ConfGenetic Algorithms Morgan Kaufmann Publishers; 1995:574-581. 14. Unger R, Moult J: Genetic algorithms for protein foldingsimulations. J of Mol Biol 1993, 231:75-81.15. Unger R, Moult J: A genetic algorithm for three dimensionalprotein folding simulations. In Proc of the 5th Intl Conf on GeneticAlgorithms Morgan Kaufmann Publishers; 1993:581-588. 16. Bastolla U, Fravenkron H, Gestner E, Grassberger P, Nadler W:Testing a New Monte Carlo algorithm for the protein foldingproblem. Proteins 1998, 32:52-66.17. Chikenji G, Kikuchi M, Iba Y: Multi-Self-Overlap Ensemble forprotein folding: ground state search and thermodynamics.Condensed Materials Archive 1999:27.18. Hsu HP, Mehra V, Nadler W, Grassberger P: Growth Algorithmfor Lattice Heteropolymers at Low Temperatures. J ChemPhys 2003, 118:444-51.19. Liang F, Wong WH: Evolutionary Monte Carlo for protein fold-ing simulations. J Chem Phys 2001, 115(7):3374-3380.20. O'Toole EM, Panagiotopoulos AZ: Monte Carlo simulation offolding transitions of simple model proteins using a chaingrowth algorithm. J Chem Phys 1992, 97(11):8644-8652.21. Ramakrishnan R, Ramachandran B, Pekny JF: A dynamic MonteCarlo algorithm for exploration of dense conformationalspaces in heteropolymers. J Chem Phys 1997, 106(6):2418-2424.22. Sali A, Shakhnovich E, Karplus M: How does a protein fold? Nature1994, 369:248-251.23. Dill KA, Fiebig KM, Chan HS: Cooperativity in Protein-FoldingKinetics. Proc Natl Acad Sci USA 1993, 90:1942-1946.24. Toma L, Toma S: Contact interactions method: A new algo-rithm for protein folding simulations. Protein Sci 1996,5:147-153.25. Beutler T, Dill K: A fast conformational search strategy forfinding low energy structures of model proteins. Protein Sci1996, 5:2037-2043.26. Yue K, Dill KA: Forces of Tertiary Structural Organization inGlobular Proteins. Proc Natl Acad Sci USA 1995, 92:146-150.27. Backofen R, Will S: A Constraint-Based Approach to StructurePrediction for Simplified Protein Models that OutperformsOther Existing Methods. Proc XIX Intl Conf on Logic Programming2003:49-71.28. Torrie GM, Valleau JP: Nonphysical sampling distributions inMC free energy estimation: Umbrella sampling. J Comput Phys1977, 23:187-199.29. Gront D, Kolinski A, Skolnick J: Comparison of three MonteCarlo conformational search strategies for a proteinlikehomopolymer model: Folding thermodynamics and identifi-cation of low-energy structures. J Chem Phys 2000,113(12):5065-5071.30. Mitsutake A, Sugita Y, Okamoto Y: Replica-exchange multica-nonical and multicanonical replica-exchange Monte Carlosimulations of peptides. I. Formulation and benchmark test.J Chem Phys 2003, 118(14):6664-6675.31. Berg BA, Neuhaus T: Multicanonical ensemble: A newapproach to simulate first-order phase transitions. Phys RevLett 1992, 68:9-12.32. Irbäck A: Dynamic-parameter algorithms for protein folding.In Monte Carlo Approach to Biopolymers and Protein Folding Edited by:Grassberger P, Barkema GT, Nadler W,. World Scientific, Singapore;1998:98-109. 33. Backofen R, Will S, Clote P: Algorithmic approach to quantify-ing the hydrophobic force contribution in protein folding.Proc of the 5th Pacific Symposium on Biocomputing 2000:92-103.34. Hsu HP, Mehra V, Nadler W, Grassberger P: Growth-based Opti-misation Algorithm for Lattice Heteropolymers. Phys Rev E2003, 68:021113-1-021113-4.35. Nandi T, B-Rao C, Ramachandran S: Comparative Genomicsusing Data Mining tools. J Bioscience 2002, 27:15-25.36. Sayle R, Milner-White EJ: RASMOL – Biomolecular Graphics forAll. Trends Biochem Sci 1995, 20(9):374-376.37. Creighton TE: Protein Folding W H Freeman and Company; 1992. Additional File 1Additional information on biological and randomly generated HP sequences. This file (in .pdf format) contains tables providing additional information on our new test sets of biological and randomly generated HP sequences and the results from our computational experiment with ACO and PERM.Click here for file[http://www.biomedcentral.com/content/supplementary/1471-2105-6-30-S1.pdf]Page 21 of 22(page number not for citation purposes)Academic Press; 1998:353-360. Publish with BioMed Central and every scientist can read your work free of charge"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."Sir Paul Nurse, Cancer Research UKYour research papers will be:available free of charge to the entire biomedical communitypeer reviewed and published immediately upon acceptancecited in PubMed and archived on PubMed Central BMC Bioinformatics 2005, 6:30 http://www.biomedcentral.com/1471-2105/6/3038. Plaxco KW, Simons KT, Baker D: Contact order, transition stateplacement and the refolding rates of single domainproteins.J Mol Biol 1998, 277:985-994.39. Hoos HH, Stützle T: On the empirical evaluation of Las Vegasalgorithms. In Proc of the 14th Conference on Uncertainty in ArtificialIntelligence Morgan Kaufmann Publishers; 1998:238-245. 40. Sali A, Shakhnovich E, Karplus M: Kinetics of protein folding – Alattice model study of the requirements for folding tothenative state. J Mol Biol 1994, 235:1614-1636.41. Stützle T, Hoos HH: MAX-MIN Ant System. Future GenerationComputer Systems 2000, 16(8):889-914.42. Hoos HH, Stützle T: Stochastic Local Search: Foundations andApplications Morgan Kaufmann Publishers / Elsevier; 2004. 43. Parkes A, Walser JP: Tuning Local Search for Satisfiability Test-ing. In Proc of the Applications of Artificial Intelligence Conf MIT Press;1996:356-362. 44. HP Benchmarks [http://www.cs.sandia.gov/tech_reports/compbio/tortilla-hp-benchmarks.html]45. Konig R, Dandekar T: Improving Genetic Algorithms for Pro-tein Folding simulations by systematic crossover. Biosystems1999, 50:17-25.yours — you keep the copyrightSubmit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.aspBioMedcentralPage 22 of 22(page number not for citation purposes)
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Faculty Research and Publications /
- An ant colony optimisation algorithm for the 2D and...
Open Collections
UBC Faculty Research and Publications
An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem Shmygelska, Alena; Hoos, Holger H Feb 14, 2005
pdf
Page Metadata
Item Metadata
Title | An ant colony optimisation algorithm for the 2D and 3D hydrophobic polar protein folding problem |
Creator |
Shmygelska, Alena Hoos, Holger H |
Publisher | BioMed Central |
Date Issued | 2005-02-14 |
Description | Background: The protein folding problem is a fundamental problems in computational molecular biology and biochemical physics. Various optimisation methods have been applied to formulations of the ab-initio folding problem that are based on reduced models of protein structure, including Monte Carlo methods, Evolutionary Algorithms, Tabu Search and hybrid approaches. In our work, we have introduced an ant colony optimisation (ACO) algorithm to address the non-deterministic polynomial-time hard (NP-hard) combinatorial problem of predicting a protein's conformation from its amino acid sequence under a widely studied, conceptually simple model – the 2-dimensional (2D) and 3-dimensional (3D) hydrophobic-polar (HP) model. Results: We present an improvement of our previous ACO algorithm for the 2D HP model and its extension to the 3D HP model. We show that this new algorithm, dubbed ACO-HPPFP-3, performs better than previous state-of-the-art algorithms on sequences whose native conformations do not contain structural nuclei (parts of the native fold that predominantly consist of local interactions) at the ends, but rather in the middle of the sequence, and that it generally finds a more diverse set of native conformations. Conclusions: The application of ACO to this bioinformatics problem compares favourably with specialised, state-of-the-art methods for the 2D and 3D HP protein folding problem; our empirical results indicate that our rather simple ACO algorithm scales worse with sequence length but usually finds a more diverse ensemble of native states. Therefore the development of ACO algorithms for more complex and realistic models of protein structure holds significant promise. |
Genre |
Article |
Type |
Text |
Language | eng |
Date Available | 2016-01-19 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution 4.0 International (CC BY 4.0) |
DOI | 10.14288/1.0223754 |
URI | http://hdl.handle.net/2429/56576 |
Affiliation |
Computer Science, Department of Science, Faculty of |
Citation | BMC Bioinformatics. 2005 Feb 14;6(1):30 |
Publisher DOI | 10.1186/1471-2105-6-30 |
Peer Review Status | Reviewed |
Scholarly Level | Faculty |
Copyright Holder | Shmygelska and Hoos. |
Rights URI | http://creativecommons.org/licenses/by/4.0/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 52383-12859_2004_Article_355.pdf [ 1.8MB ]
- Metadata
- JSON: 52383-1.0223754.json
- JSON-LD: 52383-1.0223754-ld.json
- RDF/XML (Pretty): 52383-1.0223754-rdf.xml
- RDF/JSON: 52383-1.0223754-rdf.json
- Turtle: 52383-1.0223754-turtle.txt
- N-Triples: 52383-1.0223754-rdf-ntriples.txt
- Original Record: 52383-1.0223754-source.json
- Full Text
- 52383-1.0223754-fulltext.txt
- Citation
- 52383-1.0223754.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
http://iiif.library.ubc.ca/presentation/dsp.52383.1-0223754/manifest