Implicit solvent models and free energies of solvation inthe context of protein foldingbyAlexander Michael CumberworthB.Sc., The University of British Columbia, 2012A THESIS SUBMITTED IN PARTIAL FULFILLMENTOF THE REQUIREMENTS FOR THE DEGREE OFMaster of ScienceinTHE FACULTY OF GRADUATE AND POSTDOCTORALSTUDIES(Genome Science and Technology)The University of British Columbia(Vancouver)April 2015c© Alexander Michael Cumberworth, 2015AbstractGiven the wide and fundamental roles proteins play in cells, as well as their po-tential medical and industrial applications, a detailed understanding of the rela-tionships between sequence, structure, dynamics, and function is of critical im-portance. Molecular models are required to solve this problem, as well as modelsof the associated conformational spaces. One of the most challenging aspects ofmodeling these vast ensembles is the computer power required to carry out therequisite simulations. Reduced solvent models, and particularly a class referred toas implicit solvent models, have been developed extensively; however, they makemany assumptions and approximations that are likely to affect accuracy. Here, sev-eral implicit solvent models commonly used for protein modeling are evaluated bycomparing the expected changes in free energies of solvation upon folding ∆∆Gsolvderived from micro–ms simulations of fast folding proteins to those given by theimplicit solvent models. In the majority of cases, there is a significant and sub-stantial difference between the ∆∆Gsolv values calculated from the two approaches,with the implicit solvent models excessively favouring the folded state over theunfolded state. This could only be remedied by selecting values for the model pa-rameters – the internal dielectric constant for the polar term and the surface tensioncoefficient for the apolar term – that were system specific or physically unrealistic.iiPrefaceI was the lead investigator of all work presented, and composed the manuscript withinput from my supervisor, Jo¨rg Gsponer. Jo¨rg Gsponer initially conceptualized theproject, and contributed ideas throughout the investigation. Jennifer M. Bui alsocontributed important ideas to the investigation.A modified version of the following work is currently being revised for sub-mission to a peer reviewed journal: Cumberworth, A., Bui, J.M., Gsponer, J. Freeenergy of solvation in the context of protein folding: implications for implicit sol-vent models.iiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Molecular modeling of protein systems . . . . . . . . . . . . . . 11.2 Modeling solvation of protein systems . . . . . . . . . . . . . . . 31.3 Parameterization and validation of implicit solvent models . . . . 61.4 Issue and approach . . . . . . . . . . . . . . . . . . . . . . . . . 82 Theoretical background and computational details . . . . . . . . . . 102.1 Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102.1.1 Molecular dynamics . . . . . . . . . . . . . . . . . . . . 102.1.2 Calculation details . . . . . . . . . . . . . . . . . . . . . 122.2 State definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.2.1 Markov state models . . . . . . . . . . . . . . . . . . . . 13iv2.2.2 Calculation details . . . . . . . . . . . . . . . . . . . . . 172.3 Energy calculations . . . . . . . . . . . . . . . . . . . . . . . . . 182.3.1 Definition of implicit solvent . . . . . . . . . . . . . . . . 182.3.2 Continuum electrostatic approximations . . . . . . . . . . 212.3.3 Effective energy functions . . . . . . . . . . . . . . . . . 362.3.4 Calculation details . . . . . . . . . . . . . . . . . . . . . 402.4 Entropy calculations . . . . . . . . . . . . . . . . . . . . . . . . 422.4.1 Entropy calculation from trajectories . . . . . . . . . . . . 422.4.2 Calculations details . . . . . . . . . . . . . . . . . . . . . 462.5 Uncertainty estimates . . . . . . . . . . . . . . . . . . . . . . . . 462.5.1 Block bootstrapping . . . . . . . . . . . . . . . . . . . . 462.5.2 Calculation details . . . . . . . . . . . . . . . . . . . . . 473 Results and discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 483.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483.2 Calculation setup . . . . . . . . . . . . . . . . . . . . . . . . . . 483.2.1 Defining states . . . . . . . . . . . . . . . . . . . . . . . 483.2.2 Checking convergence . . . . . . . . . . . . . . . . . . . 493.3 Expected and calculated ∆∆Gsolv . . . . . . . . . . . . . . . . . . 503.3.1 Default settings . . . . . . . . . . . . . . . . . . . . . . . 503.3.2 Examining the nonpolar term . . . . . . . . . . . . . . . . 523.3.3 Examining the polar term . . . . . . . . . . . . . . . . . . 523.3.4 Repeating with all systems . . . . . . . . . . . . . . . . . 563.4 Potential criticisms . . . . . . . . . . . . . . . . . . . . . . . . . 563.4.1 Transferability of models . . . . . . . . . . . . . . . . . . 563.4.2 Temperature effects . . . . . . . . . . . . . . . . . . . . . 583.4.3 Entropy calculation . . . . . . . . . . . . . . . . . . . . . 603.4.4 Accuracy of explicit solvent model . . . . . . . . . . . . 604 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66A Supporting Materials . . . . . . . . . . . . . . . . . . . . . . . . . . 97vList of TablesTable 2.1 Pertinent details on systems and associated trajectories . . . . . 13Table 2.2 Parameters used in the creation of the MSMs . . . . . . . . . . 18Table A.1 Intermediate and final values for the expected change in freeenergy of solvation upon folding . . . . . . . . . . . . . . . . 98Table A.2 Changes in free energy of solvation upon folding for defaultvalues of tested implicit solvent models . . . . . . . . . . . . . 98viList of FiguresFigure 1.1 Thermodynamic cycle used for calculating ∆∆Gsolv . . . . . . 9Figure 3.1 State and entropy calculations for chignolin and Trp-Cage . . 50Figure 3.2 Changes in free energies of solvation upon folding with defaultsettings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51Figure 3.3 Differences between expected and implicit solvent model freeenergies of solvation upon folding (∆∆Gexpsolv- ∆∆Gimpsolv) for var-ious internal dielectrics . . . . . . . . . . . . . . . . . . . . . 53Figure 3.4 Differences between expected and PBEQ free energies of sol-vation upon folding (∆∆Gexpsolv- ∆∆Gimpsolv) for state specific inter-nal dielectrics . . . . . . . . . . . . . . . . . . . . . . . . . . 54Figure 3.5 Changes in free energies of solvation upon folding with alter-native radii sets and surface definitions . . . . . . . . . . . . . 57Figure 3.6 Changes in free energies of solvation upon folding with differ-ent forcefields . . . . . . . . . . . . . . . . . . . . . . . . . . 59Figure 3.7 Changes in free energies of solvation upon folding with addi-tive joint entropies . . . . . . . . . . . . . . . . . . . . . . . 61Figure 3.8 Changes in free energies of solvation upon folding with rangeof folded to unfolded ratios . . . . . . . . . . . . . . . . . . . 63Figure A.1 Cartoon and stick representation of an experimentally deter-mined folded state of Chignolin . . . . . . . . . . . . . . . . 99Figure A.2 Cartoon and stick representation of an experimentally deter-mined folded state of Trp-cage . . . . . . . . . . . . . . . . . 100viiFigure A.3 Changes in free energies of solvation upon folding with nega-tive surface tension . . . . . . . . . . . . . . . . . . . . . . . 101Figure A.4 Changes in free energies of solvation upon folding against in-ternal dielectric constant . . . . . . . . . . . . . . . . . . . . 102Figure A.5 Changes in free energies of solvation upon folding with no saltcorrection . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103Figure A.6 State and entropy calculations for BBA . . . . . . . . . . . . 104Figure A.7 State and entropy calculations for Villin . . . . . . . . . . . . 105Figure A.8 State and entropy calculations for WW domain . . . . . . . . 106Figure A.9 State and entropy calculations for NTL9 . . . . . . . . . . . . 107Figure A.10 State and entropy calculations for BBL . . . . . . . . . . . . 108Figure A.11 State and entropy calculations for Protein B . . . . . . . . . . 109Figure A.12 State and entropy calculations for Homeodomain . . . . . . . 110Figure A.13 State and entropy calculations for Protein G . . . . . . . . . . 111Figure A.14 State and entropy calculations for α3D . . . . . . . . . . . . 112Figure A.15 State and entropy calculations for λ -repressor . . . . . . . . . 113Figure A.16 Changes in free energies of solvation upon folding with mini-mized trajectories . . . . . . . . . . . . . . . . . . . . . . . . 114Figure A.17 Changes in free energies of solvation upon folding with tem-perature corrections . . . . . . . . . . . . . . . . . . . . . . . 115viiiGlossaryABSINTH Self-Assembly of Biomolecules Studied by an Implicit, Novel, andTunable HamiltonianACE Analytical Continuum SolventACE2 Analytical Continuum SolventCC-MLA Correlation Corrected-Local Multibody ApproximationCFA Coulomb field approximationEEF effective energy functionEEF1 Effective Energy Function 1FACTS Fast Analytical Continuum Treatment of SolvationGB generalized BornGBMV2 Generalized Born using Molecular Volume 2GBSW Generalized Born with a Simple SwitchingGSE Gaussian split EwaldIDP intrinsically disordered proteinITS implied time scaleLJ Lennard JonesixMC Monte CarloMD molecular dynamicsMM-PBSA molecular mechanics-Poisson BoltzmanMSM Markov state modelPB Poisson BoltzmannPCA principle component analysisPME particle mesh EwaldPMF potential of mean forceQHA quasiharmonic approximationRMSD root mean square deviationSAV solvent accessible volumeSCP Screened Coulomb PotentialsSASA solvent accessible surface areaTICA time-lagged independent component analysisTIC time-lagged independent componentVDW van der WaalsxAcknowledgmentsI want to thank my supervisor, Jo¨rg Gsponer for his guidance and mentoring, offer-ing me insight into both the theory and methods of the field, as well as the humanelement of it. I still remember the first time I came to his office: he showed mea simulation of a small unstructured peptide interacting with a globular protein,and watching it crawl around the surface in an almost spider-like fashion made mewant to know more, both about how biomolecules behave and interact on an atomicscale, and how such a thing could even be modeled at all.I would also like to thank Lawrence McIntosh for his insight and guidance,from our collaboration on Ets-1, to being a rotation supervisor, to being a memberof my thesis committee. I thank Steve Plotkin, a member of my thesis commit-tee; Alexandre Bouchard-Coˆte´, whose lab I rotated through; Hugh Brown, the for-mer systems administrator; Kevin Griffin, the current systems administrator; FrankNoe´, for his helpful correspondence on Markov state model validation and molec-ular dynamics uncertainty estimates; Aaron Oakley, for sending the CHARMMformatted version of the CHARMM22* forcefield files; Jianhan Chen, for send-ing me the files for his solvent model, GBSW/MS2; and Sharon Ruschkowski, theGSAT program administrator.Many of the current and past members of the lab were more than just col-leagues: they became friends who’s ideas and general support (and heated debatesand foosball matches) were invaluable. Finally, I would like to thank my family,especially my parents, sister, and brother-in-law, whose support has always beenunwavering.xiChapter 1Introduction1.1 Molecular modeling of protein systemsThe importance of proteins in the functioning of the cell cannot be overstated. Asenzymes, they catalyze chemical reactions necessary for extracting the energy andmolecular building blocks stored in nutrients, as well as reactions necessary forutilizing these products in the continuation, expansion, and division of the cell.As members of complex interaction networks, they pass signals through the celland regulate the activity of its many processes. As structural components, theycontrol passage to and from the cell and its many compartments, provide a scaf-fold for DNA to be stored on, and form a dynamic skeleton that provides the cellwith modifiable form and internal transportation infrastructure. With only twentybuilding blocks, much of the myriad functions depends on selecting for a particularconformational space rather than a particular chemical structure. A detailed under-standing of the relationships between sequence, structure, dynamics, and functionis clearly a fundamental problem whose solutions will have impact not only ourunderstanding of biology, but also on our ability to manipulate proteins for thecreation of new medical and industrial technologies.To solve this problem, molecular models of proteins are required. One of thefundamental challenges in molecular modeling is finding an optimal balance be-tween detail and efficiency. Clearly, a more detailed model is not always the bestchoice. The optimal choice is dependent on both the size of the system under study1and the questions being asked. In the case of proteins, their relatively large size putshigh detail at a serious computational disadvantage. There are four dimensions inwhich the level of detail may be altered to find an appropriate balance: the choiceof fundamental particles, the interactions between these particles, the breadth ofrepresentation of configurational and conformational space, and the description ofthe dynamics.The former two are closely linked. The most detailed chemical models includeelectrons and nuclei as individual particles, and an ab initio or semiempirical quan-tum mechanical model to describe their behaviour and interactions. These modelsare often used to study reaction mechanism, or for obtaining highly accurate molec-ular geometries. Particles may be removed or combined to lower the resolution; thefirst step, in which electrons and nuclei are combined to form atomic particles, alsonecessitates the fundamental shift to a classical model for a description of their me-chanics. The more detailed of these models still include a description of electronicpolarization, but those that are nonpolarizable have been most popular in study-ing problems involving single proteins. In the subsequent steps, known as coarsegraining [114, 133], multiple atoms may be combined into a single particle repre-senting an atom group, a residue, a group of residues, a domain, or even a wholeprotein. These allow large cellular assemblies to be studied, actin networks beingone prominent example [133], and may eventually be used in whole cell models[40, 127]. As the description of particles becomes more coarse grained and inter-actions more approximate, empirical parameters enter the equations that allow animplicit representation of previously explicitly represented details. These need tobe determined either from higher level models or experimental data.The latter two are also closely linked, but still dependent on the former. Thereare many questions that only require a single structure of the native state, whichmay be determined through experimental means. However, there are also manycases when an understanding of the different conformations and their relative weightsin the ensemble is necessary, whether to describe the nonstatic nature of the foldedstate or to create a model of an intrinsically disordered protein [95]. These de-scriptions require the use of an interaction model to score states and some methodfor sampling different conformations; there may also be additions to the interactionmodel in the form of restraints derived from experimental data, as with distance and2order restraints from NMR experiments [1, 51]. If sampling is done with a methodknown as molecular dynamics (MD), in which the laws of motion are solved to givethe positions of the particles as a function of time and no experimental restraintsare added, the sampling will also serve as a simulation of the physical motion ofthe particles in the simulation. As numerical methods are required to solve theseequations, a discrete time step must be chosen; if the motions of high frequencybond or angle vibrations are not of interest, longer timesteps may be used in the in-tegration in conjunction with constraints on the relevant degrees of freedom, givinga less detailed description of both conformational space and dynamics.It is not necessary to use a single level of detail for the entire system. The 2013Nobel Prize in Chemistry was awarded for developing models in which a protein ispartitioned into regions of varying resolution, known as multiscale modeling [68].While the focus was on using quantum mechanical models for active sites of en-zymes and molecular mechanical models for the remainder of the protein, anothernatural split was used: that between the protein and its surrounding environment.While some effort has been expended developing models of environments akin tothose found in cells [40], most studies focus on diffuse in vitro environments byincluding only water and simple ions. Because only the thermodynamic and ki-netic effects on the protein are of interest, simplified models present a potentiallylucrative route to developing more efficient models without much loss of detail.1.2 Modeling solvation of protein systemsThe solvent has a substantial impact on the energy landscape of proteins [117],and the unique properties of its natural solvent, water, play particularly importantand unique roles [6, 7]. It is then vital that these effects be represented accuratelyif models of protein systems are to be of any use. Explicit inclusion of watermolecules provides the most conceptually straightforward route towards describ-ing the enthalpic, entropic, and kinetic effects. Water has been studied and mod-eled extensively in many contexts [34]; with protein modeling, the most popularmodels have been rigid, point charge representations [69, 125]. Periodic bound-ary conditions, where the system is placed in a cell with symmetry such that itmay be copied and tiled, are often used to allow bulk solvent conditions to be3represented. Unfortunately, the number of water molecules needed to prevent theprotein from interacting with copies of itself regularly results in more computa-tional power being spent simulating hydrodynamics than protein dynamics. Thisbecomes especially problematic when large portions of conformational space mustbe convergently sampled, as when studying protein folding mechanisms [11, 85].Substantial effort has been placed in developing models with reduced dimen-sions [125]. Integral equation theories, which describe the solvent structure throughradial distribution functions, represent one of the most rigorous and accurate ap-proaches; the reference interaction site models, which specifically model the den-sity of interaction sites of water, have been most popular [10]. Multiscale ap-proaches have also been applied to the solvent itself, usually referred to as hy-brid models, in which only the first solvation shell is represented in a fully explicitfashion, while the bulk solvent is represented with a macroscopic continuum model[125]. However, both of these approaches still face many technical issues, and mayonly give modest speedups. Fully implicit models, in which the average effects ofthe solvent are represented as a correction term to the forcefield, have been devel-oped extensively [25, 30, 35, 39, 76, 125, 132]. The fundamental quantity given isthe free energy of solvation ∆Gsolv, but additional modifications to the equations ofmotion are often made in MD simulations to more accurately model kinetic effectsvia the use of Langevin dynamics. ∆Gsolv is almost always decomposed in someway, most commonly into atomic contributions, allowing energies and forces oneach atom to be obtained. An additional decomposition used in some of the sim-plest models consists of separating the screening of electrostatic interactions fromall other effects and modeling them as a distance dependent scaling, while the restof ∆Gsolv is made proportional to the solvent accessible surface area (SASA) [42];however, this scheme has also been used in more accurate models [86, 162]. Amore physically intuitive approach, and perhaps the most popular, is to decompose∆Gsolv into a polar component ∆Gpolsolv and a nonpolar component ∆Gnpsolv [132].∆Gnpsolv describes the reversible work associated with the placement of the pro-tein with all charges turned off into the solvent [132]. The most common modelshave assumed this term to be a linear relation to the SASA, although geometricproperties have been used in addition or in its place, mainly the solvent accessiblevolume [88, 157, 164]. The nonpolar term has been further decomposed to a cavity4term (sometimes referred to as a repulsive term) that describes the reversible workassociated with the creation of the cavity, and an attractive dispersion term that de-scribes the reversible work associated with attractive nonpolar interactions betweenthe protein and the solvent [44, 45, 72, 92, 157, 164]. While the cavity term hasbeen described similarly to the whole ∆Gnpsolv, the length scale dependence of thehydrophobic contributions [19] has inspired more nuanced treatments [22, 23]. Theattractive term is modeled by a volume integral of a Lennard Jones (LJ) potential,of which several forms have been proposed [92, 164].∆Gpolsolv describes the reversible work associated with charging the protein afterbeing placed in the solvent [132], usually modeled with some form of continuumelectrostatic approximation. The idea is to represent the solvent as a featurelessdielectric, with a relative permittivity, usually referred to as the dielectric constant,dependent on the conformation of the solute. The Poisson Boltzmann (PB) equa-tion describes such a model, and numerical solutions to it [96] are generally consid-ered accurate descriptions of the polar contribution [5]. However, PB based modelsare often computationally intensive, reducing their advantages over more detailedmodels. The generalized Born (GB) approach [8, 147], an analytical approximationto the PB equation, has been a popular alternative [44, 45, 53, 65, 84, 89, 90, 106].All continuum electrostatic models require the dielectric constant to be defined atall points in space. Usually, the system is partitioned into solvent and protein vol-umes, with the solvent volume assigned the dielectric constant of bulk water, andthe protein assigned a value near unity. As the idea of a well defined boundaryis a concept with clear physical meaning only on the macroscale, there is signif-icant ambiguity in its definition [160, 161]. These definitions have ranged fromsimpler methods involving overlapping atomic radii [112], solvent probe definedmolecular surfaces [89], and inclusion of smoothed transition regions [113], tomore advanced methods using differential geometry [26–28, 159]; the latter mayalso be used to define the surface for the nonpolar component. Others have taken adifferent approach, defining a dielectric function that varies throughout the vicinityof the protein [52, 59, 93], which may be more sensible considering the nonhomo-geneous nature of the protein environment [137].51.3 Parameterization and validation of implicit solventmodelsAll implicit solvent models involve various parameters that must be determined.Some of these are specific to the form of the model, while others correspond tophysical quantities, atomic partial charges and atomic radii being the most com-mon. Determining the parameters, like the models themselves, is often a balancebetween accuracy and feasibility. In designing a parameterization approach, sys-tems and calculable quantities related to the solvent effects must be chosen forwhich references may be obtained. Ideally, reference values for each desired ther-modynamic and atomic component of ∆Gsolv would be available for representa-tive samples of conformational space from a representative sample of proteins;however, such an idea is experimentally nonsensical and computationally unrea-sonable. Compromises must be made instead. One approach has been to pa-rameterize on a set of small molecules or peptides for which ∆Gsolv is obtainedexperimentally [44, 45, 57, 72, 74, 84, 164], calculated with quantum mechani-cal models [29, 72], or calculated in explicit solvent [156, 157]; others have usedexperimental ∆Gsolv of side chain analogues directly in the model [86, 162]. AsGB models often aim to reproduce the accuracy of PB models at a lower cost,it is common to calculate ∆Gsolv or directly related quantities [118] and use thePB model values as a reference [21, 53, 89, 90, 106, 108]. Instead of solvationfree energies, structural properties may be used instead by running simulations,calculating the desired properties, and comparing them to experimental structures[2, 42, 57, 75, 134, 135], secondary structure frequencies [21, 24], or derived prop-erties calculated from explicit solvent simulations [2, 162]. If specific interac-tions require careful tuning, usually salt-bridges or hydrogen bonds, the potentialof mean force (PMF), a function giving the relative free energy along a chosen re-action coordinate, may be calculated and compared to a PMF calculated in explicitsolvent [21, 24, 46, 66, 101, 106, 108, 141, 156]. Perhaps the most direct methodin the context of MD simulations is to average over the forces on each atom inexplicit solvent simulations of the folded state and parameterize to match them[77, 155, 164]. Bottaro et al. [13] take this one step further, using a related ap-proach to match the entire ensemble (folded and unfolded) of two small structured6peptides. Of course, each of these approaches has its own shortcomings: parame-ters derived from small molecules are unlikely to be fully transferable to proteins,parameters optimized to a PB model will reflect its flaws, parameters tuned forone type of interaction may detrimentally affect others, and parameters designedto stabilize the folded state risk over stabilizing it. Because of this, a combinationof approaches is often used.Besides the obligatory validation section in studies introducing a new model,implicit solvent models have been examined and compared from various perspec-tives. The range of contexts includes small molecules [78, 144], small peptides[129, 156], protein-ligand binding [50, 62, 124, 172], conformational kinetics [38,126], aggregation [149], QM/MM calculations [128], protein design [67], PB-GBcomparisons [18, 41, 107, 158, 177], folding and native state accuracy/stability[16, 37, 64, 70, 97, 103, 109, 115, 116, 119, 142, 163, 165, 170, 174, 175], andexaminations of specific interaction types [32, 46, 66, 101, 156, 171]. While theoverall results are often dependent on the specific model being used, some generalflaws and trends can be extracted. One of the earliest problem detected was an over-stabilization of salt bridges [46, 116, 119, 141, 174–176], an effect strong enoughto overweight nonnative conformations in the ensemble. In response, efforts weremade to correct the parameters [46, 106, 108, 141, 156], but even when the saltbridges have correct minima, subtleties of the interaction are unable to be repro-duced without explicit water molecules [156, 171]. Hydrogen bonding was alsofound to be insufficiently accurate, with similar corrections attempted [21, 24, 66].Another issue has been the extent of the applicability of the additivity assumption,that the free energy contributions of individual atoms or solute groups are inde-pendent [33]. In particular, hydrogen bonds were found to be a major factor in aneffect labelled self-solvation, in which side chain-backbone interactions lead to adecrease in the favourability of solvation that cannot be accounted for by solventexclusion [54, 83]; so far, only one model has introduced a correction for this [29].SASA based models of the nonpolar term have been known to be flawed for sometime [88, 97, 144, 163, 164], and could explain some of the poor results found inthe studies cited. The use of solvent volume terms and terms from addtional de-compositions into cavity and attractive dispersion contributions have given betterresults [44, 45, 88, 144, 164].71.4 Issue and approachWhile many implicit solvent models assign the lowest free energy to a state simi-lar to the correct native structure [37, 70, 97, 109, 165, 170], the balance betweenconformational states of whole ensembles has been less well studied. Becauseof the focus on parameterizing with small molecules and peptides, and/or ensur-ing the folded state is stable, we hypothesized that implicit solvent models do notcorrectly balance the weights between native and non-native states. Direct deter-mination of reference ∆Gsolv values for folded and unfolded states is not an optionbecause of the aforementioned issues – lack of experimental ∆Gsolv, and the infea-sibility of computing converged ∆Gsolv for the unfolded ensemble – so a differentroute must be taken. We instead calculate the change in the free energy of solva-tion upon folding, ∆∆Gsolv, by calculating the free energy of folding ∆Gfold,S ofproteins simulated in explicit solvent two ways: directly, with the populations ofthe folded and unfolded states, and through a thermodynamic cycle (Figure 1.1)similar to those used in the molecular mechanics-Poisson Boltzman (MM-PBSA)method of Kollman and coworkers [60, 81, 145]. By rearranging the terms, the ex-pected ∆∆Gsolv, ∆∆Gexpsolv, can be directly compared with the implicit solvent modelpredictions ∆∆Gimpsolv. ∆∆Gexpsolv can be calculated as follows:∆∆Gexpsolv = ∆Gfold,S−∆Gfold,V (1.1)where ∆Gfold,S and ∆Gfold,V are the folding free energies in solution (S) and vacuum(V). The ratio of the populations of the folded (PF) and unfolded states (PU) in theexplicit MD simulations provides ∆Gfold,S:∆Gfold,S =−kBT lnPFPU(1.2)while ∆Gfold,V can be calculated as∆Gfold,V =−T∆SV +∆E¯MM,V (1.3)where ∆SV and ∆E¯MM,V are the changes in entropy and average molecular mechan-ical energy (an approximation of the change in internal energy, ∆UV), respectively,8Folded, solvated Folded, vacuum∆Gsolv,FUnfolded, solvated∆Gfold,SUnfolded, vacuum−∆Gsolv,U∆Gfold,VFigure 1.1: Thermodynamic cycle used for calculating ∆∆Gsolvupon folding in vacuum, and kB is Boltzmann’s constant. This approach also hasthe advantage of isolating the effects of the implicit solvent from inaccuracies inthe forcefield, under the assumption that they will at best reproduce explicit waterresults: even if the simulations in explicit solvent incorrectly represent the weightsof conformational states, the implicit solvent models should still give the correct∆∆Gsolv for a given set of structures. In contrast, comparing to experimental datawould include the forcefield itself in the analysis. To summarize, we use changesin free energies of solvation, rather than indirectly related structural properties, todetermine the accuracy of implicit solvent models across the conformational en-semble, relative to an explicit solvent model reference.9Chapter 2Theoretical background andcomputational details2.1 Simulations2.1.1 Molecular dynamicsMD, while being a method to study dynamics of molecules, may also be thought ofas an algorithm for sampling its conformational space. MD has two main require-ments: a description of the interactions between all particles in the system, andalgorithms for calculating the interaction forces and integrating them over time.Classical MD uses a molecular mechanical forcefield, the conventional termfor the potential energy function. The CHARMM22 forcefield [98] is one exampleof a class of nonpolarizable atomistic forcefields with similar forms used to de-scribe protein systems. The terms may be separated into bonded and nonbondedcontributions:U(x) =Ubonded(x)+Unonbonded(x) (2.1)where x is the state of the system. Analytical derivatives are available for all terms,a necessity for efficient calculation of the forces. The bonded terms consist ofharmonic approximations, with the exception of the dihedral angles, which are10described by a trigonometric function:Ubonded(x) = ∑bondsKb(b−b0)2 + ∑anglesKθ (θ −θ0)2 + ∑Urey−BradleyKUB(S−S0)2 +∑impropersKω(ω−ω0)2 + ∑dihedralsKφ (1+ cos(nφ −δ )) (2.2)where b is the bond length, θ is the valence angle, S is the distance between twoatoms bound to a common third, ω is the improper dihedral formed by three atomsbound to a common fourth, φ is the dihedral angle formed by four adjacently boundatoms, the subscript 0 terms are the equilibrium values of these geometric mea-sures, the K coefficients are force constants, n is the multiplicity of the dihedralangle term, and δ is its phase shift. The nonbonded interactions consist of pairwiseinteractions between atoms separated by more than two bonds, including both adispersion term, described by a LJ potential, and an electrostatic term, describedby a Coulomb potential:Unonbonded(x) = ∑nonbondedpairs{εi j[(Ri jri j12)−2(Ri jri j)6]+qiq j4piε0εri j}(2.3)where εi j is the well depth of the LJ potential, Rmini j is its location, ri j is the distancebetween atoms i and j, qi is the partial charge of atom i, ε0 is the permittivity of freespace, and ε is the relative permittivity, or dielectric constant. The LJ are derivedfrom combinations of atomic contributions: for the well depth εi j = (εi + ε j)1/2,while for the well location Ri j = 12(Ri +R j), with Ri and R j often referred to asthe atomic or van der Waals (VDW) radii. The many parameters involved weredetermined by a mix of experimental data and quantum mechanical calculations,and optimized with the TIP3P solvent model. Several variants of CHARMM22exist, the most common being CHARMM22/CMAP, in which additional termswere added to better reproduce backbone dihedral angle potentials as compared topotential maps calculated with quantum mechanical methods, and to several foldedprotein structures [99]. Another variant by Piana et al. [121], CHARMM22*, wascreated by reparameterizing the backbone dihedrals against quantum mechanicalcalculations and experimental conformations of several peptides, with a focus on11reproducing helical fractions; several partial charges were also modified [121].The use of the forcefield in structure calculations and simulations requiresmany algorithms, including those for force and energy determination and integra-tion. The nonbonded terms, which scale with N2, are usually approximated insome way to reduce the computational cost. Flat cutoffs may be used to removeinteractions at large distances, while more involved methods use a switching func-tion that turns off the interaction between an inner and outer cutoff distance (inthe case of the CHARMM software suite, a sigmoidal function is used [146]), or ashifted function modified such that at the cutoff the interaction disappears contin-uously [146]. Unlike dispersion interactions, electrostatic interactions may still besubstantial at long ranges, so further treatment beyond the cutoff is often required.This is usually through an approximation of the Ewald sum method, examples ofwhich include particle mesh Ewald (PME) and Gaussian split Ewald (GSE) [140].Finite difference algorithms are usually used to numerically integrate the equationsof motion, with the Verlet integrator and its variants being most popular; theseare based on a truncation of the Taylor expansion of the positions as a function oftime. The chosen ensemble may necessitate the use of a barostat or a thermostat,with Langevin methods being popular for both, and the Nose´-Hoover method alsocommonly used for the latter.Further details may be found in the texts Understanding Molecular Simulation:From Algorithms to Applications [43] and Molecular Modelling: Principles andApplications [87].2.1.2 Calculation detailsThe trajectories used in the current study were generated previously [94]. TheCHARMM22* forcefield [121] was used in equilibrium folding simulations startedfrom either crystal or extended structures. A modified TIP3P water model was usedwith long range electrostatic interactions calculated using GSE. A Nose´-Hooverthermostat was used to produce an NVT ensemble. The timestep for integrationswas 2.5 fs, with frames recorded every 200 ps. Periodic boundary conditions withrectangular cells were used. Further details on the systems and their simulationscan be found in Table 2.1.12Nr Time (µs) T (K) Nt PDBIDChignolin 10 106 340 47 N/ATrp-cage 20 208 290 20 2JOFBBA 28 325 325 32 1FMEVillin 35 125 360 62 2F4KWW domain 35 1137 360 24 2F21NTL9 39 2936 355 37 2HBABBL 47 429 298 33 2WXCProtein B 47 104 340 34 1PRBHomeodomain 52 327 360 63 2P6JProtein G 56 1155 350 33 1MIOα3D 73 707 370 20 A3Dλ -repressor 80 643 350 35 1LMBTable 2.1: Pertinent details on systems and associated trajectories. Nr is thenumber of residues, Time is the total simulation time, T is the simulationtemperature, Nt is the number of folding or unfolding events observedwith current state definitions, and PDBID refers to the closest solvedexperimental structure. The experimental structure of chignolin was notdeposited in the PDB, but is available in the SI of a previous publication[61].2.2 State definitions2.2.1 Markov state modelsPrinz et al. [122] have rigorously defined Markov state models (MSMs); the mainidea shall be sketched here. The dynamics of molecular systems are Markovian;i.e., their behaviour at a future point in time is only dependent on the current stateof the system. The state at time t will be denoted as x(t) or more simply x, whilethe probability density of a particular state in an ensemble will be denoted ft(x).The propagation of these probability densities in τ time units can be described withthe propagator, Q asft+τ(x) =Q(τ)◦ ft(x) (2.4)=∫Ωdy f (y,x;τ) ft(y) (2.5)13where f (y,x;τ) is the probability density of transitioning from state y to state xafter time τ , and Ω is state space. The spectrum of the operator can be partitionedinto a continuous and discrete region, where the discrete region is composed of aset of m isolated, dominant eigenvalue/eigenfunction pairs. The dynamics of anysystem in kτ can be decomposed into a superposition of fast and slow processes,corresponding to the continuous and discrete regions, respectively:ft+kτ(x) =Qslow(kτ)◦ ft(x)+Q f ast(kτ)◦ ft(x) (2.6)=m∑i=1λ ki 〈 ft ,φi〉φi(x)+Q f ast(kτ)◦ ft(x) (2.7)where 〈 ft ,φi〉 is the scalar product of ft and φi. Scalar products are defined on thisHilbert space of square integrable functions as〈u(x),v(x)〉=∫Ωdxu(x)∗v(x) (2.8)where ∗ denotes the complex conjugate. The fast processes can safely be ignoredin most cases, as they relax at timescales far below those usually examined. Theremaining eigenvalues can then be used to describe characteristic timescales of theprocesses:ti =−τlnλi(2.9)Thus, as the timescale being examined increases, the contribution of all processesdiminishes except for that corresponding to φ1(x), as λ1 = 1. This eigenfunctionis the stationary distribution, µ(x), describing the density of the system when it isfully relaxed.Markov state models (MSM)s, attempt to model the dynamics with a discreteapproximation of state space. The model consists of two components: the sets thatpartition state space, often referred to as microstates, and the probabilities of tran-sitioning between all pairs of these sets. State space is usually crisply partitionedinto nonoverlapping sets, S1, ...,Sn. Probability densities are now described withvectors, f(t), where the ith element is density of Si. The densities can be prop-agated with a row-stochastic transition probability matrix, T(τ), whose elementsare the probabilities of transitioning from Si to S j in lagtime τ given that the system14is in Si at time t:Ti, j(τ) = P[x(t+ τ) ∈ S j|x ∈ Si] (2.10)T(τ) is analogous to Q, and can be calculated from the transfer operator, T (anequivalent, but more mathematically convenient description of the dynamics thanQ). T(τ) propagates f(t) according to the matrix equation:f(t+ τ) = fT (t)T(τ) (2.11)Importantly, the left eigenvectors of T(τ) correspond to the eigenfunctions of Q,allowing the dynamics to be decomposed using standard spectral analysis methods,the first of which, pi , gives the stationary distribution of the system. The dynamicsare no longer Markovian: the transition probabilities may be dependent on howlong the system has been in a particular microstate, as the system requires a finiteamount of time to reach equilibrium within each state, while the models assumeinstantaneous relaxation. However, it can be proven that the error due to the dis-cretization can be made as small as desired by using a finer partition and a longerlagtime. Practically, an optimal balance must be made with the statistical error,which will increase when more microstates are used.Standard MSM creation consists of several common steps. Usually, the firststep is to coarse-grain state space by ignoring various degrees of freedom (e.g. ve-locities, solvent molecules, non-backbone atoms of polypeptides), before selectinga set of order parameters r to describe the remaining degrees of freedom. Fur-ther dimensional reduction may be performed, with one particularly useful methodbeing time-lagged independent component analysis (TICA) [123, 138]. Briefly,TICA involves a linear transformation of the coordinates into an orthogonal, un-correlated, maximally autocovaried basis set. Like how principle component analy-sis (PCA) isolates the largest amplitude motions of a molecular trajectory by max-imizing the variance of orthogonal basis vectors (principle components), TICAisolates the longest time scale motions of a molecular trajectory by maximizing theautocovariance of orthogonal, uncorrelated components (time-lagged independentcomponents (TICs)). The required transformation matrix U may be obtained by15solving a generalized eigenvector problem:Cr(τ)U = Cr(0)UΛ‡(τ) (2.12)where Cr(τ) is the time-lagged covariance matrix of the order parameters, Cr(0) isthe covariance matrix, and Λ(τ) is the diagonal matrix of autocovariances. In thecase of nondegenerate eignevalues, this amounts to diagonalizing the covariancematrix to obtain the principle components, normalizing, calculating a time-laggedcovariance matrix on the principle components, and finally diagonalizing this tofind the TICs in terms of the normalized principle components, all of which maybe combined to give U.Once an appropriate description of the relevant subspace of state space hasbeen selected, a geometric clustering is then performed using a set of trajectoriesof the system to define cluster centers. These centers are used in a Voronoi tessel-lation, in which each frame is assigned to the closest cluster center according tosome distance metric. T(τ) is estimated from counts of the number of transitionsbetween each pair of states in the trajectory set (either with a sliding window ap-proach to use all data available, or by τ intervals to obtain independent samples).The simplest approach is to use a maximum likelihood estimate:Ti, j(τ) =Ci, j∑kCi,k(2.13)where Ci, j is an element of the count matrix C. In order to select a lagtime, and asa test of the validity of the model, the transition matrix is recalculated for variousvalues of τ , and the slowest implied timescales plotted. Because the Chapman-Kolmogorov equation holds:T(nτ) = T(τ)n (2.14)the implied timescales can be shown to be constant as a function of τ:ti =−nτlnλi,T (nτ)=−τlnλi,T (τ)(2.15)meaning that if the model is Markovian, the plotted implied timescales will be16constant. However, the converse does not follow, as the eigenvectors for each tran-sition matrix may be different, so it is possible for the implied timescale to beconstant for a non-Markovian model. Other methods may be used to test for statis-tical validity of the model if an in depth analysis is desired. Optimal lagtimes areselected by choosing the value at which the implied timescale becomes approxi-mately constant, as this will balance the discretization error, which occurs at shortlagtimes, with the statistical error, which occurs at long lagtimes. Identificationof metastable, or macrostates, can be accomplished by dividing the microstatesbased on the components of the dominant eigenvectors. This is often done with thePCCA algorithm, or one of its derivatives, which carries out a kinetic clustering onthe microstates. The number of macrostates to create, m, is usually set to be onemore than the number of slow, distinct dynamical processes expected, which maybe estimated from the plot of the slowest implied timescales.More details on the theory and applications of MSMs can be found in thetextbook An introduction to Markov state models and their application to longtimescale molecular simulation [14].2.2.2 Calculation detailsThe Markov state models (MSMs) were created and validated using EMMA [139].To create a space to cluster on, a time lagged independent component analysis(TICA) [123] was performed on the CA-CA distances of residues separated by atleast 7 other residues in the sequence, with frames separated by 1 ns. Cluster-ing was then carried out using a k-means algorithm and a Euclidean metric on theselected TICs. The PCCA+ algorithm was used to identify macrostates, and themacrostate with the greatest similarity to the experimental structure (as measuredby RMSD) was identified as the native state. Other folded macrostates were la-beled as intermediates and excluded from the analysis. The remaining macrostateswere identified as unfolded states. To remove spurious transitions, states were onlydefined at intervals equal to the lagtime. The states were defined starting from thefirst and last frame; frames that were not defined the same both ways were con-sidered transition states. Parameter selection, while requiring a certain amount oftrial and error, was generally robust to the exact values selected (values chosen17TICA lagtime (ns) TICs Microstates Lagtime (ns) MacrostatesChignolin 50 1 100 200 2Trp-cage 25 3 100 300 4BBA 1000 3 100 500 4Villin 75 2 100 500 3WW domain 5000 1 100 700 2NTL9 2000 1 100 2000 2BBL 500 3 200 300 5Protein B 50 4 400 800 5Homeodomain 150 1 100 400 2Protein G 500 4 200 300 3α3D 1000 4 100 1000 3λ -repressor 1000 2 100 1000 3Table 2.2: Parameters used in the creation of the MSMs. TICA lagtime isthe lagtime used in the TICA calculations, TICs is the number of slowestTICs clustered on, Microstates is the number of miscrostates used in theMSM, Lagtime is the lagtime selected for the MSM, and Macrostates isthe number of clusters selected to be output from the PCCA+ algorithm.are given in Table 2.2). Parameters were selected to create the simplest modelspossible while retaining the important information: the identification of folded andunfolded states.2.3 Energy calculations2.3.1 Definition of implicit solventRoux and Simonson [132] have provided rigorous definitions of implicit solventmodels as PMFs. A PMF is a function that gives the free energy for some reduceddescription of the system. Often the reduced description is defined by a set of orderparameters, functions which map the coordinates to some lower dimensional space,or simply by a subset of the coordinates used to define the conformation. The PMFacts as an effective potential: when differentiated, it gives the mean force on thereduced coordinate set of the current conformation. Here, the degrees of freedomare partitioned into the solute x and solvent y coordinates. Let f (x,y) be the jointprobability density function of the solute and solvent conformations in an NVT18ensemble:f (x,y) =e−U(x,y)/kBT∫∫dxdye−U(x,y)/kBT(2.16)where U(x,y) is the potential energy, kB is Boltzmann’s constant, and T is thetemperature. Assume the potential energy of the system can be decomposed intoan intrasolute potential Uuu(x), an intrasolvent potential Uvv(y), and solute-solventinteraction potential Uuv(x,y). f (x,y) may be marginalized over y:f (y) =∫dy f (x,y) (2.17)=∫dye−[Uuu(x)+Uvv(y)+Uuv(x,y)]/kBT∫∫dxdye−[Uuu(x)+Uvv(y)+Uuv(x,y)]/kBT(2.18)=eW (x)/kBT∫dxeW (x)/kBT(2.19)where W (x) is the solute PMF. This marginalized distribution may be used tocalculate quantities Q averaged over the ensemble:〈Q〉=∫∫dxdy f (x,y) (2.20)=∫dx f (y) (2.21)W (x) may be expressed directly:eW (x)/kBT =∫dye−[Uuu(x)+Uvv(y)+Uuv(x,y)]/kBT∫dye−Uvv(y)(2.22)=⇒ W (x) =−kBT ln∫dye−[Uuu(x)+Uvv(y)+Uuv(x,y)]/kBT +kBT ln∫dye−Uvv(y)/kBT (2.23)Note that the absolute potential of mean of mean force is relative to a state in whichthe solute interacts neither with itself or the solvent, hence the addition of the de-19nominator in Equation 2.22. The contribution of the solvent to W (x) is normallythe quantity of interest, which may also be referred to as the free energy of solva-tion for a fixed conformation of the solute. This contribution may be isolated byremoving the intrasolute potential from W (x):W (x) =Uuu(x)+∆W (x) (2.24)The forms of ∆W (x) analogous to Equations 2.22 and 2.23 are only different bythe absence all instances of Uuu(x). The partial derivative of ∆W (x) with respectto one of the coordinates may be related to the average force due to the solvent onthat coordinate, 〈Fuv,xi(x,y)〉x:δ∆W (x)δxi=〈δUuv(x,y)δxi〉x= 〈Fuv,xi(x,y)〉x (2.25)Expressing the potential energy with a coupling constant,U(x,y,λ ) =Uuu(x)+Uvv(y)+Uuv(x,y,λ ) (2.26)where Uuv(x,y,0) = 0 and Uuv(x,y,1) =Uuv(x,y), allows ∆W (x) to be expressedin terms of an alchemical thermodynamic process:∆W (x) =∫ 10dλ δ∆W (x)δλ=∫ 10dλ∫dyδUuv(x,y,λ )δλ f (x,y)=∫ 10dλ〈δUuv(x,y,λ )δλ〉x,λ(2.27)Assume Uuv(x) can be separated into a nonpolar potential Uuv,np(x) and polar po-tential Uuv,pol(x); then ∆W (x) to be decomposed into two corresponding compo-20nents:∆Wnp(x) =−kBT ln∫dye−[Uvv(y)+Uuv,np(x,y)]/kBT +kBT ln∫dye−Uvv(y)/kBT (2.28)and∆Wpol(x) =−kBT ln∫dye−[Uvv(y)+Uuv,np(x,y)+Uuv,pol(x,y)]/kBT +kBT ln∫dye−[Uvv(y)+Uuv,np(x)]/kBT (2.29)They also may be expressed in terms of alchemical thermodynamic processes:∆Wnp(x) =∫ 10dλ1〈δUuv(x,y,λ1,0)δλ1〉x,λ1(2.30)∆Wpol(x) =∫ 10dλ2〈δUuv(x,y,1,λ2)δλ2〉x,λ2(2.31)where λ1 is the coupling parameter for the polar potential, and λ2 is the couplingparameter for the nonpolar potential. While the decomposition is path dependent,it corresponds to physically meaningful processes: the nonpolar term is the workrequired to move the uncharged solute into a cavity in the solvent, and the polarterm is the work required to charge the solute in this cavity.These forms provide a starting point for rigorous connections to approxima-tions of the solvent effects.2.3.2 Continuum electrostatic approximationsThe PB equation is used in many implicit solvent models as the foundation of thepolar term [125]. The PB equation may be derived from the Poisson equation [4],a fundamental relation in classical electrostatics relating the charge density ρ(r) tothe electric potential φ(r) at any point r:−∇ · ε(r)∇φ(r) = ρ(r) (2.32)21where ε(r) is the dielectric function. The assumptions behind the Poisson equationabout the nature of the polarizability of the system, being expressed as a dielectriccoefficient, imply that is not capable of describing nonlinear behaviour (as wouldoccur with high charge) and nonlocal behaviour (as would occur over short length-scales in water due to hydrogen bond networks) [125]. The expansion of the chargedensity to include both the m solute partial charges:ρs(r) =4pie2ckBTm∑i=1qiδ (r−xi) (2.33)where ec is the charge of an electron, qi is the magnitude of partial charge i withposition xi, and δ is the delta function; and a mean field approximation of M mobileionic species of bulk concentration c j and charge Q j:ρm(r) =4pie2ckBTM∑jc jQ je−q jφ(r)−Vj(r) (2.34)≈ κ¯2(r)sinhφ(r) (2.35)where Vj(r) is a steric potential, the collection of terms κ¯(r) is referred to as themodified Debye-Huckel screening factor, and the second equality holds for a one-to-one electrolyte; gives the common form of the full PB equation:−∇ · ε(r)∇φ(r)+ κ¯2(r)sinhφ(r) = 4pie2ckBTM∑i=1qiδ (r−xi) (2.36)The mean field assumption made for mobile charges means that ion specific effectsare ignored, that effects related to discrete ionic interactions will not be described,and that unrealistic concentrations can occur; however, for systems that are nothighly charged or contain low electrolyte concentrations, these effects are usuallynegligible [125]. Being a nonlinear second order differential equation, a linearizedversion is often used instead for more efficient solutions:−∇ · ε(r)∇φ(r)+ κ¯2(r)φ(r) = 4pie2ckBTM∑i=1qiδ (r−xi) (2.37)22The linearization of the PB equation holds as long as the electric potential is not toohigh. The polar contribution to the free energy of solvation can then be calculated:∆Gpol =12∫drφreac(r)ρs(r) (2.38)=12m∑iqiφreac (2.39)where φreac(r) = φsol(r)−φvac(r) is the reaction potential, which is caused by po-larization of the solvent by the solute; φsol(r) is the electric potential in solution,and φvac(r) is the electric potential in vacuum.The Poisson equation may be solved analytically in the case of a point chargeq in a sphere of radius Rborn; the free energy of solvation in this case is known asthe Born formula [8]:∆Gborn =−(1−1εs)q22Rborn(2.40)R is often referred to as the Born radius. If a molecule was composed of N atomsof sufficient separation, the polar solvation free energy may be calculated as:∆Gpol =−(1−1εs) N∑iq2i2Rborn,i−12(1−1εs) N∑iN∑j 6=iqiq jri j(2.41)where ri j is the distance between atoms i and j. The idea of the GB approach is tocalculate effective Born radii for atoms in real molecules, radii that when used inthe Born formula give the correct free energy of solvation of charging the atom inthat molecule with all other charges turned off, and effective interaction distancesfor pairs of charged atoms. The free energy of solvation may be expressed as:∆Gpol =−12(1−1εs)∑i, jqiq jfGB(ri j)(2.42)where fGB(ri j), in the original formulation, is:fGB(ri j) = [r2i j +Rborn,iRborn, je−r2i j/4Rborn,iRborn, j)]12 (2.43)23To find the effective Born radii, the equation is first rearranged:Rborn,i =(1−1εs)q2i∆Gself,i(2.44)A solution for ∆Gself,i, the atomic self energy contribution to ∆Gpol [134], is needed.The free energy of an assembly of charges in a linearly polarizable continuum maybe expressed in terms of the electric displacement field D(r) and the electric fieldE(r):Gpol =12∫drρ(r)φ(r) (2.45)=18pi∫drE(r) ·D(r) (2.46)By assuming the electric displacement is Coulombic, known as the Coulomb fieldapproximation (CFA):Di(r) = qir−xi|r−xi|3(2.47)and expressing the electric field in terms of the electric displacement:E(r) =D(r)−P(r)ε0=D(r)ε0εs(2.48)where P(r) = ε0(εs−1)E(r) is the polarizability density in a linear, homogeneous,isotropic dieletric; Gself,i may evaluated:Gself,i =18pi∫drq2i xi|xi− r|4(2.49)=18pi∫pdrq2i|xi− r|4εp+18pi∫sdrq2i|xi− r|4εs(2.50)where the first integral is over the protein volume with dielectric constant εp, andthe second is over the solvent volume. ∆Gself,i may then be calculated by taking adifference between the free energy of the solvated system and the system in vac-uum:∆Gself,i =−18pi(1−1εs)∫sdrqi|xi− r|4(2.51)24allowing the effective Born radii to be calculated:R−1born,i =14pi∫sdr1|xi− r|4(2.52)The CFA is know to be inadequate [8], but it is still often used as a starting point inmany GB based models. Note that dielectric constant of the protein environmentdoes not explicitly enter the GB equations. However, if εp 6= 1, then the referenceenvironment should be set to that value, giving a modified expression for ∆Gpol:∆Gpol =−12(1εp−1εs)∑i, jqiq jfGB(ri j)(2.53)GB models may be further modified to include the effects of an electrolyte concen-tration by including the Debye-Huckel screening parameter κ:∆Gpol =−12(1εp−e−κ fGB(ri j)εs)∑i, jqiq jfGB(ri j)κ =(εsε0kbT2NAe2cI)−1/2(2.54)where NA is Avogadro’s number and I is the ionic strength.The form and parameterization of implicit solvent models with polar termsbased on continuum electrostatic approximations tested in the current study arereviewed below.PBEQThe PB model used here is implemented in CHARMM as part of the PBEQ mod-ule. In the PB equation, both ε(r) and κ(r) require the boundary between thesolute and solvent to be defined. Because definitions based on VDW volume over-lap or solvent probe defined molecular surfaces have discontinuous derivatives atthe boundary, they are unsuitable for simulation applications. Here, we use the def-inition of Nina et al. [113], which uses a smoothing function to add an intermediate25region to a volume exclusion function based on overlapping radii:H(r) =∏iHi(|r−xi|) (2.55)Hi(r) =0, r ≤ Ri−w− 14w3 (r−Ri+w)3 + 34w2 (r−Ri+w)2, Ri−w < r < Ri+w1, r ≤ Ri+wwhere w is the half width of the transition region, r is the distance between atom iand r, and Ri is the atomic radius of atom i. These can then be used in models forthe dieletric function and the modified Debye-Huckel screening factor:ε(r) = 1+(εs−1)H(r) (2.56)κ¯(r)2 = κ2H(r) (2.57)The PB equation is usually solved numerically, and several methods have beendeveloped, including finite difference, finite element, and boundary element ap-proaches [96]. Here, a finite difference method is used. It requires the area ofinterest be divided into a grid, which is then used to reformulate the PB equation[113]:εx(i, j,k)[φ(i+1, j,k)−φ(i, j,k)]+εx(i−1, j,k)[φ(i−1, j,k)−φ(i, j,k)]+εy(i, j,k)[φ(i+1, j+1,k)−φ(i, j,k)]+εy(i, j−1,k)[φ(i+1, j−1,k)−φ(i, j,k)]+εz(i, j,k)[φ(i+1, j,k+1)−φ(i, j,k)]+εz(i, j,k−1)[φ(i+1, j,k−1)−φ(i, j,k)]−κ¯2(i, j,k)φ(i, j,k)h2 =−4pi q(i, j,k)h(2.58)where h is the spacing of a grid with vertices xi,y j,zk, q(i, j,k) = h3ρ(i, j,k), andεx(i, j,k) = (xi+h/2,y j,zk). These equations may be arranged into a matrix equa-26tion. Assumptions must be made about the potentials at the edge of the grid; themodel used here employs a focusing method in which a large coarse grid is firstsolved with the boundary potentials calculated with the Debye-Huckel approxi-mation, followed by a smaller, finer grid which uses the results of the previouscalculations as its boundaries [48]. Iterative methods are used to solve the equationset, here the successive overrelaxation method [110]. In the calculation of the reac-tion potential, the vacuum potential must be calculated with the same grid so thatthe artificial interactions between charges of the same origin spread across multiplegridpoints can be eliminated [4].Parameters that must be chosen are the size of the grids, the half width of thesmoothing function, the dielectrics of the solute and solvent, and sets of partialatomic charges and atomic radii. Grids were calculated separately for each confor-mation of the solute; the coarse-grained grid was set to have dimensions equal tothe longest length in each plus 20 A˚ on each side with a resolution of 1 A˚, whilethe fine-grained grid had only an additional 5 A˚, but a resolution of 0.2 A˚ (the morecommonly used 0.5 A˚ has been found to be insufficient in many cases [56]). A halftransition width of 0.3 A˚ was used. Atomic radii were originally optimized by tak-ing the first solvent density peak calculated from simulations carried out with theTIP3P water model and the CHARMM22 forcefield on neutrally-capped tripep-tides for each amino acid and optimized with the simple atomic volume overlapsurface model by comparing the ∆Gpol values calculated with explicit solvent tothose calculated with the PB model [112]. In a later study, these calculations wererepeated with the smoothed dielectric boundary for varying transition widths, andan empirical correction for the atomic radii derived [113].GBMV2Generalized Born using Molecular Volume 2 (GBMV2) is a model based on theCFA and a correction term to it [89, 90]. The correction term was discovered27empirically, giving the following expression for ∆Gself,i:∆Gself,i =−166τ(1−1√2)(1Ri−14pi∫ ∫ ∞RidrdΩV (r)|r−xi|4)−166τ(14R4i−14pi∫ ∫ ∞RidrdΩV (r)|r−xi|7)1/4(2.59)where τ =(1εp −1εs), the first integral in each term is over the angular componentΩ, and the second is over the radial component r (the CFA term has been convertedto an integration over all space by adding a term related to the solute volume,V (r)). The integrals are approximated with numerical techniques. The volumerelated term involves contributions from both a molecular volume term SMV2(r)and a VDW volume term SVDW(r):V (r) =11+ eβ ((1− f (u))SMV2(r)+SVDW(r))−λ )(2.60)where β and λ are fitted parameters. The molecular volume is approximated as asuperposition of atomic volume functions FMV2(t j), where t j = r−x j, modulatedby a term to correct for overestimation of volume on the outside of the protein andoverestimation of the size of gap regions within:SMV2(r) = S0∑jFMV2(tt)∑ j |t j|2F2MV2(t j)|∑ j t2jFMV2(t j)|2FMV2(t j) =C2j(C j + |t j|2−R2j)2; C j = P1R j +P2(2.61)where S0, P1 and P2 are empirical parameters. For efficiency, they use a simplerfunction for regions within the VDW radii:SVDW(r) = 2∑jf (µ j) (2.62)28The switching function is defined as:f (u) =1, u≤ 01−10u3 +15u4−6u5, 1 > u > 00, u≤ 1u =|t|2− (Ri+ t−)2(R j + t+)2− (R j + t−)2(2.63)where t+ and t− are empirical parameters, of which there is two sets, one for themolecular surface switching function, and one for the VDW switching function.The Born radii derived from these expressions are then scaled linearly:RSborn,i =C1Rborn,i+C0 (2.64)where C0 and C1 are empirical parameters. Salt effects are taken into account bythe use of a Debye-Huckel screening parameter.The parameters involved in the linear scaling of the Born radii were optimizedagainst values calculated with the PB equation on a set of small proteins. Theremaining parameters were determined through trial and error coupled with theCHARMM22 forcefield, CHARMM22 VDW radii, and a SASA based nonpolarterm; details were not clearly provided.GBSWThe Generalized Born with a Simple Switching (GBSW) model [65] takes a similarapproach to the GBMV models. A correction term, ∆G1self,i is added to the CFA,∆G0self,i in the evaluation of the effective Born radii:∆Gself = a0∆G0self +a1∆G1self (2.65)29where a0 and a1 are empirical parameters. The terms are defined as:∆G0self =−12τq2i(1ηi−14pi∫|r−xi|>ηidrV (r)|r−xi|4)(2.66)∆G1self =−12τq2i(14η4i−14pi∫|r−xi|>ηidrV (r)|r−xi|7)1/4(2.67)where τ =(1εp −1εs), and ηi is an arbitrarily defined starting point for the integra-tion. The volume function is calculated using overlapping atomic radii with thesmoothing function defined in Equation 2.55:V (x) = 1−H(x) (2.68)The CFA is further approximated to give an analytical form:∆G0self,i =−12τq2i(1ηi−14pi ∑m ∑nwmwnV (xi+ rmn)r2mn)(2.69)while the correction is approximated as:∆G1self,i =−12τq2i(14η4i−14pi ∑m ∑nwmwnV (xi+ rmn)r5mn)1/4(2.70)where m and n are indices for radial and angular integration points, respectively,mw and nw are the respective weights for those integration points, and rmn is theintegration point at the origin. Salt effects are taken into account with a Debye-Huckel screening factor.The empirical parameters in the model were optimized by minimizing the er-ror of ∆Gself,i calculated with the model to values calculated with the PB equa-tion for all atoms of a single experimental structure of a folded protein. TheCHARMM22 forcefield was used with the optimized atomic radii of Nina et al.[112, 113]. A later study attempted to improve the balance between intra-proteininteractions and protein-solvent interactions with updated radii [21]. The PMFs ofseveral backbone-side chain hydrogen bonds calculated in explicit solvent using theCHARMM22/CMAP forcefield were used to tune the VDW radii and CMAP po-30tentials; following this, simulations of several small peptides were run with the im-proved parameter sets and further optimization carried out by matching secondarystructure frequencies to experimental values. A second study [21] introduced amodification to the evaluation of the Born radii, using an empirical form:R−1born,i = D−2C0τq2i∆G0self,i−2C1τq2i∆G1self,i (2.71)where D, C0, and C1 are empirical parameters. The intention was to reproduce amolecular surface by tuning the parameters to molecular surface-based PB resultson a set of small molecules. Further optimization of the radii and CMAP potentialswas carried out in a manner similar to the previous study. In both of these studies,the PMF calculations and simulations included a SASA based nonpolar term.FACTSFast Analytical Continuum Treatment of Solvation (FACTS)[53] is based on theidea that the Born radius is related to solvent displacement around an atom. Theycombine a measure of volume and a measure of symmetry, and empirically deter-mine a relationship between them and ∆Gself,i:∆Gself,i =−a1 +a11+ e−a2(Ci−a3);a1 =τ2Ri(1+ e−a2a3)(2.72)where a2, and a3 are empirical parameters. Ci combines the solute occupied volumeAi and symmetry Bi around atom i:Ci = Ai+b1Bi+b2AiBi (2.73)where b1 and b2 are empirical parameters. The solute occupied volume is approxi-mated as a weighted combination of VDW volumes:Ai =∑j 6=iVjΘi j (2.74)31and the symmetry as a weighted combination of unit vectors xˆ between atom i andatom j:Bi =∣∣∣∣∣∣∑ j 6=iVjri jΘi j1+∑ j 6=iVjri jΘi j∣∣∣∣∣∣(2.75)where xi j = xi−x j, xˆi j =xi jri j, andΘi j =(1−(ri jRspherei)2)2, ri j ≤ Rspherei0, ri j > Rsphereiwhere Rspherei is an empirical parameter. Salt effects are taken into accout with aDebye-Huckel screening parameter.The parameters were determined by minimizing the error between the self sol-vation energies calculated with FACTS and with a finite difference PB model usingCHARMM22 VDW radii and unit charges (rather than the partial charges from theCHARMM22 parameter set) on a dataset of protein structures. For each struc-ture, a folded, molten-globule-like, and extended conformation were included, thefirst set coming from experimental structures, while the latter two from simulationswith the CHARMM19 forcefield and the SASA implicit solvent model [42].ACEThe Analytical Continuum Solvent (ACE) solvent model [134] is based on the GBformalism and CFA approximations. The self energies are calculated by expressingthe integral over the protein volume as a sum of individual atomic contributions,and after using the CFA, modeling the charge and volume densities as Gaussian32distributions:∆Gself,i =−τq2i2Rcharge,i+∑i6= jGself,i j; Gself,i j =τ8pi∫drD(r)Pj(r)Di(r) =−qi∇erf(ai|r−xi|)|r−xi|; Pj(r) =43pi1/2α3e−(r−x j)2(αR˜ j)3 ; ai =pi1/221/2Rcharge,i(2.76)where α is an empirical parameter, erf denotes the error function, and Rcharge,i isthe charge radius (equal to the VDW radius except in the case of hydrogen, seebelow for formal definition). The volume density P(x) uses effective volumes foreach atom type V˜i, with associated radii R˜i, which are assumed to be independent ofthe protein geometry. This integral must be approximated to be solved analytically.The resulting expressions are rather convoluted, and included without expositionfor completeness:Gself,i j =(1−1εs)q2iωi je−r2i j/σ2i j +(1−1εs)q2i V˜j8pi(r3i jr4i j +µ4i j)41ωi j=43piα3i j(Qi j− arctanQi j)1αi jR˜ j;σ2i j =3(Qi j− arctanQi j)(3+ fi j)Qi j−4arctanQi jα2i jR˜2j ;Qi j =q2i j(2q2i j +1)1/2; fi j =2q2i j +1−12q2i j +1; q2i j =pi2(αi jR˜ jRcharge,i)2;αi j = Max(α,Rcharge,i/R˜ j); Rcharge,i = MaxNj (di j); di j = R jri jµi j =77pi(21/2)512(1−2pi3/2σ3i jRiωi jV˜j)Ri(2.77)The effective volumes, their associated radii, and the smoothing parameter α wereoptimized on experimental structures using the PARAM19 and PARAM22 radiiwith a method based on Voronoi polyhedra [135]. A later study provides a uni-33form correction to the volumes which were determined by comparing simulatedproperties of the folded states of three proteins to experimental values, as wellby comparing self solvation energies to PB calculations; however, this was onlydone with the PARAM19 forcefield [17]. In the CHARMM implementation ofACE, additional options are provided for an updated model referred to as Analyt-ical Continuum Solvent (ACE2), which puts an upper bound on the Born radii.No details are given on how this is done, and no papers are cited; however, theACE2 additions are used here as they are used in the only given example setup inthe CHARMM documentation; thus we consider it a default setting. Preliminarycalculations show it gives no significant systematic trend in the measures discussedin the results sections.SCPThe Screened Coulomb Potentials (SCP) implicit solvent model [57] takes a dif-ferent approach than the GB based models. No boundary between the solute andsolvent is used; instead, an empirical dielectric screening function is defined andrelated to the dielectric function under the assumption of spherical symmetry:D(r) =εs+11+ ke−λ (εs+1)r−1;ε(r) = D(r)(1+rD(r)dD(r)dr)−1(2.78)where k = εs−12 and λ is an empirically determined parameter. Using these rela-tions in the expression of the electrostatic free energy in terms of displacement andelectric fields and taking the difference between a solvated state with electrostaticscreening function Ds(r) and a vacuum state with electrostatic screening functionDv(r) gives the polar contribution to the free energy of solvation:∆Gpol =12 ∑i 6= jqiq jri j(1Ds(ri j)−1Dv(ri j))+12∑iq2i[1RBs,i(1Ds(RBs,i)−1)−1RBv,i(1Dv(RBv,i)−1)](2.79)34where RBs,i and RBp,i are the effective Born radii in solvent and in vacuum envi-ronments, respectively. They approximate the solvent and vacuum effective Bornradii by scaling contributions from Born radii in a pure solvent medium Rw,i, a pureprotein medium Rp,i, and in vacuum Rv,i with a measure of solvent accessibility ξi(in the original paper this involved numerically calculated SASAs, but was updatedto an analytical approximation in a later study [59]):RBs,i = Rw,iξi+Rp,i(1−ξi);RBv,i = Rv,iξi+Rp,i(1−ξi);ξi =14piR2i[Ai−Bi∑i 6= je−Ciri j](2.80)where Ai, Bi, and Ci are empirical parameters, and the Born radii are approximatedin relation to the covalent radii, Rcov,i:Rw,i = Rcov,i+hi; Rp,i = Rcov,i+gi; Rv,i = Rcov,i (2.81)A modified form is used for polar hydrogens to take into account that their VDWradii overlap with acceptor atoms in hydrogen bonds, where the VDW overlapbetween a specific acceptor and polar hydrogen is considered a unique dielectricenvironment with its own Born radius Ri j [58, 59]:RBs,i = Rw,iξ +m∑iRi jζi j +Rp,i(1−ξi−m∑iζi j);Ri j = Rcov,i+gi j; ζi j =Ri(R2j − (Ri+ ri j)4R2i jri j(2.82)where m is the number of acceptor atoms overlapping with polar hydrogen i.The empirical radius modification parameter for the solvent Born radius hi waschosen to be only dependent on the sign of the charge of the atom, giving onevalue for each. Averaging over atomic cavities in experimental structures led tothe selection of a single value to be added to hi to give gi. The screening param-eters λi are defined based on atom type for Ds(r) and residue type for Dv(r); for35the interaction term, λi j = (λiλ j)1/2. They optimized values by comparing exper-imental solvation free energies to those calculated from the model for amino acidside chain analogues. Values for gi j were determined by selecting the values thatgave best agreement of hydrogen bond strengths calculated with the model and theCHARMM22 forcefield using neutrally capped amino acids and accepted hydro-gen bonds strengths; they found they could decompose gi j into a donor contributionaveraged over all acceptors and an acceptor contribution averaged over all donors[58]. The parameters used in the measure of solvent accessibility, Ai, Bi, and Ci aredependent on the element, and were determined by optimizing the fit with self en-ergies for all atoms of a folded protein calculated with the original SASA method[59]. It is not made clear how Rcov,i were obtained.2.3.3 Effective energy functionsThe self-energy and nonpolar contributions to ∆Gsolv may be combined into a sin-gle term. This idea may be expressed rigorously by assuming no solute-soluteelectrostatic interactions and defining a solvation free energy density f (r) and in-tegrating over space [86]:∆Gdmfi =∫dr f (r) (2.83)which may be decomposed into individual solute group contributions:∆Gdmfi =∑i∆Gsolv,i (2.84)This has been referred to as a direct mean-field interaction contribution [162].These are usually approximated by scaling experimental free energies of relevantmodel compounds ∆Gref,i by their solvent accessibility in the protein. Togetherwith a modification of the Coulomb term to take into account the protein-proteinelectrostatic interaction contributions (screening), a full implicit solvent model canbe created; these are usually referred to as effective energy functions.Temperature dependence may be introduced into the model if enthalpies ∆Href,iand heat capacities ∆Cp,i at the reference temperature T0 are also measured [86].By assuming the heat capacity is independent of temperature, an integral of the36entropy over temperature may be taken, giving:∆Gref,i(T ) = ∆Gref,i(T0)−∫ TT0dT ∆Sref,i(T )= ∆Gref,i(T0)−∆Sref,i(T0)[T −T0]−∆Cp,iT lnTT0+∆Cp,i[T −T0]∆Sref,i =∆Href,i(T0)−∆Gref,i(T0)T0(2.85)EEF1In the approach taken by Lazaridis and Karplus [86] for Effective Energy Function1 (EEF1), the integral is reduced to only the solvent excluded volume, which is thenbe approximated by a product between atomic volumes of the solute group (whichin this case are single atoms) Vj and the solvation free energy density functionf j(ri j):∆Gdmfi,i = ∆Gref,i−∫Vjdr fi(r) = ∆Gref,i−∑jf j(ri j)Vj (2.86)The free energy density distribution is modeled as a Gaussian:fi(r) =αi4pir2 e−x2i ; xi =r−Riλiαi =2∆Gfree,ipiλi(2.87)where λi and ∆Gfree,i are empirical parameters. ∆Gfree,i is used instead of ∆Gref,ibecause the reference value is measured for an isolated atom, while in the protein itwill be covalently bound to other atoms, reducing its exposure to the solvent evenin the most extended state. The volume of atom i is calculated as the VDW volume37minus half the volume of the overlap with all covalently bound atoms [71]:Vi =43piR3i −∑j12V (2)i j ;V (2)i j =pi3(2R3i +2R3j + r3i j)−pi[R2i r∗i j +(ri j− ri j∗)(R2jri jr∗i j)];r∗i j =R2i −R2j + r2i j2ri j(2.88)The Coulomb term was modified with the addition of a distance dependent dielec-tric simply by setting ε = ri j.EEF1 was originally designed and optimized for the CHARMM19 forcefield.Corrections to the ∆Gref,i are made to account for the use of cutoffs in the treat-ment of long-range interactions: the model definition includes the use of a switch-ing function with an inner cutoff of 7 A˚ and an outer of 9 A˚. The LJ term is thusintegrated from 9 A˚ to infinity and subtracted from ∆Gref,i values. The values for∆Gfree,i were determined by calculating ∆Gdmfi,i of atoms buried in the interior ofa folded protein and adjusting ∆Gfree,i until some were equal to zero; for ionicgroups, ∆Gfree,i = ∆Gref,i. The partial charges of ionic atom groups were modifiedsuch that their net charge was zero. The value of λi is set to a single value, thewidth of the first solvation shell, for all but the ionic solute groups; a larger value isused based on the stabilities of several folded proteins during simulations run withthe model. A subsequent study compared the PMFs calculated for interactions be-tween ionic side chains of individual amino acids and compared to explicit solvent,finding the interactions to be too favourable [101]. Based on these results, modifi-cations were made to the partial charges, although the changes are only mentionedin the CHARMM documentation. Parameters for the CHARMM22 forcefield werealso introduced, but again, they are only mentioned in the CHARMM documenta-tion. Bottaro et al. [13] developed updated parameters for the CHARMM36 force-field [12] with an approach that allows simulated ensembles to be matched to ex-plicit references; here they chose two small peptides with α-helical and β -hairpinstructure. The refer to the model as EEF1-SB.38ABSINTHSelf-Assembly of Biomolecules Studied by an Implicit, Novel, and Tunable Hamil-tonian (ABSINTH) [162], while referred to as an implicit solvent model, is a fullforcefield intended for Monte Carlo (MC) simulations of intrinsically disorderedproteins. The implicit treatment of the solvent is in the form of a direct mean fieldinteraction and a screening term for Coulombic interactions. The direct mean fieldinteraction is approximated as:∆Gsolv =∑iζi∆Gref,i; ζi =∑jλ ijvij (2.89)where ζi is the solvent accessibility of solute group i and λ ij are the weight factorsof the solvation states vij of atom j of solute group i. The solvation states are deter-mined by the fraction of atomic solvent accessible volume (SAV) to its maximumvalue, modulated by a sigmoidal switching function:vij =[1+ e−η ij−d1τd]−1d2 +d3d1 = χdη ij,max +(1−χd)η ij,mind2 =[1+ e−η ij,max−d1τd]−1−[1+ e−η ij,min−d1τd]−1d3 = 1−d2[1+ e−η ij,max−d1τd]−1η ij = 1−1V ij,max∑k∑lγ jl4pi3(dkl2)3V ij,max =4pi3(rw+dij2)3−(dij2)3(2.90)where τd , χd are empirical parameters, rw is the radius of water (which is effectivelyanother empirical parameter), γi j is the overlap fraction (details of its computationare not given), η ij,max and η ij,min are the maximum and minimum SAV fraction (it is39stated they are constant for a given atom, but information is not given on how theyare determined), and dij is the diameter of atom j of solute group i (it is mentionedthat they are derived from the LJ parameters, although details are not given). Thescreening of the solute-solute charge interactions is approximated by multiplyingthe term by a screening function:si j = (1−avki )(1−avlj); a = 1−1εs(2.91)where the solvation state functions differ from those used for the direct mean fieldinteraction by using the distinct empirical parameters χd and τd .The optimization of the terms used here was done through trial and error, whichincluded comparing calculated quantities from simulations of small model com-pounds, peptides and proteins (of the native state, folding/unfolding, and intrinsi-cally disordered protein (IDP) ensembles) to data from explicit solvent simulationsor experiments; the details are not given. A new set of LJ parameters was devel-oped, while for the charges the OPLS-AA charge was selected. Parameters fortemperature corrections were measured by Wuttke et al. [169].2.3.4 Calculation detailsThe PB model, GBMV2, FACTS, ACE, SCP, and EEF1 are implemented in theCHARMM molecular simulation program [15], while ABSINTH is implementedin the CAMPARI molecular modeling program [168]. Energies were back calcu-lated at 50 ns intervals for NTL9, and at 5 ns intervals for BBL and Homeodomain;for the remaining systems, a 10 ns interval was used. The external, or solvent, di-electric constant was set to 80 for all models. The remaining settings for these andthe other models were left at default values unless otherwise specified. The SASAterm was calculated with a grid-based method implemented in CHARMM (COORmodule) using a probe radius of 1.4 A˚. Standard CHARMM22* settings were usedfor the vacuum calculations. Minimization of trajectories was carried out with thesteepest descent algorithm until an energy difference of less than 0.1 kcal/mol wasobserved.Here, we assumed that vacuum settings should be calculated with default CHA-RMM22* parameters and nonbonded settings. This includes a constant dielectric40of 1, a shifting function on the electrostatics with a cutoff of 12 A˚, a switchingfunction for LJ interactions with an inner cutoff of 10 A˚ and an outer cutoff of 12A˚, and a nonbonded list cutoff of 14 A˚. Solvent effects were considered as anychange to this vacuum energy, whether it be a direct consequence of the terms inthe model, or due to any additional parameters and/or settings the model employs.The PB model, FACTS, and SCP do not introduce any nonstandard parameters orsettings. However, the implementation documentation for GBMV2, GBSW, ACE,and ABSINTH recommends several nonstandard nonbonded settings, while non-standard settings are an integral part of the EEF1 model. GBMV2 recommends aswitching function on both the electrostatic and the LJ interactions with an innercutoff of 16 A˚ and an outer cutoff of 18 A˚, and a nonbonded list cutoff of 21 A˚.GBSW recommends a switching function on the cutoffs for both the electrostaticand the LJ interactions with an inner and outer cutoff of 16 A˚, and a nonbondedlist cutoff of 20 A˚. ACE also recommends a switching function on both electro-static and LJ interactions, but with an inner cutoff of 8 A˚ and an outer cutoff of12 A˚, and a nonbonded list cutoff of 13 A˚. EEF1 requires a distance dependentdielectric, a switching function on both the electrostatic and LJ interactions withan inner cutoff of 7 A˚ and an outer cutoff of 9 A˚, and a nonbonded list cutoff of 10A˚. Further, EEF1 requires that the charged residues be neutralized, requiring thetopology file itself be modified. In the case of ABSINTH, being implemented ina different molecular modeling suite, many things are calculated differently; thusthe solvent effect was defined as the difference between the total energy calculatedwith the model turned on and with the model turned off, keeping all other settingsstandard (other than the use of the nonstandard CHARMM22* charge set). Sinceonly relative energies are being considered in this study, for these choices to havean impact would require the effect to differentially favour the folded or unfoldedstate. Finally, it could be argued that vacuum energies for a given solvent modelshould be calculated with the same parameters and settings, and implicit solventmodels be defined as any change to these energies; however, it makes no differ-ence to this analysis, as we are only interested in the difference between ∆∆Gexpsolvand ∆∆Gimpsolv, which would remain unchanged.When considering internal dielectrics other than unity, we followed the scheme41laid out by Wang and Kollman [166]. The free energy of folding then becomes:∆Gfold =−T∆SV +1n∆E¯eleMM,V +∆E¯remMM,V +(∆Gapolarsolv,F −∆Gapolarsolv,U)+(∆Gn-80solv,F−∆Gn-80solv,U) (2.92)where n is the value used for the internal dielectric, ∆E¯eleMM,V is the molecular me-chanical (MM) electrostatic energy, and ∆E¯ remMM,V is the energy resulting from theremaining MM terms. This, with the ∆Gfold,S calculated with statistical weights,can be rearranged as before to yield ∆∆Gexpsolv.The temperature dependent εs was calculated with the following empirical re-lation [100]:ε = 87.740−0.40008T +9.398×10−4T 2−1.410×10−6T 3 (2.93)2.4 Entropy calculations2.4.1 Entropy calculation from trajectoriesIt is often assumed that the entropy of a system can be decomposed into transla-tional, rotational, and configurational components:S = Strans +Srot +Sconfig (2.94)There are two distinct approaches for calculating Sconfig from molecular simulationtrajectories: parametric methods, which assume a functional form for the probabil-ity density of conformational space f (x), and nonparametric methods.Quasiharmonic approximationThe quasiharmonic approximation (QHA) approach [3, 73, 91, 136] is a popularparametric method that begins by assuming that f (x) takes the form of a multivari-42ate normal distribution:f (x) =e−(x−〈x〉)TΣ−1(x−〈x〉)(2pi)k/2|Σ| 12(2.95)where Σ is the covariance matrix of the atomic positions and 〈x〉 is the averageposition vector. Elements of the eigenvector basis set of the mass weighted covari-ance matrix Σ′ = M12ΣM12 are known as quasiharmonic modes. The entropy ofthe system can be described by the sum of the one-dimensional quantum-harmonicoscillator entropies of these modes:Sho =3n−6∑iSho,i (2.96)Sho,i =kαeα −1− ln(1− e−α) (2.97)where α = h¯ωkT , and the frequencies ω are related to the eigenvectors of the quasi-harmonic modes λi:ωi =kTλi(2.98)It can be shown that Sho gives an upper bound of Sconfig [136].CC-MLAThe configurational entropy of a single molecule is sometimes further decomposedinto vibrational and pure conformational components:Sconfig = S¯vib +Sconform (2.99)The correlation corrected multibody local approximation (CC-MLA) entropy es-timator [150, 152] takes a nonparametric approach to give an upper bound forSconform (when calculating relative entropies, vibrational contributions will usuallycancel). Conformational space is first coarse grained to the level of torsion an-gles and then discretized. This is done by first estimating the density function ofeach torsion angle θ using a kernel density estimate approach. Specifically, a vonMises kernel, an approximation to the circular analogue of the normal distribution,43is used:fˆ (θ) = ∑Ni=1 evcos(θ−θi)2piNI0(v)(2.100)where I0 is the modified Bessel function of order 0, N is the number of confor-mations, θi is the ith realization of θ , and v is the inverse of the bandwidth, orsmoothing parameter of the density estimator (the analogue of the variance σ in anormal distribution), which can be determined using the von Mises-scale plug inrule. The m minima of this density function are then found through a grid searchand used to partition torsion angles into the set of intervals:{[θmin,1,θmin,2), . . . , [θmin,m−1,θmin,m), [θmin,m,θmin,1)}allowing the continuous random variable θ to be transformed into the discrete ran-dom variable A. If the probability mass function f (A ), where A = {A1, . . . ,AM}and M is the number of torsion angles, could be accurately estimated, then theentropy could be calculated as Shannon information entropy:Sconform(A ) =−kBN∑ f (A ) ln f (A ) (2.101)where there are N possible configurations of the system. However, this is usuallynot possible, and generally:Sconform(A ) 6=M∑iSconform(Ai) (2.102)because of correlations between the torsion angles. Instead, truncations of expan-sions based on mutual information are used as an approximation. The CC-MLAestimator takes the formSˆL = ∑Ai∈AS(Ai)+ [SL (A )−S′L (A )] (2.103)withSL (A ) =M∑i=1[S(Li)−S(Li−{Ai})] (2.104)44andLi = {A j ∈A | j ≥ i, d(Ai,A j) < R} (2.105)where d(Ai,A j) is the distance between the torsion angles Ai and A j, R is an ad-justable distance, and S is the Shannon information entropy (now feasibly calcu-lable with the given sets). S′L has the same form as SL but the ordered seriesproduced by the random variables Ai have been randomly and independently re-ordered. Basically, the estimator is set up so that only correlations between torsionangles that are within distance R are considered, as it is assumed that the contribu-tions of correlations between distant angles will be negligible. Further, the term inwhich the frames for individual torsions angles have been reordered is present toremove false correlations. They show that the method gives an upper bound, andso the best cutoff distance R can be selected by calculating the entropy at severalvalues for the system and selecting that which gives the minimum value.Additive joint entropiesThe method used by Baxa et al. [9] is referred to here as the additive joint entropymethod. The entropies of each residue are calculated with joint probability densityfunctions of the dihedral configurations and summed. For residues with two or lessside chain dihedrals χi, the entropy is calculated as:S =−kBN∑P(φ ,ψ,χ1,χ2) lnP(φ ,ψ,χ1,χ2) (2.106)where N is the total number of configurations of the dihedrals. P(φ ,ψ,χ1,χ2) isestimated by partitioning the dihedrals angles into equally sized bins and calculat-ing the associated probability mass function. For residues with more than two sidechain dihedrals, the entropy is calculated as:S =−kB[N∑P(φ ,ψ,χ1) lnP(φ ,ψ,χ1)+M∑P([χn]) lnP([χn])−m∑P(χ1) lnP(χ1)](2.107)where N, M, and m are the total number of configurations of the dihedrals of theassociated probability functions.452.4.2 Calculations detailsThe QHA entropies were calculated with the implementations in both the CHARMMVIBRAN and COOR modules. CENCALC [153] was used for the entropies calcu-lated with the CC-MLA method, while an in-house implementation of the approachof Baxa et al. [9] with a bin size of 10◦ was used for the alternative approach.2.5 Uncertainty estimates2.5.1 Block bootstrappingBootstrapping is a resampling technique that allows the distribution of a func-tional of an unknown probability distribution T (F) to be estimated. In an exper-iment, the random variable X ∼ F is observed multiple times to form a dataset,χ = {X1, . . . ,XN}. From this dataset, only a single instance of T (Fˆ), often thesample mean, can be observed. To create more data to calculate T (Fˆ) from, boot-strap samples of pseudo data, χ∗ = {X∗1 , . . . ,X∗i }, may be created by resamplingwith replacement from χ according to the empirical distribution function, X∗ ∼ Fˆ .The main assumption of the bootstrapping approach is to equate the distributionof T (Fˆ) to T (F), allowing interesting features of the distribution of T (F) to beestimated, often the standard error of the sample mean. However, this approachassumes Xi to be independently and identically distributed, which will destroythe correlation structure in the case of dependent data. To account for this, amoving block bootstrap [82] approach may be used to create the samples. Here,all blocks of l adjacent Xt are formed from the dataset, which is now the seriesχ = (X1, . . . ,XN), to create the set B = {(Xi, . . . ,Xi+l−1) | i ∈ [1,N− l+1]}. FromB, b blocks may be sampled and concatenated, where b = Nl and is rounded tothe nearest integer, to create the bootstrapped samples, χ∗ = (X∗1 , . . . ,X∗N). Variousmethods exist to optimize the block length, such as subsampling plus bootstrap-ping.More details can be found in the text Computational Statistics [49].462.5.2 Calculation detailsThe block length used for bootstrapping was set to be equal to the slowest impliedtimescale of the system calculated from the MSM. For all energy terms, the tra-jectory was resampled 1000 times, while for the entropies, the trajectory was onlyresampled 10 times for the CC-MLA method, and 100 times for the Baxa et al.method, due to the computational expense.47Chapter 3Results and discussion3.1 OverviewWe use states defined by MSMs to calculate ∆∆Gexpsolv and ∆∆Gimpsolv for two systems,finding the folded states to be overstabilized by the implicit solvent models. We ex-amine parameters in the nonpolar and polar terms of models decomposable in thisway to try and correct for this. We repeat the calculations on ten more systems,finding similar results. Several potential criticisms are then addressed, relating totransferability of the models, dependence of the results on the non-standard temper-atures of the simulations, accuracy of the entropy calculation method, and accuracyof the explicit solvent reference model.3.2 Calculation setup3.2.1 Defining statesA key step in our analysis was to create a robust definition of the folded and un-folded states. A simple way of defining these states is to select a root mean squaredeviation (RMSD) cutoff relative to an experimental structure. However, this in-troduces a certain amount of arbitrariness as to what constitutes folded and un-folded states. Therefore, we defined folded and unfolded states by using MSMs[120] built from the trajectory data (set Section 2.2.2 for details). The MSMs48were validated by examining the implied time scales (ITSs) (see Section 2.2.1 forexplanation), and by comparing MSM definitions with RMSDs from experimen-tally determined structures of the folded state. Figures 3.1a and 3.1d contain theITS plots that were calculated using the MD trajectories of the chignolin peptidevariant CLN025 (chignolin), a small β -hairpin protein (Figure A.1), and the K8Amutant of the thermostable Trp-cage variant TC10b (Trp-Cage), a small α-helicalprotein (Figure A.2, and see Table 2.1, Table 2.2, and Section 2.1.2 for system de-tails). The ITS plots for both systems show well resolved, slow implied timescalesthat appear to have converged, indicating both that the model is able to detect thefolding-unfolding transition, and that an optimal lagtime can be selected. In addi-tion, the RMSD/MSM-state superimposition plots in Figures 3.1c and 3.1f showthat conformations with low RMSDs correspond well with folded states defined bythe MSM. These results give us confidence that we can reliably identify the dom-inant states (particularly the folded and unfolded states) sampled in the explicitsolvent MD simulations.3.2.2 Checking convergenceA concern in our calculations was the convergence of the entropies. Indeed, cal-culation of entropies from trajectories has been an ongoing challenge in the field[47], with convergence being especially difficult to achieve relative to other values.In order to test for convergence of ∆SV, we plotted the entropy at various interme-diate time steps using a recent method known as the Correlation Corrected-LocalMultibody Approximation (CC-MLA) [152]. As seen in Figures 3.1b and 3.1e, theentropy appears to have converged for both the folded and unfolded states of chig-nolin and Trp-cage; it is then safe to also assume that the MM energy terms andimplicit solvent terms have converged [151]. A related concern was the calculationof uncertainties for energy and entropy terms. Here, we estimated all uncertaintieswith a block bootstrapping approach (see Section 2.5 for details).490.0 0.1 0.2 0.3 0.4 0.5Lagtime (µs)0.000.080.160.240.320.400.48ITS (µs)0 20 40 60Frames (100−1 )10.512.013.515.016.518.0TS (kcal/mol)0 20 40 60 80 100Time (µs)0.01.53.04.56.07.59.0RMSD ()0.0 0.1 0.2 0.3 0.4 0.5Lagtime (µs)0.00.51.01.52.02.53.0ITS (µs)0 30 60 90 120Frames (100−1 )15.017.520.022.525.0TS (kcal/mol)0 40 80 120 160 200Time (µs)0.02.55.07.510.012.5RMSD ()a) b) c)d) e) f)Figure 3.1: State and entropy calculations for chignolin and Trp-Cage. a, b,and c refer to chignolin while d, e, and f refer to Trp-cage. ITS plots ofthe slowest timescales are shown in a and d. Conformational entropiescalculated using the CC-MLA method with data up to a number of inter-mediate time points are shown for the folded state (brown) and unfoldedstate (purple) in b and e. RMSD/MSM-state superimposition plots areshown in c and f; the RMSDs relative to experimental structures of thefolded state are shown as purple points, and MSM-defined folded statesare shaded in brown.3.3 Expected and calculated ∆∆Gsolv3.3.1 Default settingsWe chose eight implicit solvent models for comparison: a finite difference PBmodel [113] (referred to here as PBEQ), GBMV2 [90], GBSW, FACTS [53],ACE [134], SCP [57], EEF1 [86], and ABSINTH [162]. Other than EEF1 andABSINTH, which provide effective energy functions (EEFs), the models consistof both a polar and an nonpolar term. For consistency, we elected to use a simplelinear SASA model with a surface tension coefficient λ = 0.015 kcal/A˚/mol forthe nonpolar term. Outside of the nonpolar term and system specific parameters(temperature or salt concentration), we used default and recommended parameters504856647280Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL102030405060Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH604530150a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure 3.2: Changes in free energies of solvation upon folding with defaultsettings. Expected values are plotted in brown, implicit solvent modelvalues with default settings in purple, and implicit solvent model val-ues without the nonpolar term in yellow. The error bars represent thestandard error.for each model (further details can be found in Section 2.3). ∆∆Gexpsolv and ∆∆Gimpsolvof chignolin and Trp-cage are plotted in Figures 3.2a and 3.2b. Independent of thesolvent model used, ∆∆Gimpsolv is significantly lower than ∆∆Gexpsolv, with most modelsproducing substantial differences relative to the magnitude of ∆Gfold,S. These re-sults imply that the implicit solvent models overstabilize the folded state relative tothe unfolded state. The EEFs, ABSINTH and EEF1 provided ∆∆Gimpsolv values thatcome closest to the ∆∆Gexpsolv values for both systems.513.3.2 Examining the nonpolar termIt is well known that the nonpolar term in decomposable solvent models representsa crude approximation for large biomolecules [88, 92, 163, 164]. Therefore, wetested whether λ could be changed to improve the results. As it is generally ac-cepted that the nonpolar term favours the folded states of proteins [6, 7, 23, 102],and as our results indicated that the implicit solvent models were overstabilizingthe folded state, we tried decreasing the value of λ . However, as can be seen in Fig-ures 3.2a and 3.2b, even setting it to zero still leaves a large discrepancy between∆∆Gexpsolv and ∆∆Gimpsolv. Further testing with more involved models of the nonpolarterm would then provide little insight, as the error due to the nonpolar term cannotbe separated from the error due to the polar term. We did not fail to notice thatnegative values could be used to further close the gap – in fact, it is possible toprovide surprisingly good agreement for all models in both systems by selecting avalue for λ through visual inspection (Figure A.3) – but we consider the necessityof a nonphysical parameter an indication of issues with the polar term, rather thana solution to the revealed discrepancies.3.3.3 Examining the polar termWithout the nonpolar term, ∆∆Gimpsolv should be greater than ∆∆Gexpsolv, allowing pos-itive differences between ∆∆Gexpsolv and ∆∆Gimpsolv to be attributed to error in the polarterm. While each model for the polar term considered here has several uniqueparameters, there are a few common parameters corresponding to physical quan-tities that may be examined systematically. As several of the simulations wererun with non-zero salt concentrations, we first examined how large an effect thecorrections for these had on the results. Besides PBEQ, three of the GB basedmodels, GBMV2, GBSW, and FACTS, are able to account for salt effects by useof a Debye-Huckel screening factor. We reran the calculations without correc-tions; however, it was found that the contribution was unsubstantial (Figures A.5aand A.5b). Next, we considered whether alternative internal, or solute, dielectricconstants εp would improve the results. We recalculated ∆∆Gexpsolv and ∆∆Gimpsolv formodels with adjustable εp, PBEQ, GBSW, and ACE, without the nonpolar termusing εp values of 2, 4, 8, 20, and 40 (see Section 2.3.4 for details of calculations).521 2 4 8 20 400510152025Chignolin1 2 4 8 20 400816243240Trp-cage1 2 4 8 20 400510152025BBA1 2 4 8 20 4008162432Villin1 2 4 8 20 400816243240WW domain1 2 4 8 20 4008162432NTL91 2 4 8 20 400510152025BBL1 2 4 8 20 4008162432Protein B1 2 4 8 20 4003691215Homeodomain1 2 4 8 20 40020406080100Protein G1 2 4 8 20 400255075100125α3D1 2 4 8 20 4020100102030a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆∆Gsolv (kcal/mol)²pFigure 3.3: Differences between expected and implicit solvent model free en-ergies of solvation upon folding (∆∆Gexpsolv- ∆∆Gimpsolv) for various internaldielectrics. PBEQ values are plotted in purple, ACE model values inyellow, and GBSW model values in blue. The nonpolar term is not in-cluded for either implicit solvent model. The dielectrics are plotted ona logarthimic scale.The difference between ∆∆Gexpsolv and ∆∆Gimpsolv decreases substantially as εp is in-creased (Figures A.4 and 3.3, a and b), although the decrease is still not enough tomake them agree within uncertainty.When running simulations with current implementations of these models, it isonly possible to use a constant or distance-dependent value for the internal dielec-tric constant. However, state dependent εp values can be used when post-processingtrajectories. Therefore, we tested whether a specific combination of state depen-531 2 4 8 204012482040Chignolin1 2 4 8 204012482040Trp-cage1 2 4 8 204012482040BBA1 2 4 8 204012482040Villin1 2 4 8 204012482040WW domain1 2 4 8 204012482040NTL91 2 4 8 204012482040BBL1 2 4 8 204012482040Protein B1 2 4 8 204012482040Homeodomain1 2 4 8 204012482040Protein G1 2 4 8 204012482040α3D1 2 4 8 204012482040a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor504030201001020304050²F p²UpFigure 3.4: Differences between expected and PBEQ free energies of sol-vation upon folding (∆∆Gexpsolv- ∆∆Gimpsolv) for state specific internal di-electrics. εFi and εUi are the internal dielectrics used for the folded andunfolded state, respectively. The color bar is scaled between -50 kcal/-mol and 50 kcal/mol to allow finer differentiation of more productivecombinations. The nonpolar term was not used in the calculations.dent εp values would provide consistently better agreement between ∆∆Gexpsolv and∆∆Gimpsolv. We calculated the differences between ∆∆Gexpsolv and ∆∆Gimpsolv for differentcombinations of εp for the folded and unfolded states (Figures 3.4a and 3.4b). Inboth cases, the discrepancy is almost entirely eliminated with a εp of 40 for thefolded state and 20 for the unfolded state.Warshel and coworkers have pointed out that εp is better thought of as a phe-nomenological, position dependent scaling constant rather than a physical constant,54partly based on their findings that optimal values for εp are both model and systemdependent [137, 167]. The dielectric constant of a material can be measured exper-imentally; however, in the case of proteins, only values for dry powders have beenmeasured, the relation of which to a solvated protein dielectric constants is notclear. Fortunately, the connection between this macroscopic concept to statisticalmechanics has been made, with theory giving a positional and directional relation-ship between the dielectric constant and the fluctuation of dipoles [52], allowing itsdetermination in the relevant context. εp, on the other hand, is perhaps best thoughtof as a coefficient implicitly representing the dielectric effects of the degrees offreedom that have been excluded from the model, implying that it should only beset to one if all relevant degrees of freedom of the protein and the solvent are rep-resented. As detail is removed (moving to a nonpolarizable model, representingthe solvent implicitly, etc.), the value must be increased to implicitly account fortheir effects, found to be the case in investigations of Schutz and Warshel [137].Here, then, there is justification for higher values of εp giving better agreement with∆∆Gexpsolv. The results of the state specific dielectric calculations seem to imply thatthe effects the solvent has on the fluctuations of dipoles are stronger in the foldedstate than the unfolded state.The atomic radii have been the target of several optimization attempts, oftenlinked with changes to the definition of the protein-solvent boundary. The radiiused for PBEQ are based on optimized CHARMM22 VDW radii for use with asimple overlapping atomic radii surface definition [112]; modifications were foundnecessary when the surface definition was updated to include a smoothed transi-tion region for the dielectric between the protein and solvent values [113]. Theradii used by GBSW, which defines the boundary in a similar manner, are also theNina et al. [112] set; it was later found that modifications were necessary, here tocorrectly balance intra-protien interactions with protein-solvent interactions, whilea subsequent study introduced a molecular surface definition with associated radii[21]. We recalculated ∆∆Gimpsolv with these updated radii sets, as well as with theoriginal CHARMM22 radii to test their effect under our measure. Interestingly,Figures 3.5a and 3.5b show the updated radii sets and molecular surface definitionsgive slightly worse results then the CHARMM22 radii, although the effect wasnot substantial. The optimization of the radii with GBSW was carried out along-55side optimization of the CMAP parameters, so we reran the calculations with theCHARMM22/CMAP forcefield with these update parameters. While they reducethe increasing gap with newer radii, they do not lead to any overall improvement.3.3.4 Repeating with all systemsWe then considered whether these results were specific to the systems we tested.Accordingly, the analysis was repeated on the ten other systems simulated in Lindorff-Larsen et al.’s study [94]. The MSMs were validated and the convergence of theterms checked as before (Figures A.6-A.15). In most cases, the implicit solventmodels have significantly, and in many cases, substantially lower values of ∆∆Gimpsolvthen ∆∆Gexpsolv (Figure 3.2, also see Tables A.1 and A.2 for exact values). As before,we tried setting λ to zero, removing salt corrections, increasing the εp values forPBEQ ACE, and testing updated radii and surface definitions; similar trends arefound across the systems (Figures 3.2, A.5, A.4, and 3.3). Using state-specific εpvalues (Figure 3.4) revealed that only one additional system (Figure 3.4e) was ableto be improved with the tested combinations; clearly, though, there exists someset of εp with higher values for the folded state that will bring full agreement foreach system. Importantly, no universal εp will minimize the discrepancies between∆∆Gexpsolv and ∆∆Gimpsolv for all systems, consistent with the arguments of Warshel andcoworkers that the idea of εp as a universal constant for protein systems is incorrect[137], and, besides being highly dependent on the model used, is dependent on thesystem itself.3.4 Potential criticisms3.4.1 Transferability of modelsOur findings may be tempered by criticisms of the validity of our method for test-ing the accuracy of the models. The first concerns the transferability of the models,relevant in two contexts: whether it is appropriate to apply the models to trajec-tories created with explicit solvent, and whether it is appropriate to consider themodels without the exact settings they were parameterized under. It is possiblethat some of the structures produced by the explicit solvent simulations are much564856647280Chignolin405060708090Trp-cage301501530BBA1530456075Villin153045607590WW domain80100120140160NTL930150153045BBL01530456075Protein BPBEQ-P22PBEQ-N99GBSW-P22GBSW-N97GBSW-C06GBSW-C1030201001020HomeodomainPBEQ-P22PBEQ-N99GBSW-P22GBSW-N97GBSW-C06GBSW-C1050250255075Protein GPBEQ-P22PBEQ-N99GBSW-P22GBSW-N97GBSW-C06GBSW-C104004080120160α3DPBEQ-P22PBEQ-N99GBSW-P22GBSW-N97GBSW-C06GBSW-C1075604530150a) b) c) d)e) f) g) h)i) j) k) l) λ-repressor∆∆Gsolv (kcal/mol)Figure 3.5: Changes in free energies of solvation upon folding with alterna-tive radii sets and surface definitions. Expected values are plotted inbrown and implicit solvent models with alternate radii in purple. P22refers to the CHARMM22 radii set, N97 to the Nina et al. [112] radiiset, N99 to the [113] modified set, C06 refers to the Chen et al. [24]radii set, and C10 to the Chen [21] radii set and associate molecularsurface definition. For GBSW-C06 and GBSW-C10, the calculationswere repeated with the modified CMAP potentials, with the new refer-ence values as brown points and the new solvent model values in yellow.Note that half as many frames were used in the PBEQ-P22 calculations,leading to higher uncertainties.57less favourable under an implicit solvent model, but that subtle structural changescould lead to a return to similar stability. To test for such artefacts, we minimizedthe trajectories with the implicit solvent models before running the analysis; how-ever, the results are not significantly affected (Figure A.16).It is possible that fortuitous cancellation of errors occurs when the models areused with the same forcefield they were parameterized with. All the models wereparameterized with forcefields other than the one used here, and in the case ofthose models decomposable into polar and nonpolar components, with a differentnonpolar model. Both of these points have already been implicitly addressed inour calculations: the latter with the demonstration that a large fraction of the errorcan be pinned on the polar term, and the former with the calculations of GBSWshowing small but mostly insubstantial improvement when including the coparam-eterized CMAP terms with the updated radii. To more thoroughly investigate thesecond point, we repeated the analysis with the forcefield and settings used in theparameterization of each of the solvent models implemented in CHARMM. Ascan be seen in Figure 3.6, the changes were generally insignificant, except for sev-eral cases in which EEF1 does significantly worse. In the same figure, we includedcalculations for a reparameterization of EEF1 for CHARMM36 carried out by Bot-taro et al. [13]. Considering they used an approach that allows the parameters tobe optimized by matching the entire ensembles of two small structure peptides, itis somewhat surprising that the new parameter set provides little improvement.3.4.2 Temperature effectsImplicit solvent models are usually designed to reproduce ∆Gsolv values at roomtemperature, yet many of the simulations used in the study were carried out at tem-peratures upwards of 340 K. There is evidence that the higher temperatures useddo not invalidate the study. Both Trp-cage and BBL were simulated at room tem-perature, but display no better agreement among ∆∆Gexpsolv and ∆∆Gimpsolv. In the caseof EEF models, temperature effects are able to be taken into account by a relationof the reference ∆Gsolv values of the model compounds to temperature that onlyrequires the enthalpies of solvation and heat capacities of the model compounds atthe reference temperature (see Section 2.3.3 for details). EEF1 uses this relation by58020406080Chignolin20406080100Trp-cage50250255075BBA01530456075Villin153045607590WW domain04080120160NTL9200204060BBL01530456075Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1EEF1-SB45301501530HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1EEF1-SB50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1EEF1-SB60060120180240α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1EEF1-SB75502502550a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure 3.6: Changes in free energies of solvation upon folding with differ-ent forcefields. Expected values are plotted in brown, implicit solventmodel values with the default values in purple, and implicit solventmodel values with the forcefields they were parameterized with in yel-low. GBMV, GBSW, ACE, SCP, and EEF1 used CHARMM22, FACTSCHARMM22/CMAP, and EEF1-SB CHARMM36.default in its implementation in CHARMM, so we had also used the temperaturecorrection of ABSINTH in Figure 3.2 with the parameters of Wuttke et al. [169]for consistency. Recalculating without these corrections revealed they made littledifference to the results (Figure A.17). Few attempts have been made to providetemperature corrections for models with separable polar and nonpolar components;however, one example is provided by Elcock and McCammon [36], who used atemperature dependent εs and atomic radii scaled by fitted polynomials. Here, we59tried only the temperature dependent solvent dielectric, using the empirical relationderived by Malmberg and Maryott [100] (see Section 2.3.4 for details). However,as seen in Figure A.17, this correction made almost no change.3.4.3 Entropy calculationAs mentioned previously, entropy has been one of the most difficult quantities tocalculate directly from trajectories, and a consensus has not been reached on thebest approach. To test whether the results are dependent on the method used tocalculate the entropy, we tried using two other methods in the analysis. We firsttested the QHA approach, a classic method commonly employed in MM-PBSAstudies [60]. However, the results were completely unreasonable, giving entropydifferences on the order of thousands of kJ/mol; this is likely related to assum-ing a Gaussian distribution for the unfolded state. A previous study also foundthe approach to be inaccurate [20]. For a second approach, we selected a rela-tively simple method used in a recently published paper [9], which we refer to asthe additive joint entropy method (see Section 2.4.1 for details). Given that thejoint entropy method was used on trajectories with NMR restraints to create moreaccurate ensembles, we emphasize that here we are interested only in the confor-mational entropy changes of these trajectories, not in the actual entropy changesupon folding of the real proteins. The alternative entropies often differed by upto a factor of two (Table A.1), and did generally lead to better agreement between∆∆Gexpsolv and ∆∆Gimpsolv (Figure 3.7), in a few systems leading to agreement withinuncertainty. However, in the majority of cases there is still substantial discrepancy,and we put more weight on the results calculate with CC-MLA as it more carefullydeals with correlation effects.3.4.4 Accuracy of explicit solvent modelBecause we are using trajectories resultant from explicit solvent simulations of thesystems, the water model employed, TIP3P, is effectively acting as the reference.If the water model is inaccurate, the conformational weights produced in simula-tion will also be inaccurate, putting our analysis in question. To reiterate a pointtouched on in Section 1.4, it is only inaccuracies caused by the water model that60485664728088Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL102030405060Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH604530150a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure 3.7: Changes in free energies of solvation upon folding with additivejoint entropies. Expected values with CC-MLA entropies are plottedin brown, expected values with additive joint entropies in blue, implicitsolvent model values with default settings in purple, and implicit sol-vent model values without the nonpolar term in yellow. The error barsrepresent the standard errors.are relevant: if the forcefield produces a poor representation of conformationalspace, the water models can still correctly solvate these incorrectly weighted con-formations. At least for small compounds, studies that compared experimental freeenergies of hydration against those calculated using TIP3P show it to be relativelyaccurate in this regard [104, 105, 111]. While we have considered this a poor testof the accuracy of an implicit solvent model, because an explicit solvent model hasfewer approximations involved, it can be expected to be more transferable. Further,61explicit solvent models have been successful in predicting various structural, ther-modynamic, and kinetic properties of both structured [94] and unstructured [95]proteins.Still, the relatively simple TIP3P is certainly not perfect. Estimating the mag-nitude of the associated uncertainty in our analysis is nontrivial. As a first approx-imation, we assumed that only the weights of the folded and unfolded states as awhole were in question, and recalculated ∆∆Gexpsolv with ∆Gfold,S values correspond-ing to folded to unfolded ratios of 0.01 and 99. Figure 3.8 depicts that, even withsuch extreme values for the ratios, the changes are generally small compared tothe disparity between ∆∆Gexpsolv and ∆∆Gimpsolv. This crude calculation also illustratesa key challenge that implicit solvent models must address: ∆∆Gimpsolv must be pre-cisely balanced with the other terms in order to recapitulate the folding free energy,which is often two orders of magnitude smaller. Perhaps the most important pointhere is not the accuracy of the explicit solvent models, but whether implicit solventmodels can be expected to better represent solvation effects. As simulations runwith explicit solvent models figure prominently into the parameterization of manyimplicit solvent models (see Section 1.3, it is likely they will reflect errors presentin explicit models.62485664728088Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL01530456075Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH604530150a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure 3.8: Changes in free energies of solvation upon folding with range offolded to unfolded ratios. Expected values with free energies of foldingcorresponding to folded to unfolded ratios ranging from 0.01 to 99 areplotted in brown, implicit solvent model values with default settings inpurple, and implicit solvent model values without the nonpolar term inyellow. The error bars represent the standard errors.63Chapter 4ConclusionOur results reveal significant and substantial discrepancies between the changesin free energies of solvation upon folding calculated with popular implicit solventmodels used in simulations of large biomolecules and the expected free energychanges (i.e., those that are consistent with the conformational sampling observedin explicit solvent simulations). For implicit solvent models that relate the entire∆Gsolv to solvent accessibility, the discrepancy was often found to be the smallest.Somewhat paradoxically, the reason for this may stem from these models havingless clearly defined physical origins: they lack the highly parameter sensitive termsof PB-based models, making them more robust and perhaps explaining their suc-cessful use in modeling and design of large biomolecules (an EEF1-based implicitsolvent models used by Rosetta [31, 130]). PB- and GB-based models have alsobeen used successfully in various applications, including ab initio protein folding,protein modeling and design, and prediction of changes in free energies of foldingupon single residue mutations [35, 109, 131, 154, 173]. Nevertheless, concernsabout the use of a universal protein dielectric constant in the latter models havebeen raised, and recent studies have also questioned the validity of the assump-tion of a positive, unvarying surface tension coefficient for models of the nonpolarcomponent of solvation [55, 63, 79, 80]. Although a negative λ provided improvedagreement between ∆∆Gexpsolv and ∆∆Gimpsolv, we consider it an indication of the limi-tation of the models, rather than a solution to the revealed discrepancies.Further investigations are needed to understand the source of these discrepan-64cies. A first step may be to partition the unfolded state along selected structuralmeasures (e.g., the radius of gyration or number of salt bridges) and test whether itis possible to reweight them in such a way as to consistently provide better agree-ment across the models and systems. This could give insight into the types ofconformations and interactions that contribute to the discrepancies seen here. Asecond step may be to run simulations of these systems with the tested models andcompare the resulting ensembles, with a focus on examining the differences in theunfolded state. This would also provide us with an opportunity to further validatethe entropy methods: because we know the solvation term exactly, we can calcu-late an expected conformational entropy and compare to various entropy methods.More ambitious investigations might attempt to optimize the atomic radii to givebetter agreement, perhaps with a genetic algorithm or with the ensemble matchingmethod of Bottaro et al. [13].Overall, our calculations underline the need for improved, more robust ap-proaches for modeling solvation effects in proteins undergoing large-scale con-formational changes. With the rise of advanced sampling methods [11, 85, 178],hardware designed for molecular simulation [11, 85, 143], and GPU accelration[85, 148], implicit solvent models may seem to lose some of their desirability.However, the boundaries of simulation time- and spacial-scales will always bepushed to their limits by researchers asking questions on the edge of knowledge,and it is here that they will continue to stay relevant. Further, even if their appli-cation in calculations becomes obsolete, the ability to give accurate, quantitativee¨xplicitd¨escriptions of the effects of solvation on protein free energy landscapeswould provide insight into a fundamental biochemical concept, a goal worthy onits own merits.65Bibliography[1] J. R. Allison, P. Varnai, C. M. Dobson, and M. Vendruscolo. Determinationof the free energy landscape of -synuclein using spin label nuclearmagnetic resonance measurements. Journal of the American ChemicalSociety, 131(51):18314–18326, 2009. doi:10.1021/ja904716h. URLhttp://pubs.acs.org/doi/abs/10.1021/ja904716h. → pages 3[2] J. R. Allison, K. Boguslawski, F. Fraternali, and W. F. van Gunsteren. Arefined, efficient mean solvation force model that includes the interiorvolume contribution. The Journal of Physical Chemistry B, 115(15):4547–4557, 2011. doi:10.1021/jp2017117. URLhttp://dx.doi.org/10.1021/jp2017117. → pages 6[3] I. Andricioaei and M. Karplus. On the calculation of entropy fromcovariance matrices of the atomic fluctuations. The Journal of ChemicalPhysics, 115(14):6289–6292, 2001.doi:http://dx.doi.org/10.1063/1.1401821. URLhttp://scitation.aip.org/content/aip/journal/jcp/115/14/10.1063/1.1401821.→ pages 42[4] N. A. Baker. Poisson-boltzmann methods for biomolecular electrostatics.In L. Brand and M. L. Johnson, editors, Numerical Computer Methods,Part D, volume 383 of Methods in Enzymology, pages 94 – 118. AcademicPress, 2004. doi:http://dx.doi.org/10.1016/S0076-6879(04)83005-2. URLhttp://www.sciencedirect.com/science/article/pii/S0076687904830052. →pages 21, 2766[5] N. A. Baker. Improving implicit solvent simulations: a poisson-centricview. Current Opinion in Structural Biology, 15(2):137 – 143, 2005. ISSN0959-440X. doi:http://dx.doi.org/10.1016/j.sbi.2005.02.001. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X05000448. →pages 5[6] R. L. Baldwin. Energetics of protein folding. Journal of MolecularBIology, 371(2):283–301, AUG 10 2007. ISSN 0022-2836.doi:10.1016/j.jmb.2007.05.078. → pages 3, 52[7] P. Ball. Water as an active constituent in cell biology. Chemical Reviews,108(1):74–108, 2008. doi:10.1021/cr068037a. URLhttp://dx.doi.org/10.1021/cr068037a. → pages 3, 52[8] D. Bashford and D. A. Case. Generalized born models of macromolecularsolvation effects. Annual Review of Physical Chemistry, 51(1):129–152,2000. doi:10.1146/annurev.physchem.51.1.129. URLhttp://dx.doi.org/10.1146/annurev.physchem.51.1.129. → pages 5, 23, 25[9] M. C. Baxa, E. J. Haddadian, J. M. Jumper, K. F. Freed, and T. R. Sosnick.Loss of conformational entropy in protein folding calculated using realisticensembles and its implications for nmr-based calculations. Proceedings ofthe National Academy of Sciences, 111(43):15396–15401, 2014.doi:10.1073/pnas.1407768111. URLhttp://www.pnas.org/content/111/43/15396.abstract. → pages 45, 46, 60[10] D. Beglov and B. Roux. An integral equation to describe the solvation ofpolar molecules in liquid water. The Journal of Physical Chemistry B, 101(39):7821–7826, 1997. doi:10.1021/jp971083h. URLhttp://pubs.acs.org/doi/abs/10.1021/jp971083h. → pages 4[11] R. B. Best. Atomistic molecular simulations of protein folding. CurrentOpinion in Structural Biology, 22(1):52 – 61, 2012. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2011.12.001. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X11002041. →pages 4, 6567[12] R. B. Best, X. Zhu, J. Shim, P. E. M. Lopes, J. Mittal, M. Feig, and A. D.MacKerell. Optimization of the additive charmm all-atom protein forcefield targeting improved sampling of the backbone , and side-chain 1 and 2dihedral angles. Journal of Chemical Theory and Computation, 8(9):3257–3273, 2012. doi:10.1021/ct300400x. URLhttp://dx.doi.org/10.1021/ct300400x. → pages 38[13] S. Bottaro, K. Lindorff-Larsen, and R. B. Best. Variational optimization ofan all-atom implicit solvent force field to match explicit solvent simulationdata. Journal of Chemical Theory and Computation, 9(12):5641–5652,2013. doi:10.1021/ct400730n. URL http://dx.doi.org/10.1021/ct400730n.→ pages 6, 38, 58, 65[14] G. R. Bowman, V. S. Pande, and F. No, editors. Estimation and Validationof Markov Models, volume 797 of Advances in Experimental Medicine andBiology. Springer Netherlands, 2014. ISBN 978-94-007-7605-0.doi:10.1007/978-94-007-7606-7 4. URLhttp://dx.doi.org/10.1007/978-94-007-7606-7 4. → pages 17[15] B. R. Brooks, C. L. Brooks, A. D. Mackerell, L. Nilsson, R. J. Petrella,B. Roux, Y. Won, G. Archontis, C. Bartels, S. Boresch, A. Caflisch,L. Caves, Q. Cui, A. R. Dinner, M. Feig, S. Fischer, J. Gao, M. Hodoscek,W. Im, K. Kuczera, T. Lazaridis, J. Ma, V. Ovchinnikov, E. Paci, R. W.Pastor, C. B. Post, J. Z. Pu, M. Schaefer, B. Tidor, R. M. Venable, H. L.Woodcock, X. Wu, W. Yang, D. M. York, and M. Karplus. Charmm: Thebiomolecular simulation program. Journal of Computational Chemistry, 30(10):1545–1614, 2009. ISSN 1096-987X. doi:10.1002/jcc.21287. URLhttp://dx.doi.org/10.1002/jcc.21287. → pages 40[16] B. D. Bursulaya and C. L. Brooks. Comparative study of the folding freeenergy landscape of a three-stranded -sheet protein with explicit andimplicit solvent models. The Journal of Physical Chemistry B, 104(51):12378–12383, 2000. doi:10.1021/jp0027602. URLhttp://dx.doi.org/10.1021/jp0027602. → pages 768[17] N. Calimet, M. Schaefer, and T. Simonson. Protein molecular dynamicswith the generalized born/ace solvent model. Proteins: Structure,Function, and Bioinformatics, 45(2):144–158, 2001. ISSN 1097-0134.doi:10.1002/prot.1134. URL http://dx.doi.org/10.1002/prot.1134. → pages34[18] N. Carrascal and D. F. Green. Energetic decomposition with thegeneralized-born and poissonboltzmann solvent models: Lessons fromassociation of g-protein components. The Journal of Physical Chemistry B,114(15):5096–5116, 2010. doi:10.1021/jp910540z. URLhttp://dx.doi.org/10.1021/jp910540z. → pages 7[19] D. Chandler. Interfaces and the driving force of hydrophobic assembly.NATURE, 437(7059):640–647, SEP 29 2005. ISSN 0028-0836.doi:10.1038/nature04162. → pages 5[20] C.-E. Chang, W. Chen, and M. K. Gilson. Evaluating the accuracy of thequasiharmonic approximation. Journal of Chemical Theory andComputation, 1(5):1017–1028, 2005. doi:10.1021/ct0500904. URLhttp://pubs.acs.org/doi/abs/10.1021/ct0500904. → pages 60[21] J. Chen. Effective approximation of molecular volume using atom-centereddielectric functions in generalized born models. Journal of ChemicalTheory and Computation, 6(9):2790–2803, 2010. doi:10.1021/ct100251y.URL http://dx.doi.org/10.1021/ct100251y. → pages 6, 7, 30, 31, 55, 57[22] J. Chen and C. L. Brooks. Critical importance of length-scale dependencein implicit modeling of hydrophobic interactions. Journal of the AmericanChemical Society, 129(9):2444–2445, 2007. doi:10.1021/ja068383+. URLhttp://dx.doi.org/10.1021/ja068383+. → pages 5[23] J. Chen and C. L. Brooks III. Implicit modeling of nonpolar solvation forsimulating protein folding and conformational transitions. Phys. Chem.Chem. Phys., 10:471–481, 2008. doi:10.1039/B714141F. URLhttp://dx.doi.org/10.1039/B714141F. → pages 5, 5269[24] J. Chen, W. Im, and C. L. Brooks. Balancing solvation and intramolecularinteractions: toward a consistent generalized born force field. Journal ofthe American Chemical Society, 128(11):3728–3736, 2006.doi:10.1021/ja057216r. URL http://dx.doi.org/10.1021/ja057216r. →pages 6, 7, 57[25] J. Chen, C. L. B. III, and J. Khandogin. Recent advances in implicitsolvent-based methods for biomolecular simulations. Current Opinion inStructural Biology, 18(2):140 – 148, 2008. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2008.01.003. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X08000079. →pages 4[26] Z. Chen, N. A. Baker, and G. Wei. Differential geometry based solvationmodel i: Eulerian formulation. Journal of Computational Physics, 229(22):8231 – 8258, 2010. ISSN 0021-9991.doi:http://dx.doi.org/10.1016/j.jcp.2010.06.036. URLhttp://www.sciencedirect.com/science/article/pii/S0021999110003517. →pages 5[27] Z. Chen, N. Baker, and G. Wei. Differential geometry based solvationmodel ii: Lagrangian formulation. Journal of Mathematical Biology, 63(6):1139–1200, 2011. ISSN 0303-6812. doi:10.1007/s00285-011-0402-z.URL http://dx.doi.org/10.1007/s00285-011-0402-z. → pages[28] Z. Chen, S. Zhao, J. Chun, D. G. Thomas, N. A. Baker, P. W. Bates, andG. W. Wei. Variational approach for nonpolar solvation analysis. TheJournal of Chemical Physics, 137(8):084101, 2012.doi:http://dx.doi.org/10.1063/1.4745084. URLhttp://scitation.aip.org/content/aip/journal/jcp/137/8/10.1063/1.4745084.→ pages 5[29] H. Choi, H. Kang, and H. Park. Extended solvent-contact model for proteinsolvation: Test cases for dipeptides. Journal of Molecular Graphics andModelling, 42(0):50 – 59, 2013. ISSN 1093-3263.70doi:http://dx.doi.org/10.1016/j.jmgm.2013.02.006. URLhttp://www.sciencedirect.com/science/article/pii/S1093326313000405. →pages 6, 7[30] C. J. Cramer and D. G. Truhlar. Implicit solvation models: equilibria,structure, spectra, and dynamics. Chemical Reviews, 99(8):2161–2200,1999. doi:10.1021/cr960149m. URL http://dx.doi.org/10.1021/cr960149m.→ pages 4[31] R. Das and D. Baker. Macromolecular modeling with rosetta. AnnualReview of Biochemistry, 77(1):363–382, 2008.doi:10.1146/annurev.biochem.77.062906.171838. URLhttp://dx.doi.org/10.1146/annurev.biochem.77.062906.171838. → pages64[32] K. T. Debiec, A. M. Gronenborn, and L. T. Chong. Evaluating the strengthof salt bridges: A comparison of current biomolecular force fields. TheJournal of Physical Chemistry B, 118(24):6561–6569, 2014.doi:10.1021/jp500958r. URL http://dx.doi.org/10.1021/jp500958r. →pages 7[33] K. A. Dill. Additivity principles in biochemistry. Journal of BiologicalChemistry, 272(2):701–704, 1997. doi:10.1074/jbc.272.2.701. URLhttp://www.jbc.org/content/272/2/701.short. → pages 7[34] K. A. Dill, T. M. Truskett, V. Vlachy, and B. Hribar-Lee. Modeling water,the hydrophobic effect, and ion solvation. Annual Review of Biophysicsand Biomolecular Structure, 34(1):173–199, 2005.doi:10.1146/annurev.biophys.34.040204.144517. URLhttp://dx.doi.org/10.1146/annurev.biophys.34.040204.144517. → pages 3[35] F. Dong, B. Olsen, and N. A. Baker. Computational methods forbiomolecular electrostatics. In D. J. J. Correia and I. Dr. H.William Detrich, editors, Biophysical Tools for Biologists, Volume One: InVitro Techniques, volume 84 of Methods in Cell Biology, pages 843 – 870.Academic Press, 2008.71doi:http://dx.doi.org/10.1016/S0091-679X(07)84026-X. URLhttp://www.sciencedirect.com/science/article/pii/S0091679X0784026X. →pages 4, 64[36] A. H. Elcock and J. A. McCammon. Continuum solvation model forstudying protein hydration thermodynamics at high temperatures. TheJournal of Physical Chemistry B, 101(46):9624–9634, 1997.doi:10.1021/jp971903q. URL http://dx.doi.org/10.1021/jp971903q. →pages 59[37] H. Fan, A. E. Mark, J. Zhu, and B. Honig. Comparative study ofgeneralized born models: Protein dynamics. Proceedings of the NationalAcademy of Sciences of the United States of America, 102(19):6760–6764,2005. doi:10.1073/pnas.0408857102. URLhttp://www.pnas.org/content/102/19/6760.abstract. → pages 7, 8[38] M. Feig. Kinetics from implicit solvent simulations of biomolecules as afunction of viscosity. Journal of Chemical Theory and Computation, 3(5):1734–1748, 2007. doi:10.1021/ct7000705. URLhttp://dx.doi.org/10.1021/ct7000705. → pages 7[39] M. Feig and C. L. B. III. Recent advances in the development andapplication of implicit solvent models in biomolecule simulations. CurrentOpinion in Structural Biology, 14(2):217 – 224, 2004. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2004.03.009. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X04000430. →pages 4[40] M. Feig and Y. Sugita. Reaching new levels of realism in modelingbiological macromolecules in cellular environments. Journal of MolecularGraphics and Modelling, 45(0):144 – 156, 2013. ISSN 1093-3263.doi:http://dx.doi.org/10.1016/j.jmgm.2013.08.017. URLhttp://www.sciencedirect.com/science/article/pii/S1093326313001447. →pages 2, 372[41] M. Feig, A. Onufriev, M. S. Lee, W. Im, D. A. Case, and C. L. Brooks.Performance comparison of generalized born and poisson methods in thecalculation of electrostatic solvation energies for protein structures.Journal of Computational Chemistry, 25(2):265–284, 2004. ISSN1096-987X. doi:10.1002/jcc.10378. URLhttp://dx.doi.org/10.1002/jcc.10378. → pages 7[42] P. Ferrara, J. Apostolakis, and A. Caflisch. Evaluation of a fast implicitsolvent model for molecular dynamics simulations. Proteins: Structure,Function, and Bioinformatics, 46(1):24–33, 2002. ISSN 1097-0134.doi:10.1002/prot.10001. URL http://dx.doi.org/10.1002/prot.10001. →pages 4, 6, 32[43] D. Frenkel and B. Smit. Understanding Molecular Simulation: FromAlgorithms to Applications. Academic Press, 2002. ISBN 0-12-267351-4.→ pages 12[44] E. Gallicchio and R. M. Levy. Agbnp: An analytic implicit solvent modelsuitable for molecular dynamics simulations and high-resolution modeling.Journal of Computational Chemistry, 25(4):479–499, 2004. ISSN1096-987X. doi:10.1002/jcc.10400. URLhttp://dx.doi.org/10.1002/jcc.10400. → pages 5, 6, 7[45] E. Gallicchio, K. Paris, and R. M. Levy. The agbnp2 implicit solvationmodel. Journal of Chemical Theory and Computation, 5(9):2544–2564,2009. doi:10.1021/ct900234u. URLhttp://pubs.acs.org/doi/abs/10.1021/ct900234u. → pages 5, 6, 7[46] R. Geney, M. Layten, R. Gomperts, V. Hornak, and C. Simmerling.Investigation of salt bridge stability in a generalized born solvent model.Journal of Chemical Theory and Computation, 2(1):115–127, 2006.doi:10.1021/ct050183l. URL http://dx.doi.org/10.1021/ct050183l. →pages 6, 7[47] S. Genheden and U. Ryde. Will molecular dynamics simulations ofproteins ever reach equilibrium? Phys. Chem. Chem. Phys., 14:8662–8677,732012. doi:10.1039/C2CP23961B. URLhttp://dx.doi.org/10.1039/C2CP23961B. → pages 49[48] M. K. Gilson, K. A. Sharp, and B. H. Honig. Calculating the electrostaticpotential of molecules in solution: Method and error assessment. Journalof Computational Chemistry, 9(4):327–335, 1988. ISSN 1096-987X.doi:10.1002/jcc.540090407. URLhttp://dx.doi.org/10.1002/jcc.540090407. → pages 27[49] G. H. Givens and J. A. Hoeting. Computational Statistics. John Wiley &Sons, Inc., 2013. ISBN 978-0-470-53331-4. → pages 46[50] F. Godschalk, S. Genheden, P. Soderhjelm, and U. Ryde. Comparison ofmm/gbsa calculations based on explicit and implicit solvent simulations.Phys. Chem. Chem. Phys., 15:7731–7739, 2013.doi:10.1039/C3CP00116D. URL http://dx.doi.org/10.1039/C3CP00116D.→ pages 7[51] J. Gsponer, J. Christodoulou, A. Cavalli, J. M. Bui, B. Richter, C. M.Dobson, and M. Vendruscolo. A coupled equilibrium shift mechanism incalmodulin-mediated signal transduction. Structure, 16(5):736 – 746,2008. ISSN 0969-2126. doi:10.1016/j.str.2008.02.017. URLhttp://www.sciencedirect.com/science/article/pii/S0969212608001317. →pages 3[52] W. C. Guest, N. R. Cashman, and S. S. Plotkin. A theory for theanisotropic and inhomogeneous dielectric properties of proteins. Phys.Chem. Chem. Phys., 13:6286–6295, 2011. doi:10.1039/C0CP02061C.URL http://dx.doi.org/10.1039/C0CP02061C. → pages 5, 55[53] U. Haberthr and A. Caflisch. Facts: Fast analytical continuum treatment ofsolvation. Journal of Computational Chemistry, 29(5):701–715, 2008.ISSN 1096-987X. doi:10.1002/jcc.20832. URLhttp://dx.doi.org/10.1002/jcc.20832. → pages 5, 6, 31, 50[54] T. Hajari and N. F. A. van der Vegt. Peptide backbone effect on hydrationfree energies of amino acid side chains. The Journal of Physical Chemistry74B, 118(46):13162–13168, 2014. doi:10.1021/jp5094146. URLhttp://dx.doi.org/10.1021/jp5094146. → pages 7[55] R. C. Harris and B. M. Pettitt. Effects of geometry and chemistry onhydrophobic solvation. Proceedings of the National Academy of Sciences,111(41):14681–14686, 2014. doi:10.1073/pnas.1406080111. URLhttp://www.pnas.org/content/111/41/14681.abstract. → pages 64[56] R. C. Harris, A. H. Boschitsch, and M. O. Fenley. Influence of grid spacingin poissonboltzmann equation binding energy estimation. Journal ofChemical Theory and Computation, 9(8):3677–3685, 2013.doi:10.1021/ct300765w. URL http://dx.doi.org/10.1021/ct300765w. →pages 27[57] S. A. Hassan, F. Guarnieri, and E. L. Mehler. A general treatment ofsolvent effects based on screened coulomb potentials. The Journal ofPhysical Chemistry B, 104(27):6478–6489, 2000. doi:10.1021/jp993895e.URL http://pubs.acs.org/doi/abs/10.1021/jp993895e. → pages 6, 34, 50[58] S. A. Hassan, F. Guarnieri, and E. L. Mehler. Characterization of hydrogenbonding in a continuum solvent model. The Journal of Physical ChemistryB, 104(27):6490–6498, 2000. doi:10.1021/jp9938967. URLhttp://pubs.acs.org/doi/abs/10.1021/jp9938967. → pages 35, 36[59] S. A. Hassan, E. L. Mehler, D. Zhang, and H. Weinstein. Moleculardynamics simulations of peptides and proteins with a continuumelectrostatic model based on screened coulomb potentials. Proteins:Structure, Function, and Bioinformatics, 51(1):109–125, 2003. ISSN1097-0134. doi:10.1002/prot.10330. URLhttp://dx.doi.org/10.1002/prot.10330. → pages 5, 35, 36[60] N. Homeyer and H. Gohlke. Free energy calculations by the molecularmechanics poissonboltzmann surface area method. Molecular Informatics,31(2):114–122, 2012. ISSN 1868-1751. doi:10.1002/minf.201100135.URL http://dx.doi.org/10.1002/minf.201100135. → pages 8, 6075[61] S. Honda, T. Akiba, Y. S. Kato, Y. Sawada, M. Sekijima, M. Ishimura,A. Ooishi, H. Watanabe, T. Odahara, and K. Harata. Crystal structure of aten-amino acid protein. Journal of the American Chemical Society, 130(46):15327–15331, 2008. doi:10.1021/ja8030533. URLhttp://dx.doi.org/10.1021/ja8030533. → pages 13, 99[62] T. Hou, J. Wang, Y. Li, and W. Wang. Assessing the performance of themm/pbsa and mm/gbsa methods. 1. the accuracy of binding free energycalculations based on molecular dynamics simulations. Journal ofChemical Information and Modeling, 51(1):69–82, 2011.doi:10.1021/ci100275a. URL http://dx.doi.org/10.1021/ci100275a. →pages 7[63] C. Y. Hu, H. Kokubo, G. C. Lynch, D. W. Bolen, and B. M. Pettitt.Backbone additivity in the transfer model of protein solvation. ProteinScience, 19(5):1011–1022, 2010. ISSN 1469-896X. doi:10.1002/pro.378.URL http://dx.doi.org/10.1002/pro.378. → pages 64[64] A. Huang and C. M. Stultz. Conformational sampling with implicit solventmodels: Application to the phf6 peptide in tau protein. BiophysicalJournal, 92(1):34 – 45, 2007. ISSN 0006-3495.doi:http://dx.doi.org/10.1529/biophysj.106.091207. URLhttp://www.sciencedirect.com/science/article/pii/S0006349507708020. →pages 7[65] W. Im, M. S. Lee, and C. L. Brooks. Generalized born model with a simplesmoothing function. Journal of Computational Chemistry, 24(14):1691–1702, 2003. ISSN 1096-987X. doi:10.1002/jcc.10321. URLhttp://dx.doi.org/10.1002/jcc.10321. → pages 5, 29[66] W. Im, J. Chen, and C. Brooks. Peptide and protein folding andconformational equilibria: Theoretical treatment of electrostatics andhydrogen bonding with implicit solvent models. In PEPTIDE SOLVATIONAND H-BONDS, volume 72 of ADVANCES IN PROTEIN CHEMISTRY,pages 173+. ELSEVIER ACADEMIC PRESS INC, 525 B STREET,76SUITE 1900, SAN DIEGO, CA 92101-4495 USA, 2006.doi:10.1016/S0065-3233(05)72007-6. → pages 6, 7[67] A. Jaramillo and S. J. Wodak. Computational protein design is a challengefor implicit solvation models. Biophysical Journal, 88(1):156 – 171, 2005.ISSN 0006-3495. doi:http://dx.doi.org/10.1529/biophysj.104.042044. URLhttp://www.sciencedirect.com/science/article/pii/S0006349505730952. →pages 7[68] W. Jorgensen. Foundations of biomolecular modeling. Cell, 155(6):1199 –1202, 2013. ISSN 0092-8674.doi:http://dx.doi.org/10.1016/j.cell.2013.11.023. URLhttp://www.sciencedirect.com/science/article/pii/S0092867413014724. →pages 3[69] W. L. Jorgensen and J. Tirado-Rives. Potential energy functions foratomic-level simulations of water and organic and biomolecular systems.Proceedings of the National Academy of Sciences of the United States ofAmerica, 102(19):6665–6670, 2005. doi:10.1073/pnas.0408037102. URLhttp://www.pnas.org/content/102/19/6665.abstract. → pages 3[70] A. Juneja, M. Ito, and L. Nilsson. Implicit solvent models and stabilizingeffects of mutations and ligands on the unfolding of the amyloid -peptidecentral helix. Journal of Chemical Theory and Computation, 9(1):834–846,2013. doi:10.1021/ct300941v. URLhttp://pubs.acs.org/doi/abs/10.1021/ct300941v. → pages 7, 8[71] Y. K. Kang, G. Nemethy, and H. A. Scheraga. Free energies of hydration ofsolute molecules. 1. improvement of the hydration shell model by exactcomputations of overlapping volumes. The Journal of Physical Chemistry,91(15):4105–4109, 1987. doi:10.1021/j100299a032. URLhttp://dx.doi.org/10.1021/j100299a032. → pages 38[72] P. Kar, M. Seel, U. H. E. Hansmann, and S. Hfinger. Dispersion terms andanalysis of size- and charge dependence in an enhanced poissonboltzmannapproach. The Journal of Physical Chemistry B, 111(30):8910–8918, 2007.77doi:10.1021/jp072302u. URL http://dx.doi.org/10.1021/jp072302u. →pages 5, 6[73] M. Karplus and J. N. Kushick. Method for estimating the configurationalentropy of macromolecules. Macromolecules, 14(2):325–332, 1981.doi:10.1021/ma50003a019. URL http://dx.doi.org/10.1021/ma50003a019.→ pages 42[74] A. Kent, A. K. Jha, J. E. Fitzgerald, and K. F. Freed. Benchmarkingimplicit solvent folding simulations of the amyloid (1035) fragment. TheJournal of Physical Chemistry B, 112(19):6175–6186, 2008.doi:10.1021/jp077099h. URLhttp://pubs.acs.org/doi/abs/10.1021/jp077099h. → pages 6[75] E. Kim, S. Jang, and Y. Pak. Consistent free energy landscapes andthermodynamic properties of small proteins based on a single all-atomforce field employing an implicit solvation. The Journal of ChemicalPhysics, 127(14):145104, 2007. doi:http://dx.doi.org/10.1063/1.2775450.URLhttp://scitation.aip.org/content/aip/journal/jcp/127/14/10.1063/1.2775450.→ pages 6[76] J. Kleinjung and F. Fraternali. Design and application of implicit solventmodels in biomolecular simulations. Current Opinion in StructuralBiology, 25(0):126 – 134, 2014. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2014.04.003. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X14000438. →pages 4[77] J. Kleinjung, W. R. P. Scott, J. R. Allison, W. F. van Gunsteren, andF. Fraternali. Implicit solvation parameters derived from explicit waterforces in large-scale molecular dynamics simulations. Journal of ChemicalTheory and Computation, 8(7):2391–2403, 2012. doi:10.1021/ct200390j.URL http://dx.doi.org/10.1021/ct200390j. → pages 678[78] J. L. Knight and C. L. Brooks. Surveying implicit solvent models forestimating small molecule absolute hydration free energies. Journal ofComputational Chemistry, 32(13):2909–2923, 2011. ISSN 1096-987X.doi:10.1002/jcc.21876. URL http://dx.doi.org/10.1002/jcc.21876. → pages7[79] H. Kokubo, C. Y. Hu, and B. M. Pettitt. Peptide conformationalpreferences in osmolyte solutions: Transfer free energies of decaalanine.Journal of the American Chemical Society, 133(6):1849–1858, 2011.doi:10.1021/ja1078128. URL http://dx.doi.org/10.1021/ja1078128. →pages 64[80] H. Kokubo, R. C. Harris, D. Asthagiri, and B. M. Pettitt. Solvation freeenergies of alanine peptides: The effect of flexibility. The Journal ofPhysical Chemistry B, 117(51):16428–16435, 2013.doi:10.1021/jp409693p. URL http://dx.doi.org/10.1021/jp409693p. →pages 64[81] P. A. Kollman, I. Massova, C. Reyes, B. Kuhn, S. Huo, L. Chong, M. Lee,T. Lee, Y. Duan, W. Wang, O. Donini, P. Cieplak, J. Srinivasan, D. A. Case,and T. E. Cheatham. Calculating structures and free energies of complexmolecules: combining molecular mechanics and continuum models.Accounts of Chemical Research, 33(12):889–897, 2000.doi:10.1021/ar000033j. URL http://dx.doi.org/10.1021/ar000033j. →pages 8[82] H. R. Kunsch. The jackknife and the bootstrap for general stationaryobservations. Ann. Statist., 17(3):1217–1241, 09 1989.doi:10.1214/aos/1176347265. URLhttp://dx.doi.org/10.1214/aos/1176347265. → pages 46[83] G. Knig and S. Boresch. Hydration free energies of amino acids: Why sidechain analog data are not enough. The Journal of Physical Chemistry B,113(26):8967–8974, 2009. doi:10.1021/jp902638y. URLhttp://dx.doi.org/10.1021/jp902638y. → pages 779[84] P. Labute. The generalized born/volume integral implicit solvent model:Estimation of the free energy of hydration using london dispersion insteadof atomic surface area. Journal of Computational Chemistry, 29(10):1693–1698, 2008. ISSN 1096-987X. doi:10.1002/jcc.20933. URLhttp://dx.doi.org/10.1002/jcc.20933. → pages 5, 6[85] T. J. Lane, D. Shukla, K. A. Beauchamp, and V. S. Pande. To millisecondsand beyond: challenges in the simulation of protein folding. CurrentOpinion in Structural Biology, 23(1):58 – 65, 2013. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2012.11.002. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X12001820. →pages 4, 65[86] T. Lazaridis and M. Karplus. Effective energy function for proteins insolution. Proteins: Structure, Function, and Bioinformatics, 35(2):133–152, 1999. ISSN 1097-0134. doi:10.1002/(SICI)1097-0134(19990501)35:2〈133::AID-PROT1〉3.0.CO;2-N.URL http://dx.doi.org/10.1002/(SICI)1097-0134(19990501)35:2〈133::AID-PROT1〉3.0.CO;2-N. → pages 4, 6, 36, 37, 50[87] A. R. Leach. Molecular Modelling: Principles and Applications. PearsonEducation Limited, 2001. ISBN 0-582-38210-6. → pages 12[88] M. S. Lee and M. A. Olson. Comparison of volume and surface areanonpolar solvation free energy terms for implicit solvent simulations. TheJournal of Chemical Physics, 139(4):044119, 2013.doi:10.1063/1.4816641. URL http://link.aip.org/link/?JCP/139/044119/1.→ pages 4, 7, 52[89] M. S. Lee, F. R. Salsbury, and C. L. Brooks. Novel generalized bornmethods. The Journal of Chemical Physics, 116(24):10606–10614, 2002.doi:http://dx.doi.org/10.1063/1.1480013. URLhttp://scitation.aip.org/content/aip/journal/jcp/116/24/10.1063/1.1480013.→ pages 5, 6, 2780[90] M. S. Lee, M. Feig, F. R. Salsbury, and C. L. Brooks. New analyticapproximation to the standard molecular volume definition and itsapplication to generalized born calculations. Journal of ComputationalChemistry, 24(11):1348–1356, 2003. ISSN 1096-987X.doi:10.1002/jcc.10272. URL http://dx.doi.org/10.1002/jcc.10272. → pages5, 6, 27, 50[91] R. M. Levy, M. Karplus, J. Kushick, and D. Perahia. Evaluation of theconfigurational entropy for proteins: application to molecular dynamicssimulations of an -helix. Macromolecules, 17(7):1370–1374, 1984.doi:10.1021/ma00137a013. URL http://dx.doi.org/10.1021/ma00137a013.→ pages 42[92] R. M. Levy, L. Y. Zhang, E. Gallicchio, and A. K. Felts. On the nonpolarhydration free energy of proteins: surface area and continuum solventmodels for the solutesolvent interaction energy. Journal of the AmericanChemical Society, 125(31):9523–9530, 2003. doi:10.1021/ja029833a.URL http://pubs.acs.org/doi/abs/10.1021/ja029833a. → pages 5, 52[93] L. Li, C. Li, Z. Zhang, and E. Alexov. On the dielectric constant ofproteins: Smooth dielectric function for macromolecular modeling and itsimplementation in delphi. Journal of Chemical Theory and Computation, 9(4):2126–2136, 2013. doi:10.1021/ct400065j. URLhttp://dx.doi.org/10.1021/ct400065j. → pages 5[94] K. Lindorff-Larsen, S. Piana, R. O. Dror, and D. E. Shaw. How fast-foldingproteins fold. Science, 334(6055):517–520, 2011.doi:10.1126/science.1208351. URLhttp://www.sciencemag.org/content/334/6055/517.abstract. → pages 12,56, 62[95] K. Lindorff-Larsen, N. Trbovic, P. Maragakis, S. Piana, and D. E. Shaw.Structure and dynamics of an unfolded protein examined by moleculardynamics simulation. Journal of the American Chemical Society, 134(8):3787–3791, 2012. doi:10.1021/ja209931w. URLhttp://dx.doi.org/10.1021/ja209931w. → pages 2, 6281[96] B. Z. Lu, Y. C. Zhou, M. J. Holst, and J. A. McCammon. Recent progressin numerical methods for the poisson-boltzmann equation in biophysicalapplications. Communications in computational physics, 3(5):973–1009,MAY 2008. ISSN 1815-2406. → pages 5, 26[97] T. Z. Lwin, R. Zhou, and R. Luo. Is poisson-boltzmann theory insufficientfor protein folding simulations? The Journal of Chemical Physics, 124(3):034902, 2006. doi:http://dx.doi.org/10.1063/1.2161202. URLhttp://scitation.aip.org/content/aip/journal/jcp/124/3/10.1063/1.2161202.→ pages 7, 8[98] A. D. MacKerell, D. Bashford, Bellott, R. L. Dunbrack, J. D. Evanseck,M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy,L. Kuchnir, K. Kuczera, F. T. K. Lau, C. Mattos, S. Michnick, T. Ngo, D. T.Nguyen, B. Prodhom, W. E. Reiher, B. Roux, M. Schlenkrich, J. C. Smith,R. Stote, J. Straub, M. Watanabe, J. Wirkiewicz-Kuczera, D. Yin, andM. Karplus. All-atom empirical potential for molecular modeling anddynamics studies of proteins. The Journal of Physical Chemistry B, 102(18):3586–3616, 1998. doi:10.1021/jp973084f. URLhttp://pubs.acs.org/doi/abs/10.1021/jp973084f. → pages 10[99] A. D. Mackerell, M. Feig, and C. L. Brooks. Extending the treatment ofbackbone energetics in protein force fields: Limitations of gas-phasequantum mechanics in reproducing protein conformational distributions inmolecular dynamics simulations. Journal of Computational Chemistry, 25(11):1400–1415, 2004. ISSN 1096-987X. doi:10.1002/jcc.20065. URLhttp://dx.doi.org/10.1002/jcc.20065. → pages 11[100] C. Malmberg and A. Maryott. Dielectric constant of water from0-degrees-c to 100-degrees-c. Journal of Research of the National Bueau ofStandards, 56(1):1–8, 1956. ISSN 0091-0635. doi:10.6028/jres.056.001.→ pages 42, 60[101] A. Masunov and T. Lazaridis. Potentials of mean force between ionizableamino acid side chains in water. Journal of the American Chemical Society,82125(7):1722–1730, 2003. doi:10.1021/ja025521w. URLhttp://dx.doi.org/10.1021/ja025521w. → pages 6, 7, 38[102] E. E. Meyer, K. J. Rosenberg, and J. Israelachvili. Recent progress inunderstanding hydrophobic interactions. Proceedings of the NationalAcademy of Sciences, 103(43):15739–15746, 2006.doi:10.1073/pnas.0606422103. URLhttp://www.pnas.org/content/103/43/15739.abstract. → pages 52[103] C. A. Miller, S. H. Gellman, N. L. Abbott, and J. J. de Pablo. Mechanicalstability of helical -peptides and a comparison of explicit and implicitsolvent models. Biophysical Journal, 95(7):3123 – 3136, 2008. ISSN0006-3495. doi:http://dx.doi.org/10.1529/biophysj.108.134833. URLhttp://www.sciencedirect.com/science/article/pii/S0006349508784571. →pages 7[104] D. L. Mobley, C. I. Bayly, M. D. Cooper, and K. A. Dill. Predictions ofhydration free energies from all-atom molecular dynamics simulations. TheJournal of Physical Chemistry B, 113(14):4533–4537, 2009.doi:10.1021/jp806838b. URL http://dx.doi.org/10.1021/jp806838b. →pages 61[105] D. L. Mobley, C. I. Bayly, M. D. Cooper, M. R. Shirts, and K. A. Dill.Small molecule hydration free energies in explicit solvent: An extensivetest of fixed-charge atomistic simulations. Journal of Chemical Theory andComputation, 5(2):350–358, 2009. doi:10.1021/ct800409d. URLhttp://dx.doi.org/10.1021/ct800409d. → pages 61[106] J. Mongan, C. Simmerling, J. A. McCammon, D. A. Case, and A. Onufriev.Generalized born model with a simple, robust molecular volumecorrection. Journal of Chemical Theory and Computation, 3(1):156–169,2007. doi:10.1021/ct600085e. URL http://dx.doi.org/10.1021/ct600085e.→ pages 5, 6, 7[107] J. Mongan, W. A. Svrcek-Seiler, and A. Onufriev. Analysis of integralexpressions for effective born radii. The Journal of Chemical Physics, 12783(18):185101, 2007. doi:http://dx.doi.org/10.1063/1.2783847. URLhttp://scitation.aip.org/content/aip/journal/jcp/127/18/10.1063/1.2783847.→ pages 7[108] H. Nguyen, D. R. Roe, and C. Simmerling. Improved generalized bornsolvent model parameters for protein simulations. Journal of ChemicalTheory and Computation, 9(4):2020–2034, 2013. doi:10.1021/ct3010485.URL http://dx.doi.org/10.1021/ct3010485. → pages 6, 7[109] H. Nguyen, J. Maier, H. Huang, V. Perrone, and C. Simmerling. Foldingsimulations for proteins with diverse topologies are accessible in days witha physics-based force field and implicit solvent. Journal of the AmericanChemical Society, 136(40):13959–13962, 2014. doi:10.1021/ja5032776.URL http://dx.doi.org/10.1021/ja5032776. → pages 7, 8, 64[110] A. Nicholls and B. Honig. A rapid finite-difference algorithm, utilizingsuccessive over-relaxation to solve the poisson-boltzmann equation.Journal of Computational Chemistry, 12(4):435–445, MAY 1991. ISSN0192-8651. doi:10.1002/jcc.540120405. → pages 27[111] A. Nicholls, D. L. Mobley, J. P. Guthrie, J. D. Chodera, C. I. Bayly, M. D.Cooper, and V. S. Pande. Predicting small-molecule solvation freeenergies: An informal blind test for computational chemistry. Journal ofMedicinal Chemistry, 51(4):769–779, 2008. doi:10.1021/jm070549+.URL http://dx.doi.org/10.1021/jm070549+. → pages 61[112] M. Nina, D. Beglov, and B. Roux. Atomic radii for continuumelectrostatics calculations based on molecular dynamics free energysimulations. The Journal of Physical Chemistry B, 101(26):5239–5248,1997. doi:10.1021/jp970736r. URLhttp://pubs.acs.org/doi/abs/10.1021/jp970736r. → pages 5, 27, 30, 55, 57[113] M. Nina, W. Im, and B. Roux. Optimized atomic radii for proteincontinuum electrostatics solvation forces. Biophysical Chemistry, 78(12):89 – 96, 1999. ISSN 0301-4622.doi:http://dx.doi.org/10.1016/S0301-4622(98)00236-1. URL84http://www.sciencedirect.com/science/article/pii/S0301462298002361. →pages 5, 25, 26, 27, 30, 50, 55, 57[114] W. G. Noid. Perspective: Coarse-grained models for biomolecular systems.The Journal of Chemical Physics, 139(9):090901, 2013.doi:http://dx.doi.org/10.1063/1.4818908. URLhttp://scitation.aip.org/content/aip/journal/jcp/139/9/10.1063/1.4818908.→ pages 2[115] H. Nymeyer and A. E. Garca. Simulation of the folding equilibrium of-helical peptides: A comparison of the generalized born approximationwith explicit solvent. Proceedings of the National Academy of Sciences,100(24):13934–13939, 2003. doi:10.1073/pnas.2232868100. URLhttp://www.pnas.org/content/100/24/13934.abstract. → pages 7[116] A. Okur, L. Wickstrom, and C. Simmerling. Evaluation of salt bridgestructure and energetics in peptides using explicit, implicit, and hybridsolvation models. Journal of Chemical Theory and Computation, 4(3):488–498, 2008. doi:10.1021/ct7002308. URLhttp://dx.doi.org/10.1021/ct7002308. → pages 7[117] J. N. Onuchic, Z. Luthey-Schulten, and P. G. Wolynes. Theory of proteinfolding: The energy landscape perspective. Annual Review of PhysicalChemistry, 48(1):545–600, 1997.doi:10.1146/annurev.physchem.48.1.545. URLhttp://dx.doi.org/10.1146/annurev.physchem.48.1.545. → pages 3[118] A. Onufriev, D. A. Case, and D. Bashford. Effective born radii in thegeneralized born approximation: The importance of being perfect. Journalof Computational Chemistry, 23(14):1297–1304, 2002. ISSN 1096-987X.doi:10.1002/jcc.10126. URL http://dx.doi.org/10.1002/jcc.10126. → pages6[119] Y. Pak, E. Kim, and S. Jang. Misfolded free energy surface of a peptidewith motif (1psv) using the generalized born solvation model. The Journalof Chemical Physics, 121(18):9184–9185, 2004.85doi:http://dx.doi.org/10.1063/1.1804159. URLhttp://scitation.aip.org/content/aip/journal/jcp/121/18/10.1063/1.1804159.→ pages 7[120] V. S. Pande, K. Beauchamp, and G. R. Bowman. Everything you wanted toknow about markov state models but were afraid to ask. Methods, 52(1):99– 105, 2010. ISSN 1046-2023.doi:http://dx.doi.org/10.1016/j.ymeth.2010.06.002. URLhttp://www.sciencedirect.com/science/article/pii/S1046202310001568. →pages 48[121] S. Piana, K. Lindorff-Larsen, and D. Shaw. How robust are protein foldingsimulations with respect to force field parameterization? BiophysicalJournal, 100(9):L47 – L49, 2011. ISSN 0006-3495.doi:http://dx.doi.org/10.1016/j.bpj.2011.03.051. URLhttp://www.sciencedirect.com/science/article/pii/S0006349511004097. →pages 11, 12[122] J.-H. Prinz, H. Wu, M. Sarich, B. Keller, M. Senne, M. Held, J. D.Chodera, C. Schtte, and F. No. Markov models of molecular kinetics:Generation and validation. The Journal of Chemical Physics, 134(17):174105, 2011. doi:http://dx.doi.org/10.1063/1.3565032. URLhttp://scitation.aip.org/content/aip/journal/jcp/134/17/10.1063/1.3565032.→ pages 13[123] G. Prez-Hernndez, F. Paul, T. Giorgino, G. De Fabritiis, and F. No.Identification of slow molecular order parameters for markov modelconstruction. The Journal of Chemical Physics, 139(1):015102, 2013.doi:http://dx.doi.org/10.1063/1.4811489. URLhttp://scitation.aip.org/content/aip/journal/jcp/139/1/10.1063/1.4811489.→ pages 15, 17[124] G. Rastelli, A. D. Rio, G. Degliesposti, and M. Sgobba. Fast and accuratepredictions of binding free energies using mm-pbsa and mm-gbsa. Journalof Computational Chemistry, 31(4):797–810, 2010. ISSN 1096-987X.86doi:10.1002/jcc.21372. URL http://dx.doi.org/10.1002/jcc.21372. → pages7[125] P. Ren, J. Chun, D. G. Thomas, M. J. Schnieders, M. Marucho, J. Zhang,and N. A. Baker. Biomolecular electrostatics and solvation: acomputational perspective. Quarterly Reviews of Biophysics, 45:427–491,11 2012. ISSN 1469-8994. doi:10.1017/S003358351200011X. URLhttp://journals.cambridge.org/article S003358351200011X. → pages 3, 4,21, 22[126] Y. M. Rhee and V. S. Pande. On the role of chemical detail in simulatingprotein folding kinetics. Chemical Physics, 323(1):66 – 77, 2006. ISSN0301-0104. doi:http://dx.doi.org/10.1016/j.chemphys.2005.08.060. URLhttp://www.sciencedirect.com/science/article/pii/S0301010405003939. →pages 7[127] E. Roberts. Cellular and molecular structure as a unifying framework forwhole-cell modeling. Current Opinion in Structural Biology, 25(0):86 –91, 2014. ISSN 0959-440X.doi:http://dx.doi.org/10.1016/j.sbi.2014.01.005. URLhttp://www.sciencedirect.com/science/article/pii/S0959440X14000062. →pages 2[128] T. H. Rod, P. Rydberg, and U. Ryde. Implicit versus explicit solvent in freeenergy calculations of enzyme catalysis: Methyl transfer catalyzed bycatechol o-methyltransferase. The Journal of Chemical Physics, 124(17):174503, 2006. doi:http://dx.doi.org/10.1063/1.2186635. URLhttp://scitation.aip.org/content/aip/journal/jcp/124/17/10.1063/1.2186635.→ pages 7[129] D. R. Roe, A. Okur, L. Wickstrom, V. Hornak, and C. Simmerling.Secondary structure bias in generalized born solvent models: comparisonof conformational ensembles and free energy of solvent polarization fromexplicit and implicit solvation. The Journal of Physical Chemistry B, 111(7):1846–1857, 2007. doi:10.1021/jp066831u. URLhttp://dx.doi.org/10.1021/jp066831u. → pages 787[130] C. A. Rohl, C. E. Strauss, K. M. Misura, and D. Baker. Protein structureprediction using rosetta. In L. Brand and M. L. Johnson, editors, NumericalComputer Methods, Part D, volume 383 of Methods in Enzymology, pages66 – 93. Academic Press, 2004.doi:http://dx.doi.org/10.1016/S0076-6879(04)83004-0. URLhttp://www.sciencedirect.com/science/article/pii/S0076687904830040. →pages 64[131] R. Rohs, S. M. West, A. Sosinsky, P. Liu, R. S. Mann, and B. Honig. Therole of dna shape in protein-dna recognition. Nature, 461(7268):1248–U81,OCT 29 2009. ISSN 0028-0836. doi:10.1038/nature08473. → pages 64[132] B. Roux and T. Simonson. Implicit solvent models. Biophysical Chemistry,78(12):1 – 20, 1999. ISSN 0301-4622.doi:http://dx.doi.org/10.1016/S0301-4622(98)00226-9. URLhttp://www.sciencedirect.com/science/article/pii/S0301462298002269. →pages 4, 5, 18[133] M. G. Saunders and G. A. Voth. Coarse-graining methods forcomputational biology. Annual Review of Biophysics, 42(1):73–93, 2013.doi:10.1146/annurev-biophys-083012-130348. URL http://www.annualreviews.org/doi/abs/10.1146/annurev-biophys-083012-130348. →pages 2[134] M. Schaefer and M. Karplus. A comprehensive analytical treatment ofcontinuum electrostatics. The Journal of Physical Chemistry, 100(5):1578–1599, 1996. doi:10.1021/jp9521621. URLhttp://pubs.acs.org/doi/abs/10.1021/jp9521621. → pages 6, 24, 32, 50[135] M. Schaefer, C. Bartels, F. Leclerc, and M. Karplus. Effective atomvolumes for implicit solvent models: comparison between voronoi volumesand minimum fluctuation volumes. Journal of Computational Chemistry,22(15):1857–1879, 2001. ISSN 1096-987X. doi:10.1002/jcc.1137. URLhttp://dx.doi.org/10.1002/jcc.1137. → pages 6, 3388[136] J. Schlitter. Estimation of absolute and relative entropies ofmacromolecules using the covariance matrix. Chemical Physics Letters,215(6):617 – 621, 1993. ISSN 0009-2614.doi:http://dx.doi.org/10.1016/0009-2614(93)89366-P. URLhttp://www.sciencedirect.com/science/article/pii/000926149389366P. →pages 42, 43[137] C. N. Schutz and A. Warshel. What are the dielectric constants of proteinsand how to validate electrostatic models? Proteins: Structure, Function,and Bioinformatics, 44(4):400–417, 2001. ISSN 1097-0134.doi:10.1002/prot.1106. URL http://dx.doi.org/10.1002/prot.1106. → pages5, 55, 56[138] C. R. Schwantes and V. S. Pande. Improvements in markov state modelconstruction reveal many non-native interactions in the folding of ntl9.Journal of Chemical Theory and Computation, 9(4):2000–2009, 2013.doi:10.1021/ct300878a. URLhttp://pubs.acs.org/doi/abs/10.1021/ct300878a. → pages 15[139] M. Senne, B. Trendelkamp-Schroer, A. S. Mey, C. Schtte, and F. No.Emma: A software package for markov model building and analysis.Journal of Chemical Theory and Computation, 8(7):2223–2238, 2012.doi:10.1021/ct300274u. URLhttp://pubs.acs.org/doi/abs/10.1021/ct300274u. → pages 17[140] Y. Shan, J. L. Klepeis, M. P. Eastwood, R. O. Dror, and D. E. Shaw.Gaussian split ewald: A fast ewald mesh method for molecular simulation.The Journal of Chemical Physics, 122(5):054101, 2005.doi:http://dx.doi.org/10.1063/1.1839571. URLhttp://scitation.aip.org/content/aip/journal/jcp/122/5/10.1063/1.1839571.→ pages 12[141] Y. Shang, H. Nguyen, L. Wickstrom, A. Okur, and C. Simmerling.Improving the description of salt bridge strength and geometry in ageneralized born model. Journal of Molecular Graphics and Modelling, 2989(5):676 – 684, 2011. ISSN 1093-3263.doi:http://dx.doi.org/10.1016/j.jmgm.2010.11.013. URLhttp://www.sciencedirect.com/science/article/pii/S109332631000183X. →pages 6, 7[142] Q. Shao, L. Yang, and Y. Q. Gao. A test of implicit solvent models on thefolding simulation of the gb1 peptide. The Journal of Chemical Physics,130(19):195104, 2009. doi:http://dx.doi.org/10.1063/1.3132850. URLhttp://scitation.aip.org/content/aip/journal/jcp/130/19/10.1063/1.3132850.→ pages 7[143] D. E. Shaw, J. P. Grossman, J. A. Bank, B. Batson, J. A. Butts, J. C. Chao,M. M. Deneroff, R. O. Dror, A. Even, C. H. Fenton, A. Forte, J. Gagliardo,G. Gill, B. Greskamp, C. R. Ho, D. J. Ierardi, L. Iserovich, J. S. Kuskin,R. H. Larson, T. Layman, L.-S. Lee, A. K. Lerer, C. Li, D. Killebrew, K. M.Mackenzie, S. Y.-H. Mok, M. A. Moraes, R. Mueller, L. J. Nociolo, J. L.Peticolas, T. Quan, D. Ramot, J. K. Salmon, D. P. Scarpazza,U. Ben Schafer, N. Siddique, C. W. Snyder, J. Spengler, P. T. P. Tang,M. Theobald, H. Toma, B. Towles, B. Vitale, S. C. Wang, and C. Young.Anton 2: Raising the bar for performance and programmability in aspecial-purpose molecular dynamics supercomputer. In Proceedings of theInternational Conference for High Performance Computing, Networking,Storage and Analysis, SC ’14, pages 41–53, Piscataway, NJ, USA, 2014.IEEE Press. ISBN 978-1-4799-5500-8. doi:10.1109/SC.2014.9. URLhttp://dx.doi.org/10.1109/SC.2014.9. → pages 65[144] D. Shivakumar, Y. Deng, and B. Roux. Computations of absolute solvationfree energies of small molecules using explicit and implicit solvent model.Journal of Chemical Theory and Computation, 5(4):919–930, 2009.doi:10.1021/ct800445x. URL http://dx.doi.org/10.1021/ct800445x. →pages 7[145] J. Srinivasan, T. E. Cheatham, P. Cieplak, P. A. Kollman, and D. A. Case.Continuum solvent studies of the stability of dna, rna, andphosphoramidatedna helices. Journal of the American Chemical Society,90120(37):9401–9409, 1998. doi:10.1021/ja981844+. URLhttp://dx.doi.org/10.1021/ja981844+. → pages 8[146] P. J. Steinbach and B. R. Brooks. New spherical-cutoff methods forlong-range forces in macromolecular simulation. Journal of ComputationalChemistry, 15(7):667–683, 1994. ISSN 1096-987X.doi:10.1002/jcc.540150702. URLhttp://dx.doi.org/10.1002/jcc.540150702. → pages 12[147] W. C. Still, A. Tempczyk, R. C. Hawley, and T. Hendrickson.Semianalytical treatment of solvation for molecular mechanics anddynamics. Journal of the American Chemical Society, 112(16):6127–6129,1990. doi:10.1021/ja00172a038. URLhttp://pubs.acs.org/doi/abs/10.1021/ja00172a038. → pages 5[148] J. E. Stone, D. J. Hardy, I. S. Ufimtsev, and K. Schulten. Gpu-acceleratedmolecular modeling coming of age. Journal of Molecular Graphics andModelling, 29(2):116 – 125, 2010. ISSN 1093-3263.doi:http://dx.doi.org/10.1016/j.jmgm.2010.06.010. URLhttp://www.sciencedirect.com/science/article/pii/S1093326310000914. →pages 65[149] B. Strodel and D. J. Wales. Implicit solvent models and the energylandscape for aggregation of the amyloidogenic kffe peptide. Journal ofChemical Theory and Computation, 4(4):657–672, 2008.doi:10.1021/ct700305w. URL http://dx.doi.org/10.1021/ct700305w. →pages 7[150] E. Sua´rez, N. Dı´az, and D. Sua´rez. Entropy calculations of singlemolecules by combining the rigidrotor and harmonic-oscillatorapproximations with conformational entropy estimations from moleculardynamics simulations. Journal of Chemical Theory and Computation, 7(8):2638–2653, 2011. doi:10.1021/ct200216n. URLhttp://pubs.acs.org/doi/abs/10.1021/ct200216n. → pages 4391[151] D. Surez and N. Daz. Sampling assessment for molecular simulationsusing conformational entropy calculations. Journal of Chemical Theoryand Computation, 10(10):4718–4729, 2014. doi:10.1021/ct500700d. URLhttp://dx.doi.org/10.1021/ct500700d. → pages 49[152] E. Surez and D. Surez. Multibody local approximation: Application toconformational entropy calculations on biomolecules. The Journal ofChemical Physics, 137(8):084115, 2012.doi:http://dx.doi.org/10.1063/1.4748104. URLhttp://scitation.aip.org/content/aip/journal/jcp/137/8/10.1063/1.4748104.→ pages 43, 49[153] E. Surez, N. Daz, J. Mndez, and D. Surez. Cencalc: A computational toolfor conformational entropy calculations from molecular simulations.Journal of Computational Chemistry, 34(23):2041–2054, 2013. ISSN1096-987X. doi:10.1002/jcc.23350. URLhttp://dx.doi.org/10.1002/jcc.23350. → pages 46[154] J. M. Swanson, R. H. Henchman, and J. A. McCammon. Revisiting freeenergy calculations: A theoretical connection to mm/pbsa and directcalculation of the association free energy. Biophysical Journal, 86(1):67 –74, 2004. ISSN 0006-3495.doi:http://dx.doi.org/10.1016/S0006-3495(04)74084-9. URLhttp://www.sciencedirect.com/science/article/pii/S0006349504740849. →pages 64[155] J. M. J. Swanson, J. A. Wagoner, N. A. Baker, and J. A. McCammon.Optimizing the poisson dielectric boundary with explicit solvent forces andenergies: lessons learned with atom-centered dielectric functions. Journalof Chemical Theory and Computation, 3(1):170–183, 2007.doi:10.1021/ct600216k. URL http://dx.doi.org/10.1021/ct600216k. →pages 6[156] C. Tan, L. Yang, and R. Luo. How well does poissonboltzmann implicitsolvent agree with explicit solvent? a quantitative analysis. The Journal of92Physical Chemistry B, 110(37):18680–18687, 2006.doi:10.1021/jp063479b. URL http://dx.doi.org/10.1021/jp063479b. →pages 6, 7[157] C. Tan, Y.-H. Tan, and R. Luo. Implicit nonpolar solvent models. TheJournal of Physical Chemistry B, 111(42):12263–12274, 2007.doi:10.1021/jp073399n. URLhttp://pubs.acs.org/doi/abs/10.1021/jp073399n. → pages 4, 5, 6[158] T. Terada and K. Shimizu. A comparison of generalized born methods infolding simulations. Chemical Physics Letters, 460(13):295 – 299, 2008.ISSN 0009-2614. doi:http://dx.doi.org/10.1016/j.cplett.2008.05.066. URLhttp://www.sciencedirect.com/science/article/pii/S000926140800746X. →pages 7[159] D. G. Thomas, J. Chun, Z. Chen, G. Wei, and N. A. Baker.Parameterization of a geometric flow implicit solvation model. Journal ofComputational Chemistry, 34(8):687–695, 2013. ISSN 1096-987X.doi:10.1002/jcc.23181. URL http://dx.doi.org/10.1002/jcc.23181. → pages5[160] H. Tjong and H.-X. Zhou. The dependence of electrostatic solvationenergy on dielectric constants in poisson-boltzmann calculations. TheJournal of Chemical Physics, 125(20):206101, 2006.doi:http://dx.doi.org/10.1063/1.2393243. URLhttp://scitation.aip.org/content/aip/journal/jcp/125/20/10.1063/1.2393243.→ pages 5[161] H. Tjong and H.-X. Zhou. On the dielectric boundary in poissonboltzmanncalculations. Journal of Chemical Theory and Computation, 4(3):507–514,2008. doi:10.1021/ct700319x. URL http://dx.doi.org/10.1021/ct700319x.→ pages 5[162] A. Vitalis and R. V. Pappu. Absinth: A new continuum solvation model forsimulations of polypeptides in aqueous solutions. Journal ofComputational Chemistry, 30(5):673–699, 2009. ISSN 1096-987X.93doi:10.1002/jcc.21005. URL http://dx.doi.org/10.1002/jcc.21005. → pages4, 6, 36, 39, 50[163] J. Wagoner and N. A. Baker. Solvation forces on biomolecular structures:A comparison of explicit solvent and poissonboltzmann models. Journal ofComputational Chemistry, 25(13):1623–1629, 2004. ISSN 1096-987X.doi:10.1002/jcc.20089. URL http://dx.doi.org/10.1002/jcc.20089. → pages7, 52[164] J. A. Wagoner and N. A. Baker. Assessing implicit models for nonpolarmean solvation forces: The importance of dispersion and volume terms.Proceedings of the National Academy of Sciences, 103(22):8331–8336,2006. doi:10.1073/pnas.0600118103. URLhttp://www.pnas.org/content/103/22/8331.abstract. → pages 4, 5, 6, 7, 52[165] J. Wang, C. Tan, E. Chanco, and R. Luo. Quantitative analysis ofpoisson-boltzmann implicit solvent in molecular dynamics. Phys. Chem.Chem. Phys., 12:1194–1202, 2010. doi:10.1039/B917775B. URLhttp://dx.doi.org/10.1039/B917775B. → pages 7, 8[166] W. Wang and P. A. Kollman. Free energy calculations on dimer stability ofthe hiv protease using molecular dynamics and a continuum solvent model.Journal of Molecular Biology, 303(4):567 – 582, 2000. ISSN 0022-2836.doi:http://dx.doi.org/10.1006/jmbi.2000.4057. URLhttp://www.sciencedirect.com/science/article/pii/S0022283600940579. →pages 42[167] A. Warshel, P. K. Sharma, M. Kato, and W. W. Parson. Modelingelectrostatic effects in proteins. Biochimica et Biophysica Acta (BBA) -Proteins and Proteomics, 1764(11):1647 – 1676, 2006. ISSN 1570-9639.doi:http://dx.doi.org/10.1016/j.bbapap.2006.08.007. URLhttp://www.sciencedirect.com/science/article/pii/S1570963906002809. →pages 55[168] T. E. Williamson, A. Vitalis, S. L. Crick, and R. V. Pappu. Modulation ofpolyglutamine conformations and dimer formation by the n-terminus of94huntingtin. Journal of Molecular Biology, 396(5):1295 – 1309, 2010.ISSN 0022-2836. doi:http://dx.doi.org/10.1016/j.jmb.2009.12.017. URLhttp://www.sciencedirect.com/science/article/pii/S0022283609015228. →pages 40[169] R. Wuttke, H. Hofmann, D. Nettels, M. B. Borgia, J. Mittal, R. B. Best, andB. Schuler. Temperature-dependent solvation modulates the dimensions ofdisordered proteins. Proceedings of the National Academy of Sciences, 111(14):5213–5218, 2014. doi:10.1073/pnas.1313006111. URLhttp://www.pnas.org/content/111/14/5213.abstract. → pages 40, 59[170] I.-C. Yeh and A. Wallqvist. Structure and dynamics of end-to-end loopformation of the penta-peptide cys-ala-gly-gln-trp in implicit solvents. TheJournal of Physical Chemistry B, 113(36):12382–12390, 2009.doi:10.1021/jp904064z. URL http://dx.doi.org/10.1021/jp904064z. →pages 7, 8[171] Z. Yu, M. P. Jacobson, J. Josovitz, C. S. Rapp, and R. A. Friesner.First-shell solvation of ion pairs: correction of systematic errors in implicitsolvent models. The Journal of Physical Chemistry B, 108(21):6643–6654,2004. doi:10.1021/jp037821l. URL http://dx.doi.org/10.1021/jp037821l.→ pages 7[172] F. Zeller and M. Zacharias. Evaluation of generalized born model accuracyfor absolute binding free energy calculations. The Journal of PhysicalChemistry B, 118(27):7467–7474, 2014. doi:10.1021/jp5015934. URLhttp://dx.doi.org/10.1021/jp5015934. → pages 7[173] Z. Zhang, L. Wang, Y. Gao, J. Zhang, M. Zhenirovskyy, and E. Alexov.Predicting folding free energy changes upon single point mutations.Bioinformatics, 28(5):664–671, 2012. doi:10.1093/bioinformatics/bts005.URL http://bioinformatics.oxfordjournals.org/content/28/5/664.abstract. →pages 64[174] R. Zhou. Free energy landscape of protein folding in water: Explicit vs.implicit solvent. Proteins: Structure, Function, and Bioinformatics, 53(2):95148–161, 2003. ISSN 1097-0134. doi:10.1002/prot.10483. URLhttp://dx.doi.org/10.1002/prot.10483. → pages 7[175] R. Zhou and B. J. Berne. Can a continuum solvent model reproduce thefree energy landscape of a -hairpin folding in water? Proceedings of theNational Academy of Sciences, 99(20):12777–12782, 2002.doi:10.1073/pnas.142430099. URLhttp://www.pnas.org/content/99/20/12777.abstract. → pages 7[176] R. Zhou, G. Krilov, and B. J. Berne. Comment on can a continuum solventmodel reproduce the free energy landscape of a -hairpin folding in water?the poissonboltzmann equation. The Journal of Physical Chemistry B, 108(22):7528–7530, 2004. doi:10.1021/jp037812c. URLhttp://dx.doi.org/10.1021/jp037812c. → pages 7[177] J. Zhu, E. Alexov, and B. Honig. Comparative study of generalized bornmodels: born radii and peptide folding. The Journal of Physical ChemistryB, 109(7):3008–3022, 2005. doi:10.1021/jp046307s. URLhttp://dx.doi.org/10.1021/jp046307s. → pages 7[178] D. M. Zuckerman. Equilibrium sampling in biomolecular simulations.Annual Review of Biophysics, 40(1):41–62, 2011.doi:10.1146/annurev-biophys-042910-155255. URL http://www.annualreviews.org/doi/abs/10.1146/annurev-biophys-042910-155255. →pages 6596Appendix ASupporting Materials97∆E¯MM,V T∆SV T∆SJV ∆Gfold,S ∆∆GexpsolvChignolin -85.4(9) -5.5(1) -2.0(4) -1.0(1) 78.8(9)Trp-cage -92(1) -8.5(2) -16.5(5) 0.8(2) 84(1)BBA -25(4) -10.0(3) -21(1) 0.8(2) 16(4)Villin -75(5) -15(1) -28(2) 0.9(1) 61(5)WW domain -101(1) -23.9(4) -32(1) -0.9(2) 76(1)NTL9 -173(3) -25.9(3) -30(2) -1.1(1) 146(3)BBL -39(7) -12.1(7) -23(2) 1.0(2) 28(7)Protein B -62(7) -11.0(8) -23(2) 0.5(1) 52(7)Homeodomain -11(8) -6.6(8) 0(2) -1.1(1) 3(8)Protein G -76(2) -28.8(6) -50(1) 0.2(2) 48(2)α3D -162(4) -38.6(4) -68(2) -0.1(2) 123(4)λ -repressor -9(4) -30(1) -57(4) 0.9(3) -21(4)Table A.1: Intermediate and final values for the expected change in free en-ergy of solvation upon folding. ∆SJV refers to the joint entropy method.Units are kcal/mol; standard errors are given in brackets.PBEQ GBMV2 GBSW FACTS ACE2 SCP EEF1 ABSINTHChignolin 50.0(9) 59.7(1) 55.8(9) 52.7(9) 55.8(1) 62(1) 70.7(9) 69.9(9)Trp-cage 52(1) 56(1) 50(1) 46(1) 42(1) 51(1) 62(1) 76(1)BBA 2(3) -6(4) -5(4) -14(4) -16(4) -29(4) -11(4) 1(3)Villin 39(4) 35(4) 37(4) 27(4) 24(4) 39(4) 48(3) 56(4)WW domain 35(1) 33(1) 32(1) 23(1) 34(1) 31(1) 39.3(8) 62.0(1)NTL9 117(3) 110(3) 112(3) 100(2) 106(3) 105(3) 125(2) 148(3)BBL 0(6) 7(6) 5(6) 0(6) 2(6) -5(7) 14(5) 6(5)Protein B 28(7) 26(7) 26(7) 20(7) 15(7) 19(7) 33(5) 31(5)Homeodomain -6(8) -11(8) -11(8) -16(8) -16(8) -21(8) -11(6) -10(7)Protein G -46(3) 20(3) 17(3) 9(3) 23(3) 9(3) 33(2) -6(2)α3D -9(3) 56(4) 61(4) 32(3) 20(3) 47(4) 48(3) 13(3)λ -repressor -16(4) -41(4) -34(4) -52(4) -54(4) -55(5) -32(3) -18(3)Table A.2: Changes in free energy of solvation upon folding for default val-ues of tested implicit solvent models. Units are kcal/mol; standard errorsare given in brackets.98Figure A.1: Cartoon and stick representation of an experimentally deter-mined folded state of Chignolin [61].99Figure A.2: Cartoon and stick representation of an experimentally deter-mined folded state of Trp-cage (PDBID:2JOF).100485664728088Chignolin3045607590105Trp-cage402002040BBA203040506070Villin153045607590WW domain90105120135150165NTL915015304560BBL01530456075Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH604020020a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure A.3: Changes in free energies of solvation upon folding with negativesurface tension. Expected values are plotted in brown, implicit solventvalues with default settings are plotted in purple, and implicit solventmodel values with negative surface tension (λ = −0.0525) in yellow.The error bars represent the standard errors.1011 2 4 8 20 40020406080Chignolin1 2 4 8 20 40020406080100Trp-cage1 2 4 8 20 40168081624BBA1 2 4 8 20 4001530456075Villin1 2 4 8 20 40020406080WW domain1 2 4 8 20 400306090120150NTL91 2 4 8 20 4010010203040BBL1 2 4 8 20 4015015304560Protein B1 2 4 8 20 40181260612Homeodomain1 2 4 8 20 4040200204060Protein G1 2 4 8 20 404004080120160α3D1 2 4 8 20 4060453015015a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)²pFigure A.4: Changes in free energies of solvation upon folding against inter-nal dielectric constant. Expected values are plotted in brown, PB modelvalues in purple, ACE2 model values in yellow, and GBSW model val-ues in blue. The apolar term is not included for either implicit solventmodel. The dielectrics are plotted on a logarithmic scale. The errorbars represent the standard errors. For a better view of the differencebetween expected and implicit solvent model free energies as a func-tion of the internal dielectric constant used, we also plotted ∆∆Gexpsolv -∆∆Gimpsolv in Figure 3.3.1024856647280Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL102030405060Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH605040302010a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure A.5: Changes in free energies of solvation upon folding with no saltcorrection. Expected values are plotted in brown, implicit solventmodel values with default settings in purple, and implicit solvent modelvalues with no salt correction in yellow. The error bars represent thestandard errors.1030.0 0.4 0.8 1.2 1.6 2.0Lagtime (µs)01234ITS (µs)0 50 100 150 200Time (100−1 )444852566064TS (kcal/mol)0 40 80 120 160 200Time (µs)03691215RMSD ()a) b)c)Figure A.6: State and entropy calculations for BBA. An ITS plot of the slow-est timescales is shown in a. Conformational entropies calculated usingthe CC-MLA method with data up to a number of intermediate timepoints are shown for the folded state (brown) and unfolded state (pur-ple) in b. A RMSD/MSM-state superimposition plot is shown in c; theRMSDs relative to the experimental structure are represented as purplepoints, and MSM-defined folded states are shaded in brown. Only thefirst of two simulations is plotted in c.1040.0 0.2 0.4 0.6 0.8 1.0Lagtime (µs)0.000.250.500.751.00ITS (µs)0 20 40 60 80Time (100−1 )546066727884TS (kcal/mol)0 25 50 75 100 125Time (µs)0369121518RMSD ()a) b)c)Figure A.7: State and entropy calculations for Villin. See Figure A.6 for an-notations.1050.0 0.4 0.8 1.2 1.6 2.0Lagtime (µs)0369121518ITS (µs)0 200 400 600 800Time (100−1 )485460667278TS (kcal/mol)0 150 300 450 600Time (µs)048121620RMSD ()a) b)c)Figure A.8: State and entropy calculations for WW domain. See Figure A.6for annotations. Only the second of two simulations is plotted in c.1060 1 2 3 4 5Lagtime (µs)04812162024ITS (µs)0 100 200 300 400Time (100−1 )566472808896TS (kcal/mol)0 200 400 600 800 1000Time (µs)048121620RMSD ()a) b)c)Figure A.9: State and entropy calculations for NTL9. See Figure A.6 forannotations. Only the first of four simulations is plotted in c.1070.0 0.2 0.4 0.6 0.8 1.0Lagtime (µs)051015202530ITS (µs)0 150 300 450 600Time (100−1 )556065707580TS (kcal/mol)0 50 100 150 200 250Time (µs)048121620RMSD ()a) b)c)Figure A.10: State and entropy calculations for BBL. See Figure A.6 for an-notations. Only the first of two simulations is plotted in c. Note thatthe low RMSD values around 100 µs were identified as an intermedi-ate state.1080.0 0.2 0.4 0.6 0.8 1.0Lagtime (µs)0.00.40.81.21.6ITS (µs)0 15 30 45 60Time (100−1 )859095100105TS (kcal/mol)0 20 40 60 80 100Time (µs)0510152025RMSD ()a) b)c)Figure A.11: State and entropy calculations for Protein B. See Figure A.6 forannotations.1090.0 0.2 0.4 0.6 0.8 1.0Lagtime (µs)0.00.40.81.21.6ITS (µs)0 150 300 450Time (100−1 )120.0122.5125.0127.5130.0132.5TS (kcal/mol)0 25 50 75 100 125Time (µs)0510152025RMSD ()a) b)c)Figure A.12: State and entropy calculations for Homeodomain. See FigureA.6 for annotations. Only the first of two simulations is plotted in c.1100.0 0.2 0.4 0.6 0.8 1.0Lagtime (µs)020406080100ITS (µs)0 150 300 450 600Time (100−1 )808896104112120TS (kcal/mol)0 80 160 240 320Time (µs)0510152025RMSD ()a) b)c)Figure A.13: State and entropy calculations for Protein G. See Figure A.6 forannotations. Only the first of four simulations is plotted in c.1110 2 4 6 8 10Lagtime (µs)061218243036ITS (µs)0 80 160 240 320Time (100−1 )130140150160170180TS (kcal/mol)0 60 120 180 240 300Time (µs)051015202530RMSD ()a) b)c)Figure A.14: State and entropy calculations for α3D. See Figure A.6 for an-notations. Only the first of two simulations is plotted in c.1120.0 0.4 0.8 1.2 1.6 2.0Lagtime (µs)051015202530ITS (µs)0 100 200 300 400Time (100−1 )128136144152160168TS (kcal/mol)0 30 60 90 120 150Time (µs)048121620RMSD ()a) b)c)Figure A.15: State and entropy calculations for λ -repressor. See Figure A.6for annotations. Only the first of four simulations is plotted in c.1134856647280Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL102030405060Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH605040302010a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure A.16: Changes in free energies of solvation upon folding with min-imized trajectories. Expected values are plotted in brown, implicitsolvent model values calculated without minimization in purple, andimplicit solvent model values calculated with trajectories minimizedusing the relevant model in yellow. The error bars represent the stan-dard errors.1144856647280Chignolin405060708090Trp-cage45301501530BBA203040506070Villin153045607590WW domain90105120135150165NTL91501530BBL102030405060Protein BPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH30201001020HomeodomainPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH50250255075Protein GPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH4004080120160α3DPBEQGBMV2GBSWFACTSACE2SCPEEF1ABSINTH605040302010a) b) c) d)e) f) g) h)i) j) k) l)λ-repressor∆∆Gsolv (kcal/mol)Figure A.17: Changes in free energies of solvation upon folding with tem-perature corrections. Expected values are plotted in brown, implicitsolvent models values with default settings in purple, and implicit sol-vent values with no temperature correction in yellow. The error barsrepresent the standard errors.115
- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Implicit solvent models and free energies of solvation...
Open Collections
UBC Theses and Dissertations
Featured Collection
UBC Theses and Dissertations
Implicit solvent models and free energies of solvation in the context of protein folding Cumberworth, Alexander Michael 2015
pdf
Page Metadata
Item Metadata
Title | Implicit solvent models and free energies of solvation in the context of protein folding |
Creator |
Cumberworth, Alexander Michael |
Publisher | University of British Columbia |
Date Issued | 2015 |
Description | Given the wide and fundamental roles proteins play in cells, as well as their potential medical and industrial applications, a detailed understanding of the relationships between sequence, structure, dynamics, and function is of critical importance. Molecular models are required to solve this problem, as well as models of the associated conformational spaces. One of the most challenging aspects of modeling these vast ensembles is the computer power required to carry out the requisite simulations. Reduced solvent models, and particularly a class referred to as implicit solvent models, have been developed extensively; however, they make many assumptions and approximations that are likely to affect accuracy. Here, several implicit solvent models commonly used for protein modeling are evaluated by comparing the expected changes in free energies of solvation upon folding ΔGsolv derived from micro--ms simulations of fast folding proteins to those given by the implicit solvent models. In the majority of cases, there is a significant and substantial difference between the ΔGsolv values calculated from the two approaches, with the implicit solvent models excessively favouring the folded state over the unfolded state. This could only be remedied by selecting values for the model parameters -- the internal dielectric constant for the polar term and the surface tension coefficient for the apolar term -- that were system specific or physically unrealistic. |
Genre |
Thesis/Dissertation |
Type |
Text |
Language | eng |
Date Available | 2016-04-30 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166153 |
URI | http://hdl.handle.net/2429/52833 |
Degree |
Master of Science - MSc |
Program |
Genome Science and Technology |
Affiliation |
Science, Faculty of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-05 |
Campus |
UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |
Download
- Media
- 24-ubc_2015_may_cumberworth_alexander.pdf [ 4.05MB ]
- Metadata
- JSON: 24-1.0166153.json
- JSON-LD: 24-1.0166153-ld.json
- RDF/XML (Pretty): 24-1.0166153-rdf.xml
- RDF/JSON: 24-1.0166153-rdf.json
- Turtle: 24-1.0166153-turtle.txt
- N-Triples: 24-1.0166153-rdf-ntriples.txt
- Original Record: 24-1.0166153-source.json
- Full Text
- 24-1.0166153-fulltext.txt
- Citation
- 24-1.0166153.ris
Full Text
Cite
Citation Scheme:
Usage Statistics
Share
Embed
Customize your widget with the following options, then copy and paste the code below into the HTML
of your page to embed this item in your website.
<div id="ubcOpenCollectionsWidgetDisplay">
<script id="ubcOpenCollectionsWidget"
src="{[{embed.src}]}"
data-item="{[{embed.item}]}"
data-collection="{[{embed.collection}]}"
data-metadata="{[{embed.showMetadata}]}"
data-width="{[{embed.width}]}"
async >
</script>
</div>
Our image viewer uses the IIIF 2.0 standard.
To load this item in other compatible viewers, use this url:
https://iiif.library.ubc.ca/presentation/dsp.24.1-0166153/manifest