{"@context":{"@language":"en","Affiliation":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","AggregatedSourceRepository":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","Campus":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","Creator":"http:\/\/purl.org\/dc\/terms\/creator","DateAvailable":"http:\/\/purl.org\/dc\/terms\/issued","DateIssued":"http:\/\/purl.org\/dc\/terms\/issued","Degree":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","DegreeGrantor":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","Description":"http:\/\/purl.org\/dc\/terms\/description","DigitalResourceOriginalRecord":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","FullText":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","Genre":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","GraduationDate":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","IsShownAt":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","Language":"http:\/\/purl.org\/dc\/terms\/language","Program":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","Provider":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","Publisher":"http:\/\/purl.org\/dc\/terms\/publisher","Rights":"http:\/\/purl.org\/dc\/terms\/rights","RightsURI":"https:\/\/open.library.ubc.ca\/terms#rightsURI","ScholarlyLevel":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","Title":"http:\/\/purl.org\/dc\/terms\/title","Type":"http:\/\/purl.org\/dc\/terms\/type","URI":"https:\/\/open.library.ubc.ca\/terms#identifierURI","SortDate":"http:\/\/purl.org\/dc\/terms\/date"},"Affiliation":[{"@value":"Science, Faculty of","@language":"en"},{"@value":"Physics and Astronomy, Department of","@language":"en"}],"AggregatedSourceRepository":[{"@value":"DSpace","@language":"en"}],"Campus":[{"@value":"UBCV","@language":"en"}],"Creator":[{"@value":"Mills, Eric A","@language":"en"}],"DateAvailable":[{"@value":"2015-12-09T23:17:54Z","@language":"*"}],"DateIssued":[{"@value":"2015","@language":"en"}],"Degree":[{"@value":"Doctor of Philosophy - PhD","@language":"en"}],"DegreeGrantor":[{"@value":"University of British Columbia","@language":"en"}],"Description":[{"@value":"We use classical density functional theory to investigate the interactions between solvents and proteins. We examine a diverse experimental literature to establish thermodynamic properties of protein-cosolute interaction, particularly the compensation between transfer entropy and transfer enthalpy. We develop a method of analysing the uncertainties in such measurements and use the method to resolve a long-standing debate over entropy-enthalpy compensation. We develop a classical density functional theory for interactions between proteins and cosolutes. The theory developed here ignores the solvent-solvent interaction but is nonetheless quite accurate. We use this approach to reproduce transfer free energies reported elsewhere, and show that the cDFT model captures the desolvation barrier and the temperature dependence of the transfer free energy. We use experimental values that we have analyzed to define the parameter space of a model density functional theory approach. We then extend the classical density functional theory to capture protein-water interactions, thus developing a new implicit solvent model. Along the way we give a proof that the free energy of a bath of particles in a finite external potential is independent of the external potential in the isothermal-isobaric ensemble. We finally discuss the challenges remaining in implementing our implicit solvent model.","@language":"en"}],"DigitalResourceOriginalRecord":[{"@value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/55761?expand=metadata","@language":"en"}],"FullText":[{"@value":"Protein-Solvent Interactions and Classical DensityFunctional TheorybyEric A MillsBSc Physics, McMaster University, 2006MSc Physics, McMaster University, 2008a thesis submitted in partial fulfillmentof the requirements for the degree ofDoctor of Philosophyinthe faculty of graduate and postdoctoralstudies(Physics)The University of British Columbia(Vancouver)December 2015c\u00a9 Eric A Mills, 2015AbstractWe use classical density functional theory to investigate the interactions be-tween solvents and proteins. We examine a diverse experimental literatureto establish thermodynamic properties of protein-cosolute interaction, par-ticularly the compensation between transfer entropy and transfer enthalpy.We develop a method of analysing the uncertainties in such measurementsand use the method to resolve a long-standing debate over entropy-enthalpycompensation. We develop a classical density functional theory for inter-actions between proteins and cosolutes. The theory developed here ignoresthe solvent-solvent interaction but is nonetheless quite accurate. We use thisapproach to reproduce transfer free energies reported elsewhere, and showthat the cDFT model captures the desolvation barrier and the temperaturedependence of the transfer free energy. We use experimental values that wehave analyzed to define the parameter space of a model density functionaltheory approach. We then extend the classical density functional theory tocapture protein-water interactions, thus developing a new implicit solventmodel. Along the way we give a proof that the free energy of a bath ofparticles in a finite external potential is independent of the external poten-tial in the isothermal-isobaric ensemble. We finally discuss the challengesremaining in implementing our implicit solvent model.iiPrefaceSome of the content of this thesis has been published previously or is cur-rently in submission. As my supervisor and I are the sole authors of thesepapers they represent my original work, prepared with my supervisor\u2019s edit-ing, advice, and direction.\u2022 The bulk of chapter 3 was published in \u201cMills, E A and Plotkin, SS. \u2018Density functional theory for protein transfer free energy.\u2019 TheJournal of Physical Chemistry B, 117(42):13278-13290, 2013.\u201d\u2022 Chapter 2 was published as \u201cMills, E A and Plotkin, S S. \u2018Proteintransfer free energy obeys entropy-enthalpy compensation.\u2019 The Jour-nal of Physical Chemistry B, 119(44):14130-14144, 2015.\u201diiiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . ivList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Abbreviations . . . . . . . . . . . . . . . . . . . . . . . . xList of Chemical Nomenclature . . . . . . . . . . . . . . . . . . xiiList of Symbols . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . xvi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Experimental Methods . . . . . . . . . . . . . . . . . . . . . . 21.2 Entropy Enthalpy Compensation . . . . . . . . . . . . . . . . 41.3 Implicit Solvent Models . . . . . . . . . . . . . . . . . . . . . 61.4 Classical Density Functional Theory . . . . . . . . . . . . . . 131.5 Molecular Dynamics . . . . . . . . . . . . . . . . . . . . . . . 181.6 Computational Methods . . . . . . . . . . . . . . . . . . . . . 231.7 Aims of this Thesis . . . . . . . . . . . . . . . . . . . . . . . . 27iv2 Experimental Analysis: Entropy-Enthalpy Compensationand Cosolutes . . . . . . . . . . . . . . . . . . . . . . . . . . . 292.1 Methods and Theory . . . . . . . . . . . . . . . . . . . . . . . 302.1.1 Thermodynamic Equations for Protein Unfolding . . . 302.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 342.2.1 Monte Carlo Method to Determine Statistical Errors 342.2.2 Transfer Entropy and Enthalpy for Various Proteinsand Solvents . . . . . . . . . . . . . . . . . . . . . . . 362.3 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . 503 Classical Density Functional Theory and Protein-CosoluteInteractions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563.1 Volume and Area Terms in the Transfer Free Energy . . . . . 573.1.1 Volume Considerations . . . . . . . . . . . . . . . . . . 573.1.2 Surface Considerations . . . . . . . . . . . . . . . . . . 603.1.3 Combined Surface\/Volume Model for the Transfer FreeEnergy . . . . . . . . . . . . . . . . . . . . . . . . . . . 623.2 The Density Functional Theory Formulation . . . . . . . . . . 643.2.1 Validation Tests in Model Solvents . . . . . . . . . . . 703.2.2 Connecting DFT to Previous Surface\/Volume Models 713.3 Empirically Deriving DFT Transfer Free Energy Parameters . 743.4 Using DFT for Implicit Solvent Models . . . . . . . . . . . . 773.4.1 Depletion and Impeded-Solvation Interactions in anImplicit Solvent Model . . . . . . . . . . . . . . . . . . 783.5 Transfer Enthalpy and Entropy from Density Functional Theory 823.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 864 Theoretical Considerations in Classical Density FunctionalTheory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 874.1 Defining the Transfer Free Energy . . . . . . . . . . . . . . . 874.2 The DFT Formulation . . . . . . . . . . . . . . . . . . . . . . 904.3 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95v5 Including Solvent-Solvent Interactions in DFT . . . . . . . 975.1 A Simple Form for the Solvent-Solvent Interaction . . . . . . 975.2 Making the Form of Solvent-Solvent Interactions More General1005.3 Fitting the Model to Data . . . . . . . . . . . . . . . . . . . . 1015.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1096 Conclusions and Future Directions . . . . . . . . . . . . . . 1136.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1136.2 Theoretical Work in cDFT . . . . . . . . . . . . . . . . . . . . 1156.3 Error Analysis in Thermodynamic Data . . . . . . . . . . . . 1166.4 Implementing the cDFT Implicit Solvent . . . . . . . . . . . . 1176.5 Parting Thoughts . . . . . . . . . . . . . . . . . . . . . . . . . 118Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120A Proof of Theorems in DFT . . . . . . . . . . . . . . . . . . . 146B Simulation Parameters . . . . . . . . . . . . . . . . . . . . . . 149viList of TablesTable 2.1 Thermodynamic parameters for several proteins . . . . . . 39Table 2.2 Comparison of the variance and covariance of fits to \u2206Gvs T . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40Table 2.3 Empirical values for thermodynamic unfolding parameters 48Table 3.1 van der Waals parameters for the atoms used . . . . . . . 71Table 3.2 Comparison of test cases between density functional theory(DFT) and thermodynamic integration (TI) . . . . . . . . 71Table 3.3 Parameter values yielding transfer free energies . . . . . . 76Table B.1 van der Waals parameters for atoms used in simulations inthis thesis. . . . . . . . . . . . . . . . . . . . . . . . . . . . 150viiList of FiguresFigure 1.1 Protein structure elements . . . . . . . . . . . . . . . . . . 2Figure 1.2 Diagram of dihedral angles. . . . . . . . . . . . . . . . . . 21Figure 1.3 Obtaining the parameters in the CHARMM methodology. 22Figure 2.1 Stability and enthalpy as a function of temperature . . . 37Figure 2.2 Analyzing unfolding fraction data and heat capacity data. 38Figure 2.3 Distribution in \u03c3 arising from the bootstrapping methoddescribed in section 2.2.2. The overlap of the two distri-butions represents the statistical significance of entropy-enthalpy compensation\u2013in this case p = 5\u00d7 10\u22125. . . . . . 47Figure 2.4 Entropy-enthalpy compensation for protein unfolding . . 51Figure 2.5 Further illustration of entropy-enthalpy compensation . . 52Figure 2.6 Entropy-enthalpy compensation for the transfer of variousproteins to various solvents . . . . . . . . . . . . . . . . . 53Figure 2.7 Concentration-dependence of the heat capacity . . . . . . 54Figure 2.8 A diagram illustrating the possible categories of cosolute . 55Figure 3.1 A diagram of the Tanford transfer model . . . . . . . . . 57Figure 3.2 The change in volume upon collapse . . . . . . . . . . . . 59Figure 3.3 Total free energy change \u2206G upon collapse . . . . . . . . 64Figure 3.4 Comparison of cosolute potential functions . . . . . . . . 72Figure 3.5 Solvent-induced force on a pair of \u201chard-wall\u201d spheres . . 80Figure 3.6 Schematic renderings of the solute spheres . . . . . . . . . 81viiiFigure 3.7 Allowed volume in parameter space for the proteins we in-vestigated, as predicted by the classical density functionaltheory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84Figure 4.1 Diagram of the transfer free energy. . . . . . . . . . . . . 88Figure 4.2 Density, potential, and \u03a6\u2032 of water . . . . . . . . . . . . . 93Figure 5.1 The solvent density \u03c6 vs. external potential . . . . . . . . 102Figure 5.2 Testing the condition in equation 5.15 . . . . . . . . . . . 103Figure 5.3 Transfer of a pair of bonded carbon atoms from vacuumto water. . . . . . . . . . . . . . . . . . . . . . . . . . . . 104Figure 5.4 Transfer free energies with DFT and surface area . . . . . 106Figure 5.5 Effective and actual potentials . . . . . . . . . . . . . . . 107Figure 5.6 Solvent-solvent interaction potentials . . . . . . . . . . . . 108Figure 5.7 Comparison of DFT predicted free energies with resultsfrom explicit solvent simulations . . . . . . . . . . . . . . 110Figure 5.8 Testing convergence of a DFT free energy calculation . . 111ixList of AbbreviationsAbbreviation DescriptionABSINTH Assembly of Biomolecules Studied by an Implicit,Novel, and Tuneable HamiltonianAFM Atomic Force MicroscopeALS Amytrophic Lateral SclerosisAMBER Assisted Model Build with Energy RefinementASA Accessible Surface AreaBAR Bennet Acceptance RatiocDFT Classical Density Functional TheoryCHARMM Chemistry at Harvard Macromolecular MechanicsDFT Density Functional TheoryDSC Differential Scanning CalorimetryER Endoplasmic ReticulumGB\/SA Generalized Born\/Surface AreaGROMACS GROningen MAchine for Chemical SimulationsLJ Lennard-JonesMD Molecular DynamicsNMR Nuclear Magnetic ResonanceOPLS Optimized Potential for Liquid SimulationsO-Z Ornstein-ZernikeRISM Reference Interaction Site ModelTI Thermodynamic IntegrationUA United AtomvdW van der WaalsxWHAM Weighted Histogram Analysis MethodxiList of ChemicalNomenclatureAbbreviation Description\u03b1-CTgen \u03b1-chymotrypsinogen\u03b1-Lac \u03b1-Lactalbumin\u03b2-hydrox \u03b2-hydroxectoineAla AlanineArg ArginineAsn AsparagineAsp Aspartic acidCSP Cold-Shock proteinCys CysteineGdmHCl Guanidine hydrochlorideGlu GlutamineGln Glutamic acidGly GlycineHis HistidineHPr Histidine-containing phosphocarrier proteinIle IsoleucineKCl Potassium ChlorideLeu LeucineLys LysineMet MethioninePhe PhenylalaninexiiPro ProlineSer SerineSOD1 Super-oxide dismutase 1Thr ThreonineTMAO Trimethylamine N-oxideTrp TryptophanTryps inh. Trypsin inhibitorTyr TyrosineVal ValinexiiiList of SymbolsSymbol Description\u03b2 The inverse of kBT\u000f van der Waals well depth; dielectric constant\u03bb Coupling parameter; thermal wavelength\u00b5 Chemical potential\u03c1 Charge density; equilibrium density\u03c3 Standard deviation; Weighting function in WDA ap-proach to DFT\u03a6 Solvent-solvent interaction functional\u03c6 Density of Solvent Molecules\u03c72 Chi-squared test statistic\u03c4 Weighting function in WDA approach to DFTA Surface AreaCp Heat Capacity (at constant pressure)F ForceN Number of particles in the systemG Free energyg(r) Correlation functionH EnthalpyKu Equilibrium constant (of unfolding)kB Boltzmann\u2019s ConstantP Pressurep probabilityS EntropyxivR Atom CoordinateT Temperaturet time; thicknessU Potential EnergyV Volume; potential energy termsV External potentialW WorkxvAcknowledgementsI would like to thank my supervisor, Dr. Steven S Plotkin, for his guidanceand advice throughout my PhD.I would like to acknowledge the following members of the Plotkin andRottler groups for their feedback on this project at various stages: MiguelGarcia, Shazhad Ghanbarian, Mona Habibi, Liam Huber, Ali Mohazzab,Amanda Parker, and Anton Smessaert.Most importantly, thank you Sara; your support made this possible.xviChapter 1IntroductionProteins are a class of biological molecules which facilitate essentially ev-ery biological process. The primary activity of DNA is coding for proteins;proteins thus act as the intermediary between genetic information and thephysical outcome in an organism. Proteins act as hormones, enzymes, cel-lular structure elements, transportation, and play many other roles in theorganism. A full understanding of their structure and function is thus soughtboth for foundational understanding of biology and for purposes of practicalmedical treatment of currently intractable diseases. For example, most newdrug design efforts target proteins[1]. Understanding proteins is crucial forefforts to combat a variety of diseases, including cancer, Alzheimer\u2019s, andALS[2].Proteins are long molecules made up of a chain of amino acids. Thereare twenty different amino acids in eukaryotic proteins, forming an alphabetout of which the sequence of each protein is made. Once assembled intoa protein, an individual amino acid is known as a residue, and consists oftwo parts: a backbone component that is the same for each amino acid andlinks them all together, and a side chain that differentiates the amino acidsand provides much of their functionality. Many proteins \u201cfold\u201d, adoptinga well-defined three-dimensional structure. This structure then determinesthe function of the protein. The central dogma of molecular biology is thatthe folded structure of the protein is uniquely determined by its sequence[3].1Figure 1.1: Protein structure elements. Primary (A), secondary (B),tertiary (C) and quaternary (D) structure of a protein.Understanding the way in which the folded structure is determined from thesequence and the protein\u2019s environment is a key challenge facing researchers.In protein terminology the primary structure refers to the sequence ofamino acid residues; see Figure 1.1. The secondary structure is a small setof structures which are seen in many proteins. These include \u03b1-helices and\u03b2-sheets. These secondary structures and so-called unstructured regions(regions of the primary structure which do not have an identified secondarystructure) fold together to form the tertiary structure. Individual proteinmolecules (monomers) often come together to form complexes, and this isreferred to as quaternary structure.1.1 Experimental MethodsGiven the limits of ab initio structure prediction and the long time scales onwhich proteins fold, computational studies typically start with experimentalstructures. Further, since the structure of the protein is the most funda-2mental thing about it from a biochemical perspective, much work has goneinto determining it. The most common ways of determining this structureare x-ray crystallography and nuclear magnetic resonance (NMR). X-raycrystallography methods rely on scattering x-rays from proteins which havebeen condensed into a regular crystal[4]. NMR methods use the relaxation ofnuclear spins to determine the chemical environment of each nucleus (typ-ically hydrogen) and thus the structure of the protein. Various structuremotifs in a protein give rise to distinct NMR signals, which allows for thestructure to be reconstructed with specialized software[5]. Both x-ray crys-tallography and NMR require a thermodynamically large number of copiesof the protein in the same conformation. These methods are thus limitedto well-structured proteins. Obtaining structural ensembles of proteins (orregions of proteins) that are intrinsically disordered is an ongoing challenge.In addition to the folded structure of the protein, experiments can seekout the thermodynamics of folding and unfolding. Typically the proteinof interest is placed in a buffer and either the temperature or the solutioncomposition is varied to drive the transition from folded to unfolded. Thetechniques to measure the resultant behaviour most relevant to this thesisare differential scanning calorimetry (DSC) and circular dichroism.Circular dichroism refers to the differential absorption of right and leftcircularly polarized light. At certain wavelengths in the UV range secondarystructures of a protein have a characteristic difference in the amount of rightand left circularly polarized light they absorb. Thus the circular dichroismat that wavelength of a sample indicates the percentage of proteins thathave those secondary structures intact\u2013i.e the fraction of folded proteins[6].This analysis typically assumes a two-state folding process, so that if asingle secondary structure element is missing, the entire protein is unfolded.As we will discuss in chapter 2, the two-state assumption is quite good fora broad class of proteins.Calorimetry refers to any of the variety of ways of determining the heatcapacity of a sample. Differential scanning calorimetry involves heatingtwo samples, one of which contains the protein of interest and the other ofwhich is a reference. The difference in heat required to keep both samples3to the same temperature vs time is then recorded and the heat capacityextracted from this. Of particular interest is the heat capacity as a functionof temperature during the process of unfolding the protein. This informationcan be used to determine the stability of a protein (that is, the minimumwork required to unfold it at a certain temperature) as well as the extent towhich that stability is determined by enthalpic and entropic effects[7].There are obviously many other ways proteins are studied experimen-tally. Experiments that pull on the protein with AFM or optical tweezersare used to gain information about the stability of the protein structureand the unfolding and folding pathways. Binding experiments can also beperformed, in which an agent which binds to the protein in one state (egthe unfolded state) but not another. This agent can then be measured andserve as a proxy for the amount of unfolded protein.1.2 Entropy Enthalpy CompensationEnthalpy and entropy play an intimately connected role in the free energychange during numerous biochemical processes. It has long been known,for example, that the transfer of hydrocarbons such as alkanes or alcoholsfrom pure solvent to water is generally exothermic or enthalpically favorable(\u2206H \u2248 \u2212(3-6)kcal\/mol for methane-butane) but entropically unfavorable,with T\u2206S typically 2-3\u00d7 the magnitude of the enthalpic contribution [8, 9].These opposing thermodynamic forces result in a net free energy changethat is smaller than either of the enthalpic or entropic contributions.The first part of this thesis concerns an effect known as entropy-enthalpycompensation. For a variety of physical processes including solute trans-fer [10], unfolding of various proteins [11], and ligand binding, ionization,and hydrolysis [12], the changes in enthalpy and entropy obey a nearly linearrelationship when a variable such as binding ligand is varied; this is referredto as entropy-enthalpy compensation [12\u201317]. The effect is ubiquitous butnot universal [18]. The slope of the enthalpy vs. entropy plot, referredto as the compensation temperature Tc, ranges from about 150K (e.g. foralkane vaporization [19]) to about 300K (most processes). The difficulty in4designing high affinity drugs has been attributed to entropy-enthalpy com-pensation [20\u201322].As mentioned above, entropy-enthalpy compensation is not an inevitableconsequence of statistical mechanics, particularly along chemical reactioncoordinates. Long-lived metastable states due to large barriers, and thusthe absence of any significant entropy-enthalpy compensation along the re-action coordinate, are fairly common in condensed matter and biophysics.The diamond phase of carbon is metastable to graphite at standard tempera-ture and pressure, with an enormous conversion barrier; allotropes of boron,polymorphs of silica, and martensite in steel are all metastable phases withprohibitive transition barrier; colloidal systems and emulsions have long-lived metastable phases; long-lived structure with slow, glassy dynamics iscommon in supercooled liquids; the covalent bonds forming the backbonesof DNA, RNA, and proteins are metastable to hydrolysis; several proteinshave native, functional states that are metastable, but simply have enor-mous kinetic unfolding barriers, including alpha-lytic protease, subtilisin,Streptomyces griseus protease B and the aspartic peptidase pepsin [23]. Inmultimeric systems of chain length . 100 amino acids, native protein struc-tures have been observed to have higher free energy than the amyloid phase,implying that a significant portion of the proteome is conformationally inmetastable equilibrium[24]. In contrast to these observations, as a generalrule, entropy-enthalpy cancellation does play a critical role in governingthe foldability of proteins and resolving the Levinthal paradox [25]. Smallbarriers in protein folding have been shown to arise due to the locality ofinteractions and the concomitant loss of entropy in forming stabilizing in-teractions. If it were not for entropy-enthalpy cancellation as protein chainconformations progressed towards native-like folds, folding barriers wouldbe prohibitively high, proteins could not fold on biological time scales, andlife as we know it would not be possible.The entropy is often obtained from measurements of the enthalpy andfree energy by subtraction; one complication however is that errors in en-thalpy are often much larger than errors in free energy. In these casesthe error in enthalpy can induce a spurious linear relation to the entropy,5for example in early measurements for oximation reactions of alkyl thymylketones [26], correlated entropy-enthalpy errors were sufficiently large thatthey could account for the whole measurable effect of entropy-enthalpy com-pensation, which could then not be definitively proven, regardless of whetheror not it existed. Correlated errors generally have an effective compen-sation temperature equal to the temperature at which the measurementswere taken. Compensation exists however when large entropy and enthalpychanges either cancel or compensate each other to yield a relatively small netfree energy gain for a given process, regardless of the compensation temper-ature, and includes cases where the compensation temperature equals thelab temperature [13, 16, 27]. As well, even if the compensation temperatureis quite different from the temperature of the experiments, if the correlatedscatter is sufficiently large, it can rule out the significance of the effect. Thusthere is a need to introduce more rigorous error analysis to judge the sig-nificance of any observed entropy-enthalpy compensation. We develop suchan analysis in Chapter 2.1.3 Implicit Solvent ModelsProteins fold and function in the crowded environment of the cell. Cytoso-lic proteins must negotiate a complex milieu which in many ways is signifi-cantly different than the environment in the test tube: roughly 15% of watermolecules are motionally restricted by protein and membrane surfaces [28];the surrounding solvent is enriched in ions such as Potassium but depleted inSodium and Chlorine; osmoprotectants such as trehalose and various aminoacids are present in significant concentration; numerous membrane surfacessuch as the nucleus, ER, and Golgi impose charged substrates for proteininteraction; macromolecular agents such as the microtubules, actin, ribo-somes, soluble proteins and RNA occupy roughly 30% (\u2248 300g\/`) of thecellular volume, and modulate stability [29], aggregation propensity [30],and dissociation constants [31, 32].Non-cytosolic proteins also fold in environments distinct from the testtube as well as the cytosol, particularly with respect to ionic and redox condi-6tions as well as the chaperone complement. Proteins destined for the plasmamembrane or extracellular matrix are trafficked by the secratory pathwaythrough the ER and Golgi [33]. The environments in the ER and cytosolare sufficiently different that the conditions for protein folding are generallymutually exclusive between the two milieux. Folding generally occurs inthe lumen of the ER, while function occurs either on the plasma membraneor in the extracellular matrix, which is itself densely occupied by highlycharged glycosaminoglycans such as hyaluronan and heparin sulfate\u2014largemolecules that may facilitate cellular migration and regulate secreted pro-tein activity. Fibrous proteins such as collagen and fibronectin also occupythe extracellular space, and provide structural rigidity while allowing rapiddiffusion of nutrients and signalling metabolites between constituent cells.The above examples demonstrate the need to correctly account for the ef-fect of the cell environment on protein folding, stability, and function. Accu-rately accounting for the effects of the cell environment presents a challengehowever to both experimental and computational studies. Experimentally,most of what is known about protein folding and stability has resulted fromin vitro studies at dilute concentrations, and many questions remain as tohow well such results apply to a realistic cell environment. Computationally,including explicit solvent along with a realistic concentration of osmolytesin a box of sufficient size to implement periodic boundary conditions outsidethe range of an electrostatic cutoff typically increases the number of parti-cles in the simulation by a factor on the order of ten or more [34]. Whilethis can be done for small proteins such as Trp-cage [34], investigating largerproteins generally requires coarse-grained models to keep the computationalresources required reasonable [35].Computational studies of crowding on isolated monomeric minimal \u03b2-barrel proteins find that the folding temperature is increased and the foldingtime decreased [36, 37]. However, molecular crowding has been shown in se-cretory cells to impair protein folding and lead to aggregate formation in theER [38]. It has been estimated that increasing the total intracellular proteinconcentration by 10% can potentially increase the rate of protein misfold-ing reactions following a nucleation-polymerization mechanism by a factor7of 10 [39]. Consistent with these observations and estimates, another MDfolding study of a coarse-grained model of crambin found that the presenceof multiple protein copies with a weak inter-protein attractive potential (amore realistic scenario) hindered correct monomeric folding and predisposedthe system to aggregation and misfolding [40].The above considerations motivate the creation of computational mod-els, with which we can account for the cellular environment around a proteinin an accurate but less computationally expensive way. Further, while onemight na\u00a8\u0131vely suspect that an explicit approach would yield results that are,in principle, exact, a number of difficulties arise in implementing these mod-els. The \u201dbest\u201d form of the interaction between water and other moleculesdepends on the context of the simulation, and a number of different watermodels are available with various strengths and drawbacks[41, 42]. Implicitsolvent models attempt to address these issues.Implicit solvent models are attempts to compute the transfer free energyof each configuration of the molecule of interest from a vacuum to a solvent(or, more generally, from one solvent environment to another) through somefunction of the coordinates of the solute biomolecule.\u2206Gtransfer = g({Ri}) (1.1)where {Ri} is the set of coordinates that defines the position and configura-tion of the biomolecule. Knowing the transfer free energy (or its derivativeswith respect to atom position) for each configuration allows the simulationto update the position of the protein atoms as if the solvent was present.Expressing the transfer free energy in the form of equation 1.1 eliminates theneed to explicitly simulate the solvent and hence speeds up the simulation\u2014provided the implicit solvent algorithm is fast enough. A number of modelsto compute this transfer free energy have been proposed, with the goal ofcalculating \u2206G with less computational resources than an explicit solvent,while maintaining enough accuracy for the purposes of the simulation. Manysuch models have been proposed, with various strengths and weaknesses[43].The most common implicit solvent models split the transfer energy into8two parts: a dielectric contribution and a non-polar contribution. Thesecomponents are each modelled separately and their contributions summed[44].\u2206G = \u2206Gnonpolar + \u2206Gdielectric (1.2)The dielectric part is typically the most time-consuming part to calcu-late because of the long-ranged forces involved. The solvent is often as-sumed to be a continuous dielectric, and the transfer free energy of thebiomolecule from vacuum to solvent can then be computed via solving thePoisson-Boltzmann (PB) equation with finite element or finite differencemethods[45, 46], which is accurate up to the continuum assumption butrelatively costly[47], or by faster but more approximate methods such asthe Generalized Born (GB) method[48]. Generalized Born in particular hasbecome popular, and has been implemented and optimized for a variety offorcefields used in molecular dynamics[49, 50].In the GB model the starting point is the standard Poisson equation fora dielectric:\u2207 [\u000f(r)\u2207V (r)] = \u22124pi\u03c1(r) (1.3)where \u000f(r) is the position-dependent dielectric, V the electric potential, and\u03c1(r) the density of charge. This can be recast as an equivalent equation forthe Green\u2019s function:\u2207 [\u000f(r)\u2207G(ri, rj)] = \u22124pi\u03b4(ri \u2212 rj) (1.4)Equation 1.4 has an analytical solution for a model system which consistsof a collection of charges at positions ri inside a sphere of dielectric \u000fin ofradius Rsphere surrounded by an infinite medium with dielectric \u000fout, withthe condition that \u000fout \u001d \u000fin \u2265 1:G(ri, rj) =1\u000fin|ri \u2212 rj | + F(ri, rj) (1.5)\u2206Gel =12\u2211ijF(ri, rj)qiqj (1.6)9where \u2206Gel is the free energy to insert the set of charges qi into the dielectricsphere at positions ri and F(ri, rj) is given byF(ri, rj) =(1\u000fin\u2212 1\u000fout)1\u221ar2ij +(Rsphere \u2212 r2iRsphere)(Rsphere \u2212 r2jRsphere)(1.7)The term Rsphere \u2212 rRsphere is then defined to be the generalized born radiusR. That this radius is constant is an approximation, as it should in principleinclude r. The free energy to insert a single charged atom is then\u2206Gel = \u221212(1\u000fin\u2212 1\u000fout)q2iRi(1.8)(the limiting case of F when ri = rj) in this case, and this expression is usedto calculate the Born radii for each atom.Equation 1.7 can be modified in various ways, which has given rise toseveral different methods of calculating Born radii[48, 51, 52]. These meth-ods all share the same essence of calculating a Born radius from a list ofknown self-interaction values and then using that radius to calculate theinteraction between charges.The nonpolar contribution consists of the free energy to create a cavity inthe dielectric and the free energy arising from the van der Waals interactionbetween the protein and water. The observed linear dependence of the logsolubility on the number of CH2 groups and hence chain length, particularlyfor long chain saturated fatty acids (decanoic acid and longer), and long-chain aliphatic alcohols (1-butanol and longer), can be taken to indicate afree energy change upon transfer to solvent that scales linearly with eithervolume or surface area. Historically, surface area has been chosen, under theassumption that interactions with the solvent take place at the surface ofthe molecule in question [53, 54]. Then the free energy difference between anamino acid in water and in a solvent with some osmolyte concentration is,for a given configuration, given in terms of the solvent accessible surface area(SASA) of that configuration by the phenomenological expression: \u2206G =10\u03b3 \u00b7SASA+c, where \u03b3 is obtained from, eg, a tri-peptide experiment [55]. Thisapproximation is severe, and neglects aspects such as the length of the vander Waals interaction[56] and the temperature dependence of the free energy,despite their known importance. The desolvation barrier is another effect ofimportance which SASA approaches cannot capture[57]. The surface areaapproach can be expanded such that\u2206Gnonpolar =\u2211i\u03b3iAi + b (1.9)where \u03b3i and Ai are different regions of the protein. Distinct coefficients caneven be assigned to each residue type. These modifications do not addresseither the temperature dependence or the lack of a desolvation barrier.In addition to the Generalized Born\/Surface Area (GB\/SA) approach,there are a number of other approaches which use the general scheme ofexpressing the transfer free energy as a sum of non-polar and dielectricconbributions. Some attempt more efficient calculation of the dielectriccomponent[58], while others tweak the nonpolar contribution in ways suchas adding a term proportional to the volume of the biomolecule[59]. Recentsimulation studies have found significant volume contributions to transferfree energies, however [35]. In these studies, model solvents with no enthalpicinteraction (hard sphere solvents) still showed significant transfer free ener-gies, due solely to excluded volume. Volume corrections to the surface areamodel, computed by scaled particle theory or RISM approaches, have beeninvestigated by several authors [60\u201363]. As well, Baker and colleagues havefound that the inclusion of volume terms (computed by scaled particle the-ory) and dispersion integral terms (computed by Weeks-Chandler-Andersentheory) were essential for an accurate implicit solvent description of atomic-scale nonpolar forces [64].While the combination of generalized Born electrostatics and surfacearea nonpolar contributions (GB\/SA) is perhaps the most popular choiceof implicit solvent, others have been proposed as well. Kovalenko andcolleagues[65] have developed an approach based on solving the Ornstein-Zernike (O-Z) equation within the reference site interaction model (RISM)11proposed by Chandler[66]. The O-Z equation describes density correlationsin a fluid, through the equationh(r12) = c(r12) + \u03c1\u222bd3r3 c(r13)h(r23) (1.10)where h(r) = g(r) \u2212 1, the so-called total correlation function, and c(r) isthe direct correlation function. The O-Z equation defines an expansion incorrelation functions between the model\u2019s reference sites, which requires aclosure relation to then solve[67]. In the RISM-3D approach this closurerelation is taken to beg(r) = exp(\u03c7(r)) for \u03c7(r) \u2264 0 (1.11)g(r) = 1 + \u03c7(r) for \u03c7(r) > 0 (1.12)\u03c7(r) = \u2212\u03b2u(r) + h(r)\u2212 c(r) (1.13)where u(r) is the interaction potential between sites.In principle this approach can be expanded to arbitrary accuracy, but inpractice only a few paths can be taken from any given reference site beforethe computational cost becomes prohibitive. The speed is further improvedby only updating the solvent density every few simulation steps, and by usingthe previous value at each point as an initial guess. This method requiresa classical interaction potential between the protein and the solvent andbetween the solvent molecules themselves, and is limited by the accuracy ofthis potential.Another implicit solvent approach that differs fundamentally from GB\/SAis the ABSINTH model proposed by Vitalis and Pappu[68]. Here the trans-fer free energy is not broken up into dielectric and nonpolar components, butrather into a direct mean field interaction term and a screening term. Thisnot only accounts for the solvent-protein interactions, but also the screeningof the protein-protein interactions. The protein is broken up into solvation12groups, the solvation free energy of which is expressed as\u2206Gtotalsolv =NSG\u2211i=1[ni\u2211k=1\u03bbik \u00b7 vik]\u00b7\u2206Gisolv (1.14)Here \u2206Gisolv is the experimental solvation free energy of a particular group ofatoms (based on model compounds). The vik are solvation states of the groupof atoms measured by the available volume surrounding the atoms, ratherthan the surface area, and \u03bbik are a set of weights to improve the accuracy ofthe model. The available volume here is measured by considering a sphere ofsome radius around the atom, and subtracting from that sphere the volumeof all other protein atoms which overlap. The remaining volume is thusavailable to solvent.Some of the molecular dynamics approaches discussed above implicitlyuse a solvent without an implicit solvent term appearing in the potentialfunction. As discussed above, Go\u00af models and associative memory hamilto-nians, for example, use the known native structures of a protein to define aset of interactions between atoms or residues. Since these structures are ob-served in the presence of solvent, solvent effects are captured to some extentin these model interactions.1.4 Classical Density Functional TheoryDensity functional theory (DFT) is best known in the context of quan-tum DFT[69], where it is widely used in computational studies of electronicand atomic structure, vibrational spectra, magnetic resonance, and reactiondynamics[70, 71]. The method has also been applied to classical densityfields to compute correlation functions and dynamics in liquids[72\u201377]. Infact the essence of classical density functional theory (cDFT) actually pre-dates the formal development of DFT[78]. Since then, cDFT has been widelyused to model liquids around nanostructures[79, 80] and has also been ap-plied to biomolecules[81, 82]. Density functional theories of inter-residuecontact probability developed by Plotkin and Onuchic have elucidated the13effects of energetic and entropic heterogeneity on protein folding free energybarriers [83, 84], while Wolynes has used DFT in fundamental studies of glassphysics and the glass transition [85\u201391], and also in protein folding[92\u201394].Density functional theory relies on the following insight: if the free energyof a system is expressed as a functional of the single particle density, thesingle particle density that minimizes the functional will be the true singleparticle density, and the resulting value of the functional will be the true freeenergy. This is not obvious; the full expression for the free energy includesprobabilities over N positions and momenta, where N is the number ofparticles in the system. Nonetheless a functional of the single particle densitycan be used to determine the real free energy\u2013a proof of this is shown inAppendix A. The form of the density functional is taken to beG =\u222bd3r kBT [\u03c6(r) ln(\u03b4V \u03c6(r))\u2212 \u03c6(r)] + V(r)\u03c6(r)+ \u03a6[\u03c6] (1.15)where G is the free energy of the system, \u03c6 the position dependent densityof the solvent or cosolute, \u03b4V a volume element, V the external potential,and \u03a6[\u03c6] the free energy functional arising from inter-particle interactionterms. The equilibrium free energy and density is found from equation 1.15through functional differentiation:\u03b4\u03b4\u03c6[G\u2212 \u00b5\u222bd3r \u03c6(r)]= 0 (1.16)where \u00b5, the Lagrange multiplier associated with the condition that particlenumber is conserved, is the chemical potential. In the classical DFT litera-ture G \u2212 \u03a6 is known as the ideal free energy, which is the same in each ofthe cDFT approaches we discuss below, while \u03a6 is known as the excess freeenergy. In Appendix A we review the proof that G is a unique function of\u03c6, the single particle density, and hence the function \u03c6 that minimizes G isthe equilibrium density.In traditional approaches to classical DFT[95] \u03a6 is broken up into twoparts: a hard-sphere term and a term arising from any attractive particle-14particle interactions\u2013\u03a6 = \u03a6att + \u03a6hs. The attractive term is taken to be:\u03a6att =\u222bd3r d3r\u2032 Uatt(r\u2212 r\u2032)\u03c6(r)\u03c6(r\u2032) (1.17)where Uatt(r \u2212 r\u2032) is the attractive part of the solvent-solvent potential.This form assumes a uniform fluid; that is, the solvent-solvent correlationfunction g(r, r\u2032) = 1.The hard-sphere term in the solvent-solvent interaction functional arisespurely from entropic terms and does not have an analytical form. Onecommon approach is to take a weighted density average (WDA), so that thehard-sphere term is written as[95]\u03a6hs =\u222bd3r \u03c6\u00af\u03c3(r)F(\u03c6\u00af\u03c4 (r)) (1.18)where \u03c6\u00af\u03c3 and \u03c6\u00af\u03c4 are weighted averages of \u03c6 around position r\u2013e.g. \u03c6\u00af\u03c3 =\u222bd3r\u2032 \u03c3(r \u2212 r\u2032)\u03c6(r\u2032), and \u03c6\u00af\u03c4 = \u222b d3r\u2032 \u03c4(r \u2212 r\u2032)\u03c6(r\u2032). F is an arbitraryfunction and \u03c3(r\u2212 r\u2032) and \u03c3(r\u2212 r\u2032) are local weighting functions, which canhave any form but need to go to zero for large r \u2212 r\u2032 in order to be useful.F can be assigned by requiring that the equation of state for a uniformsystem in the absence of external potential (such that \u03c6(r) = \u03c1, the bulkdensity) reproduce some known equation of state. For example, to recoverthe Carnahan-Starling equation of state,P = NkBT\/V + kBT1 + y + y\u2212y3(1\u2212 y)3 (1.19)where y = pi\u03c1d3\/6 and d the hard sphere diameter, F must beF(\u03c1) = kBT y(4\u2212 3y)(1\u2212 y)2 (1.20)so that the pressure from the density functional with a uniform density,P =\u2202G[\u03c1]\u2202V(1.21)15matches the Carnahan-Starling pressure.Similarly the forms of the weighting functions \u03c3(r \u2212 r\u2032) and \u03c4(r \u2212 r\u2032)are determined by comparison with other systems. For example, in onedimension the weighting functions\u03c3(r) =12\u03b4(d2\u2212 |r|)(1.22)\u03c4(r) =1d\u0398(d2\u2212 |r|)(1.23)where \u0398 is the Heaviside function, reproduces the analytic result for onedimensional hard rods of length d. This then can be generalized to a threedimensional system, in particular since many systems of interest to earlycDFT studies are confined and thus behave as lower dimensional systems oncertain length scales[80].The weighted density average approach can be extended to include gra-dients of the density\u2013terms such as 1\/2\u222bd3r k(\u2207\u03c6)2, with k a constant.Oxtaby and colleagues, for example, use such terms to apply classical DFTto crystal growth and nucleation problems[96].Despite the fact that the correlation function g(r, r\u2032) was assumed tobe uniform in writing \u03a6att, WDA models do not assume a uniform den-sity. Rather the correlation function is implicit in the form of the weightedaverages. Actually obtaining this correlation function, however, typicallyinvolves first solving for the density and thus is non-trivial to obtain fromthe density functional. In this approach the total correlation function is aprediction of the theory rather than an input.In general these approaches favour accuracy over speed. By appropri-ately modelling the solvent-solvent interaction terms and iterating a numberof times to find the minimum free energy, cDFT can capture realistic cor-relation functions and energies, but at a computational cost high enoughto be unfeasible for implicit solvent applications in molecular dynamics inlarge systems.Kinjo and Takada[81] have applied cDFT to protein systems. Theirapproach treats both the solvent and the protein with a density field. This16approach allows them to study crowding effects in a general way, but treatsthe protein in an extremely simplified manner in order to reduce it to amanageable density field.Borgis and colleagues have also applied cDFT to proteins[82], and solva-tion in particular. Their approach differs somewhat from more traditionalcDFT schemes. Rather than break the solvent-solvent interaction potentialup into attractive and hard sphere components, Borgis writes the interactionfunctional \u03a6 as[97, 98]\u03a6[\u03c6] =\u222b \u222bd3rd3r\u2032\u03c6(r)\u03c6(r\u2032)U(r\u2212 r\u2032)g(r, r\u2032) (1.24)where U is the full solvent-solvent interaction potential and g is the corre-lation function, which is taken from experimental structure factors. Thushere, rather than the solvent-solvent correlation function being a predictionof the theory, it is an input. This approach makes two implicit assumptions:first, that the correlation function is equal to the direct correlation functionover the interaction potential, and second, that the two-body and higherentropic terms are zero. To see the first assumption, we note that the directcorrelation function can be found from the free energy functional by\u03b42GkBT\u03b4\u03c6(r)\u03b4\u03c6(r\u2032)= c(r, r\u2032) (1.25)which, for the interaction functional 1.24, gives g(r, r\u2032) \u00b7 \u03b2U(r, r\u2032) = c(r, r\u2032).To see the second assumption we note that the interaction term is entirelyenthalpic; \u2202G\/\u2202T generates only the ideal gas term in this model. Putanother way, in the hard sphere limit, in which U(r) = 0 for any region inwhich the correlation function is non-zero, the Borgis interaction functionalis identically zero and the model reduces to an ideal gas.One feature both the Borgis approach and the traditional cDFT ap-proaches share is the need to iterate over the equations resulting from ap-plying equation 1.16 to the respective models to obtain a density whichconverges to the final self-consistent solution. This iterative nature of thesolution creates a cost penalty to applying these methods. In this thesis17we seek a model which avoids iterative solutions in order to develop analgorithm that can be applied many millions of times during a moleculardynamics simulation. Thus, while other approaches start from a relativelyaccurate functional and seek to simplify it, we will start from the most ba-sic functional and seek to add as little as possible to make it sufficientlyaccurate.1.5 Molecular DynamicsOne of the principal ways of studying proteins computationally is molecu-lar dynamics (MD)[99]. As in much of condensed matter, the underlyingphysics of the protein molecule is in principle known: a collection of nucleiand electrons which interact via the electromagnetic force. But perform-ing explicit quantum mechanical calculations on many thousands of atomsat 300K in many configurations is well beyond even the most powerful ofcomputers, and will likely remain so for some time. Thus a series of approx-imations are made to reduce the problem to something more manageable.The most basic approximation is that the atoms of the protein are treated asclassical particles. These particles interact with each other through a seriesof potentials. In the so-called All Atom approach, each atom of the proteinis treated as a point particle, and the atoms interact through potentials suchas bond potential, angle potentials, dihedral potential, Coulomb potential,and van der Waals potential; i.e. the total potential energy U of the systemis given byU =\u2211ijV bondij + VCoulombij + VvdWij (1.26)+\u2211ijkV angleijk (1.27)+\u2211ijklV dihedralijkl (1.28)where each term will be discussed below. Once these terms have been de-termined, the basic approach of a molecular dynamics program is straight-18forward: at a given time, for each particle in the system, compute the sumof the forces on that particle,Fi = \u2212\u2207iU (1.29)then evolve the system using Newton\u2019s second law F = ma and a discretetimestep \u2206t. This is repeated until a simulation of sufficient length to samplethe properties of interest has been obtained. The difficulty in practice is the\u201csufficient length\u201d; a typical value of \u2206t for an all-atom simulation is 2 fs,while the folding time of a protein might be on the order of 10 s[100]. Thuseven in this classical approximation many phenomena are out of reach. Thetimestep \u2206t is determined by requiring a certain level of accuracy in thesimulation. Discretizing the integration of Newton\u2019s second law introduceserror in the system. Conceptually this arises from the possibility that aparticle will move through a region in which the potential changes so rapidlythat the integration step cannot keep up. This results in a certain amount of\u201cshadow work\u201d done by the integrator on the system[101]. The timestep isthus a compromise between choosing a large value to speed up the simulationwhile keeping it small enough to maintain stability. In practice, 2 fs is oftenused as a standard value, with the condition that the bonded interactionsinvolving hydrogens are rigidly constrained (or a so-called \u201cunited atom\u201dforcefield is used to eliminate the hydrogen atoms).The bond potential Vbond is often assumed to be harmonic;Vbond(r1 \u2212 r2) = 12k(|r1 \u2212 r2| \u2212 d0)2 (1.30)There are other forms, though; the Morse potential is one such alternative[102].The bond can also be rigidly constrained. The angle potentials also take aharmonic form,Vangle(\u03b8123) =12k(\u03b8123 \u2212 \u03b80)2 (1.31)In this case there is, in principle, a concern that the form of the potentialis not periodic in \u03b8; in practice this concern is dealt with by ensuring thatk has a value such that \u03b8123 \u2212 \u03b80 is never so large as to be an issue. The19dihedral potential (see figure 1.2) cannot in general be harmonic, becauseit is much softer than the angle potential and there are many situations inwhich dihedral angles vary through the full 2pi. Further there may be severallocal minima. An example of this is the cis and trans configurations of smallmolecules. Common forms for the dihedral angle thus includeVdihedral(\u03b81234) =k(1\u2212 cos(n\u03b81234 \u2212 \u03b80)) (1.32)Vdihedral(\u03b81234) =5\u2211n=1Cn(cos\u03b8)n (1.33)There are two types of dihedral angles considered. Proper dihedrals occurwhen four atoms are bonded in a line, as in figure 1.2 panel A. The dihedralangle is then defined as the angle between the plane formed by atoms i,j, and k and the plane formed by atoms j, k, and l. Improper dihedralspertain to three atoms bonded to a central atom, as in figure 1.2 panel B.Again, the dihedral angle is defined as the angle between the plane formedby atoms i, j, and k, and the plane formed by atoms i, k, and l, but here theinterpretation of this angle is somewhat different. While proper dihedralsare used to specify cis and trans configurations, improper dihedrals are usedto stiffen groups of planar atoms.The form of the force-field potential is chosen to account for the fact thatinteractions between atoms can be strongly non-two-body. Thus the angleand dihedral terms attempt to capture the physics of bonded interactions,which tend towards particular geometric arrangements.The van der Waals interaction and the Coulomb interaction are termed\u201cnon-bonded\u201d interactions. They are present between all pairs of atoms thatare in different molecules or in the same molecule but separated by at leasta certain number of bonds (typically three). In most cases they have thefamiliar forms:VvdW(r) = 4\u000f((\u03c3r)12 \u2212 (\u03c3r)6)(1.34)VCoulomb(r) = kqq1q2r(1.35)20Figure 1.2: Diagram of dihedral angles. Panel A shows an example of aproper dihedral angle, while panel B illustrates an improper dihedral.The van der Waals interaction can, however, be modelled by a 10-12 po-tential (which may be more appropriate for hydrogen bonding) rather thana 6-12 potential, and some have argued the Buckingham potential is morephysically motivated as it attempts to find a form for the repulsive interac-tion based on observed virial coefficients[103].This thesis will discuss the derivation of a new implicit solvent model. Toput this derivation into context, it is useful to summarize how the parametersin molecular dynamics forcefields are obtained in the first place.The parameters in equations 1.30-1.35 need to be determined, and thereare a number of ways of doing so. Each of them involves attempting to repro-duce some aspect of the \u201creal\u201d system. The essential problem, as mentionedabove, is that molecules are quantum mechanical entities and we would liketo simulate them as classical particles. The CHARMM forcefield[104], forexample, uses quantum chemical calculations to generate parameters. Theamino acids are broken down into small molecules amenable to a Hartree-Fock computation. These molecules are then placed in the vicinity of a watermolecule. The water molecule is placed at a number of different positions toobtain the equilibrium distance to various atoms in the model compound,and the interaction energy at that equilibrium distance; this process is illus-trated in figure 1.3. The van der Waals and Coulomb parameters are thenset to reproduce these interaction energies and equilibrium distances. Simi-larly, the bonded parameters are found by varying the distances, angles, anddihedrals of each set of atoms, and finding minimum values and the second21Figure 1.3: Obtaining the parameters in the CHARMM method-ology. Panel A illustrates a model compound (in this case imidazole)and a nearby water. Panel B shows the interaction energy, calculated inGaussian[105], for various water-nitrogen distances. The van der Waals andCoulomb parameters are then picked to reproduce the minimum interactionenergy and the equilibrium distance.derivatives of the energy with respect to separation. The bond parametersare set to reproduce the harmonic well each set of atoms rests in.The OPLS forcefield, on the other hand, compares thermodynamic prop-erties of Monte Carlo simulations of pure liquids of small molecules with ex-perimentally measured bulk properties such as density and heat of vaporization[106].In each of these forcefields the results for small model compounds are thencombined into a force-field for the entire protein.AMBER is similar to OPLS in that it uses experimental results to de-termine the force-field parameters, but fits a somewhat different set of prop-erties, such as observed normal modes and vibrational spectra[107].The all-atom approach is the one most relevant to this thesis, but it isonly one of a number of approaches to molecular dynamics. There are anumber of coarse-graining methods which can be employed. The simplestare United Atom (UA) forcefields[108]. In these the only coarse-grainingis to remove most of the hydrogen atoms, incorporating their effects intomodified parameters of the heavy atoms the hydrogens were bonded to\u2013e.g.22a methyl group would become a single atom labeled CH3 with an atomicmass of 15 and a radius larger than that of a bare carbon atom.Coarse graining of atoms can be taken further, subsuming more andmore atoms into single particles. Entire side-chains can be approximatedas a single particle, or, in the most coarse-grained approached, each residuecan be modelled as a single bead[109]. In each of these cases the trade offis for increased simulation speed at the cost of some accuracy.Coarse-grained approaches require force-fields to be parameterized, andas with all-atom, this can be done in a variety of ways. Typically the param-eters are not calculated ab initio, but are tuned to reproduce some aspectof the proteins to be studied. The widely used Go\u00af model generates a force-field by starting with the experimentally determined folded structure of theprotein, then setting attractive interactions between all pairs of particlesin contact in the folded structure (with a equilibrium distance fixed to theobserved contact distance) and repulsive interactions between all pairs ofparticles not in contact[110]. This type of model can be useful for manystudies, but obviously cannot be used for things such as structure predic-tion or investigating intrinsically disordered proteins (proteins that do notadopt a well-defined structure).More versatile approaches include Associated Memory Hamiltonians.Here, rather than just one folded structure, a database of proteins is used.Each segment of the protein is matched to segments of other proteins thathave the similar sequences[111]. The potential energy function for a givenprotein is then constructed based on an appropriately weighted average ofthe structures from these similar sequences. In this approach the structureof the protein of interest does not need to be known and intrinsically disor-dered proteins can be examined. It shares with the Go\u00af model an essentiallyphenomenological character though.1.6 Computational MethodsAs mentioned above, simulations, while providing a level of detail not pos-sible in experiments, are limited in time scale. A protein and accompanying23water molecules, simulated on a high performance cluster of CPUs, mightrun a nanosecond per hour of wallclock time[112]. Thus simulations of morethan a few hundred nanoseconds become prohibitive. The D.E. Shaw com-pany, using a dedicated hardware system and custom software, can simulateon the order of a millisecond[113]. Protein dynamics, however, can occuron the time scales of seconds or longer. Thus while experimental methodsdetermine free energies by looking at populations, computational methodsrequire other approaches.One of the most relevant questions we can ask of a system is, What isthe difference in free energy between two states? Because we cannot easilysimulate the systems of interest to biophysics for long enough times to ade-quately sample each state of interest, we require non-equilibrium techniquesto find these differences in free energy. Of particular application to thisthesis are thermodynamic integration[114, 115], Bennet acceptance ratio(BAR)[116], the Jarzynski equality[117], and weighted histogram analysismethod (WHAM).Thermodynamic Integration (TI)[114, 115] makes use of the followingrelation from statistical mechanics: the difference in free energy \u2206G betweenHamiltonians HA and HB is given by\u2206G =\u222b 10d\u03bb\u2329dUd\u03bb\u232a(1.36)where the Hamiltonian is parameterized to be a continuous function of \u03bbsuch that H(\u03bb = 0) = HA and H(\u03bb = 1) = HB, \u2206G is the change in freeenergy, \u3008...\u3009 indicates the thermal average, and U is the potential energy ofthe system. A variety of terms in the forcefield can be set to change with\u03bb: molecules can be coupled or decoupled from the surrounding solventmolecules, bonds can be formed or broken, and atoms can even be smoothlymoved from one element to another in an unphysical process that nonethelessgives meaningful end points. The simulation (or simulations) is then set toobtain sufficient sampling at each value of \u03bb to evaluate the integral inequation 1.36.The Bennet Acceptance Ratio[116] also calculates free energy differences24relative to changes in the forcefield parameters. This approach starts withthe following theorem: for any f(x) such that f(x)\/f(\u2212x) = e\u2212x, the freeenergy difference between two systems with Hamiltonians HA and HB ise\u2212\u03b2(\u2206G\u2212C) =\u3008f(\u03b2(UB \u2212 UA \u2212 C))\u3009A\u3008f(\u03b2(UA \u2212 UB + C))\u3009B (1.37)where \u2206G is the free energy difference between systems with HamiltoniansHA and HB, UA and UB are the potential energies of the HA and HBforcefields respectively, C is an arbitrary constant, and \u3008...\u3009A and \u3008...\u3009B arethe thermal averages in the A and B ensemble respectively. Equation 1.37is typically evaluated by running a simulation with Hamiltonian HA anda simulation with Hamiltonian HB. Then in simulation A the quantityf(\u03b2(UB\u2212UA\u2212C)) can be computed at each time step and the average overthe simulation can be taken to find \u3008f(\u03b2(UB\u2212UA\u2212C))\u3009A. The correspondingquantities for simulation B can be calculated in the same way, and fromthis the free energy difference is calculated. Bennet showed that the errorin the estimation of \u2206G for a given finite sampling is minimized by settingf(x) = (1 + ex)\u22121 and C \u2248 \u2206G. While we obviously don\u2019t know \u2206G apriori, this approach allows an iterative scheme to obtain a self-consistentresult.Strictly speaking, equation 1.37 requires that systems A and B occupythe same phase space. This is rarely true in cases of interest, so in practicethe full change one wishes to examine is parameterized and broken up intosmall segments. The free energy to move from state i to state i + 1 iscalculated along the path and summed to arrive at the total change in freeenergy. In this way the practical implementation of BAR is very similar tothat of TI.The Jarzynski equality[117] allows the derivation of the change in freeenergy between two states from many independent simulations determiningthe work done to move the system between these states. Jarzynski showedthat\u3008e\u2212\u03b2W \u3009 = e\u2212\u03b2\u2206G (1.38)25where W is the work done in a given simulation, \u3008...\u3009 is the average oversimulations, and \u2206G is the difference in free energies between states. Tomake use of the Jarzynski equality one performs many simulations in whichthe system is driven from state A to state B, and the work done in doing sois calculated. One factor complicating this analysis is finding an unbiasedestimator for \u3008e\u2212\u03b2W \u3009. In particular, given some finite number of simulationsN , each of gives rise to a work Wi, the estimatorN\u2211i=1e\u2212\u03b2Wi (1.39)is biased. Unbiased estimators of the change in free energy using the Jarzyn-ski equality have been developed though[118] and the Jarzynski equality isa useful tool in computational biophysics.The weighted histogram analysis method (WHAM)[119] relies on a tech-nique for estimating the thermal average of an observable in the absence ofbias from measurements in the presence of bias. Given Nsims simulationswhich have different biasing potentials Ui(z) along some reaction coordinatez, the probability P (z) of the system being in state z can be calculated bysolving self-consistently the following:P (z) =\u2211Nsimsi ni(z)\u2211Nsimsi Ni exp (\u03b2 [Fi \u2212 Ui(z)])(1.40)Fi = \u2212kBT ln(\u2211binsP (z) exp [\u2212\u03b2Ui(z)])(1.41)where ni(z) is the number of counts in the bin centred on z, Ni the totalnumber of counts for simulation i, and Fi an energy shift introduced tooptimize the estimate of P (z).The practical implementation of WHAM involves performing many sim-ulations with different biasing potentials that result in a range of valuesof the reaction coordinate (e.g. the distance between two monomers for astudy of binding free energy). The simulations need to be spaced closelyenough along the reaction coordinate that the states sampled in different26simulations overlap. Then WHAM can be used to determine the free energyof moving along that reaction coordinate from the knowledge of P .To summarize these methods: Thermodynamic Integration and BennetAcceptance Ratio are useful in calculating the free energy difference betweentwo states with different forcefield, such as molecules that do not interactwith their surroundings in one state but do in the other. Jarzynski allowsone to calculate the free energy difference between two states when it ispossible to use an external pulling force to move the system from one stateto the other, and when this can be done many times to establish an average.WHAM can be used when a path in some coordinate can be defined betweenthe two systems, and simulations at various points along that path can beperformed.It is important to note here that all these methods calculate the differencein free energy between two systems. The absolute free energy of a system isa quantity we cannot calculate, nor would it be of particular interest if wecould. Only relative free energies matter.1.7 Aims of this ThesisThis thesis investigates protein-solvent interactions using classical densityfunctional theory. In Chapter 2 we examine a variety of experimental dataon the transfer free energy to move a protein into various solutions, and inparticular the difference in transfer free energy between moving the proteinfrom vacuum to pure water and moving the protein from vacuum to waterplus a cosolute. We introduce a new way of looking at the uncertainty inmeasuring the enthalpy and entropy of such transfers and ascertain whetherentropy-enthalpy compensation is a real effect or an experimental artifact.In chapter 4 we introduce classical density functional theory in the contextof transfer free energies and prove that the free energy of a bath of particlesis independent of external potential under certain conditions. In chapter3 we apply cDFT to cosulutes and show that even in a very approximateform the theory still produces useful insights. We also re-examine the issue ofentropy-enthalpy compensation through the lens of cDFT. Finally in chapter275 we develop a cDFT implicit solvent model. We finish with a discussion offuture directions for this work.28Chapter 2Experimental Analysis:Entropy-EnthalpyCompensation and CosolutesAs mentioned in Section 1.2, entropy-enthalpy compensation is a phenomenonin which changes in entropy and enthalpy upon some perturbation largelycancel, leaving a change in free energy that is much smaller in magnitudethan the changes in entropy and enthalpy. In this chapter we investigatehow broadly this effect of entropy-enthalpy compensation applies to macro-molecular systems, by analyzing the experimentally-derived enthalpy andentropy of transferring two-state proteins from water, perhaps with buffersand at some pH which need not be 7, into the same solution but in the pres-ence of various cosolutes. These cosolutes can be osmolytes, denaturants,crowders, or other proteins; i.e. we place no restriction on the size or on howrelatively favorable or unfavorable the interactions are with the protein[35].While the rest of the thesis examines protein-solvent interactions usingtheoretical and computational tools, in this chapter we examine experimen-tal data on protein-solvent interactions. In addition to addressing entropy-enthalpy compensation, a topic of interest in and of itself, our analysis inthis chapter will motivate our later development of an implicit solvent the-ory that accounts for entropy as well as enthalpy. Many presently available29implicit solvent theories typically do not account for entropy, and are purelyenthalpic. We will see in this chapter that entropy and enthalpy enter thetransfer free energy on equal footing, and must both be accounted for tocreate an accurate theory of implicit solvation. We will also return to thedata in this chapter in Section 3.5 to show how it can inform our classicalDFT approach to solvation in a direct way.In what follows, we begin by introducing various thermodynamic equa-tions that define the two-state model in Section 2.1. In Section 2.2.1 weintroduce a Monte Carlo procedure for estimating the experimental uncer-tainty of thermodynamic quantities obtained from calorimetry assays. InSection 2.2.2 we show that entropy-enthalpy compensation occurs for thetransfer of a diverse set of two-state proteins to various solvents. This maybe the most general class of systems that have been observed to obey com-pensation. While it may be intuitive that a given protein and solvent se-ries may compensate, it is not obvious that there would exist compensationacross both solvents and proteins. For example, the excluded volume compo-nent of transfer is generally non-compensated and different across protein-solvent systems. We use the method derived in Section 2.2.1 to confirmthat entropy-enthalpy compensation is a significant effect. In Section 2.2.2we plot the experimental data at lab temperature, which exhibit definitiveentropy-enthalpy compensation across a diverse set of proteins and coso-lutes. In this section we emphasize the importance of accounting for the (of-ten neglected) concentration dependence of the heat capacity change uponunfolding. Finally we conclude in Section 2.3.2.1 Methods and Theory2.1.1 Thermodynamic Equations for Protein UnfoldingTwo-state models in protein folding have a long and rich history, and haveempirical validity for many proteins [120\u2013122]; various aspects of two-statefolding, including applications to protein denaturation, protein stability, andthe prediction of so-called m-values, are described elsewhere [25, 121\u2013132].30Here, we adopt the two-state model for a set of proteins that either havebeen shown previously to satisfy the van\u2019t Hoff two-state criterion[132] orthat have comparably small residuals when fit to a two-state model.The changes in enthalpy \u2206H, entropy \u2206S, and free energy \u2206G uponunfolding can be obtained if the change of heat capacity upon unfolding\u2206Cp = Cpu\u2212Cpn is measured. Here Cpu and Cpn are the unfolded and nativestate heat capacities respectively, which may be temperature-dependent. Atemperature-independent unfolding heat capacity is often used as a goodapproximation [133, 134], while others have considered a linear temperature-dependence of the unfolding heat capacity [7, 135]. Here, we adopt the mostgeneral temperature dependence of the unfolding heat capacity, followingthe method used in Wintrode et. al. [136], wherein the folded heat capacityCpn is observed to obey a linear temperature-dependence, and the unfoldedheat capacity obeys a non-linear temperature-dependence determined by theheat capacities of the amino acid constituents. Specifically, the heat capacityof the unfolded state Cpu is given byCpu = (N \u2212Ngly \u2212 1)Cp(bb) + Cp(N\/C term) +N\u2211i=1Cp(Ri) . (2.1)Here N is the chain length, Ngly is the number of glycine residues in thepolypeptide chain, Cp(bb) is the heat capacity of the peptide backbone,Cp(N\/C term) is the heat capacity of the N- and C- termini, and Cp(Ri)is the heat capacity of the side chain corresponding to the ith amino acid(glycine is included in this sum). Values for Cp(bb), Cp(N\/C term), andCp(Ri) have been obtained by Makhatadze and Privalov [137] for temper-atures of 5, 25, 50, 75, 100, and 125\u25e6 C. For the proteins and cosolutes weconsider in Section 2.2.1, we use the values in reference ([137]) to interpolateCpu(T ) from 5 to 125\u25e6 C with a cubic spline.With \u2206Cp(T ) determined numerically, \u2206H, \u2206S, and \u2206G can be calcu-31lated from\u2206H = \u2206Hf +\u222b TTf\u2206Cp(T\u2032)dT \u2032 (2.2)\u2206S = \u2206Sf +\u222b TTf\u2206Cp(T\u2032)T \u2032dT \u2032 (2.3)\u2206G = \u2206H \u2212 T\u2206S (2.4)The reference temperature Tf is taken to be the temperature at which theunfolding free energy is zero: \u2206H(Tf ) = Tf\u2206S(Tf ). The unfolding heatcapacity is given by \u2206Cp(T ) = Cpu(T )\u2212Cnp (T ), with Cpu and Cnp describedabove. The non-linearity in the heat capacity is fixed in the model by thecomposition of the protein. The linear temperature-dependence of the na-tive heat capacity is determined empirically from the fit to the data foreach protein-cosolute system. Thus when using this model to fit data, thefree parameters in the heat capacity are the unfolding heat capacity at thetransition temperature, \u2206Cpf , and the linear coefficient to the temperaturedependence of the heat capacity of the native state, \u2206C \u2032pn. \u2206Cp(T ) is pa-rameterized as \u2206Cp = \u2206Cpf + Cpu(T ) \u2212 Cpu(Tf ) \u2212 \u2206C \u2032pn(T \u2212 Tf ), whereCpu(T ) has the non-linear T dependence in equation (2.1).In the approximation that \u2206Cp is a linear function of temperature:\u2206Cp = \u2206Cpf + \u2206C\u2032p(T \u2212 Tf ), where \u2206C \u2032p = \u2202\u2206Cp\/\u2202T , Equations (2.2)-(2.4) become\u2206H = \u2206Hf + \u2206Cpf (T \u2212 Tf ) +\u2206C \u2032p2(T \u2212 Tf )2 (2.5)\u2206S = \u2206Sf + \u2206Cpf ln(TTf)+ \u2206C \u2032p[T \u2212 Tf \u2212 Tf ln(TTf)](2.6)\u2206G = \u2206H \u2212 T\u2206S (2.7)The expressions for \u2206H, \u2206S, and \u2206G in the limiting case of a T -independentunfolding heat capacity may be obtained by setting \u2206C \u2032p = 0 in (2.5)-(2.7);32e.g. the unfolding free energy is\u2206G =(1\u2212 TTf)\u2206Hf +[T \u2212 Tf \u2212 T ln(TTf)]\u2206Cpf (2.8)In Section 2.2.1, we examine the effect these approximations have on theparameters obtained from experimental data.The probability pu for the system to be unfolded in the two-state modelispu = 1\/(1 + e\u03b2\u2206G) (2.9)with \u2206G given in equation (2.4). Equation (2.9) can be equivalently writtenas \u2206G = \u2212kBT lnKu where Ku is the folding equilibrium constant, givenin the two-state model by Ku = pu\/(1\u2212 pu).The total heat capacity in the two-state model is given (for example bydifferentiating \u3008H\u3009 = Hn(1\u2212 pu) +Hupu with respect to T ) byCp = Cpu \u2212\u2206Cp + \u2206Cp1 + e\u03b2\u2206G+\u2206H24kBT 2sech2(\u2206G2kBT)(2.10)where Cpu, \u2206Cp, \u2206G, and \u2206H are all temperature-dependent.A plot of the Gibbs free energy vs temperature obtained from empiricaldata may be fit to Equation (2.4) to obtain values of \u2206Hf , \u2206Sf , \u2206Cpf , and\u2206C \u2032p. Similarly, a plot of the fraction of unfolded protein vs temperaturemay be fit to equation (2.9), or a plot of excess heat capacity may be fit toequation (2.10) to obtain values of these parameters. In all cases, once \u2206Hf ,\u2206Sf , \u2206Cpf , and \u2206C\u2032p are obtained, equations (2.2) and (2.3) can be used toobtain \u2206H(T ) and \u2206S(T ) at various temperatures. In Section 2.2.1 we willcompare the best-fit values for the three models of \u2206Cp described above, i.e.Equations (2.2)-(2.4), (2.5)-(2.7), and the temperature-independent \u2206Cpmodel (cf. Equation (2.8)). This procedure can be performed at variouscosolute concentrations, providing experimental data is available. Then thechanges in unfolding enthalpy \u03b4\u2206H(T, c) and entropy \u03b4\u2206S(T, c) upon trans-fer from a solution of cosolute concentration 0 to one of concentration c ata given temperature T can be determined.33We also define the change in the midpoint parameters at each respectivecosolute concentration, nonzero and zero, upon transfer. :\u03b4\u2206Hf (c) \u2261 \u2206H(Tf (c), c)\u2212\u2206H(Tf (0), 0) (2.11)\u03b4\u2206Sf (c) \u2261 \u2206S(Tf (c), c)\u2212\u2206S(Tf (0), 0) (2.12)\u03b4\u2206Cpf (c) \u2261 \u2206Cp(Tf (c), c)\u2212\u2206Cp(Tf (0), 0) (2.13)In what follows we will often drop the explicit concentration dependencein writing various thermodynamic equalities when it is unambiguous.2.2 Results2.2.1 Monte Carlo Method to Determine Statistical ErrorsTo analyze the uncertainty involved in fitting the data, we perform a MonteCarlo procedure. We fit a given data set, such as Cp(T ), \u2206G(T ), or pu(T ), toequations (2.10), (2.4), or (2.9) respectively. Using the root mean square ofthe residuals for a given fit, we then generate a large number of sample datasets by drawing each point from a normal distribution with a mean equal tothe value of the best fit curve at that point and a standard deviation equalto the root mean square of the residual. Each of these generated sampledata sets is then fitted and new fit parameters are obtained, thus generatinga distribution of values for \u2206Hf , \u2206Sf , and, depending on the model, either\u2206Cp, \u2206Cpf and \u2206C\u2032p, or \u2206Cpf and \u2206C\u2032pn. We can fit the different modelsfor \u2206Cp described in Section 2.1.1 to compare the parameters extracted.The uncertainty in the thermodynamic parameters could also be obtainedby examining the covariance matrix of the fit parameters, but the MonteCarlo method we use allows us to extrapolate uncertainties to other regimes(as we will do in section 2.2.2) without truncating any moments in obtainingthe variance.As an example of fitting the stability \u2206G(T ), we have used experimentalmeasurements by Zweifel and Barrick of the thermal denaturation of notchankyrin in various concentrations of urea[138]. The best fit to the 0M data34in figure 4a for reference ([138]) (plotted as green circles in Panel A ofFigure 2.1) yields a root mean square of the residuals of 0.383 kJ\/mol, sothe square of this becomes the variance of the normal random distributioncentered around the value of the best fit curve. From this we generate 1000sample data sets, each of which is fit to either Equation (2.4), (2.7) or (2.8),depending on the model. In Table 2.1, we compare the parameters obtainedwith the three different models of \u2206Cp described above. We see that theparameters are consistent with each other, and with the tabulated value inreference ([138]). We analyze the variances and correlations of this data; thisis reported in Table 2.2. We see that all parameters are generally stronglycorrelated or strongly anti-correlated. Figures 2.1A,B show that the threemodels give similar curves even when extrapolating to high temperatures,though the variance in \u2206G at high T is significantly smaller for the T -independent \u2206Cp model than the other two models considered.We perform the same analysis to compare the three heat capacity modelsfor the stability \u2206G vs T data for hisactophilin given in reference ([6]). Thecomparison between the parameters that the three models give for fitting thesame data are given in Table 2.1. The midpoint parameters \u2206Hf and \u2206Sffor the different models again all agree within the uncertainties obtainedfrom the Monte Carlo procedure.Figure 2.1 panels C and D plot data from reference ([6]) for the stability\u2206G vs T for hisactophilin in 0 M and 1 M urea. The data for 1 M ureaincludes both hot denaturation and cold denaturation regions. Comparingpanels C and D of Figure 2.1 we can see that the model variance is muchless for the 1M urea data, in that all three models predict similar curves,presumably as a result of having a larger range of \u2206G(T ) data to fit. Thelarge uncertainty in the non-linear T dependent model for the 1 M urea datais likely caused by fitting a more flexible model to a limited set of data.Interestingly, there is a change in sign of the curvature at low tempera-tures in Panel C of Figure 2.1 for the non-linear T -dependent model. Thiseffect is caused by a change in sign of \u2206Cp at around 310 K. A similar effectis seen at high temperatures for some generated sets of data in the non-linearT -dependent model at 1 M urea (Figure 2.1D).35We have performed the same analysis on data measuring the fraction ofunfolded protein as a function of temperature. The data examined is forhistidine-containing phosphocarrier protein (HPr), from reference ([139]),and for Arc Repressor, from reference ([140]). The midpoint parameters\u2206Hf and \u2206Sf are again in agreement between the three models (see Table2.1) but the way that the different heat capacity models extrapolate quan-tities such as the stability and the enthalpy is markedly different, as it wasfor hisactophilin\u2014 see Figure 2.2 for HPr.We have performed the same analysis on heat capacity vs T data for \u03b1-lactalbumin, from reference ([7]). The midpoint parameters \u2206Hf and \u2206Sfare in agreement between the models (Table 2.1). Again, however, the waythat the models extrapolate stability and enthalpy is very different (Figure2.2).In all the cases we have examined, the values of the unfolding entropy andenthalpy at the transition midpoint are robust across all three models. Fur-ther, the value \u2206Cpf agrees within uncertainty for all the cases we looked atbetween the linear T -dependent \u2206Cp model and the non-linear T -dependent\u2206Cp model. Fitting experimental data to a temperature-independent \u2206Cpmodel will be sufficient, if only midpoint parameters are required and theaccuracy of the unfolding heat capacity \u2206Cpf is not particularly important.However Figures 2.1 and 2.2 indicate that such data is prone to significantextrapolation errors.2.2.2 Transfer Entropy and Enthalpy for Various Proteinsand SolventsTransfer Entropy and Enthalpy at the Transition MidpointThe above analysis indicates that the thermodynamic parameters obtainedby fitting experimental data are most accurately determined near the tran-sition midpoint. We thus now examine the entropy and enthalpy of transferfor various proteins at their transition midpoints, from water to water plusvarious cosolutes, cf. Equations (2.11) and (2.12).For a number of proteins, thermodynamic data exists for more than one36Figure 2.1: Stability and enthalpy as a function of temperaturefor notch ankyrin and hisactophilin. The green circles are experimentaldata from ref ([138]) for notch ankyrin (panels A and B) and ref ([6]) forhisactophilin (panels C and D). The blue lines are fits for the T -independent\u2206Cp model, the red lines are fits for the linear T -dependent model, andthe black lines are fits for the non-linear T -dependent model (cf. Section2.1.1). The solid lines arise from the best fit parameters for each model,while the dashed lines represent one standard deviation away, determinedby the Monte Carlo procedure described in Section 2.2.1. (Panel A) Stabilityfor notch ankyrin vs temperature in buffer. (Panel B) Enthalpy for notchankyrin in buffer. (Panel C) Stability for hisactophilin in buffer. (Panel D)Stability for hisactophilin in buffer with 1M urea. The insets in Panels A, C,and D show the correlation of the midpoint enthalpy and entropy, in whichthe models are represented by the same colors as above. All data in theinsets lies on top of the red scatter points; the blue and black points havebeen displaced for clarity. 1000 Monte Carlo instances have been generatedfor the inset plots. Bars indicate one standard deviation.37Figure 2.2: Analyzing unfolding fraction data and heat capacitydata. Panels A and B show fraction of unfolded population data and heatcapacity data from refs ([139]) and ([7]) respectively (green circles), alongwith best fit curves. Panels C and D show the corresponding stability asa function of temperature, and panels E and F show the correspondingenthalpy as a function of temperature. In all panels the blue lines are fitsfor the T -independent \u2206Cp model, the red lines are fits for the linear T -dependent model, and the back lines are fits for the non-linear T -dependentmodel. The solid lines arise from the best fit parameters for each modelwhile the dashed lines represent one standard deviation away, determinedby the Monte Carlo procedure described in Section 2.2.1.38Table 2.1: Thermodynamic parameters for several proteins usedin our analysis, and comparison to literature data where available: \u03b1-Lactalbumin, from heat capacity data in ref. ([7]), Arc Repressor, fromfraction of unfolded protein data in ref. ([140]), Creatine Kinase from heatcapacity data in ref. ([141]), Hisactophilin, from stability data in ref. ([6]),Histidine-containing phosphocarrier protein (HPr), from fraction of unfoldedprotein data in ref. ([139]), Notch Ankyrin, from stability data in ref.([138]), and RNase A from heat capacity data in ref. ([142]). Values obtainedfrom the three models of the temperature dependence of \u2206Cp are given, aswell as the values obtained from the appropriate reference where available.The reference value in ref. ([138]) assumed a temperature-independent \u2206Cp,the value of \u2206Hf from ref. ([6]) was obtained by integrating Cp up to Tf ,and the values from ref. ([139]) were obtained assuming a temperature-independent \u2206Cp.Protein Model \u2206Hf \u2206Sf \u2206Cpf \u2206C\u2032p \u2206C\u2032pn(kJ\/mol) (kJ\/mol\/K) (kJ\/mol\/K) (kJ\/mol\/K2) (kJ\/mol\/K2)T independent 297 \u00b1 1 0.876 \u00b1 0.003 4.38 \u00b1 0.07 \u2013 \u2013\u03b1-Lactalbumin T linear 304 \u00b1 1 0.896 \u00b1 0.004 2.88 \u00b1 0.26 -0.101 \u00b1 0.017 \u2013T non-linear 304 \u00b1 1 0.895 \u00b1 0.004 2.93 \u00b1 0.26 \u2013 0.065 \u00b1 0.017Reference Value[7] 310 \u2013 5.3 -0.05 \u2013T independent 139 \u00b1 3 0.454 \u00b1 0.011 0.536 \u00b1 0.5 \u2013 \u2013Arc Repressor T linear 118 \u00b1 7 0.385 \u00b1 0.02 1.45 \u00b1 1.5 1.18 \u00b1 0.44 \u2013T non-linear 117 \u00b1 7 0.384 \u00b1 0.024 1.49 \u00b1 1.4 \u2013 1.15 \u00b1 0.4Creatine Kinase\u2217 T independent 780 \u00b1 3 2.37 \u00b1 0.01 25 \u00b1 2 \u2013 \u2013T linear 733 \u00b1 3 2.24 \u00b1 0.01 66 \u00b1 4 -73 \u00b1 3 \u2013T independent 215 \u00b1 7.4 0.659 \u00b1 0.022 6.15 \u00b1 0.75 \u2013 \u2013Hisactophilin T linear 218 \u00b1 5.9 0.669 \u00b1 0.018 12.7 \u00b1 1.7 -0.610 \u00b1 0.17 \u2013T non-linear 218 \u00b1 17 0.667 \u00b1 0.051 12.8 \u00b1 3.3 \u2013 0.784 \u00b1 0.28Reference Value[6] 226 \u2013 \u2013 \u2013 \u2013T independent 312 \u00b1 2 0.929 \u00b1 0.006 5.27 \u00b1 0.7 \u2013 \u2013HPr T linear 314 \u00b1 2 0.935 \u00b1 0.006 4.41 \u00b1 0.85 0.359 \u00b1 0.150 \u2013T non-linear 314 \u00b1 2 0.935 \u00b1 0.006 4.44 \u00b1 0.80 \u2013 0.367 \u00b1 0.14Reference Values[139] 316 0.941 6.0 \u2013T independent 593 \u00b1 9 1.86 \u00b1 0.03 15.1 \u00b1 0.5 \u2013 \u2013Notch Ankyrin T linear 602 \u00b1 30 1.89 \u00b1 0.1 16.2 \u00b1 3.5 -0.045 \u00b1 0.14 \u2013T non-linear 601 \u00b1 33 1.89 \u00b1 0.1 15.7 \u00b1 3.5 \u2013 0.114 \u00b1 0.14Reference Value[138] \u2013 \u2013 15.1 \u2013 \u2013T independent 496 \u00b1 1 1.477 \u00b1 0.002 14.3 \u00b1 0.1 \u2013 \u2013RNase A\u2217 T linear 468 \u00b1 1 1.396 \u00b1 0.003 22.6 \u00b1 0.2 0.89 \u00b1 0.02 \u2013Reference Values[142]\u2021 515 1.52 \u2013 \u2013 \u2013Reference Values[142]\u00a7 479 1.42 \u2013 \u2013 \u2013\u2013 Not applicable for the respective model.\u2217 Literature data had background heat capacity subtracted for these proteins, so thenon-linear temperature-dependent model could not be applied.\u2021 Literature values obtained from differential scanning calorimetry. \u00a7 Literature valuesobtained from spectroscopy measurements.39Table 2.2: Comparison of the variance and covariance of fits to\u2206G vs T data for Notch Ankyrin from ref. ([138]), for the three mod-els of the temperature dependence of \u2206Cp discussed. For each model, amatrix is given in which the diagonal elements are the relative deviationsfor that quantity, and the off-diagonal elements are the correlation coeffi-cients for the two quantities. Relative deviation for e.g. \u2206Hf is definedas (\u3008\u2206H2f \u3009 \u2212 \u3008\u2206Hf \u30092)1\/2\/\u3008\u2206Hf \u3009, where averages are over the Monte Carlogenerated data. In all models the entropy and enthalpy of unfolding arehighly correlated.T -independent \u2206Cp\u2206Hf \u2206Sf \u2206Cp\u2206Hf 0.016 0.9997 0.989\u2206Sf 0.017 0.989\u2206Cp 0.033T -linear \u2206Cp\u2206Hf \u2206Sf \u2206Cpf \u2206C\u2032p\u2206Hf 0.055 0.99997 0.987 -0.960\u2206Sf 0.056 0.988 -0.959\u2206Cpf 0.22 -0.991\u2206C \u2032p 3.0T -non-linear \u2206Cp\u2206Hf \u2206Sf \u2206Cpf \u2206C\u2032pn\u2206Hf 0.055 0.99997 0.987 -0.958\u2206Sf 0.056 0.988 -0.957\u2206Cpf 0.22 -0.990\u2206C \u2032pn 1.20cosolute; as well, for a number of cosolutes thermodynamic data exists formore than one protein. These commonalities and differences enable usefulcomparisons.Because \u2206Hf = Tf\u2206Sf , the folding temperature Tf is unchanged upontransfer to cosolute if \u03b4\u2206Hf\/\u2206H0f = \u03b4\u2206Sf\/\u2206S0f where \u2206H0f and \u2206S0f arethe mid-point enthalpy and entropy of unfolding at cosolute concentrationc = 0, respectively, and \u03b4\u2206Hf and \u03b4\u2206Sf are defined in Equations (2.11) and(2.12). Thus, on a plot with \u03b4\u2206Hf\/\u2206H0f and \u03b4\u2206Sf\/\u2206S0f on the ordinateand abscissa, a line of slope unity would constitute no change in folding40temperature due to the cosolute. Further, sinceT 0f + \u03b4Tf =\u2206H0f + \u03b4\u2206Hf\u2206S0f + \u03b4\u2206Sf\u2248 T 0f + T 0f(\u03b4\u2206Hf\u2206H0f\u2212 \u03b4\u2206Sf\u2206S0f)we conclude that the distance of each point from the line y = x is, to firstorder, the relative change in the folding temperature, \u03b4Tf\/T0f , as a result ofthe transfer.We may further interpret the deviation from the line y = x on a plotof \u03b4\u2206Sf\/\u2206S0 vs \u03b4\u2206Hf\/\u2206H0 as the free energy of transfer at fixed tem-perature T 0f , \u03b4\u2206G(T0f ), divided by the unfolding enthalpy \u2206H0f . That is,\u03b4\u2206Hf\/\u2206H0f \u2212 \u03b4\u2206Sf\/\u2206S0f = \u03b4\u2206G(T 0f )\/\u2206H0f .In Figures 2.4 and 2.5, we divide both \u03b4\u2206Hf\/\u2206Hf and \u03b4\u2206Sf\/\u2206Sf bythe concentration c of the cosolute, to compare cosolutes solutions havingdifferent concentrations. Thus the axes in Figures 2.4 and 2.5 can be thoughtof as a decomposition of m-values[143] into enthalpic and entropic compo-nents, each normalized by the corresponding unfolding enthalpy or entropyin the absence of cosolute.Linear regression to the data in Figures 2.4 and 2.5, when taken togethergives a slope of 0.99 \u00b1 0.04. The statistical test outlined in Krugg et. al.[26] requires that the slope of the best fit \u03b4\u2206Hf\/\u2206H0f vs. \u03b4\u2206Sf\/\u2206S0f line bemore than 2\u03c3 away from the harmonic mean of the temperatures at whichthe experiments were performed; thus for Figure 2.4 this requires that theslope be 2\u03c3 away from unity. Figure 2.4 fails this test. Nevertheless, weshow that the results in Figure 2.4 are in fact statistically significant, giventhe magnitude of the experimental errors. We now describe a treatment ofthe statistical significance of entropy-enthalpy compensation that is validwhen the slope of the \u2206H-\u2206S plot is near unity.The Monte Carlo method in Section 2.2.1 can be applied to data atvarious concentrations of cosolutes to assess the significance of the linearrelationship between enthalpy and entropy observed in Figures 2.4-2.5. Fit-ting each experimental data set to the appropriate equation as describedin Section 2.2.1 results in a set of best fit parameters, as well as a set of41residuals from the best fit. For the following protein\/cosolute systems\u2014\u03b1-lactalbumin in ethanol[7], arc repressor in KCl[140], creatine kinase inglycerol[141], hisactophilin in urea[6], histidine containing phosphocarrier inurea[139] notch ankyrin in urea[138], and RNase A in urea[142] \u2014 1000 datasets were generated as described in Section 2.2.1, for each of several concen-trations of cosolute. Thus 1000 values for \u03b4\u2206Hf and \u03b4\u2206Sf were generated.We plot the results of this procedure as scatter points in Figures 2.4-2.6.If we take the average of the extent of the scatter for these six datapoints as an estimate for the experimental uncertainty, and apply it to allother data in Figure 2.4, we can assess the significance of the apparentlinear relationship in the plot. The analysis rests on the assumption thatthe deviations from the line of slope unity, \u03b4Tf = 0, are much smaller thanthe deviations from zero of either \u03b4\u2206Hf or \u03b4\u2206Sf , i.e., that the data areessentially distributed along the diagonal. We may then consider the dataas transformed to a coordinate system that is rotated pi\/4 counterclockwise,and translated so that the origin coincides with the mean of the data. Thenthe data consists approximately of points distributed along the abscissa allhaving zero ordinate. If the variance of this data is large compared to whatwould be expected from the experimental error as derived from the aboveMonte-Carlo method, then the result of entropy-enthalpy compensation issignificant.To assess the significance we use a bootstrapping method, to avoid re-quiring an assumption as to the distribution of the points along the y = xline. From the 48 data points in Figures 2.4 and 2.5 we perform randomsampling with replacement to generate new sets of data (also with n = 48).We find the standard deviation of each of these generated sets and thusobtain a distribution in \u03c3. We obtain another distribution in \u03c3 by samplingfrom the Monte Carlo generated deviations. From the 6000 total generatedpoints (1000 for each of 6 proteins) we sample with replacement to obtainmany sets of 1000 deviations. The standard deviations of these sets formsanother distribution in \u03c3. Distributions formed with this procedure are plot-ted in figure 2.3. The overlap of these two distributions then provides thesignificance\u2013the likelihood that the scatter in the experimental data arises42from the fit uncertainty. For the data in Figures 2.4 and 2.5 we obtainp < 10\u22126. With near certainty, these results illustrate entropy-enthalpycompensation rather than experimental error.For almost all the points in Figures 2.4 and 2.5 the magnitude of theenthalpy change is larger than the magnitude of the entropy change. Noneof the systems we examined showed a destabilizing cosolute with a changein entropy of unfolding larger than the change in enthalpy of unfolding, andonly a few systems showed a stabilizing cosolute with a change in entropy ofunfolding larger than the change in enthalpy of unfolding. Thus for the mostpart enthalpy drives the change in stability, while entropy tries to catch upand partially compensates.Figure 2.5 also shows the transfer enthalpy and entropy for simulationresults by O\u2019Brien et al. on Cold shock protein and protein L [144] (openblack symbols in Figure 2.5). Here, thermodynamic parameters were ex-tracted from fits to simulated heat capacity curves, for the transfer of theabove proteins to either urea or TMAO. A surface-area based Tanford trans-fer model [145] was used to model the cosolute solution. We have not foundexperimental values for these thermodynamic parameters in the literature;the values in Figure 2.5 are thus predictions as a consequence of both the sim-ulation method for generating unfolded ensembles, and the Tanford transfermodel, which are subject to experimental test.Importance of the Cosolute- and Temperature-Dependence of theUnfolding Heat CapacityThe unfolding heat capacity \u2206Cp(T, c) is generally both temperature- andconcentration-dependent. We define the change in unfolding heat capacityupon transfer, \u03b4\u2206Cpf , in Equation (2.13). While the quantities \u03b4\u2206Hf and\u03b4\u2206Sf in Equations (2.11) and (2.12) are independent of \u03b4\u2206Cpf , since theyare always evaluated at the respective transition temperatures, the ther-modynamics for transfer at fixed temperature (e.g. lab temperature) doesdepend on \u03b4\u2206Cpf .A number of the works cited here obtained a concentration-independent43\u2206Cp however, by equating \u2206Cp to the slope of \u2206Hf vs. Tf data for variouscosolute concentrations. This assumes that \u2206Cp is constant with varyingcosolute concentration, and hence \u03b4\u2206Cpf = 0. These proteins\/cosolutesare thus indicated in Table 2.3 by \u201c0\u2020\u201d in the column for \u03b4\u2206Cpf . This as-sumption may be sufficient if \u03b4\u2206Hf , \u03b4\u2206Sf , or \u2206Cp(c = 0) is the quantityof interest, however when Equations (2.2) and (2.3) are used to evaluatethermodynamic parameters at lab temperature, this assumption producesunacceptably large errors. Examples of the change in thermodynamic valuesobtained by setting \u03b4\u2206Cpf = 0 are shown in Figure 2.6, for the transfer ofHisactophilin and RNase A to urea: neglecting \u03b4\u2206Cp changes the resultingvalue of \u03b4\u2206H by \u2248 80 kJ\/mol for Hisactophilin and \u2248 40 kJ\/mol for RNaseA. This is to be compared with the error introduced by neglecting the tem-perature dependence of \u2206Cp, which was \u2248 30 kJ\/mol for Hisactophilin and\u2248 10 kJ\/mol for RNase A.In Table 2.3 we have made a note of where the \u03b4\u2206Cp = 0 assumption hasbeen made, and we have not plotted the corresponding \u03b4\u2206Hlab and T\u03b4\u2206Slabdata in Figure 2.6. Note that the potential for large errors due to \u03b4\u2206Cp isirrelevant when comparing data at the respective folding temperatures, i.e.for the quantities in Equations (2.11) and (2.12), and Figures 2.4 and 2.5.Figure 2.7 plots the concentration-dependence of \u2206Cpf for several protein-cosolute systems, obtained by using the non-linear temperature-dependentmodel in equation 2.1. The values plotted do not change significantly ifthe linear temperature-dependent model is used (see for example Table 2.1,which shows that the values obtained from the two models are comparable).Some proteins have a \u2206Cpf showing weak concentration-dependence (e.g.Barstar in GdmHCl, Acylphosphatase in urea), while for others, \u2206Cpf showssignificant concentration-dependence (RNase A in urea, \u03b1-Lactalbumin inethanol). Furthermore, \u2206Cpf need not even be monotonic in T ; the non-monotonic behaviour exhibited by RNase A in urea is well beyond whatcan be explained by the experimental uncertainty. Factoring in the inter-model uncertainty does not change this; as panel B shows the non-monotonicbehaviour is present in both the T-independent and linear T-dependentmodels. A concentration-independent heat capacity is often obtained from44linear fits of the unfolding enthalpy vs. melting temperature for variousosmolyte concentrations. For the proteins in Figure 2.7 that have strongc-dependence, this would be a recipe prone to large errors.Transfer Entropy and Enthalpy at Lab TemperatureFor 19 proteins and cosolutes that we had investigated, the concentrationdependence of \u2206Cpf is known. For these proteins, we have obtained thetransfer enthalpy of unfolding \u03b4\u2206H(T = 25\u25e6C) and the transfer entropy ofunfolding \u03b4\u2206S(T = 25\u25e6C) at lab temperature; values are tabulated in Ta-ble 2.3. For 12 protein-cosolute systems, thermodynamic parameters at thefolding temperatures corresponding to different cosolute concentrations weretabulated in the literature. For these systems, a temperature-independent\u2206Cp model was invariably used to obtain the tabulated values. We thus hadto also assume a temperature-independent \u2206Cp model in order to extrapo-late the thermodynamic values to lab temperature. We show below howeverthat this procedure may be prone to large errors.Seven references contained plotted data, which we had fitted to obtainthermodynamic parameters. For 5 of these protein-cosolute systems, thenon-linear temperature-dependent \u2206Cp model was used to extrapolate tolab temperature. Two of these systems had baselines subtracted in thepublished data, so a linear temperature-dependent \u2206Cp model was used toextrapolate to lab temperature. All of these 7 protein-cosolute systems showscatter due to our Monte Carlo procedure that is indicated in Figure 2.6(though for \u03b1-Lactalbumin in ethanol and RNase A in urea the scatter issmall).The extent of the scatter in Figure 2.6 makes it clear that for any ofthe three methods of obtaining \u03b4\u2206H and \u03b4\u2206S (heat capacity, fraction un-folded, or unfolding free energy, with the method for each protein indicatedin Table 2.3), the uncertainties are highly correlated and can be quite large.There is very little scatter orthogonal to the lines of constant stability onthe \u03b4\u2206H-T\u03b4\u2206S plot. The scatter along the equi-stability line is significantlylarger however; for arc repressor in particular, the scatter is large enough to45render the sign of \u03b4\u2206H and T\u03b4\u2206S uncertain. The scatter in the data forRNase transfer to urea is quite small on the other hand, even though thescatter in the data for creatine kinase in glycerol, also from heat capacitymeasurements, is large. The average standard deviation along the diagonalwas 21 kJ\/mol, compared with an average of error of 0.32 kJ\/mol perpen-dicular to the diagonal, corresponding to the change in the unfolding freeenergy upon transfer \u03b4\u2206G.We apply the same procedure described in Section 2.2.2 to evaluate thesignificance of the entropy-enthalpy compensation here. In this case thereare 23 data points in Figure 2.6, and the data themselves have a samplestandard deviation of about s \u2248 65 kJ\/mol, whereas the mean Monte-Carlostandard deviation applied to each data point is about \u03c3 \u2248 19 kJ\/mol.Bootstrapping with the same procedure described in section 2.2.2 rejectsthe hypothesis that the scatter arises from the uncertainty in fitting witha significancep = 5 \u00d7 10\u22125. The result in Figure 2.6 thus also illustratesentropy-enthalpy compensation rather than experimental error.One caveat of the significance is that the experimental uncertainty onlyincluded the fit uncertainty, and not the model to model uncertainty. At labtemperature the model to model uncertainty is approximately a factor of twolarger than the fit uncertainty. To assess the significance of entropy-enthalpycompensation with model to model uncertainty factored in, we perform an-other bootstrapping procedure. This time we create a distribution in \u03c3 bysampling many sets of 6 standard deviations from the 6 deviations foundfrom the model to model uncertainty. This distribution is then comparedto the distribution arising from bootstrapping from the experimental labtemperature data. With this consideration the significance from the boot-strapping method drops to p = 0.085.4600.020.040.060.0810 20 30 40 50 60 70 80 90 100Distribution from DataDistribution from MC uncertaintiesCounts (Normalized)Standard Deviation (kJ\/mol)Figure 2.3: Distribution in \u03c3 arising from the bootstrapping method de-scribed in section 2.2.2. The overlap of the two distributions representsthe statistical significance of entropy-enthalpy compensation\u2013in this casep = 5\u00d7 10\u22125.47Table 2.3: Empirical values for thermodynamic unfolding parameters upon transfer of various proteinsto various solvents. For each protein-cosolute system, the values are listed at the concentration corresponding to(100g\/l). The literature reference from which the values were obtained, along with the corresponding figure ortable in that reference, is listed in the last column. For systems in which the parameters were obtained throughfitting a curve to the data in the reference work, the equation from this work used for the fitting is also listed belowthe table. Systems without an equation listed were those in which \u2206Hf and \u2206Sf values were directly available.Protein Cosolute pH \u2206H0f\u2217\u2217 \u2206S0f Tf \u2206Cpf\u03b4\u2206Hf\u2206H0f\u03b4\u2206Sf\u2206S0f\u03b4\u2206Cpf \u03b4Tf \u03b4\u2206Hlab T\u03b4\u2206Slab Ref\/Fig\/Tab\u00a7\u03b1-CTgen\u03b1 Trehalose 2.5 403 1.27 317 4.4 6.7e-3 9.7e-4 0\u2020 1.8 2.69\u2020 0.37\u2020 [146] Tab. 1\u03b1-lactalbumin\u2217 Ethanol 8 310 0.916 338 5.3 -8.2e-3 -5.9e-3 0.10 19 1.11 1.67 [7] Fig. 3c\u03b1-lactalbumin TMAO 7 209 0.663 315 6.5 0.27 0.25 -0.133 5.0 22.5 17.4 [147] Tab. 1Acylphosphatase Urea 5.5 351 1.06 331 6.2 -0.11 -0.10 -0.037 -3.7 -4.17 1.07 [148] Tab. 2Arc Repressor\u2217 KCl 4 141 0.463 304 1.0 0.25 0.14 5.20 29 -89.1 -102 [140] Fig. 7aBarstar GdnHCl 8 292 0.849 343 6.2 -0.43 -0.41 -1.9 -12 -16.2 -5.64 [149] Tab. 2Cro Protein Urea 6 195 0.591 330 3.8 -0.22 -0.20 0.09 -8.2 -7.47 -3.45 [150] Tab. 2Cytochrome c Sorbitol 2 226 0.740 305 5.2 0.12 0.09 -1.7 8.4 13.6 7.28 [151] Tab. 1Cytochrome c Trehalose 7 161 0.502 321 N\/A -3.4e-2 -3.7e-2 0\u2020 24 -5.38\u2020 -4.78\u2020 [146] Tab. 1Creatine Kinase\u2217 glycerol 8.05 782 2.38 329 92 2.0e-2 1.9e-2 2.46 0.32 -93.1 -88.4 [141] Fig. 4cDe Novo \u03b1 B GdnHCl 7.3 103 0.300 343 2.3 -0.53 -0.52 0.21 -7.2 0.564 4.52 [152] Tab. 2De Novo \u03b1 C GdnHCl 7.3 153 0.441 347 2.7 -0.48 -0.47 -0.07 -6.5 3.94 10.5 [152] Tab. 2Hisactophilin\u2217 Urea 5.8 215 0.658 327 6.1 -0.89 -0.85 3.63 -87 -142 -123 [6] Fig. 6bHexokinase Glucose 8 700 2.19 320 30 0.51 0.46 1.5 11 -30.0 -58.1 [153] Tab. 2HPr\u03b2 Urea 7 315 0.935 336 4.4 -0.16 -0.16 -0.0054 -27.1 36.4 39.7 [139] Fig. 2aLectin (Pea) Urea 7.2 1130 3.25 347 22 -1.6e-2 -1.1e-2 0.74 -1.8 -7.12 -3.40 [154] Tab. 2Lysozyme DMSO 2.5 535 1.58 339 7.8 -4.4e-3 -3.4e-3 0\u2020 -034 1.00\u2020 1.50\u2020 [155] Tab. 2Lysozyme Trehalose 7 397 1.20 331 N\/A -1.5e-2 -2.2e-2 0\u2020 2.4 -5.99\u2020 -7.09\u2020 [146] Tab. 1Lysozyme TMAO 6 535 1.50 357 6.8 7.5e-2 6.5e-2 0.089 3.3 12.3 5.59 [147] Tab. 1Notch Ankyrin\u2217 Urea 8 592 1.86 318 15 -0.45 -0.43 -1.26 -11 -63.2 -43.8 [138] Fig. 4bRNase A \u03b2-hydrox\u03b3 5.5 364 1.09 334 4.4 2.8e-2 2.4e-2 0.194 1.3 -3.16 -4.36 [156] Tab. 1RNase A Betaine 5.5 364 1.09 334 4.4 4.2e-2 4.1e-2 0.214 0.32 5.39 3.42 [156] Tab. 1RNase A Betaine 6.0 364 1.09 334 0 5.3e-2 5.0e-2 0\u2020 0.95 0\u2020 -0.291\u2020 [157] Figs. 2,4dRNase A Trehalose 7 385 1.20 321 4.7 1.4e-2 9.1e-3 0\u2020 1.6 5.51\u2020 3.38\u2020 [146] Tab. 1RNase A Glycine 6.0 364 1.09 334 0 -2.0e-2 -2.7e-2 0\u2020 2.4 0\u2020 -0.659\u2020 [157] Figs. 2,4dRNase A Sarcosine 6.0 364 1.09 334 0 1.1e-2 -4.0e-4 0\u2020 3.8 0\u2020 -1.20\u2020 [157] Figs. 2,4d48cont.Protein Cosolute pH \u2206H0f \u2206S0f Tf \u2206Cpf\u03b4\u2206Hf\u2206H0f\u03b4\u2206Sf\u2206S0f\u03b4\u2206Cpf \u03b4Tf \u03b4\u2206Hlab T\u03b4\u2206Slab Ref\/Fig\/Tab\u00a7RNase A TMAO 7 490 1.46 336 5.2 7.3e-2 6.2e-2 0.022 3.5 16.1 9.35 [147] Tab. 1RNase ACal GdnHCl 7 515 1.53 337 11 -0.23 -0.22 0\u2020 -4.3 -0.246\u2020 10.8\u2020 [142] Tab. 1RNase AUV GdnHCl 7 452 1.35 335 6.3 -0.16 -0.14 0\u2020 -7.8 0.44\u2020 12.3\u2020 [142] Tab. 1RNase ACal Methylurea 7 515 1.53 337 7.1 -5.3e-2 -4.5e-2 0\u2020 -2.8 -0.508\u2020 4.19\u2020 [142] Tab. 1RNase AUV Methylurea 7 452 1.35 335 4.8 -4.4e-2 -3.4e-2 0\u2020 -3.5 -0.419\u2020 4.28\u2020 [142] Tab. 1RNase ACal Dimethylurea\u03b4 7 515 1.53 337 3.7 -2.8e-2 -1.8e-2 0\u2020 -3.4 -0.08\u2020 5.04\u2020 [142] Tab. 1RNase AUV Dimethylurea\u03b4 7 452 1.35 335 1.2 -8.7e-2 3.2e-2 0\u2020 -39 0.79\u2020 5.64\u2020 [142] Tab. 1RNase ACal Ethylurea 7 515 1.53 337 3.6 -4.0e-2 -2.7e-2 0\u2020 -4.5 0.13\u2020 6.02\u2020 [142] Tab. 1RNase AUV Ethylurea 7 452 1.35 335 3.9 -4.2e-2 -2.7e-2 0\u2020 -5.2 1.05\u2020 8.25\u2020 [142] Tab. 1RNase ACal Butylurea 7 515 1.53 337 6.0 -0.21 -0.17 0\u2020 -16 4.3\u2020 23.4\u2020 [142] Tab. 1RNase AUV Butylurea 7 452 1.35 335 7.3 -0.26 -0.21 0\u2020 -21 7.4\u2020 30.3\u2020 [142] Tab. 1RNase A\u2217 Urea 7 501 1.49 336 14.6 -4.0e-2 -2.9e-2 -1.5 -3.8 43.0 46.6 [142] Fig. 1cSsh10b GdnHCl 6.8 307 0.840 365 N\/A -0.18 -0.14 0\u2020 -17 -54.4\u2020 -34.3\u2020 [158] Tab. 2Tryps inh\u000f Trehalose 7 236 0.711 332 1.1 8.8e-3 3.4e-3 0\u2020 1.8 2.08\u2020 0.74\u2020 [146] Tab. 1Ubiquitin PVP 5.4 100 0.265 377 1.5 -0.2 -0.2 -0.1 0 -26.0 -14.8 [159] Tab. 3Ubiquitin Ficoll 5.4 100 0.265 377 1.5 -0.5 -0.6 -0.5 94 -36.1 -44.5 [159] Tab. 3Ubiquitin BSA 5.4 100 0.265 377 1.5 0.5 0.5 1.6 0 -310 -295 [159] Tab. 3Ubiquitin Lysozyme 5.4 100 0.265 377 1.5 -0.1 -0.2 0.1 47 -88.1 -107 [159] Tab. 3Ubiquitin NaCl 2 203 0.617 329 3.0 0.547 0.383 0\u2020 39 111\u2020 70.4\u2020 [160] Tab. 1Ubiquitin CaCl2 2 203 0.617 329 4.6 3.21 2.67 0\u2020 48 652\u2020 490\u2020 [160] Tab. 1Ubiquitin MgCl2 2 203 0.617 329 4.4 1.87 1.49 0\u2020 50 379\u2020 273\u2020 [160] Tab. 1Ubiquitin GdmCl 2 203 0.617 329 4.7 0.248 0.201 0\u2020 13 50.3\u2020 37.0\u2020 [160] Tab. 1\u00a7Literature reference and corresponding tabulated value, or figure used to extract the values listed here.\u2217\u2217Values listed for \u2206H0f and \u2206S0f are the unfolding enthalpy and entropy at Tf , 0M cosolute concentration, and pH indicated in thecorresponding column. Throughout this table, values for enthalpy, entropy, and heat capacity are given in kJ\/mol, kJ\/mol\/K, andkJ\/mol\/K respectively.aFit to Equations 2.4 and 2.9 bFit to Equation 2.4. cFit to Equation 2.10. d Used \u2206H(Tf ) = Tf\u2206S(Tf ) to obtain unfolding entropyN\/A: \u2206Cp data not available.\u2020\u2206Cp measured for one concentration and assumed constant with respect to concentration in thesereferences. The values for \u03b4\u2206Hlab and T\u03b4\u2206Slab for these systems are thus likely inaccurate.\u03b1\u03b1-chymotrypsinogen. \u03b2Histidine-containing phosphocarrier protein. \u03b3 \u03b2-hydroxyectoine. \u03b4 N-N\u2019 Dimethylurea. \u000fTrypsin Inhibitor.Cal Measurements taken with calorimetry; UV Measurements taken with UV absorption;\u2217 Monte-Carlo error analysis is performed on this protein in Figures 2.4-2.6.492.3 Concluding RemarksIn this chapter we have observed and analyzed significant entropy-enthalpycompensation across both diverse proteins and diverse cosolute solutions, byperforming a rigorous thermodynamic analyis of calorimetric and spectro-scopic data, which included Monte-Carlo error estimates and a comparisonacross different models of the temperature-dependence of the unfolding heatcapacity. Uncertainties in enthalpy and entropy, while much larger than theuncertainty in free energetic stability, do not rule out significant entropy-enthalpy compensation as a general phenomenon in protein transfer. Theaccuracy of the temperature-dependence and concentration-dependence ofthe unfolding heat capacity is not important near the folding transition, butis important if we are interested for example in the stability at lab temper-ature.We can consider several possible scenarios for stabilizing and destabiliz-ing cosolutes. A stabilizing cosolute, for example, could act in two differentways: it could increase the change in enthalpy upon unfolding, stabilizingthe protein, while increasing the entropy of unfolding by a lesser amount.Or it could decrease the change in entropy upon unfolding, while decreasingthe enthalpy by a lesser amount. We refer to the former case as an enthalpi-cally stabilizing cosolute, and the latter case as an entropically stabilizingcosolute. Conversely a cosolute could decrease the change in enthalpy uponunfolding, destabilizing the protein, while decreasing the entropy of unfold-ing by a lesser amount; we refer to this as an enthalpically destabilizingcosolute. An entropically destabilizing cosolute would then be one that in-creases the entropy of unfolding while increasing the enthalpy of unfoldingby a lesser amount. A cosolute could also in principle be stabilizing in bothenthalpy and entropy (or destabilizing in both), which we refer to as uncom-pensated stabilization (or destabilization). These regions are illustrated infigure 2.8.Each of these regions could in principle be populated, but examiningthe data in figures 2.4 and 2.5 shows an interesting pattern. In the sys-tems we looked at, the majority of cosolutes were enthalpically stabilizing50DestabilizeStabilize10.50.5-0.5-1 1-0.5-1Hisactophilin with Urea(pH 5.8)42Barstar with GdnHCl(pH 8)53Ubiquitin with Ficoll(pH 5.4)63Ubiquitin with lysozyme(pH 5.4)63SSH10b with GdnHCl(pH 6.8)62Ubiquitin with NaCl(pH 2)64Ubiquitin with GdmHCl(pH 2)64alpha-LA with TMAO(pH 7)51Arc repressor with KCl(pH4)44Drosophila Notch ankyrin with Urea(pH 8)41RNase A with GdnHCl (Cal)(pH 7)46RNase A with GdnHCl (UV)(pH 7)46RNase A with Butylurea (Cal)(pH 7)46RNase A with Butylurea (UV)(pH 7)46Cytochrome-c with Sorbitol (pH 2)55Acylphosphatase with Urea (pH 5.5)52Cro Protein with Urea (pH 6)54De Novo alpha B with GdmHCl (pH 7.3)56De Novo alpha C with GdmHCl (pH 7.3)56Hexokinase with Glucose (pH 8)57Pea Lectin with Urea (pH 7.2)58HPr with urea(pH 7)43HfH0f cSfS0 fcFigure 2.4: Entropy-enthalpy compensation for protein unfolding transfer en-thalpy \u03b4\u2206Hf and unfolding transfer entropy \u03b4\u2206Sf , both evaluated at the folding midpointand suitably normalized as described below. The legend, listed from the upper right datapoint to the lower left data point, indicates the protein, cosolute, pH, and correspond-ing source of the experimental data. Cosolutes above and to the left of the diagonalare destabilizing as noted; cosolutes below and to the right of the diagonal are stabi-lizing. Abscissa\/ordinate are the transfer enthalpy\/entropy normalized by the unfoldingenthalpy\/entropy in the absence of solute, per 100g\/L of cosolute, i.e. \u03b4\u2206Hf\/(\u2206Hf \u00b7 c) vs\u03b4\u2206Sf\/(\u2206Sf \u00b7 c). Also plotted here are the Monte Carlo-generated scatter points for ArcRepressor in KCl (red circle), Notch Ankyrin in urea (green circle), and Hisactophilin inurea (blue circle). Bars on each of these three points show the standard deviation in thedirection of the scatter. The scatter here does not substantially reduce the significance ofthe linear compensating trend. See also Table 2.3, which gives thermodynamic parametersfor the proteins we study here.51DestabilizeStabilize0.050.0250.025-0.025-0.05 0.05-0.025-0.05RNase A with betaine(pH 5.5)60Trypsin inhibitor with trehalose(pH 7)50alpha-CTgen with trehalose(pH 2.5)50Lysozyme with trehalose(pH 7)50Cytochrome with trehalose(pH 7)50RNase A with betaine(pH 6)61RNase A with beta-hydroxyectoine(pH 5.5)60RNase A with trehalose(pH 7)50RNase A with sarcosine(pH 6)61RNase A with glycine(pH 6)61RNase A with TMAO(pH 7)51Lysozyme with TMAO(pH 6)600.0750.075Creatine Kinase with glycerol(pH 8.05)45RNase A with Methylurea (UV)(pH 7)46RNase A with N-N' Dimethylurea (UV)(pH 7)46RNase A with ethylurea (UV)(pH 7)46RNase A with ethylurea (Cal)(pH 7)46RNase A with N-N' Dimethylurea (Cal)(pH 7)46RNase A with Methylurea (Cal)(pH 7)46Lysozyme with DMSO (pH 2.5)59alpha-Lac with ethanol (pH 8)38-0.075-0.075RNase A with Urea (Cal)(pH 7)46Protein L with TMAO (sim)48Cold Shock Protein with TMAO (sim)48Cold Shock Protein with Urea (sim)48Protein L with Urea (sim)48alpha-Lac with ethanol, (pH 5.8)38HfH0f cSfS0 fcFigure 2.5: Further illustration of entropy-enthalpy compensationfor various proteins and solvents. The notation here is the same as in Figure2.4, but the scale of the plot is significantly smaller. Scatter as a result ofuncertainty for Creatine Kinase in glycerol (blue circle) is shown, along withbars to indicate the standard deviation. Scatter was calculated for RNase Ain urea (cyan circle) and \u03b1-lac in ethanol (mustard circle), but the scatteris smaller than the data point appearing on this plot. Black open symbolscorrespond to simulation data using the Tanford transfer model taken fromO\u2019Brien et. al. (ref. [144]). Legend labels are ordered from upper right datapoint to lower left data point.52-160-120-80-4040-160 -120 -80 -40 40RNase with betaine(pH 5.5)60RNase with glycine(pH 6)61RNase with sarcosine(pH 6)61RNase with beta-hydroxyectoine(pH 5.5)60Barstar with GdnHCl(pH 8)53Ubiquitin with PVP(pH 5.4)63Drosophila Notch ankyrin with Urea(pH 8)41Ubiquitin with Ficoll(pH 5.4)63Creatine Kinase with glycerol(pH 8.05)45Arc repressor with KCl(pH 4)44Ubiquitin with lysozyme(pH 5.4)63Hisactophilin with Urea, (pH 5.8)42RNase with betaine(pH 6)61Acylphosphatase with Urea (pH 5.5)52Hexokinase with Glucose (pH 8)57RNase A with Urea (pH 7)46assuming constantassuming constantalpha-Lac with ethanol, (pH 5.8)38DestabilizeStabilizewith cassuming constant with Twith cassuming constant with T8080HPr with urea(pH 7)43H (kJ\/mol)S (kJ\/mol)TFigure 2.6: Entropy-enthalpy compensation for the transfer of vari-ous proteins to various solvents is also seen by plotting \u03b4\u2206H vs. T\u03b4\u2206Sat lab temperature (25\u25e6 C). The points cluster close to the \u03b4\u2206H = T\u03b4\u2206Sline; the deviation from that line (horizontal or vertical) represents the abso-lute change in stability upon transfer at 25\u25e6 C. Points above the line corre-spond to destabilizing cosolutes, points below the line correspond to stabiliz-ing solutes. Scatter points representing the range of uncertainty in obtainingthe enthalpy and entropy are also shown, as determined by the Monte Carlomethod described in Section 2.2.1. The scatter is highly correlated with amagnitude that in some cases is large enough to change the sign of \u03b4\u2206H andT\u03b4\u2206S. The compensation is statistically significant however\u2014see Section2.2.2. In the case of \u03b1-Lactalbumin in ethanol, the scatter was smaller thanthe symbol. The cyan circle with black outline indicates the Hisactophilindata assuming \u2206Cp is independent of the concentration of urea, but hasnon-linear temperature-dependence obtained by fitting to Equation (2.4);the cyan circle with black square outline is the value obtained assuming \u2206Cpis independent of temperature, but still accounting for the concentration-dependence. These approximations both introduce significant error: \u2248 80kJ\/mol and 30 kJ\/mol respectively. Similarly, assuming a concentration-independent unfolding heat capacity introduces an error of \u2248 40 kJ\/mol forRNase A in urea (circled blue cross) and a temperature-independent \u2206Cpintroduces an error of \u2248 10 kJ\/mol. Legend labels are ordered from upperright data point to lower left data point.53-12-8-4040 1\u03b1-Lac in ethanol (Cmax= 30%)Acylphosphatase in urea (Cmax = 2.8 M)Barstar in GdmHCl (Cmax = 1.8 M)RNase A in urea (Cmax = 7 M)Hisactophilin in urea(Cmax = 1 M)c\/cmaxC pf(c)C pf(c=0)(kJ\/mol\/K)0.50102030400 2 4 6T-independent ModelLinear T-dependence Modelc (M)Cpf (kj\/mol\/K)A BFigure 2.7: A) Concentration-dependence of the heat capacity forseveral protein-cosolute systems. For RNase A in urea, hisactophilin in urea,and \u03b1-Lactalbumin in ethanol, error bars were determined from the MonteCarlo method described in Section 2.2.1. Error bars are not present foracylphosphatase or barstar because the corresponding literature data werenot available for application of the Monte-Carlo method. The x-axis wasnormalized to facilitate comparison across proteins. B) heat capacity vsconcentration for the T-independent and linear T-dependent models of \u2206Cpfor RNase A in urea.or enthalpically destabilizing. Few were entropically stabilizing, and noneentropically destabilizing. None could be confidently assigned as uncom-pensated stabilizers or destabilizers based on the error bars shown. Thepicture that emerges here is one in which enthalpy plays the dominant rolein stabilization or destabilization, with entropy partially compensating theeffect of enthalpy. This is consistent with recent results by Senske et al.[161]which found that even some macromolecular crowders, which are typicallyassumed to act primarily through entropy, and in fact enthalpic stabilizers.Early results by Ben-Naim [162], Grunwald [163], Karplus [164], andLee [165] have analyzed the invariable entropy-enthalpy compensation thatoccurs during solvent reorganization around a solute due to solvent-solventinteractions. In these theories, cavity creation results in a singular solute-solvent potential and is non-compensating, the limiting case being the freeenergy of inserting a non-interacting, hard-sphere solute. This issue is un-54Figure 2.8: A diagram illustrating the possible categories of cosolutediscussed in section 2.3likely to be a factor in the transfer scheme wherein a solute (protein) istransferred from pure buffer to solution containing cosolute: volume is in-deed lost to buffer and cosolute upon transfer at constant pressure, but is alsogained to the pure buffer system. A systematic analysis of entropy-enthalpycompensation in protein transfer using density functional theory to capturethe effects of solvation is an interesting topic for future work.[166, 167]55Chapter 3Classical Density FunctionalTheory and Protein-CosoluteInteractionsThe problem we consider in this chapter is that of calculating the free en-ergy change upon moving a solute such as a protein from a pure waterenvironment and inserting it into a water and cosolute environment. Figure3.1 illustrates the problem we are considering in the context of the Tanfordtransfer model for protein folding [168]. The cycle depicted here impliesthat if we know properties (such as the unfolding free energy) of the proteinof interest in water, and we know the transfer free energy of each state ofthe protein from water to water and cosolute, we can find the correspondingproperties of the protein in water and cosolute. This can be implementedeither as an implicit solvent model, which we will discuss in Section 3.4, orin a post-processing way as in Ref. [145], which we will use to empiricallyfit our DFT model in Section 3.3.The organization of this chapter is as follows. We begin in Section 3.1 byinvestigating the expected behavior of the surface and volume contributionsto the transfer free energy in a heuristic model. In Section 3.2 we derive theprincipal equations for the DFT model of the transfer free energy. In Section3.2.2 - Section 3.4, we consider several examples of how the DFT model can56Figure 3.1: A diagram of the Tanford transfer model, for a transferprocess going from a pure water environment to one of water and cosolutes.Knowledge of the free energy of unfolding \u2206Gu\u2192fwat in the absence of cosolutescan be combined with the transfer free energies of the folded (\u2206Gfoldw\u2192o)and unfolded (\u2206Gunv\u2192s) states to obtain the free energy of unfolding in thepresence of cosolutes \u2206Gu\u2192fcos .be applied, making connections with the model developed in Section 3.1. Wefinally conclude and give our outlook on future directions for this approach.3.1 Volume and Area Terms in the Transfer FreeEnergy3.1.1 Volume ConsiderationsTo appreciate the terms that we expect in an expression for the transferfree energy, we initially consider both volume and surface area effects in amore qualitative way. We consider the difference in volume occupied by thefolded and unfolded states, or more precisely the expanded and collapsedstates of a polymer, to obtain the corresponding free energy difference in the57presence of a bath of \u201chard-sphere\u201d cosolutes. There are thus no surface in-teractions to consider, and we seek to estimate the magnitude of the volumeeffect; we also ignore for the time being the change in internal free energyas the polymer collapses. The free energy change upon collapse of a proteinor polymer then arises from the change in entropy of the cosolutes, due tothe change in available phase space. For hard-sphere cosolutes, the volumeoccupied by the expanded polymer will be larger than that of the collapsedpolymer. The same considerations apply to a collapsed vs. expanded pro-tein; unfolded states of proteins are generally found to be expanded relativeto the folded state [169]. In what follows, let ra be the mean amino acidradius, ro the cosolute radius, and Np the number of amino acids in thepolymer or protein. Treating the unfolded protein crudely as a meander-ing cylindrical tube (see Figure 3.2a inset), the volume is approximatelypi(ra + ro)2(2raNp + 2ro), which is that of a cylinder of radius ra + ro andlength 2Npra + 2ro. The volume of the collapsed globule, or folded protein,can be modelled as a sphere of radius Rp + ro, where Rp is the protein ra-dius as probed by a zero-radius cosolute particle, i.e. the collapsed volumeis (4\/3)pi(Rp+ ro)3. When ro = 0, the unfolded and folded volumes must beequal, giving R3p = (3\/2)Npr3a. The change in available volume for cosolutes\u2206V (ro) upon polymer collapse is thus positive, and is plotted in Figure 3.2as a function of cosolute radius ro, for a chain of length Np = 70.We can compare the results of the above simple model to data takenfrom simulations of a C\u03b1 Go\u00af model of cold-shock protein (PDB 2L15), with70 amino acids, generated with the GROMACS molecular dynamics pack-age. The Go\u00af potential was generated using a shadow map for the nativecontacts [170] by the SMOG@ctbp server [171]. The simulated free en-ergy surface has a double-well structure with well-defined folded (f) andunfolded (u) ensemble as observed in C\u03b1 Go\u00af models for other single do-main proteins [172]. We take conformational snapshots in each ensembleand measure the volume using a variable probe radius with the programVOIDOO [173]. The average volume change \u2206V = \u3008Vu\u3009 \u2212 \u3008Vf \u3009 for a givenprobe radius is plotted in Figure 3.2a. The theory and simulation datacompare quite well given the simplicity of the model.580 2 4 6 8 1001020304050051015202530352 4 6 8 10a)b)UnfoldedFoldedFigure 3.2: a) The change in volume upon collapse \u2206V (ro) = Vu\u2212Vf ,as a function of cosolute radius ro, for a polymer chain of length Np =70 residues and with ra = 6 A\u02da. The magnitude of the change in volumemonotonically increases as ro increases. Also plotted are the average \u2206V =\u3008Vu\u3009\u2212\u3008Vf \u3009 values of simulation trajectories of Cold-Shock Protein (N = 70,PDB 2L15) against probe radius. (Inset) Schematic of collapsed\/folded andunfolded polymer. Folded polymer has radius Rp; unfolded polymer has tuberadius ra and length Npra. b) Minus the change in free energy upon collapseas a function of cosolute radius ro, for both constant packing fraction \u03b7 andconstant concentration \u03c1. The value of \u03c1 was set to 1M , and the value of \u03b7was set so that the free energy change would be equal to that at constant \u03c1at a typical cosolute radius of 3.1 A\u02da. This gave a packing fraction \u03b7 \u2248 0.075.59We now consider the free energy as a function of either uniform density\u03c1 or packing fraction \u03b7 of the cosolutes. Given a large effective box withvolume Vbox containing a given protein, the packing fraction of cosolutes \u03b7(i.e. the volume density) is given by\u03b7 =43pir3oNoVbox \u2212 Vprot(ro) \u224843pir3oNoVbox=43pir3o \u00b7 \u03c1 ,where \u03c1 is the number density. So, at a fixed packing fraction the numberof cosolutes No scales as r\u22123o .To estimate the volume contributions to the free energy change uponcollapse, \u2206GV (ro), as a function of cosolute radius but at either fixed densityor packing fraction, we use the ideal gas approximation for the osmoticpressure posm = \u03c1kBT to obtain\u2206GV (ro) = posm\u2206V (ro) = \u03c1kBT\u2206V (ro) =\u03b7kBT\u2206V (ro)43pir3o(3.1)where \u2206V (ro) is obtained from the model above.A plot of the magnitude of the free energy change upon collapse as afunction of cosolute radius, here exclusively due to the increase in entropyof cosolute particles, is shown in Figure 3.2b. Based on these considerationswe can estimate the volume-like contribution for typical cosolute sizes andconcentrations. Taking TMAO (Trimethylamine N-oxide) as an example,we expect the cosolute radius to be about 2 A\u02da, from the water oxygen-TMAO nitrogen radial distribution function[174]. Given this radius anda concentration of 300 g\/L, for a protein of length Np = 70 we estimatea volume contribution to the free energy of \u2248 4kBT . The free energy ofunfolding is linear in protein length, so a larger protein of Np = 300 has anestimated \u2206G \u2248 17kBT .3.1.2 Surface ConsiderationsThe presence of cosolutes in solution can make the effective solvent more re-pulsive to protein resulting in stabilization, or more attractive to the protein60resulting in denaturation. What effect is observed depends on the energy \u000fof cosolute-protein binding and also the concentration c (or equivalently thechemical potential \u00b5) of the cosolute.The energy \u000f of binding of the cosolute is actually the difference in in-ternal free energy of binding between cosolute and water, since for examplewater may have some attraction to the polymer, and also a cosolute maysupplant more than one water molecule in the process of binding.Previous treatments of transfer free energy analysis as a condensationproblem onto the surface of the protein have been undertaken primarily inthe context of protein denaturation and the prediction of m-values [175\u2013177]. The process of condensation of a cosolute to a surface is equivalent tothe well-known statistical mechanical problem of Langmuir\u2019s isotherm [178],for which the partition function Z in the (T, \u00b5) ensemble for a substrate withM absorbing sites is given by(1 + e\u2212\u03b2(\u000f\u2212\u00b5))M. The mean covering ratio fis then given byf =kTM\u2202 logZ\u2202\u00b5=11 + e\u03b2(\u000f\u2212\u00b5), (3.2)and the mean energy of condensation on the surface is Mf\u000f. Here we neglectinteractions between cosolutes when bound. The Helmholtz free energy inthis model is given byG = \u2212pV + fM\u00b5 = \u2212kBT log(Z) + fM\u00b5with T, \u00b5 partition function Z as given above.We can relate the Langmuir isotherm to the free energy of a proteinsurface by assuming that each cosolute occupies an area a0 \u2248 pir2o on theprotein surface, so that we can write M = A\/a0, where A is the protein\u2019ssolvent accessible surface area in a given conformation. The change in freeenergy GA upon condensation becomesGA = \u2212kBT Aa0log(1 + e\u2212\u03b2(\u000f\u2212\u00b5))+ fAao\u00b5 (3.3)If the concentration of unbound cosolute is dilute, an ideal gas approxi-mation suffices for the chemical potential: \u00b5 = kT log (\u03c1\/\u03c1Q), where \u03c1Q is a61reference concentration (typically taken to be 1M). The quantity e\u2212\u03b2\u000f\/\u03c1Q istypically treated as an equilibrium constant in the literature [176, 177]. Weconsider both dilute and non-dilute limits below. The protein\u2019s exposed sur-face area is obtained from the volume given in Section 3.1.1 by A = \u2202V\/\u2202ro,so the collapsed exposed area is 4pi (Rp + ro)2 and the expanded (randomcoil) exposed area is 2pi (ro + ra) [(2Np + 1) ra + 3ro].3.1.3 Combined Surface\/Volume Model for the Transfer FreeEnergyWe can now write the total free energy of collapse \u2206G arising from cosolutesby combining the volume and surface area terms in equations (3.1) and (3.3).We can also remove the ideal gas assumption by expressing \u2206G in termsof the Carnahan-Starling (CS) approximations to the pressure and chemicalpotential: [179]p = \u03c1kBT1 + \u03b7 + \u03b72 \u2212 \u03b73(1\u2212 \u03b7)3\u00b5 = kBT log(\u03c1\/\u03c1Q) + kBT8\u03b7 \u2212 9\u03b72 + 3\u03b73(1\u2212 \u03b7)3 . (3.4)where \u03b7 is the volume fraction (the volume per molecule times \u03c1).Then thefree energy becomes:\u2206G = p\u2206V +(kBTpir2olog (1\u2212 f) + f\u00b5pir2o)\u2206A (3.5)with f given in (3.2) and p and \u00b5 given in (3.4), and where\u2206V (ro) =43pi((3Np2)1\/3ra + ro)3\u2212 2pi(ra + ro)2(Npra + ro)\u2206A(ro) = 4pi((3Np2)1\/3ra + ro)2\u2212 2pi(ra + ro)[(2Np + 1)ra + 3ro]are the volume and surface area change upon folding (or collapse).We plot equation (3.5) in Figure 3.3 as a function of cosolute radius ro,62for condensation energies \u000f = 2kBT and \u000f = \u2212kBT . To assess the limits ofthe ideal gas model, we have also plotted the ideal gas results in Figure 3.3.For repulsive cosolute-protein interactions, both surface and volume termsstabilize the folded or collapsed state (Figure 3.3). The free energy changeupon collapse is monotonically decreasing (increasing in magnitude) fromzero, and more strongly favoring collapse as cosolute radius is increased.Non-ideal excluded volume effects in the cosolute pressure and chemicalpotential enhance the stabilizing effect. For attractive cosolute-protein in-teractions, the situation is more complex. At small values of cosolute radiusro, the collapsed phase is destabilized by cosolute-protein binding, which fa-vors expansion. As ro increases, the volume change upon collapse increases,which begins to entropically favor collapse. The osmotic pressure initiallyincreases modestly, additionally favoring collapse. However the chemical po-tential also increases modestly, driving condensation of cosolute and favoringexpansion. These two effects nearly cancel each other rendering the real andideal gas curves nearly coincident up to ro \u2248 4A\u02da. The sigmoidal dependenceof covering fraction f in equation (3.2) on chemical potential \u00b5 results ina sudden condensation of cosolute onto the protein around ro \u2248 5A\u02da, whichinduces the system to favor expansion at these radii. While the number ofcondensed cosolutes is bounded, the osmotic pressure is not, and eventuallycollapse is favored once again through volume terms. The cosolute radiusro can only increase until \u03b7 \u2248 0.6 (near crystal packing densities), giving acutoff of r(cut)o \u2248 (3\u03b7\/4pi\u03c1)1\/3, or about 6.2A\u02da for 1M concentration.In the limit that the cosolute is dilute, \u03c1e\u2212\u03b2\u000f\/\u03c1Q \u001c 1 and we can expandthe logarithm in equation (3.5) to obtain an area contribution to the freeenergy of \u2212\u03c1kBTAe\u2212\u03b2\u000f\/a0\u03c1Q, so that the free energy change upon unfoldingbecomes\u2206G = \u03c1kBT(\u2206V \u2212Ate\u2212\u03b2\u000f). (3.6)Here we have used the fact that (a0\u03c1Q)\u22121 has units of length and can be thusbe interpreted physically as a thickness t over which the surface interactionacts.Having looked at these preliminary volume and surface considerations,63C-SIGIGC-S01020-10-201 2 3 4 5 60.068 0.16 0.32 0.540.0200.0025Figure 3.3: Total free energy change \u2206G upon collapse in units of kBT ,as a function of cosolute radius ro. Values of packing fraction \u03b7 correspond-ing to the values of ro on the x-axis are shown above the plot. Curves aretaken from equation (3.5) which combines surface area and volume terms.Here the polymer length Np = 70, the cosolute concentration \u03c1 = 1M, andra = 6A\u02da. Red curves show \u2206G upon collapse for a repulsive cosolute withinteraction energy +2kBT , i.e. a crowding particle. Blue curves show \u2206Gupon collapse for an attractive cosolute with interaction energy \u2212kBT , i.e.a weak denaturant. Plotted are both the model with ideal gas (IG, dashed)and Carnahan-Starling (C-S, solid) pressure and chemical potential.we now turn to a classical density functional theory formulation, whichprovides a more complete understanding of the transfer free energy, andas well, reduces to equation (3.6) in the appropriate limits.3.2 The Density Functional Theory FormulationWe now consider a density functional formulation of the problem of transferfree energy. In what follows, we will assume that the intra-protein energy ofa given configuration of a protein is in principle known and the net interac-64tion between any given site on the protein and either the cosolute or wateris in principle known. We then wish to calculate \u2206G, the free energy oftransferring the protein from water to an cosolute solution, or, equivalently,of transferring the cosolutes from an aqueous solution to one containing theprotein (see Figure 3.1). In short, we wish to consider the effect that thepresence of cosolutes has on the free energy of the protein.The uniqueness of the Kohn-Sham density functional may be extendedto finite temperatures, so that the free energy of the protein-solvent systemis uniquely expressed as a functional of the single particle density \u03c6(r) [180].We thus seek an expression for the free energy of the cosolutes and waterin an arbitrary external potential. For our purposes in obtaining a transferfree energy, we will treat a given protein configuration, with atom positions{Ri}, as the source of the external potential. We write the free energy inthe standard way [80]:G({Ri}) =\u222bd3r kBT (\u2212So(\u03c6o(r))\u2212 Sw(\u03c6w(r))) + Vo(r)\u03c6o(r) + Vw(r)\u03c6w(r)+ \u03a6o[\u03c6o] + \u03a6w[\u03c6w] + \u03a6ow[\u03c6o, \u03c6w] (3.7)Here \u03c6j is the density function for the cosolutes (o) or water (w), and Vj theexternal potential on the respective species. The entropy density for eachspecies can be written asSo(r) + Sw(r) = \u2212\u03c6o(r) log[\u03bb3o\u03c6o(r)]\u2212 \u03c6w(r) log [\u03bb3w\u03c6w(r)] (3.8)where \u03bbo and \u03bbw are constants with units of length, analogous to thermalwavelengths. The terms \u03a6o, \u03a6w, and \u03a6ow are the multi-particle correlationcontributions to the free energy for the respective species. For example, thetwo particle correlation part of \u03a6o would have the form\u03a6(2)o [\u03c6o] =\u222b \u222bd3r1d3r2 \u03c6o(r1)\u03c6o(r2)Uoo(r1 \u2212 r2)g(r1, r2|V) (3.9)where Uoo is the interaction potential between two cosolutes and g the two-particle correlation function. The full multi-particle function is not known65exactly, and so, as in electronic DFT, while equation (3.7) is exact in prin-ciple, approximations must be made to use it in practice [181].We now make two key assumptions. The first is that the water and coso-lute densities are completely correlated, such that all vacua are occupied byeither water or cosolute. Thus Nwvw +Novo = V , where vi is the volume ofan individual water or cosolute molecule, and V the total volume. Dividingthis by V vw and allowing the local density of a given species to vary gives\u03c6w(r) + f\u03c6o(r) = \u03c1w (3.10)where f = vo\/vw and \u03c1w = 1\/vw (the factor of f allows for the cosolutemolecule to be a different size than the water molecule). Equation (3.10) isnot valid in the interior of the protein, so we split our system up into tworegions: a hard wall region Vhw in which \u03c6w = \u03c6o = 0, and the rest of thesystem, which has a volume V identical to the volume of the cosolute-waterbath prior to the insertion of the protein, and in which Equation (3.10) isvalid. We further take Vhw to be the same as the change in volume of theaqueous system the protein was removed from in the transfer process (seeFigure 3.1), so that the total system of water, protein, and cosolute-watersolution does not change volume during the transfer process.With the approximation of equation (3.10) we can writeVo(r)\u03c6o(r) + Vw(r)\u03c6w(r) =\u2206V(r)\u03c6o(r) + Vw(r)\u03c1w (3.11)\u03a6o[\u03c6o] + \u03a6w[\u03c6w] + \u03a6ow[\u03c6o, \u03c6w] =\u03a6t[\u03c6o] (3.12)where \u2206V(r) = Vo(r)\u2212 fVw(r).The second approximation in our treatment is that the cosolute numberdensity is much less than that of water. Using this approximation along withthe one given in Equation (3.10), the entropy in Equation (3.8) becomes\u2212So(r)\u2212 Sw(r) =\u03c6o(r) log[\u03bb3o\u03c6o(r)]+ (\u03c1w \u2212 f\u03c6o(r)) log[\u03bb3w(\u03c1w \u2212 f\u03c6o(r))](3.13)\u2248\u03c6o(r) log[\u03bb3o\u03c6o(r)]\u2212 f\u03c6o(r) + \u03c1w log [\u03bb3w\u03c1w]\u2212 f\u03c6o(r) log [\u03bb3w\u03c1w]66In this way we express each part of equation (3.7) in terms of cosolute densityand constant terms. The free energy functional may then be written asG =\u222bd3r kBT (\u03c6o(r) log [\u03bbo\u03c6o(r)]\u2212 (\u03b3 + 1)\u03c6o(r)) + \u2206V(r)\u03c6o(r)+ V \u03c1w log \u03bb3w\u03c1w + U\u03c1w + \u03a6t[\u03c6o] (3.14)where U \u2261 \u222b d3rV(r), and \u03b3+ 1 \u2261 f(1 + log(\u03bb3w\u03c1w)). Since V is the volumeof the system, the term V \u03c1w is equal to V\/vw = N\u2032w, the total number ofwater molecules in a system of pure water of volume V .Thus, dropping the subscripts, letting V \u2261 \u2206V, and ignoring any posi-tion independent terms, we can write the free energy asG =\u222bd3r kBT(\u03c6(r) log \u03bb3\u03c6(r)\u2212 \u03c6(r))+ kBT\u03b3\u03c6(r) + V(r)\u03c6(r)+ \u03a6[\u03c6] (3.15)where \u03a6[\u03c6] is the functional containing the multi-particle correlation partof the free energy, and \u03bb \u2261 \u03bbo is a constant with units of length analogousto the thermal wavelength, whose value will be shown to be unimportant.For now we will formally manipulate \u03a6 without making assumptions aboutits form. We can find the density that minimizes the free energy by use ofthe Euler-Lagrange equations, with the constraint that the cosolute densitywhen integrated over the total volume is the total number of cosolutes:\u222bVd3r \u03c6(r) = No . (3.16)We thus write\u03b4\u03b4\u03c6[G\u2212 \u00b5o(\u222bVd3r \u03c6(r)\u2212No)]= 0or kBT log \u03bb3\u03c6(r) + V(r)\u2212 kBT\u03b3 + \u03b4\u03a6\u03b4\u03c6\u2212 \u00b5o = 0 (3.17)where \u00b5o is the Lagrange multiplier corresponding to the constraint in equa-tion (3.16). Physically, we can interpret equation (3.17) as a statement that67\u03b4G\u03b4\u03c6 is equal to the chemical potential \u00b5o, and thus must be a constant valueat all points in space. Solving this for the density field gives\u03c6(r) = e\u03b3\u03bb\u22123e\u2212\u03b2(V(r)+\u03a6\u2032\u2212\u00b5o) (3.18)where \u03a6\u2032 \u2261 \u03b4\u03a6\u03b4\u03c6 .To obtain \u00b5o from equation (3.18), we use the constraint on the totalnumber of particles in equation (3.16) which yieldse\u03b2\u00b5o =e\u03b3\u03bb3No\u222bV d3r e\u2212\u03b2(V(r)+\u03a6\u2032). (3.19)From here we can obtain the transfer free energy, which is given bythe free energy of the cosolute bath in the presence of the external proteinpotential, V(r), minus the free energy of the cosolute bath without theprotein potential (V(r) = 0). We thus have\u2206G =\u2206\u00b5oNo=\u2212 kBTNo log(e\u03b3\u03bb\u22123No\u222bVd3r e\u2212\u03b2(V(r)+\u03a6\u2032f (r)))+ kBTNo log(e\u03b3\u03bb\u22123No\u222bVd3r e\u2212\u03b2\u03a6\u2032i)(3.20)where the volume V integrated over is the volume outside of hard-wall vol-ume of the protein, and is the same in the initial and final systems. Thedifference \u2206G is independent of \u03bb and \u03b3.The bath in the initial state is homogeneous and isotropic, so \u03a6\u2032i inequation (3.20) is independent of position. Thus it may be factored out ofthe integral, \u222bVd3r e\u2212\u03b2\u03a6\u2032i = V e\u2212\u03b2\u03a6\u2032iso that\u2206G = \u2212kBTNo log(1V\u222bVd3r e\u2212\u03b2(V(r)+\u2206\u03a6\u2032))(3.21)where \u2206\u03a6\u2032 = \u03a6\u2032f (r)\u2212 \u03a6\u2032i. The expression in equation (3.21) consists of the68logarithm of the integral of a Boltzmann weight for the effective potentialV(r)+\u2206\u03a6\u2032(r). Here V(r) and \u2206\u03a6\u2032(r) enter on equal footing. Recall that Vis the protein-cosolute potential, treating the protein as an external source.\u03a6\u2032 is the functional derivative of the multi-particle part of the free energy. Ifwe use the two-particle cosolute contribution from equation (3.9), we obtain\u2206\u03a6(2)\u2032o =\u03b4\u03a6(2)o\u03b4\u03c6o(r)\u2223\u2223\u2223\u2223\u2223V\u2212 \u03b4\u03a6(2)o\u03b4\u03c6o(r)\u2223\u2223\u2223\u2223\u2223V=0=\u222bd3r\u2032[\u03c6of (r\u2032)g(r, r\u2032|V)\u2212 \u03c6oi(r\u2032)g(r, r\u2032|V = 0)]Uoo(r, r\u2032),(3.22)which gives the difference of two terms in the presence and absence of theexternal protein potential, where each term corresponds to the equilibrium-averaged interaction energy between cosolutes, up to pair correlations. Thusthe term \u2206\u03a6\u2032 in Equation (3.21) can be interpreted as the change in energydue to redistribution of the environment in response to the change in externalpotential.We can recast equation (3.21) into a form that will be somewhat moreuseful later:\u2206G = \u2212kBTNo log(1 +1V\u222bVd3r [e\u2212\u03b2(V(r)+\u2206\u03a6\u2032) \u2212 1])(3.23)which has the advantage that when V and \u2206\u03a6\u2032 are both zero, the integrandis also zero, and thus the integral can be taken over all space.In equation (3.23) we can take the limit V \u2192\u221e, with No\/V = \u03c1 fixed.Then, assuming that the region over which the integrand in equation (3.23)is non-zero is finite, we can expand the logarithm to first order to obtain\u2206G = \u2212kBTNo 1V\u222bd3r(e\u2212\u03b2(V(r)+\u2206\u03a6\u2032) \u2212 1)(3.24)which has the form\u2206G = pid\u2206Veffwhere pid = NokBT\/V is the ideal gas osmotic pressure, and Veff =\u222bd3r[1\u2212 e\u2212\u03b2(V(r)+\u2206\u03a6\u2032)]is an effective change in volume. In the dilute limit, the osmotic pressurep = pid; then Veff may be interpreted as the change in volume available to69the cosolutes.We now need to address \u2206\u03a6\u2032 to progress further. The obvious first ap-proximation is to set \u2206\u03a6\u2032 = 0; we will see below that this approximationcan in fact go quite a long way, depending on the solvent. This is con-sistent with the observations in Figure 3.3 where the ideal gas approxima-tion, which neglects cosolute-cosolute correlations, holds for typical molecu-lar radii at 1M concentration. It is worth noting that this is not ignoring thecosolute-cosolute, cosolute-water, and water-water correlations completely;it is merely assuming that they are the same in the initial and final baths.Making this approximation, we have\u2206G = \u2212kBTNo log(1 +1V\u222bVd3r [e\u2212\u03b2V(r) \u2212 1])(3.25)Equation (3.25) represents an approximation to the transfer free energythat, while severe, nonetheless takes into account both the change in energyand change in entropy of the cosolute bath.3.2.1 Validation Tests in Model SolventsAs a test of the density functional theory, we have used equation (3.25)to calculate the transfer free energy of several small molecules into modelcosolutes. To simplify the simulations, we looked at transfer from vacuumto a van der Waals gas of cosolutes, which were taken to be single atomsinteracting through a VDW potential. The density of the cosolutes wasset to 1M. The molecules we transferred were the side chains of alanineand valine, with C-\u03b2 capped with a hydrogen to replace the backbone (ie,the molecules were methane and propane). The coordinates were takenfrom an existing protein structure file, and the angle and bond parameterswere generated with the GROMACS utility pdb2gmx. The charges were setto zero for all atoms, and the interaction was purely van der Waals. Welist the VDW parameters in Table 3.1. Figure 3.4 shows the interactionpotential for the two different cosolutes we used. The transfer energieswere calculated both with equation (3.25) and by simulating the transfer70Table 3.1: van der Waals parameters for the atoms used in thesimulation test of the DFT, as taken from the CHARMM parameter set.Cos2 is a relatively attractive spherical cosolute, while the potential ofCos1 is dominated by steric repulsion. The interaction is parameterizedas V (r) = 4\u000f[(\u03c3\/r)12 \u2212 (\u03c3\/r)6].Atom \u03c3 (A\u02da) \u000f (kJ\/mol)Ala C-\u03b2 0.36705 0.33472Ala H 0.23520 0.092048Val C-\u03b2 0.40536 0.08368Val C-\u03b3 0.36705 0.33472Val H 0.23520 0.092048Cos1 0.40536 0.08368Cos2 0.36705 0.33472Table 3.2: Comparison of test cases between density functional the-ory (DFT) and thermodynamic integration (TI)Molecule\/cosolute DFT \u2206G (kJ\/mol) TI \u2206G (kJ\/mol)Ala\/Cos1 0.188\u00b1 0.002 0.187\u00b1 0.002Val\/Cos1 0.255\u00b1 0.004 0.261\u00b1 0.004Ala\/Cos2 0.055\u00b1 0.002 0.059\u00b1 0.003Val\/Cos2 \u22120.018\u00b1 0.004 \u22120.011\u00b1 0.004in GROMACS and using Thermodynamic Integration (TI) [182\u2013184]. Theresults are summarized in Table 3.2, and show excellent agreement betweenTI and DFT. This is notable since the result was obtained neglecting theinter-particle correlations, and at 1M the pressure of the cosolutes was \u2248 1.5that of the ideal gas pressure, which indicates that the cosolute-cosoluteinteractions were significant.3.2.2 Connecting DFT to Previous Surface\/Volume ModelsWe now take a simplified model of a protein potential to compare withthe results obtained previously in Section 3.1 for the solvent contributionto the change in free energy upon protein collapse. In this model we willconsider the protein to have an excluded volume of Vprot; that is, within71-0.4-0.200.20.40.2 0.4 0.6 0.8 1Cos1Cos2Figure 3.4: Comparison of cosolute potential functions for the testcases parameterized in Table 3.1. Cos2 is significantly more attractive thanCos1, which is reflected in the transfer free energies in Table 3.2that volume the potential is infinite. From the discussion in Sections 3.1.1-3.2 concerning excluded volume, we saw that the changes in volume treatedthere are volumes from which cosolutes are excluded. We also consider theprotein to have a surface region of thickness t that exerts a potential on thecosolutes of depth \u000f; this region is sufficiently thin that we can approximateits volume as Vsurface \u2248 tA. If we use this model in the expression for thefree energy in the limit of large system size (equation (3.25) ) then we obtaina free energy upon transfer of\u2206G = \u03c1kBT(Vprot + (1\u2212 e\u2212\u03b2\u000f)tA). (3.26)72The DFT transfer free energy with this simplified model provides a nat-ural split between the volume contribution pidVprot and the surface area con-tribution pid(1 \u2212 e\u2212\u03b2\u000f)tA. Thus the DFT result, in the appropriate model,naturally generates the free energy contributions derived in Section 3.1 frommore bespoke considerations. Specifically, if we take the total volume of theprotein upon insertion to be V = Vprot + tA, then equation (3.26) is identi-cal to equation (3.6). The simplified DFT model here reduces to our earlierconsiderations and helps give a physical interpretation of the quantity \u03c1Q asit pertains to the protein surface.We can also see that, in order to obtain a SASA approximation in which\u2206G is independent of temperature, one would have to assume that thecosolute-protein binding energy \u000f\u001c kBT , and that volume terms were eithernegligible compared to surface terms, or they were proportional to them. Wefind below that \u000f \u2248 kBT in order to obtain empirically-derived transfer freeenergies to TMAO, which does not satisfy the above inequality. As well, wecan use the tube model from Section 3.1 for protein volume and surface areato estimate the relative contributions of volume and area: for a cosolute ofradius ro = 2.5 A\u02da and a protein with Np = 70, V\/tA = 0.62 in the unfoldedstate, and V\/tA = 0.77 in the folded state. The volume here is by no meansnegligible.We thus expect on general grounds that the transfer free energy willbe dependent on temperature. One way of looking at the simplified limitfor the transfer free energy in equation (3.26) is as a derivation of a newphenomenological form for the transfer energy, containing both temperatureand volume dependence:\u2206G = \u03b31kBT (Vsolute) + \u03b32kBT (ASA)e\u2212\u03b2\u000f , (3.27)where one can now fit the parameters \u03b31, \u03b32, and \u000f, to empirical data.733.3 Empirically Deriving DFT Transfer Free En-ergy ParametersThe potential V(r) in equation (3.25) is an effective potential given byVo(r) \u2212 fVw(r). Obtaining f and Vw ab initio may be nontrivial, so weexamine some model systems, and compare with empirical methods. Tobegin with, we will assume that the potential takes the form of a sum ofterms from each particle in the protein, where a particle may be an atomin an all-atom model, or a bead modeling an amino acid in a coarse-grainedapproach:V(r) =Np\u2211i=1veffi (r \u2212Ri) .Here Np is the number of particles in the protein, and Ri the position of theith particle.We consider a model consisting of backbone C\u03b1 atoms and coarse-grainedside-chain beads, which then form the particles for our potential. We makethe assumption that the protein-cosolute potentials have a 6-12 form:vi(r) = 4\u000fi[(\u03c3ir)12 \u2212 (\u03c3ir)6],and we wish to determine the potential parameters \u03c3i, \u000fi for each aminoacid that reproduce the transfer energies found experimentally when DFTis applied using the above potential. As a starting point, we examine thoseused by Auton and Bolen [145].Two constraint equations are required for each amino acid. For the firstequation, we note that the beads representing the various amino acid sidechains have residue radii roi that may be obtained from measured partialmolar volumes [185]. We can then apply a constraint to the above 6-12parameters \u03c3i, \u000fi by requiring that at a distance roi from the residue centre,vi(roi) = 0.6 kcal \u00b7mol\u22121 . (3.28)To obtain the remaining equation determining the parameters \u03c3i, \u000fi, we74require that the DFT transfer free energy, as computed by the dilute limit ofequation (3.25) for the single particle representing an amino acid side chain,should be equal to the experimental value as given in reference [145], specif-ically for transfer into a solution of 1M TMAO. This involves computing theintegral over the cosolute-accessible volume in the expression\u03c1kBT\u222bd3r(1\u2212 e\u2212\u03b2vi(r))(3.29)and setting the result to the empirical value of \u03b4gi for each amino acid.The sum of the transfer free energies of each amino acid in a Gly-X-Gly tripeptide is often used to approximate the conformationally-averagedtransfer free energy for a protein.[145] Here we consider the tripeptide trans-fer free energies. The integral in expression (3.29) then involves integrationover a solid angle \u2126i determined by the fraction of solid angle available tothe side chain in the tripeptide vs. that for the isolated residue, i.e.\u2126i =AitriAiiso4piThe potential vi is then fully determined from equation (3.28) along with\u2126i\u03c1kBT\u222b \u221e0dr r2(1\u2212 e\u2212\u03b2vi(r))= \u03b4gi . (3.30)We can now construct potentials for each amino acid transfer free en-ergy given in reference [186]. The parameters derived from doing so arelisted in Table 3.3. The backbone-cosolute interaction was parameterized asvBB(r) = C\/r12, as this better represented its strongly repulsive character.The value of C obtained by fitting to \u03b4gBB was C = 7.510\u00d7 107 kcal\u00b7A\u02da12.In this context, the DFT formulation provides a way of using the infor-mation from tri-peptide experiments in a way that captures both energeticand entropic effects. The parameters just obtained can be used to determinethe change in the transfer free energies for isolated residues as temperaturechanges. The experimental transfer free energies \u03b4gi are predicted to in-crease as temperature increases, with the new values at T = 310K given in75Table 3.3: Parameter values yielding transfer free energies \u03b4g to 1MTMAO for amino acid side chains and backbone at 300K, and the predicted\u03b4g at 310K .Type ro (A\u02da)a \u03b4g (cal\/mol) b \u03c3 (A\u02da) c \u000f (kcal\/mol) d \u03b4g(T = 310K) (cal\/mol) eAla 2.52 -14.64 3.517 0.6286 -12.65Arg 3.28 -109.3 4.088 1.022 -104.0Asn 2.74 55.69 4.564 0.0483 58.06Asp 2.79 -66.67 3.627 1.055 -63.31Gln 3.01 41.41 4.397 0.1710 44.57Glu 2.96 -83.25 3.799 0.9973 -78.88His 3.04 42.07 4.428 0.1707 45.28Ile 3.09 -25.43 4.084 0.5692 -21.59Leu 3.09 11.6 4.246 0.3405 15.15Lys 3.18 -110.23 3.968 1.126 -104.7Met 3.09 -7.65 4.154 0.4538 -3.791Phe 3.18 -9.32 4.237 0.4587 -5.397Pro 2.78 -137.7 3.457 1.987 -133.5Ser 2.59 -39.04 3.4905 0.8849 -36.45Thr 2.81 3.75 3.9312 0.3889 6.41Trp 3.39 -152.9 4.157 1.150 -146.5Tyr 3.23 -114.3 4.020 1.103 -109.2Val 2.93 -1.02 4.021 0.4238 1.78BB f 2.25 90.0 - - 92.7aDistance where the cosolute-amino acid potential is taken to be 0.6 kcal\u00b7mol\u22121bEmpirical transfer free energies to 1M TMAOcvan der Waals size parameterdvan der Waals well depthe predicted transfer free energies at T = 310KfBackbone is parameterized for TMAO by a purely repulsive potential (see text)Table 3.3. Increasing temperature by 0.03kBT increases the transfer freeenergy by \u2248 0.6kBT for a 100 residue protein. This change is not large,but the relative temperature change is also small. The transfer entropy issignificant: d(\u03b4g)\/dT \u2248 20kB.763.4 Using DFT for Implicit Solvent ModelsThe DFT methodology has been applied to the problem of solvation to cal-culate fluid correlation functions, solvation free energies, and reorganizationenergy in charge transfer [167, 187]. The use of time-dependent densityfunctional theory has been well-established to understand solvation dynam-ics in single-component solvents [188] as well as selective solvation in binarymixtures [189, 190]. The methodology has also been applied to the connectstatic and dynamic approaches to the glass transition by Kirkpatrick andWolynes [86]. The DFT methodology as described above may also be beapplied to the problem of finding the effective forces for molecular dynamicssimulation in an implicit solvent, which we briefly describe here.We again write the external potential due to solute-solvent interactionsasV(r) =\u2211jvj(|Rj \u2212 r|)we can write the force on the ith particle from the transfer free energy inequation (3.24) (neglecting solvent inter-particle correlations) asFi = \u2207Ri[kBT\u03c1\u222bd3r(1\u2212 e\u2212\u03b2\u2211j vj(|Rj\u2212r|))]= kBT\u03c1\u03b2\u222bd3r e\u2212\u03b2\u2211j vj(|Rj\u2212r|)\u2207Rivi(|Ri \u2212 r|)= \u03c1\u222bd3r e\u2212\u03b2\u2211j vj(|(Rj\u2212Ri)\u2212r|)\u2207vi(r) (3.31)We immediately see that the integrand is non-zero only when \u2207vi(r)is non-zero, so that if there is an effective cutoff rc such that vi(r) \u2248 0for r > rc, then the integral in equation (3.31) only needs to be takenin the region r < rc. This is a generalization of the result obtained byGo\u00a8tzelmann et al[191], who have shown that for a hard sphere potential,only the solvent density at the surface of the spheres was relevant to thecalculation of depletion forces. Here we extend this analysis to arbitrarypotentials.Consider a particle with a spherically symmetric vi(r), as assumed above.77The net force on this particle when isolated is zero. When a second particleexerting potential vj(r) on the cosolutes is brought near, the net force onthe first due to the solvent is a result of the now asymmetric solvent density.We note here we are treating the indirect force rather than the direct forcebetween the particles, which can be calculated by direct application of theinterparticle potential. The region of asymmetric solvent density constitutesa restricted volume to be integrated over in equation (3.31), as only theregion of overlap between the two spheres defined by the cutoff in potentialaround Ri and Rj contributes to the net force (see e.g. Figure 3.5b below).In addition, the solvent field in this overlap region will maintain cylindricalsymmetry about the axis joining the two particles, which means that theforce will be along this axis as well. This suggests that the force on particlei can be written asFi =\u2211|Rij |<2rcFij(|Rij |)R\u02c6ij .Here R\u02c6ij is the unit vector from particle j to particle i, and Fij is a scalarfunction of the interparticle distance |Rij | \u2261 |Ri\u2212Rj |, which is determinedby the overlap integral in equation (3.31), and which could in principle bepre-computed and tabulated to speed up execution.3.4.1 Depletion and Impeded-Solvation Interactions in anImplicit Solvent ModelWe can use equation (3.31) to investigate the forces due to solvent on col-loidal particles. In what follows, we imagine the \u201csolvent\u201d to be simplifiedcosolutes within an implicit solvent bath. This subject has been well-studied(see e.g. refs. [192\u2013196] ); our goal here is simply to show that the DFT trans-fer free energy provides a natural way of calculating depletion forces as wellas transfer energies, and that even the approximated form in equation (3.31)yields non-trivial results for the depletion force.We investigate a model consisting of two spheres that interact onlyby a hard wall potential of radius rs. Each sphere also interacts witha bath of cosolutes through a 6-12 (van der Waals) potential: V (r) =784\u000f((\u03c3\/r)12 \u2212 (\u03c3\/r)6), with \u03c3 = rs + ro. With this model we examine theforce as a function of the sphere separation d. Any force between the spheresis entirely due to cosolute-mediated effects.When the solute particles are far apart, they dress themselves with coso-lute solvation shells because of the attractive solute-cosolute potential. Aswe imagine moving the two solute particles closer together, eventually therepulsive region of one solute particle overlaps with the attractive region ofthe other solute particle, and vice versa. This situation is unfavorable forthe solute particles, and the energy may be lowered by moving them furtherapart; hence there is a repulsive force at these distances (see Figure 3.5). Asthe solute particles continue to approach each other, the above repulsive re-gion encroaches on the regions of space where the van der Waals potential isdeeper. A larger amount of potentially favorable binding energy is removedper distance travelled, and the repulsive force due to \u201cimpeded-solvation\u201dincreases. The repulsive force is maximal when the solute separation d isroughly 2\u03c3. For separations d < 2\u03c3, the repulsive regions of the two so-lute spheres begin to overlap. This reduces the volume excluded, or moreprecisely repulsive to, cosolutes. This reduced excluded volume results inan attractive force which is the traditional depletion force. Eventually thedepletion force becomes stronger than the above impeded-solvation force,and the net force becomes attractive. We note that such effects would notbe present in standard GB\/SA models of implicit solvation.In general, direct inter-particle interactions must be superimposed onthe above scenario. Which force dominates at a given separation will thendepend on the values for rs, ro, and \u000f, along with the strength of the directinteraction. The above-described repulsive effect has been observed before inhard-sphere solutes using the Derjaguin approximation to obtain an effectivesurface tension [191]. Here we see that the effect arises naturally from thepresence of an attractive potential in the density functional theory.7924-2-402.01.6 2.4 2.8 3.2Figure 3.5: Solvent-induced force on a pair of \u201chard-wall\u201d spheresas a function of the separation distance, as obtained from equation (3.31).Spheres interact with cosolutes through a LJ potential (see text). The onlyparameters that determine the force are thus \u03c3 and \u000f, which appear inthe LJ potential that enters into the DFT expression for the force. Eachcurve in the figure corresponds to a given well-depth \u000f in the sphere-cosolutepotential. The depletion force is dominant at small separation, but there isa region in which the spheres are mutually repulsive due to lost attractionor \u201cimpeded-solvation\u201d to the solvent.80Figure 3.6: Schematic renderings of the solute spheres in Figure 3.5at several distances. a) The sphere-cosolute interaction is through a LJpotential, which is negative beyond a distance \u03c3 = rs + ro (shown as thegreen region), and positive and repulsive for d < \u03c3 (red region). The di-rect sphere-sphere interaction is only through a hard-wall potential of radiusrs. The cosolutes have radius ro. (b) Sphere configuration when distanced = 2\u03c3. An cosolute can just fit between the spheres at this distance- theLJ potential is zero in this configuration if the cosolute (dashed sphere) iscentered directly between the solute particles. Such separations have pos-itive force between the solutes in Figure 3.5a, due to \u201cimpeded-solvation\u201d:the repulsive interaction between one sphere-cosolute pair removes some ofthe attractive region from the other sphere-cosolute pair (region shown inmagenta). At the separation shown in (c), the solvent-induced force betweenthe spheres is now attractive; the volume of the removed attractive regionnow varies weakly with separation, and bringing the spheres closer togethergains free energy by removing the depletion zone highlighted in blue.813.5 Transfer Enthalpy and Entropy from DensityFunctional TheoryHaving developed a DFT for cosolute-protein interactions, we can return tothe experimental results analyzed in Chapter 2 and ask what constraints theobserved data places on our simple models.The transfer enthalpy \u03b4H may be found from the free energy (equa-tion (3.25) in Section 3.2) through \u2202(\u03b2\u03b4G)\/\u2202\u03b2:\u03b4H =\u222bd3r V(r)\u03c6(r) = c\u02dc\u222bd3r V(r)e\u2212\u03b2V(r) , (3.32)where \u03c6(r) is given by\u03c6(r) =Ne\u2212\u03b2V(r)\u222bd3r\u2032 e\u2212\u03b2V(r\u2032)\u2261 c\u02dce\u2212\u03b2V(r) , (3.33)The transfer entropy may be found from \u2212\u2202\u03b4G\/\u2202T or directly from \u03b4S =\u03b2(\u03b4H \u2212 \u03b4G):\u03b4S = \u03b2\u222bd3r V(r)\u03c6(r) +N log(1V\u222bd3r e\u2212\u03b2V(r)). (3.34)We can now take a simple, heuristic model wherein we assume that theprotein occupies a volume Vn in the native state, and over that volume exertsan average potential \u000fn on the cosolutes, relative to that of water. Likewise,the protein occupies an average volume Vu in the unfolded ensemble, andexerts an average potential \u000fu on the cosolutes. Then (3.32) and (3.34)reduce to:\u03b4Hn = c Vn\u000fne\u2212\u03b2\u000fn (3.35)\u03b4Sn = c Vn[\u03b2\u000fne\u2212\u03b2\u000fn +(e\u2212\u03b2\u000fn \u2212 1)](3.36)with similar expressions holding for the transfer enthalpy and entropy of theunfolded state in terms of Vu and \u000fu. This then gives us model predictionsfor \u03b4\u2206H = \u03b4Hu \u2212 \u03b4Hn and \u03b4\u2206S = \u03b4Su \u2212 \u03b4Sn.82The observation of entropy-enthalpy compensation means that the devi-ations from the line \u03b2\u03b4\u2206G = 0 are not large compared to the scale of \u03b2\u03b4\u2206Hor \u03b4\u2206S. The dimensionless quantity\u03b2\u03b4\u2206G = \u2212cVu(e\u2212\u03b2\u000fu \u2212 1)+ cVn(e\u2212\u03b2\u000fn \u2212 1)(3.37)defines a hypersurface as a function of the dimensionless variables cVu, cVn, \u03b2\u000fu, \u03b2\u000fn.We can estimate cVn for a 1M solution and a typical steric native volumefor a two-state protein, Vn \u2248 2.5 \u00d7 104A\u02da3. Then cVn \u2248 15, perhaps largerdepending on the size of the protein or cosolute.Then the inequality |\u03b2\u03b4\u2206G| < a, where a is a constant of order unity,defines a region in the space \u03b2\u000fu, \u03b2\u000fn, cVu bounded by the surfaces \u03b2\u03b4\u2206G = aand \u03b2\u03b4\u2206G = \u2212a. In Figure 3.7, we analyze what requirements entropy-enthalpy compensation imposes on the parameters of the model, by takingthe two proteins that had the minimal and maximal free energy change upontransfer to 100g\/l of cosolute. These systems correspond to Arc repressorin KCl, with \u03b2\u03b4\u2206G = 6.1 at the transition temperature in water, to RNaseA in Butylurea, with \u03b2\u03b4\u2206G = \u22128.1 at the transition temperature in water.The two surfaces \u03b2\u03b4\u2206G = 6.1 and \u03b2\u03b4\u2206G = \u22128.1 in Figure 3.7 are quiteclose: for Arc repressor for example, a difference in unfolding interactionenergy of \u2206\u000fu \u2248 0.2kBT can move points from the stabilizing surface to thedestabilizing surface. We also observe that as \u000fn and \u000fu become increas-ingly negative, all systems, for both stabilizing and destabilizing cosolutes,converge to Vu \u2248 Vn. Thus, if the effective interaction potentials are netattractive, even small changes in unfolded volume can yield dramaticallydifferent values of unfolding transfer enthalpy \u03b4\u2206H for example. We foundthat both Lysozyme\/DMSO and RNase A\/Butylurea systems decreased thevolume experienced by the cosolutes upon unfolding, for most of the range ofthe energetic parameters. Strongly stabilizing cosolutes such as KCl showedsignificant increases in volume upon unfolding, if the interaction energieswere small. This particular observation is consistent with previous findingsfor the volume increase upon unfolding of Trp-cage miniprotein in modelhard-sphere cosolutes[35].83Figure 3.7: Allowed volume in parameter space for the proteinswe investigated, as predicted by the classical density functionaltheory of protein transfer free energy [166] for the model described in Sec-tion 3.5. The two bounding surfaces shown bracket the observed transferfree energy transfer values for the proteins we considered. The transfer freeenergy is taken for cosolutes with concentration 100g\/ml, at the transitionmidpoint in water, \u03b4\u2206G(T of ). The upper surface in green corresponds toa transfer free energy of \u03b2\u03b4\u2206G = 6.1, observed for the transfer of Arc re-pressor to KCl. The lower surface in orange corrsponds to a transfer freeenergy of \u03b2\u03b4\u2206G = \u22128.1, observed for the transfer of RNase A to Butylurea.The volume bounded by the surfaces, where parameters characterizing allother proteins reside, is shaded yellow. The Arc repressor system is alsoconstrained by the transfer enthalpy \u03b2\u03b4\u2206Hf = 13.9, which further restrictsparameter values to lie on the black curve within the green surface. TheRNase A system, with \u03b2\u03b4\u2206H = \u221242, is constrained to lie on the cyancurve. The transfer of Lysozyme to DMSO corresponds to the two redcurves. The lower red curve corresponding to negative interaction energiesobeys Vu < Vn, while the upper red curve crosses the plane Vu = Vn forsome energetic parameters. The transfer of RNase A to N-N\u2019 Dimethylureacorresponds to the two blue curves. Both red and blue curves lie betweenthe surfaces. The intersection of the surfaces with the plane cVu = cVn, andwith the plane \u000fu = 0 are shown as grey solid or dashed lines for the upperand lower surfaces respectively.84In fact, rigorous inequalities for the volume change upon unfolding aswell as the protein-cosolutes effective interaction energies may be shown forthe case of a stabilizing cosolute with negative change in unfolding enthalpyupon transfer (\u03b4\u2206G > 0 and \u03b4\u2206H < 0) and for a destabilizing cosolute withpositive change in unfolding enthalpy upon transfer (\u03b4\u2206G < 0 and \u03b4\u2206H >0). Examples of the former protein cosolute system are ubiquitin in ficoll orlysozyme, RNase A in glycine, and cytochrome c in trehalose. Interestingly,we did not find a protein that fit into the latter class among the proteins weinvestigated. Consider first the case in which \u03b2\u03b4\u2206G > 0 and \u03b2\u03b4\u2206H < 0, andassume that the protein-cosolute energies are negative with respect to water(\u000fu < 0, \u000fn < 0). From equations (3.35) and (3.37), letting u \u2261 \u2212\u000fu andn \u2261 \u2212\u000fn, we have Vuueu > Vnnen and Vu(eu\u2212 1) < Vn(en\u2212 1). EliminatingVu\/Vn from these inequalities yields ueu\/(eu \u2212 1) > nen\/(en \u2212 1). Since thefunction f(x) = xex\/(ex\u2212 1) is monotonically increasing for positive x, thisinequality directly shows that |\u000fu| > |\u000fn|. Therefore (en\u22121)\/(eu\u22121) < 1 andVu\/Vn < 1. Thus the effective volume decreases upon unfolding. Similarly,for the case in which \u03b2\u03b4\u2206H > 0 and \u03b2\u03b4\u2206G < 0, |\u000fn| > |\u000fu| and Vu > Vn.Inequalities may also be obtained in the limit of strong entropy-enthalpycompensation, where |\u03b4\u2206H| \u001d 1 and |\u03b4\u2206G| \u2248 0, still assuming \u000fn, \u000fu < 0.Then equation (3.37) gives Vu\/Vn \u2248 (en \u2212 1)\/(eu \u2212 1). If \u03b4\u2206H is largeand positive, equation (3.35) gives Vu\/Vn \u001c nen\/ueu. Together these yield|\u000fn| > |\u000fu| and Vu > Vn. If \u03b4\u2206H is large and negative, |\u000fu| > |\u000fn| andVn > Vu.For the more realistic case of \u000fn, \u000fu > 0, if \u03b2\u03b4\u2206H < 0, \u03b2\u03b4\u2206G > 0, andthe interaction energies are larger than kBT (\u03b2\u000fn, \u03b2\u000fu > 1), then \u000fu > \u000fn; if\u03b2\u03b4\u2206H > 0, \u03b2\u03b4\u2206G < 0, and \u03b2\u000fn, \u03b2\u000fu > 1, then \u000fn > \u000fu.Further analysis of the parameter space for the simple model introducedhere, as well as more realistic models that include both bulk and a surfaceterms in the free energy functional, are a topic for future research.853.6 ConclusionsIn this chapter we have explored the application of the density functionalframework to protein transfer free energies. We have focused primarily onconceptual questions, such as the role of solvent excluded volume, the tem-perature dependence of transfer free energies, and how the density functionaltheory (DFT) would reduce to a Volume + SASA model of transfer free en-ergy.We compared the DFT results with those from a simplified model thattreated the protein as a tube with a given volume and surface area, onwhich cosolutes could condense. The DFT contains contributions from bothenthalpy and entropy, so it allows for the calculation of the temperature-dependence of the transfer free energy.A further development of the theory presented here which accounts forinterparticle correlations while maintaining computational efficiency is animportant topic for future research. As well, the calculation of transfer freeenergies was implemented here for a model system with simplified poten-tials that were parameterized to experimental values. One could extend thisby implementing the theory using more realistic potential models, and all-atom representations of a protein or peptide. The various approximationsinvolved in these potentials and models could then be tested and the lim-its of their validity determined through comparisons with experiment andsimulation. The DFT framework may also provide a method to obtain com-putationally efficient but still accurate implicit solvent models for moleculardynamics simulation, a subject of immense practical importance. In general,the framework of density functional theory can provide a powerful tool toexplore aspects of solvation in the context of protein folding, and can do soin a systematic way.86Chapter 4Theoretical Considerationsin Classical DensityFunctional Theory4.1 Defining the Transfer Free EnergyHaving shown the usefulness of classical density functional theory in treatingcosolutes, we now turn to water. While cosolutes tend to be relativelydilute, such that the approximation that \u2206\u03a6\u2032 = 0 used in Equation 3.25is acceptable (because the cosolute-cosolute energy is not changed by thepresence of an external potential), water is not dilute and this approximationis unlikely to be very good. Thus a suitable approach to determining \u03a6\u2032 mustbe found.To begin, though, we broaden our scope to consider the transfer freeenergy in a more general sense. The usefulness of the transfer free energy(also known as the free energy of solvation in the context of polar solvents)extends beyond protein simulations\u2013every molecule of interest, whether abiological molecule, a functional inorganic molecule, or a member of somenanostructure, operates in a background of solvent molecules that impactthe function of the molecule one is looking at, but whose behaviour, in andof itself, is not of interest.87Figure 4.1: Diagram of the transfer free energy. The top row representsthe states we can simulate cheaply\u2014those without the solvent present. Thebottom row represents the states we actually want to access\u2014those withthe solvent present. The vertical transitions constitute the transfer process.The transfer process can be broken up into two terms: the change in freeenergy of the solvent, \u2206Gsolvent\u2212solvent, and the change in free energy of theenvironment, \u2206Gsolute\u2212solventThe transfer free energy can be implemented in computational studiesin several different ways. The implicit solvent models discussed in Chap-ter 1 are typically implemented as an extra force at each time step of thesimulation\u2013i.e. the total force on the ith particle will be Fi = Fforcefieldi +\u2207i\u2206G, where Fforcefieldi is the term arising from the explicit particles in thesimulation and \u2206G is the transfer free energy. Alternatively the transferfree energy can be implemented in a post-processing way. This makes use ofthe Tanford transfer model[168] to modify the weights and energies of theobserved simulation states. This approach requires that the initial simula-tion offer sufficient sampling of the final states[197], and thus is limited torelatively small changes in the environment, such as a change from a proteinin water to a protein in water with a small concentration of urea[144].We need to take care to clarify what the transfer free energy consists of,and which parts of it we would like to calculate. Consider an initial statewhich consists of an isotropic bath of solvent molecules and, isolated fromthis bath, a solute molecule. This system has free energy Gi. The final state88will be the solute molecule dissolved in the bath of solvent molecules, and thisstate will have free energy Gf . Then the total change in free energy \u2206Gtotalwill have three terms: a term arising from the solute-solvent interaction, aterm arising from the solvent-solvent interactions (accounting for both thechange in entropy as the ensemble shifts and the change in solvent-solventpotential), and a term arising from the solute-solute interactions (again bothin entropy and enthalpy). That is,\u2206Gtotal = \u2206Gsolute\u2212solvent + \u2206Gsolvent\u2212solvent + \u2206Gsolute\u2212solute (4.1)Experimental measures of the transfer free energy measure \u2206Gtotal, whiletypically implicit solvent models are concerned with calculating the first twoterms, \u2206Gsolute\u2212solvent+\u2206Gsolvent\u2212solvent, which we will refer to as \u2206Gsolvent.The reason for this is that if during the course of a simulation \u2206Gsolvent isimplemented correctly, \u2206Gsolute\u2212solute will then fall out naturally as thesystem adjusts; e.g. the swelling of a polymer in response to a solvent thatmakes surface exposure favourable. Also note that \u2206Gtotal and \u2206Gsolventare equal in the limit of a small rigid solute molecule, since then neither thesolute energy nor its configurational space change.We now wish to consider the general case of \u2206Gsolvent within the frame-work of classical density functional theory. To do this we take the solventto be in an NPT ensemble and the solute molecule to be a fixed source ofexternal potential. We can then take the solvent-solute term \u2206Gsolute\u2212solventas the minimum work required to insert this fixed potential, and the solvent-solvent term \u2206Gsolvent\u2212solvent as the change in free energy of the solvent;\u2206Gsolvent\u2212solvent = N\u2206\u00b5where N is the number of solvent molecules and \u2206\u00b5 is the change in chemicalpotential of the solvent molecules. The chemical potential of the solventmolecules is not the same in general as the chemical potential of the solutemolecule, an important point we will return to. We will show below that\u2206\u00b5 is identically zero in the NPT ensemble.89As a simple example to motivate our conclusions, consider the free energychange to transfer an idealized solute into an ideal gas that initially hasvolume Vi and pressure Pi. The idealized solute has a volume v0 over whichit produces a step potential of U0. The configuration integral of the partitionfunction of the gas after insertion is proportional toZ \u221d [Vf \u2212 v0(1\u2212 e\u2212\u03b2U0)]Nwhere Vf is the final volume of the solvent + solute. Vf is greater than Viif U0 > 0 and less than Vi if U0 < 0; it is determined from the constancy ofthe pressure, Pf = Pi, wherePi =NkBTViThe final pressure may be found fromP = \u2212 \u2202G\u2202V\u2223\u2223\u2223\u2223Vf=NkBTVf + v0(e\u2212\u03b2U0 \u2212 1)so thatVf = Vi \u2212 v0(e\u2212\u03b2U0 \u2212 1)Thus the change in free energy of the solvent isN\u2206\u00b5 = Gf \u2212Gi = NkBT log[Vf \u2212 v0(1\u2212 e\u2212\u03b2U0)Vi]= 0That is, the change in free energy of the ideal gas upon inserting the regionof potential U0 is 0, regardless of U0 and vo. The transfer free energy isthen equal to the change in free energy of the environment\u2013in this caseP\u2206V = \u2212Pv0(1\u2212 e\u2212\u03b2U0).4.2 The DFT FormulationWe consider the solute molecule being transferred as a fixed potential thatacts on the solvent, and we ask what the difference in free energy of the90solvent is in going from the system without that potential to that with theexternal potential. So we write the free energy as a functional of the solventdensity \u03d5:G =\u222bVsysdr V(r)\u03d5(r) + kBT (\u03d5(r) log\u03d5(r)\u2212 \u03d5(r))+ \u03a6[\u03d5] (4.2)Here V(r) is the potential the solvent feels due to the solute, the integral istaken over the volume of the system Vsys, \u03a6 contains all non-ideal terms inthe free energy (ie, all two and higher particle correlation terms) and kBT isBoltzmann\u2019s constant times the temperature. While in earlier chapters weapproximated \u03a6[\u03c6], here we will leave it as is, and manipulate it formallywithout specifying the form it takes.While in Chapter 3 we considered the cosolute contribution to the trans-fer free energy in an NVT ensemble (while later allowing V \u2192 \u221e), in thischapter we consider the more general case of a transfer from vacuum to anarbitrary solvent bath, in an NPT ensemble.Minimizing equation (4.2) subject to a fixed number of solvent particlesamounts to minimizing the functionL = G+ \u00b5(N \u2212\u222bVsysdr \u03d5(r))where the Lagrange multiplier \u00b5 is the chemical potential. The Euler-Lagrange equation becomes:\u03b4L\u03b4\u03d5= V + kBT log\u03d5(r) + \u03b4\u03a6\u03b4\u03d5(r)\u2212 \u00b5 = 0 (4.3)We now define solvent redistribution energy density \u03a6\u2032 \u2261 \u03b4\u03a6\u03b4\u03d5 and note that\u03a6\u2032(r) can be thought of physically as the energy arising from solvent-solventinteractions; if we consider only the two-particle enthalpic component of \u03a6\u2032we get\u03a6\u2032(r) =\u222bdr\u2032 U(r\u2212 r\u2032)g(r, r\u2032)\u03d5(r\u2032)91where U(r \u2212 r\u2032) is the solvent-solvent interaction potential and g(r, r\u2032) thetwo-particle total correlation function.Equation (4.3) gives an (implicit) expression for \u03d5(r),\u03d5(r) = e\u03b2\u00b5e\u2212\u03b2(V(r)+\u03a6\u2032(r)) (4.4)both sides of which can be integrated over the system volume to giveN = e\u03b2\u00b5\u222bVsysdr e\u2212\u03b2(V(r)+\u03a6\u2032(r))or\u00b5 = \u2212kBT log(1N\u222bVsysdr e\u2212\u03b2(V(r)+\u03a6\u2032(r)))(4.5)For the transfer problem we find the difference in free energy N\u2206\u00b5 be-tween the initial case in which the external potential on the solvent is zeroeverywhere, and the final state. We thus write\u2206\u00b5 = \u2212kBT log\uf8eb\uf8ed\u222bVf dr e\u2212\u03b2(V(r)+\u03a6\u2032f (r))Vie\u03a6\u2032i\uf8f6\uf8f8 (4.6)where Vf and Vi are the final and initial system volumes respectively, and\u03a6\u2032f and \u03a6\u2032i are likewise the final and initial solvent redistribution energydensities. In the denominator of equation (4.6) we have used the fact that,in the absence of an external potential, the system is homogeneous andisotropic, so any property of the solvent must be independent of position r.We now consider the conditions under which the transfer is made. Wewill assume that there is some region \u2126\u221e in the solvent that is sufficientlyfar from the solute potential in the final system such thatV(r \u2208 \u2126\u221e) = 0 (4.7)\u03d5f (r \u2208 \u2126\u221e) = \u03d5i (4.8)\u03a6\u2032f (r \u2208 \u2126\u221e) = \u03a6\u2032i (4.9)92Figure 4.2: Density, potential, and \u03a6\u2032 of water as a function of distancefrom a spherical van der Waals potential. The interaction function \u03a6\u2032(r)decays to zero on the same length scale as V(r).Using equation (5.3) in equation (4.8) givese\u03b2\u00b5ie\u2212\u03b2\u03a6\u2032i = e\u03b2\u00b5f e\u2212\u03b2(V(r\u2208\u2126\u221e)+\u03a6\u2032f (r\u2208\u2126\u221e)) . (4.10)Using (4.7) and (4.9) in equation (4.10) gives e\u03b2\u00b5i = e\u03b2\u00b5f , or\u2206\u00b5 = 0 , (4.11)i.e. the change in free energy of the solvent upon transfer is zero. Thus thesolvent transfer free energy is given entirely by the solvent-solute interaction,which is simply the work done by the external potential to insert itself intothe solvent bath.\u2206Gsolvent = \u2206Gsolvent\u2212solute (4.12)93Equations (4.11) and (4.12) together are the principal result of this chapter.We now consider a set of conditions that are sufficient for the existenceof a region \u2126\u221e that satisfies equations (4.7)-(4.9):1. The transfer of the potential source (the solute) into the system occursat constant solvent particle number N , pressure P , and temperatureT .2. The external potential acts on a finite, enclosed region of the system.This implies a fixed solute molecule transferred to a fixed position, sothat we are addressing the solvation contribution to the transfer freeenergy.3. Define lengths `C and `G as follows: At distances r > `C , the di-rect correlation function C(r) satisfies |C(r) \u2212 1| < \u000fC , where \u000fC is asystem-dependent constant. Similarly, at distances r > `G, the totalcorrelation function G(r) satisfies |G(r) \u2212 1| < \u000fG, where again \u000fG isa system dependent constant. We thus require that the direct cor-relation length `C and the total correlation length `G of the solventmolecules are both finite in the initial and final systems.4. The system is sufficiently large that there exists a finite region \u2126\u221ethat is everywhere farther than `C +`G from the region over which thepotential acts.The above conditions ensure that identical initial and final pressuresyield identical values of \u03d5 and \u03a6\u2032 far from the solute. In Figure 4.2 we plotthe density relative to the equilibrium density \u03c6f\/\u03c6i, the external potentialV, and the difference in the solvent-solvent interaction terms \u03a6\u2032f \u2212 \u03a6\u2032i as afunction of distance from a spherical van der Waals potential for a simulationof TIP3P water at 300K. The simulation was performed in GROMACS withthe CHARMM27 forcefield (see Appendix B for specific parameters used).One feature that is immediately apparent is that the length scale over which\u03a6\u2032f \u2212 \u03a6\u2032i is non-zero is very small\u2013slightly larger than 1 nm. This is trueeven for the long-range electrostatic force; due to screening by the dielectric94medium of water (enhanced by the salt present in real biological systems)the effective range of the force will be small. One atom per nm3 correspondsto a concentration of around 1 M, so we can conclude that up to that pointa solute is well described by the infinite dilution limit.4.3 DiscussionTo return to the simple example in section 4.1, we consider the ideal gasform of the DFT transfer free energy, in which \u03a6\u2032 = 0. Then\u03c6 =NVe\u2212\u03b2(V) (4.13)and the work done to insert a step potential U0 over a volume V0 is then\u2206Gsolvent =\u222bd3r\u222b U0odVNVe\u2212\u03b2(V) (4.14)=kTNVV0(1\u2212 e\u2212\u03b2U0) (4.15)which is simply the ideal gas pressure times the change in total volume. Wediscussed in chapter 3 how this result also comes about by considering theconstant volume system and letting V \u2192\u221e. Thus in the ideal gas approx-imation, the solvent-solute transfer free energy is equal to P\u2206V , which isalso the result given from the NVT ensemble in the limit of infinite V. Aswe will discuss below, these equalities are not true when we move away fromthe ideal gas.The result in equation 4.11 is significant for two reasons. The first isthat, to our knowledge, it is a general proof in the context of density func-tional theory. That the solvent-solvent term in the transfer free energy iszero was shown by Yu and Karpluss using different methods for a specificcase[164]. Their result made use of the hyper-netted chain closure relation,which is only one possible closure relation of the Ornstein-Zernike equation.To our knowledge the identity \u2206\u00b5 = 0 has not been extended to all closurerelations, and thus our result here, which is independent of closure rela-95tion, is significant. Indeed, we have shown that the free energy of a bath ofparticles in the NPT ensemble is independent of the external potential solong as that potential is finite in range. Further, while our definition of theterms in the transfer free energy here makes it clear that \u2206\u00b5 correspondsto Karpluss and Yu\u2019s solvent re-organization term, this has not been appre-ciated in other literature on classical density functional theory. Ramirez,Mareschal, and Borgis [198], for example, do not make use of such a resultin their discussion of the transfer free energy.The other reason this result is significant is practical: it is a useful the-orem in developing an implicit solvent model within the cDFT framework.This is the subject we turn to next.96Chapter 5Including Solvent-SolventInteractions in DFT5.1 A Simple Form for the Solvent-Solvent Inter-actionIn developing an implicit solvent model, the overall goal is to develop a wayof computing the solvent forces on protein atoms quickly; the main reasonone uses such models in practice is to speed up simulations. To that endwe require that the final equations for the force involve only the proteincoordinates and constants\u2014not the solvent density. That said, to addresssolvent forces in a DFT approach we obviously need to consider the solventdensity\u2014but we consider approximations to the free energy functional thatgive the solvent density a simple enough form that we can evaluate the sol-vent forces on a given atom as a single spatial integral over scalar functions.The challenge in the context of this work is to express the quantity \u03a6\u2032 (seesection 3.2 equations 3.17 and 3.18) in a way that allows for both accuracyand speed.After the assumption that the solvent-solvent interaction energy doesnot change upon transfer, which we have made in Chapter 3, the next sim-plest approximation mathematically is to assume that the solvent-solvent97interactions take the form of a delta function: \u03a6(r, r\u2032) = f(\u03c6)\u03b4(r\u2212r\u2032). Thatis, limiting the free energy function to terms no higher than two-body,G =\u222bd3rkT(\u03c6(r) ln[\u03bb3\u03c6(r)]\u2212 \u03c6(r))+V(r)\u03c6(r)+\u222b d3r\u2032\u03b32\u03b4(r\u2212r\u2032)\u03c6(r)\u03c6(r\u2032)(5.1)This density functional can be minimized in the usual way to give\u03b4G\u03b4\u03c6= kT ln[\u03bb3\u03c6(r)] + V(r) + \u03b3\u03c6(r) = \u00b5, (5.2)which can be re-arranged to give\u03bb3\u03c6 = e\u03b2\u00b5e\u2212\u03b2Ve\u2212\u03b2\u03b3\u03c6 (5.3)where it is understood that \u03c6 and V depend on position. Equation 5.3 canbe solved for the density to give\u03c6 =1\u03b2\u03b3W[\u03b2\u03b3e\u03b2\u00b5e\u2212\u03b2V\u03bb3](5.4)where W is the Lambert-W function.At this point we can make use of the main result from Chapter 4: in theNPT ensemble the chemical potential is independent of external potential,so long as the potential is finite in range. Thus in Equation 5.2 we can setV = 0 everywhere, and note that this implies \u03c6 must be a uniform constant\u03c1 = N\/V so that\u00b5 = kT ln[\u03bb3\u03c1] + \u03b3\u03c1 (5.5)Inserting this into Equation 5.4 then gives\u03c6(r) =1\u03b2\u03b3W[\u03b2\u03b3\u03c1e\u03b2\u03b3\u03c1e\u2212\u03b2V(r)](5.6)If we take the usual assumption that the potential V has the formV(r) =\u2211ivi(r\u2212Ri)98where Ri is the position of the ith atom in the protein, and vi is the potentialenergy at r due to atom i, then we can calculate the force on the ith atomfrom equation 5.6 viaFi =\u222bd3r \u03c6(r)\u2202vi\u2202Ri(5.7)We can also calculate the transfer free energy through\u2206G =\u222bd3r\u222b Vfin0dV \u03c6(r;V) (5.8)The integral over V evaluates analytically using properties of the Lambert-W function to give\u2206G = \u22121\u03b3\u222bd3r W(\u03b2\u03b3\u03c1e\u03b2\u03b3\u03c1\u2212\u03b2V(r))+12W(\u03b2\u03b3\u03c1e\u03b2\u03b3\u03c1\u2212\u03b2V(r))2+ V(\u03b2\u03c1+12\u03b3(\u03b2\u03c1)2)(5.9)Going back to section 4.3, it is worth noting here that equation 5.9 isnot equal to P\u2206V (when \u03b3 6= 0). For this form of the interaction potential,the pressure can be calculated to beP =kBTV+\u03b3\u03c12(5.10)and the change in volume is\u2206V =\u222bd3r[1\u03c1\u03b2\u03b3W(\u03b2\u03b3\u03c1e\u03b2\u03b3\u03c1\u2212\u03b2V(r))\u2212 1](5.11)The non-interacting solvent case has the somewhat fortuitous result thatthe NPT ensemble transfer free energy, the NVT ensemble transfer freeenergy in the infinite dilution limit, and P\u2206V are all equal. It appears thisis not the case for models with solvent-solvent interaction terms.995.2 Making the Form of Solvent-Solvent Interac-tions More GeneralWhile we could pursue the derivation in Section 5.1 in more detail, insteadwe will stop here to note that we can model the solvent-solvent interaction\u03a6\u2032 as a general function of \u03c6 without much increase in complication, so longas we continue to include the \u03b4-function form for \u03a6\u2032; that is, we can takethe interaction free energy Gint = \u03a6[\u03c6] (see equation 4.2 of Chapter 4) to beGint =\u222b \u222bd3r d3r\u2032 \u03b4(r\u2212 r\u2032)\u03c6(r)f (\u03c6(r\u2032)) (5.12)Performing the same procedure as above, in which we minimize the func-tional G with respect to \u03c6 and use the fact that \u00b5 is independent of externalpotential, gives an implicit expression for \u03c6:\u03c6(r) = \u03c1e\u03b2(f(\u03c1)\u2212f(\u03c6(r))e\u2212\u03b2V(r) (5.13)where \u03c1 is the equilibrium density in the absence of external potential.From equation 5.13 it is clear that \u03c6(r) is local: it depends only on theform of f(\u03c6), the value of the external potential at r, and constants; \u03c6(r)does not depend on the value of \u03c6 anywhere other than r.We set the restriction on equation 5.13 that \u2202\u03c6\u2202V < 0. This has bothphysical and computational motivations. Physically, we want to ensure thatour equations are such that an increase in external potential never leadsto an increase in solvent density. Computationally, we need to ensure that\u03c6(V) is a one-to-one function, and thus equation 5.13 has a unique solutionfor \u03c6 given V. Performing the derivative in equation 5.13 gives\u2202\u03c6\u2202V =\u2212\u03b2\u03c61 + \u03b2 \u2202f\u2202\u03c6\u03c6(5.14)Since \u03c6 and \u03b2 are always positive, this implies that we require\u03b2\u2202f\u2202\u03c6\u03c6 > \u22121 (5.15)100Again, since \u03c6 and \u03b2 are always positive, a sufficient (but not necessary)condition is that \u2202f\u2202\u03c6 > 0. Relaxing this inequality requires knowing some-thing about \u03c6, which for all but the simplest f(\u03c6) is non-analytic. At leastin practice, after \u03c6(V) has been found numerically, equation 5.15 can beused to confirm that \u03c6(V) is one-to-one.As an example, we will takef(\u03c6) =k2\u03b2(\u03c6\u2212 \u03c1)2 (5.16)For this model we plot \u03c6\/\u03c1 as a function of V\/kBT in Figure 5.1. Eventhough for this interaction function it is clear that \u2202f\u2202\u03c6 > 0 does not hold,visually the function can be seen to satisfy \u2202\u03c6\u2202V < 0. And indeed, when weuse the calculated values of \u03c6 to plot the quantity in equation 5.15 in Figure5.1, we see that the inequality is satisfied. Finding the general conditionson f(\u03c6) for equation 5.15 to be satisfied is a topic for future work.5.3 Fitting the Model to DataWe would like to fit the model of section 5.2 to simulation data, so thatit reproduces the correct forces that the solvent exerts on the protein inany given protein configuration. In principle, we could allow both V(r) andf(\u03c6) to be defined by a spline function of general form. This gives a modelwith as many parameters are there are points in these two splines, thoughin practice these will be constrained by equation 5.15, and potentially byfurther constraints we may place on how fast these functions can change.Below we will make the problem more tractable by developing a procedureto fix V(r).It is important to note that V(r) can be different for each atom typein the protein\u2014and indeed should be different. On the other hand, f(\u03c6) isa single function and should depend only on the solvent used. Thus it isnot sufficient to simulate the transfer of a single atom to the desired solventand then find the V(r) and f(\u03c6) which reproduce the solvent density andtransfer free energy; instead we must consider various combinations of atom10100.511.522.533.54-4 -2 0 2 4k=1k=2k=3k=4Figure 5.1: The solvent density \u03c6 vs. external potential for the inter-action function 5.16 for various values of k.types at various distances from each other. Of particular interest must bethe regions in which the solvent feels the potential from more than one atom,as we expect the solvent density in those regions to be the most difficult tocapture. Because f(\u03c6) is a non-linear function whose parameters are limitedonly to the number of spline points we choose to take (in what follows wewill use twenty), using the transfer free energy from e.g. a single atom wouldnot uniquely determine f(\u03c6).The procedure we will use to determine V(r) for various atom typesand f(\u03c6) can be illustrated by example. We take two uncharged C\u03b1 atomsfrom the CHARMM27 forcefield. We then treat these atoms as bondedand use bond contraints to fix them at various distances from each other(d = 0.12, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 nm). For each distance we calculate102-1-0.500.511.52-2 -1.5 -1 -0.5 0 0.5 1 1.5 2k=1k=2k=3k=4Figure 5.2: Testing the condition in equation 5.15; indeed \u03b2 \u2202f\u2202\u03c6\u03c6 > \u22121.the transfer free energy from vacuum to TIP3P water with the Bennet Ac-ceptance Ratio method in GROMACS. We also get the density of watermolecules around each configuration. We then seek to find V(r) and f(\u03c6)which reproduce both the transfer free energies and the water density profilesfor all eight distances.Figure 5.3 shows the simulation results. The solvation shells for bothatoms are clearly visible. Also of interest is the desolvation barrier observedin panel B. It is worth noting that desolvation barriers cannot be captured bySASA models as the surface area of the two atoms monotonically decreasesas they are brought together.Rather than allowing both V(r) and f(\u03c6) to vary independently, wefixed V(r) for a given f(\u03c6) by using f(\u03c6) to calculate \u03c6(V), then finding the1030-1-2-312314182226300.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8ABFigure 5.3: Transfer of a pair of bonded carbon atoms from vacuumto water. Transfers were performed using the CHARMM27 forcefield withTIP3P water. A) Density profiles for each distance; darker areas correspondto higher water densities. The excluded volume (lighter spheres around eachcarbon) and solvation shells (darker rings around the lighter spheres) areclearly visible here. B) Transfer free energy as a function of d. Notable isthe desolvation barrier observable around d = 0.5nm. The local maximumin \u2206G arises because the finite size of water molecules induces preferred andavoided separation distances of the carbons. Uncertainties on the transferfree energy are smaller than the symbol size.104inverse of this function \u03c6\u22121(\u03c6|f) and applying it to the observed simulationdensity profile \u03c6sim(r) for a single carbon atom. SoV(r) = \u03c6\u22121(\u03c6sim(r)|f) (5.17)We then vary f(\u03c6) (and hence V(r)) to minimize the difference betweenthe DFT calculated free energies and the explicit solvent free energies.We parameterized f(\u03c6) as a cubic spline with 20 spline points and useda trust-region minimization approach[199] to find those points. After per-forming several stages of minimization, we arrive at a function f(\u03c6) whichcaptures the results of Figure 5.3 reasonably well. This function is plottedin Figure 5.6, and the resulting transfer free energies are shown in Figure5.4. The cDFT model parameterized here captures the shape of the \u2206Gvs d curve, and in particular captures the desolvation effect observed whend = 0.25 to 0.3 nm. As mentioned above a solvent accessible surface areaapproach cannot capture this\u2013shown is the best fit SASA curve for thesefree energies, even if the fit is of similar quality with respect to the residual.There are several other comparisons we can make. The potential functionV arrived at from equation 5.17 (which implicitly uses f(\u03c6)) can be com-pared to the actual potential the solvent molecules experience. In the caseof TIP3P water and uncharged solute atoms, this comparison is straight-forward as the water hydrogen atoms have no van der Waals radius. Thecomparison between the two potentials is shown in Figure 5.5. Relative tothe true potential, the effective potential has extra local maxima and min-ima, which correspond to the solvation shells of water. These features mustbe present because we recast the many-body behaviour of water into theeffects due to a local potential. Additionally, the well depth is somewhatgreater for the effective potential, and the hard wall does not rise quite asfast. Still the overall form is roughly consistent with what we might expecta priori.We can also compare the effective solvent-solvent interaction functionf(\u03c6) with a function derived directly from simulations. We start by recasting105Figure 5.4: Transfer free energies with DFT and surface area. Thefree energies from Figure 5.3 are compared with the best fit free energiesfrom the cDFT model developed in the text and the best fit free energiesfrom a surface area model. While the residuals of the fits are comparable,the surface area model fails to capture the desolvation barrier observed from0.25 to 0.3 nm. The SASA curve was generated by taking the analytic formof the surface area for two spheres and allowing the effective solvent radiusand the proportionality constant to be varied.106Figure 5.5: Effective and actual potentials. The red curve gives thepotential arrived at by finding the f(\u03c6) that best reproduces the two-carbontransfer energy data and requiring V(r) to reproduce the observed radialdistribution function. The green curve is the actual potential (i.e. a van derWaals potential).equation 5.13 as an equation for f(\u03c6)\u2212 f(\u03c1):f(\u03c6)\u2212 f(\u03c1) = \u2212V(r)\u2212 kBT ln(\u03c6\u03c1)(5.18)This is in fact exactly how we extracted \u03a6\u2032 in Figure 4.2; here f(\u03c6) \u2212 f(\u03c1)has taken the place of \u03a6\u2032. From a simulation of an isolated carbon atom wedetermine \u03c6, and then V in equation 5.18 is taken as the actual potential.Then we can find for each r a V and a f(\u03c6) \u2212 f(\u03c1) (or \u03a6\u2032f \u2212 \u03a6\u2032i in our107Figure 5.6: Solvent-solvent interaction potentials. The red curve isthe best fit f(\u03c6) arrived at from the DFT approach, while the gree curve isthe observed \u03a6\u2032 from the radial distribution function.more general notation) and plot these against each other. This we do inFigure 5.6. Two interesting features emerge: one is that the overall shapeof the functions is similar, and the other is the jump in f(\u03c6) around \u03c6 = 80nm3. This jump corresponds to a \u201cdoubling\u201d back of the \u201ctrue\u201d solvent-solvent interaction function, which is no longer a single-valued function atthese densities. It is encouraging that this feature emerged from a routinewhich simply varied f(\u03c6), and minimized the difference between observedand calculated free energies. That the functions do not overlap entirelyis to be expected, as the effective interaction potential involves projectingessentially non-local interactions onto purely local ones.108Finally, we can compare the transfer free energy of atoms not used in thefitting procedure to observed explicit solvent transfer free energies. To dothis we calculated the transfer free energy of several uncharged amino acidsin GROMACS with the CHARMM27 forcefield using the Bennet acceptanceratio method. These amino acids are composed of various atoms; in additionto several types of carbon (carbon atoms with slightly different force-fieldparameters are used in various positions in the amino acid), nitrogen, oxy-gen, sulfer, and hydrogen are also present. For each of these the effectiveV(r) was determined from observed radial distribution functions and f(\u03c6)through equation 5.17 and the total transfer free energy calculated in theDFT approach. The result is shown in Figure 5.7. The results show generalagreement, with a correlation coefficient of 0.7, and absolute magnitudesthat are comparable. By comparison we can consider the best fit SASAprediction of the transfer free energy vs the explicit simulation free energy,which has a correlation coefficient of 0.65. To compare these another way,using the Kendall \u03c4 test, we arrive at a significance of p = 0.001 for therelationship between the DFT transfer free energy and the explicit solventtransfer free energy, and a significance of p = 0.003 for the relationship be-tween SASA and the explicit solvent transfer free energy. We note here thatthe correlation between SASA and explicit transfer free energy is the bestpossible correlation, while the correlation between the DFT predictions andthe explicit transfer free energy is a starting point which we expect wouldbe improved by fitting f(\u03c6) (and potentially the effective potentials V(r) )to a more representative set of candidate structures rather than two carbonatoms.5.4 DiscussionWe have shown that the protein-water interaction can be captured at least aswell as existing implicit solvent approaches by a classical density functionaltheory with purely local solvent-solvent interactions. The strengths of thismethod are that it captures important effects lacking in traditional solventaccessible surface area models, while still being fast to implement. The109Figure 5.7: A) Comparison of DFT predicted free energies withresults from explicit solvent simulations. The explicit solvent resultswere obtained with the Bennet acceptance ratio method implemented inGROMACS. The DFT predicted free energies were obtained as described inthe text. B) Comparison of SASA predicted free energies with results fromexplicit solvent simulations.reason for this speed is the approximation that allows the density to be asingle valued function of the potential. Thus a table of \u03c6(V) values can becomputed once for a given f(\u03c6) (which does not change from simulationto simulation) and then \u03c6 can be determined at any point from the atom-dependent V(r) functions. Given that each of these functions can be storedas a cubic spline, the total transfer free energy can be computed in a singleintegral over the space surrounding the protein, a procedure of comparablenumber of operations to the evaluation of surface area[200].To make this last point more quantitative, we examine the calculationof solvent accessible surface area implemented in GROMACS, described byEisenhaber et al[201]. Here, to calculate surface area, each atom in themolecule is assigned a set of points around it. The distance from each ofthese points to neighbouring atoms is calculated, and then each point iseither declared buried or exposed. From this the exposed surface of theatom is determined. Such a method could also be applied to the DFTapproach described in this chapter, with two differences: 1) the density mustbe evaluated at each point, and 2) the number of points needed to accurately1107891011121314151610 100 1000 10000 10000002468101214DFT free energyDeviation in SASANumber of pointsTransfer free energy (kJ\/mol)Deviation in SASA (nm2)Figure 5.8: Testing convergence of a DFT free energy calculation. The freeenergy is plotted against the number of points for which the effective externalpotential was neither zero nor infinite, divided by the number of atoms. Alsoplotted is the deviation from true SASA vs the number of points per atomsfrom [201]. The two converge in a similar scale.determine the transfer free energy may be different than the number neededto determine the solvent accessible surface area, particularly as the pointswould need to be distributed in a spherical shell around each atom ratherthan on a surface.Based on the benchmarks described in [201], it requires on the order of106 floating point operations to compute the distance of neighbouring atomsto approximately 3\u00d7 105 points. Evaluating a cubic spline at each of thosepoints and summing the result can be expected to add on the order of 10operations, an increase of approximately 30 %.The question of how many points would be required is more difficult toanswer without fully implementing the implicit solvent method and testingit on various proteins. We can make an estimate, though, from our initialcalculations. The DFT values for \u2206G in Figure 5.7 were computed with athree-dimensional integration grid which covered all the space surrounding111the protein. To make a better comparison to the method in [201], we canimagine a method in which only the points around each atom for which theeffective external potential is neither zero nor infinite are considered. InFigure 5.8 we look at the free energy of transfer of a Leucine amino acid(which contains 22 atoms) calculated by DFT, vs the number of non-zero-non-infinite points. The value converges around 1000 points per atom. In[201] the number of points per atom considered ranged from 300 to 1500,depending on the accuracy wished. All together we estimate that the DFTcalculation can be performed with approximately twice the operations asurface area calculation can be.The procedure outlined here calculates the non-polar component of thetransfer free energy. While density functional theory could be extended toconsider the electrostatic aspects of the transfer free energy, the more limitedapproach described here allows this procedure to be integrated with existingmethods, particularly GB\/SA. Despite its flaws, GB\/SA is undoubtedly themost widely used implicit solvent model in molecular dynamics simulations,and thus the ability for new models to integrate and improve upon it mustbe counted as an advantage.There remains much work to do with this model, which we will considerin the next chapter, on future research directions.112Chapter 6Conclusions and FutureDirections6.1 SummaryIn this thesis we studied protein-solvent interactions and developed a clas-sical density functional theory for those interactions. The interactions be-tween the protein and its environment are inseparable from the intra-proteininteractions. Proteins do not evolve in a vacuum, and the way in which theintra-protein interactions come together to create the folded, functioningprotein depends intrinsically on the protein-solvent interactions.In chapter 2 we examined some of the experimental literature on protein-solvent interactions. Specifically we looked at how the introduction of acosolute changed the enthalpy and entropy of unfolding. We found thatthe change in unfolding entropy and enthalpy upon the introduction of acosolute was an example of a broad class of phenomena known as entropy-enthalpy compensation, in which the change in free energy is small relativeto the change in both entropy and enthalpy. There has been a long-standingdebate concerning the significance of entropy-enthalpy compensation, and ithas been suggested that the effect is due to highly correlated experimentaluncertainties in entropy and enthalpy. We showed that there are indeedhighly correlated uncertainties, but that the effect is still significant in light113of them, at least for the case of the transfer of various proteins from waterto water with various cosolutes. This is not a general statement aboutall systems which seem to display entropy-enthalpy compensation, though,and wider application of procedures similar to the one we developed hereto measure the uncertainty in entropy and enthalpy could establish whichsystems genuinely display the effect and which have uncertainties so largeas to be unable to determine it.In chapter 3 we continued to examine cosolutes (or osmolytes as we alsorefer to them), this time from the standpoint of developing a general theory.We develop a classical density functional theory for protein-cosolute inter-actions, which assumes that the density of water and cosolute are perfectlycorrelated, in the sense that the total density at any point is fixed, and fur-ther assumes that the cosolute-cosolute interaction energy doesn\u2019t changeupon transfer of the protein. We show that the theory gives good resultsfor a monatomic gas, even in the regime in which the gas is significantlynon-ideal. This theory can reproduce the observed transfer free energies ofmoving side chains from water to water plus urea, and captures the tem-perature dependence of the transfer free energy. We showed that the cDFTtheory developed in chapter 3 reduces to a SASA approach with suitableparameters. Our approach reproduces the general phenomenon of a desol-vation barrier in a natural way as well. Finally we examine how a simplecDFT model can be related to entropy-enthalpy compensation and use thedata from chapter 2 to restrict the parameter space of this model.In chapter 4 we take a step back to examine the transfer problem incDFT in more general terms. We show that the free energy of a particle bathat fixed temperature and pressure is independent of the external potentialprovided the external potential is finite in range\u2013this is the dilute solutelimit in the context of transfer. We discuss the implications of this finding\u2013inparticular the fact that all transfer free energy comes from the solute-solventinteractions\u2013and note its usefulness in working with cDFT.In chapter 5 we develop a classical density functional theory of waterto implement as an implicit solvent. To make the mode fast to execute weassume solvent-solvent interaction can be modelled by a completely local114way. In exchange we allow the local function of \u03c6 to take on any form,subject to conditions to keep the solution physical. We fit the resultingtheory to explicit water simulations. The model shows good agreement withexplicit simulations, while capturing the desolvation barrier and temperaturedependence, and while maintaining speed by requiring that \u03c6 depend onlyon the local potential.Each of these chapters involved different aspects of protein-solvent inter-actions and their theoretical description, particularly within the frameworkof classical density functional theory.6.2 Theoretical Work in cDFTWhile classical density functional theory has a long history, there is stillwork to do on the theoretical end. The functional often described as thestarting point for cDFT looks like this:G =\u222bd3r kBT [\u03c6 ln\u03c6\u2212\u03c6] +V(r) +\u222bd3r\u2032U(r\u2212 r\u2032)g(r\u2212 r\u2032)\u03c6(r)\u03c6(r\u2032) (6.1)But, as we note in the introduction, this cannot be correct; for a hard-sphere model in the absence of an external potential the energy terms arezero, which implies an ideal gas solution no matter what density we take.Thus there must be a multi-particle entropy functional.Entropy functionals, however, have seen remarkably little development.Early efforts by Nettleton and Green[202] involved taking the known form ofthe total density functional from radial distribution functions and reverse-engineering a numeric entropy functional. More recent approaches havelooked at entropy functionals of 3-body and higher correlation terms[203],which limits their applicability to the use of cDFT in practice.One alternative approach which seems promising is to express the en-tropy as an integral series in correlation functions; that is, the total entropywould beS = Sideal +\u222bd3r [p(r)Spresent + (1\u2212 p(r))Sabsent] (6.2)115where Sideal is the ideal gas entropy, p(r) is the probability of finding aparticle at position r, Spresent is the entropy of the system of N \u22121 particlesgiven that a particle is at r, and Sabsent is the entropy of the system giventhat a particle is not at r. This type of approach gives rise to an integralseries reminiscent of the Orstein-Zernike equation.We have not had time to pursue this line of analysis in this thesis, but it isa direction for future research. And whether or not this particular equationbears fruit, the search for analytic expressions for the entropy functionalshould not be abandoned.6.3 Error Analysis in Thermodynamic DataWe developed a technique for determining the uncertainty in fitting heatcapacity or fraction of unfolded data and extracting thermodynamic pa-rameters. This technique is useful, but it does overlook one key source ofuncertainty: the uncertainty in the underlying experiment. That the modeluncertainty is important can be seen from the plots in figures 2.2. For manyregions in those plots the difference between models is well outside the un-certainty calculated by the monte carlo technique. This implies the choiceof model itself brings in a great deal of uncertainty. Given that most of theliterature seems to use the temperature-independent \u2206Cp model withoutchecking if a linear or non-linear \u2206Cp model would give the same results,this is a worrisome finding.The uncertainty in the underlying experiment is more difficult to quan-tify. Looking at table 2.3 we see that, for example, RNase A, different ex-periments gave different values for \u2206H0f ; that is, the enthalpy of unfoldingin the absence of cosolute. The variance in \u2206H0f values is somewhat higherthan the variance in Tf values, which raises the concern that this experi-mental uncertainty contributes to entropy-enthalpy compensation in a waywe have not accounted for. Given the statistical significance of our resultit is unlikely this effect will reverse our conclusion, but this is an importantconsideration for future work. Such work will likely require collaborationwith experimentalists, as literature sources typically quote a single value for116each thermodynamic parameter, rather than a range of values indicative ofthe multiple samples they may or may not have tested.6.4 Implementing the cDFT Implicit SolventWe have developed an implicit solvent model based on classical density func-tional theory. Of course, the development of the model is the first steptowards its adoption by researchers performing molecular dynamics. Im-plementing the model across force-fields and MD software packages is anenormous challenge well outside the scope of this thesis, or indeed even asingle research group. Nonetheless this is the ultimate goal of the project.One concern moving forward is what the target system should be for fit-ting the cDFT parameters. We have used explicit water simulations becausewe had easy access to them and they provide a high level of detail aboutenergies and distribution functions. But the case could be made that exper-imental results would provide a better target system, and lead to a set ofparameters which would not depend on any one forcefield. Experimental re-sults have the disadvantage, though, of not cleanly separating the non-polarand polar contributions to the transfer free energy. One possibility wouldbe to start with experimental transfer free energies and subtract off the pre-dicted polar component (either in generalized Born or Poisson-Bolztman)to obtain quasi-experimental non-polar components of the transfer free en-ergies.We have assumed in this thesis that the cDFT model we developed wouldapply to the non-polar contributions to the transfer free energy. There is noreason a priori why we could not also apply cDFT to the polar contribution\u2013that is, the electrostatic interactions. Doing so would require a significantmodification of the work presented here. The water density in such an ap-proach would no longer be a function of position only but also orientation;\u03c6 = \u03c6(r,\u2126). The solvent-solvent interaction energy would then also be afunction of both molecule separation and the orientation of each molecule,which greatly increases its computational cost. One way around this mightbe to break the density up into two fields: a position dentisy \u03c6(r) and a po-117larization density p(r). Performing two three-dimensional integrals wouldbe faster than one five-dimensional integral, but the accuracy of such anapproximation needs to be assessed. It is not clear whether a purely localapproach such as the one adopted in chapter 5 would continue to be effec-tive. On the other hand, it is not obvious that it will not be effective, andextending the theory to polar interactions is a topic of future interest.6.5 Parting ThoughtsAll science relies on others to truly succeed. Findings left on their own donot advance human knowledge. But work on methods perhaps uniquelyrelies on the adoption of other scientists to bear fruit. In this thesis we haveproposed a new method of looking at transfer free energies. The extent towhich this method contributes to the field is, in one sense, out of our control.Regardless of its intrinsic merits, if it does not resonate with the communityin a way that spurs others to work on implementing it in various forcefieldsand MD packages it will not be widely used and its impact will be small. Onthe other hand if it does find wide implementation it could have a greaterimpact than other models, regardless of their intrinsic merit. In workingwithin the existing framework of decomposing the transfer free energy intopolar and non-polar components, and in targeting the non-polar parts, wehope this model will be easily adaptable to existing MD approaches andtherefore more likely to see wide implementation.If we examine the literature on GB\/SA, for example, after the initialpaper by Qiu et al[48], following papers by various groups parameterized themodel for AMBER[204], CHARMM[205], OPLS[206], and GROMOS[207]forcefields, suggested surface area calculation algorithms, and ported themodel to various MD packages such as GROMACS[208] and NAMD[206].This work underlaid the widespread adoption GB\/SA enjoys today.Moving forward then, the future work outlined in this chapter will bean important component of seeing the work in this thesis fulfill its promise,but equally important will be networking with other researchers to see theseresults implemented in a variety of existing software. We hope our shoulders118provide an ample platform for those that would follow.119Bibliography[1] G Sliwoski, S Kothiwale, J Meiler, and E W Lowe. Computationalmethods in drug discovery. Pharmacological Reviews, 66:334\u2013395,2014.[2] William C Guest. Template-Directed Protein Misfolding inNeurodegenerative Disease. PhD thesis, The University of BritishColumbia, 2012.[3] F Crick. Central dogma of molecular biology. Nature, 227:561, 1970.[4] Andrea Ilari and Carmelinda Savino. Protein structuredetermination by x-ray crystallography. In Jonathan M Keith,editor, Bioinformatics, volume 452 of Methods in Molecular Biology,pages 63\u201387. Humana Press, 2008. ISBN 978-1-58829-707-5.[5] M Billeter, G Wagner, and K W uthrich. Solution nmr structuredetermination of proteins revisisted. Journal of Biomolecular NMR,42:155\u2013158, 2008.[6] Chengsong Liu, Dwayne Chu, Rhonda D. Wideman, R. ScottHouliston, Hannah J. Wong, and Elizabeth M. Meiering.Thermodynamics of denaturation of hisactophilin, a \u03b2-trefoil protein.Biochemistry, 40(13):3817\u20133827, 2001. doi: 10.1021\/bi002609i. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi002609i. PMID:11300762.[7] Valerij Ya. Grinberg, Natalia V. Grinberg, Tatiana V. Burova,Michele Dalgalarrondo, and Thomas Haertle\u00b4. Ethanol-inducedconformational transitions in holo--lactalbumin: Spectral andcalorimetric studies. Biopolymers, 46(4):253\u2013265, 1998. ISSN1097-0282. doi:10.1002\/(SICI)1097-0282(19981005)46:4\u3008253::AID-BIP7\u30093.0.CO;2-O.120URLhttp:\/\/dx.doi.org\/10.1002\/(SICI)1097-0282(19981005)46:4<253::AID-BIP7>3.0.CO;2-O.[8] WF Claussen and MF Polglase. Solubilities and structures inaqueous aliphatic hydrocarbon solutions. Journal of the AmericanChemical Society, 74(19):4817\u20134819, 1952.[9] R. Aveyard and A. S. C. Lawrence. Calorimetric studies onn-aliphatic alcohol + water and n-aliphatic alcohol + waterdetergent systems. Trans. Faraday Soc., 60:2265\u20132278, 1964. doi:10.1039\/TF9646002265.[10] E. M. Arnett and D. R. McKelvey. In J. F. Coetzee and C. D.Ritchie, editors, Solute-Solvent Interactions, chapter 6, pages344\u2013395. Dekker, New York, 1969.[11] P. L. Privalov and S. J. Gill. Stability of protein-structure andhydrophobic interaction. Adv. Prot. Chem., 39:191\u2013234, 1988.[12] Rufus Lumry and Shyamala Rajender. Enthalpyentropycompensation phenomena in water solutions of proteins and smallmolecules: A ubiquitous properly of water. Biopolymers, 9(10):1125\u20131227, 1970. ISSN 1097-0282. doi: 10.1002\/bip.1970.360091002.URL http:\/\/dx.doi.org\/10.1002\/bip.1970.360091002.[13] Jack D. Dunitz. Win some, lose some: enthalpy-entropycompensation in weak intermolecular interactions. Chemistry andBiology, 2(11):709 \u2013 712, 1995. ISSN 1074-5521. doi:http:\/\/dx.doi.org\/10.1016\/1074-5521(95)90097-7. URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/1074552195900977.[14] Hong Qian and J. J. Hopfield. Entropyenthalpy compensation:Perturbation and relaxation in thermodynamic systems. The Journalof Chemical Physics, 105(20):9292\u20139298, 1996. doi:http:\/\/dx.doi.org\/10.1063\/1.472728.[15] Kim Sharp. Entropyenthalpy compensation: Fact or artifact?Protein Science, 10(3):661\u2013667, 2001. ISSN 1469-896X. doi:10.1110\/ps.37801. URL http:\/\/dx.doi.org\/10.1110\/ps.37801.[16] Andrew T Fenley, Hari S Muddana, and Michael K Gilson.Entropy\u2013enthalpy transduction caused by conformational shifts can121obscure the forces driving protein\u2013ligand binding. Proceedings of theNational Academy of Sciences, 109(49):20006\u201320011, 2012.[17] John D. Chodera and David L. Mobley. Entropy-enthalpycompensation: Role and ramifications in biomolecular ligandrecognition and design. Annual Review of Biophysics, 42(1):121\u2013142,2013. doi: 10.1146\/annurev-biophys-083012-130318. URLhttp:\/\/www.annualreviews.org\/doi\/abs\/10.1146\/annurev-biophys-083012-130318. PMID: 23654303.[18] Emilio Gallicchio, Masahito Mogami Kubo, and Ronald M. Levy.Entropyenthalpy compensation in solvation and ligand bindingrevisited. Journal of the American Chemical Society, 120(18):4526\u20134527, 1998. doi: 10.1021\/ja974061h.[19] A. Ben-Naim and Y. Marcus. Solvation thermodynamics of nonionicsolutes. J. Chem. Phys., 81(4):2016\u20132027, 1984. doi:http:\/\/dx.doi.org\/10.1063\/1.447824.[20] Remo Perozzo, Gerd Folkers, and Leonardo Scapozza.Thermodynamics of proteinligand interactions: History, presence,and future aspects. Journal of Receptors and Signal Transduction, 24(1-2):1\u201352, 2004. doi: 10.1081\/RRS-120037896. URL http:\/\/informahealthcare.com\/doi\/abs\/10.1081\/RRS-120037896.PMID: 15344878.[21] Virginie Lafont, Anthony A. Armstrong, Hiroyasu Ohtaka, YoshiakiKiso, L. Mario Amzel, and Ernesto Freire. Compensating enthalpicand entropic changes hinder binding affinity optimization. ChemicalBiology and Drug Design, 69(6):413\u2013422, 2007. ISSN 1747-0285. doi:10.1111\/j.1747-0285.2007.00519.x. URLhttp:\/\/dx.doi.org\/10.1111\/j.1747-0285.2007.00519.x.[22] Vijay M. Krishnamurthy, Brooks R. Bohall, Vincent Semetey, andGeorge M. Whitesides. The paradoxical thermodynamic basis for theinteraction of ethylene glycol, glycine, and sarcosine chains withbovine carbonic anhydrase ii: an unexpected manifestation ofenthalpy\/entropy compensation. Journal of the American ChemicalSociety, 128(17):5802\u20135812, 2006. doi: 10.1021\/ja060070r. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/ja060070r. PMID:16637649.122[23] Derek R. Dee, Yasumi Horimoto, and Rickey Y. Yada. Conservedprosegment residues stabilize a late-stage folding transition state ofpepsin independently of ground states. PLoS ONE, 9(7):e101339, 072014. doi: 10.1371\/journal.pone.0101339.[24] Andrew J. Baldwin, Tuomas P. J. Knowles, Gian Gaetano Tartaglia,Anthony W. Fitzpatrick, Glyn L. Devlin, Sarah Lucy Shammas,Christopher A. Waudby, Maria F. Mossuto, Sarah Meehan, Sally L.Gras, John Christodoulou, Spencer J. Anthony-Cahill, Paul D.Barker, Michele Vendruscolo, and Christopher M. Dobson.Metastability of native proteins and the phenomenon of amyloidformation. Journal of the American Chemical Society, 133(36):14160\u201314163, 2011. doi: 10.1021\/ja2017703. PMID: 21650202.[25] S. S. Plotkin and J. N. Onuchic. Understanding protein folding withenergy landscape theory i: Basic concepts. Quart. Rev. Biophys., 35(2):111\u2013167, 2002.[26] R. R. Krug, W. G. Hunter, and R. A. Grieger. Statisticalinterpretation of enthalpy-entropy compensation. Nature, 261:566\u2013567, 1976. doi: 10.1038\/261566a0. URLhttp:\/\/dx.doi.org\/10.1038\/261566a0.[27] E. Gallicchio, M. M. Kubo, and R. M. Levy. Enthalpy entropy andcavity decomposition of alkane hydration free energies: Numericalresults and implications for theories of hydrophobic solvation. TheJournal of Physical Chemistry B, 104(26):6271\u20136285, 2000. doi:10.1021\/jp0006274. URL http:\/\/dx.doi.org\/10.1021\/jp0006274.[28] Erik Persson and Bertil Halle. Cell water dynamics on multiple timescales. Proc. Natl. Acad. Sci. U. S. A., 105(17):6266\u20136271, 2008. doi:10.1073\/pnas.0709585105.[29] PH Yancey, ME Clark, SC Hand, RD Bowlus, and GN Somero.Living with water stress: evolution of osmolyte systems. Science, 217(4566):1214\u20131222, 1982. doi: 10.1126\/science.7112124. URLhttp:\/\/www.sciencemag.org\/content\/217\/4566\/1214.abstract.[30] R. John Ellis and Allen P. Minton. Protein aggregation in crowdedenvironments. Biol. Chem., 387:485\u2013497, 2006.[31] Huan-Xiang Zhou, Germa\u00b4n Rivas, and Allen P. Minton.Macromolecular crowding and confinement: Biochemical,123biophysical, and potential physiological consequences. Annu. Rev.Biophys., 37(1):375\u2013397, 2008.[32] Allen P. Minton. The influence of macromolecular crowding andmacromolecular confinement on biochemical reactions inphysiological media. J. Biol. Chem., 276:10577\u201310580, 2001.[33] L. Ellgaard and A. Helenius. Quality control in the endoplasmicreticulum. Nat. Rev. Mol. Cell. Biol., 4:181\u2013191, 2003.[34] Deepak R. Canchi, Dietmar Paschek, and Angel E. Garc\u00b4\u0131a.Equilibrium study of protein denaturation by urea. J. Am. Chem.Soc., 132(7):2338\u20132344, 2010. doi: 10.1021\/ja909348c. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/ja909348c.[35] A. Linhananta, S. Hadizadeh, and S. S. Plotkin. An effective solventtheory connecting the underlying mechanisms of osmolytes anddenaturants for protein stability. Biophys. J., 100:459\u2013468, 2011.[36] Miriam Friedel, Daniel J. Sheeler, and Joan-Emma Shea. Effects ofconfinement and crowding on the thermodynamics and kinetics offolding of a minimalist beta-barrel protein. J. Chem. Phys., 118(17):8106\u20138113, 2003. doi: 10.1063\/1.1564048.[37] M. S. Cheung, D. Klimov, and D. Thirumalai. Molecular crowdingenhances native state stability and refolding rates of globularproteins. Proc. Natl. Acad. Sci. U. S. A., 102:4753 \u2013 4758, 2005.[38] C. Ionescu-Tirgoviste and F. Despa. Biophysical alteration of thesecretory track in \u03b2-cells due to molecular overcrowding: therelevance for diabetes. Integr. Biol., 3:173\u2013179, 2011.[39] A. P. Minton. Implications of macromolecular crowding for proteinassembly. Curr. Opin. Struct. Biol., 10:34\u201339, 2000.[40] M. Wojciechowski and M. Cieplak. Effects of confinement andcrowding on folding of model proteins. BioSystems, 94:248\u2013252, 2008.[41] William L. Jorgensen, Jayaraman Chandrasekhar, Jeffry D. Madura,Roger W. Impey, and Michael L. Klein. Comparison of simplepotential functions for simulating liquid water. The Journal ofChemical Physics, 79(2):926\u2013935, 1983. doi: 10.1063\/1.445869.124[42] K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong,J. Shim, E. Darian, O. Guvench, P. Lopes, I. Vorobyov, and A. D.Mackerell. Charmm general force field: A force field for drug-likemolecules compatible with the charmm all-atom additive biologicalforce fields. Journal of Computational Chemistry, 31(4):671\u2013690,2010. ISSN 1096-987X. doi: 10.1002\/jcc.21367. URLhttp:\/\/dx.doi.org\/10.1002\/jcc.21367.[43] Christopher J. Cramer and Donald G. Truhlar. Implicit solvationmodels: equilibria, structure, spectra, and dynamics. ChemicalReviews, 99(8):2161\u20132200, 1999. doi: 10.1021\/cr960149m. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/cr960149m.[44] Jianhan Chen, Charles L Brooks III, and Jana Khandogin. Recentadvances in implicit solvent-based methods for biomolecularsimulations. Current Opinion in Structural Biology, 18(2):140 \u2013 148,2008. ISSN 0959-440X. doi:http:\/\/dx.doi.org\/10.1016\/j.sbi.2008.01.003. URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0959440X08000079.Theory and simulation \/ Macromolecular assemblages.[45] Christopher J Cramer and Donald G Truhlar. Implicit solvationmodels: equilibria, structure, spectra, and dynamics. Chem.Rev.-Columbus, 99(8):2161, 1999.[46] Nathan A. Baker, David Sept, Simpson Joseph, Michael J. Holst,and J. Andrew McCammon. Electrostatics of nanosystems:Application to microtubules and the ribosome. Proc. Natl. Acad. Sci.USA, 98(18):10037\u201310041, 2001. doi: 10.1073\/pnas.181342398.[47] Wonpil Im, Dmitrii Beglov, and Benot Roux. Continuum solvationmodel: Computation of electrostatic forces from numerical solutionsto the poisson-boltzmann equation. Computer PhysicsCommunications, 111(13):59 \u2013 75, 1998. ISSN 0010-4655. doi:http:\/\/dx.doi.org\/10.1016\/S0010-4655(98)00016-2. URLhttp:\/\/www.sciencedirect.com\/science\/article\/pii\/S0010465598000162.[48] Di Qiu, Peter S. Shenkin, Frank P. Hollinger, and W. Clark Still. Thegb\/sa continuum model for solvation. a fast analytical method forthe calculation of approximate born radii. The Journal of Physical125Chemistry A, 101(16):3005\u20133014, 1997. doi: 10.1021\/jp961992r.URL http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp961992r.[49] Jiang Zhu, Yunyu Shi, and Haiyan Liu. Parametrization of ageneralized born\/solvent-accessible surface area model andapplications to the simulation of protein dynamics. The Journal ofPhysical Chemistry B, 106(18):4844\u20134853, 2002. doi:10.1021\/jp020058v. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp020058v.[50] Jennifer L. Knight and Charles L. Brooks. Surveying implicit solventmodels for estimating small molecule absolute hydration freeenergies. Journal of Computational Chemistry, 32(13):2909\u20132923,2011. ISSN 1096-987X. doi: 10.1002\/jcc.21876. URLhttp:\/\/dx.doi.org\/10.1002\/jcc.21876.[51] D Hawkins, C Cramer, and D Truhlar. Parametrized models ofaqueous free energies of solvation based on pairwise descreening ofsolute atomic charges from a dielectric medium. Journal of PhysicalChemistry, 100:19824\u201319839, 1996.[52] A Onufriev, D Bashford, and D Case. Exploring protein native statesand large-scale conformational changes with a modified generalizedborn model. PROTEINS: Structure, Function, and Genetics, 55:383\u2013394, 2004.[53] B. Lee and F.M. Richards. The interpretation of protein structures:estimation of static accessibility. J. Mol. Biol., 55:379\u2013400, 1971.[54] D. Eisenberg and A. D. McLachlan. Solvation energy in proteinfolding and binding. Nature, 319:199\u2013203, 1986.[55] Chunhu Tan, Yu-Hong Tan, and Ray Luo. Implicit nonpolar solventmodels. The Journal of Physical Chemistry B, 111(42):12263\u201312274,2007. doi: 10.1021\/jp073399n. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp073399n.[56] Jianhan Chen and Charles L. Brooks. Critical importance oflength-scale dependence in implicit modeling of hydrophobicinteractions. Journal of the American Chemical Society, 129(9):2444\u20132445, 2007. doi: 10.1021\/ja068383+. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/ja068383.126[57] Allison Ferguson, Zhirong Liu, and Hue Sun Chan. Desolvationbarrier effects are a likely contributor to the remarkable diversity inthe folding rates of small proteins. Journal of Molecular Biology, 389(3):619 \u2013 636, 2009.[58] Urs Haberthu\u00a8r and Amedeo Caflisch. Facts: Fast analyticalcontinuum treatment of solvation. Journal of ComputationalChemistry, 29(5):701\u2013715, 2008. ISSN 1096-987X. doi:10.1002\/jcc.20832. URL http:\/\/dx.doi.org\/10.1002\/jcc.20832.[59] Jane R. Allison, Katharina Boguslawski, Franca Fraternali, andWilfred F. van Gunsteren. A refined, efficient mean solvation forcemodel that includes the interior volume contribution. The Journal ofPhysical Chemistry B, 115(15):4547\u20134557, 2011.[60] Kunitsugu Soda. Solvent exclusion effect predicted by the scaledparticle theory as an important factor of the hydrophobic effect. J.Phys. Soc. Jpn., 62(5):1782\u20131793, 1993.[61] A.J. Saunders, P. R. Davis-Searles, D. L. Allen, G. J. Pielak, , andD. A. Erie. Osmolyte-induced changes in protein conformationequilibria. Biopolymers, 53:293\u2013307, 2000.[62] John A. Schellman. Protein stability in mixed solvents: A balance ofcontact interaction and excluded volume. Biophys. J., 85(1):108\u2013125,2003.[63] Takashi Imai, Yuichi Harano, Masahiro Kinoshita, AndriyKovalenko, and Fumio Hirata. Theoretical analysis on changes inthermodynamic quantities upon protein folding: Essential role ofhydration. J. Chem. Phys., 126:225102, 2007.[64] Jason A. Wagoner and Nathan A. Baker. Assessing implicit modelsfor nonpolar mean solvation forces: The importance of dispersionand volume terms. Proc. Natl. Acad. Sci. U. S. A., 103(22):8331\u20138336, 2006. doi: 10.1073\/pnas.0600118103.[65] Tyler Luchko, Sergey Gusarov, Daniel R. Roe, Carlos Simmerling,David A. Case, Jack Tuszynski, and Andriy Kovalenko.Three-dimensional molecular theory of solvation coupled withmolecular dynamics in amber. Journal of Chemical Theory andComputation, 6(3):607\u2013624, 2010. doi: 10.1021\/ct900460m. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/ct900460m.127[66] David Chandler and Hans C. Andersen. Optimized clusterexpansions for classical fluids. ii. theory of molecular liquids. TheJournal of Chemical Physics, 57(5):1930\u20131937, 1972. doi:http:\/\/dx.doi.org\/10.1063\/1.1678513. URL http:\/\/scitation.aip.org\/content\/aip\/journal\/jcp\/57\/5\/10.1063\/1.1678513.[67] Andriy Kovalenko and Fumio Hirata. Self-consistent description of ametalwater interface by the kohnsham density functional theory andthe three-dimensional reference interaction site model. The Journalof Chemical Physics, 110(20):10095\u201310112, 1999. doi:http:\/\/dx.doi.org\/10.1063\/1.478883. URL http:\/\/scitation.aip.org\/content\/aip\/journal\/jcp\/110\/20\/10.1063\/1.478883.[68] Andreas Vitalis and Rohit V. Pappu. Absinth: A new continuumsolvation model for simulations of polypeptides in aqueous solutions.Journal of Computational Chemistry, 30(5):673\u2013699, 2009. ISSN1096-987X. doi: 10.1002\/jcc.21005. URLhttp:\/\/dx.doi.org\/10.1002\/jcc.21005.[69] W. Kohn and L. J. Sham. Self-consistent equations includingexchange and correlation effects. Phys. Rev., 140:A1133\u2013A1138, 1965.[70] T Van Mourik, M B uhl, and M-P Gaigeot. Density functionaltheory across chemistry, physics and biology. PhilosophicalTransactions Series A, Mathematical, Physical, and EngineeringSciences, 372, 2014.[71] Kieron Burke, Jan Werschnik, and E. K. U. Gross. Time-dependentdensity functional theory: Past, present, and future. The Journal ofChemical Physics, 123(6):062206, 2005.[72] N David Mermin. Thermal properties of the inhomogeneous electrongas. Phys. Rev., 137(5A):A1441, 1965.[73] C. Ebner, W. F. Saam, and D. Stroud. Density-functional theory ofsimple classical fluids. i. surfaces. Phys. Rev. A, 14:2264\u20132273, Dec1976. doi: 10.1103\/PhysRevA.14.2264.[74] R Evans. The nature of the liquid-vapour interface and other topicsin the statistical mechanics of non-uniform, classical fluids. Adv.Phys., 28(2):143\u2013200, 1979.128[75] David Chandler, John D. McCoy, and Sherwin J. Singer. Densityfunctional theory of nonuniform polyatomic systems. i. generalformulation. J. Chem. Phys., 85(10):5971\u20135976, 1986. doi:10.1063\/1.451510. URLhttp:\/\/link.aip.org\/link\/?JCP\/85\/5971\/1.[76] R. Evans. Density functionals in the theory of inhomogeneous fluids.In D. Henderson, editor, Fundamentals of inhomogeneous fluids.Dekker, New York, 1992.[77] T. Biben, J. P. Hansen, and Y. Rosenfeld. Generic density functionalfor electric double layers in a molecular solvent. Phys. Rev. E, 57:R3727\u2013R3730, Apr 1998. doi: 10.1103\/PhysRevE.57.R3727.[78] John M. Richardson. Variational theory of the radial distributionfunction. The Journal of Chemical Physics, 23(12):2304\u20132308, 1955.doi: http:\/\/dx.doi.org\/10.1063\/1.1740743. URL http:\/\/scitation.aip.org\/content\/aip\/journal\/jcp\/23\/12\/10.1063\/1.1740743.[79] Jianzhong Wu. Density functional theory for chemical engineering:From capillarity to soft materials. American Institute of ChemicalEngineers Journal, 52:1169, 2006.[80] Christopher P. Emborsky, Zhengzheng Feng, Kenneth R. Cox, andWalter G. Chapman. Recent advances in classical density functionaltheory for associating and polyatomic molecules. Fluid PhaseEquilibria, 306(1):15 \u2013 30, 2011. ISSN 0378-3812. doi:http:\/\/dx.doi.org\/10.1016\/j.fluid.2011.02.007. URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0378381211000653.[81] Akira R. Kinjo and Shoji Takada. Effects of macromolecularcrowding on protein folding and aggregation studied by densityfunctional theory: statics. Phys. Rev. E, 66:031911, Sep 2002. doi:10.1103\/PhysRevE.66.031911. URLhttp:\/\/link.aps.org\/doi\/10.1103\/PhysRevE.66.031911.[82] Daniel Borgis, Lionel Gendre, and Rosa Ramirez. Molecular densityfunctional theory: Application to solvation and electron-transferthermodynamics in polar solvents. The Journal of PhysicalChemistry B, 116(8):2504\u20132512, 2012. doi: 10.1021\/jp210817s.129[83] S. S. Plotkin and J. N. Onuchic. Investigation of routes and funnelsin protein folding by free energy functional methods. Proc. Natl.Acad. Sci. USA, 97:6509\u20136514, 2000.[84] S. S. Plotkin and J. N. Onuchic. Structural and energeticheterogeneity in protein folding i: Theory. J. Chem. Phys., 116(12):5263\u20135283, 2002.[85] Y. Singh, J. P. Stoessel, and P. G. Wolynes. Hard-sphere glass andthe density-functional theory of aperiodic crystals. Phys. Rev. Lett.,54:1059\u20131062, Mar 1985. doi: 10.1103\/PhysRevLett.54.1059. URLhttp:\/\/link.aps.org\/doi\/10.1103\/PhysRevLett.54.1059.[86] T. R. Kirkpatrick and P. G. Wolynes. Connections between somekinetic and equilibrium theories of the glass transition. Phys. Rev. A,35(7):3072\u20133080, 1987.[87] Randall W. Hall and Peter G. Wolynes. The aperiodic crystalpicture and free energy barriers in glasses. J. Chem. Phys., 86(5):2943\u20132948, 1987. doi: 10.1063\/1.452045. URLhttp:\/\/link.aip.org\/link\/?JCP\/86\/2943\/1.[88] X. Y. Xia and P. G. Wolynes. Fragilities of liquids predicted fromthe random first order transition theory of glasses. Proc. Natl. Acad.Sci. U. S. A., 97:2990\u20132994, 2000.[89] Xiaoyu Xia and Peter G. Wolynes. Microscopic theory ofheterogeneity and nonexponential relaxations in supercooled liquids.Phys. Rev. Lett., 86:5526\u20135529, Jun 2001. doi:10.1103\/PhysRevLett.86.5526. URLhttp:\/\/link.aps.org\/doi\/10.1103\/PhysRevLett.86.5526.[90] Jacob D Stevenson, Jo\u00a8rg Schmalian, and Peter G Wolynes. Theshapes of cooperatively rearranging regions in glass-forming liquids.Nat. Phys., 2(4):268\u2013274, 2006.[91] Vassiliy Lubchenko and Peter G. Wolynes. Theory of structuralglasses and supercooled liquids. Annu. Rev. Phys. Chem., 58(1):235\u2013266, 2007. doi: 10.1146\/annurev.physchem.58.032806.104653.[92] B. A. Shoemaker, J. Wang, and P. G. Wolynes. Structuralcorrelations in protein folding funnels. Proc. Nat. Acad. Sci. U. S.A., 94:777\u2013782, 1997.130[93] B. A. Shoemaker, J. Wang, and P. G. Wolynes. Exploring structuresin protein folding funnels with free energy functionals: the transitionstate ensemble. J. Mol. Biol., 287:675\u2013694, 1999.[94] J. J. Portman, S. Takada, and P. G. Wolynes. Microscopic theory ofprotein folding rates. I. Fine structure of the free energy profile andfolding routes from a variational approach. J. Chem. Phys., 114:5069\u20135081, 2001.[95] T. K. Vanderlick, L. E. Scriven, and H. T. Davis. Molecular theoriesof confined fluids. The Journal of Chemical Physics, 90(4):2422\u20132436, 1989. doi: http:\/\/dx.doi.org\/10.1063\/1.455985. URLhttp:\/\/scitation.aip.org\/content\/aip\/journal\/jcp\/90\/4\/10.1063\/1.455985.[96] Yu Chen Shen and David W Oxtoby. Density functional theory ofcrystal growth: Lennard-jones fluids. Journal of Chemical Physics,104:4233, 1996.[97] Guillaume Jeanmairet, Maximilien Levesque, Rodolphe Vuilleumier,and Daniel Borgis. Molecular density functional theory of water.Journal of Physical Chemistry Letters, 4:619\u2013624, 2013.[98] Guillaume Jeanmairet, Maximilien Levesque, and Daniel Borgis.Molecular density functional theory of water describinghydrophobicity at short and long length scales. Journal of ChemicalPhysics, 139:154101, 2013.[99] Sander Pronk, Szilrd Pll, Roland Schulz, Per Larsson, Pr Bjelkmar,Rossen Apostolov, Michael R. Shirts, Jeremy C. Smith, Peter M.Kasson, David van der Spoel, Berk Hess, and Erik Lindahl. Gromacs4.5: a high-throughput and highly parallel open source molecularsimulation toolkit. Bioinformatics, 29(7):845\u2013854, 2013. doi:10.1093\/bioinformatics\/btt055. URL http:\/\/bioinformatics.oxfordjournals.org\/content\/29\/7\/845.abstract.[100] Dmitry N. Ivankov and Alexei V. Finkelstein. Prediction of proteinfolding rates from the amino acid sequence-predicted secondarystructure. Proceedings of the National Academy of Sciences of theUnited States of America, 101(24):8942\u20138944, 2004. doi:10.1073\/pnas.0402659101. URLhttp:\/\/www.pnas.org\/content\/101\/24\/8942.abstract.131[101] David A. Sivak, John D. Chodera, and Gavin E. Crooks. Usingnonequilibrium fluctuation theorems to understand and correcterrors in equilibrium and nonequilibrium simulations of discretelangevin dynamics. Phys. Rev. X, 3:011007, Jan 2013. doi:10.1103\/PhysRevX.3.011007. URLhttp:\/\/link.aps.org\/doi\/10.1103\/PhysRevX.3.011007.[102] Yaoqi Zhou, Martin Karplus, Keith D. Ball, and R. Stephen Berry.The distance fluctuation criterion for melting: Comparison ofsquare-well and morse potential models for clusters andhomopolymers. The Journal of Chemical Physics, 116(5), 2002.[103] R. A. Buckingham. The classical equation of state of gaseous helium,neon and argon. Proceedings of the Royal Society of London A:Mathematical, Physical and Engineering Sciences, 168(933):264\u2013283,1938. ISSN 0080-4630. doi: 10.1098\/rspa.1938.0173.[104] B. R. Brooks, C. L. Brooks, III, A. D. Mackerell, Jr., L. Nilsson,R. J. Petrella, B. Roux, Y. Won, G. Archontis, C. Bartels,S. Boresch, A. Caflisch, L. Caves, Q. Cui, A. R. Dinner, M. Feig,S. Fischer, J. Gao, M. Hodoscek, W. Im, K. Kuczera, T. Lazaridis,J. Ma, V. Ovchinnikov, E. Paci, R. W. Pastor, C. B. Post, J. Z. Pu,M. Schaefer, B. Tidor, R. M. Venable, H. L. Woodcock, X. Wu,W. Yang, D. M. York, and M. Karplus. CHARMM: TheBiomolecular Simulation Program. JOURNAL OFCOMPUTATIONAL CHEMISTRY, 30(10, Sp. Iss. SI):1545\u20131614,JUL 30 2009.[105] M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A.Robb, J. R. Cheeseman, J. A. Montgomery, Jr., T. Vreven, K. N.Kudin, J. C. Burant, J. M. Millam, S. S. Iyengar, J. Tomasi,V. Barone, B. Mennucci, M. Cossi, G. Scalmani, N. Rega, G. A.Petersson, H. Nakatsuji, M. Hada, M. Ehara, K. Toyota, R. Fukuda,J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai,M. Klene, X. Li, J. E. Knox, H. P. Hratchian, J. B. Cross,V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann,O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski,P. Y. Ayala, K. Morokuma, G. A. Voth, P. Salvador, J. J.Dannenberg, V. G. Zakrzewski, S. Dapprich, A. D. Daniels, M. C.Strain, O. Farkas, D. K. Malick, A. D. Rabuck, K. Raghavachari,J. B. Foresman, J. V. Ortiz, Q. Cui, A. G. Baboul, S. Clifford,132J. Cioslowski, B. B. Stefanov, G. Liu, A. Liashenko, P. Piskorz,I. Komaromi, R. L. Martin, D. J. Fox, T. Keith, M. A. Al-Laham,C. Y. Peng, A. Nanayakkara, M. Challacombe, P. M. W. Gill,B. Johnson, W. Chen, M. W. Wong, C. Gonzalez, and J. A. Pople.Gaussian 03, Revision C.02. Gaussian, Inc., Wallingford, CT, 2004.[106] William L. Jorgensen, David S. Maxwell, and Julian Tirado-Rives.Development and testing of the opls all-atom force field onconformational energetics and properties of organic liquids. Journalof the American Chemical Society, 118(45):11225\u201311236, 1996.[107] Wendy D. Cornell, Piotr Cieplak, Christopher I. Bayly, Ian R.Gould, Kenneth M. Merz, David M. Ferguson, David C. Spellmeyer,Thomas Fox, James W. Caldwell, and Peter A. Kollman. A secondgeneration force field for the simulation of proteins, nucleic acids,and organic molecules. Journal of the American Chemical Society,117(19):5179\u20135197, 1995. doi: 10.1021\/ja00124a002.[108] Nathan Schmid, AndreasP. Eichenberger, Alexandra Choutko,Sereina Riniker, Moritz Winger, AlanE. Mark, and WilfredF. vanGunsteren. Definition and testing of the gromos force-field versions54a7 and 54b7. European Biophysics Journal, 40(7):843\u2013856, 2011.ISSN 0175-7571. doi: 10.1007\/s00249-011-0700-9.[109] Valentina Tozzini. Coarse-grained models for proteins. CurrentOpinion in Structural Biology, 15(2):144 \u2013 150, 2005. ISSN0959-440X. doi: http:\/\/dx.doi.org\/10.1016\/j.sbi.2005.02.005. URLhttp:\/\/www.sciencedirect.com\/science\/article\/pii\/S0959440X05000515. Theory and simulation\/Macromolecularassemblages.[110] N Go\u00af. Theoretical studies of protein folding. Annual Review ofBiophysics and Bioengineering, 12(1):183\u2013210, 1983. doi:10.1146\/annurev.bb.12.060183.001151. URLhttp:\/\/dx.doi.org\/10.1146\/annurev.bb.12.060183.001151.PMID: 6347038.[111] Aram Davtyan, Nicholas P. Schafer, Weihua Zheng, Cecilia Clementi,Peter G. Wolynes, and Garegin A. Papoian. Awsem-md: Proteinstructure prediction using coarse-grained physical potentials andbioinformatically based local structure biasing. The Journal ofPhysical Chemistry B, 116(29):8494\u20138503, 2012. doi:13310.1021\/jp212541y. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp212541y.[112] Daniel L. Ensign, Peter M. Kasson, and Vijay S. Pande.Heterogeneity even at the speed limit of folding: Large-scalemolecular dynamics study of a fast-folding variant of the villinheadpiece. Journal of Molecular Biology, 374(3):806 \u2013 816, 2007.ISSN 0022-2836. doi: http:\/\/dx.doi.org\/10.1016\/j.jmb.2007.09.069.URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0022283607012685.[113] D.E. Shaw, R.O. Dror, J.K. Salmon, J.P. Grossman, K.M.Mackenzie, J.A. Bank, C. Young, M.M. Deneroff, B. Batson, K.J.Bowers, E. Chow, M.P. Eastwood, D.J. Ierardi, J.L. Klepeis, J.S.Kuskin, R.H. Larson, K. Lindorff-Larsen, P. Maragakis, M.A.Moraes, S. Piana, Y. Shan, and B. Towles. Millisecond-scalemolecular dynamics simulations on anton. In High PerformanceComputing Networking, Storage and Analysis, Proceedings of theConference on, pages 1\u201311, Nov 2009. doi: 10.1145\/1654059.1654126.[114] John G. Kirkwood. Statistical mechanics of fluid mixtures. TheJournal of Chemical Physics, 3(5):300\u2013313, 1935. doi:10.1063\/1.1749657. URLhttp:\/\/link.aip.org\/link\/?JCP\/3\/300\/1.[115] Miguel Jorge, Nuno M. Garrido, Antonio J. Queimada, Ioannis G.Economou, and Eugenia A. Macedo. Effect of the integrationmethod on the accuracy and computational efficiency of free energycalculations using thermodynamic integration. Journal of ChemicalTheory and Computation, 6(4):1018\u20131027, 2010. doi:10.1021\/ct900661c.[116] Charles H. Bennett. Efficient estimation of free energy differencesfrom Monte Carlo data. J Comput Phys, 22:245\u2013268, 1976.[117] C. Jarzynski. Nonequilibrium equality for free energy differences.Phys. Rev. Lett., 78:2690\u20132693, Apr 1997.[118] Jeff Gore, Felix Ritort, and Carlos Bustamante. Bias and error inestimates of equilibrium free-energy differences from nonequilibriummeasurements. Proceedings of the National Academy of Sciences, 100(22):12564\u201312569, 2003.134[119] Marc Souaille and Benoit Roux. Extension to the weightedhistogram analysis method: combining umbrella sampling with freeenergy calculations. Computer Physics Communications, 135(1):40 \u201357, 2001. ISSN 0010-4655. doi:http:\/\/dx.doi.org\/10.1016\/S0010-4655(00)00215-0. URLhttp:\/\/www.sciencedirect.com\/science\/article\/pii\/S0010465500002150.[120] C. B. Anfinsen and H. A. Scheraga. Experimental and theoreticalaspects of protein folding. In Advances in Protein Chemistry,volume 29, pages 205\u2013301, New York, 1975. Academic Press.[121] Sophie E. Jackson and Alan R. Fersht. Folding of chymotrypsininhibitor 2. 1. Evidence for a two state transition. Biochemistry, 30:10428\u201310435, 1991.[122] G. I. Makhatadze and P. L. Privalov. Energetics of protein structure.Adv. Protein Chem., 47:307\u2013425, 1995.[123] P.L. Privalov, E.I. Tiktopulo, S.Yu. Venyaminov, Yu.V. Griko, G.I.Makhatadze, and N.N. Khechinashvili. Heat capacity andconformation of proteins in the denatured state. Journal ofMolecular Biology, 205(4):737 \u2013 750, 1989. doi:http:\/\/dx.doi.org\/10.1016\/0022-2836(89)90318-5.[124] K. A. Dill, S. Bromberg, K. Yue, K. M. Fiebig, D. P. Yee, P. D.Thomas, and H. S. Chan. Principles of protein folding\u2014Aperspective from simple exact models. Protein Science, 4:561\u2013602,1995.[125] S. S. Plotkin and J. N. Onuchic. Understanding protein folding withenergy landscape theory ii: Quantitative aspects. Quart. Rev.Biophys., 35(3):205\u2013286, 2002.[126] Christopher D. Snow, Houbi Nguyen, Vijay S. Pande, and MartinGruebele. Absolute comparison of simulated and experimentalprotein-folding dynamics. Nature, 420(6):102\u2013106, 2002.[127] Thomas R Weikl, Matteo Palassini, and Ken A Dill. Cooperativityin two-state protein folding kinetics. Protein Science, 13(3):822\u2013829,2004.135[128] M. R. Ejtehadi, S. P. Avall, and S. S. Plotkin. Three-bodyinteractions improve the prediction of rate and mechanism in proteinfolding models. Proc. Natl. Acad. Sci. USA, 101(42):15088\u201315093,2004.[129] Elizabeth Rhoades, Mati Cohen, Benjamin Schuler, and Gilad Haran.Two-state folding observed in individual protein molecules. Journalof the American Chemical Society, 126(45):14686\u201314687, 2004.[130] Beatriz Ibarra-Molero and Jose Manuel Sanchez-Ruiz. Statisticaldifferential scanning calorimetry: Probing protein folding-unfoldingensembles. In Victor Mun\u02dcoz, editor, Protein folding, misfolding andaggregation: Classical themes and novel approaches, RSCBiomolecular Sciences, pages 85\u2013103. Royal Society of Chemistry,Cambridge, UK, 2008.[131] Douglas Poland. Empirical protein partition functions. The Journalof Physical Chemistry B, 116(23):6683\u20136693, 2012. doi:10.1021\/jp211794u.[132] Peter L. Privalov. Microcalorimetry of Macromolecules: The PhysicalBasis of Biological Structures. John Wiley & Sons, Inc., New York,2012.[133] Lucas N. R. Wafer, Werner W. Streicher, and George I. Makhatadze.Thermodynamics of the trp-cage miniprotein unfolding in urea.Proteins: Structure, Function, and Bioinformatics, 78(6):1376\u20131381,2010. ISSN 1097-0134. doi: 10.1002\/prot.22681.[134] Andrew T. Fenley, Hari S. Muddana, and Michael K. Gilson.Entropyenthalpy transduction caused by conformational shifts canobscure the forces driving proteinligand binding. Proceedings of theNational Academy of Sciences, 109(49):20006\u201320011, 2012. doi:10.1073\/pnas.1213180109. URLhttp:\/\/www.pnas.org\/content\/109\/49\/20006.abstract.[135] Wayne J. Becktel and John A. Schellman. Protein stability curves.Biopolymers, 26:1859\u20131877, 1987.[136] Patrick L Wintrode, George I Makhatadze, and Peter L Privalov.Thermodynamics of ubiquitin unfolding. Proteins: Structure,Function, and Bioinformatics, 18(3):246\u2013253, 1994.136[137] GI Makhatadze and PL Privalov. Heat capacity of proteins: I. partialmolar heat capacity of individual amino acid residues in aqueoussolution: hydration effect. J. Mol. Biol., 213(2):375\u2013384, 1990.[138] Mark E. Zweifel and Doug Barrick. Relationships between thetemperature dependence of solvent denaturation and the denaturantdependence of protein stability curves. Biophysical Chemistry,101-102(0):221 \u2013 237, 2002. doi:http:\/\/dx.doi.org\/10.1016\/S0301-4622(02)00181-3. URLhttp:\/\/www.sciencedirect.com\/science\/article\/pii\/S0301462202001813.[139] Eric M. Nicholson and J. Martin Scholtz. Conformational stability ofthe escherichia coli hpr protein: test of the linear extrapolationmethod and a thermodynamic characterization of cold denaturation.Biochemistry, 35(35):11369\u201311378, 1996. doi: 10.1021\/bi960863y.[140] James U. Bowie and Robert T. Sauer. Equilibrium dissociation andunfolding of the arc repressor dimer. Biochemistry, 28(18):7139\u20137143, 1989. doi: 10.1021\/bi00444a001. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi00444a001.[141] Fan-Guo Meng, Yuan-Kai Hong, Hua-Wei He, Arkadii E. Lyubarev,Boris I. Kurganov, Yong-Bin Yan, and Hai-Meng Zhou. Osmophobiceffect of glycerol on irreversible thermal denaturation of rabbitcreatine kinase. Biophysical Journal, 87(4):2247 \u2013 2254, 2004. ISSN0006-3495. doi: http:\/\/dx.doi.org\/10.1529\/biophysj.104.044784.URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/S0006349504737029.[142] Natas\u02c7a Poklar, Nina Petrovc\u02c7ic\u02c7, Miha Oblak, and Gorazd Vesnaver.Thermodynamic stability of ribonuclease a in alkylurea solutions andpreferential solvation changes accompanying its thermaldenaturation: A calorimetric and spectroscopic study. ProteinScience, 8(4):832\u2013840, 1999. ISSN 1469-896X. doi:10.1110\/ps.8.4.832. URL http:\/\/dx.doi.org\/10.1110\/ps.8.4.832.[143] Alan Fersht. Structure and Mechanism in Protein Science: A Guideto Enzyme Catalysis and Protein Folding. W H Freeman & Co, 3rdedition, 1998.[144] Edward P. O\u2019Brien, Guy Ziv, Gilad Haran, Bernard R. Brooks, andD. Thirumalai. Effects of denaturants and osmolytes on proteins are137accurately predicted by the molecular transfer model. Proc. Natl.Acad. Sci. U. S. A., 105(36):13403\u201313408, 2008. doi:10.1073\/pnas.0802113105.[145] M. Auton and D. W. Bolen. Predicting the energetics ofosmolyte-induced protein folding\/unfolding. Proc. Natl. Acad. Sci.U. S. A., 102(42):15065\u201315068, 2005.[146] Jai K. Kaushik and Rajiv Bhat. Why is trehalose an exceptionalprotein stabilizer?: An analysis of the thermal stability of proteins inthe presence of the compatible osmolyte trehalose. Journal ofBiological Chemistry, 278(29):26458\u201326465, 2003. doi:10.1074\/jbc.M300815200. URLhttp:\/\/www.jbc.org\/content\/278\/29\/26458.abstract.[147] Rajendrakumar Singh, Inamul Haque, and Faizan Ahmad.Counteracting osmolyte trimethylamine n-oxide destabilizes proteinsat ph below its pka: Measurements of thermodynamic parameters ofproteins in the presence and absence of trimethylamine n-oxide.Journal of Biological Chemistry, 280(12):11035\u201311042, 2005. doi:10.1074\/jbc.M410716200. URLhttp:\/\/www.jbc.org\/content\/280\/12\/11035.abstract.[148] Fabrizio Chiti, Nico A. J. van Nuland, Niccolo\u00b4 Taddei, FrancescaMagherini, Massimo Stefani, Giampietro Ramponi, andChristopher M. Dobson. Conformational stability of muscleacylphosphatase: the role of temperature, denaturant concentration,and ph. Biochemistry, 37(5):1447\u20131455, 1998. doi:10.1021\/bi971692f. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi971692f.[149] Vishwas R. Agashe and Jayant B. Udgaonkar. Thermodynamics ofdenaturation of barstar: Evidence for cold denaturation andevaluation of the interaction with guanidine hydrochloride.Biochemistry, 34(10):3286\u20133299, 1995. doi: 10.1021\/bi00010a019.URL http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi00010a019.[150] S. Padmanabhan, D. V. Laurents, A. M. Ferna\u00b4ndez,M. Elias-Arnanz, J. Ruiz-Sanz, P. L. Mateo, M. Rico, and V. V.Filimonov. Thermodynamic analysis of the structural stability ofphage 434 cro protein. Biochemistry, 38(47):15536\u201315547, 1999. doi:13810.1021\/bi991757. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi991757.[151] T. Kaminyama, Y. Sadahide, Y. Nogusa, and K. Gekko. Biochim.Biophys. Acta, 1434:44\u201357, 1999.[152] James W. Bryson, John R. Desjarlais, Tracy M. Handel, andWilliam F. Degrado. From coiled coils to small globular proteins:Design of a native-like three-helix bundle. Protein Science, 7(6):1404\u20131414, 1998. ISSN 1469-896X. doi: 10.1002\/pro.5560070617.URL http:\/\/dx.doi.org\/10.1002\/pro.5560070617.[153] F. Catanzano, A. Gambuti, G. Graziano, and G. Barone. Interactionwith d-glucose and thermal denaturation of yeast hexokinase b: Adsc study. Journal of Biochemistry, 121(3):568\u2013577, 1997. URLhttp:\/\/jb.oxfordjournals.org\/content\/121\/3\/568.abstract.[154] Nisar Ahmad, V. R. Srinivas, G. Bhanuprakash Reddy, andAvadhesha Surolia. Thermodynamic characterization of theconformational stability of the homodimeric protein, pea lectin.Biochemistry, 37(47):16765\u201316772, 1998. doi: 10.1021\/bi9811720.URL http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi9811720.[155] E. L. Kovrigin and S. A. Potekhin. Biofizika, 41:1201\u20131206, 1996.[156] S. Knapp, Rudolf Ladenstein, and Erwin A. Galinski. Extrinsicprotein stabilization by the naturally occurring osmolytes\u03b2-hydroxyectoine and betaine. Extremophiles, 3(3):191\u2013198, 1999.ISSN 1431-0651. doi: 10.1007\/s007920050116. URLhttp:\/\/dx.doi.org\/10.1007\/s007920050116.[157] Marcelo M. Santoro, Yufeng Liu, Saber M. A. Khan, Li Xiang Hou,and D. W. Bolen. Increased thermal stability of proteins in thepresence of naturally occurring osmolytes. Biochemistry, 31(23):5278\u20135283, 1992. doi: 10.1021\/bi00138a006. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/bi00138a006.[158] Su Xu, Sanbo Qin, and Xian-Ming Pan. Thermal and conformationalstability of ssh10b protein from archaeon sulfolobus shibattae.Biochem J, 382(Pt 2):433\u2013440, 2004.[159] Yaqiang Wang, Mohona Sarkar, Austin E. Smith, Alexander S.Krois, and Gary J. Pielak. Macromolecular crowding and protein139stability. Journal of the American Chemical Society, 134(40):16614\u201316618, 2012. doi: 10.1021\/ja305300m. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/ja305300m.[160] George I. Makhatadze, Marin M. Lopez, John M. Richardson III,and Susan T. Thomas. Anion binding to the ubiquitin molecule.Protein Science, 7(3):689\u2013697, 1998. ISSN 1469-896X. doi:10.1002\/pro.5560070318. URLhttp:\/\/dx.doi.org\/10.1002\/pro.5560070318.[161] Michael Senske, Lisa Trk, Benjamin Born, Martina Havenith,Christian Herrmann, and Simon Ebbinghaus. Protein stabilizationby macromolecular crowding through enthalpy rather than entropy.Journal of the American Chemical Society, 136(25):9036\u20139041, 2014.[162] A Ben-Naim. Hydrophobic interaction and structural changes in thesolvent. Biopolymers, 14(7):1337\u20131355, 1975.[163] Ernest Grunwald. Thermodynamic properties, propensity laws, andsolvent models in solutions in self-associating solvents. application toaqueous alcohol solutions. Journal of the American ChemicalSociety, 106(19):5414\u20135420, 1984. doi: 10.1021\/ja00331a006.[164] HsiangAi Yu and Martin Karplus. A thermodynamic analysis ofsolvation. The Journal of Chemical Physics, 89(4):2366\u20132379, 1988.doi: http:\/\/dx.doi.org\/10.1063\/1.455080.[165] B Lee. Enthalpy-entropy compensation in the thermodynamics ofhydrophobicity. Biophysical chemistry, 51(2):271\u2013278, 1994.[166] Eric A. Mills and Steven S. Plotkin. Density functional theory forprotein transfer free energy. The Journal of Physical Chemistry B,117:13278\u201313290, 2013. doi: 10.1021\/jp403600q.[167] Daniel Borgis, Lionel Gendre, and Rosa Ramirez. Molecular densityfunctional theory: Application to solvation and electron-transferthermodynamics in polar solvents. J. Phys. Chem. B, 116(8):2504\u20132512, 2012. doi: 10.1021\/jp210817s.[168] Charles. Tanford. Isothermal unfolding of globular proteins inaqueous urea solutions. J. Am. Chem. Soc., 86(10):2050\u20132059, 1964.doi: 10.1021\/ja01064a028.140[169] Jonathan E. Kohn, Ian S. Millett, Jaby Jacob, Bojan Zagrovic,Thomas M. Dillon, Nikolina Cingel, Robin S. Dothager, SoenkeSeifert, P. Thiyagarajan, Tobin R. Sosnick, M. Zahid Hasan, Vijay S.Pande, Ingo Ruczinski, Sebastian Doniach, and Kevin W. Plaxco.Random-coil behavior and the dimensions of chemically unfoldedproteins. Proc. Natl. Acad. Sci. U. S. A., 101(34):12491\u201312496, 2004.[170] Jeffrey K. Noel, Paul C. Whitford, and Jose\u00b4 N. Onuchic. The shadowmap: A general contact definition for capturing the dynamics ofbiomolecular folding and function. J. Phys. Chem. B, 116(29):8692\u20138702, 2012. doi: 10.1021\/jp300852d. URLhttp:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp300852d.[171] Jeffry K. Noel, Paul C. Whitford, Karissa Y. Sanbonmatsu, andJose\u00b4 N. ONuchi. Smog@ctbp: simplified deployment of structurebased models in gromacs. Nucleic Acids Res., 38:W657\u2013661, 2010.[172] C. Clementi, H. Nymeyer, and J. N. Onuchic. Topological andenergetic factors: what determines the structural details of thetransition state ensemble and \u201cen-route\u201d intermediates for proteinfolding? An investigation for small globular proteins. J. Mol. Biol.,298:937\u2013953, 2000.[173] G. J. Kleywegt and T. A. Jones. Detection, delineation,measurement and display of cavities in macromolecular structures.Acta Crystallogr., Sect. D: Biol. Crystallogr., 50(2):178\u2013185, Mar1994. doi: 10.1107\/S0907444993011333. URLhttp:\/\/dx.doi.org\/10.1107\/S0907444993011333.[174] Kristine M. Kast, Ju\u00a8rgen Brickmann, Stefan M. Kast, andR. Stephen Berry. Binary phases of aliphatic N-oxides and water:Force field development and molecular dynamics simulation. J.Phys. Chem. A, 107(27):5342\u20135351, 2003. doi: 10.1021\/jp027336a.URL http:\/\/pubs.acs.org\/doi\/abs\/10.1021\/jp027336a.[175] John A. Schellman. The thermodynamics of solvent exchange.Biopolymers, 34(8):1015\u20131026, 1994. doi: 10.1002\/bip.360340805.[176] J. K. Myers, C. N. Pace, and J. M. Scholtz. Denaturant m valuesand heat capacity changes: Relation to changes in accessible surfaceareas of protein unfolding. Protein Sci., 4:2138\u20132148, 1995.141[177] Darwin O. V. Alonso and Ken A. Dill. Solvent denaturation andstabilization of globular proteins. Biochemistry, 30:5974\u20135985, 1991.[178] T. L. Hill. An Introduction to Statistical Thermodynamics. CourierDover Publications, New York, 1960.[179] Norman F Carnahan and Kenneth E Starling. Equation of state fornonattracting rigid spheres. J. Chem. Phys., 51:635, 1969.[180] Michael Plischke and Birger Bergersen. Statistical Physics. WorldScientific, Singapore, 3rd edition, 2006.[181] R. O. Jones and O. Gunnarsson. The density functional formalism,its applications and prospects. Rev. Mod. Phys., 61:689\u2013746, Jul1989. doi: 10.1103\/RevModPhys.61.689. URLhttp:\/\/link.aps.org\/doi\/10.1103\/RevModPhys.61.689.[182] Joseph T. Slusher. Accurate estimates of infinite-dilution chemicalpotentials of small hydrocarbons in water via molecular dynamicssimulation. J. Phys. Chem. B, 103(29):6075\u20136079, 1999. doi:10.1021\/jp990709w.[183] J. T. Wescott, L. R. Fisher, and S. Hanna. Use of thermodynamicintegration to calculate the hydration free energies of n-alkanes. J.Chem. Phys., 116(6):2361\u20132369, 2002. doi: 10.1063\/1.1431588.[184] Michael R. Shirts, Jed W. Pitera, William C. Swope, and Vijay S.Pande. Extremely precise free energy calculations of amino acid sidechain analogs: Comparison of common molecular mechanics forcefields for proteins. J. Chem. Phys., 119(11):5740\u20135761, 2003. doi:10.1063\/1.1587119.[185] A A Zamyatnin. Amino acid, peptide, and protein volume insolution. Annu. Rev. Biophys. Bioeng., 13(1):145\u2013165, 1984. doi:10.1146\/annurev.bb.13.060184.001045.[186] Matthew Auton and D. Wayne Bolen. Additive transfer free energiesof the peptide backbone unit that are independent of the modelcompound and the choice of concentration scale. Biochemistry, 43(5):1329\u20131342, 2004. doi: 10.1021\/bi035908r.[187] Rosa Ramirez and Daniel Borgis. Density functional theory ofsolvation and its relation to implicit solvent models. J. Phys. Chem.B, 109(14):6754\u20136763, 2005. doi: 10.1021\/jp045453v.142[188] Srabani Roy and Biman Bagchi. Solvation dynamics in liquid water.a novel interplay between librational and diffusive modes. J. Chem.Phys., 99(12):9938\u20139943, 1993. doi: 10.1063\/1.465392.[189] A. Chandra and B. Bagchi. Molecular theory of solvation andsolvation dynamics in a binary dipolar liquid. J. Chem. Phys., 94(12):8367\u20138377, 1991. doi: 10.1063\/1.460068.[190] Akira Yoshimori, Tyler J. F. Day, and G. N. Patey. An investigationof dynamical density functional theory for solvation in simplemixtures. J. Chem. Phys., 108(15):6378\u20136386, 1998. doi:10.1063\/1.476044.[191] B. Go\u00a8tzelmann, R. Evans, and S. Dietrich. Depletion forces in fluids.Phys. Rev. E, 57:6785\u20136800, Jun 1998. doi:10.1103\/PhysRevE.57.6785. URLhttp:\/\/link.aps.org\/doi\/10.1103\/PhysRevE.57.6785.[192] Fumio Oosawa and Sho Asakura. Surface tension of high-polymersolutions. J. Chem. Phys., 22(7):1255\u20131255, 1954. doi:10.1063\/1.1740346. URLhttp:\/\/link.aip.org\/link\/?JCP\/22\/1255\/1.[193] Phil Attard. Spherically inhomogeneous fluids. ii. hard-sphere solutein a hard-sphere solvent. J. Chem. Phys., 91(5):3083\u20133089, 1989.doi: 10.1063\/1.456931. URLhttp:\/\/link.aip.org\/link\/?JCP\/91\/3083\/1.[194] Phil Attard and G. N. Patey. Hypernetted-chain closure with bridgediagrams. asymmetric hard sphere mixtures. J. Chem. Phys., 92(8):4970\u20134982, 1990. doi: 10.1063\/1.458556. URLhttp:\/\/link.aip.org\/link\/?JCP\/92\/4970\/1.[195] H. N. W. Lekkerkerker, W. C. K. Poon, P. N. Pusey, A. Stroobants,and P. B. Warren. Phase-behavior of colloid plus polymer mixtures.Europhys. Lett., 20(6):559\u2013564, 1992.[196] Ronald Dickman, Phil Attard, and Veronika Simonian. Entropicforces in binary hard sphere mixtures: Theory and simulation. J.Chem. Phys., 107(1):205\u2013213, 1997. doi: 10.1063\/1.474367. URLhttp:\/\/link.aip.org\/link\/?JCP\/107\/205\/1.143[197] Jian-Min Yuan, Chia-Lin Chyan, Huan-Xiang Zhou, Tse-Yu Chung,Haibo Peng, Guanghui Ping, and Guoliang Yang. The effects ofmacromolecular crowding on the mechanical stability of proteinmolecules. Protein Sci., 17(12):2156\u20132166, 2008.[198] Rosa Ramirez, Michel Mareschal, and Daniel Borgis. Directcorrelation functions and the density functional theory of polarsolvents. Chemical Physics, 319:261\u2013272, 2005.[199] Thomas F. Coleman and Yuying Li. An interior trust regionapproach for nonlinear minimization subject to bounds. SIAMJournal on Optimization, 6(2):418\u2013445, 1996.[200] B. Lee and F.M. Richards. The interpretation of protein structures:Estimation of static accessibility. Journal of Molecular Biology, 55(3):379 \u2013 IN4, 1971. ISSN 0022-2836.[201] F Eisenhaber, P Lijnzaad, P Argos, C Sander, and M Scharf. Thedouble cubic lattice method: Efficient approaches to numericalintegration of surface area and volume and to dot surface contouringof molecular assemblies. Journal of Computational Chemistry, 16:273\u2013284, 1995.[202] R E Nettleton and M S Green. Expression in terms of moleculardistribution functions for the entropy density in an infinite system.Journal of Chemical Physics, 29:1365, 1958.[203] A Singer. Maximum entropy formulation of the kirkwoodsuperposition approximation. Journal of Chemical Physics, 121:3657,2004.[204] Assessing the performance of mm\/pbsa and mm\/gbsa methods. 3.the impact of force fields and ligand charge models. Journal ofPhysical Chemistry B, 117:8408\u20138421, 2013.[205] V Z Spassov, L Yan, and S Szalma. Introducing an implicitmembrane in generalized born\/solvent accessibility continuumsolvent models. Journal of Physical Chemistry B, 106:8726\u20138738,2002.[206] Kyle A. Beauchamp, Yu-Shan Lin, Rhiju Das, and Vijay S. Pande.Are protein force fields getting better? a systematic benchmark on524 diverse nmr measurements. Journal of Chemical Theory andComputation, 8(4):1409\u20131414, 2012.144[207] P Larsson and E Lindahl. A high-performance parallel-generalizedborn implementation enabled by tabulated interaction rescaling.Journal of Computational Chemistry, 31:2593\u20132600, 2010.[208] B Hess, C Kutzner, D van der Spoel, and E Lindahl. Gromacs 4:Algorithms for highly efficient, load-balanced, and scalable molecularsimulation. Journal of Chemical Theory and Computation, 4:435\u2013447, 2008.145Appendix AProof of Theorems in DFTIn this appendix we prove the Hohenberg-Kohn theorem. To start we defineTr... =\u221e\u2211N=01h3NN !\u222b \u222b...dr3Ndp3N (A.1)and\u039e = Tre\u2212\u03b2(H\u2212N\u00b5) (A.2)With these defined we can begin with the following:\u2126[f ] = Trf(H\u2212N\u00b5+ kBT ln f) (A.3)Lemma A.0.1 Let f0 be the equilibrium density. Then for any other den-sity f , \u2126[f ] > \u2126[f0].Proof: The equilibrium density can be expressed asfo =e\u2212\u03b2(H\u2212N\u00b5)\u039e(A.4)146So then\u2126[f0] = Tr(H\u2212N\u00b5\u2212 kBT ln \u039e\u2212H+N\u00b5) (A.5)= \u2212kBT ln \u039e (A.6)We can then write\u2126[f ]\u2212 \u2126[f0] = Trf(H\u2212N\u00b5+ kBT ln f + kBT ln \u039e) (A.7)Since H\u2212N\u00b5+ kBT ln \u039e = \u2212kBT ln f0,\u2126[f ]\u2212 \u2126[f0] = TrkBT (f ln f \u2212 f ln f0) (A.8)Since Trf = 1, we can write\u2126[f ]\u2212\u2126[f0] = TrkBT (f ln f\u2212f ln f0 +f0\u2212f) = Trf0kBT ( ff0lnff0+1\u2212 ff0)(A.9)Since x lnx \u2265 x \u2212 1 for all x, where the equality is only true when x = 1,\u2126[f ]\u2212 \u2126[f0] > 0, and hence Lemma A.0.1 is proved.We can now show the following theorem:Theorem A.0.2 The functional F [\u03c6] = Trf0(H \u2212 N\u00b5 + kBT ln f0) is aunique functional of \u03c6, the single particle density.To prove this we assume the opposite: that there are two densities f0and f \u20320, both of which are equilibrium densities for the given hamiltonian.Then, by Lemma A.0.1, we haveF [\u03c6\u2032] > F [\u03c6] (A.10)But, the choice of f \u20320 and f0 was arbitrary; the prime could have been oneither function. So equation A.10 cannot be true. Thus the density is aunique functional of the hamiltonian and hence of the external potential,proving theorem A.Finally, we have147Theorem A.0.3 The minimum value of F [\u03c6] is the free energy of the sys-tem and occurs when \u03c6 = \u03c60, the equilibrium densityThis follows because if F [\u03c6] at the equilibrium density is \u2126[f0] in equationA.6. Theorem shows that there cannot be another \u03c6 that satisfies thiscondition, and lemma A.0.1 shows that F is larger for any other \u03c6. Thustheorem A.0.3 is shown.148Appendix BSimulation ParametersIn this appendix we list the van der Waals parameters used in simulationsthroughout this thesis. The vdW interaction between two atoms i and j isgiven byVij = 4\u221a\u000fi\u000fj((\u03c3i + \u03c3j2rij)12\u2212(\u03c3i + \u03c3j2rij)6)(B.1)were \u03c3i and \u000fi are the van der Waals parameters for atom i (and likewisefor atom j) and rij is the distance between atoms i and j. The followingatom types are from the CHARMM forcefield.149Table B.1: van der Waals parameters for atoms used in simulations in thisthesis.Atom type \u03c3 (nm) \u000f (kJ\/mol)CA 0.355005321205 0.29288CC 0.356359487256 0.29288CP1 0.405358916754 0.08368CP2 0.387540942391 0.23012CP3 0.387540942391 0.23012CT1 0.405358916754 0.08368CT2 0.387540942391 0.23012CT3 0.367050271874 0.33472H 0.0400013524445 0.192464HA 0.235197261589 0.092048HB 0.235197261589 0.092048HP 0.242003727796 0.12552HR1 0.160361769265 0.192464HR2 0.12472582054 0.192464HR3 0.261567863646 0.0326352NC2 0.329632525712 0.8368NH1 0.329632525712 0.8368NH2 0.329632525712 0.8368NH3 0.329632525712 0.8368NPH 0.329632525712 0.8368NR1 0.329632525712 0.8368NR2 0.329632525712 0.8368NR3 0.329632525712 0.8368O 0.302905564168 0.50208OC 0.302905564168 0.50208S 0.356359487256 1.8828OWT3 0.315058 0.636386HWT3 0.0 0.0150","@language":"en"}],"Genre":[{"@value":"Thesis\/Dissertation","@language":"en"}],"GraduationDate":[{"@value":"2016-02","@language":"en"}],"IsShownAt":[{"@value":"10.14288\/1.0221256","@language":"en"}],"Language":[{"@value":"eng","@language":"en"}],"Program":[{"@value":"Physics","@language":"en"}],"Provider":[{"@value":"Vancouver : University of British Columbia Library","@language":"en"}],"Publisher":[{"@value":"University of British Columbia","@language":"en"}],"Rights":[{"@value":"Attribution-NonCommercial-NoDerivs 2.5 Canada","@language":"*"}],"RightsURI":[{"@value":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/2.5\/ca\/","@language":"*"}],"ScholarlyLevel":[{"@value":"Graduate","@language":"en"}],"Title":[{"@value":"Protein-solvent interactions and classical density functional theory","@language":"en"}],"Type":[{"@value":"Text","@language":"en"}],"URI":[{"@value":"http:\/\/hdl.handle.net\/2429\/55761","@language":"en"}],"SortDate":[{"@value":"2015-12-31 AD","@language":"en"}],"@id":"doi:10.14288\/1.0221256"}